fix(browse): per-process state-file temp path to fix concurrent-write ENOENT

The daemon writes `.gstack/browse.json` via the standard atomic-rename
pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four
sites in server.ts use this pattern (initial daemon-startup state at
:2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel
update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all
four hard-code the same temp filename `${stateFile}.tmp`.

Under concurrent writers the shared filename races on the rename:

    t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
    t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
    t2  Writer A: renameSync(stateFile + '.tmp', stateFile)    // moves B's payload
    t3  Writer B: renameSync(stateFile + '.tmp', stateFile)    // ENOENT — file gone

Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:

    [browse] Failed to start: ENOENT: no such file or directory,
    rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'

Pre-fix success rate: **0 / 15** under cold-start race.
Post-fix success rate: **15 / 15**, zero ENOENT.

Fix:
  - New `tmpStatePath()` helper (server.ts:333) returns
    `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
  - All 4 call sites use `tmpStatePath()` instead of the shared literal
  - Atomic rename still gives last-writer-wins semantics on the final
    state.json content; only behavior change is that concurrent writers
    no longer kill each other on the rename step

Source-level guard test (browse/test/server-tmp-state-path.test.ts)
locks two invariants: (1) no remaining `stateFile + '.tmp'` literals,
(2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same
read-source-as-text pattern as terminal-agent.test.ts and
dual-listener.test.ts — no daemon required, runs in tier-1 free.

Test plan:
  - [x] Targeted source-level guard test passes (3 / 0)
  - [x] `bun run build` clean
  - [x] Live regression: 15 concurrent CLIs against cold state →
        15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
  - [x] No `.tmp.*` orphans left behind after rename succeeds
  - [x] Related test cluster (server-auth, dual-listener, cdp-mutex,
        findport) — same pre-existing flakes as `main`, no new
        regressions introduced

🤖 Generated with [Claude Code](https://claude.com/claude-code)
This commit is contained in:
Yashwant Kotipalli 2026-05-03 17:37:24 -07:00
parent bf65487162
commit a1eb6c37a1
2 changed files with 135 additions and 4 deletions

View File

@ -313,6 +313,27 @@ const CONSOLE_LOG_PATH = config.consoleLog;
const NETWORK_LOG_PATH = config.networkLog;
const DIALOG_LOG_PATH = config.dialogLog;
/**
* Per-process state-file temp path. The state-file write pattern is
* `writeFileSync(tmp, ...) → renameSync(tmp, stateFile)` for atomicity,
* but a shared `${stateFile}.tmp` filename means two concurrent writers
* (cold-start race when N CLIs hit a fresh repo simultaneously, parallel
* /tunnel/start handlers, or a combination) collide on the rename: the
* first writer's renameSync moves the shared temp file out of the way,
* the second writer's writeFileSync re-creates it, the second rename
* then races with the first writer's already-renamed state. Worst case
* the second renameSync throws ENOENT mid-air, killing one of the
* spawning daemons during startup.
*
* Per-process suffix (pid + 4 random bytes) makes each writer's temp
* path unique. The atomic rename still gives last-writer-wins semantics
* for the final state.json content; the only behavior change is that
* concurrent writers no longer kill each other on the rename.
*/
function tmpStatePath(): string {
return `${config.stateFile}.tmp.${process.pid}.${crypto.randomBytes(4).toString('hex')}`;
}
// ─── Sidebar agent / chat state ripped ──────────────────────────────
// ChatEntry, SidebarSession, TabAgentState interfaces; chatBuffer,
@ -1476,7 +1497,7 @@ async function start() {
// Update state file
const stateContent = JSON.parse(fs.readFileSync(config.stateFile, 'utf-8'));
stateContent.tunnel = { url: tunnelUrl, domain: domain || null, startedAt: new Date().toISOString() };
const tmpState = config.stateFile + '.tmp';
const tmpState = tmpStatePath();
fs.writeFileSync(tmpState, JSON.stringify(stateContent, null, 2), { mode: 0o600 });
fs.renameSync(tmpState, config.stateFile);
@ -1999,7 +2020,7 @@ async function start() {
binaryVersion: readVersionHash() || undefined,
mode: browserManager.getConnectionMode(),
};
const tmpFile = config.stateFile + '.tmp';
const tmpFile = tmpStatePath();
fs.writeFileSync(tmpFile, JSON.stringify(state, null, 2), { mode: 0o600 });
fs.renameSync(tmpFile, config.stateFile);
@ -2080,7 +2101,7 @@ async function start() {
// Update state file with tunnel URL
const stateContent = JSON.parse(fs.readFileSync(config.stateFile, 'utf-8'));
stateContent.tunnel = { url: tunnelUrl, domain: domain || null, startedAt: new Date().toISOString() };
const tmpState = config.stateFile + '.tmp';
const tmpState = tmpStatePath();
fs.writeFileSync(tmpState, JSON.stringify(stateContent, null, 2), { mode: 0o600 });
fs.renameSync(tmpState, config.stateFile);
} catch (err: any) {
@ -2110,7 +2131,7 @@ async function start() {
console.log(`[browse] Tunnel listener bound (local-only test mode) on 127.0.0.1:${tunnelPort}`);
const stateContent = JSON.parse(fs.readFileSync(config.stateFile, 'utf-8'));
stateContent.tunnelLocalPort = tunnelPort;
const tmpState = config.stateFile + '.tmp';
const tmpState = tmpStatePath();
fs.writeFileSync(tmpState, JSON.stringify(stateContent, null, 2), { mode: 0o600 });
fs.renameSync(tmpState, config.stateFile);
} catch (err: any) {

View File

@ -0,0 +1,110 @@
/**
* Regression: state-file temp path uniqueness.
*
* The daemon writes `.gstack/browse.json` via the standard atomic-rename
* pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. The
* pattern is correct for a single writer. It breaks for *concurrent*
* writers when they share a single temp filename:
*
* t0 Writer A: writeFileSync(stateFile + '.tmp', payloadA)
* t1 Writer B: writeFileSync(stateFile + '.tmp', payloadB) // overwrites A
* t2 Writer A: renameSync(stateFile + '.tmp', stateFile) // moves B's payload
* t3 Writer B: renameSync(stateFile + '.tmp', stateFile) // ENOENT — file gone
*
* A 15-CLI cold-start race against a fresh repo reproduces this in the
* wild one of the spawned daemons dies with:
*
* [browse] Failed to start: ENOENT: no such file or directory,
* rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'
*
* Fix: per-process temp path via `tmpStatePath()` (pid + 4 random bytes
* of suffix). Each concurrent writer gets a unique path; the atomic
* rename still gives last-writer-wins semantics on the final state file
* content, but writers no longer kill each other on the rename step.
*
* This source-level guard locks two invariants:
* 1. No remaining `stateFile + '.tmp'` literals in server.ts (regression
* catch a future copy-paste or revert would re-introduce the bug)
* 2. The 4 known state-write call sites all use `tmpStatePath()`
* (positive coverage)
*
* Same pattern as terminal-agent.test.ts and dual-listener.test.ts:
* read source as text, assert invariant, no daemon required.
*/
import { describe, test, expect } from 'bun:test';
import { readFileSync } from 'fs';
import * as path from 'path';
const SERVER_TS = readFileSync(
path.resolve(import.meta.dir, '../src/server.ts'),
'utf-8',
);
describe('server.ts — state-file temp-path uniqueness', () => {
test('no remaining `stateFile + \'.tmp\'` literals (regression catch)', () => {
// The shared-temp-filename pattern that caused the cold-start ENOENT
// race. A future contributor that copy-pastes the old pattern (or a
// revert) will fail this test.
const sharedTempLiterals = [
...SERVER_TS.matchAll(/stateFile\s*\+\s*['"`]\.tmp['"`]/g),
];
expect(
sharedTempLiterals.length,
`Found ${sharedTempLiterals.length} reference(s) to the shared ` +
`\`stateFile + '.tmp'\` pattern. Use \`tmpStatePath()\` instead — ` +
`the shared pattern races on rename when two daemons spawn ` +
`concurrently (cold-start race + parallel /tunnel/start).`,
).toBe(0);
});
test('every state-file writeFileSync call uses tmpStatePath()', () => {
// Find every `writeFileSync(X, JSON.stringify(stateContent...` or
// `…(state, …)` call and verify X is `tmpStatePath()` or a variable
// assigned from `tmpStatePath()`.
const writeCalls = [
...SERVER_TS.matchAll(
/fs\.writeFileSync\s*\(\s*(\w+)\s*,\s*JSON\.stringify\(\s*(state|stateContent)/g,
),
];
expect(
writeCalls.length,
'expected at least one state-file write site',
).toBeGreaterThan(0);
for (const m of writeCalls) {
const varName = m[1]!;
// Walk back to the assignment of varName — must come from tmpStatePath()
const assignRe = new RegExp(
`(?:const|let)\\s+${varName}\\s*=\\s*tmpStatePath\\(\\)`,
);
expect(
assignRe.test(SERVER_TS),
`state-file writeFileSync uses \`${varName}\` but no \`const ${varName} = tmpStatePath()\` ` +
`assignment was found upstream. Either assign from tmpStatePath() ` +
`or pass tmpStatePath() inline — the shared \`stateFile + '.tmp'\` ` +
`pattern races under concurrent daemon startup`,
).toBe(true);
}
});
test('tmpStatePath() declaration includes a per-process unique suffix', () => {
// Lock the suffix shape so a future contributor doesn't accidentally
// strip the uniqueness back out by simplifying the helper.
const declMatch = SERVER_TS.match(
/function tmpStatePath\(\)[^{]*\{([\s\S]*?)\n\}/,
);
expect(declMatch, 'tmpStatePath() declaration not found').not.toBeNull();
const body = declMatch![1]!;
// Must reference both process.pid and crypto.randomBytes — two
// independent sources of uniqueness.
expect(body, 'tmpStatePath() must include process.pid in the suffix').toContain(
'process.pid',
);
expect(
body,
'tmpStatePath() must include a random suffix via crypto.randomBytes',
).toContain('crypto.randomBytes');
});
});