Commit Graph

2 Commits

Author SHA1 Message Date
Garry Tan b315ccb0d4
feat(terminal-agent): scrollback ring buffer + detach state machine + re-attach
The agent side of Commit 3 — the "magic" feature. A network blip (wifi
hiccup, MV3 panel suspend, brief Chromium pause) now silently reconnects
the sidebar to the SAME claude session with scrollback intact. No more
"Session ended" message + manual Restart click + losing your tool-call
output. Server-side /pty-session/reattach (25ef24e9) and the extension
re-attach loop (next commit) close the loop end-to-end.

Ring buffer (T10):
  * Per-session frames: Buffer[] capped at 1 MB (env-overridable via
    GSTACK_PTY_RING_BUFFER_BYTES). Each PTY write is one frame, so
    eviction is at frame boundaries and never cuts a UTF-8 sequence or
    ANSI CSI in half.
  * appendToRingBuffer eviction loop keeps at least one frame even at
    extreme caps — a single oversized frame can't empty the buffer.
  * Alt-screen tracking via canonical xterm CSI ?1049h / CSI ?1049l
    sequences. lastIndexOf comparison so trailing state wins when both
    appear in one render frame (quick tool-call open+close).

Replay payload (T5 — codex outside-voice):
  * buildReplayPayload prefixes DECSTR soft reset (\x1b[!p) and
    conditionally re-enters alt-screen if claude was in a tool call at
    detach. The client writes RIS (\x1bc) FIRST to clear pre-blip xterm
    content; the server's prelude resets character attributes; the ring
    buffer replays cleanly on top.
  * Order is enforced by the {type:"reattach-begin"} text frame the
    agent sends right before the binary replay — client waits for it,
    writes RIS, then treats the next binary frame as the replay payload.

Detach state machine (T9):
  * PtySession.liveWs decouples the PTY callback from the original ws
    closure. On re-attach, swapping session.liveWs is enough — the
    on-data callback writes to the new ws automatically.
  * close(ws, code, _reason): codes 4001 (intentional restart), 4404
    (no-claude), and 1000 (clean exit) trigger immediate dispose.
    Anything else (1006 abnormal, 1001 going-away from network blip /
    panel suspend) starts a 60s detach timer instead. claude keeps
    running, output keeps accumulating in the ring buffer.
  * Detach timer is unref'd so the bun process can still exit cleanly
    on natural shutdown.
  * Sessions without a sessionId (legacy single-shot grants) can't
    re-attach by definition — those fall through to immediate dispose.

Re-attach lookup (T9):
  * WS open() checks sessionsById[sessionId] FIRST. If a detached
    session is sitting there, cancel its detach timer, swap liveWs,
    rebind the WS-keyed map, restart keepalive, send reattach-begin
    + replay payload. The PTY process is unchanged.
  * /internal/restart now cancels any pending detach timer before
    disposal — otherwise the timer would later try to dispose an
    already-disposed session.

Env knobs for e2e:
  * GSTACK_PTY_RING_BUFFER_BYTES — compress to 256 for eviction tests.
  * GSTACK_PTY_DETACH_WINDOW_MS — compress to 1000 for "did the timer
    fire?" tests without waiting a minute per assertion.

Tests:
  * browse/test/terminal-agent-detach-reattach.test.ts — 10 static-grep
    tripwires for the load-bearing properties: interface shape, env
    knobs, eviction floor, alt-screen tracking, replay prelude
    composition, re-attach lookup, close-code routing, detach timer
    unref, /internal/restart timer cancellation, on-data through
    session.liveWs.
  * browse/test/terminal-agent-session-routing.test.ts test 7 widened
    to match the new close(ws, code, _reason) signature.
  * browse/test/terminal-agent-keepalive.test.ts test 3 widened
    similarly. Both stay regressions for the prior contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 23:21:22 -07:00
Garry Tan d8751e91df
feat(terminal-agent): 25s WS keepalive ping/pong + client keepalive frames
PTY connections were dying silently after NAT idle timeouts (30-60s on most
home routers, even shorter on some carrier-grade NAT) and Chrome MV3 panel
suspension. Neither side noticed until the user's next keystroke produced
no output. Both sides now drive a 25s keepalive cycle.

Server side (browse/src/terminal-agent.ts):
  * New ws.open handler constructs the PtySession eagerly and starts a
    setInterval that sends `{type:"ping",ts:Date.now()}` every 25s.
    Interval handle stored on session.pingInterval so close() can clear it.
  * PtySession.pingInterval field added; cleared in ws.close before
    disposeSession runs. Prevents timer leak across reconnects.
  * Message handler accepts `{type:"ping"|"pong"|"keepalive"}` silently —
    keepalive frames are a liveness signal at the TCP layer, no state to
    update. Existing resize/tabSwitch/tabState handling unchanged.
  * GSTACK_PTY_KEEPALIVE_INTERVAL_MS env knob (default 25000) lets the
    upcoming e2e tests compress idle assertions without 30s waits.

Client side (extension/sidepanel-terminal.js):
  * Belt-and-suspenders: client also runs a 25s setInterval that sends
    `{type:"keepalive"}`. Defends against Chrome pausing our timers if
    the server-side ping ever gets dropped (rare but possible in MV3).
  * Ping reply: on `{type:"ping",ts}` from the server, immediately send
    `{type:"pong",ts}`. Lets the agent observe round-trip latency for
    free and confirms the channel is bidirectional.
  * Interval cleared in three teardown paths: ws.close handler,
    teardown(), forceRestart(). Three paths exist because the sidebar
    can exit the LIVE state through any of them; all three must clean up
    or we leak timers across reconnects.

Test (browse/test/terminal-agent-keepalive.test.ts):
  * Static-grep tripwires for the 7-point protocol contract: agent has
    a configurable interval, open() starts the ping, close() clears it,
    message handler accepts keepalive vocabulary, client sends keepalive
    + replies pong, and all three client teardown paths clear the timer.
  * Wire-level tests (actually observe a ping after 25s) belong in the
    e2e tier — adding them here would either flake on slow CI or require
    a real Bun.serve listener per test which we don't want to pay for
    in the free tier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 23:09:23 -07:00