mirror of https://github.com/garrytan/gstack.git
Merge origin/main into garrytan/trunk-land-skill
Reconcile VERSION (1.56.0.0 stays above main's 1.55.0.0), package.json, and CHANGELOG (1.56.0.0 entry on top of main's 1.54/1.55 entries). Regenerated all host SKILL.md against main's resolver changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
commit
fa7ced73a8
245
CHANGELOG.md
@ -40,6 +40,95 @@ If you only want to merge, run `/land` and stop. If you want merge plus deploy p
|
|||
#### For contributors
|
||||
- `lib/merge.ts` holds the pure regime logic (detection precedence, submit planning, landing classification, handoff schema + validation); `test/gstack-merge.test.ts` (30) and `test/gstack-merge-cli.test.ts` (11) pin it. A generated-doc scrub test fails CI if `/land`'s SKILL.md ever grows deploy/canary machinery. The merge SHA → revert handoff and the never-blind-retry invariant (cli/cli#3442, cli/cli#13380) moved into `/land` with their tests.
|
||||
|
||||
## [1.55.0.0] - 2026-05-30
|
||||
|
||||
## **`/sync-gbrain` can no longer be the trigger that lets gbrain delete your repo. The headed browser stops crash-looping, and gbrain installs the current release instead of a pin 23 versions stale.**
|
||||
|
||||
gbrain can rm-rf a working tree when its autopilot daemon reclones mid-cycle. `/sync-gbrain` used to call gbrain's `sources remove` and `sync --strategy code` as if they were safe, so it could be the thing that set that race off. Now every destructive gbrain call sits behind feature-detected guards: the orchestrator refuses to run while autopilot is active, refuses to remove a user-managed source it can't storage-protect (it fails closed), canonicalizes paths with realpath so a symlink can't smuggle a delete outside gbrain's own clones, and requires an explicit `--allow-reclone` before a URL-managed source's code walk. Shipped in the same wave: the headed browser's self-inflicted crash-loop is gone, big-brain memory ingests stop getting killed at a fixed 30 minutes, and the gbrain installer moves off its frozen v0.18.2 pin onto the latest release behind a version floor and a `doctor` self-test.
|
||||
|
||||
### The numbers that matter
|
||||
|
||||
From the shipped diff and its regression suites (`bun test test/gbrain-*.test.ts browse/test/restart-env.test.ts test/memory-ingest-timeout.test.ts`):
|
||||
|
||||
| Metric | Before | After | Δ |
|--------|--------|-------|---|
| Destructive gbrain ops behind guards | 0 | 4 | +4 |
| gbrain / brain-sync spawns that work on Windows | 0/8 | 8/8 | +8 |
| gbrain version installed | v0.18.2 (pinned, ~23 behind) | latest + min-version floor + doctor gate | n/a |
| Memory-ingest timeout | hardcoded 30 min | configurable, checkpoint preserved on timeout | n/a |
| Generated SKILL.md that parse under strict YAML | partial (colons broke Codex) | all (quoted) | n/a |
|
||||
|
||||
The guard that matters most: a `sources remove` on a source whose files live outside `~/.gbrain/clones/` and can't be storage-protected now refuses instead of proceeding. The path that ate a repo no longer runs unattended.
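A minimal sketch of the realpath containment idea described above, not the code in `lib/gbrain-guards.ts`. The helper names `isContained` and `isSafeToRemove` are hypothetical, and treating the clones root itself as unsafe to remove is an assumption.

```typescript
import { realpathSync } from "node:fs";
import { homedir } from "node:os";
import * as path from "node:path";

// Hypothetical helper: true only if `candidate` lies strictly inside `root`.
// Both arguments are expected to be absolute, symlink-resolved paths.
function isContained(root: string, candidate: string): boolean {
  const rel = path.relative(root, candidate);
  if (rel === "" || rel === "..") return false; // the root itself, or its parent
  if (rel.startsWith(".." + path.sep)) return false; // escapes the root
  return !path.isAbsolute(rel); // different drive on Windows
}

// Hypothetical wrapper: resolve symlinks first, and fail closed on any error.
function isSafeToRemove(
  sourcePath: string,
  clonesRoot: string = path.join(homedir(), ".gbrain", "clones"),
): boolean {
  try {
    return isContained(realpathSync(clonesRoot), realpathSync(sourcePath));
  } catch {
    return false; // an unresolvable path is never treated as safe
  }
}
```

Comparing with `path.relative` rather than a string prefix means a sibling directory such as `clones-evil` is not mistaken for a child of `clones`.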
|
||||
|
||||
### What this means for you
|
||||
|
||||
If you use `/sync-gbrain`, you are protected from the data-loss race even before gbrain ships its own root fix. "Don't run `/sync-gbrain` while `gbrain autopilot` is active" is now enforced, not just advised, and nothing gets deleted that can't be proven safe. Headed-browser QA against beacon-heavy pages (analytics, live extensions) no longer crash-loops, leaks Chromium, or silently drops to an invisible headless window. New gbrain installs track the current release. Codex and OpenAI can load every gstack skill again.
|
||||
|
||||
### Itemized changes
|
||||
|
||||
#### Added
|
||||
- `/sync-gbrain` destructive-op guards (`lib/gbrain-guards.ts`): multi-signal autopilot detection, fail-closed `sources remove`, realpath `remote_url` pre-flight audit, and a `--allow-reclone` gate before URL-managed code walks.
|
||||
- Install-time gbrain gate (`bin/gstack-gbrain-install`): a minimum-version floor and a `gbrain doctor --fast` self-test, both hard-fail with remediation.
|
||||
- `GSTACK_INGEST_TIMEOUT_MS` to configure the memory-ingest timeout; on timeout the gbrain checkpoint is preserved so the next run resumes.
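One way a timeout override like `GSTACK_INGEST_TIMEOUT_MS` could be read is sketched below. The function name, the fallback to the old 30-minute value, and the rejection of non-positive or non-integer input are all assumptions; the changelog only states that the timeout is configurable.

```typescript
// Assumed default: the previous fixed limit of 30 minutes.
const DEFAULT_INGEST_TIMEOUT_MS = 30 * 60 * 1000;

// Hypothetical parser: accept positive integers only, otherwise use the default.
function resolveIngestTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.GSTACK_INGEST_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_INGEST_TIMEOUT_MS;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_INGEST_TIMEOUT_MS;
}
```

In a real process the argument would typically be `process.env`; taking it as a parameter keeps the sketch easy to test.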
|
||||
|
||||
#### Changed
|
||||
- gbrain installs at the latest default-branch HEAD by default; pin a commit with `gstack-gbrain-install --pinned-commit <sha>` for reproducibility.
|
||||
- Generated SKILL.md descriptions with interior colons are now quoted, so strict YAML loaders (Codex/OpenAI) parse them.
|
||||
- `/sync-gbrain` guidance: do not run during autopilot; prefer `gbrain sources add --path` over URL-managed sources.
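The colon-quoting change can be illustrated with a small sketch. `descriptionLine` is a hypothetical name, and the rule "quote whenever a colon appears" is an assumption about how the generator decides. The sketch relies on JSON string syntax being a valid YAML double-quoted scalar.

```typescript
// Hypothetical helper: emit a frontmatter line that strict YAML loaders accept.
function descriptionLine(description: string): string {
  // An unquoted value containing ": " can be read as a nested mapping,
  // so any description with a colon is emitted as a double-quoted scalar.
  const value = description.includes(":") ? JSON.stringify(description) : description;
  return `description: ${value}`;
}
```

For example, `descriptionLine("Ship: merge and deploy")` yields a quoted value, while a description without a colon is left as a plain scalar.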
|
||||
|
||||
#### Fixed
|
||||
- `/sync-gbrain` no longer races gbrain's autopilot into a destructive reclone or remove (#1734). Reported by @mvanhorn.
|
||||
- `gstack-jsonl-merge` resolves equal-timestamp entries deterministically across machines, so append-only logs converge instead of re-conflicting forever (#1769). Contributed by @jbetala7.
|
||||
- Generated SKILL.md frontmatter parses under strict YAML loaders (#1778). Reported by @GilbertzzzZZ, @genisis0x, @cathrynlavery, and @sator-imaging.
|
||||
- The headed browser daemon no longer crash-loops under load, leaks Chromium processes, or silently downgrades a headed session to headless (#1781).
|
||||
- `/sync-gbrain --full` memory ingests on large brains are no longer killed at a fixed 30-minute timeout (#1611).
|
||||
- The gbrain CLI and `gstack-brain-sync` spawn correctly on Windows (#1731).
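To show why a deterministic tie-break makes append-only logs converge, here is a sketch of a JSONL merge. It is not the `gstack-jsonl-merge` implementation: the `ts` field name, ISO-8601 UTC timestamps, and the choice of the raw line text as the tie-breaker are all assumptions.

```typescript
// Hypothetical merge: union the lines, order by timestamp, break ties on line text.
function mergeJsonl(a: string, b: string): string {
  const unique = new Set<string>();
  for (const line of `${a}\n${b}`.split("\n")) {
    if (line.trim() !== "") unique.add(line);
  }
  const sorted = [...unique].sort((x, y) => {
    // Assumes every entry has an ISO-8601 UTC `ts`, so string order is time order.
    const tx = (JSON.parse(x) as { ts: string }).ts;
    const ty = (JSON.parse(y) as { ts: string }).ts;
    if (tx !== ty) return tx < ty ? -1 : 1;
    // Equal timestamps: compare the raw line, which is identical on every machine.
    return x < y ? -1 : x > y ? 1 : 0;
  });
  return sorted.join("\n") + "\n";
}
```

Because the ordering depends only on the entries themselves, merging in either direction produces the same file, so a second merge has nothing left to conflict on.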
|
||||
|
||||
#### For contributors
|
||||
- `lib/gbrain-guards.ts` with hermetic tests for every guard branch (autopilot signals, fail-closed remove, reclone gate, realpath containment).
|
||||
- `parseSourcesList` centralizes `gbrain sources list --json` shape handling across all readers (#1576, whose crash was already fixed in v1.42.0.0 — this removes the last divergent reader).
|
||||
- Static-grep tripwire (`test/gbrain-spawn-windows-shell.test.ts`) fails CI if a gbrain spawn drops the Windows shell flag.
|
||||
- gbrain-side requirements for the root fixes (ungated reclone, `--keep-storage`, a cooperative remove-lease, a capability command, true ingest-resume, integration CI) are tracked for the gbrain repo.
|
||||
|
||||
## [1.54.0.0] - 2026-05-30
|
||||
|
||||
## **The heaviest skill stopped taxing every session. /ship's always-loaded cost dropped 59%, and its prose now loads only when a step needs it.**
|
||||
|
||||
`/ship` was a 167KB wall that every session paid for in full, whether you were bumping a version or writing a changelog or none of it. It is now a 69KB decision-tree skeleton plus eight `sections/*.md` files the agent opens on demand. The eight steps that are long prose (the test run, coverage audit, plan-completion, the review army, Greptile triage, the adversarial pass, the changelog, the PR body) moved into sections behind STOP-Read pointers, so a run only reads the chapters its situation calls for. The version-bump logic that used to be ~90 lines of inline bash, the single worst re-bump footgun in the workflow, is now the tested `gstack-version-bump` CLI (classify / write / repair). Other hosts (codex, factory, kiro, opencode) keep the full inline skill unchanged, so nothing regresses off Claude. This release dogfooded itself: the version you are reading was bumped by `gstack-version-bump`.
|
||||
|
||||
### The numbers that matter
|
||||
|
||||
Measured directly from the generated skill (`wc -c ship/SKILL.md`) and the new section files, regenerated for all hosts:
|
||||
|
||||
| Metric | Before (v1.53) | After (v1.54) | Δ |
|--------|----------------|---------------|---|
| ship always-loaded | 167 KB (~41.8K tokens) | 69 KB (~17.2K tokens) | -59% |
| ship prose loaded per run | all of it | only applicable sections | on-demand |
| ship version logic | ~90 lines inline bash | tested CLI, 15 unit tests | extracted |
| External-host ship | 167 KB inline | 162 KB inline (unchanged behavior) | no regression |
|
||||
|
||||
The skeleton is what loads the instant `/ship` is invoked, so the ~24.6K-token saving recurs on every single ship, not just once.
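The headline figures follow directly from the table values; the short calculation below reproduces them. The variable names are illustrative only.

```typescript
// Values copied from the table above.
const before = { kb: 167, tokens: 41_800 };
const after = { kb: 69, tokens: 17_200 };

// Size reduction of the always-loaded skill, as a whole percentage.
const percentDrop = Math.round((1 - after.kb / before.kb) * 100);

// Tokens no longer loaded at invocation time.
const tokensSaved = before.tokens - after.tokens;

// How many times smaller the always-loaded part is.
const sizeRatio = before.kb / after.kb;

console.log(percentDrop, tokensSaved, sizeRatio.toFixed(1));
```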
|
||||
|
||||
### What this means for you
|
||||
|
||||
A `/ship` run starts ~2.4x lighter (167 KB down to 69 KB) and pulls in each heavy step's instructions only when it reaches that step, so the agent spends less of its window holding prose it is not using yet. You will not notice any behavior change. The workflow is identical step for step; the difference is what is in context when. If you ever want to read a step in isolation, the chapters live at `~/.claude/skills/gstack/ship/sections/`.
|
||||
|
||||
### Itemized changes
|
||||
|
||||
#### Added
|
||||
- `bin/gstack-version-bump` — tested version-state CLI (classify / write / repair) with 15 unit tests covering the full FRESH / ALREADY_BUMPED / DRIFT_STALE_PKG / DRIFT_UNEXPECTED matrix.
|
||||
- `ship/sections/*.md` — eight on-demand sections (tests, test-coverage, plan-completion, review-army, greptile, adversarial, changelog, pr-body) with a passive `manifest.json` registry.
|
||||
- Section pipeline in `gen-skill-docs`: `{{SECTION:id}}` (STOP-Read pointer on Claude, inline on other hosts) and `{{SECTION_INDEX}}` (situation to section table rendered from the manifest).
|
||||
- `test/helpers/transcript-section-logger.ts` + `required-reads.ts` and section-loading / manifest-consistency / context-parity tests guarding the carve.
|
||||
|
||||
#### Changed
|
||||
- `/ship` is a skeleton + sections on Claude; external hosts still receive the full inline skill (no behavior change off Claude).
|
||||
- Step 12 calls `gstack-version-bump` instead of inline bash.
|
||||
- Parity harness understands carved skills (checks skeleton + sections union; asserts the skeleton actually shrank).
|
||||
|
||||
#### For contributors
|
||||
- `setup` links `sections/` into the prefixed Claude + Kiro skill dirs; `--host all` now fails the build on any host failure, not just claude.
|
||||
- New section templates live at `<skill>/sections/*.md.tmpl`; regenerate with `bun run gen:skill-docs`.
|
||||
## [1.53.1.0] - 2026-05-30
|
||||
|
||||
## **Workspace and scripted setup never hang on a hidden prompt again. Installing the plan-tune hooks is now flag-driven with safe defaults.**
|
||||
|
|
@ -97,9 +186,9 @@ When you `/spec` or `/ship`, you no longer have to remember that the issue body
|
|||
|
||||
#### Added
|
||||
- **Shared redaction engine.** `lib/redact-patterns.ts` (33-pattern, 3-tier taxonomy — the single source of truth) and `lib/redact-engine.ts` (pure `scan()` + `applyRedactions()` with Unicode normalization, ReDoS-safe size cap, Luhn/entropy/RFC1918 validators, safe-masked previews).
|
||||
- **`gstack-redact` CLI**: scan stdin or a file, JSON or human output, exit 0/2/3 to gate skills, `--auto-redact` for the PII one-keystroke path, `--repo-visibility`, `--allowlist`, `--self-email`.
|
||||
- **`gstack-redact` CLI** — scan stdin or a file, JSON or human output, exit 0/2/3 to gate skills, `--auto-redact` for the PII one-keystroke path, `--repo-visibility`, `--allowlist`, `--self-email`.
|
||||
- **Opt-in pre-push hook** (`gstack-redact-prepush` + `gstack-redact install-prepush-hook`) — blocks a credential in the pushed diff (public and private), correct `remote..local` diff direction with new-branch/force-push/delete handling, chains any existing hook, `GSTACK_REDACT_PREPUSH=skip` escape valve.
|
||||
- **`/spec` Phase 4.5a semantic review**: an in-conversation pass (no third party) for named-criticism, customer complaints, unannounced strategy, NDA material, and codename bleed, with a content-free audit trail at `~/.gstack/security/semantic-reviews.jsonl`.
|
||||
- **`/spec` Phase 4.5a semantic review** — an in-conversation pass (no third party) for named-criticism, customer complaints, unannounced strategy, NDA material, and codename bleed, with a content-free audit trail at `~/.gstack/security/semantic-reviews.jsonl`.
|
||||
- **Config keys** `redact_repo_visibility` (local-only override for repos `gh`/`glab` can't read) and `redact_prepush_hook`.
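The Luhn validator mentioned for the redaction engine is a standard checksum. A sketch of it follows, under the assumption that it is used to confirm that a digit run looks like a payment card number before flagging it. The function name and the 12 to 19 digit length bound are illustrative, not taken from `lib/redact-engine.ts`.

```typescript
// Hypothetical validator: standard Luhn checksum over a digit string.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, ""); // tolerate spaces and hyphens
  if (!/^\d{12,19}$/.test(digits)) return false; // assumed plausible card lengths
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

A checksum of this kind cuts false positives: a random 16-digit number passes it only about one time in ten.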
|
||||
|
||||
#### Changed
|
||||
|
|
@ -325,8 +414,8 @@ The next time you leave a gbrowser session running for days, the Bun side holds
|
|||
#### Added
|
||||
- **`$B memory` command** in `browse/src/memory-command.ts` — text mode with sorted top-10 tabs + "and N more" tail; `--json` mode for programmatic consumers and the sidebar footer poll.
|
||||
- **`/memory` HTTP endpoint** in `browse/src/server.ts` — same SSE-session-cookie auth model as `/activity/stream`. Deliberately NOT extending `/health` (which already leaks AUTH_TOKEN in headed mode per TODOS.md "Audit /health token distribution").
|
||||
- **`BrowserManager.getMemorySnapshot()`**: collects Bun process memory + per-tab JS heap via `Performance.getMetrics` (lazy per tracked page, swallows target-died errors) + Chromium process tree via `Browser.newBrowserCDPSession()` + `SystemInfo.getProcessInfo`.
|
||||
- **`browse/src/memory-snapshot.ts`**: shared types (`MemorySnapshot`, `MemoryTabSnapshot`, `MemoryProcess`, `MemoryStructureStats`) plus `formatBytes()` renderer (4 tiers, 2 decimals at GB).
|
||||
- **`BrowserManager.getMemorySnapshot()`** — collects Bun process memory + per-tab JS heap via `Performance.getMetrics` (lazy per tracked page, swallows target-died errors) + Chromium process tree via `Browser.newBrowserCDPSession()` + `SystemInfo.getProcessInfo`.
|
||||
- **`browse/src/memory-snapshot.ts`** — shared types (`MemorySnapshot`, `MemoryTabSnapshot`, `MemoryProcess`, `MemoryStructureStats`) plus `formatBytes()` renderer (4 tiers, 2 decimals at GB).
|
||||
- **`withCdpSession(page, fn)`** and **`getOrCreateCdpSession(page, cache)`** in `browse/src/cdp-bridge.ts` — lifecycle helpers for one-shot and cached CDP work. Every direct `newCDPSession` call site now routes through one of them.
|
||||
- **`createSseEndpoint(req, config)`** in `browse/src/sse-helpers.ts` — owns the SSE cleanup contract (abort + enqueue-throw + heartbeat-throw, all idempotent). Built-in lone-surrogate sanitization on every JSON.stringify.
|
||||
- **Sidebar footer RSS readout** in `extension/sidepanel.{html,js,css}` — polls `/memory` every 30s with 5-minute backoff if response time exceeds 2s. Color-coded thresholds: orange at 2 GB Bun RSS or 50 tabs, red at 8 GB or 200 tabs.
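The footer's thresholds and polling cadence can be sketched as two small pure functions. The names are hypothetical, and two details are assumptions: that "at" means greater than or equal, and that GB means 1024³ bytes.

```typescript
type MemoryLevel = "ok" | "orange" | "red";

const GB = 1024 ** 3; // assumed binary gigabytes

// Hypothetical classifier for the footer color.
function memoryLevel(bunRssBytes: number, tabCount: number): MemoryLevel {
  if (bunRssBytes >= 8 * GB || tabCount >= 200) return "red";
  if (bunRssBytes >= 2 * GB || tabCount >= 50) return "orange";
  return "ok";
}

// Hypothetical scheduler: poll every 30s, back off to 5 minutes after a slow response.
function nextPollDelayMs(lastResponseMs: number): number {
  return lastResponseMs > 2000 ? 5 * 60_000 : 30_000;
}
```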
|
||||
|
|
@ -675,31 +764,31 @@ Open the sidebar once. Use it. Close your laptop. Wake up tomorrow. Type a key.
|
|||
|
||||
#### Added
|
||||
|
||||
- **Long-lived PTY connection (`browse/src/terminal-agent.ts`, `extension/sidepanel-terminal.js`)**: 25s WebSocket keepalive ping/pong cycle from both sides. NAT idle drops and Chrome MV3 panel-suspend cycles no longer silently kill the socket. Env-overridable via `GSTACK_PTY_KEEPALIVE_INTERVAL_MS`.
|
||||
- **Session lease + attachToken model (`browse/src/pty-session-lease.ts`)**: Stable non-secret `sessionId` separated from short-lived secret `attachToken`. Re-attach within the lease window refreshes a fresh `attachToken` bound to the same `sessionId`; session identity stays loggable, bearer credential stays out of logs.
|
||||
- **Scrollback replay on re-attach (`browse/src/terminal-agent.ts`)**: 1 MB frame-based ring buffer per session with ESC-boundary scan and alt-screen tracking (`CSI ?1049h/l`). On re-attach, client writes RIS (`\x1bc`) to xterm, server prepends DECSTR soft reset + optional alt-screen re-enter + ring buffer. Replay renders cleanly even mid-tool-call. Env-overridable via `GSTACK_PTY_RING_BUFFER_BYTES`.
|
||||
- **60s detach window with re-attach (`browse/src/terminal-agent.ts`)**: WS close with any code other than 4001 (intentional), 4404 (no-claude), or 1000 (clean exit) keeps the PTY alive for 60s. New WS upgrade matching the same sessionId resumes the same `claude` process. Env-overridable via `GSTACK_PTY_DETACH_WINDOW_MS`.
|
||||
- **Working Restart button (`browse/src/server.ts`, `extension/sidepanel-terminal.js`)**: `POST /pty-restart` is one transaction: dispose old session scope-to-sessionId, revoke old lease, mint fresh sessionId + lease + attachToken, return the 4-tuple. Client sends `{type:"start"}` immediately on the new WS for eager spawn — no keystroke required.
|
||||
- **Explicit dispose on sidebar close (`extension/sidepanel.js`)**: `pagehide` handler fires `navigator.sendBeacon('/pty-dispose', {sessionId, authToken})` so browser quit / panel close / extension reload disposes the session immediately. Server route accepts auth token in the body (sendBeacon-compatible — no custom headers).
|
||||
- **PID-identity terminal-agent kill (`browse/src/terminal-agent-control.ts`)**: Replaces `pkill -f terminal-agent\.ts` regex teardown. Agent writes `<stateDir>/terminal-agent-pid` (JSON `{pid, gen, startedAt}`) at boot; `cli.ts` and `server.ts` use `killAgentByRecord` instead. Static-grep tripwire test fails CI if the regex pattern returns to source.
|
||||
- **Terminal-agent watchdog (`browse/src/server.ts`)**: 60s ticker checks recorded agent PID via `process.kill(pid, 0)`. Respawns on dead PID via shared `spawnTerminalAgent` helper. 3-in-60s crash-loop guard with rolling window. Slow-but-alive agents intentionally fall through (split-brain defense). Env-overridable via `GSTACK_AGENT_WATCHDOG_TICK_MS`.
|
||||
- **Outer browse-server supervisor (`browse/src/cli.ts`)**: `$B connect --supervise` (or `BROWSE_SUPERVISE=1`) keeps the CLI attached, polls server PID every 30s, respawns on unexpected exit with 1s/2s/4s/8s/30s backoff. SIGINT/SIGTERM cleanly teardown the supervised server. Opt-in — default `$B connect` behavior unchanged for every existing caller.
|
||||
- **Patient `tryAutoConnect` (`extension/sidepanel-terminal.js`)**: Replaces the 15s give-up with indefinite 2s polling. Ascending status messages at 15s / 60s / 5min so the user knows we're still trying. Sticky-abort only on 401 (auth invalid), cleared by explicit Restart click.
|
||||
- **`/internal/healthz` route + `internalHandler<T>` helper (`browse/src/terminal-agent.ts`)**: Liveness probe used by the watchdog (returns pid/gen/sessions count, doesn't touch claude binary lookup). Helper collapses four `/internal/*` routes' bearer-auth + X-Browse-Gen check + JSON parse into one-liner calls.
|
||||
- **Long-lived PTY connection (`browse/src/terminal-agent.ts`, `extension/sidepanel-terminal.js`)** — 25s WebSocket keepalive ping/pong cycle from both sides. NAT idle drops and Chrome MV3 panel-suspend cycles no longer silently kill the socket. Env-overridable via `GSTACK_PTY_KEEPALIVE_INTERVAL_MS`.
|
||||
- **Session lease + attachToken model (`browse/src/pty-session-lease.ts`)** — Stable non-secret `sessionId` separated from short-lived secret `attachToken`. Re-attach within the lease window refreshes a fresh `attachToken` bound to the same `sessionId`; session identity stays loggable, bearer credential stays out of logs.
|
||||
- **Scrollback replay on re-attach (`browse/src/terminal-agent.ts`)** — 1 MB frame-based ring buffer per session with ESC-boundary scan and alt-screen tracking (`CSI ?1049h/l`). On re-attach, client writes RIS (`\x1bc`) to xterm, server prepends DECSTR soft reset + optional alt-screen re-enter + ring buffer. Replay renders cleanly even mid-tool-call. Env-overridable via `GSTACK_PTY_RING_BUFFER_BYTES`.
|
||||
- **60s detach window with re-attach (`browse/src/terminal-agent.ts`)** — WS close with any code other than 4001 (intentional), 4404 (no-claude), or 1000 (clean exit) keeps the PTY alive for 60s. New WS upgrade matching the same sessionId resumes the same `claude` process. Env-overridable via `GSTACK_PTY_DETACH_WINDOW_MS`.
|
||||
- **Working Restart button (`browse/src/server.ts`, `extension/sidepanel-terminal.js`)** — `POST /pty-restart` is one transaction: dispose old session scope-to-sessionId, revoke old lease, mint fresh sessionId + lease + attachToken, return the 4-tuple. Client sends `{type:"start"}` immediately on the new WS for eager spawn — no keystroke required.
|
||||
- **Explicit dispose on sidebar close (`extension/sidepanel.js`)** — `pagehide` handler fires `navigator.sendBeacon('/pty-dispose', {sessionId, authToken})` so browser quit / panel close / extension reload disposes the session immediately. Server route accepts auth token in the body (sendBeacon-compatible — no custom headers).
|
||||
- **PID-identity terminal-agent kill (`browse/src/terminal-agent-control.ts`)** — Replaces `pkill -f terminal-agent\.ts` regex teardown. Agent writes `<stateDir>/terminal-agent-pid` (JSON `{pid, gen, startedAt}`) at boot; `cli.ts` and `server.ts` use `killAgentByRecord` instead. Static-grep tripwire test fails CI if the regex pattern returns to source.
|
||||
- **Terminal-agent watchdog (`browse/src/server.ts`)** — 60s ticker checks recorded agent PID via `process.kill(pid, 0)`. Respawns on dead PID via shared `spawnTerminalAgent` helper. 3-in-60s crash-loop guard with rolling window. Slow-but-alive agents intentionally fall through (split-brain defense). Env-overridable via `GSTACK_AGENT_WATCHDOG_TICK_MS`.
|
||||
- **Outer browse-server supervisor (`browse/src/cli.ts`)** — `$B connect --supervise` (or `BROWSE_SUPERVISE=1`) keeps the CLI attached, polls server PID every 30s, respawns on unexpected exit with 1s/2s/4s/8s/30s backoff. SIGINT/SIGTERM cleanly teardown the supervised server. Opt-in — default `$B connect` behavior unchanged for every existing caller.
|
||||
- **Patient `tryAutoConnect` (`extension/sidepanel-terminal.js`)** — Replaces the 15s give-up with indefinite 2s polling. Ascending status messages at 15s / 60s / 5min so the user knows we're still trying. Sticky-abort only on 401 (auth invalid), cleared by explicit Restart click.
|
||||
- **`/internal/healthz` route + `internalHandler<T>` helper (`browse/src/terminal-agent.ts`)** — Liveness probe used by the watchdog (returns pid/gen/sessions count, doesn't touch claude binary lookup). Helper collapses four `/internal/*` routes' bearer-auth + X-Browse-Gen check + JSON parse into one-liner calls.
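Two of the rules in the list above are simple enough to sketch: which WebSocket close codes end the session versus opening the detach window, and the supervisor's respawn backoff. The function names are hypothetical, and holding at 30s after the fifth attempt is an assumption.

```typescript
// Close codes that end the session outright, per the list above:
// 4001 (intentional), 4404 (no-claude), 1000 (clean exit).
const TERMINAL_CLOSE_CODES = new Set<number>([4001, 4404, 1000]);

// Hypothetical check: any other close code keeps the PTY alive for the detach window.
function keepsPtyAlive(closeCode: number): boolean {
  return !TERMINAL_CLOSE_CODES.has(closeCode);
}

// Backoff schedule from the list above: 1s / 2s / 4s / 8s / 30s.
const BACKOFF_MS: readonly number[] = [1000, 2000, 4000, 8000, 30000];

// Hypothetical lookup: `attempt` is zero-based; later attempts stay at the last step.
function supervisorBackoffMs(attempt: number): number {
  const index = Math.min(Math.max(attempt, 0), BACKOFF_MS.length - 1);
  return BACKOFF_MS[index];
}
```

An abnormal closure such as 1006 therefore keeps the `claude` process alive for re-attach, while an intentional 4001 does not.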
|
||||
|
||||
#### Changed
|
||||
|
||||
- **`/pty-session` response shape (`browse/src/server.ts`)**: Now returns `{terminalPort, sessionId, attachToken, leaseExpiresAt}`. Legacy `ptySessionToken` + `expiresAt` aliases preserved for one minor release.
|
||||
- **`ServerConfig.ownsTerminalAgent` teardown**: Now runs four side effects (was three): identity-based kill via `killAgentByRecord`, plus unlinks for `terminal-port`, `terminal-internal-token`, and the new `terminal-agent-pid`. Documented in CLAUDE.md.
|
||||
- **`/pty-session` response shape (`browse/src/server.ts`)** — Now returns `{terminalPort, sessionId, attachToken, leaseExpiresAt}`. Legacy `ptySessionToken` + `expiresAt` aliases preserved for one minor release.
|
||||
- **`ServerConfig.ownsTerminalAgent` teardown** — Now runs four side effects (was three): identity-based kill via `killAgentByRecord`, plus unlinks for `terminal-port`, `terminal-internal-token`, and the new `terminal-agent-pid`. Documented in CLAUDE.md.
|
||||
|
||||
#### Fixed
|
||||
|
||||
- **Sibling gstack sessions killed by `pkill -f terminal-agent\.ts`**: Pre-v1.44 the teardown matched argv regex; any process whose command line contained `terminal-agent.ts` got SIGTERM'd. Closes the TODOS.md P3 item filed during v1.41 (`Identity-based terminal-agent kill`).
|
||||
- **Seven pre-existing test failures unrelated to this branch**: Three env-pollution failures (Bun's `Bun.which('bash')` returning null and `Bun.spawn(['bun', ...])` ENOENT after a sibling test mutated `process.env.PATH`), two stale-marker failures in `server-auth.test.ts` (`'Sidebar agent started'` → `'Terminal agent started'`), `setup-codesign.test.ts` looking for the unwrapped `bun run build` string (now `bun_cmd run build`), and `upgrade-migration-v1.test.ts` reading the developer's real config because it didn't override `HOME`. Fixed via a narrow global `test-setup.ts` (restores PATH only after every test) plus targeted marker + env-passing fixes.
|
||||
- **Sibling gstack sessions killed by `pkill -f terminal-agent\.ts`** — Pre-v1.44 the teardown matched argv regex; any process whose command line contained `terminal-agent.ts` got SIGTERM'd. Closes the TODOS.md P3 item filed during v1.41 (`Identity-based terminal-agent kill`).
|
||||
- **Seven pre-existing test failures unrelated to this branch** — Three env-pollution failures (Bun's `Bun.which('bash')` returning null and `Bun.spawn(['bun', ...])` ENOENT after a sibling test mutated `process.env.PATH`), two stale-marker failures in `server-auth.test.ts` (`'Sidebar agent started'` → `'Terminal agent started'`), `setup-codesign.test.ts` looking for the unwrapped `bun run build` string (now `bun_cmd run build`), and `upgrade-migration-v1.test.ts` reading the developer's real config because it didn't override `HOME`. Fixed via a narrow global `test-setup.ts` (restores PATH only after every test) plus targeted marker + env-passing fixes.
|
||||
|
||||
#### For contributors
|
||||
|
||||
- **Test framework `bunfig.toml` + `test-setup.ts`**: Global afterEach restores `process.env.PATH` only. Narrow on purpose — broader snapshot/restore breaks tests that legitimately set `process.env.GSTACK_HOME` at module load (`domain-skills-storage.test.ts`).
|
||||
- **Test framework `bunfig.toml` + `test-setup.ts`** — Global afterEach restores `process.env.PATH` only. Narrow on purpose — broader snapshot/restore breaks tests that legitimately set `process.env.GSTACK_HOME` at module load (`domain-skills-storage.test.ts`).
|
||||
- **12 new test files, 83 new unit-tier tests.** Static-grep tripwires defend the load-bearing protocol contracts (close codes, lease lifecycle, watchdog identity check, supervisor crash-loop guard, ring buffer ESC boundaries) without paying for live WebSocket cycles in CI.
|
||||
- **Eng review + outside voice (codex) ran on this branch.** 17 decisions baked: 10 from the in-review architecture pass (D1-D10), 6 from codex cross-model tension resolution (T1-T6, all adopted in codex's favor — most consequential was T1, separating sessionId from auth token), and 1 from in-PR scope-up of the outer supervisor.
|
||||
|
||||
|
|
@ -1173,10 +1262,10 @@ If you `/sync-gbrain` inside a framework project (Next.js, Prisma, Rails, etc.),
|
|||
#### Added
|
||||
|
||||
- **`/ios-qa`** (770-line SKILL.md.tmpl) — live-device QA flow with warm-start session cache, on-demand daemon spawn, Tailscale opt-in, demo + recording modes, full failure-mode + recovery matrix.
|
||||
- **`/ios-fix`**: autonomous bug fixer that captures a reproducing `/state/snapshot` BEFORE editing source, then rebuilds + redeploys + verifies. Snapshot becomes a regression test fixture.
|
||||
- **`/ios-design-review`**: 10-dimension Apple HIG audit on a real device. 0-10 scores per dimension with "what would make it a 10" framing, mirroring `/plan-design-review`'s rubric for browser.
|
||||
- **`/ios-clean`**: convenience wrapper that strips `DebugBridge` SPM + `#if DEBUG` wiring. Explicitly NOT the safety-critical path — the structural Release-build guard in `Package.swift` is.
|
||||
- **`/ios-sync`**: regenerates accessors against latest upstream gstack templates. Run after upgrading gstack or adding new `@Observable` classes.
|
||||
- **`/ios-fix`** — autonomous bug fixer that captures a reproducing `/state/snapshot` BEFORE editing source, then rebuilds + redeploys + verifies. Snapshot becomes a regression test fixture.
|
||||
- **`/ios-design-review`** — 10-dimension Apple HIG audit on a real device. 0-10 scores per dimension with "what would make it a 10" framing, mirroring `/plan-design-review`'s rubric for browser.
|
||||
- **`/ios-clean`** — convenience wrapper that strips `DebugBridge` SPM + `#if DEBUG` wiring. Explicitly NOT the safety-critical path — the structural Release-build guard in `Package.swift` is.
|
||||
- **`/ios-sync`** — regenerates accessors against latest upstream gstack templates. Run after upgrading gstack or adding new `@Observable` classes.
|
||||
- `ios-qa/templates/StateServer.swift.template` — dual-stack loopback bind (`::1` + `127.0.0.1`), boot token rotation, per-device session lock with mutation-only sliding window, snapshot/restore with schema envelope (`_schema_version` + `_app_build_id` + `_accessor_hash`), validate-then-apply atomicity via a single canonical-state-struct assignment, 1MB body cap.
|
||||
- `ios-qa/templates/DebugOverlay.swift.template` — animated brand-colored border, agent attribution chip (`X-Agent-Identity` header, display-only, never trusted for auth), optional recording-mode watermark for screencasts.
|
||||
- `ios-qa/templates/Package.swift.template` — DebugBridge target gated `.when(configuration: .debug)`. SwiftPM refuses to link in Release config.
|
||||
|
|
@ -1427,20 +1516,20 @@ Page captures with mixed-script Unicode round-trip cleanly to the Claude API now
|
|||
|
||||
#### Fixed
|
||||
|
||||
- **Defense in depth on top of v1.38.0.0's surrogate sanitization (#1440)**: v1.38.0.0 sanitizes at `handleCommandInternal` (the choke point all callers go through). This release adds a second layer at the HTTP-response boundary: `browse/src/sanitize.ts` (new) exports `stripLoneSurrogates`, `stripLoneSurrogateEscapes` (handles `\uXXXX` JSON-escape variants the raw-codepoint regex misses), and `sanitizeBody` (picks the right pass for text/plain vs application/json). `buildCommandResponse` is extracted from `handleCommand` and exported so the response boundary is unit-testable without spinning up the server. `/batch` also gets a per-result + envelope sanitize as belt-and-suspenders. Defense-in-depth wraps at `getCleanText`, `getCleanTextWithStripping`, `html`, `accessibility`, and `snapshot` extraction sites so downstream consumers (datamarking, envelope wrapping) see clean text before any further processing.
|
||||
- **Federation sync drops `/office-hours` and `/plan-eng-review` artifacts (#1452)**: `bin/gstack-artifacts-init` adds `projects/*/*-design-*.md` and `projects/*/*-test-plan-*.md` to all three managed blocks: `.brain-allowlist`, `.brain-privacy-map.json` (class `artifact`), and `.gitattributes` (`merge=union`).
|
||||
- **`/setup-gbrain` wrong config key (#1441)**: verified already-fixed in v1.27.0.0; closed the issue with a comment citing the migration script that aligns legacy `gbrain_sync_mode` installs to the current `artifacts_sync_mode` key.
|
||||
- **Defense in depth on top of v1.38.0.0's surrogate sanitization (#1440)** — v1.38.0.0 sanitizes at `handleCommandInternal` (the choke point all callers go through). This release adds a second layer at the HTTP-response boundary: `browse/src/sanitize.ts` (new) exports `stripLoneSurrogates`, `stripLoneSurrogateEscapes` (handles `\uXXXX` JSON-escape variants the raw-codepoint regex misses), and `sanitizeBody` (picks the right pass for text/plain vs application/json). `buildCommandResponse` is extracted from `handleCommand` and exported so the response boundary is unit-testable without spinning up the server. `/batch` also gets a per-result + envelope sanitize as belt-and-suspenders. Defense-in-depth wraps at `getCleanText`, `getCleanTextWithStripping`, `html`, `accessibility`, and `snapshot` extraction sites so downstream consumers (datamarking, envelope wrapping) see clean text before any further processing.
|
||||
- **Federation sync drops `/office-hours` and `/plan-eng-review` artifacts (#1452)** — `bin/gstack-artifacts-init` adds `projects/*/*-design-*.md` and `projects/*/*-test-plan-*.md` to all three managed blocks: `.brain-allowlist`, `.brain-privacy-map.json` (class `artifact`), and `.gitattributes` (`merge=union`).
|
||||
- **`/setup-gbrain` wrong config key (#1441)** — verified already-fixed in v1.27.0.0; closed the issue with a comment citing the migration script that aligns legacy `gbrain_sync_mode` installs to the current `artifacts_sync_mode` key.
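For readers unfamiliar with lone surrogates, the sketch below shows one way to strip them from raw text. It is not the code in `browse/src/sanitize.ts`: the name carries a `Sketch` suffix to make that clear, and it does not handle the `\uXXXX` escape variants that the entry says `stripLoneSurrogateEscapes` covers.

```typescript
// A high surrogate not followed by a low one, or a low surrogate not preceded by a high one.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical sketch: drop unpaired UTF-16 surrogates, keep valid pairs intact.
function stripLoneSurrogatesSketch(text: string): string {
  return text.replace(LONE_SURROGATE, "");
}
```

The pattern deliberately omits the `u` flag so that it matches individual UTF-16 code units rather than whole code points.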
|
||||
|
||||
#### Added
|
||||
|
||||
- **`## Implementation Tasks` section + JSONL handoff in every review skill (#1454)**: `plan-ceo-review`, `plan-design-review`, `plan-eng-review`, `plan-devex-review` each emit a per-skill markdown checklist and write `~/.gstack/projects/$SLUG/tasks-{phase}-{datetime}.jsonl` via `jq -nc` (never hand-rolled echo). `/autoplan` Phase 4 reads all four phase JSONL files, scopes by current branch and 5-commit window, dedupes on exact `(component, sorted(files), title)` matches, and renders one aggregated list. Near-duplicates surface separately with a possible-duplicate note for human resolution.
|
||||
- **`browse/src/sanitize.ts`**: two surrogate-stripping utilities plus a convenience selector keyed on content-type. Pairs with a refactored `buildCommandResponse` in `server.ts` (exported for testability) and per-result sanitization in the `/batch` handler.
|
||||
- **`gstack-upgrade/migrations/v1.38.1.0.sh`**: idempotent per-file repair for `.brain-allowlist`, `.brain-privacy-map.json`, and `.gitattributes`. Uses `jq` for the JSON file (preserves validity); falls back with a clear warning if `jq` is missing. Does NOT re-run `gstack-artifacts-init` (which would commit + push to the user's federated repo).
|
||||
- **`## Implementation Tasks` section + JSONL handoff in every review skill (#1454)** — `plan-ceo-review`, `plan-design-review`, `plan-eng-review`, `plan-devex-review` each emit a per-skill markdown checklist and write `~/.gstack/projects/$SLUG/tasks-{phase}-{datetime}.jsonl` via `jq -nc` (never hand-rolled echo). `/autoplan` Phase 4 reads all four phase JSONL files, scopes by current branch and 5-commit window, dedupes on exact `(component, sorted(files), title)` matches, and renders one aggregated list. Near-duplicates surface separately with a possible-duplicate note for human resolution.
|
||||
- **`browse/src/sanitize.ts`** — two surrogate-stripping utilities plus a convenience selector keyed on content-type. Pairs with a refactored `buildCommandResponse` in `server.ts` (exported for testability) and per-result sanitization in the `/batch` handler.
|
||||
- **`gstack-upgrade/migrations/v1.38.1.0.sh`** — idempotent per-file repair for `.brain-allowlist`, `.brain-privacy-map.json`, and `.gitattributes`. Uses `jq` for the JSON file (preserves validity); falls back with a clear warning if `jq` is missing. Does NOT re-run `gstack-artifacts-init` (which would commit + push to the user's federated repo).
|
||||
- **32 new unit tests** across `browse/test/sanitize.test.ts` (18), `browse/test/build-command-response.test.ts` (7), `test/artifacts-init-migration.test.ts` (7). All gate-tier (free, runs on every PR).
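
The exact-match dedupe that `/autoplan` Phase 4 performs on `(component, sorted(files), title)`, described in the #1454 entry above, can be sketched roughly as follows. The `TaskRecord` shape, the `phase` field, and both function names are hypothetical; the real JSONL schema lives in `scripts/task-emission-schema.ts` and is not reproduced here.

```typescript
// Hypothetical record shape; the real schema may carry more or different fields.
interface TaskRecord {
  component: string;
  files: string[];
  title: string;
  phase: string; // assumed: which review skill emitted the task
}

// Build a key that ignores the order in which files were listed.
function dedupeKey(task: TaskRecord): string {
  return JSON.stringify([task.component, [...task.files].sort(), task.title]);
}

// Keep the first task seen for each key; later exact duplicates are dropped.
function dedupeTasks(tasks: TaskRecord[]): TaskRecord[] {
  const seen = new Map<string, TaskRecord>();
  for (const task of tasks) {
    const key = dedupeKey(task);
    if (!seen.has(key)) seen.set(key, task);
  }
  return [...seen.values()];
}
```

Serializing the tuple with `JSON.stringify` avoids accidental collisions that a plain string join could produce when a title contains the separator. Near-duplicates deliberately get different keys here, which matches the entry's note that they surface separately for human resolution.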
|
||||
|
||||
#### Changed
|
||||
|
||||
- **`browse/src/snapshot.ts`, `read-commands.ts`, `content-security.ts`**: defense-in-depth surrogate wraps at extraction sites that feed pre-Response consumers (datamarking, envelope wrapping).
|
||||
- **`browse/src/snapshot.ts`, `read-commands.ts`, `content-security.ts`** — defense-in-depth surrogate wraps at extraction sites that feed pre-Response consumers (datamarking, envelope wrapping).
|
||||
- **`scripts/resolvers/tasks-section.ts`** (new) + **`scripts/task-emission-schema.ts`** (new) — shared resolver and schema for the per-skill task emission. Each review template invokes `{{TASKS_SECTION_EMIT:<phase>}}` once.
|
||||
|
||||
#### For contributors
|
||||
|
|
@@ -1486,21 +1575,21 @@ If you run gstack on Windows: `./setup` now produces a working install across ev
|
|||
|
||||
#### Added
|
||||
|
||||
- **`browse/test/server-sanitize-surrogates.test.ts`**: 11 unit cases (passthrough, valid pair, lone high/low mid-string, trailing/leading lone, adjacent doubles, pair-then-lone, lone-then-pair), 2 bug-repro tests (UTF-8 round-trip + JSON round-trip), 3 wiring-invariant tests (handleCommandInternalImpl rename, SSE activity, SSE inspector).
|
||||
- **`test/setup-windows-fallback.test.ts`**: static invariant (zero raw `ln` calls outside helper), helper-existence assertions, behavior matrix (4 cells: file/dir × Windows/Unix) via awk-style helper extraction + `bash -c` sourcing, Windows-note printer registration check.
|
||||
- **`test/build-script-shell-compat.test.ts`**: regex against `package.json scripts.*` rejecting bash brace groups (Bun-Windows-hostile); asserts `.version` redirects use subshells, not braces.
|
||||
- **`test/docs-config-keys.test.ts`**: deprecated-key denylist (`gbrain_sync_mode`, `gbrain_sync_mode_prompted`) scanned across `docs/**/*.md`; round-trip test for `gstack-config get artifacts_sync_mode`.
|
||||
- **`browse/test/server-sanitize-surrogates.test.ts`** — 11 unit cases (passthrough, valid pair, lone high/low mid-string, trailing/leading lone, adjacent doubles, pair-then-lone, lone-then-pair), 2 bug-repro tests (UTF-8 round-trip + JSON round-trip), 3 wiring-invariant tests (handleCommandInternalImpl rename, SSE activity, SSE inspector).
|
||||
- **`test/setup-windows-fallback.test.ts`** — static invariant (zero raw `ln` calls outside helper), helper-existence assertions, behavior matrix (4 cells: file/dir × Windows/Unix) via awk-style helper extraction + `bash -c` sourcing, Windows-note printer registration check.
|
||||
- **`test/build-script-shell-compat.test.ts`** — regex against `package.json scripts.*` rejecting bash brace groups (Bun-Windows-hostile); asserts `.version` redirects use subshells, not braces.
|
||||
- **`test/docs-config-keys.test.ts`** — deprecated-key denylist (`gbrain_sync_mode`, `gbrain_sync_mode_prompted`) scanned across `docs/**/*.md`; round-trip test for `gstack-config get artifacts_sync_mode`.
|
||||
|
||||
#### Changed
|
||||
|
||||
- **`browse/src/server.ts`**: `handleCommandInternal` split into `handleCommandInternalImpl` (raw) + thin sanitizing wrapper. Single egress point for both HTTP and batch consumers. Inline INVARIANT comment near the wrapper documents the architectural constraint.
|
||||
- **`browse/src/server.ts` SSE producers**: activity feed (`/activity/stream`) and inspector stream stringify with `sanitizeReplacer`, a `JSON.stringify` replacer function that cleans every string value during encoding. Post-stringify regex is a no-op because `JSON.stringify` has already converted `\uD800` to `"\\ud800"` before the regex could match. Inline INVARIANT comment in each.
|
||||
- **`setup`**: new `_link_or_copy SRC DST` helper near `IS_WINDOWS` detection (~line 33). Auto-dispatches on file-vs-directory + Windows-vs-Unix, and skips Unix-style name-only aliases (e.g. `gstack/open-gstack-browser` for the connect-chrome alias) when the source doesn't resolve on disk so Windows installs don't abort under `set -e`. All 42 prior `ln -snf` call sites converted to `_link_or_copy`. New `_print_windows_copy_note_once` helper called from `link_claude_skill_dirs` after any link work completes. `cleanup_old_claude_symlinks` and `cleanup_prefixed_claude_symlinks` extended with a Windows branch so `--prefix` / `--no-prefix` flips remove stale real-file SKILL.md copies instead of leaving them behind.
|
||||
- **`.github/workflows/*.yml` (8 Linux workflows)**: every Linux `runs-on` switched to `ubicloud-standard-8`: `evals.yml`, `evals-periodic.yml`, `ci-image.yml`, `actionlint.yml`, `pr-title-sync.yml`, `skill-docs.yml`, `version-gate.yml`, and `make-pdf-gate.yml`'s Linux matrix entry. The `evals.yml` matrix default and the prose footer both updated to reference `ubicloud-standard-8`.
|
||||
- **`.github/workflows/windows-free-tests.yml`**: stays on GitHub-hosted free `windows-latest`. Test-list expanded to include the 4 new wave tests. Earlier attempts on Blacksmith/GitHub-larger/Ubicloud-Windows all failed (label not registered, org-billing off, vendor doesn't offer Windows respectively); free `windows-latest` is the working path.
|
||||
- **`.github/actionlint.yaml`**: registers the two Ubicloud Linux labels (`ubicloud-standard-2`, `ubicloud-standard-8`) so workflow lint accepts them. The duplicate dead-weight `actionlint.yaml` at the repo root is removed (actionlint only reads `.github/actionlint.yaml`).
|
||||
- **`package.json`**: build script's three `{ git rev-parse HEAD 2>/dev/null || true; } > path/.version` brace groups replaced with `( ... )` subshells. POSIX-universal, Bun-Windows-compatible.
|
||||
- **`docs/gbrain-sync.md`, `docs/gbrain-sync-errors.md`**: 5 stale `gbrain_sync_mode` config-key references → `artifacts_sync_mode` (the rename landed in v1.27.0.0 but two docs still pointed at the old key).
|
||||
- **`browse/src/server.ts`** — `handleCommandInternal` split into `handleCommandInternalImpl` (raw) + thin sanitizing wrapper. Single egress point for both HTTP and batch consumers. Inline INVARIANT comment near the wrapper documents the architectural constraint.
|
||||
- **`browse/src/server.ts` SSE producers** — activity feed (`/activity/stream`) and inspector stream stringify with `sanitizeReplacer`, a `JSON.stringify` replacer function that cleans every string value during encoding. Post-stringify regex is a no-op because `JSON.stringify` has already converted `\uD800` to `"\\ud800"` before the regex could match. Inline INVARIANT comment in each.
|
||||
- **`setup`** — new `_link_or_copy SRC DST` helper near `IS_WINDOWS` detection (~line 33). Auto-dispatches on file-vs-directory + Windows-vs-Unix, and skips Unix-style name-only aliases (e.g. `gstack/open-gstack-browser` for the connect-chrome alias) when the source doesn't resolve on disk so Windows installs don't abort under `set -e`. All 42 prior `ln -snf` call sites converted to `_link_or_copy`. New `_print_windows_copy_note_once` helper called from `link_claude_skill_dirs` after any link work completes. `cleanup_old_claude_symlinks` and `cleanup_prefixed_claude_symlinks` extended with a Windows branch so `--prefix` / `--no-prefix` flips remove stale real-file SKILL.md copies instead of leaving them behind.
|
||||
- **`.github/workflows/*.yml` (8 Linux workflows)** — every Linux `runs-on` switched to `ubicloud-standard-8`: `evals.yml`, `evals-periodic.yml`, `ci-image.yml`, `actionlint.yml`, `pr-title-sync.yml`, `skill-docs.yml`, `version-gate.yml`, and `make-pdf-gate.yml`'s Linux matrix entry. The `evals.yml` matrix default and the prose footer both updated to reference `ubicloud-standard-8`.
|
||||
- **`.github/workflows/windows-free-tests.yml`** — stays on GitHub-hosted free `windows-latest`. Test-list expanded to include the 4 new wave tests. Earlier attempts on Blacksmith/GitHub-larger/Ubicloud-Windows all failed (label not registered, org-billing off, vendor doesn't offer Windows respectively); free `windows-latest` is the working path.
|
||||
- **`.github/actionlint.yaml`** — registers the two Ubicloud Linux labels (`ubicloud-standard-2`, `ubicloud-standard-8`) so workflow lint accepts them. The duplicate dead-weight `actionlint.yaml` at the repo root is removed (actionlint only reads `.github/actionlint.yaml`).
|
||||
- **`package.json`** — build script's three `{ git rev-parse HEAD 2>/dev/null || true; } > path/.version` brace groups replaced with `( ... )` subshells. POSIX-universal, Bun-Windows-compatible.
|
||||
- **`docs/gbrain-sync.md`, `docs/gbrain-sync-errors.md`** — 5 stale `gbrain_sync_mode` config-key references → `artifacts_sync_mode` (the rename landed in v1.27.0.0 but two docs still pointed at the old key).
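
The SSE-producer entry above says a post-stringify regex is a no-op and that cleaning has to happen inside a `JSON.stringify` replacer. A small sketch of why, under stated assumptions: `sanitizeReplacerSketch` is a hypothetical stand-in for the real `sanitizeReplacer`, and the U+FFFD substitution is an assumption.

```typescript
// Hypothetical stand-in for sanitizeReplacer; the real one lives in server.ts.
// Pairs are matched first and kept; any leftover surrogate is lone.
const SURROGATES = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

function sanitizeReplacerSketch(_key: string, value: unknown): unknown {
  // JSON.stringify calls the replacer for every value, including nested ones.
  // Note: property names are not rewritten by this replacer, only values.
  return typeof value === "string"
    ? value.replace(SURROGATES, (m) => (m.length === 2 ? m : "\uFFFD"))
    : value;
}

// Well-formed JSON.stringify emits a lone surrogate as escape text, so the
// raw-code-unit regex has nothing left to match after encoding.
const alreadyEscaped = JSON.stringify("\uD800");
const rawRegexStillMatches = /[\uD800-\uDFFF]/.test(alreadyEscaped);

const payload = { title: "bad\uD800", nested: { items: ["ok", "\uDC00x"] } };
const encoded = JSON.stringify(payload, sanitizeReplacerSketch);
```

Here `rawRegexStillMatches` is `false`, which is the no-op the entry describes, while `encoded` contains no surrogate escapes because each string was cleaned before it was serialized.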
|
||||
|
||||
#### For contributors
|
||||
|
||||
|
|
@@ -1642,15 +1731,15 @@ If you have been seeing `/codex review` fail on argv parsing since Codex CLI hit
|
|||
|
||||
#### Fixed
|
||||
|
||||
- **`codex/SKILL.md.tmpl` Step 2A**: replaced the unconditional `codex review "$boundary" --base <base>` invocation with a two-path branch. Default (no custom user instructions): bare `codex review --base <base>`. Custom instructions: `codex exec -s read-only "$(cat $_PROMPT_FILE)"` where `$_PROMPT_FILE` contains the filesystem boundary, the user's focus, and the diff between `DIFF_START` / `DIFF_END` markers. Probed `-c 'system_prompt="..."'` against Codex 0.130; the key isn't documented and silently no-ops, so the bare path ships without a re-injected boundary. Skill files under `.claude/` and `agents/` are public, so this is token efficiency, not safety. Contributed report by `Stashub` on #1428.
|
||||
- **`bin/gstack-learnings-log`**: added `'investigation'` to `ALLOWED_TYPES` (was: `[pattern, pitfall, preference, architecture, tool, operational]`). Updated the usage comment to list valid types. Contributed report by `diogolealassis` on #1423.
|
||||
- **`lib/gstack-memory-helpers.ts`**: rewrote `freshDetectEngineTier`. Three changes: switched `execSync` to `execFileSync` to drop the bash-specific `2>/dev/null` shell redirect (portable to Windows); recover stdout from the thrown error object so non-zero exits from `gbrain doctor` don't lose the JSON; fall back to reading `gbrain` config (respecting `$GBRAIN_HOME`, defaulting to `~/.gbrain/config.json`) when doctor output doesn't surface an `engine` field. Added `logGbrainError` helper that appends one-line JSONL to `~/.gstack/.gbrain-errors.jsonl` on parse failure. Patch shape contributed by `Shiv @shivasymbl` on #1415; tested against gstack v1.31.0.0 + gbrain v0.31.3 + Supabase.
|
||||
- **`codex/SKILL.md.tmpl` Step 2A** — replaced the unconditional `codex review "$boundary" --base <base>` invocation with a two-path branch. Default (no custom user instructions): bare `codex review --base <base>`. Custom instructions: `codex exec -s read-only "$(cat $_PROMPT_FILE)"` where `$_PROMPT_FILE` contains the filesystem boundary, the user's focus, and the diff between `DIFF_START` / `DIFF_END` markers. Probed `-c 'system_prompt="..."'` against Codex 0.130; the key isn't documented and silently no-ops, so the bare path ships without a re-injected boundary. Skill files under `.claude/` and `agents/` are public, so this is token efficiency, not safety. Contributed report by `Stashub` on #1428.
|
||||
- **`bin/gstack-learnings-log`** — added `'investigation'` to `ALLOWED_TYPES` (was: `[pattern, pitfall, preference, architecture, tool, operational]`). Updated the usage comment to list valid types. Contributed report by `diogolealassis` on #1423.
|
||||
- **`lib/gstack-memory-helpers.ts`** — rewrote `freshDetectEngineTier`. Three changes: switched `execSync` to `execFileSync` to drop the bash-specific `2>/dev/null` shell redirect (portable to Windows); recover stdout from the thrown error object so non-zero exits from `gbrain doctor` don't lose the JSON; fall back to reading `gbrain` config (respecting `$GBRAIN_HOME`, defaulting to `~/.gbrain/config.json`) when doctor output doesn't surface an `engine` field. Added `logGbrainError` helper that appends one-line JSONL to `~/.gstack/.gbrain-errors.jsonl` on parse failure. Patch shape contributed by `Shiv @shivasymbl` on #1415; tested against gstack v1.31.0.0 + gbrain v0.31.3 + Supabase.
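
Two of the `freshDetectEngineTier` changes above, dropping the shell redirect and recovering stdout from a failed child process, can be illustrated with Node's `child_process` API. This is a sketch only: `runAndKeepStdout` is a hypothetical helper name, and the demo spawns the current runtime instead of `gbrain` so that it runs anywhere.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper, not the real freshDetectEngineTier.
function runAndKeepStdout(cmd: string, args: string[]): string {
  try {
    // stdio: stderr is ignored here instead of using a shell `2>/dev/null`,
    // so no shell is involved and the call is portable to Windows.
    return execFileSync(cmd, args, {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
  } catch (err) {
    // On a non-zero exit execFileSync throws, but the error object still
    // carries whatever the child wrote to stdout.
    const stdout = (err as { stdout?: string | Buffer }).stdout;
    return stdout ? stdout.toString() : "";
  }
}

// Demo child: prints JSON, then exits non-zero, like a failing doctor run.
const demoOutput = runAndKeepStdout(process.execPath, [
  "-e",
  "console.log(JSON.stringify({engine:'postgres'})); process.exitCode = 1",
]);
```

If the command cannot be started at all, the thrown error has no `stdout`, and the helper returns an empty string so the caller can fall through to the config-file path.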
|
||||
|
||||
#### Added
|
||||
|
||||
- **`test/gstack-memory-helpers.test.ts`**: `detectEngineTier` regression test for the schema_version:2 fallback path. Sets `HOME`, `GSTACK_HOME`, `GBRAIN_HOME`, and `PATH` to temp dirs (so the test doesn't read the developer's real `~/.gbrain/config.json` or invoke a real `gbrain`), writes a synthetic `{"engine":"postgres","database_url":"..."}` to the temp `GBRAIN_HOME`, asserts `detectEngineTier()` returns `engine: "supabase"`. The existing `detectEngineTier` `beforeEach`/`afterAll` blocks were also extended to isolate `HOME` and `GBRAIN_HOME`, closing a flake source where the prior tests would read whatever was on the reviewer's machine.
|
||||
- **`test/learnings.test.ts`**: two tests for the `investigation` type. One round-trips `gstack-learnings-log` with `type: "investigation"` and asserts the file gets the entry. The other reads `investigate/SKILL.md.tmpl` and asserts it emits `"type":"investigation"` verbatim, a caller-contract guard against the template drifting to an invalid type.
|
||||
- **`test/codex-hardening.test.ts`**: two tests applied to BOTH `codex/SKILL.md.tmpl` AND the generated `codex/SKILL.md`. The first parses Step 2A's section and asserts no `codex review` invocation line combines a quoted-prompt or variable positional argument with `--base`. The second asserts that Step 2A still contains either bare `codex review --base` OR `codex exec`, which guards against accidentally deleting both fix paths in a future edit.
|
||||
- **`test/gstack-memory-helpers.test.ts`** — `detectEngineTier` regression test for the schema_version:2 fallback path. Sets `HOME`, `GSTACK_HOME`, `GBRAIN_HOME`, and `PATH` to temp dirs (so the test doesn't read the developer's real `~/.gbrain/config.json` or invoke a real `gbrain`), writes a synthetic `{"engine":"postgres","database_url":"..."}` to the temp `GBRAIN_HOME`, asserts `detectEngineTier()` returns `engine: "supabase"`. The existing `detectEngineTier` `beforeEach`/`afterAll` blocks were also extended to isolate `HOME` and `GBRAIN_HOME`, closing a flake source where the prior tests would read whatever was on the reviewer's machine.
|
||||
- **`test/learnings.test.ts`** — two tests for the `investigation` type. One round-trips `gstack-learnings-log` with `type: "investigation"` and asserts the file gets the entry. The other reads `investigate/SKILL.md.tmpl` and asserts it emits `"type":"investigation"` verbatim, a caller-contract guard against the template drifting to an invalid type.
|
||||
- **`test/codex-hardening.test.ts`** — two tests applied to BOTH `codex/SKILL.md.tmpl` AND the generated `codex/SKILL.md`. The first parses Step 2A's section and asserts no `codex review` invocation line combines a quoted-prompt or variable positional argument with `--base`. The second asserts that Step 2A still contains either bare `codex review --base` OR `codex exec`, which guards against accidentally deleting both fix paths in a future edit.
|
||||
|
||||
#### For contributors
|
||||
|
||||
|
|
@@ -1687,13 +1776,13 @@ Run `/gstack-upgrade` immediately after a new release and the script finds the n
|
|||
|
||||
#### Fixed
|
||||
|
||||
- **`bin/gstack-update-check`**: replaced the unconditional `curl` of `raw.githubusercontent.com/.../main/VERSION` with a SHA-pinned fetch path that resolves the live HEAD via `git ls-remote` first, then curls `raw.githubusercontent.com/garrytan/gstack/<SHA>/VERSION`. Branch-raw fetch kept as fallback when `git ls-remote` is unavailable or `GSTACK_REMOTE_URL` is explicitly set.
|
||||
- **`bin/gstack-update-check`**: added a semver-order guard. After fetching REMOTE, the script runs `sort -V` to confirm REMOTE > LOCAL before emitting `UPGRADE_AVAILABLE`. When LOCAL is at or ahead of REMOTE, it writes `UP_TO_DATE` and exits silently.
|
||||
- **`bin/gstack-update-check`**: fenced `git ls-remote` with `GIT_TERMINAL_PROMPT=0`, `GIT_HTTP_LOW_SPEED_LIMIT=1000`, and `GIT_HTTP_LOW_SPEED_TIME=5` so a flaky network cannot hang every skill preamble.
|
||||
- **`bin/gstack-update-check`** — replaced the unconditional `curl` of `raw.githubusercontent.com/.../main/VERSION` with a SHA-pinned fetch path that resolves the live HEAD via `git ls-remote` first, then curls `raw.githubusercontent.com/garrytan/gstack/<SHA>/VERSION`. Branch-raw fetch kept as fallback when `git ls-remote` is unavailable or `GSTACK_REMOTE_URL` is explicitly set.
|
||||
- **`bin/gstack-update-check`** — added a semver-order guard. After fetching REMOTE, the script runs `sort -V` to confirm REMOTE > LOCAL before emitting `UPGRADE_AVAILABLE`. When LOCAL is at or ahead of REMOTE, it writes `UP_TO_DATE` and exits silently.
|
||||
- **`bin/gstack-update-check`** — fenced `git ls-remote` with `GIT_TERMINAL_PROMPT=0`, `GIT_HTTP_LOW_SPEED_LIMIT=1000`, and `GIT_HTTP_LOW_SPEED_TIME=5` so a flaky network cannot hang every skill preamble.
|
||||
|
||||
#### Added
|
||||
|
||||
- **`browse/test/gstack-update-check.test.ts`**: 3 new tests covering: REMOTE older than LOCAL stays silent and caches `UP_TO_DATE`, multi-segment `1.9.0.0 < 1.10.0.0` produces `UPGRADE_AVAILABLE`, multi-segment `1.10.0.0 > 1.9.0.0` stays silent.
|
||||
- **`browse/test/gstack-update-check.test.ts`** — 3 new tests covering: REMOTE older than LOCAL stays silent and caches `UP_TO_DATE`, multi-segment `1.9.0.0 < 1.10.0.0` produces `UPGRADE_AVAILABLE`, multi-segment `1.10.0.0 > 1.9.0.0` stays silent.
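
The multi-segment cases in the tests above (`1.9.0.0` versus `1.10.0.0`) are exactly where a plain string comparison goes wrong. The script itself uses `sort -V` in shell; the sketch below swaps in a segment-wise numeric comparison in TypeScript to show the same ordering rule. Both function names are hypothetical, and the code assumes purely numeric dotted versions.

```typescript
// Swapped-in technique: numeric segment comparison, not the script's `sort -V`.
// Assumption: versions are dot-separated non-negative integers.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    // Missing trailing segments count as zero, so "1.2" equals "1.2.0".
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Only announce an upgrade when REMOTE is strictly newer than LOCAL.
function updateStatus(
  local: string,
  remote: string,
): "UPGRADE_AVAILABLE" | "UP_TO_DATE" {
  return compareVersions(remote, local) > 0 ? "UPGRADE_AVAILABLE" : "UP_TO_DATE";
}

// A lexical comparison gets this case backwards because "9" sorts after "1".
const lexicalSaysOlder = "1.9.0.0" < "1.10.0.0";
```

With these definitions `lexicalSaysOlder` is `false`, while `updateStatus("1.9.0.0", "1.10.0.0")` reports an available upgrade and the reverse call stays at `UP_TO_DATE`.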
|
||||
|
||||
## [1.34.0.0] - 2026-05-12
|
||||
|
||||
|
|
@@ -1784,11 +1873,11 @@ If you've been seeing extra top-level skills (`/dublin-v1`, `/wellington`, etc.)
|
|||
|
||||
#### Fixed
|
||||
|
||||
- **`setup`**: added Conductor worktree guard before `ln -snf "$SOURCE_GSTACK_DIR" "$CLAUDE_GSTACK_LINK"`. Checks `[ -d "$CLAUDE_GSTACK_LINK" ] && [ ! -L "$CLAUDE_GSTACK_LINK" ]` for a real directory, then `cd ... && pwd -P` to compare against the source. If they differ, sets `_SKIP_CLAUDE_REGISTER=1`, prints a remediation message naming both paths, and exits the Claude registration branch without touching the global install.
|
||||
- **`setup`** — added Conductor worktree guard before `ln -snf "$SOURCE_GSTACK_DIR" "$CLAUDE_GSTACK_LINK"`. Checks `[ -d "$CLAUDE_GSTACK_LINK" ] && [ ! -L "$CLAUDE_GSTACK_LINK" ]` for a real directory, then `cd ... && pwd -P` to compare against the source. If they differ, sets `_SKIP_CLAUDE_REGISTER=1`, prints a remediation message naming both paths, and exits the Claude registration branch without touching the global install.
|
||||
|
||||
#### Added
|
||||
|
||||
- **`test/setup-conductor-worktree.test.ts`**: 8 tests (27 expect calls) covering: guard placement in `setup` before `ln -snf`, `pwd -P` resolution against `$SOURCE_GSTACK_DIR`, the skip-branch's remediation message, BSD `ln -snf` reproducer (proves the bug shape exists), guard skips when dest is real-dir-elsewhere, guard allows ln when dest doesn't exist, guard allows ln when dest is an existing symlink (upgrade-in-place), guard allows ln when dest already resolves to source (self-rerun).
|
||||
- **`test/setup-conductor-worktree.test.ts`** — 8 tests (27 expect calls) covering: guard placement in `setup` before `ln -snf`, `pwd -P` resolution against `$SOURCE_GSTACK_DIR`, the skip-branch's remediation message, BSD `ln -snf` reproducer (proves the bug shape exists), guard skips when dest is real-dir-elsewhere, guard allows ln when dest doesn't exist, guard allows ln when dest is an existing symlink (upgrade-in-place), guard allows ln when dest already resolves to source (self-rerun).
|
||||
|
||||
#### For contributors
|
||||
|
||||
|
|
@@ -2815,12 +2904,12 @@ Source: `git diff --shortstat origin/main..HEAD` after V1 ship + the V1 test sui
|
|||
|---|---|
|
||||
| Net branch size vs main | **+4174 / −849 lines** across 39 files |
|
||||
| New shared library | **`lib/gstack-memory-helpers.ts`** (330 LOC, 5 public functions: canonicalizeRemote, secretScanFile, detectEngineTier, parseSkillManifest, withErrorContext) |
|
||||
| New helpers in `bin/` | **3 helpers**: `gstack-memory-ingest` (580 LOC), `gstack-gbrain-sync` (270 LOC), `gstack-brain-context-load` (420 LOC) |
|
||||
| Skills with V1 gbrain manifests | **6 skills**: `/office-hours`, `/plan-ceo-review`, `/design-shotgun`, `/design-consultation`, `/investigate`, `/retro` |
|
||||
| Memory types ingested | **8 types**: transcript (Claude Code + Codex), eureka, learning, timeline, ceo-plan, design-doc, retro, builder-profile-entry |
|
||||
| Tests added | **65 new tests**: 22 helpers + 15 ingest + 8 sync + 10 context-load + 10 E2E pipeline |
|
||||
| New /setup-gbrain steps | **2 steps**: Step 7.5 (transcript ingest gate with 5-option AskUserQuestion) + Step 10 (GREEN/YELLOW/RED idempotent doctor verdict) |
|
||||
| New user-facing reference | **`setup-gbrain/memory.md`**: what gets ingested, what stays local, secret scanning via gitleaks, querying, deleting, recovery cases |
|
||||
| New helpers in `bin/` | **3 helpers** — `gstack-memory-ingest` (580 LOC), `gstack-gbrain-sync` (270 LOC), `gstack-brain-context-load` (420 LOC) |
|
||||
| Skills with V1 gbrain manifests | **6 skills** — `/office-hours`, `/plan-ceo-review`, `/design-shotgun`, `/design-consultation`, `/investigate`, `/retro` |
|
||||
| Memory types ingested | **8 types** — transcript (Claude Code + Codex), eureka, learning, timeline, ceo-plan, design-doc, retro, builder-profile-entry |
|
||||
| Tests added | **65 new tests** — 22 helpers + 15 ingest + 8 sync + 10 context-load + 10 E2E pipeline |
|
||||
| New /setup-gbrain steps | **2 steps** — Step 7.5 (transcript ingest gate with 5-option AskUserQuestion) + Step 10 (GREEN/YELLOW/RED idempotent doctor verdict) |
|
||||
| New user-facing reference | **`setup-gbrain/memory.md`** — what gets ingested, what stays local, secret scanning via gitleaks, querying, deleting, recovery cases |
|
||||
| Manifest schema | **`gbrain.schema: 1`**, validated at gen-skill-docs time; 3 query kinds (vector / list / filesystem) with kind-specific required fields |
|
||||
| MCP-call timeout per query | **500ms** hard cap; preamble never blocks > 2s on gbrain issues |
|
||||
| Datamark envelope wrap | **per-page** (not per-message) — single envelope around rendered body |
|
||||
|
|
@@ -3038,13 +3127,13 @@ Branch totals come from `git diff --shortstat origin/main..HEAD` after every lan
|
|||
|
||||
| Metric | Δ |
|
||||
|---|---|
|
||||
| New shared resolvers | **2 modules**: `bin/gstack-paths` (61 LOC), `browse/src/claude-bin.ts` (73 LOC) |
|
||||
| New shared resolvers | **2 modules** — `bin/gstack-paths` (61 LOC), `browse/src/claude-bin.ts` (73 LOC) |
|
||||
| Inline state-root chains consolidated | **8 skills** (was 5 in initial scope; 3 more found during T1) |
|
||||
| Hardcoded `claude` spawn sites rewired | **5 sites**: `security-classifier.ts:396`, `:496`, `preflight-agent-sdk.ts`, `helpers/providers/claude.ts`, `helpers/agent-sdk-runner.ts` |
|
||||
| Fork's 95-LOC `claude-bin.ts` reimplementation | **−75 lines**: replaced by `Bun.which()` + 18 LOC of override+args wrapping |
|
||||
| Hardcoded `claude` spawn sites rewired | **5 sites** — `security-classifier.ts:396`, `:496`, `preflight-agent-sdk.ts`, `helpers/providers/claude.ts`, `helpers/agent-sdk-runner.ts` |
|
||||
| Fork's 95-LOC `claude-bin.ts` reimplementation | **−75 lines** — replaced by `Bun.which()` + 18 LOC of override+args wrapping |
|
||||
| Windows-safe curated subset | **103 of 128 free tests** (80%) run on `windows-latest`; 25 excluded with reasons |
|
||||
| New tests added | **+31 tests**: gstack-paths (8), claude-bin (9), test-free-shards (14) |
|
||||
| New invariant tests | **+3**: private-path leak detector + 2 doc-inventory cross-checks in `test/skill-validation.test.ts` |
|
||||
| New tests added | **+31 tests** — gstack-paths (8), claude-bin (9), test-free-shards (14) |
|
||||
| New invariant tests | **+3** — private-path leak detector + 2 doc-inventory cross-checks in `test/skill-validation.test.ts` |
|
||||
| Skill inventory documented | **40+ skills** in AGENTS.md + docs/skills.md (was 21 in AGENTS.md; `/debug` → `/investigate`) |
|
||||
| Free test suite | **318 pass, 0 fail** (`bun test test/skill-validation.test.ts`) |
|
||||
|
||||
|
|
@@ -3505,7 +3594,7 @@ The old chat queue is gone. `sidebar-agent.ts`, `/sidebar-command`, `/sidebar-ch
|
|||
#### Added
|
||||
|
||||
- **Interactive Terminal sidebar tab.** xterm.js + a non-compiled `terminal-agent.ts` Bun process that spawns claude with `Bun.spawn({terminal: {rows, cols, data}})`. Auto-connects when the side panel opens, no keypress needed.
|
||||
- **`$B tab-each <command>`**: fan-out helper for multi-tab work. Returns `{command, args, total, results: [{tabId, url, title, status, output}]}`. Skips chrome:// pages, scope-checks the inner command before iterating, restores the original active tab in a `finally` block, never pulls focus away from the user's foreground app.
|
||||
- **`$B tab-each <command>`** — fan-out helper for multi-tab work. Returns `{command, args, total, results: [{tabId, url, title, status, output}]}`. Skips chrome:// pages, scope-checks the inner command before iterating, restores the original active tab in a `finally` block, never pulls focus away from the user's foreground app.
|
||||
- **Live tab state files.** `<stateDir>/tabs.json` (full list with id, url, title, active, pinned, audible, windowId) and `<stateDir>/active-tab.json` (current active). Updated atomically on every `chrome.tabs` event (activated, created, removed, URL/title change). Claude reads on demand instead of running `$B tabs`.
|
||||
- **Tab-awareness system prompt** injected via `claude --append-system-prompt` at spawn so the model knows about the state files and the `$B tab-each` command without being told.
|
||||
- **Always-visible Restart button** in the Terminal toolbar. Force-restart claude any time, not just from the "session ended" state.
|
||||
|
|
@@ -3517,7 +3606,7 @@ The old chat queue is gone. `sidebar-agent.ts`, `/sidebar-command`, `/sidebar-ch
|
|||
- **Repaint after debug-tab close.** xterm.js doesn't auto-redraw when its container flips from `display: none` back to `display: flex`. A MutationObserver on `#tab-terminal`'s class attribute now forces a `fitAddon.fit() + term.refresh() + resize` push when the pane becomes visible.
|
||||
|
||||
#### Removed
|
||||
- **`browse/src/sidebar-agent.ts`**: the one-shot `claude -p` queue worker. ~900 lines.
|
||||
- **`browse/src/sidebar-agent.ts`** — the one-shot `claude -p` queue worker. ~900 lines.
|
||||
- **Server endpoints**: `/sidebar-command`, `/sidebar-chat[/clear]`, `/sidebar-agent/{event,kill,stop}`, `/sidebar-tabs[/switch]`, `/sidebar-session{,/new,/list}`, `/sidebar-queue/dismiss`. ~600 lines.
|
||||
- **Chat-related state** in server.ts: `ChatEntry`, `SidebarSession`, `TabAgentState`, `pickSidebarModel`, `addChatEntry`, `processAgentEvent`, `killAgent`, the agent-health watchdog, `chatBuffer`, the per-tab agent map.
|
||||
- **Chat UI in sidepanel.html**: primary-tab nav, `<main id="tab-chat">`, the chat input bar, the experimental "Browser co-pilot" banner, the security event banner, the `clear-chat` footer button.
|
||||
|
|
@@ -4411,14 +4500,14 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
|
|||
|
||||
### What actually ships
|
||||
|
||||
* **security.ts**: canary injection plus check, verdict combiner with ensemble rule, attack log with rotation, cross-process session state, device-salted payload hashing
|
||||
* **security-classifier.ts**: TestSavantAI (default) plus Claude Haiku transcript check plus opt-in DeBERTa-v3 ensemble, all with graceful fail-open
|
||||
* **security.ts** — canary injection plus check, verdict combiner with ensemble rule, attack log with rotation, cross-process session state, device-salted payload hashing
|
||||
* **security-classifier.ts** — TestSavantAI (default) plus Claude Haiku transcript check plus opt-in DeBERTa-v3 ensemble, all with graceful fail-open
|
||||
* **Pre-spawn ML scan** on every user message plus tool output scan on every Read, Glob, Grep, WebFetch, Bash result
|
||||
* **Shield icon** with 3 states (green, amber, red) updating continuously via `/sidebar-chat` poll
|
||||
* **Canary leak banner** (centered alert-heavy, per approved design mockup) with expandable layer-score detail
|
||||
* **Attack telemetry** via existing `gstack-telemetry-log` to `community-pulse` to Supabase pipe (tier-gated, community uploads, anonymous local-only, off is no-op)
|
||||
* **`gstack-security-dashboard` CLI**: attacks detected last 7 days, top attacked domains, layer distribution, verdict split
|
||||
* **BrowseSafe-Bench smoke harness**: 200 cases from Perplexity's 3,680-case adversarial dataset, cached hermetically, gates on signal separation
|
||||
* **`gstack-security-dashboard` CLI** — attacks detected last 7 days, top attacked domains, layer distribution, verdict split
|
||||
* **BrowseSafe-Bench smoke harness** — 200 cases from Perplexity's 3,680-case adversarial dataset, cached hermetically, gates on signal separation
|
||||
* **Live Playwright integration test** pins the L1 through L6 defense-in-depth contract
|
||||
* **Bun-native classifier research skeleton** plus design doc — WordPiece tokenizer matching transformers.js output, benchmark harness, FFI roadmap for future 5ms native inference
|
||||
|
||||
|
|
@@ -4426,10 +4515,10 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
|
|||
|
||||
Two independent adversarial reviewers (Claude subagent and Codex/gpt-5.4) converged on four bypass paths. All four fixed before merge:
|
||||
|
||||
* **Canary stream-chunk split**: rolling-buffer detection across consecutive `text_delta` and `input_json_delta` events. Previously `.includes()` ran per-chunk, so an attacker could ask Claude to emit the canary split across two deltas and evade the check.
|
||||
* **Snapshot command bypass**: `$B snapshot` emits ARIA-name output from the page, but was missing from `PAGE_CONTENT_COMMANDS`, so malicious aria-labels flowed to Claude without the trust-boundary envelope every other read path gets.
|
||||
* **Tool-output single-layer BLOCK**: `combineVerdict` now accepts `{ toolOutput: true }`. On tool-result scans the Stack Overflow FP concern doesn't apply (content wasn't user-authored), so a single ML classifier at BLOCK threshold now blocks directly instead of degrading to WARN.
|
||||
* **Transcript classifier tool-output context**: Haiku previously saw only `user_message + tool_calls` (empty input) on tool-result scans, so only testsavant_content got a signal. Now receives the actual tool output text and can vote.
|
||||
* **Canary stream-chunk split** — rolling-buffer detection across consecutive `text_delta` and `input_json_delta` events. Previously `.includes()` ran per-chunk, so an attacker could ask Claude to emit the canary split across two deltas and evade the check.
|
||||
* **Snapshot command bypass** — `$B snapshot` emits ARIA-name output from the page, but was missing from `PAGE_CONTENT_COMMANDS`, so malicious aria-labels flowed to Claude without the trust-boundary envelope every other read path gets.
|
||||
* **Tool-output single-layer BLOCK** — `combineVerdict` now accepts `{ toolOutput: true }`. On tool-result scans the Stack Overflow FP concern doesn't apply (content wasn't user-authored), so a single ML classifier at BLOCK threshold now blocks directly instead of degrading to WARN.
|
||||
* **Transcript classifier tool-output context** — Haiku previously saw only `user_message + tool_calls` (empty input) on tool-result scans, so only testsavant_content got a signal. Now receives the actual tool output text and can vote.
|
||||
|
||||
Also: attribute-injection fix in `escapeHtml` (escapes `"` and `'` now), `GSTACK_SECURITY_OFF=1` is now a real gate in `loadTestsavant`/`loadDeberta` (not just a doc promise), device salt cached in-process so FS-unwritable environments don't break hash correlation, tool-use registry entries evicted on `tool_result` (memory leak fix), dashboard uses `jq` for brace-balanced JSON parse when available.
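
The canary stream-chunk fix described above can be sketched as a small rolling buffer. This is a minimal illustration, assuming a fixed canary string and plain string chunks; `RollingCanaryDetector` and `push` are hypothetical names for this example, not the actual `security.ts` API.

```typescript
// Hypothetical sketch of rolling-buffer canary detection across stream chunks.
// Names and shapes are assumptions for illustration only.
class RollingCanaryDetector {
  private readonly canary: string;
  private tail = "";

  constructor(canary: string) {
    if (canary.length === 0) throw new Error("canary must be non-empty");
    this.canary = canary;
  }

  /** Returns true once the canary has appeared, even if split across chunks. */
  push(chunk: string): boolean {
    const combined = this.tail + chunk;
    if (combined.includes(this.canary)) return true;
    // Keep only the longest suffix that could still be the start of a canary.
    const keep = this.canary.length - 1;
    this.tail = keep > 0 ? combined.slice(-keep) : "";
    return false;
  }
}

const detector = new RollingCanaryDetector("CANARY-1234");
console.log(detector.push("foo CANA")); // false: only a prefix so far
console.log(detector.push("RY-1234 bar")); // true: completed across the boundary
```

A per-chunk `.includes()` would return false for both chunks here, which is the bypass the changelog entry describes. Keeping `canary.length - 1` trailing characters is enough, because any longer match would already have been found in the previous call.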
|
||||
|
||||
|
|
@@ -4576,7 +4665,7 @@ If you're a solo builder or founder shipping a product one sprint at a time, `/d
|
|||
|
||||
- **Test infrastructure for multi-provider benchmarking.** `test/helpers/providers/{types,claude,gpt,gemini}.ts` defines a uniform `ProviderAdapter` interface and three adapters wrapping the existing CLI runners. `test/helpers/pricing.ts` has per-model cost tables (update quarterly). `test/helpers/tool-map.ts` declares which tools each provider's CLI exposes — benchmarks that need Edit/Glob/Grep correctly skip Gemini and report `unsupported_tool`.
|
||||
- **Model taxonomy in neutral `scripts/models.ts`.** Avoids an import cycle through `hosts/index.ts` that would have happened if `Model` lived in `scripts/resolvers/types.ts`. `resolveModel()` handles family heuristics: `gpt-5.4-mini` → `gpt-5.4`, `o3` → `o-series`, `claude-opus-4-7` → `claude`.
|
||||
- **`scripts/resolvers/preamble/`**: 18 single-purpose generators, 16-160 lines each. The composition root in `scripts/resolvers/preamble.ts` imports them and wires them into the tier-gated section list.
|
||||
- **`scripts/resolvers/preamble/`** — 18 single-purpose generators, 16-160 lines each. The composition root in `scripts/resolvers/preamble.ts` imports them and wires them into the tier-gated section list.
|
||||
- **Plan and reviews persisted.** Implementation followed `~/.claude/plans/declarative-riding-cook.md` which went through CEO review (SCOPE EXPANSION, 6 expansions accepted), DX review (POLISH, 5 gaps fixed), Eng review (4 architecture issues), and Codex review (11 brutal findings, all integrated and 2 prior decisions reversed).
|
||||
- **Mode-posture energy in Writing Style rules 2-4** (ported from main's v1.1.2.0). Rule 2 and rule 4 now cover three framings — pain reduction, capability unlocked, forcing-question pressure — so expansion, builder, and forcing-question skills keep their edge instead of collapsing into diagnostic-pain framing. Rule 3 adds an explicit exception for stacked forcing questions. Came in via the merge; sits on top of the submodule refactor already shipped in v1.3.
|
||||
- **Lite E2E coverage for v1.3 primitives.** Three new test files fill the real coverage gaps flagged in initial review: `test/taste-engine.test.ts` (24 tests — schema shape, Laplace-smoothed confidence, 5%/week decay clamped at 0, multi-dimension extraction, case-insensitive first-casing-wins policy, session cap via seed-then-one-call, legacy profile migration, taste-drift conflict warning, malformed-JSON recovery), `test/benchmark-cli.test.ts` (12 tests — CLI flag wiring, provider defaults, unknown-provider WARN path, NOT-READY branch regression catcher that strips auth env vars), `test/skill-e2e-benchmark-providers.test.ts` (8 periodic-tier live-API tests — trivial "echo ok" prompt through claude/codex/gemini adapters, assertions on parsed output + tokens + cost + timeout error codes + Promise.allSettled parallel isolation).
|
||||
|
|
@@ -4650,7 +4739,7 @@ If you're a solo builder or founder shipping a product one sprint at a time, `/d
|
|||
- `file://` navigation is now an accepted scheme in `goto`, scoped to cwd + temp dir via the existing `validateReadPath()` policy. UNC/network hosts (`file://host.example.com/...`), IP hosts, IPv6 hosts, and Windows drive-letter hosts are all rejected with explicit errors.
|
||||
- **State files can no longer smuggle HTML content.** `state load` now uses an explicit allowlist for the fields it accepts from disk — a tampered state file cannot inject `loadedHtml` to bypass the `load-html` safe-dirs, extension allowlist, magic-byte sniff, or size cap checks. Tab ownership is preserved across context recreation via the same in-memory channel, closing a cross-agent authorization gap where scoped agents could lose (or gain) tabs after `viewport --scale`.
|
||||
- **Audit log now records the raw alias input.** When you type `setcontent`, the audit entry shows `cmd: load-html, aliasOf: setcontent` so the forensic trail reflects what the agent actually sent, not just the canonical form.
|
||||
- **`load-html` content correctly clears on every real navigation**: link clicks, form submits, and JavaScript redirects now invalidate the replay metadata just like explicit `goto`/`back`/`forward`/`reload` do. Previously a later `viewport --scale` after a click could resurrect the original `load-html` content (silent data corruption). Also fixes SPA fixture URLs: `goto file:///tmp/app.html?route=home#login` preserves the query string and fragment through normalization.
|
||||
- **`load-html` content correctly clears on every real navigation** — link clicks, form submits, and JavaScript redirects now invalidate the replay metadata just like explicit `goto`/`back`/`forward`/`reload` do. Previously a later `viewport --scale` after a click could resurrect the original `load-html` content (silent data corruption). Also fixes SPA fixture URLs: `goto file:///tmp/app.html?route=home#login` preserves the query string and fragment through normalization.
|
||||
|
||||
### For contributors
|
||||
- `validateNavigationUrl()` now returns the normalized URL (previously void). All four callers — goto, diff, newTab, restoreState — updated to consume the return value so smart-parsing takes effect at every navigation site.
|
||||
|
|
|
|||
|
|
@@ -938,4 +938,10 @@ file globs. Run `/sync-gbrain` after meaningful code changes; for ongoing
|
|||
auto-sync across all worktrees, run `gbrain autopilot --install` once per
|
||||
machine — gbrain's daemon handles incremental refresh on a schedule.
|
||||
|
||||
Safety: don't run `/sync-gbrain` while `gbrain autopilot` is active — the
|
||||
orchestrator refuses destructive source ops when it detects a running autopilot
|
||||
to avoid racing it (#1734). Prefer registering user repos with `gbrain sources
|
||||
add --path <dir>` (no `--url`): URL-managed sources can auto-reclone, and the
|
||||
sync code walk for them requires an explicit `--allow-reclone` opt-in.
|
||||
|
||||
<!-- gstack-gbrain-search-guidance:end -->
|
||||
|
|
|
|||
|
|
@@ -136,7 +136,7 @@ The skill runs three stages — code, memory, brain-sync — independently. A fa
|
|||
|
||||
1. **Pre-flight.** Checks `gbrain_local_status` (the local engine's health). If the engine is `broken-db` or `broken-config`, the skill STOPs with a remediation menu — it refuses to silently degrade. If the local engine is missing and you're in remote-MCP mode (Path 4), the code stage SKIPs cleanly and only brain-sync runs.
|
||||
2. **Code stage.** Registers the cwd as a federated source via `gbrain sources add`, writes a `.gbrain-source` pin file in the repo root (kubectl-style context; every worktree gets its own pin, so Conductor sibling worktrees don't collide), and runs `gbrain sync --strategy code`.
|
||||
3. **Memory stage.** Stages your `~/.gstack/` transcripts + curated memory. In local-stdio MCP mode, ingests into the local engine. In remote-http MCP mode, persists staged markdown to `~/.gstack/transcripts/run-<pid>-<ts>/` for the remote brain admin's pull pipeline.
|
||||
3. **Memory stage.** Stages your `~/.gstack/` transcripts + curated memory. In local-stdio MCP mode, ingests into the local engine. In remote-http MCP mode, persists staged markdown to `~/.gstack/transcripts/run-<pid>-<ts>/` for the remote brain admin's pull pipeline. The ingest timeout is 30 minutes by default; raise it for a big brain with `GSTACK_INGEST_TIMEOUT_MS` (accepts 1 min–24h). On timeout the gbrain import checkpoint is preserved, so the next `/sync-gbrain` resumes instead of starting over.
|
||||
4. **Brain-sync stage.** Pushes curated artifacts (plans, designs, retros) to your private artifacts repo if you have one configured.
|
||||
5. **CLAUDE.md guidance.** Capability-checks the round-trip (write a page → search → find it). If green, writes the `## GBrain Search Guidance` block to your project's CLAUDE.md. If red, REMOVES the block — the agent should never be told to use a tool that isn't installed.
|
||||
|
||||
|
|
@@ -379,7 +379,7 @@ Another gstack session in a sibling Conductor workspace may be holding a lock on
|
|||
## Related skills + next steps
|
||||
|
||||
- `/health` — includes a GBrain dimension (doctor status, sync queue depth, last-push age) in its 0-10 composite score. The dimension is omitted when gbrain isn't installed; running `/health` on a non-gbrain machine doesn't penalize that choice.
|
||||
- `/gstack-upgrade` — keeps gstack itself up to date. Does NOT upgrade gbrain independently. To bump gbrain, update `PINNED_COMMIT` in `bin/gstack-gbrain-install` and re-run `/setup-gbrain`.
|
||||
- `/gstack-upgrade` — keeps gstack itself up to date. Does NOT upgrade gbrain independently. gbrain installs at the latest HEAD by default; to refresh it, `git pull` in your gbrain clone (default `~/gbrain`) and re-run `/setup-gbrain`. Pin a specific commit with `gstack-gbrain-install --pinned-commit <sha>` if you need reproducibility. Installs below the minimum tested version are refused.
|
||||
- `/retro` — weekly retrospective pulls learnings and plans from your gbrain when memory sync is on, letting the retro reference cross-machine history.
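
The "installs below the minimum tested version are refused" rule can be illustrated with a small version comparison. This is a sketch under assumptions: `parseVersion`, `versionLt`, and `checkFloor` are hypothetical names, versions are assumed to be plain dotted integers, and the real installer does this check in shell rather than TypeScript. The `0.20.0` floor is the `MIN_GBRAIN_VERSION` value from the installer.

```typescript
// Hypothetical sketch of a minimum-version floor check (not the installer's code).
function parseVersion(v: string): number[] {
  return v
    .replace(/^v/, "")
    .split(".")
    .map((part) => {
      const n = Number.parseInt(part, 10);
      return Number.isNaN(n) ? 0 : n; // assumption: non-numeric parts count as 0
    });
}

/** True when a < b, comparing numerically part by part; equal versions are not less-than. */
function versionLt(a: string, b: string): boolean {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x < y) return true;
    if (x > y) return false;
  }
  return false;
}

const MIN_GBRAIN_VERSION = "0.20.0";

/** Fails closed: anything below the floor is refused rather than warned about. */
function checkFloor(actual: string): { ok: boolean; reason: string } {
  if (versionLt(actual, MIN_GBRAIN_VERSION)) {
    return { ok: false, reason: `gbrain ${actual} is below the minimum ${MIN_GBRAIN_VERSION}` };
  }
  return { ok: true, reason: "version floor satisfied" };
}

console.log(checkFloor("0.18.2").ok); // false
console.log(checkFloor("0.35.0.0").ok); // true
```

Comparing parts as numbers matters: a plain string comparison would rank `0.9.0` above `0.20.0`.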
|
||||
|
||||
Run `/setup-gbrain` and see what sticks.
|
||||
|
|
|
|||
|
|
@@ -19,9 +19,14 @@
|
|||
# - git
|
||||
# - network reachability to https://github.com
|
||||
#
|
||||
# The pinned commit is declared here rather than resolved dynamically so
|
||||
# upgrades are explicit and reviewable. Update PINNED_COMMIT when gstack
|
||||
# verifies compatibility with a new gbrain release.
|
||||
# gbrain installs at the latest default-branch HEAD by default — the hard pin
|
||||
# was removed in #1744 (it had drifted ~23 versions behind). Pass
|
||||
# --pinned-commit <sha> to install a specific commit for reproducibility. A
|
||||
# minimum-version floor (MIN_GBRAIN_VERSION) hard-fails the install when the
|
||||
# resulting gbrain is too old for gstack's sync integration, and a fast
|
||||
# `gbrain doctor` self-test hard-fails a broken install when gbrain is already
|
||||
# configured. This keeps the version gate that the pin used to provide without
|
||||
# freezing users 23 releases behind.
|
||||
#
|
||||
# Env:
|
||||
# GBRAIN_INSTALL_DIR — override default install path (~/gbrain)
|
||||
|
|
@@ -33,8 +38,14 @@
|
|||
set -euo pipefail
|
||||
|
||||
# --- defaults ---
|
||||
PINNED_COMMIT="08b3698e90532b7b66c445e6b1d8cdfe71822802" # gbrain v0.18.2
|
||||
PINNED_TAG="v0.18.2"
|
||||
# No version pin by default — install the latest default-branch HEAD (#1744).
|
||||
# --pinned-commit <sha> overrides for reproducibility.
|
||||
PINNED_COMMIT=""
|
||||
PINNED_TAG=""
|
||||
# Minimum gbrain version gstack's integration is known to work with. The
|
||||
# `sources list --json` wrapped-object shape + federated sources landed by 0.20;
|
||||
# older versions predate the surface gstack drives. Hard-fail below this floor (#1744).
|
||||
MIN_GBRAIN_VERSION="0.20.0"
|
||||
GBRAIN_REPO_URL="https://github.com/garrytan/gbrain.git"
|
||||
DEFAULT_INSTALL_DIR="${GBRAIN_INSTALL_DIR:-$HOME/gbrain}"
|
||||
INSTALL_DIR="$DEFAULT_INSTALL_DIR"
|
||||
|
|
@@ -113,7 +124,7 @@ elif [ -n "$DETECTED_CLONE" ]; then
|
|||
else
|
||||
# Fresh clone path.
|
||||
if $DRY_RUN; then
|
||||
log "DRY RUN: would clone $GBRAIN_REPO_URL @ $PINNED_COMMIT → $INSTALL_DIR"
|
||||
log "DRY RUN: would clone $GBRAIN_REPO_URL ${PINNED_COMMIT:+@ $PINNED_COMMIT }→ $INSTALL_DIR (latest HEAD unless --pinned-commit)"
|
||||
exit 0
|
||||
fi
|
||||
if [ -d "$INSTALL_DIR" ]; then
|
||||
|
|
@@ -121,8 +132,12 @@ else
|
|||
fi
|
||||
log "cloning $GBRAIN_REPO_URL → $INSTALL_DIR"
|
||||
git clone --quiet "$GBRAIN_REPO_URL" "$INSTALL_DIR"
|
||||
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
|
||||
log "pinned to $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
|
||||
if [ -n "$PINNED_COMMIT" ]; then
|
||||
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
|
||||
log "checked out pinned commit $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
|
||||
else
|
||||
log "installed latest gbrain (default-branch HEAD)"
|
||||
fi
|
||||
fi
|
||||
|
||||
if $DRY_RUN; then
|
||||
|
|
@@ -195,6 +210,44 @@ fi
|
|||
|
||||
log "installed gbrain $actual_version from $INSTALL_DIR"
|
||||
|
||||
# --- minimum-version floor (#1744) ---
|
||||
# Unpinning means new installs track gbrain HEAD. Hard-fail if the resulting
|
||||
# version is below the floor gstack's sync integration needs — same exit-3 posture
|
||||
# as the PATH-shadow / version-mismatch failures above. A warning here is exactly
|
||||
# how the data-loss class slipped through, so this gate fails closed.
|
||||
version_lt() {
|
||||
# 0 (true) when $1 < $2 by version sort; equal versions are NOT less-than.
|
||||
[ "$1" = "$2" ] && return 1
|
||||
[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -1)" = "$1" ]
|
||||
}
|
||||
if version_lt "$actual_norm" "$MIN_GBRAIN_VERSION"; then
|
||||
echo "" >&2
|
||||
echo "gstack-gbrain-install: gbrain $actual_version is below the minimum gstack-tested version ($MIN_GBRAIN_VERSION)." >&2
|
||||
echo " gstack's sync integration needs the v0.20+ source/list surface." >&2
|
||||
echo " Fix: update the gbrain clone at $INSTALL_DIR to a newer release (git pull), then" >&2
|
||||
echo " re-run /setup-gbrain. Or pass --pinned-commit <sha> to install a specific newer commit." >&2
|
||||
echo "" >&2
|
||||
exit 3
|
||||
fi
|
||||
|
||||
# --- functional self-test when gbrain is already configured (#1744) ---
|
||||
# When a brain config exists (re-install / detected clone), run a fast doctor as
|
||||
# a hard gate so a broken gbrain is caught at setup, not at data-loss time.
|
||||
# Pre-init installs skip this (config not written yet); the full
|
||||
# `/sync-gbrain --dry-run` self-test runs from /setup-gbrain after `gbrain init`.
|
||||
_GBRAIN_HOME_CHECK="${GBRAIN_HOME:-$HOME/.gbrain}"
|
||||
if [ -f "$_GBRAIN_HOME_CHECK/config.json" ]; then
|
||||
if ! gbrain doctor --fast >/dev/null 2>&1; then
|
||||
echo "" >&2
|
||||
echo "gstack-gbrain-install: gbrain $actual_version installed but 'gbrain doctor --fast' failed." >&2
|
||||
echo " Refusing to leave a broken gbrain in place. Run 'gbrain doctor' to see what's wrong," >&2
|
||||
echo " fix it, then re-run /setup-gbrain." >&2
|
||||
echo "" >&2
|
||||
exit 3
|
||||
fi
|
||||
log "gbrain doctor --fast passed"
|
||||
fi
|
||||
|
||||
# v1.40.0.0 post-install validation (T6 / codex review #19): --ignore-scripts
|
||||
# may skip artifacts gbrain needs at runtime, especially on Windows
|
||||
# MSYS/MINGW where we DID pass --ignore-scripts. `gbrain --version` above
|
||||
|
|
|
|||
|
|
@@ -37,9 +37,10 @@ import { createHash } from "crypto";
|
|||
|
||||
import "../lib/conductor-env-shim";
|
||||
import { detectEngineTier, withErrorContext, canonicalizeRemote } from "../lib/gstack-memory-helpers";
|
||||
import { ensureSourceRegistered, sourcePageCount } from "../lib/gbrain-sources";
|
||||
import { ensureSourceRegistered, sourcePageCount, parseSourcesList } from "../lib/gbrain-sources";
|
||||
import { detectAutopilot, decideSourceRemove, decideCodeSync } from "../lib/gbrain-guards";
|
||||
import { localEngineStatus, type LocalEngineStatus } from "../lib/gbrain-local-status";
|
||||
import { buildGbrainEnv, spawnGbrain, execGbrainJson } from "../lib/gbrain-exec";
|
||||
import { buildGbrainEnv, spawnGbrain, execGbrainJson, NEEDS_SHELL_ON_WINDOWS } from "../lib/gbrain-exec";
|
||||
|
||||
// ── Types ──────────────────────────────────────────────────────────────────
|
||||
|
||||
|
|
@@ -52,6 +53,8 @@ interface CliArgs {
|
|||
noMemory: boolean;
|
||||
noBrainSync: boolean;
|
||||
codeOnly: boolean;
|
||||
/** #1734: opt-in to sync a URL-managed source whose code walk may auto-reclone. */
|
||||
allowReclone: boolean;
|
||||
}
|
||||
|
||||
interface CodeStageDetail {
|
||||
|
|
@@ -59,7 +62,7 @@ interface CodeStageDetail {
|
|||
source_path?: string;
|
||||
page_count?: number | null;
|
||||
last_imported?: string;
|
||||
status?: "ok" | "skipped" | "failed";
|
||||
status?: "ok" | "skipped" | "failed" | "refused-autopilot" | "refused-reclone";
|
||||
}
|
||||
|
||||
interface StageResult {
|
||||
|
|
@@ -205,6 +208,8 @@ Options:
|
|||
--no-memory Skip the gstack-memory-ingest stage (transcripts + artifacts).
|
||||
--no-brain-sync Skip the gstack-brain-sync git pipeline stage.
|
||||
--code-only Only run the code-import stage (alias for --no-memory --no-brain-sync).
|
||||
--allow-reclone Permit the code walk for URL-managed sources (remote_url set)
|
||||
even though gbrain may auto-reclone the working tree (#1734).
|
||||
--help This text.
|
||||
|
||||
Stages run in order: code → memory ingest → curated git push.
|
||||
|
|
@@ -220,6 +225,7 @@ function parseArgs(): CliArgs {
|
|||
let noMemory = false;
|
||||
let noBrainSync = false;
|
||||
let codeOnly = false;
|
||||
let allowReclone = false;
|
||||
|
||||
for (let i = 0; i < args.length; i++) {
|
||||
const a = args[i];
|
||||
|
|
@@ -231,6 +237,7 @@ function parseArgs(): CliArgs {
|
|||
case "--no-code": noCode = true; break;
|
||||
case "--no-memory": noMemory = true; break;
|
||||
case "--no-brain-sync": noBrainSync = true; break;
|
||||
case "--allow-reclone": allowReclone = true; break;
|
||||
case "--code-only":
|
||||
codeOnly = true;
|
||||
noMemory = true;
|
||||
|
|
@@ -247,7 +254,7 @@ function parseArgs(): CliArgs {
|
|||
}
|
||||
}
|
||||
|
||||
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly };
|
||||
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly, allowReclone };
|
||||
}
|
||||
|
||||
// ── Helpers ────────────────────────────────────────────────────────────────
|
||||
|
|
@@ -407,10 +414,7 @@ export function sourceLocalPath(sourceId: string, env?: NodeJS.ProcessEnv): stri
|
|||
{ baseEnv: env },
|
||||
);
|
||||
if (!raw) return null;
|
||||
const list: Array<{ id?: string; local_path?: string }> = Array.isArray(raw)
|
||||
? (raw as Array<{ id?: string; local_path?: string }>)
|
||||
: ((raw as { sources?: Array<{ id?: string; local_path?: string }> }).sources ?? []);
|
||||
const found = list.find((s) => s.id === sourceId);
|
||||
const found = parseSourcesList(raw).find((s) => s.id === sourceId);
|
||||
return found?.local_path ?? null;
|
||||
}
|
||||
|
||||
|
|
@@ -469,20 +473,50 @@ export function planHostnameFoldMigration(
|
|||
return { kind: "pending-cleanup", oldId: legacyPathHashId };
|
||||
}
|
||||
|
||||
export interface GuardedRemoveResult {
|
||||
removed: boolean;
|
||||
/** True when a guard refused the remove (autopilot active or unsafe source). */
|
||||
skipped: boolean;
|
||||
reason: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* #1734: run `gbrain sources remove <id> --confirm-destructive` only behind the
|
||||
* data-loss guards. Checked immediately before the destructive op (E8: as late
|
||||
* as possible) so the autopilot window is as small as we can make it without a
|
||||
* gbrain-side lease. Refuses when autopilot is active or when the source is
|
||||
* user-managed and gbrain can't keep its storage. Pure side-effect helper; the
|
||||
* caller decides whether a skip is fatal (it never is today — removes are
|
||||
* best-effort cleanup).
|
||||
*/
|
||||
export function safeSourcesRemove(sourceId: string, env?: NodeJS.ProcessEnv): GuardedRemoveResult {
|
||||
const ap = detectAutopilot(env);
|
||||
if (ap.active) {
|
||||
return {
|
||||
removed: false,
|
||||
skipped: true,
|
||||
reason: `autopilot active (${ap.signal}); refusing destructive remove of ${sourceId}. ` +
|
||||
`Stop autopilot, then re-run /sync-gbrain.`,
|
||||
};
|
||||
}
|
||||
const decision = decideSourceRemove(sourceId, env);
|
||||
if (!decision.allow) {
|
||||
return { removed: false, skipped: true, reason: decision.reason };
|
||||
}
|
||||
const r = spawnGbrain(
|
||||
["sources", "remove", sourceId, "--confirm-destructive", ...decision.extraArgs],
|
||||
{ baseEnv: env },
|
||||
);
|
||||
return { removed: r.status === 0, skipped: false, reason: decision.reason };
|
||||
}
|
||||
|
||||
/**
|
||||
* Remove an orphaned source. Called only after new-source sync verifies pages
|
||||
* exist, so the old source is provably redundant before deletion.
|
||||
*
|
||||
* Flag note: existing call sites used `--confirm-destructive` here and
|
||||
* `--yes` in `lib/gbrain-sources.ts` — gbrain 0.35.0.0 accepts neither
|
||||
* deterministically (the subcommand surface help is generic). We pass
|
||||
* `--confirm-destructive` to match the existing call site convention; the
|
||||
* flag-helper centralization in commit 4 (lib/gbrain-exec.ts) will resolve
|
||||
* the inconsistency across the codebase.
|
||||
* exist, so the old source is provably redundant before deletion. Routed through
|
||||
* safeSourcesRemove for the #1734 guards.
|
||||
*/
|
||||
export function removeOrphanedSource(oldId: string, env?: NodeJS.ProcessEnv): boolean {
|
||||
const r = spawnGbrain(["sources", "remove", oldId, "--confirm-destructive"], { baseEnv: env });
|
||||
return r.status === 0;
|
||||
return safeSourcesRemove(oldId, env).removed;
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
@@ -661,13 +695,12 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
|
|||
const legacyId = deriveLegacyCodeSourceId(root);
|
||||
let legacyRemoved = false;
|
||||
if (legacyId !== sourceId) {
|
||||
const rm = spawnGbrain(["sources", "remove", legacyId, "--confirm-destructive"], {
|
||||
timeout: 30_000,
|
||||
baseEnv: gbrainEnv,
|
||||
});
|
||||
// Treat absent-source as success (clean state). gbrain emits "not found" on
|
||||
// missing id; treat any non-zero exit without "not found" as a soft fail.
|
||||
if (rm.status === 0) legacyRemoved = true;
|
||||
// #1734: route through the data-loss guards (autopilot + source-safety).
|
||||
const rm = safeSourcesRemove(legacyId, gbrainEnv);
|
||||
if (rm.skipped && !args.quiet) {
|
||||
console.error(`[sync:code] legacy-source cleanup skipped: ${rm.reason}`);
|
||||
}
|
||||
if (rm.removed) legacyRemoved = true;
|
||||
}
|
||||
|
||||
// Step 0b: Hostname-fold migration (#1414).
|
||||
|
|
@@ -720,6 +753,29 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
|
|||
process.env.GSTACK_SYNC_CODE_TIMEOUT_MS,
|
||||
"GSTACK_SYNC_CODE_TIMEOUT_MS",
|
||||
);
|
||||
|
||||
// #1734 guards, checked immediately before the destructive walk (E8):
|
||||
// - autopilot active → refuse (the race that wiped a working tree).
|
||||
// - URL-managed source → the walk can auto-reclone (rm-rf); require
|
||||
// --allow-reclone. Both surface a visible reason and fail the stage so the
|
||||
// verdict shows ERR rather than silently skipping protection.
|
||||
const apBeforeWalk = detectAutopilot(gbrainEnv);
|
||||
if (apBeforeWalk.active) {
|
||||
return {
|
||||
name: "code", ran: true, ok: false, duration_ms: Date.now() - t0,
|
||||
summary: `refused: gbrain autopilot active (${apBeforeWalk.signal}). Stop autopilot, then re-run /sync-gbrain.`,
|
||||
detail: { source_id: sourceId, source_path: root, status: "refused-autopilot" },
|
||||
};
|
||||
}
|
||||
const reclone = decideCodeSync(sourceId, gbrainEnv, args.allowReclone);
|
||||
if (!reclone.allow) {
|
||||
return {
|
||||
name: "code", ran: true, ok: false, duration_ms: Date.now() - t0,
|
||||
summary: `refused: ${reclone.reason}`,
|
||||
detail: { source_id: sourceId, source_path: root, status: "refused-reclone" },
|
||||
};
|
||||
}
|
||||
|
||||
const walkResult = spawnGbrain(["sync", "--strategy", "code", "--source", sourceId], {
|
||||
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
|
||||
timeout: codeTimeoutMs,
|
||||
|
|
@@ -961,13 +1017,17 @@ function runBrainSyncPush(args: CliArgs): StageResult {
|
|||
return { name: "brain-sync", ran: false, ok: true, duration_ms: 0, summary: "skipped (gstack-brain-sync not installed)" };
|
||||
}
|
||||
|
||||
// #1731: gstack-brain-sync is a bash shebang script; Windows can't spawn it
|
||||
// without a shell, which surfaced as "brain-sync exited undefined".
|
||||
spawnSync(brainSyncPath, ["--discover-new"], {
|
||||
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
|
||||
timeout: 60 * 1000,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS,
|
||||
});
|
||||
const result = spawnSync(brainSyncPath, ["--once"], {
|
||||
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
|
||||
timeout: 60 * 1000,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS,
|
||||
});
|
||||
|
||||
return {
|
||||
|
|
|
|||
|
|
@@ -53,18 +53,25 @@ for path in paths:
|
|||
continue
|
||||
if line in seen:
|
||||
continue
|
||||
# Prefer ISO ts field for sort; fall back to SHA-256.
|
||||
# Prefer ISO ts field for sort; fall back to SHA-256. The line
|
||||
# content is the final tiebreaker so the order is total: two
|
||||
# entries sharing a ts must resolve identically regardless of
|
||||
# which side they arrive on. Without it, equal-ts entries fall
|
||||
# back to insertion order (base, ours, theirs), and since ours
|
||||
# and theirs are swapped depending on which machine runs the
|
||||
# merge, the two sides produce divergent files that never
|
||||
# converge.
|
||||
sort_key = None
|
||||
try:
|
||||
obj = json.loads(line)
|
||||
ts = obj.get('ts') or obj.get('timestamp')
|
||||
if isinstance(ts, str):
|
||||
sort_key = (0, ts)
|
||||
sort_key = (0, ts, line)
|
||||
except (json.JSONDecodeError, ValueError, TypeError):
|
||||
pass
|
||||
if sort_key is None:
|
||||
h = hashlib.sha256(line.encode('utf-8')).hexdigest()
|
||||
sort_key = (1, h)
|
||||
sort_key = (1, h, line)
|
||||
seen[line] = sort_key
|
||||
except FileNotFoundError:
|
||||
# Absent base / absent ours / absent theirs are all valid.
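
The total-order argument in the comment above can be checked with a small sketch. It is written in TypeScript rather than the driver's Python, and it makes two labeled assumptions: `sortKey`, `compareKeys`, and `mergeLines` are hypothetical names, and the fallback for lines without a timestamp uses the raw line content where the original uses a SHA-256 digest. String ordering here is by UTF-16 code unit, which may differ from Python for some non-ASCII input.

```typescript
// Hypothetical sketch: a merge whose result does not depend on side order.
type SortKey = readonly [number, string, string];

function sortKey(line: string): SortKey {
  try {
    const obj = JSON.parse(line);
    const ts = obj?.ts || obj?.timestamp;
    // The line itself is the final tiebreaker, so equal timestamps still order totally.
    if (typeof ts === "string") return [0, ts, line];
  } catch {
    // Not JSON: fall through to the fallback key.
  }
  // Assumption: raw content stands in for the SHA-256 digest used by the original.
  return [1, line, line];
}

function compareKeys(a: SortKey, b: SortKey): number {
  if (a[0] !== b[0]) return a[0] - b[0];
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  if (a[2] !== b[2]) return a[2] < b[2] ? -1 : 1;
  return 0;
}

/** Deduplicates lines from all sides and sorts them by a total order. */
function mergeLines(...sides: string[][]): string[] {
  const seen = new Map<string, SortKey>();
  for (const side of sides) {
    for (const line of side) {
      if (line === "" || seen.has(line)) continue;
      seen.set(line, sortKey(line));
    }
  }
  return [...seen.entries()]
    .sort((x, y) => compareKeys(x[1], y[1]))
    .map(([line]) => line);
}

const base = ['{"ts":"2026-01-01T00:00:00Z","v":"base"}'];
const ours = ['{"ts":"2026-01-02T00:00:00Z","v":"a"}'];
const theirs = ['{"ts":"2026-01-02T00:00:00Z","v":"b"}', "not json"];

// Swapping ours and theirs yields the same file on both machines.
console.log(JSON.stringify(mergeLines(base, ours, theirs)) === JSON.stringify(mergeLines(base, theirs, ours))); // true
```

With a key of only `(0, ts)`, the two equal-timestamp entries would keep their insertion order, so the two machines would write them in opposite orders.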
|
||||
|
|
|
|||
|
|
@@ -1349,10 +1349,32 @@ function installSignalForwarder(): void {
|
|||
* that kill the child on parent SIGTERM/SIGINT. Returns the same shape as
|
||||
* spawnSync's result so the caller doesn't care which mode was used.
|
||||
*/
|
||||
/**
|
||||
* #1611: the `gbrain import` is the long pole on big brains. Its timeout is
|
||||
* configurable via GSTACK_INGEST_TIMEOUT_MS (default 30 min, 1min–24h) so large
|
||||
* memory corpora aren't SIGTERM'd mid-import. On timeout we SIGTERM the child,
|
||||
* which preserves gbrain's import-checkpoint.json (see installSignalForwarder)
|
||||
* so the next run resumes instead of restarting from scratch.
|
||||
*/
|
||||
const DEFAULT_IMPORT_TIMEOUT_MS = 30 * 60 * 1000;
|
||||
export function resolveImportTimeoutMs(
|
||||
raw: string | undefined = process.env.GSTACK_INGEST_TIMEOUT_MS,
|
||||
): number {
|
||||
if (raw === undefined || raw === "") return DEFAULT_IMPORT_TIMEOUT_MS;
|
||||
const n = Number.parseInt(raw, 10);
|
||||
if (!Number.isFinite(n) || Number.isNaN(n) || n < 60_000 || n > 86_400_000) {
|
||||
console.error(
|
||||
`[memory-ingest] GSTACK_INGEST_TIMEOUT_MS="${raw}" invalid (need 60000–86400000ms); using ${DEFAULT_IMPORT_TIMEOUT_MS}ms`,
|
||||
);
|
||||
return DEFAULT_IMPORT_TIMEOUT_MS;
|
||||
}
|
||||
return n;
|
||||
}
|
||||
|
||||
function runGbrainImport(
|
||||
stagingDir: string,
|
||||
timeoutMs: number,
|
||||
): Promise<{ status: number | null; stdout: string; stderr: string }> {
|
||||
): Promise<{ status: number | null; stdout: string; stderr: string; timedOut: boolean }> {
|
||||
installSignalForwarder();
|
||||
return new Promise((resolve) => {
|
||||
// Seed DATABASE_URL from gbrain's own config so this stage works
|
||||
|
|
@@ -1385,6 +1407,7 @@ function runGbrainImport(
|
|||
status: timedOut ? null : status,
|
||||
stdout,
|
||||
stderr,
|
||||
timedOut,
|
||||
});
|
||||
});
|
||||
child.on("error", (err) => {
|
||||
|
|
@@ -1394,6 +1417,7 @@ function runGbrainImport(
|
|||
status: null,
|
||||
stdout,
|
||||
stderr: stderr + `\n[spawn-error] ${(err as Error).message}`,
|
||||
timedOut,
|
||||
});
|
||||
});
|
||||
});
|
||||
|
|
@@ -1608,13 +1632,33 @@ async function ingestPass(args: CliArgs): Promise<BulkResult> {
|
|||
// spawn, parent termination orphans the gbrain process (observed
|
||||
// during 2026-05-10 cold-run testing — gbrain kept running 15 min
|
||||
// after the orchestrator timed out).
|
||||
const importResult = await runGbrainImport(stagingDir, 30 * 60 * 1000);
|
||||
const importResult = await runGbrainImport(stagingDir, resolveImportTimeoutMs());
|
||||
|
||||
const stdout = importResult.stdout || "";
|
||||
const stderr = importResult.stderr || "";
|
||||
const importJson = parseImportJson(stdout);
|
||||
|
||||
if (importResult.status !== 0) {
|
||||
// #1611: on timeout, gbrain's import-checkpoint.json is preserved (the
|
||||
// SIGTERM forwarder keeps the staging dir), so the next /sync-gbrain
|
||||
// resumes rather than restarting. Tell the user instead of looking failed.
|
||||
if (importResult.timedOut) {
|
||||
const mins = Math.round(resolveImportTimeoutMs() / 60000);
|
||||
const msg =
|
||||
`gbrain import timed out after ${mins}min; checkpoint preserved — re-run ` +
|
||||
`/sync-gbrain to resume (raise GSTACK_INGEST_TIMEOUT_MS for big brains)`;
|
||||
console.error(`[memory-ingest] ${msg}`);
|
||||
return {
|
||||
written: 0,
|
||||
skipped_secret: prep.skippedSecret,
|
||||
skipped_dedup: prep.skippedDedup,
|
||||
skipped_unattributed: prep.skippedUnattributed,
|
||||
failed,
|
||||
duration_ms: Date.now() - t0,
|
||||
partial_pages: prep.partialPages,
|
||||
system_error: msg,
|
||||
};
|
||||
}
|
||||
const tail = (stderr.trim().split("\n").pop() || "").slice(0, 300);
|
||||
const msg = `gbrain import exited ${importResult.status}: ${tail}`;
|
||||
console.error(`[memory-ingest] ERR: ${msg}`);
|
||||
|
|
@@ -1810,7 +1854,12 @@ async function main(): Promise<void> {
|
|||
if (result.system_error) process.exit(1);
|
||||
}
|
||||
|
||||
-main().catch((err) => {
-  console.error(`gstack-memory-ingest fatal: ${err instanceof Error ? err.message : String(err)}`);
-  process.exit(1);
-});
+// Guard so the module is import-safe for unit tests (e.g. resolveImportTimeoutMs).
+// The orchestrator runs it as `bun gstack-memory-ingest.ts ...`, where
+// import.meta.main is true, so the CLI path is unaffected.
+if (import.meta.main) {
+  main().catch((err) => {
+    console.error(`gstack-memory-ingest fatal: ${err instanceof Error ? err.message : String(err)}`);
+    process.exit(1);
+  });
+}
|
||||
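The hunks above show only the tail of `resolveImportTimeoutMs`, so the following is a minimal sketch of how such a resolver could look, not the actual implementation. It assumes the override arrives in `GSTACK_INGEST_TIMEOUT_MS` (the variable named in the timeout message) as a millisecond count. The 30-minute default, the `env` parameter, and the `timeoutMinutes` helper are placeholders introduced for illustration.

```typescript
// Assumption: the real default is not visible in this diff; 30 minutes is a placeholder.
const DEFAULT_IMPORT_TIMEOUT_MS = 30 * 60 * 1000;

/** Parse GSTACK_INGEST_TIMEOUT_MS; fall back to the default on missing or invalid input. */
function resolveImportTimeoutMs(env: Record<string, string | undefined> = {}): number {
  const raw = env.GSTACK_INGEST_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_IMPORT_TIMEOUT_MS;
  const n = Number(raw);
  // Reject NaN, Infinity, zero, and negative values.
  if (!Number.isFinite(n) || n <= 0) {
    return DEFAULT_IMPORT_TIMEOUT_MS;
  }
  return n;
}

/** Hypothetical helper: whole minutes for the user-facing message, rounded as in the hunk. */
function timeoutMinutes(ms: number): number {
  return Math.round(ms / 60000);
}

console.log(timeoutMinutes(resolveImportTimeoutMs({ GSTACK_INGEST_TIMEOUT_MS: "7200000" })));
```

Falling back to the default on unparseable input, rather than throwing, keeps a mistyped override from blocking the ingest entirely. Whether the real function behaves this way is not confirmed by the diff.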
|
|
|
|||
|
|
@@ -0,0 +1,212 @@
|
|||
#!/usr/bin/env bun
|
||||
// gstack-version-bump — deterministic version-state classifier + writer for /ship.
|
||||
//
|
||||
// Extracted from ship Step 12 prose (v2 plan T9, hybrid CLI extraction). The
|
||||
// idempotency classification and the dual-write to VERSION + package.json are
|
||||
// pure deterministic logic; running them as tested code removes the single
|
||||
// worst /ship footgun — re-bumping an already-shipped branch — from prose the
|
||||
// agent could skip or misread when the step lives in a lazy-loaded section.
|
||||
//
|
||||
// What STAYS agent judgment (NOT here): the bump-LEVEL decision (micro/patch vs
|
||||
// minor/major, which may AskUserQuestion on feature signals) and the queue
|
||||
// collision prompt. The slot pick itself is bin/gstack-next-version. This CLI
|
||||
// only answers "what state am I in?" and "write this exact version".
|
||||
//
|
||||
// Subcommands:
|
||||
// classify --base <branch> [--version-path <p>]
|
||||
// Compares VERSION vs origin/<base>:VERSION vs package.json.version.
|
||||
// Emits JSON: { state, baseVersion, currentVersion, pkgVersion, pkgExists }
|
||||
// state ∈ FRESH | ALREADY_BUMPED | DRIFT_STALE_PKG | DRIFT_UNEXPECTED
|
||||
// Exit 0 on a decidable state (incl. DRIFT_UNEXPECTED — it's a real state
|
||||
// the caller must handle), exit 2 on bad args / unresolvable base.
|
||||
//
|
||||
// write --version <X.Y.Z.W> [--version-path <p>]
|
||||
// Validates the four-part MAJOR.MINOR.PATCH.MICRO pattern, writes VERSION + package.json.version.
|
||||
// Use for the FRESH bump (or an approved queue rebump). Exit 3 on a
|
||||
// half-write (VERSION written, package.json failed) so the caller knows
|
||||
// drift exists; the next classify() will report DRIFT_STALE_PKG.
|
||||
//
|
||||
// repair [--version-path <p>]
|
||||
// DRIFT_STALE_PKG path: sync package.json.version to the current VERSION
|
||||
// file. No bump. Validates the VERSION pattern first.
|
||||
//
|
||||
// Contract: classify NEVER writes. write/repair mutate VERSION + package.json
|
||||
// only. No git mutation, no network. Mirrors gstack-next-version's reader/writer
|
||||
// split so /ship composes them.
|
||||
|
||||
import { existsSync, readFileSync, writeFileSync } from "node:fs";
|
||||
import { execFileSync } from "node:child_process";
|
||||
import { join } from "node:path";
|
||||
|
||||
const VERSION_RE = /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/;
|
||||
const DEFAULT = "0.0.0.0";
|
||||
|
||||
type State = "FRESH" | "ALREADY_BUMPED" | "DRIFT_STALE_PKG" | "DRIFT_UNEXPECTED";
|
||||
|
||||
function fail(msg: string, code = 2): never {
|
||||
process.stderr.write(`gstack-version-bump: ${msg}\n`);
|
||||
process.exit(code);
|
||||
}
|
||||
|
||||
function argVal(args: string[], flag: string): string | undefined {
|
||||
const i = args.indexOf(flag);
|
||||
return i >= 0 && i + 1 < args.length ? args[i + 1] : undefined;
|
||||
}
|
||||
|
||||
/** Resolve the VERSION file path: --version-path, else .gstack/version-path, else "VERSION". */
|
||||
function resolveVersionPath(cwd: string, explicit?: string): string {
|
||||
if (explicit) return join(cwd, explicit);
|
||||
const pin = join(cwd, ".gstack", "version-path");
|
||||
if (existsSync(pin)) {
|
||||
const p = readFileSync(pin, "utf-8").trim();
|
||||
if (p) return join(cwd, p);
|
||||
}
|
||||
return join(cwd, "VERSION");
|
||||
}
|
||||
|
||||
function readVersionFile(p: string): string {
|
||||
try {
|
||||
const v = readFileSync(p, "utf-8").replace(/[\r\n\s]/g, "");
|
||||
return v || DEFAULT;
|
||||
} catch {
|
||||
return DEFAULT;
|
||||
}
|
||||
}
|
||||
|
||||
/** package.json version + existence, parsed without spawning node. */
|
||||
function readPkgVersion(cwd: string): { exists: boolean; version: string } {
|
||||
const pkgPath = join(cwd, "package.json");
|
||||
if (!existsSync(pkgPath)) return { exists: false, version: "" };
|
||||
let raw: string;
|
||||
try {
|
||||
raw = readFileSync(pkgPath, "utf-8");
|
||||
} catch {
|
||||
return { exists: true, version: "" };
|
||||
}
|
||||
let parsed: unknown;
|
||||
try {
|
||||
parsed = JSON.parse(raw);
|
||||
} catch {
|
||||
fail("package.json is not valid JSON. Fix the file before re-running /ship.", 2);
|
||||
}
|
||||
const version = (parsed as { version?: unknown })?.version;
|
||||
return { exists: true, version: typeof version === "string" ? version : "" };
|
||||
}
|
||||
|
||||
function writePkgVersion(cwd: string, version: string): void {
|
||||
const pkgPath = join(cwd, "package.json");
|
||||
const raw = readFileSync(pkgPath, "utf-8");
|
||||
const parsed = JSON.parse(raw) as Record<string, unknown>;
|
||||
parsed.version = version;
|
||||
writeFileSync(pkgPath, JSON.stringify(parsed, null, 2) + "\n");
|
||||
}
|
||||
|
||||
function baseVersion(cwd: string, base: string, versionRel: string): string {
|
||||
// Verify the base ref resolves, mirroring the Step 12 guard.
|
||||
try {
|
||||
execFileSync("git", ["rev-parse", "--verify", `origin/${base}`], { cwd, stdio: "ignore" });
|
||||
} catch {
|
||||
fail(`Unable to resolve origin/${base}. Run 'git fetch origin' or verify the base branch exists.`, 2);
|
||||
}
|
||||
try {
|
||||
const out = execFileSync("git", ["show", `origin/${base}:${versionRel}`], { cwd }).toString();
|
||||
const v = out.replace(/[\r\n\s]/g, "");
|
||||
return v || DEFAULT;
|
||||
} catch {
|
||||
// VERSION absent on base (new repo / new file) → treat as 0.0.0.0.
|
||||
return DEFAULT;
|
||||
}
|
||||
}
|
||||
|
||||
function classifyState(current: string, base: string, pkgExists: boolean, pkgVersion: string): State {
|
||||
if (current === base) {
|
||||
// VERSION unchanged vs base. A diverging package.json means someone hand-edited
|
||||
// package.json bypassing /ship — unsafe to guess which is authoritative.
|
||||
if (pkgExists && pkgVersion && pkgVersion !== current) return "DRIFT_UNEXPECTED";
|
||||
return "FRESH";
|
||||
}
|
||||
// VERSION already moved past base.
|
||||
if (pkgExists && pkgVersion && pkgVersion !== current) return "DRIFT_STALE_PKG";
|
||||
return "ALREADY_BUMPED";
|
||||
}
|
||||
|
||||
function cmdClassify(args: string[], cwd: string): void {
|
||||
const base = argVal(args, "--base");
|
||||
if (!base) fail("classify requires --base <branch>", 2);
|
||||
const versionPath = resolveVersionPath(cwd, argVal(args, "--version-path"));
|
||||
const versionRel = argVal(args, "--version-path") ?? "VERSION";
|
||||
const current = readVersionFile(versionPath);
|
||||
const baseV = baseVersion(cwd, base!, versionRel);
|
||||
const pkg = readPkgVersion(cwd);
|
||||
const state = classifyState(current, baseV, pkg.exists, pkg.version);
|
||||
process.stdout.write(
|
||||
JSON.stringify({
|
||||
state,
|
||||
baseVersion: baseV,
|
||||
currentVersion: current,
|
||||
pkgVersion: pkg.version || null,
|
||||
pkgExists: pkg.exists,
|
||||
}) + "\n",
|
||||
);
|
||||
// DRIFT_UNEXPECTED is a real, decidable state — the caller stops on it, but the
|
||||
// classification itself succeeded, so exit 0. (Bad args / unresolvable base are
|
||||
// the only exit-2 cases.)
|
||||
}
|
||||
|
||||
function cmdWrite(args: string[], cwd: string): void {
|
||||
const version = argVal(args, "--version");
|
||||
if (!version) fail("write requires --version <X.Y.Z.W>", 2);
|
||||
if (!VERSION_RE.test(version!)) {
|
||||
fail(`NEW_VERSION (${version}) does not match MAJOR.MINOR.PATCH.MICRO. Aborting.`, 2);
|
||||
}
|
||||
const versionPath = resolveVersionPath(cwd, argVal(args, "--version-path"));
|
||||
writeFileSync(versionPath, version + "\n");
|
||||
if (existsSync(join(cwd, "package.json"))) {
|
||||
try {
|
||||
writePkgVersion(cwd, version!);
|
||||
} catch {
|
||||
fail(
|
||||
"failed to update package.json. VERSION was written but package.json is now stale. " +
|
||||
"Re-run — classify will report DRIFT_STALE_PKG and repair will sync it.",
|
||||
3,
|
||||
);
|
||||
}
|
||||
}
|
||||
process.stdout.write(JSON.stringify({ wrote: version, packageJson: existsSync(join(cwd, "package.json")) }) + "\n");
|
||||
}
|
||||
|
||||
function cmdRepair(args: string[], cwd: string): void {
|
||||
const versionPath = resolveVersionPath(cwd, argVal(args, "--version-path"));
|
||||
const current = readVersionFile(versionPath);
|
||||
if (!VERSION_RE.test(current)) {
|
||||
fail(
|
||||
`VERSION file contents (${current}) do not match MAJOR.MINOR.PATCH.MICRO. ` +
|
||||
"Refusing to propagate invalid semver into package.json. Fix VERSION, then re-run /ship.",
|
||||
2,
|
||||
);
|
||||
}
|
||||
if (!existsSync(join(cwd, "package.json"))) {
|
||||
fail("repair: no package.json to sync.", 2);
|
||||
}
|
||||
try {
|
||||
writePkgVersion(cwd, current);
|
||||
} catch {
|
||||
fail("drift repair failed — could not update package.json.", 3);
|
||||
}
|
||||
process.stdout.write(JSON.stringify({ repaired: current }) + "\n");
|
||||
}
|
||||
|
||||
// Exported for unit tests (pure logic, no I/O).
|
||||
export { classifyState, VERSION_RE, type State };
|
||||
|
||||
if (import.meta.main) {
|
||||
const [sub, ...rest] = process.argv.slice(2);
|
||||
const cwd = process.cwd();
|
||||
switch (sub) {
|
||||
case "classify": cmdClassify(rest, cwd); break;
|
||||
case "write": cmdWrite(rest, cwd); break;
|
||||
case "repair": cmdRepair(rest, cwd); break;
|
||||
default:
|
||||
fail("usage: gstack-version-bump <classify|write|repair> [flags]", 2);
|
||||
}
|
||||
}
|
||||
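To make the four states concrete, the sketch below copies `classifyState` and `VERSION_RE` from the file above and runs them against a few sample inputs. The version strings in `samples` are illustrative values, not taken from the repository.

```typescript
type State = "FRESH" | "ALREADY_BUMPED" | "DRIFT_STALE_PKG" | "DRIFT_UNEXPECTED";

const VERSION_RE = /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/;

// Copied from the file above: pure logic, no I/O.
function classifyState(current: string, base: string, pkgExists: boolean, pkgVersion: string): State {
  if (current === base) {
    // VERSION unchanged vs base; a diverging package.json is unexplained drift.
    if (pkgExists && pkgVersion && pkgVersion !== current) return "DRIFT_UNEXPECTED";
    return "FRESH";
  }
  // VERSION already moved past base; package.json may have been left behind.
  if (pkgExists && pkgVersion && pkgVersion !== current) return "DRIFT_STALE_PKG";
  return "ALREADY_BUMPED";
}

// Illustrative inputs: [current VERSION, base VERSION, package.json exists, package.json version].
const samples: Array<[string, string, boolean, string]> = [
  ["1.55.0.0", "1.55.0.0", true, "1.55.0.0"], // nothing bumped yet
  ["1.56.0.0", "1.55.0.0", true, "1.56.0.0"], // both files already bumped
  ["1.56.0.0", "1.55.0.0", true, "1.55.0.0"], // half-write: package.json is stale
  ["1.55.0.0", "1.55.0.0", true, "9.9.9.9"],  // package.json edited by hand
  ["1.56.0.0", "1.55.0.0", false, ""],        // no package.json at all
];

for (const [current, base, pkgExists, pkgVersion] of samples) {
  console.log(current, base, pkgExists, pkgVersion || "(none)", "->", classifyState(current, base, pkgExists, pkgVersion));
}

console.log(VERSION_RE.test("1.56.0.0"), VERSION_RE.test("1.56.0"));
```

A missing `package.json`, or one without a string `version`, never produces a drift state, because both drift branches require a non-empty `pkgVersion`.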
|
|
@@ -211,6 +211,86 @@ function cleanupLegacyState(): void {
|
|||
}
|
||||
}
|
||||
|
||||
// ─── Chromium profile lock helpers (#1781) ─────────────────────
|
||||
/** Profile dir used by headed/connect Chromium sessions. */
|
||||
function chromiumProfileDir(): string {
|
||||
return path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
|
||||
}
|
||||
|
||||
/** Remove Chromium SingletonLock/Socket/Cookie so a relaunch can acquire the
|
||||
* profile. Safe to call when absent. */
|
||||
function cleanChromiumProfileLocks(profileDir: string = chromiumProfileDir()): void {
|
||||
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
|
||||
safeUnlinkQuiet(path.join(profileDir, lockFile));
|
||||
}
|
||||
}
|
||||
|
||||
/** Kill an orphaned Chromium that still holds the profile's SingletonLock. The
|
||||
* lock symlink target is "hostname-PID"; killing that PID tears down its
|
||||
* renderer tree so the next launch starts clean. No-op when absent/stale. */
|
||||
async function killOrphanChromium(profileDir: string = chromiumProfileDir()): Promise<void> {
|
||||
try {
|
||||
const lockTarget = fs.readlinkSync(path.join(profileDir, 'SingletonLock')); // "hostname-12345"
|
||||
const orphanPid = parseInt(lockTarget.split('-').pop() || '', 10);
|
||||
if (orphanPid && isProcessAlive(orphanPid)) {
|
||||
safeKill(orphanPid, 'SIGTERM');
|
||||
await new Promise(r => setTimeout(r, 1000));
|
||||
if (isProcessAlive(orphanPid)) {
|
||||
safeKill(orphanPid, 'SIGKILL');
|
||||
await new Promise(r => setTimeout(r, 500));
|
||||
}
|
||||
}
|
||||
} catch (err: any) {
|
||||
if (err?.code !== 'ENOENT' && err?.code !== 'EINVAL') throw err;
|
||||
}
|
||||
}
|
||||
|
||||
/** Bounded /health probe. Returns true if the server answers within `attempts`
|
||||
* tries spaced `backoffMs` apart — distinguishes a busy-but-alive daemon from a
|
||||
* dead one (#1781) so a slow server isn't killed and restarted into a crash-loop. */
|
||||
async function probeHealthWithBackoff(port: number, attempts = 3, backoffMs = 250): Promise<boolean> {
|
||||
for (let i = 0; i < attempts; i++) {
|
||||
if (await isServerHealthy(port)) return true;
|
||||
if (i < attempts - 1) await Bun.sleep(backoffMs);
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the env for an auto-restart after a crash. headed/proxy/configHash are
|
||||
* reapplied from THIS invocation OR the persisted server state, so a restart
|
||||
* triggered by a plain command (goto/status, no --headed flag) never silently
|
||||
* downgrades a headed session to headless (#1781). Pure + exported for tests.
|
||||
*/
|
||||
export function buildRestartEnv(
|
||||
globalFlags: GlobalFlags | null | undefined,
|
||||
oldState: ServerState | null,
|
||||
): Record<string, string> {
|
||||
const env: Record<string, string> = {};
|
||||
if (globalFlags?.proxyUrl) env.BROWSE_PROXY_URL = globalFlags.proxyUrl;
|
||||
if (globalFlags?.headed || oldState?.mode === 'headed') env.BROWSE_HEADED = '1';
|
||||
const configHash = globalFlags?.configHash || oldState?.configHash;
|
||||
if (configHash) env.BROWSE_CONFIG_HASH = configHash;
|
||||
return env;
|
||||
}
|
||||
|
||||
/** macOS only: pull the headed Chromium window to the user's current Space.
|
||||
* "Google Chrome for Testing" frequently opens behind the active window or on
|
||||
* another Space — the first thing users read as "I can't see the browser"
|
||||
* (#1781). Best-effort, fire-and-forget, never throws. The app name is a fixed
|
||||
* literal (no interpolation). */
|
||||
function raiseHeadedWindowMacOS(): void {
|
||||
if (process.platform !== 'darwin') return;
|
||||
try {
|
||||
nodeSpawn('osascript', ['-e', 'tell application "Google Chrome for Testing" to activate'], {
|
||||
stdio: 'ignore',
|
||||
detached: true,
|
||||
}).unref();
|
||||
} catch {
|
||||
// osascript missing or app not present — non-fatal
|
||||
}
|
||||
}
|
||||
|
||||
// ─── Server Lifecycle ──────────────────────────────────────────
|
||||
async function startServer(extraEnv?: Record<string, string>): Promise<ServerState> {
|
||||
ensureStateDir(config);
|
||||
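`killOrphanChromium` above relies on the `SingletonLock` symlink target having the form "hostname-PID". The sketch below isolates that parsing step so it can be checked without touching a real profile directory. `pidFromLockTarget` is a hypothetical helper name, and returning `null` for unusable input is an illustrative choice rather than something the diff confirms.

```typescript
/**
 * Hypothetical helper: extract the PID from a SingletonLock symlink target
 * such as "hostname-12345". Returns null when no positive integer PID is found.
 */
function pidFromLockTarget(lockTarget: string): number | null {
  // Take the last "-"-separated segment so hyphenated hostnames still work.
  const last = lockTarget.split("-").pop() || "";
  const pid = parseInt(last, 10);
  return Number.isInteger(pid) && pid > 0 ? pid : null;
}

console.log(pidFromLockTarget("my-mac-book-12345"));
console.log(pidFromLockTarget("hostname-"));
```

Taking the last segment, as the original code does, matters because hostnames commonly contain hyphens themselves.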
|
|
@@ -219,6 +299,13 @@ async function startServer(extraEnv?: Record<string, string>): Promise<ServerSta
|
|||
safeUnlink(config.stateFile);
|
||||
safeUnlink(path.join(config.stateDir, 'browse-startup-error.log'));
|
||||
|
||||
// #1781: clear a stale Chromium profile lock (and kill the orphan still
|
||||
// holding it) before launch, so an auto-restart after an abrupt kill isn't
|
||||
// blocked by the previous Chromium's SingletonLock — the self-inflicted
|
||||
// crash-loop. Previously only the manual connect preamble did this.
|
||||
await killOrphanChromium();
|
||||
cleanChromiumProfileLocks();
|
||||
|
||||
// Allow the caller to opt out of the parent-process watchdog by setting
|
||||
// BROWSE_PARENT_PID=0 in the environment. Useful for CI, non-interactive
|
||||
// shells, and short-lived Bash invocations that need the server to outlive
|
||||
|
|
@@ -486,26 +573,42 @@ async function sendCommand(state: ServerState, command: string, args: string[],
|
|||
}
|
||||
} catch (err: any) {
|
||||
if (err.name === 'AbortError') {
|
||||
-console.error('[browse] Command timed out after 30s');
+// #1781: a 30s timeout on a heavy page usually means busy, not dead.
+// Don't kill a live server (that's what triggered the crash-loop) — report
+// and exit so the user can retry rather than losing their (headed) window.
+const ts = readState();
+const alive = ts?.pid ? isProcessAlive(ts.pid) : false;
+console.error(alive
+  ? '[browse] Command timed out after 30s (server still alive — busy, not restarting). Retry, or raise load.'
+  : '[browse] Command timed out after 30s');
|
||||
process.exit(1);
|
||||
}
|
||||
-// Connection error — server may have crashed
+// Connection error — server may have crashed, OR may just be busy.
|
||||
if (err.code === 'ECONNREFUSED' || err.code === 'ECONNRESET' || err.message?.includes('fetch failed')) {
|
||||
const oldState = readState();
|
||||
// #1781 busy-vs-dead: a single-threaded daemon under beacon/extension load
|
||||
// can briefly stop answering HTTP while still alive. Before declaring a
|
||||
// crash, if the process is alive give /health a bounded chance to recover
|
||||
// and just retry the command — never kill+restart a live-but-busy server.
|
||||
if (oldState?.pid && isProcessAlive(oldState.pid) && await probeHealthWithBackoff(oldState.port)) {
|
||||
if (retries >= 1) throw new Error('[browse] Server unresponsive after retry — aborting');
|
||||
console.error('[browse] Server was briefly unresponsive (busy); retrying command...');
|
||||
return sendCommand(oldState, command, args, retries + 1);
|
||||
}
|
||||
// Truly dead (or health never recovered) → restart.
|
||||
if (retries >= 1) throw new Error('[browse] Server crashed twice in a row — aborting');
|
||||
console.error('[browse] Server connection lost. Restarting...');
|
||||
// Kill the old server to avoid orphaned chromium processes
|
||||
const oldState = readState();
|
||||
if (oldState && oldState.pid) {
|
||||
await killServer(oldState.pid);
|
||||
}
|
||||
// Reapply --proxy / --headed flags from this invocation when restarting
|
||||
// after a crash. Without this, a proxied daemon that dies mid-command
|
||||
// would silently restart in default direct/headless mode and bypass
|
||||
// the SOCKS bridge.
|
||||
const restartEnv: Record<string, string> = {};
|
||||
if (_globalFlags?.proxyUrl) restartEnv.BROWSE_PROXY_URL = _globalFlags.proxyUrl;
|
||||
if (_globalFlags?.headed) restartEnv.BROWSE_HEADED = '1';
|
||||
if (_globalFlags?.configHash) restartEnv.BROWSE_CONFIG_HASH = _globalFlags.configHash;
|
||||
// startServer() now clears the Chromium SingletonLock + reaps the orphan,
|
||||
// so the relaunch isn't blocked by the dead Chromium's profile lock (#1781).
|
||||
//
|
||||
// Reapply --proxy / --headed when restarting. headed comes from THIS
|
||||
// invocation OR the persisted server mode, so a restart triggered by a
|
||||
// plain command (goto/status, no --headed) never silently downgrades a
|
||||
// headed session to headless (#1781). Same for proxy/configHash.
|
||||
const restartEnv = buildRestartEnv(_globalFlags, oldState);
|
||||
const newState = await startServer(Object.keys(restartEnv).length ? restartEnv : undefined);
|
||||
return sendCommand(newState, command, args, retries + 1);
|
||||
}
|
||||
|
|
@@ -966,30 +1069,11 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
|||
}
|
||||
}
|
||||
|
||||
// Kill orphaned Chromium processes that may still hold the profile lock.
|
||||
// The server PID is the Bun process; Chromium is a child that can outlive it
|
||||
// if the server is killed abruptly (SIGKILL, crash, manual rm of state file).
|
||||
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
|
||||
try {
|
||||
const singletonLock = path.join(profileDir, 'SingletonLock');
|
||||
const lockTarget = fs.readlinkSync(singletonLock); // e.g. "hostname-12345"
|
||||
const orphanPid = parseInt(lockTarget.split('-').pop() || '', 10);
|
||||
if (orphanPid && isProcessAlive(orphanPid)) {
|
||||
safeKill(orphanPid, 'SIGTERM');
|
||||
await new Promise(resolve => setTimeout(resolve, 1000));
|
||||
if (isProcessAlive(orphanPid)) {
|
||||
safeKill(orphanPid, 'SIGKILL');
|
||||
await new Promise(resolve => setTimeout(resolve, 500));
|
||||
}
|
||||
}
|
||||
} catch (err: any) {
|
||||
if (err?.code !== 'ENOENT' && err?.code !== 'EINVAL') throw err;
|
||||
}
|
||||
|
||||
// Clean up Chromium profile locks (can persist after crashes)
|
||||
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
|
||||
safeUnlinkQuiet(path.join(profileDir, lockFile));
|
||||
}
|
||||
// Kill an orphaned Chromium still holding the profile lock (the Bun server
|
||||
// PID's Chromium child can outlive an abrupt kill/crash), then clear the
|
||||
// lock files so the launch is clean. Shared with the auto-restart path (#1781).
|
||||
await killOrphanChromium();
|
||||
cleanChromiumProfileLocks();
|
||||
|
||||
// Delete stale state file
|
||||
safeUnlinkQuiet(config.stateFile);
|
||||
|
|
@@ -1027,6 +1111,11 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
|||
});
|
||||
const status = await resp.text();
|
||||
console.log(`Connected to real Chrome\n${status}`);
|
||||
// #1781: surface the window — it often opens behind/on another Space.
|
||||
raiseHeadedWindowMacOS();
|
||||
if (process.platform === 'darwin') {
|
||||
console.log('(If you still don\'t see it, check Mission Control / other Spaces.)');
|
||||
}
|
||||
|
||||
// sidebar-agent.ts spawn was here. Ripped alongside the chat queue —
|
||||
// the Terminal pane runs an interactive PTY now, no more one-shot
|
||||
|
|
@@ -1194,11 +1283,11 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
|||
safeKill(existingState.pid, 'SIGKILL');
|
||||
}
|
||||
}
|
||||
// Clean profile locks and state file
|
||||
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
|
||||
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
|
||||
safeUnlinkQuiet(path.join(profileDir, lockFile));
|
||||
}
|
||||
// #1781: killing the daemon can orphan its Chromium child tree, which keeps
|
||||
// holding the SingletonLock and makes the next `connect` fail to launch.
|
||||
// Reap the orphan via the lock, then clear the lock files + state.
|
||||
await killOrphanChromium();
|
||||
cleanChromiumProfileLocks();
|
||||
// Xvfb orphan cleanup: if the recorded PID still matches our Xvfb (by
|
||||
// cmdline AND start-time), kill it. PID-only would risk killing a
|
||||
// recycled PID belonging to an unrelated process.
|
||||
|
|
@@ -1258,6 +1347,11 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
|||
}
|
||||
|
||||
await sendCommand(state, command, commandArgs);
|
||||
|
||||
// #1781: `focus` means "show me the window". The server-side focus activates
|
||||
// the page via CDP, but on macOS the app can still sit on another Space — pull
|
||||
// it to the user's current Space too.
|
||||
if (command === 'focus') raiseHeadedWindowMacOS();
|
||||
}
|
||||
|
||||
if (import.meta.main) {
|
||||
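The busy-versus-dead decision in `sendCommand` hinges on a bounded probe: retry a health check a fixed number of times, sleeping between attempts but not after the last one. The sketch below mirrors the loop in `probeHealthWithBackoff`, with the health check and the sleep injected so it runs without a server. `probeWithBackoff`, the `check` callback, and the `sleep` parameter are hypothetical stand-ins for `isServerHealthy(port)` and `Bun.sleep`.

```typescript
/**
 * Hypothetical, dependency-injected variant of the bounded probe.
 * Resolves true as soon as `check` succeeds, false after `attempts` failures.
 */
async function probeWithBackoff(
  check: () => Promise<boolean>,
  attempts = 3,
  backoffMs = 250,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true;
    // No sleep after the final attempt: the caller should learn the verdict promptly.
    if (i < attempts - 1) await sleep(backoffMs);
  }
  return false;
}

// Usage with a fake check that recovers on the third attempt.
let demoCalls = 0;
probeWithBackoff(async () => ++demoCalls >= 3, 3, 10).then((ok) => console.log(ok, demoCalls));
```

With the defaults from the diff (3 attempts, 250 ms apart), a dead server costs at most two sleeps, about half a second plus the time of the checks themselves, before the restart path runs.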
|
|
|
|||
|
|
@@ -0,0 +1,39 @@
|
|||
import { describe, test, expect } from "bun:test";
|
||||
import { buildRestartEnv } from "../src/cli";
|
||||
|
||||
// #1781: an auto-restart triggered by a plain command (no --headed flag) must
|
||||
// NOT silently downgrade a headed session to headless. buildRestartEnv reapplies
|
||||
// headed/proxy/configHash from this invocation OR the persisted server state.
|
||||
describe("buildRestartEnv (#1781 headed persistence)", () => {
|
||||
const headedState = { pid: 1, port: 9, token: "t", startedAt: "", serverPath: "", mode: "headed" as const };
|
||||
const launchedState = { pid: 1, port: 9, token: "t", startedAt: "", serverPath: "", mode: "launched" as const };
|
||||
|
||||
test("headed flag on this invocation → BROWSE_HEADED=1", () => {
|
||||
expect(buildRestartEnv({ headed: true } as any, null).BROWSE_HEADED).toBe("1");
|
||||
});
|
||||
|
||||
test("plain command + persisted headed state → still BROWSE_HEADED=1 (the regression)", () => {
|
||||
const env = buildRestartEnv({} as any, headedState as any);
|
||||
expect(env.BROWSE_HEADED).toBe("1");
|
||||
});
|
||||
|
||||
test("plain command + headless state → no BROWSE_HEADED (no spurious headed)", () => {
|
||||
const env = buildRestartEnv({} as any, launchedState as any);
|
||||
expect(env.BROWSE_HEADED).toBeUndefined();
|
||||
});
|
||||
|
||||
test("nothing set → empty env", () => {
|
||||
expect(buildRestartEnv(null, null)).toEqual({});
|
||||
});
|
||||
|
||||
test("proxy + configHash reapplied from flags", () => {
|
||||
const env = buildRestartEnv({ proxyUrl: "socks5://x", configHash: "abc" } as any, null);
|
||||
expect(env.BROWSE_PROXY_URL).toBe("socks5://x");
|
||||
expect(env.BROWSE_CONFIG_HASH).toBe("abc");
|
||||
});
|
||||
|
||||
test("configHash falls back to persisted state", () => {
|
||||
const env = buildRestartEnv({} as any, { ...launchedState, configHash: "fromstate" } as any);
|
||||
expect(env.BROWSE_CONFIG_HASH).toBe("fromstate");
|
||||
});
|
||||
});
|
||||
|
|
@@ -2,7 +2,7 @@
|
|||
name: design-consultation
|
||||
preamble-tier: 3
|
||||
version: 1.0.0
|
||||
-description: Design consultation: understands your product, researches the landscape, proposes a complete design system (aesthetic, typography, color, layout, spacing, motion), and generates font+color preview... (gstack)
+description: "Design consultation: understands your product, researches the landscape, proposes a complete design system (aesthetic, typography, color, layout, spacing, motion), and generates font+color preview... (gstack)"
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
|
|
|
|||
|
|
@@ -2,7 +2,7 @@
|
|||
name: design-html
|
||||
preamble-tier: 2
|
||||
version: 1.0.0
|
||||
-description: Design finalization: generates production-quality Pretext-native HTML/CSS. (gstack)
+description: "Design finalization: generates production-quality Pretext-native HTML/CSS. (gstack)"
|
||||
triggers:
|
||||
- build the design
|
||||
- code the mockup
|
||||
|
|
|
|||
|
|
@@ -2,7 +2,7 @@
|
|||
name: design-review
|
||||
preamble-tier: 4
|
||||
version: 2.0.0
|
||||
-description: Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems, AI slop patterns, and slow interactions — then fixes them. (gstack)
+description: "Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems, AI slop patterns, and slow interactions — then fixes them. (gstack)"
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
|
|
|
|||
|
|
@@ -2,7 +2,7 @@
|
|||
name: design-shotgun
|
||||
preamble-tier: 2
|
||||
version: 1.0.0
|
||||
-description: Design shotgun: generate multiple AI design variants, open a comparison board, collect structured feedback, and iterate. (gstack)
+description: "Design shotgun: generate multiple AI design variants, open a comparison board, collect structured feedback, and iterate. (gstack)"
|
||||
triggers:
|
||||
- explore design variants
|
||||
- show me design options
|
||||
|
|
|
|||
|
|
@@ -1,7 +1,7 @@
|
|||
---
|
||||
name: guard
|
||||
version: 0.1.0
|
||||
-description: Full safety mode: destructive command warnings + directory-scoped edits. (gstack)
+description: "Full safety mode: destructive command warnings + directory-scoped edits. (gstack)"
|
||||
triggers:
|
||||
- full safety mode
|
||||
- guard against mistakes
|
||||
|
|
|
|||
|
|
@@ -2,7 +2,7 @@
|
|||
name: ios-clean
|
||||
preamble-tier: 3
|
||||
version: 1.0.0
|
||||
-description: Remove the DebugBridge SPM package and all #if DEBUG wiring from an iOS app. (gstack)
+description: "Remove the DebugBridge SPM package and all #if DEBUG wiring from an iOS app. (gstack)"
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
|
|
|
|||
|
|
@@ -2,7 +2,7 @@
|
|||
name: land
|
||||
preamble-tier: 4
|
||||
version: 1.0.0
|
||||
-description: Land a PR through the right merge regime: pre-flight, CI wait, VERSION-drift check, pre-merge readiness gate, then merge via no-queue, (gstack)
+description: "Land a PR through the right merge regime: pre-flight, CI wait, VERSION-drift check, pre-merge readiness gate, then merge via no-queue, (gstack)"
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
|
|
|
|||
|
|
@@ -137,6 +137,18 @@ export function buildGbrainEnv(opts: BuildGbrainEnvOptions = {}): NodeJS.Process
|
|||
return out;
|
||||
}
|
||||
|
||||
/**
|
||||
* Windows can't directly spawn the `gbrain` launcher (bun/npm install it as a
|
||||
* `gbrain.cmd`/`.ps1` shim) or a shebang script like the bash `gstack-brain-sync`
|
||||
* — `spawnSync`/`spawn` resolve those only through a shell's PATHEXT + interpreter
|
||||
* lookup. Without `shell: true` the child spawn fails ENOENT, which on the sync
|
||||
* orchestrator surfaced as "brain-sync exited undefined" (#1731). Gate on platform
|
||||
* so POSIX keeps the cheaper no-shell path. Exported so the static-grep tripwire
|
||||
* (test/gbrain-spawn-windows-shell.test.ts) can assert every gbrain/brain-sync
|
||||
* spawn carries it.
|
||||
*/
|
||||
export const NEEDS_SHELL_ON_WINDOWS = process.platform === "win32";
|
||||
|
||||
export interface SpawnGbrainOptions {
|
||||
/** Timeout in milliseconds. Defaults to 30s. */
|
||||
timeout?: number;
|
||||
|
|
@@ -166,6 +178,7 @@ export function spawnGbrain(args: string[], opts: SpawnGbrainOptions = {}): Spaw
|
|||
cwd: opts.cwd,
|
||||
stdio: opts.stdio || ["ignore", "pipe", "pipe"],
|
||||
env: buildGbrainEnv({ baseEnv: opts.baseEnv, announce: opts.announce }),
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
}
|
||||
|
||||
|
|
@@ -198,6 +211,7 @@ export function spawnGbrainAsync(
|
|||
stdio: opts.stdio || ["ignore", "pipe", "pipe"],
|
||||
cwd: opts.cwd,
|
||||
env: buildGbrainEnv({ baseEnv: opts.baseEnv, announce: false }),
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
}
|
||||
|
||||
|
|
@@ -212,5 +226,6 @@ export function execGbrainText(args: string[], opts: SpawnGbrainOptions = {}): s
|
|||
cwd: opts.cwd,
|
||||
stdio: opts.stdio || ["ignore", "pipe", "pipe"],
|
||||
env: buildGbrainEnv({ baseEnv: opts.baseEnv, announce: opts.announce }),
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
}
|
||||
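The platform gate added to each spawn call can be illustrated in isolation. The sketch below is a rough illustration under stated assumptions: `needsShell` is a hypothetical, parameterized version of `NEEDS_SHELL_ON_WINDOWS`, and `gstack-no-such-binary` is a made-up name chosen so the lookup fails on purpose. It shows why a failed direct spawn is easy to misreport: the cause is carried in `error`, while `status` is `null`.

```typescript
import { spawnSync } from "node:child_process";

/** Same gate as NEEDS_SHELL_ON_WINDOWS, parameterized so it can be checked on any host. */
function needsShell(platform: string = process.platform): boolean {
  return platform === "win32";
}

// Hypothetical binary name, chosen so the lookup fails on purpose.
const missing = spawnSync("gstack-no-such-binary", ["--version"], {
  stdio: ["ignore", "pipe", "pipe"],
  shell: needsShell(),
});

// On a direct (no-shell) spawn the failure is reported via `error` and
// `status` is null, so a caller that only prints `status` loses the cause.
console.log(missing.status, (missing.error as NodeJS.ErrnoException | undefined)?.code);
```

Keeping the gate platform-specific, as the diff does, avoids paying for a shell on POSIX systems, where the launcher can be executed directly.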
|
|
|
|||
|
|
@@ -0,0 +1,266 @@
/**
 * gbrain-guards — defense-in-depth against gbrain's destructive code paths (#1734).
 *
 * gbrain (the separate CLI gstack shells out to) can rm-rf a user's working tree
 * during an autopilot race (its own bug, upstream gbrain #1526). gstack can't fix
 * that, but it MUST stop treating gbrain's destructive subcommands as safe. These
 * guards gate the two ways the orchestrator can reach destruction:
 *
 *   1. `sources remove --confirm-destructive` → decideSourceRemove()
 *   2. `sync --strategy code` (can auto-reclone) → decideCodeSync()
 *
 * plus an autopilot-active check (detectAutopilot) that refuses to run destructive
 * ops concurrently with the daemon.
 *
 * Design notes grounded in the real gbrain 0.41.x surface:
 *   - There is NO `--keep-storage` flag and NO structured capability command, and
 *     subcommand `--help` is generic — so capability detection is best-effort and
 *     defaults to "unsupported". When we can't protect a user-managed source's
 *     files, we FAIL CLOSED (refuse the remove) rather than delete unprotected.
 *   - The autopilot lock filename isn't documented and (gbrain #1226) ignores
 *     GBRAIN_HOME, so the live `gbrain autopilot` process is the PRIMARY signal;
 *     known lock paths under both the configured home and ~/.gbrain are secondary.
 *   - We refuse only on an AFFIRMATIVE autopilot signal — inability to introspect
 *     never blocks a normal sync (that would brick the tool).
 *   - Path containment uses realpath so a symlink inside ~/.gbrain/clones can't
 *     smuggle a delete out to a user repo.
 *
 * Pure decision functions; the orchestrator logs the reasons (observability).
 */

import { spawnSync } from "child_process";
import { existsSync, realpathSync } from "fs";
import { homedir } from "os";
import { join, resolve, sep } from "path";
import { execGbrainJson, execGbrainText, NEEDS_SHELL_ON_WINDOWS } from "./gbrain-exec";
import { parseSourcesList, type GbrainSourceRow } from "./gbrain-sources";

export function gbrainHome(env: NodeJS.ProcessEnv = process.env): string {
  return env.GBRAIN_HOME || join(homedir(), ".gbrain");
}

/**
 * Directories gbrain owns and may delete safely. A source whose local_path
 * resolves inside one of these is gbrain-managed; outside = user-managed and
 * must be protected. Both the configured home and the default ~/.gbrain are
 * checked because gbrain #1226 shows home-resolution is inconsistent.
 */
function clonesDirs(env: NodeJS.ProcessEnv = process.env): string[] {
  return [...new Set([join(gbrainHome(env), "clones"), join(homedir(), ".gbrain", "clones")])];
}

/** True if `p` resolves (symlinks + `..` collapsed) to a location inside `dir`. */
export function isInside(p: string, dir: string): boolean {
  let rp: string;
  let rd: string;
  try { rp = realpathSync(p); } catch { rp = resolve(p); }
  try { rd = realpathSync(dir); } catch { rd = resolve(dir); }
  const base = rd.endsWith(sep) ? rd : rd + sep;
  return rp === rd || rp.startsWith(base);
}
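
The trailing-separator step in `isInside` is what keeps a sibling directory such as `clones-evil` from passing as a child of `clones`. Below is a minimal sketch of just that string comparison, assuming both inputs are already-resolved POSIX-style absolute paths. `hasPathPrefix` is a hypothetical helper written for illustration, not part of gstack, and it deliberately skips the realpath step.

```typescript
// Hypothetical helper: the comparison isInside performs AFTER realpath resolution.
// Both arguments are assumed to be already-resolved absolute paths.
function hasPathPrefix(resolvedPath: string, resolvedDir: string, sep = "/"): boolean {
  // Appending the separator stops ".../clones-evil" from matching ".../clones".
  const base = resolvedDir.endsWith(sep) ? resolvedDir : resolvedDir + sep;
  return resolvedPath === resolvedDir || resolvedPath.startsWith(base);
}

const clones = "/home/u/.gbrain/clones";
const insideClone = hasPathPrefix("/home/u/.gbrain/clones/repo-a", clones);
const siblingDir = hasPathPrefix("/home/u/.gbrain/clones-evil/repo", clones);
const userRepo = hasPathPrefix("/home/u/code/repo-a", clones);
```

A bare `startsWith(resolvedDir)` would accept the `clones-evil` path, which is why the boundary character is added before comparing.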

// ── Autopilot detection (E1: multi-signal, affirmative-only) ────────────────

export interface AutopilotStatus {
  active: boolean;
  /** Which signal fired (lock path or "process"), or null when inactive. */
  signal: string | null;
}

export interface AutopilotProbe {
  /** Override the lock-path list (tests). */
  lockPaths?: string[];
  /** Override the live-process check (tests). */
  processRunning?: () => boolean;
}

/**
 * Detect a running gbrain autopilot. Refuse the caller's destructive op only on
 * an affirmative signal; absence of a confirmable mechanism returns inactive so
 * normal syncs are never bricked.
 */
export function detectAutopilot(
  env: NodeJS.ProcessEnv = process.env,
  probe: AutopilotProbe = {},
): AutopilotStatus {
  // Secondary signal: known lock files. gbrain #1226 — the lock ignores
  // GBRAIN_HOME, so check both the configured home and the default ~/.gbrain.
  const lockPaths = probe.lockPaths ?? [
    join(gbrainHome(env), "autopilot.lock"),
    join(homedir(), ".gbrain", "autopilot.lock"),
    join(gbrainHome(env), "autopilot.pid"),
    join(homedir(), ".gbrain", "autopilot.pid"),
  ];
  for (const lp of lockPaths) {
    if (existsSync(lp)) return { active: true, signal: `lock:${lp}` };
  }
  // Primary signal: a live `gbrain autopilot` process.
  const running = (probe.processRunning ?? defaultProcessRunning)();
  if (running) return { active: true, signal: "process:gbrain autopilot" };
  return { active: false, signal: null };
}

function defaultProcessRunning(): boolean {
  // No reliable pgrep on Windows; rely on the lock-file signal there.
  if (process.platform === "win32") return false;
  const r = spawnSync("pgrep", ["-f", "gbrain autopilot"], { encoding: "utf-8", timeout: 3_000 });
  return r.status === 0 && (r.stdout || "").trim().length > 0;
}
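
The affirmative-only rule can be illustrated without touching the filesystem or the process table. The sketch below is a simplified stand-in for `detectAutopilot`, not the function itself: the `Probe` shape with an injected `lockExists` callback is an assumption made so the snippet runs on its own (the real probe overrides only `lockPaths` and `processRunning`).

```typescript
interface Status { active: boolean; signal: string | null }

// Hypothetical probe shape: both checks are injected, so no real files or processes are read.
interface Probe {
  lockExists: (path: string) => boolean;
  processRunning: () => boolean;
}

function detect(lockPaths: string[], probe: Probe): Status {
  for (const lp of lockPaths) {
    if (probe.lockExists(lp)) return { active: true, signal: `lock:${lp}` };
  }
  if (probe.processRunning()) return { active: true, signal: "process:gbrain autopilot" };
  // Nothing affirmative was seen, so report inactive rather than blocking the caller.
  return { active: false, signal: null };
}

const paths = ["/home/u/.gbrain/autopilot.lock", "/home/u/.gbrain/autopilot.pid"];
const idle = detect(paths, { lockExists: () => false, processRunning: () => false });
const locked = detect(paths, { lockExists: (p) => p.endsWith(".pid"), processRunning: () => false });
const live = detect(paths, { lockExists: () => false, processRunning: () => true });
```

Either signal alone is enough to refuse, and a probe that sees nothing yields `active: false`, which matches the intent that an uninspectable environment never blocks a normal sync.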

// ── Capability detection (E4 + Codex: per-process memo, no persistent cache) ─
//
// No structured capability command exists and subcommand --help is generic, so
// --keep-storage support can't be probed reliably; default unsupported. Memoize
// per process (keyed to the resolved gbrain identity) rather than persisting a
// cross-run cache — Codex flagged stale persistent caches, and the probe is cheap.

let _keepStorageMemo: { key: string; value: boolean } | undefined;

function gbrainIdentity(env: NodeJS.ProcessEnv): string {
  const r = spawnSync("gbrain", ["--version"], {
    encoding: "utf-8",
    timeout: 3_000,
    shell: NEEDS_SHELL_ON_WINDOWS,
    env,
  });
  return (r.stdout || "").trim() || "unknown";
}

export function gbrainSupportsKeepStorage(env: NodeJS.ProcessEnv = process.env): boolean {
  const key = gbrainIdentity(env);
  if (_keepStorageMemo && _keepStorageMemo.key === key) return _keepStorageMemo.value;
  let value = false;
  for (const args of [["sources", "remove", "--help"], ["--help"]]) {
    try {
      if (/--keep-storage/.test(execGbrainText(args, { baseEnv: env, timeout: 5_000 }))) {
        value = true;
        break;
      }
    } catch {
      // generic/empty help or non-zero exit → treat as unsupported
    }
  }
  _keepStorageMemo = { key, value };
  return value;
}

/** Test-only: reset the per-process capability memo. */
export function _resetCapabilityMemo(): void {
  _keepStorageMemo = undefined;
}

// ── Destructive-op decisions ────────────────────────────────────────────────

/**
 * Fetch + normalize the source list. Throws on read/parse failure so callers can
 * distinguish "couldn't read" (fail closed) from "empty list" (source absent).
 * Injectable for hermetic tests.
 */
export function fetchSources(env: NodeJS.ProcessEnv = process.env): GbrainSourceRow[] {
  const raw = execGbrainJson(["sources", "list", "--json"], { baseEnv: env });
  if (raw === null) throw new Error("gbrain sources list returned no JSON");
  return parseSourcesList(raw);
}

export interface RemoveDecision {
  allow: boolean;
  /** Extra args to append to `sources remove` (e.g. --keep-storage). */
  extraArgs: string[];
  reason: string;
}

/**
 * Decide whether `sources remove <id>` is safe, and with what flags.
 *
 * Fail-closed cases (allow=false):
 *   - sources list unreadable/unparseable (can't prove the row is safe).
 *   - the row is user-managed (remote_url set AND local_path outside gbrain's
 *     clones) and gbrain has no --keep-storage to protect the files.
 *
 * Allowed: absent row (no-op), gbrain-managed (inside clones), or path-managed
 * without a remote_url (gbrain's remove won't touch an outside-clones path that
 * it didn't clone). --keep-storage is appended whenever supported, as extra armor.
 */
export interface DecideRemoveOpts {
  /** Override capability detection (tests / cached caps). */
  keepStorage?: boolean;
  /** Override the source-list fetch (tests). Throwing simulates a read failure. */
  fetchRows?: (env: NodeJS.ProcessEnv) => GbrainSourceRow[];
}

export function decideSourceRemove(
  sourceId: string,
  env: NodeJS.ProcessEnv = process.env,
  opts: DecideRemoveOpts = {},
): RemoveDecision {
  const keepStorage = opts.keepStorage ?? gbrainSupportsKeepStorage(env);
  const extra = keepStorage ? ["--keep-storage"] : [];

  let rows: GbrainSourceRow[];
  try {
    rows = (opts.fetchRows ?? fetchSources)(env);
  } catch {
    return { allow: false, extraArgs: [], reason: "could not read sources list; refusing remove (fail closed)" };
  }

  const row = rows.find((r) => r.id === sourceId);
  if (!row) return { allow: true, extraArgs: extra, reason: "source absent (no-op)" };

  const remoteUrl = row.config?.remote_url;
  const userManaged =
    !!remoteUrl && !!row.local_path && !clonesDirs(env).some((d) => isInside(row.local_path!, d));

  if (userManaged) {
    if (keepStorage) {
      return { allow: true, extraArgs: ["--keep-storage"], reason: "user-managed; --keep-storage protects files" };
    }
    return {
      allow: false,
      extraArgs: [],
      reason:
        `refusing remove of user-managed source "${sourceId}" (remote_url set, local_path ` +
        `${row.local_path} outside gbrain clones) — this gbrain has no --keep-storage to ` +
        `protect the working tree. Upgrade gbrain or remove the source manually.`,
    };
  }

  return { allow: true, extraArgs: extra, reason: "gbrain-managed or path-managed without remote_url" };
}
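
The outcomes of `decideSourceRemove` can be read as a small decision table. The sketch below restates that table in a self-contained form so each branch can be exercised directly. It is a simplified stand-in, not the real function: `sketchDecideRemove` is a hypothetical name, `rows === null` models an unreadable sources list, and the `insideClones` callback replaces the realpath-based containment check. The URLs and paths are made-up sample data.

```typescript
interface Row { id?: string; local_path?: string; config?: { remote_url?: string | null } | null }
interface Decision { allow: boolean; extraArgs: string[]; reason: string }

// Hypothetical, simplified restatement of the remove decision for illustration only.
function sketchDecideRemove(
  id: string,
  rows: Row[] | null,
  keepStorage: boolean,
  insideClones: (path: string) => boolean,
): Decision {
  const extra = keepStorage ? ["--keep-storage"] : [];
  // Unreadable list: nothing can be proven safe, so refuse.
  if (rows === null) return { allow: false, extraArgs: [], reason: "unreadable (fail closed)" };
  const row = rows.find((r) => r.id === id);
  if (!row) return { allow: true, extraArgs: extra, reason: "absent (no-op)" };
  const userManaged = !!row.config?.remote_url && !!row.local_path && !insideClones(row.local_path);
  if (userManaged && !keepStorage) return { allow: false, extraArgs: [], reason: "user-managed, unprotected" };
  return { allow: true, extraArgs: extra, reason: userManaged ? "user-managed, protected" : "gbrain- or path-managed" };
}

const rows: Row[] = [
  { id: "managed", local_path: "/home/u/.gbrain/clones/a", config: { remote_url: "https://example.invalid/a.git" } },
  { id: "mine", local_path: "/home/u/code/b", config: { remote_url: "https://example.invalid/b.git" } },
];
const inside = (p: string): boolean => p.startsWith("/home/u/.gbrain/clones/");

const refused = sketchDecideRemove("mine", rows, false, inside);
const guarded = sketchDecideRemove("mine", rows, true, inside);
const managed = sketchDecideRemove("managed", rows, false, inside);
const unreadable = sketchDecideRemove("mine", null, true, inside);
```

One detail worth keeping in mind when reading the real code: `fetchSources` throws only when no JSON comes back, while `parseSourcesList` turns a malformed but non-null payload into `[]`, so that case lands in the "source absent" branch rather than the fail-closed one.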

export interface SyncDecision {
  allow: boolean;
  reason: string;
}

/**
 * Decide whether `sync --strategy code --source <id>` is safe to run.
 *
 * A source with a remote_url can trigger gbrain's auto-reclone, the ungated
 * rm-rf path behind the data loss (gbrain #1526). Require an explicit
 * --allow-reclone opt-in for URL-managed sources. Read failure here is NOT
 * itself destructive, so it fails open (proceed) — the autopilot guard, checked
 * first, is the primary protection against the race that caused the loss.
 */
export function decideCodeSync(
  sourceId: string,
  env: NodeJS.ProcessEnv = process.env,
  allowReclone = false,
  fetchRows: (env: NodeJS.ProcessEnv) => GbrainSourceRow[] = fetchSources,
): SyncDecision {
  let rows: GbrainSourceRow[];
  try {
    rows = fetchRows(env);
  } catch {
    return { allow: true, reason: "sources unreadable; proceeding (sync read is non-destructive)" };
  }
  const row = rows.find((r) => r.id === sourceId);
  if (row?.config?.remote_url && !allowReclone) {
    return {
      allow: false,
      reason:
        `source "${sourceId}" is URL-managed (remote_url set); sync may auto-reclone and ` +
        `delete the working tree. Re-run /sync-gbrain with --allow-reclone to proceed.`,
    };
  }
  return { allow: true, reason: "no remote_url, or reclone explicitly allowed" };
}
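
The doc comment above says the autopilot guard is checked first and the reclone opt-in second. A minimal sketch of that ordering follows, under the assumption that the caller has already computed both inputs. `planCodeSync` and `SyncPlan` are hypothetical names; the orchestrator that actually sequences these guards is not part of this file.

```typescript
type SyncPlan = { run: true } | { run: false; reason: string };

// Hypothetical orchestrator-side gate, for illustration only.
function planCodeSync(
  autopilotActive: boolean,
  remoteUrl: string | null | undefined,
  allowReclone: boolean,
): SyncPlan {
  // Guard 1: never run a code sync while the daemon may be recloning.
  if (autopilotActive) return { run: false, reason: "autopilot active" };
  // Guard 2: URL-managed sources need an explicit opt-in before a code walk.
  if (remoteUrl && !allowReclone) return { run: false, reason: "URL-managed; needs --allow-reclone" };
  return { run: true };
}

const blockedByDaemon = planCodeSync(true, null, true);
const blockedByUrl = planCodeSync(false, "https://example.invalid/repo.git", false);
const optedIn = planCodeSync(false, "https://example.invalid/repo.git", true);
const pathManaged = planCodeSync(false, undefined, false);
```

Putting the autopilot check first means `--allow-reclone` cannot override an active daemon, which is the race the guards exist to avoid.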
|
||||
|
|
@@ -35,7 +35,7 @@ import {
|
|||
} from "fs";
|
||||
import { homedir } from "os";
|
||||
import { dirname, join } from "path";
|
||||
import { buildGbrainEnv } from "./gbrain-exec";
|
||||
import { buildGbrainEnv, NEEDS_SHELL_ON_WINDOWS } from "./gbrain-exec";
|
||||
|
||||
export type LocalEngineStatus =
|
||||
| "ok"
|
||||
|
|
@@ -113,6 +113,7 @@ export function resolveGbrainBin(env?: NodeJS.ProcessEnv): string | null {
|
|||
timeout: 2_000,
|
||||
stdio: ["ignore", "ignore", "ignore"],
|
||||
env: e,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
result = "gbrain";
|
||||
} catch {
|
||||
|
|
@@ -135,6 +136,7 @@ export function readGbrainVersion(env?: NodeJS.ProcessEnv): string {
|
|||
timeout: 2_000,
|
||||
stdio: ["ignore", "pipe", "ignore"],
|
||||
env: e,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
result = out.trim().split("\n")[0] || "";
|
||||
} catch {
|
||||
|
|
@@ -241,6 +243,7 @@ function freshClassify(env?: NodeJS.ProcessEnv): LocalEngineStatus {
|
|||
timeout: PROBE_TIMEOUT_MS,
|
||||
stdio: ["ignore", "pipe", "pipe"],
|
||||
env: buildGbrainEnv({ baseEnv: env ?? process.env }),
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
return "ok";
|
||||
} catch (err) {
|
||||
|
|
|
|||
|
|
@@ -11,6 +11,7 @@
|
|||
|
||||
import { execFileSync, spawnSync } from "child_process";
|
||||
import { withErrorContext } from "./gstack-memory-helpers";
|
||||
import { NEEDS_SHELL_ON_WINDOWS } from "./gbrain-exec";
|
||||
|
||||
export interface SourceState {
|
||||
/** "absent" — id not registered. "match" — id at expected path. "drift" — id at different path. */
|
||||
|
|
@@ -26,6 +27,37 @@ export interface EnsureResult {
|
|||
state: SourceState;
|
||||
}

/**
 * One row of `gbrain sources list --json`. `config.remote_url` distinguishes
 * URL-managed sources (gbrain owns the clone, may auto-reclone) from
 * path-managed ones (user owns the working tree) — load-bearing for the #1734
 * destructive-op guards.
 */
export interface GbrainSourceRow {
  id?: string;
  local_path?: string;
  page_count?: number;
  config?: { remote_url?: string | null } | null;
}

/**
 * Normalize `gbrain sources list --json` output to an array of source rows.
 *
 * gbrain has shipped two shapes: a wrapped `{ sources: [...] }` object (v0.20+)
 * and, in older/other variants, a bare top-level array. #1576 was a crash when a
 * reader assumed one shape; the parse is centralized here so every reader
 * (probeSource, sourcePageCount, sourceLocalPath, the #1734 remote_url audit)
 * agrees on the shape in ONE place. Returns [] for null/garbage rather than
 * throwing — callers treat "no rows" as absent.
 */
export function parseSourcesList(raw: unknown): GbrainSourceRow[] {
  if (Array.isArray(raw)) return raw as GbrainSourceRow[];
  if (raw && typeof raw === "object" && Array.isArray((raw as { sources?: unknown }).sources)) {
    return (raw as { sources: GbrainSourceRow[] }).sources;
  }
  return [];
}
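
To make the two payload shapes concrete, the snippet below feeds both through the same normalization. `normalizeSources` restates the logic of `parseSourcesList` so the example runs on its own, and the JSON payloads are invented sample data, not captured gbrain output.

```typescript
interface SourceRow { id?: string; local_path?: string }

// Same normalization idea as parseSourcesList, restated so this snippet is self-contained.
function normalizeSources(raw: unknown): SourceRow[] {
  if (Array.isArray(raw)) return raw as SourceRow[];
  if (raw && typeof raw === "object" && Array.isArray((raw as { sources?: unknown }).sources)) {
    return (raw as { sources: SourceRow[] }).sources;
  }
  return [];
}

// Wrapped object shape and bare array shape, both as sample JSON text.
const wrapped = normalizeSources(JSON.parse('{"sources":[{"id":"gstack","local_path":"/tmp/gstack"}]}'));
const bare = normalizeSources(JSON.parse('[{"id":"gstack","local_path":"/tmp/gstack"}]'));
// Anything else collapses to an empty list instead of throwing.
const garbage = normalizeSources("not a list");
const missing = normalizeSources(null);
```

Readers such as `probeSource` can then call `.find((s) => s.id === id)` without caring which shape the installed gbrain emitted.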
|
||||
|
||||
export interface EnsureOptions {
|
||||
/** Pass --federated to `gbrain sources add`. Default false. */
|
||||
federated?: boolean;
|
||||
|
|
@@ -56,6 +88,7 @@ export function probeSource(id: string, env?: NodeJS.ProcessEnv): SourceState {
|
|||
timeout: 30_000,
|
||||
stdio: ["ignore", "pipe", "pipe"],
|
||||
env,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
} catch (err) {
|
||||
const e = err as NodeJS.ErrnoException & { stderr?: Buffer };
|
||||
|
|
@@ -69,14 +102,14 @@ export function probeSource(id: string, env?: NodeJS.ProcessEnv): SourceState {
|
|||
throw err;
|
||||
}
|
||||
|
||||
let parsed: { sources?: Array<{ id?: string; local_path?: string }> };
|
||||
let parsed: unknown;
|
||||
try {
|
||||
parsed = JSON.parse(stdout);
|
||||
} catch (err) {
|
||||
throw new Error(`gbrain sources list returned non-JSON output: ${(err as Error).message}`);
|
||||
}
|
||||
|
||||
const sources = parsed.sources || [];
|
||||
const sources = parseSourcesList(parsed);
|
||||
const match = sources.find((s) => s.id === id);
|
||||
if (!match) return { status: "absent" };
|
||||
return {
|
||||
|
|
@@ -129,6 +162,7 @@ export async function ensureSourceRegistered(
|
|||
encoding: "utf-8",
|
||||
timeout: 30_000,
|
||||
env,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
if (rm.status !== 0) {
|
||||
throw new Error(`gbrain sources remove ${id} failed: ${rm.stderr || rm.stdout || `exit ${rm.status}`}`);
|
||||
|
|
@@ -142,6 +176,7 @@ export async function ensureSourceRegistered(
|
|||
encoding: "utf-8",
|
||||
timeout: 30_000,
|
||||
env,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
if (add.status !== 0) {
|
||||
throw new Error(`gbrain sources add ${id} failed: ${add.stderr || add.stdout || `exit ${add.status}`}`);
|
||||
|
|
@@ -167,14 +202,14 @@ export function sourcePageCount(id: string, env?: NodeJS.ProcessEnv): number | n
|
|||
timeout: 30_000,
|
||||
stdio: ["ignore", "pipe", "pipe"],
|
||||
env,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS, // #1731: gbrain is a .cmd shim on Windows
|
||||
});
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
|
||||
try {
|
||||
const parsed = JSON.parse(stdout) as { sources?: Array<{ id?: string; page_count?: number }> };
|
||||
const match = (parsed.sources || []).find((s) => s.id === id);
|
||||
const match = parseSourcesList(JSON.parse(stdout)).find((s) => s.id === id);
|
||||
if (!match) return null;
|
||||
if (typeof match.page_count !== "number") return null;
|
||||
return match.page_count;
|
||||
|
|
|
|||
|
|
@@ -14,6 +14,7 @@
|
|||
"dev:make-pdf": "bun run make-pdf/src/cli.ts",
|
||||
"dev:design": "bun run design/src/cli.ts",
|
||||
"gen:skill-docs": "bun run scripts/gen-skill-docs.ts",
|
||||
"gen:skill-docs:user": "bun run scripts/gen-skill-docs.ts --respect-detection",
|
||||
"dev": "bun run browse/src/cli.ts",
|
||||
"server": "bun run browse/src/server.ts",
|
||||
"test": "bun test browse/test/ test/ make-pdf/test/ --ignore 'test/skill-e2e-*.test.ts' --ignore test/skill-llm-eval.test.ts --ignore test/skill-routing-e2e.test.ts --ignore test/codex-e2e.test.ts --ignore test/gemini-e2e.test.ts && (bun run slop:diff 2>/dev/null || true)",
|
||||
|
|
|
|||
|
|
@@ -2,7 +2,7 @@
|
|||
name: plan-tune
|
||||
preamble-tier: 2
|
||||
version: 1.0.0
|
||||
description: Self-tuning question sensitivity + developer psychographic for gstack (v1: observational). (gstack)
|
||||
description: "Self-tuning question sensitivity + developer psychographic for gstack (v1: observational). (gstack)"
|
||||
triggers:
|
||||
- tune questions
|
||||
- stop asking me that
|
||||
|
|
|
|||
|
|
@@ -26,6 +26,34 @@ export function discoverTemplates(root: string): Array<{ tmpl: string; output: s
|
|||
return results;
|
||||
}

/**
 * Discover on-demand section templates: `<skill>/sections/*.md.tmpl`.
 *
 * Returns the relative tmpl path, its generated output path (`.tmpl` stripped),
 * and the owning skill directory so the generator can build a TemplateContext
 * with the PARENT skill's name (not "sections") — see processSectionTemplate.
 *
 * Scans one level of subdirs (same depth as discoverTemplates), looking only
 * inside a `sections/` child. Skills without a sections/ dir contribute nothing,
 * so this is a no-op for every skill that hasn't been carved.
 */
export function discoverSectionTemplates(
  root: string,
): Array<{ tmpl: string; output: string; skillDir: string }> {
  const results: Array<{ tmpl: string; output: string; skillDir: string }> = [];
  for (const dir of subdirs(root)) {
    const sectionsDir = path.join(root, dir, 'sections');
    if (!fs.existsSync(sectionsDir) || !fs.statSync(sectionsDir).isDirectory()) continue;
    for (const entry of fs.readdirSync(sectionsDir, { withFileTypes: true })) {
      if (!entry.isFile() || !entry.name.endsWith('.md.tmpl')) continue;
      const rel = `${dir}/sections/${entry.name}`;
      results.push({ tmpl: rel, output: rel.replace(/\.tmpl$/, ''), skillDir: dir });
    }
  }
  // Deterministic order so CI freshness checks don't flap on FS iteration order.
  return results.sort((a, b) => a.tmpl.localeCompare(b.tmpl));
}
|
||||
|
||||
export function discoverSkillFiles(root: string): string[] {
|
||||
const dirs = ['', ...subdirs(root)];
|
||||
const results: string[] = [];
|
||||
|
|
|
|||
|
|
@@ -11,7 +11,7 @@
|
|||
|
||||
import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
|
||||
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
|
||||
import { discoverTemplates } from './discover-skills';
|
||||
import { discoverTemplates, discoverSectionTemplates } from './discover-skills';
|
||||
import { writeLlmsTxt } from './gen-llms-txt';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
|
|
@@ -356,6 +356,28 @@ export function buildWhenToInvokeSection(parts: CatalogParts): string {
|
|||
return lines.join('\n');
|
||||
}

/**
 * Render a string as a YAML inline scalar value (the text after `key: `),
 * quoting only when a plain scalar would be invalid or ambiguous.
 *
 * The bug this guards (#1778): a description like "Ship workflow: detect..."
 * emitted as a plain scalar has an interior ": " that a strict YAML parser
 * (Codex/OpenAI skill loading) reads as a nested mapping and rejects with
 * "mapping values are not allowed in this context". When quoting is needed we
 * fall back to JSON.stringify, which produces a double-quoted scalar that YAML
 * accepts verbatim (YAML is a superset of JSON for flow scalars). Strings that
 * are already valid plain scalars pass through unchanged to keep regen diffs small.
 */
export function toYamlInlineScalar(s: string): string {
  const needsQuote =
    s.length === 0 ||
    s !== s.trim() ||                       // leading/trailing whitespace
    /:(\s|$)/.test(s) ||                    // "foo: bar" / trailing colon → mapping ambiguity
    /\s#/.test(s) ||                        // " #" → inline comment
    /^[\s>|&*!%@`"'#,\[\]{}?-]/.test(s);    // leading YAML indicator char
  return needsQuote ? JSON.stringify(s) : s;
}
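
A few sample inputs show which branch of the quoting rule fires. The function body below is restated from the diff above so the snippet runs on its own; the input strings are invented for illustration.

```typescript
// Restated from the diff above so the example is self-contained.
function toYamlInlineScalar(s: string): string {
  const needsQuote =
    s.length === 0 ||
    s !== s.trim() ||
    /:(\s|$)/.test(s) ||
    /\s#/.test(s) ||
    /^[\s>|&*!%@`"'#,\[\]{}?-]/.test(s);
  return needsQuote ? JSON.stringify(s) : s;
}

// Interior ": " would read as a nested mapping, so the value is double-quoted.
const colonSpace = toYamlInlineScalar("Ship workflow: detect and merge");
// A colon that is not followed by whitespace (as in a URL) stays plain.
const urlColon = toYamlInlineScalar("see http://example.com for details");
// " #" would start an inline comment, so it is quoted.
const hashComment = toYamlInlineScalar("fixes issue #12");
// No special characters: emitted unchanged.
const plain = toYamlInlineScalar("Plain description (gstack)");

const line = `description: ${colonSpace}`;
```

The rule is intentionally conservative: it does not try to detect plain scalars that YAML would type as booleans or numbers, which seems acceptable for free-text descriptions but is worth remembering before reusing the helper elsewhere.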
|
||||
|
||||
/**
|
||||
* Apply catalog trim to a SKILL.md body:
|
||||
* - shorten frontmatter `description:` to lead + (gstack)
|
||||
|
|
@@ -397,8 +419,16 @@ export function applyCatalogTrim(content: string, skillName: string): { content:
|
|||
|
||||
// Replace description in frontmatter — keep trailing newline so the next
|
||||
// YAML field doesn't collide on the same line as the description value.
|
||||
// Quote the value when it would be an invalid YAML plain scalar (the common
|
||||
// case: an interior ": " like "Ship workflow: detect..." which a strict YAML
|
||||
// parser reads as a nested mapping and rejects — #1778). toYamlInlineScalar
|
||||
// only quotes when needed, so descriptions without special chars stay plain.
|
||||
const newDesc = buildTrimmedDescription(parts);
|
||||
const newFrontmatter = frontmatter.replace(descMatch[0], `description: ${newDesc}\n`);
|
||||
// Function replacer (not a string) so a `$` in the description — e.g. a future
|
||||
// skill referencing `$B`/`$D` — can't be interpreted as a `$&`/`$1` replacement
|
||||
// pattern and silently corrupt the frontmatter.
|
||||
const newDescLine = `description: ${toYamlInlineScalar(newDesc)}\n`;
|
||||
const newFrontmatter = frontmatter.replace(descMatch[0], () => newDescLine);
|
||||
let newContent = '---\n' + newFrontmatter + content.slice(fmEnd);
|
||||
|
||||
// Insert body section after frontmatter (after the closing ---\n and any
|
||||
|
|
@@ -574,6 +604,102 @@ function extractHookSafetyProse(tmplContent: string): string | null {
|
|||
|
||||
const GENERATED_HEADER = `<!-- AUTO-GENERATED from {{SOURCE}} — do not edit directly -->\n<!-- Regenerate: bun run gen:skill-docs -->\n`;

/**
 * Apply a host's configured path + tool rewrites. Extracted so both SKILL.md
 * (via processExternalHost) and section files (via processSectionTemplate) get
 * identical per-host treatment — a section's cross-references must rewrite the
 * same way the parent skill's do, or external hosts get wrong paths.
 */
function applyHostRewrites(content: string, hostConfig: HostConfig): string {
  let result = content;
  for (const rewrite of hostConfig.pathRewrites) {
    result = result.replaceAll(rewrite.from, rewrite.to);
  }
  if (hostConfig.toolRewrites) {
    for (const [from, to] of Object.entries(hostConfig.toolRewrites)) {
      result = result.replaceAll(from, to);
    }
  }
  return result;
}

/**
 * Resolve {{PLACEHOLDER}} / {{NAME:arg}} tokens against the RESOLVERS registry,
 * honoring host suppression and appliesTo gating, then assert nothing is left
 * unresolved. Extracted so SKILL.md and section templates resolve through the
 * exact same path — a security/sanitization fix to one can't miss the other.
 */
function resolvePlaceholders(
  tmplContent: string,
  ctx: TemplateContext,
  hostConfig: HostConfig,
  relTmplPath: string,
): string {
  // effectiveSuppressedResolvers() honors --respect-detection: when gbrain is
  // detected locally, GBRAIN_* resolvers un-suppress. Shared by SKILL.md and
  // section generation so both paths get the same gbrain-aware behavior.
  const suppressed = effectiveSuppressedResolvers(hostConfig);
  const onePass = (input: string): string =>
    input.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (_match, fullKey) => {
      const parts = fullKey.split(':');
      const resolverName = parts[0];
      const args = parts.slice(1);
      if (suppressed.has(resolverName)) return '';
      const entry = RESOLVERS[resolverName];
      if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
      const { resolve, appliesTo } = unwrapResolver(entry);
      if (appliesTo && !appliesTo(ctx)) return '';
      return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
    });

  // Multi-pass: a resolver may emit content that itself contains {{TOKENS}} — the
  // {{SECTION:id}} resolver inlines a section template (with its own resolvers)
  // for non-Claude hosts. .replace() doesn't re-scan inserted text, so loop until
  // the output stabilizes. Bounded to avoid an infinite loop if a resolver ever
  // emits its own placeholder; 6 passes is far more nesting than any skill needs.
  let content = tmplContent;
  for (let pass = 0; pass < 6; pass++) {
    const next = onePass(content);
    if (next === content) break;
    content = next;
  }

  const remaining = content.match(/\{\{(\w+(?::[^}]+)?)\}\}/g);
  if (remaining) {
    throw new Error(`Unresolved placeholders in ${relTmplPath}: ${remaining.join(', ')}`);
  }
  return content;
}
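
The bounded multi-pass loop is easier to follow on a toy registry. The sketch below assumes a two-entry resolver table in which one resolver emits another placeholder; `resolvers` and `resolveAll` are hypothetical names, and the real `RESOLVERS` registry, suppression, and `appliesTo` gating are left out.

```typescript
type Resolver = () => string;

// Hypothetical resolver table for illustration; OUTER emits a nested INNER token.
const resolvers: Record<string, Resolver> = {
  OUTER: () => "outer[{{INNER}}]",
  INNER: () => "inner",
};

function resolveAll(template: string, maxPasses = 6): string {
  const onePass = (input: string): string =>
    input.replace(/\{\{(\w+)\}\}/g, (_m: string, name: string) => {
      const r = resolvers[name];
      if (!r) throw new Error(`Unknown placeholder {{${name}}}`);
      return r();
    });

  // replace() does not re-scan inserted text, so repeat until the output stops changing.
  let content = template;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = onePass(content);
    if (next === content) break;
    content = next;
  }

  // Anything still left after the bounded loop is reported rather than emitted.
  const remaining = content.match(/\{\{(\w+)\}\}/g);
  if (remaining) throw new Error(`Unresolved placeholders: ${remaining.join(", ")}`);
  return content;
}

const resolved = resolveAll("a {{OUTER}} b");
```

A resolver that emits its own token never stabilizes; the pass limit stops the loop and the final check turns the leftover token into an error instead of hanging the generator.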

/**
 * Build the TemplateContext from a template's frontmatter. Shared by SKILL.md
 * and section generation so sections inherit the SAME context the parent skill
 * resolves with (skillName, tier, benefitsFrom, interactive) — enforced by
 * test/template-context-parity.test.ts. skillNameOverride lets section
 * generation pin the parent skill's name instead of deriving "sections".
 */
function buildContext(
  tmplContent: string,
  tmplPath: string,
  host: Host,
  skillNameOverride?: string,
): TemplateContext {
  const { name: extractedName } = extractNameAndDescription(tmplContent);
  const skillName = skillNameOverride || extractedName || path.basename(path.dirname(tmplPath));
  const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
  const benefitsFrom = benefitsMatch
    ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
    : undefined;
  const tierMatch = tmplContent.match(/^preamble-tier:\s*(\d+)$/m);
  const preambleTier = tierMatch ? parseInt(tierMatch[1], 10) : undefined;
  const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
  const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
  return {
    skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host],
    preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL,
  };
}
|
||||
|
||||
/**
|
||||
* Process external host output: routing, frontmatter, path rewrites, metadata.
|
||||
* Shared between Codex and Factory (and future external hosts).
|
||||
|
|
@@ -619,17 +745,9 @@ function processExternalHost(
|
|||
result = result.slice(0, bodyStart) + '\n' + safetyProse + '\n' + result.slice(bodyStart);
|
||||
}
|
||||
|
||||
// Config-driven path rewrites (order matters, replaceAll)
|
||||
for (const rewrite of hostConfig.pathRewrites) {
|
||||
result = result.replaceAll(rewrite.from, rewrite.to);
|
||||
}
|
||||
|
||||
// Config-driven tool rewrites
|
||||
if (hostConfig.toolRewrites) {
|
||||
for (const [from, to] of Object.entries(hostConfig.toolRewrites)) {
|
||||
result = result.replaceAll(from, to);
|
||||
}
|
||||
}
|
||||
// Config-driven path + tool rewrites (shared with processSectionTemplate so
|
||||
// section cross-references get the same per-host treatment as SKILL.md).
|
||||
result = applyHostRewrites(result, hostConfig);
|
||||
|
||||
// Config-driven: generate metadata (e.g., openai.yaml for Codex)
|
||||
if (hostConfig.generation.generateMetadata && !symlinkLoop) {
|
||||
|
|
@@ -650,53 +768,18 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
|
|||
// Determine skill directory relative to ROOT
|
||||
const skillDir = path.relative(ROOT, path.dirname(tmplPath));
|
||||
|
||||
// Extract skill name from frontmatter early — needed for both TemplateContext and external host output paths.
|
||||
// When frontmatter name: differs from directory name (e.g., run-tests/ with name: test),
|
||||
// the frontmatter name is used for external skill naming and setup script symlinks.
|
||||
// Extract name/description: name drives external skill naming + setup symlinks
|
||||
// (and TemplateContext.skillName via buildContext); description feeds external
|
||||
// host metadata. When frontmatter name: differs from directory name (e.g.
|
||||
// run-tests/ with name: test), the frontmatter name wins.
|
||||
const { name: extractedName, description: extractedDescription } = extractNameAndDescription(tmplContent);
|
||||
const skillName = extractedName || path.basename(path.dirname(tmplPath));
|
||||
|
||||
|
||||
// Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
|
||||
const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
|
||||
const benefitsFrom = benefitsMatch
|
||||
? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
|
||||
: undefined;
|
||||
|
||||
// Extract preamble-tier from frontmatter (1-4, controls which preamble sections are included)
|
||||
const tierMatch = tmplContent.match(/^preamble-tier:\s*(\d+)$/m);
|
||||
const preambleTier = tierMatch ? parseInt(tierMatch[1], 10) : undefined;
|
||||
|
||||
// Extract interactive flag from frontmatter (generator-only; controls plan-mode handshake inclusion)
|
||||
const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
|
||||
const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
|
||||
|
||||
const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host], preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL };
|
||||
|
||||
// Replace placeholders (supports parameterized: {{NAME:arg1:arg2}})
|
||||
// Config-driven: suppressedResolvers return empty string for this host.
|
||||
// effectiveSuppressedResolvers() honors --respect-detection: when gbrain
|
||||
// is detected locally, GBRAIN_* resolvers un-suppress so brain-aware
|
||||
// blocks render for users who have gbrain installed.
|
||||
const currentHostConfig = getHostConfig(host);
|
||||
const suppressed = effectiveSuppressedResolvers(currentHostConfig);
|
||||
let content = tmplContent.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (match, fullKey) => {
|
||||
const parts = fullKey.split(':');
|
||||
const resolverName = parts[0];
|
||||
const args = parts.slice(1);
|
||||
if (suppressed.has(resolverName)) return '';
|
||||
const entry = RESOLVERS[resolverName];
|
||||
if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
|
||||
const { resolve, appliesTo } = unwrapResolver(entry);
|
||||
if (appliesTo && !appliesTo(ctx)) return '';
|
||||
return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
|
||||
});
|
||||
const ctx = buildContext(tmplContent, tmplPath, host);
|
||||
const skillName = ctx.skillName;
|
||||
|
||||
// Check for any remaining unresolved placeholders
|
||||
const remaining = content.match(/\{\{(\w+(?::[^}]+)?)\}\}/g);
|
||||
if (remaining) {
|
||||
throw new Error(`Unresolved placeholders in ${relTmplPath}: ${remaining.join(', ')}`);
|
||||
}
|
||||
// Replace placeholders + assert none remain (shared path with section generation).
|
||||
let content = resolvePlaceholders(tmplContent, ctx, currentHostConfig, relTmplPath);
|
||||
|
||||
// Preprocess voice triggers: fold into description, strip field from frontmatter.
|
||||
// Must run BEFORE transformFrontmatter so all hosts see the updated description,
|
||||
|
|
@ -742,6 +825,58 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
|
|||
return { outputPath, content, symlinkLoop, catalogParts };
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate one on-demand section file (`<skill>/sections/<name>.md.tmpl` →
|
||||
* `<name>.md`). Sections are BODY FRAGMENTS — no frontmatter, no catalog trim,
|
||||
* no voice triggers. They resolve placeholders through the SAME path as
|
||||
* SKILL.md (resolvePlaceholders) using the PARENT skill's TemplateContext
|
||||
* (so appliesTo gating + tier behave identically — a section's {{PREAMBLE}}-
|
||||
* style resolver renders the same content it would in the parent, not empty).
|
||||
*
|
||||
* Output routing mirrors SKILL.md: Claude writes in-tree at
|
||||
* `<skill>/sections/<name>.md`; external hosts write to
|
||||
* `<hostSubdir>/skills/<externalName>/sections/<name>.md`. External hosts get
|
||||
* applyHostRewrites so cross-references resolve per host.
|
||||
*/
|
||||
function processSectionTemplate(
|
||||
sectionTmplPath: string,
|
||||
skillDir: string,
|
||||
host: Host = 'claude',
|
||||
): { outputPath: string; content: string } {
|
||||
const tmplContent = fs.readFileSync(sectionTmplPath, 'utf-8');
|
||||
const relTmplPath = path.relative(ROOT, sectionTmplPath);
|
||||
const hostConfig = getHostConfig(host);
|
||||
|
||||
// Read the owning SKILL.md.tmpl so the section inherits the parent's name +
|
||||
// tier + benefits-from (TemplateContext parity). Fall back to the dir name.
|
||||
const parentTmplPath = path.join(ROOT, skillDir, 'SKILL.md.tmpl');
|
||||
const parentContent = fs.existsSync(parentTmplPath) ? fs.readFileSync(parentTmplPath, 'utf-8') : '';
|
||||
const parentName = (parentContent && extractNameAndDescription(parentContent).name) || skillDir;
|
||||
const ctx = buildContext(parentContent || tmplContent, parentTmplPath, host, parentName);
|
||||
|
||||
// Resolve placeholders against the section body (shared guard catches stragglers).
|
||||
let content = resolvePlaceholders(tmplContent, ctx, hostConfig, relTmplPath);
|
||||
|
||||
// External hosts: rewrite cross-reference paths/tools (no frontmatter to transform).
|
||||
if (host !== 'claude') {
|
||||
content = applyHostRewrites(content, hostConfig);
|
||||
}
|
||||
|
||||
// Plain generated header (no frontmatter to insert after).
|
||||
content = GENERATED_HEADER.replace('{{SOURCE}}', path.basename(sectionTmplPath)) + content;
|
||||
|
||||
const fileName = path.basename(sectionTmplPath).replace(/\.tmpl$/, '');
|
||||
let outputPath: string;
|
||||
if (host === 'claude') {
|
||||
outputPath = path.join(ROOT, skillDir, 'sections', fileName);
|
||||
} else {
|
||||
const externalName = externalSkillName(skillDir, parentName);
|
||||
outputPath = path.join(ROOT, hostConfig.hostSubdir, 'skills', externalName, 'sections', fileName);
|
||||
}
|
||||
if (!DRY_RUN) fs.mkdirSync(path.dirname(outputPath), { recursive: true });
|
||||
return { outputPath, content };
|
||||
}
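
The shared placeholder pass that both `processTemplate` and `processSectionTemplate` now call is not shown in this hunk. Below is a minimal sketch of how such a pass could work, reconstructed from the inline logic it replaces. The regex and the suppressed / unknown / unresolved checks come from the diff; `resolveAll`, `Ctx`, the demo resolver table, and the bounded multi-pass loop are assumptions made for illustration, not the repository's implementation.

```typescript
// Hypothetical sketch: only the placeholder regex is taken from the diff.
type Ctx = { host: string; skillName: string };
type Resolver = (ctx: Ctx, args: string[]) => string;

const PLACEHOLDER = /\{\{(\w+(?::[^}]+)?)\}\}/g;

function resolveAll(
  template: string,
  ctx: Ctx,
  resolvers: Record<string, Resolver>,
  suppressed: Set<string> = new Set(),
  maxPasses = 5, // assumed bound so resolver output that emits tokens cannot loop forever
): string {
  let content = template;
  for (let pass = 0; pass < maxPasses; pass++) {
    let replaced = false;
    content = content.replace(PLACEHOLDER, (_match: string, fullKey: string) => {
      replaced = true;
      const parts = fullKey.split(':');
      const name = parts[0] ?? '';
      const args = parts.slice(1); // {{NAME:arg1:arg2}} -> ['arg1', 'arg2']
      if (suppressed.has(name)) return ''; // suppressed resolvers render nothing
      const resolve = resolvers[name];
      if (!resolve) throw new Error(`Unknown placeholder {{${name}}}`);
      return resolve(ctx, args);
    });
    if (!replaced) break; // nothing left to expand
  }
  // Guard: any token still present after the passes is an error.
  const remaining = content.match(PLACEHOLDER);
  if (remaining) throw new Error(`Unresolved placeholders: ${remaining.join(', ')}`);
  return content;
}

// Demo resolvers (all hypothetical). WRAP emits another token, which the second pass expands.
const demoResolvers: Record<string, Resolver> = {
  GREETING: (ctx) => `hello from ${ctx.host}`,
  WRAP: (_ctx, args) => `[${args.join('|')}] {{GREETING}}`,
  HIDDEN: () => 'never rendered',
};

const rendered = resolveAll(
  'A: {{WRAP:x:y}} B:{{HIDDEN}}',
  { host: 'claude', skillName: 'ship' },
  demoResolvers,
  new Set(['HIDDEN']),
);
console.log(rendered);
```

Looping until no token is replaced is one way to let inlined section content keep its own `{{RESOLVER}}` tokens; the pass limit is a design choice of this sketch, not something the diff states.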
|
||||
|
||||
// ─── Main ───────────────────────────────────────────────────
|
||||
|
||||
function findTemplates(): string[] {
|
||||
|
|
@ -833,6 +968,42 @@ for (const currentHost of hostsToRun) {
|
|||
}
|
||||
}
|
||||
|
||||
// ─── Section generation (v2 plan T9, Claude-first carve) ───
|
||||
// On-demand sections/*.md for carved skills. Generated for CLAUDE ONLY:
|
||||
// every other host inlines section content via the {{SECTION:id}} resolver
|
||||
// (keeping the full monolith skill), so they need no section files and we
|
||||
// sidestep host-portable section paths until that plumbing lands. No-op for
|
||||
// any skill without a sections/ dir. Mirrors the SKILL.md DRY_RUN handling so
|
||||
// sections participate in the freshness gate.
|
||||
for (const sec of currentHost === 'claude' ? discoverSectionTemplates(ROOT) : []) {
|
||||
if (currentHostConfig.generation.includeSkills?.length &&
|
||||
!currentHostConfig.generation.includeSkills.includes(sec.skillDir)) continue;
|
||||
if (currentHostConfig.generation.skipSkills?.length &&
|
||||
currentHostConfig.generation.skipSkills.includes(sec.skillDir)) continue;
|
||||
|
||||
const { outputPath, content } = processSectionTemplate(path.join(ROOT, sec.tmpl), sec.skillDir, currentHost);
|
||||
const relOutput = path.relative(ROOT, outputPath);
|
||||
|
||||
if (DRY_RUN) {
|
||||
const existing = fs.existsSync(outputPath) ? fs.readFileSync(outputPath, 'utf-8') : '';
|
||||
if (existing !== content) {
|
||||
console.log(`STALE: ${relOutput}`);
|
||||
hasChanges = true;
|
||||
} else {
|
||||
console.log(`FRESH: ${relOutput}`);
|
||||
}
|
||||
} else {
|
||||
fs.writeFileSync(outputPath, content);
|
||||
console.log(`GENERATED: ${relOutput}`);
|
||||
}
|
||||
|
||||
tokenBudget.push({
|
||||
skill: relOutput,
|
||||
lines: content.split('\n').length,
|
||||
tokens: Math.round(content.length / 4),
|
||||
});
|
||||
}
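
The dry-run branch above amounts to a string comparison between the file on disk and the freshly generated content, followed by a rough size estimate. A small sketch of those two pieces as standalone helpers follows; `checkFreshness` and `budgetEntry` are hypothetical names, since the generator inlines this logic in its host loop.

```typescript
// Hypothetical helpers mirroring the inline DRY_RUN and tokenBudget logic.
type Freshness = 'FRESH' | 'STALE';

function checkFreshness(existing: string | null, generated: string): Freshness {
  // A missing file is compared as empty content, so it reads as STALE
  // unless the generated output is itself empty.
  return (existing ?? '') === generated ? 'FRESH' : 'STALE';
}

function budgetEntry(skill: string, content: string): { skill: string; lines: number; tokens: number } {
  return {
    skill,
    lines: content.split('\n').length,
    tokens: Math.round(content.length / 4), // same length / 4 estimate as the loop above
  };
}

const demoEntry = budgetEntry('ship/sections/tests.md', 'abcd\nefg');
console.log(checkFreshness(null, 'abcd\nefg'), demoEntry);
```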
|
||||
|
||||
// Generate gstack-lite and gstack-full for OpenClaw host
|
||||
if (currentHost === 'openclaw' && !DRY_RUN) {
|
||||
const openclawDir = path.join(ROOT, 'openclaw');
|
||||
|
|
@ -959,10 +1130,14 @@ The orchestrator will persist the plan link to its own memory/knowledge store.
|
|||
}
|
||||
}
|
||||
|
||||
// --host all: report failures. Only exit(1) if claude failed.
|
||||
// --host all: any host failure fails the build. Previously only claude failures
|
||||
// exited nonzero, which let a stale or broken external-host output (e.g. a
|
||||
// section that failed to generate for Factory) slip through the freshness gate
|
||||
// silently. With sections fanned out across every host, "all hosts regenerated
|
||||
// in the same commit" is only a real gate if every host failure is fatal here.
|
||||
if (failures.length > 0 && HOST_ARG_VAL === 'all') {
|
||||
console.error(`\n${failures.length} host(s) failed: ${failures.map(f => f.host).join(', ')}`);
|
||||
if (failures.some(f => f.host === 'claude')) process.exit(1);
|
||||
process.exit(1);
|
||||
}
|
||||
// Single host dry-run failure already handled above
|
||||
|
||||
|
|
|
|||
|
|
@ -34,6 +34,7 @@ import { generateGBrainContextLoad, generateGBrainSaveResults, generateBrainPref
|
|||
import { generateQuestionPreferenceCheck, generateQuestionLog, generateInlineTuneFeedback } from './question-tuning';
|
||||
import { generateMakePdfSetup } from './make-pdf';
|
||||
import { generateTasksSectionEmit, generateTasksSectionAggregate } from './tasks-section';
|
||||
import { SECTION, SECTION_INDEX } from './sections';
|
||||
import { generateRedactTaxonomyTable, generateRedactInvocationBlock } from './redact-doc';
|
||||
|
||||
export const RESOLVERS: Record<string, ResolverValue> = {
|
||||
|
|
@ -98,4 +99,6 @@ export const RESOLVERS: Record<string, ResolverValue> = {
|
|||
MAKE_PDF_SETUP: generateMakePdfSetup,
|
||||
TASKS_SECTION_EMIT: generateTasksSectionEmit,
|
||||
TASKS_SECTION_AGGREGATE: generateTasksSectionAggregate,
|
||||
SECTION,
|
||||
SECTION_INDEX,
|
||||
};
|
||||
|
|
|
|||
|
|
@ -0,0 +1,96 @@
|
|||
/**
|
||||
* Section resolvers (v2 plan T9, Claude-first carve).
|
||||
*
|
||||
* A carved skill keeps its prose-heavy steps in `<skill>/sections/<id>.md`, read
|
||||
* on demand. The SAME template ships to every host, so these resolvers make the
|
||||
* carve host-aware:
|
||||
*
|
||||
* - On CLAUDE: {{SECTION:id}} emits a STOP-Read pointer to the generated section
|
||||
* file (the skeleton), and the section .md is generated + installed separately.
|
||||
* - On every OTHER host: {{SECTION:id}} INLINES the section template's content,
|
||||
* so external hosts keep the full monolith ship skill (no section files, no
|
||||
* host-portable-path problem). Inlined content keeps its own {{RESOLVER}}
|
||||
* tokens, which the generator's multi-pass resolve expands.
|
||||
*
|
||||
* {{SECTION_INDEX:skill}} renders the situation→section table from the PASSIVE
|
||||
* manifest on Claude (empty on other hosts — they have no sections). The manifest
|
||||
* is the single source of id/file/title/trigger text (CM2; v2_PLAN.md:663).
|
||||
*/
|
||||
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import type { ResolverFn, TemplateContext } from './types';
|
||||
|
||||
const ROOT = path.resolve(import.meta.dir, '..', '..');
|
||||
|
||||
interface SectionEntry {
|
||||
id: string;
|
||||
file: string;
|
||||
title: string;
|
||||
trigger: string;
|
||||
}
|
||||
interface SectionManifest {
|
||||
skill: string;
|
||||
sections: SectionEntry[];
|
||||
}
|
||||
|
||||
function loadManifest(skill: string): SectionManifest {
|
||||
const p = path.join(ROOT, skill, 'sections', 'manifest.json');
|
||||
const raw = fs.readFileSync(p, 'utf-8');
|
||||
return JSON.parse(raw) as SectionManifest;
|
||||
}
|
||||
|
||||
function findSection(skill: string, id: string): SectionEntry {
|
||||
const entry = loadManifest(skill).sections.find(s => s.id === id);
|
||||
if (!entry) {
|
||||
throw new Error(`{{SECTION:${id}}} — no section "${id}" in ${skill}/sections/manifest.json`);
|
||||
}
|
||||
return entry;
|
||||
}
|
||||
|
||||
/**
|
||||
* {{SECTION:id}} — pointer on Claude, inline on other hosts.
|
||||
* Claude path uses the stable gstack-root install (`{skillRoot}/{skill}/sections/`),
|
||||
* which always exists, instead of a naked relative path (Codex outside-voice #7).
|
||||
*/
|
||||
export const SECTION: ResolverFn = (ctx: TemplateContext, args?: string[]): string => {
|
||||
const id = args?.[0];
|
||||
if (!id) throw new Error('{{SECTION:id}} requires a section id');
|
||||
const entry = findSection(ctx.skillName, id);
|
||||
|
||||
if (ctx.host === 'claude') {
|
||||
const sectionPath = `${ctx.paths.skillRoot}/${ctx.skillName}/sections/${entry.file}`;
|
||||
return [
|
||||
`> **STOP.** Before ${entry.trigger}, Read \`${sectionPath}\` and execute it`,
|
||||
`> in full. Do not work from memory — that section is the source of truth for this step.`,
|
||||
].join('\n');
|
||||
}
|
||||
|
||||
// Non-Claude hosts inline the section template content (monolith preserved).
|
||||
// Inner {{RESOLVER}} tokens are expanded by the generator's multi-pass resolve.
|
||||
const tmplPath = path.join(ROOT, ctx.skillName, 'sections', `${entry.file}.tmpl`);
|
||||
return fs.readFileSync(tmplPath, 'utf-8').trimEnd();
|
||||
};
|
||||
|
||||
/**
|
||||
* {{SECTION_INDEX:skill}} — situation→section table from the passive manifest.
|
||||
* Claude only; other hosts inline everything so an index would be noise.
|
||||
*/
|
||||
export const SECTION_INDEX: ResolverFn = (ctx: TemplateContext, args?: string[]): string => {
|
||||
if (ctx.host !== 'claude') return '';
|
||||
const skill = args?.[0] ?? ctx.skillName;
|
||||
const manifest = loadManifest(skill);
|
||||
const lines: string[] = [
|
||||
'## Section index — Read each section when its situation applies',
|
||||
'',
|
||||
'This skill is a decision-tree skeleton. The steps below point to on-demand',
|
||||
'sections. Read a section in full before doing its step; do not work from memory.',
|
||||
'',
|
||||
'| When | Read this section |',
|
||||
'|------|-------------------|',
|
||||
];
|
||||
for (const s of manifest.sections) {
|
||||
lines.push(`| ${s.trigger} | \`sections/${s.file}\` |`);
|
||||
}
|
||||
return lines.join('\n');
|
||||
};
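
To make the manifest-driven rendering concrete, here is a self-contained sketch that skips the file reads and works on an in-memory manifest. The section ids `tests` and `greptile` appear in the ship template in this diff, but the `file`, `title`, and `trigger` values below are invented for illustration, as are the helper names `renderIndexRows` and `renderPointer`; the real `ship/sections/manifest.json` is not shown here.

```typescript
interface SectionEntry { id: string; file: string; title: string; trigger: string }
interface SectionManifest { skill: string; sections: SectionEntry[] }

// Hypothetical manifest contents (assumed shape matches the interfaces above).
const demoManifest: SectionManifest = {
  skill: 'ship',
  sections: [
    { id: 'tests', file: 'tests.md', title: 'Run tests', trigger: 'running the test suites' },
    { id: 'greptile', file: 'greptile.md', title: 'Greptile triage', trigger: 'addressing Greptile comments' },
  ],
};

// One table row per section, in the same format SECTION_INDEX pushes.
function renderIndexRows(manifest: SectionManifest): string[] {
  return manifest.sections.map(s => `| ${s.trigger} | \`sections/${s.file}\` |`);
}

// First line of the Claude-side STOP pointer for a single section id.
function renderPointer(manifest: SectionManifest, id: string, skillRoot: string): string {
  const entry = manifest.sections.find(s => s.id === id);
  if (!entry) throw new Error(`no section "${id}" in ${manifest.skill}/sections/manifest.json`);
  const sectionPath = `${skillRoot}/${manifest.skill}/sections/${entry.file}`;
  return `> **STOP.** Before ${entry.trigger}, Read \`${sectionPath}\` and execute it`;
}

const demoRows = renderIndexRows(demoManifest);
const demoPointer = renderPointer(demoManifest, 'tests', '~/.claude/skills/gstack');
console.log(demoRows.join('\n'));
console.log(demoPointer);
```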
|
||||
22
setup
22
setup
|
|
@ -569,6 +569,14 @@ link_claude_skill_dirs() {
|
|||
# Validate target isn't a symlink before creating the link
|
||||
if [ -L "$target/SKILL.md" ]; then rm "$target/SKILL.md"; fi
|
||||
_link_or_copy "$gstack_dir/$dir_name/SKILL.md" "$target/SKILL.md"
|
||||
# Link the sections/ subdir for carved skills (v2 plan T9). The prefixed
|
||||
# Claude skill dir otherwise holds only SKILL.md, so a runtime
|
||||
# "Read sections/<name>.md" 404s. Route through _link_or_copy so Windows
|
||||
# gets a fresh copy (and re-copies on every ./setup, refreshing staleness).
|
||||
if [ -d "$gstack_dir/$dir_name/sections" ]; then
|
||||
if [ -e "$target/sections" ] || [ -L "$target/sections" ]; then rm -rf "$target/sections"; fi
|
||||
_link_or_copy "$gstack_dir/$dir_name/sections" "$target/sections"
|
||||
fi
|
||||
linked+=("$link_name")
|
||||
fi
|
||||
done
|
||||
|
|
@ -1144,6 +1152,20 @@ if [ "$INSTALL_KIRO" -eq 1 ]; then
|
|||
-e "s|~/.codex/skills/gstack|~/.kiro/skills/gstack|g" \
|
||||
-e "s|~/.claude/skills/gstack|~/.kiro/skills/gstack|g" \
|
||||
"$skill_dir/SKILL.md" > "$target_dir/SKILL.md"
|
||||
# Carved skills (v2 plan T9): rewrite + copy each sections/*.md the same way,
|
||||
# so a runtime "Read sections/<name>.md" resolves under ~/.kiro and doesn't
|
||||
# leak a ~/.codex or ~/.claude path. Kiro builds from the codex output, so
|
||||
# these section files only exist for skills that have been carved.
|
||||
if [ -d "$skill_dir/sections" ]; then
|
||||
mkdir -p "$target_dir/sections"
|
||||
for section_file in "$skill_dir/sections"/*; do
|
||||
[ -f "$section_file" ] || continue
|
||||
sed -e 's|\$HOME/.codex/skills/gstack|$HOME/.kiro/skills/gstack|g' \
|
||||
-e "s|~/.codex/skills/gstack|~/.kiro/skills/gstack|g" \
|
||||
-e "s|~/.claude/skills/gstack|~/.kiro/skills/gstack|g" \
|
||||
"$section_file" > "$target_dir/sections/$(basename "$section_file")"
|
||||
done
|
||||
fi
|
||||
done
|
||||
echo "gstack ready (kiro)."
|
||||
echo " browse: $BROWSE_BIN"
|
||||
|
|
|
|||
|
|
@ -2,7 +2,7 @@
|
|||
name: setup-gbrain
|
||||
preamble-tier: 2
|
||||
version: 1.0.0
|
||||
description: Set up gbrain for this coding agent: install the CLI, initialize a local PGLite or Supabase brain, register MCP, capture per-remote trust policy. (gstack)
|
||||
description: "Set up gbrain for this coding agent: install the CLI, initialize a local PGLite or Supabase brain, register MCP, capture per-remote trust policy. (gstack)"
|
||||
triggers:
|
||||
- setup gbrain
|
||||
- install gbrain
|
||||
|
|
|
|||
1997
ship/SKILL.md
File diff suppressed because it is too large
|
|
@ -71,6 +71,10 @@ Never skip a verification step because a prior `/ship` run already performed it.
|
|||
|
||||
---
|
||||
|
||||
{{SECTION_INDEX:ship}}
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Pre-flight
|
||||
|
||||
1. Check the current branch. If on the base branch or the repo's default branch, **abort**: "You're on the base branch. Ship from a feature branch."
|
||||
|
|
@ -139,432 +143,53 @@ git fetch origin <base> && git merge origin/<base> --no-edit
|
|||
|
||||
---
|
||||
|
||||
## Step 4: Test Framework Bootstrap
|
||||
{{SECTION:tests}}
|
||||
|
||||
{{TEST_BOOTSTRAP}}
|
||||
{{SECTION:test-coverage}}
|
||||
|
||||
---
|
||||
{{SECTION:plan-completion}}
|
||||
|
||||
## Step 5: Run tests (on merged code)
|
||||
{{SECTION:review-army}}
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
`db:test:prepare` internally, which loads the schema into the correct lane database.
|
||||
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
|
||||
{{SECTION:greptile}}
|
||||
|
||||
Run both test suites in parallel:
|
||||
|
||||
```bash
|
||||
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
|
||||
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
|
||||
wait
|
||||
```
|
||||
|
||||
After both complete, read the output files and check pass/fail.
|
||||
|
||||
**If any test fails:** Do NOT immediately stop. Apply the Test Failure Ownership Triage:
|
||||
|
||||
{{TEST_FAILURE_TRIAGE}}
|
||||
|
||||
**After triage:** If any in-branch failures remain unfixed, **STOP**. Do not proceed. If all failures were pre-existing and handled (fixed, TODOed, assigned, or skipped), continue to Step 6.
|
||||
|
||||
**If all pass:** Continue silently — just note the counts briefly.
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Eval Suites (conditional)
|
||||
|
||||
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
|
||||
|
||||
**1. Check if the diff touches prompt-related files:**
|
||||
|
||||
```bash
|
||||
git diff origin/<base> --name-only
|
||||
```
|
||||
|
||||
Match against these patterns (from CLAUDE.md):
|
||||
- `app/services/*_prompt_builder.rb`
|
||||
- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
|
||||
- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
|
||||
- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
|
||||
- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
|
||||
- `config/system_prompts/*.txt`
|
||||
- `test/evals/**/*` (eval infrastructure changes affect all suites)
|
||||
|
||||
**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 9.
|
||||
|
||||
**2. Identify affected eval suites:**
|
||||
|
||||
Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
|
||||
|
||||
```bash
|
||||
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
|
||||
```
|
||||
|
||||
Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
|
||||
|
||||
**Special cases:**
|
||||
- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
|
||||
- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
|
||||
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
|
||||
|
||||
**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
|
||||
|
||||
`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
|
||||
|
||||
```bash
|
||||
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
|
||||
```
|
||||
|
||||
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
|
||||
|
||||
**4. Check results:**
|
||||
|
||||
- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
|
||||
- **If all pass:** Note pass counts and cost. Continue to Step 9.
|
||||
|
||||
**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 19).
|
||||
|
||||
**Tier reference (for context — /ship always uses `full`):**
|
||||
| Tier | When | Speed (cached) | Cost |
|
||||
|------|------|----------------|------|
|
||||
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
|
||||
| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
|
||||
| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Test Coverage Audit
|
||||
|
||||
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent runs the coverage audit in a fresh context window — the parent only sees the conclusion, not intermediate file reads. This is context-rot defense.
|
||||
|
||||
**Subagent prompt:** Pass the following instructions to the subagent, with `<base>` substituted with the base branch:
|
||||
|
||||
> You are running a ship-workflow test coverage audit. Run `git diff <base>...HEAD` as needed. Do not commit or push — report only.
|
||||
>
|
||||
> {{TEST_COVERAGE_AUDIT_SHIP}}
|
||||
>
|
||||
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"coverage_pct":N,"gaps":N,"diagram":"<full markdown coverage diagram for PR body>","tests_added":["path",...]}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Read the subagent's final output. Parse the LAST line as JSON.
|
||||
2. Store `coverage_pct` (for Step 20 metrics), `gaps` (user summary), `tests_added` (for the commit).
|
||||
3. Embed `diagram` verbatim in the PR body's `## Test Coverage` section (Step 19).
|
||||
4. Print a one-line summary: `Coverage: {coverage_pct}%, {gaps} gaps. {tests_added.length} tests added.`
|
||||
|
||||
**If the subagent fails, times out, or returns invalid JSON:** Fall back to running the audit inline in the parent. Do not block /ship on subagent failure — partial results are better than none.
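
The last-line JSON contract in the parent-processing steps can be sketched as follows. This is an illustrative helper, not part of the skill: `parseLastLineJson` and `summarize` are hypothetical names, and returning `null` stands in for "fall back to running the audit inline".

```typescript
interface CoverageResult {
  coverage_pct: number;
  gaps: number;
  diagram: string;
  tests_added: string[];
}

// Returns null when the caller should fall back to the inline audit.
function parseLastLineJson(output: string): CoverageResult | null {
  const lines = output.split('\n').map(l => l.trim()).filter(l => l.length > 0);
  const last = lines[lines.length - 1];
  if (last === undefined) return null; // empty output
  try {
    const parsed: unknown = JSON.parse(last);
    if (typeof parsed !== 'object' || parsed === null) return null;
    const r = parsed as Partial<CoverageResult>;
    // Validate every field the parent goes on to use.
    if (typeof r.coverage_pct !== 'number' || typeof r.gaps !== 'number') return null;
    if (typeof r.diagram !== 'string' || !Array.isArray(r.tests_added)) return null;
    return r as CoverageResult;
  } catch {
    return null; // last line was not valid JSON
  }
}

// One-line summary in the format the step asks the parent to print.
function summarize(r: CoverageResult): string {
  return `Coverage: ${r.coverage_pct}%, ${r.gaps} gaps. ${r.tests_added.length} tests added.`;
}

const demoOutput =
  'Audit notes...\n{"coverage_pct":82,"gaps":3,"diagram":"(diagram)","tests_added":["test/a.test.ts"]}\n';
const demoResult = parseLastLineJson(demoOutput);
console.log(demoResult ? summarize(demoResult) : 'fallback: run audit inline');
```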
|
||||
|
||||
---
|
||||
|
||||
## Step 8: Plan Completion Audit
|
||||
|
||||
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent reads the plan file and every referenced code file in its own fresh context. Parent gets only the conclusion.
|
||||
|
||||
**Subagent prompt:** Pass these instructions to the subagent:
|
||||
|
||||
> You are running a ship-workflow plan completion audit. The base branch is `<base>`. Use `git diff <base>...HEAD` to see what shipped. Do not commit or push — report only.
|
||||
>
|
||||
> {{PLAN_COMPLETION_AUDIT_SHIP}}
|
||||
>
|
||||
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"unverifiable":N,"summary":"<markdown checklist for PR body>"}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Parse the LAST line of the subagent's output as JSON.
|
||||
2. Store `done`, `deferred`, `unverifiable` for Step 20 metrics; use `summary` in PR body.
|
||||
3. If `deferred > 0` or `unverifiable > 0` and no user override, present the items via the appropriate AskUserQuestion (see Gate Logic priority order above) before continuing.
|
||||
4. Embed `summary` in PR body's `## Plan Completion` section (Step 19). If `unverifiable > 0` and the user picked option A in the UNVERIFIABLE gate, also embed `## Plan Completion — Manual Verifications` listing each user-confirmed item.
|
||||
|
||||
**If the subagent fails or returns invalid JSON:** Fall back to running the audit inline (parent processes the same plan-extraction + classification logic). If the inline fallback also fails (e.g., plan file unreadable, parser error), do NOT silently pass — surface the failure as an explicit AskUserQuestion: "Plan Completion audit could not run ({reason}). Options: (A) Skip audit and ship anyway — record that the audit was skipped in PR body and Step 20 metrics; (B) Stop and fix the audit." Default and recommended option is (B). Silent fail-open is the failure shape that VAS-449 surfaced.
|
||||
|
||||
---
|
||||
|
||||
{{PLAN_VERIFICATION_EXEC}}
|
||||
|
||||
{{LEARNINGS_SEARCH:query=release ship version changelog merge pr}}
|
||||
|
||||
{{SCOPE_DRIFT}}
|
||||
|
||||
---
|
||||
|
||||
## Step 9: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
|
||||
1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
|
||||
|
||||
2. Run `git diff origin/<base>` to get the full diff (scoped to feature changes against the freshly-fetched base branch).
|
||||
|
||||
3. Apply the review checklist in two passes:
|
||||
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
|
||||
- **Pass 2 (INFORMATIONAL):** All remaining categories
|
||||
|
||||
{{CONFIDENCE_CALIBRATION}}
|
||||
|
||||
{{DESIGN_REVIEW_LITE}}
|
||||
|
||||
Include any design findings alongside the code review findings. They follow the same Fix-First flow below.
|
||||
|
||||
{{REVIEW_ARMY}}
|
||||
|
||||
{{CROSS_REVIEW_DEDUP}}
|
||||
|
||||
4. **Classify each finding from both the checklist pass and specialist review (Step 9.1-Step 9.2) as AUTO-FIX or ASK** per the Fix-First Heuristic in
|
||||
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
|
||||
|
||||
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
|
||||
`[AUTO-FIXED] [file:line] Problem → what you did`
|
||||
|
||||
6. **If ASK items remain,** present them in ONE AskUserQuestion:
|
||||
- List each with number, severity, problem, recommended fix
|
||||
- Per-item options: A) Fix B) Skip
|
||||
- Overall RECOMMENDATION
|
||||
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
|
||||
|
||||
7. **After all fixes (auto + user-approved):**
|
||||
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
|
||||
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 12.
|
||||
|
||||
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
|
||||
|
||||
If no issues found: `Pre-Landing Review: No issues found.`
|
||||
|
||||
9. Persist the review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
|
||||
```
|
||||
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
|
||||
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
|
||||
- `quality_score` = the PR Quality Score computed in Step 9.2 (e.g., 7.5). If specialists were skipped (small diff), use `10.0`
|
||||
- `specialists` = the per-specialist stats object compiled in Step 9.2. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}`
|
||||
- `findings` = array of per-finding records. For each finding (from checklist pass and specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"`, `"fixed"` (user approved), or `"skipped"` (user chose Skip).
|
||||
|
||||
Save the review output — it goes into the PR body in Step 19.
|
||||
|
||||
---
|
||||
|
||||
## Step 10: Address Greptile review comments (if PR exists)
|
||||
|
||||
**Dispatch the fetch + classification as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent pulls every Greptile comment, runs the escalation detection algorithm, and classifies each comment. Parent receives a structured list and handles user interaction + file edits.
|
||||
|
||||
**Subagent prompt:**
|
||||
|
||||
> You are classifying Greptile review comments for a /ship workflow. Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps. Do NOT fix code, do NOT reply to comments, do NOT commit — report only.
|
||||
>
|
||||
> For each comment, assign: `classification` (`valid_actionable`, `already_fixed`, `false_positive`, `suppressed`), `escalation_tier` (1 or 2), the file:line or [top-level] tag, body summary, and permalink URL.
|
||||
>
|
||||
> If no PR exists, `gh` fails, the API errors, or there are zero comments, output: `{"total":0,"comments":[]}` and stop.
|
||||
>
|
||||
> Otherwise, output a single JSON object on the LAST LINE of your response:
|
||||
> `{"total":N,"comments":[{"classification":"...","escalation_tier":N,"ref":"file:line","summary":"...","permalink":"url"},...]}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
Parse the LAST line as JSON.
|
||||
|
||||
If `total` is 0, skip this step silently. Continue to Step 12.
|
||||
|
||||
Otherwise, print: `+ {total} Greptile comments ({valid_actionable} valid, {already_fixed} already fixed, {false_positive} FP)`.
|
||||
|
||||
For each comment in `comments`:
|
||||
|
||||
**VALID & ACTIONABLE:** Use AskUserQuestion with:
|
||||
- The comment (file:line or [top-level] + body summary + permalink URL)
|
||||
- `RECOMMENDATION: Choose A because [one-line reason]`
|
||||
- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive
|
||||
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
|
||||
- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).
|
||||
|
||||
**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
|
||||
- Include what was done and the fixing commit SHA
|
||||
- Save to both per-project and global greptile-history (type: already-fixed)
|
||||
|
||||
**FALSE POSITIVE:** Use AskUserQuestion:
|
||||
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
|
||||
- Options:
|
||||
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
|
||||
- B) Fix it anyway (if trivial)
|
||||
- C) Ignore silently
|
||||
- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)
|
||||
|
||||
**SUPPRESSED:** Skip silently — these are known false positives from previous triage.
|
||||
|
||||
**After all comments are resolved:** If any fixes were applied, the tests from Step 5 are now stale. **Re-run tests** (Step 5) before continuing to Step 12. If no fixes were applied, continue to Step 12.
|
||||
|
||||
---
|
||||
|
||||
{{ADVERSARIAL_STEP}}
|
||||
|
||||
{{LEARNINGS_LOG}}
|
||||
|
||||
{{GBRAIN_SAVE_RESULTS}}
|
||||
|
||||
### Refresh learnings for the headline feature on this branch
|
||||
|
||||
The top-of-skill learnings pull was keyed to "release ship" broadly. Before the VERSION/CHANGELOG step, re-pull learnings keyed to THIS branch's headline feature so any prior version-bump or CHANGELOG pitfalls for similar features surface.
|
||||
|
||||
Pick ONE keyword that names the headline feature you're shipping. The keyword should be a noun: the primary skill or module name, the central feature noun, or the binary you changed. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.
|
||||
|
||||
Worked examples (ship-specific): good keywords are `learnings-search`, `pacing`, `worktree-ship`. Bad: `the branch headline`, `v1.31.1.0`, `feat: token-or search`.
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
|
||||
```
|
||||
|
||||
If any learnings come back, name which one applies to the version bump or CHANGELOG framing in one sentence. If none come back, continue without reference — the absence is itself useful information.
|
||||
{{SECTION:adversarial}}
|
||||
|
||||
## Step 12: Version bump (auto-decide)
|
||||
|
||||
**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).
|
||||
|
||||
```bash
if ! git rev-parse --verify origin/<base> >/dev/null 2>&1; then
  echo "ERROR: Unable to resolve origin/<base>. Run 'git fetch origin' or verify the base branch exists."
  exit 1
fi

BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
PKG_VERSION=""
PKG_EXISTS=0
if [ -f package.json ]; then
  PKG_EXISTS=1
  if command -v node >/dev/null 2>&1; then
    PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  elif command -v bun >/dev/null 2>&1; then
    PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  else
    echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
    exit 1
  fi
  if [ "$PARSE_EXIT" != "0" ]; then
    echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
    exit 1
  fi
fi
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"

if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_UNEXPECTED"
    echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
    echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
    exit 1
  fi
  echo "STATE: FRESH"
else
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_STALE_PKG"
  else
    echo "STATE: ALREADY_BUMPED"
  fi
fi
```
|
||||
|
||||
Read the `STATE:` line and dispatch:
|
||||
|
||||
- **FRESH** → proceed with the bump action below (steps 1–4).
|
||||
- **ALREADY_BUMPED** → skip the bump by default, BUT check for queue drift first: call `bin/gstack-next-version` with the implied bump level (derived from `CURRENT_VERSION` vs `BASE_VERSION`), compare its `.version` against `CURRENT_VERSION`. If they differ (queue moved since last ship), use **AskUserQuestion**: "VERSION drift detected: you claim v<CURRENT> but next available is v<NEW> (queue moved). A) Rebump to v<NEW> and rewrite CHANGELOG header + PR title (recommended), B) Keep v<CURRENT> — will be rejected by CI version-gate until resolved." If A, treat this as FRESH with `NEW_VERSION=<new>` and run steps 1-4 (which will also trigger Step 13 CHANGELOG header rewrite and Step 19 PR title rewrite). If B, reuse `CURRENT_VERSION` and warn that CI will likely reject. If util is offline, warn and reuse `CURRENT_VERSION`.
|
||||
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body. (Queue check still runs in ALREADY_BUMPED terms after repair.)
|
||||
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.
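The ALREADY_BUMPED branch refers to a bump level "derived from `CURRENT_VERSION` vs `BASE_VERSION`". One plausible reading is that the first component that differs between the two versions names the level. The sketch below follows that reading; `implied_bump_level` is a hypothetical helper, not a gstack command, and it assumes both inputs are well-formed `MAJOR.MINOR.PATCH.MICRO` strings.

```bash
# Hypothetical helper (not part of gstack): print the bump level implied by
# the first component that differs between base ($1) and current ($2).
# The function body runs in a subshell so its variables stay local.
implied_bump_level() (
  IFS=. read -r b1 b2 b3 b4 <<EOF
$1
EOF
  IFS=. read -r c1 c2 c3 c4 <<EOF
$2
EOF
  if   [ "$b1" != "$c1" ]; then echo major
  elif [ "$b2" != "$c2" ]; then echo minor
  elif [ "$b3" != "$c3" ]; then echo patch
  elif [ "$b4" != "$c4" ]; then echo micro
  else echo none
  fi
)

implied_bump_level 1.55.0.0 1.56.0.0   # prints: minor
```

The result could then be passed as the `--bump` argument when re-checking the queue.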
|
||||
|
||||
1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
|
||||
|
||||
2. **Auto-decide the bump level based on the diff:**
|
||||
- Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
|
||||
- Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or branch name starting with `feat/`
|
||||
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
|
||||
- **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
|
||||
- **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
|
||||
- **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes
|
||||
|
||||
Save the chosen level as `BUMP_LEVEL` (one of `major`, `minor`, `patch`, `micro`). This is the user-intended level. The next step decides *placement* — the level stays the same even if queue-aware allocation has to advance past a claimed slot.
|
||||
|
||||
3. **Queue-aware version pick (workspace-aware ship, v1.6.4.0+).** Call `bin/gstack-next-version` to see what's already claimed by open PRs + active sibling Conductor worktrees, then render the queue state to the user:
|
||||
The deterministic version-state logic is the tested **`gstack-version-bump`** CLI
(classify / write / repair). The bump-LEVEL decision and queue-collision handling
stay agent judgment; the slot pick stays `gstack-next-version`.
|
||||
|
||||
1. **Classify state** — pure reader, never writes:
|
||||
```bash
|
||||
QUEUE_JSON=$(bun run bin/gstack-next-version \
|
||||
--base <base> \
|
||||
--bump "$BUMP_LEVEL" \
|
||||
--current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
|
||||
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
|
||||
CLAIMED_COUNT=$(echo "$QUEUE_JSON" | jq -r '.claimed | length')
|
||||
ACTIVE_SIBLING_COUNT=$(echo "$QUEUE_JSON" | jq -r '.active_siblings | length')
|
||||
OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
|
||||
REASON=$(echo "$QUEUE_JSON" | jq -r '.reason // ""')
|
||||
bun run ~/.claude/skills/gstack/bin/gstack-version-bump classify --base <base>
|
||||
```
|
||||
Read the JSON `state` and dispatch:
|
||||
- **FRESH** → do the bump (steps 2-4).
|
||||
- **ALREADY_BUMPED** → skip the bump, but run the queue-drift check (step 3) with the reported `currentVersion`. If the queue moved (next free version differs), **AskUserQuestion**: rebump to the new version (rewrites CHANGELOG header + PR title) or keep current (CI version-gate will reject until resolved).
|
||||
- **DRIFT_STALE_PKG** → run `gstack-version-bump repair` (syncs package.json to VERSION). No re-bump; reuse `currentVersion` for CHANGELOG + PR.
|
||||
- **DRIFT_UNEXPECTED** → **STOP**. package.json disagrees with VERSION while VERSION matches base — a manual edit bypassed /ship. Reconcile manually, then re-run.
|
||||
|
||||
- If `OFFLINE=true` or the util fails (auth expired, no `gh`/`glab`, network): fall back to local `BUMP_LEVEL` arithmetic (bump `BASE_VERSION` at the chosen level). Print `⚠ workspace-aware ship offline — using local bump only`. Continue.
|
||||
- If `CLAIMED_COUNT > 0`: render the queue table to the user so they can see landing order at a glance:
|
||||
```
Queue on <base> (vBASE_VERSION):
  #<pr> <branch> → v<version> [⚠ collision with #<other>]
Active sibling workspaces (WIP, not yet PR'd):
  <path> → v<version> (committed Nh ago)
Your branch will claim: vNEW_VERSION (<reason>)
```
|
||||
- If `ACTIVE_SIBLING_COUNT > 0` and any active sibling's VERSION is `>= NEW_VERSION`, use **AskUserQuestion**: "Sibling workspace <path> has v<X> committed <N>h ago but hasn't PR'd yet. Wait for them to ship first, or advance past? A) Advance past (recommended for unrelated work), B) Abort /ship and sync up with sibling first."
|
||||
- Validate `NEW_VERSION` matches `MAJOR.MINOR.PATCH.MICRO`. If util returns an empty or malformed version, fall back to local bump.
|
||||
2. **Decide the bump level** from the diff (agent judgment):
|
||||
- **MICRO**: <50 lines, trivial tweaks/config. **PATCH**: 50+ lines, no feature signals.
|
||||
- **MINOR**: **ASK** if any feature signal (new route/page, migration, new module), OR 500+ lines. **MAJOR**: **ASK** — milestones or breaking changes only.
|
||||
Save as `BUMP_LEVEL`. The level is the user-intended bump; queue-aware placement may advance the slot without changing the level.
|
||||
|
||||
4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.
|
||||
3. **Queue-aware pick** (workspace-aware ship):
|
||||
```bash
QUEUE_JSON=$(bun run ~/.claude/skills/gstack/bin/gstack-next-version --base <base> --bump "$BUMP_LEVEL" --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
```
|
||||
If `offline`/util fails: fall back to local `BUMP_LEVEL` arithmetic and print `⚠ workspace-aware ship offline — using local bump only`. If `claimed` is non-empty, render the queue table so the user sees landing order. If an active sibling workspace holds a version `>= NEW_VERSION`, **AskUserQuestion**: advance past (unrelated work) or abort and sync with the sibling.
|
||||
|
||||
```bash
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
  exit 1
fi
echo "$NEW_VERSION" > VERSION
if [ -f package.json ]; then
  if command -v node >/dev/null 2>&1; then
    node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the new idempotency check will detect the drift."
      exit 1
    }
  elif command -v bun >/dev/null 2>&1; then
    bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
      exit 1
    }
  else
    echo "ERROR: package.json exists but neither node nor bun is available."
    exit 1
  fi
fi
```
|
||||
4. **Write the bump** (FRESH, or an approved rebump):
|
||||
```bash
bun run ~/.claude/skills/gstack/bin/gstack-version-bump write --version "$NEW_VERSION"
```
|
||||
The CLI validates the 4-digit `MAJOR.MINOR.PATCH.MICRO` pattern and writes **both** VERSION and package.json. On a half-write (VERSION written, package.json failed) it exits 3 — re-run, and classify will report DRIFT_STALE_PKG for `repair` to fix.
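The offline fallback in the queue-aware step ("local `BUMP_LEVEL` arithmetic") is described but not spelled out. A minimal sketch of that arithmetic follows. `bump_version` is a hypothetical helper, not part of gstack, and resetting the lower components to zero on a higher-level bump is an assumption the text does not state.

```bash
# Hypothetical helper (not part of gstack): bump a MAJOR.MINOR.PATCH.MICRO
# version ($1) at the given level ($2).
# ASSUMPTION: components below the bumped one reset to 0.
# The function body runs in a subshell so its variables stay local.
bump_version() (
  IFS=. read -r major minor patch micro <<EOF
$1
EOF
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0; micro=0 ;;
    minor) minor=$((minor + 1)); patch=0; micro=0 ;;
    patch) patch=$((patch + 1)); micro=0 ;;
    micro) micro=$((micro + 1)) ;;
    *) echo "unknown bump level: $2" >&2; exit 1 ;;
  esac
  echo "$major.$minor.$patch.$micro"
)

bump_version 1.55.0.0 minor   # prints: 1.56.0.0
```

The result would still need to pass the `MAJOR.MINOR.PATCH.MICRO` validation before being written.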
|
||||
|
||||
**DRIFT_STALE_PKG repair path** — runs when idempotency reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json.version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.
|
||||
|
||||
```bash
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate invalid semver into package.json. Fix VERSION manually, then re-run /ship."
  exit 1
fi
if command -v node >/dev/null 2>&1; then
  node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed — could not update package.json."
    exit 1
  }
else
  bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed."
    exit 1
  }
fi
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
```
|
||||
|
||||
---
|
||||
|
||||
{{CHANGELOG_WORKFLOW}}
|
||||
|
||||
---
|
||||
{{SECTION:changelog}}
|
||||
|
||||
## Step 14: TODOS.md (auto-update)
|
||||
|
||||
|
|
@ -770,211 +395,7 @@ git push -u origin <branch-name>
|
|||
|
||||
---
|
||||
|
||||
## Step 18: Documentation sync (via subagent, before PR creation)
|
||||
|
||||
**Dispatch /document-release as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent gets a fresh context window — zero rot from the preceding 17 steps. It also runs the **full** `/document-release` workflow (with CHANGELOG clobber protection, doc exclusions, risky-change gates, named staging, race-safe PR body editing) rather than a weaker reimplementation.
|
||||
|
||||
**Sequencing:** This step runs AFTER Step 17 (Push) and BEFORE Step 19 (Create PR). The PR is created once from final HEAD with the `## Documentation` section baked into the initial body. No create-then-re-edit dance.
|
||||
|
||||
**Subagent prompt:**
|
||||
|
||||
> You are executing the /document-release workflow after a code push. Read the full skill file `${HOME}/.claude/skills/gstack/document-release/SKILL.md` and execute its complete workflow end-to-end, including CHANGELOG clobber protection, doc exclusions, risky-change gates, and named staging. Do NOT attempt to edit the PR body — no PR exists yet. Branch: `<branch>`, base: `<base>`.
|
||||
>
|
||||
> After completing the workflow, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"files_updated":["README.md","CLAUDE.md",...],"commit_sha":"abc1234","pushed":true,"documentation_section":"<markdown block for PR body's ## Documentation section>"}`
|
||||
>
|
||||
> If no documentation files needed updating, output:
|
||||
> `{"files_updated":[],"commit_sha":null,"pushed":false,"documentation_section":null}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Parse the LAST line of the subagent's output as JSON.
|
||||
2. Store `documentation_section` — Step 19 embeds it in the PR body (or omits the section if null).
|
||||
3. If `files_updated` is non-empty, print: `Documentation synced: {files_updated.length} files updated, committed as {commit_sha}`.
|
||||
4. If `files_updated` is empty, print: `Documentation is current — no updates needed.`
|
||||
|
||||
**If the subagent fails or returns invalid JSON:** Print a warning and proceed to Step 19 without a `## Documentation` section. Do not block /ship on subagent failure. The user can run `/document-release` manually after the PR lands.
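The parent-processing steps above amount to "take the last line, parse it as JSON, and fail soft". A minimal sketch with `jq`, which other blocks in this skill already use, is shown here. `files_updated_count` is a hypothetical helper name, not part of gstack; it reads the subagent's full output on stdin.

```bash
# Hypothetical helper (not part of gstack): read the subagent output on
# stdin, parse only its LAST line as JSON, and print how many files were
# updated. Returns non-zero when that line is not a JSON object, so the
# caller can warn and continue without a Documentation section.
files_updated_count() (
  tail -n 1 | jq -e -r '
    if type == "object" then (.files_updated | length)
    else error("not an object") end' 2>/dev/null
)

# Example: two lines of chatter followed by the result object.
printf '%s\n' \
  "Workflow finished." \
  '{"files_updated":["README.md"],"commit_sha":"abc1234","pushed":true,"documentation_section":"..."}' \
  | files_updated_count   # prints: 1
```

A non-zero return maps onto the warning path described above; it should not block `/ship`.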
|
||||
|
||||
---
|
||||
|
||||
## Step 19: Create PR/MR
|
||||
|
||||
**Idempotency check:** Check if a PR/MR already exists for this branch.
|
||||
|
||||
**If GitHub:**
|
||||
```bash
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
```
|
||||
|
||||
**If GitLab:**
|
||||
```bash
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
```
|
||||
|
||||
If an **open** PR/MR already exists: **update** the PR body using `gh pr edit --body-file "$PR_BODY_FILE"` (GitHub) or `glab mr update -d ...` (GitLab). Always regenerate the PR body from scratch using this run's fresh results (test output, coverage audit, review findings, adversarial review, TODOS summary, documentation_section from Step 18). Never reuse stale PR body content from a prior run. **Run the same redaction scan-at-sink (PR body + title) as the create path (Step 19) before editing — scan the temp file, then `gh pr edit --body-file` from it.**
|
||||
|
||||
**Always update the PR title to start with `v$NEW_VERSION`.** PR titles use the workspace-aware format `v<NEW_VERSION> <type>: <summary>` — version ALWAYS first, no exceptions, no "custom title kept intentionally" escape hatch. The shared helper `bin/gstack-pr-title-rewrite.sh` is the single source of truth for the rule.
|
||||
|
||||
1. Read the current title: `CURRENT=$(gh pr view --json title -q .title)` (or `glab mr view -F json | jq -r .title`).
|
||||
2. Compute the corrected title: `NEW_TITLE=$(~/.claude/skills/gstack/bin/gstack-pr-title-rewrite.sh "$NEW_VERSION" "$CURRENT")`. The helper handles three cases: title already correct (no-op), title has a different `v<X.Y.Z.W>` prefix (replace it), or title has no version prefix (prepend one).
|
||||
3. If `NEW_TITLE` differs from `CURRENT`, run `gh pr edit --title "$NEW_TITLE"` (or `glab mr update -t "$NEW_TITLE"`).
|
||||
4. **Self-check:** re-fetch the title and assert it starts with `v$NEW_VERSION `. If it does not, retry the edit once. If still wrong, surface the failure to the user.
|
||||
|
||||
This keeps the title truthful when Step 12's queue-drift detection rebumps a stale version, and forces the format on PRs that were created without it.
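The three cases the helper is said to handle (already correct, different version prefix, no prefix) collapse into one rule: strip any leading `v<X.Y.Z.W>` prefix, then prepend the new one. The sketch below illustrates that rule only. `rewrite_title` is a hypothetical function, and the real `bin/gstack-pr-title-rewrite.sh` may be implemented differently.

```bash
# Illustration of the title rule only (hypothetical function); the real
# logic lives in bin/gstack-pr-title-rewrite.sh and may differ.
# $1 = new version without the leading "v", $2 = current title.
rewrite_title() (
  new_version=$1
  title=$2
  # Drop an existing v<X.Y.Z.W> prefix (if any), then prepend the new one.
  rest=$(printf '%s' "$title" | sed -E 's/^v[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[[:space:]]+//')
  printf 'v%s %s\n' "$new_version" "$rest"
)

rewrite_title 1.56.0.0 "v1.55.0.0 feat: land skill"   # prints: v1.56.0.0 feat: land skill
```

Because an already-correct title comes back unchanged, comparing the output with the current title is enough to decide whether an edit call is needed.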
|
||||
|
||||
Print the existing URL and continue to Step 20.
|
||||
|
||||
If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.
|
||||
|
||||
The PR/MR body should contain these sections:
|
||||
|
||||
```
|
||||
## Summary
|
||||
<Summarize ALL changes being shipped. Run `git log <base>..HEAD --oneline` to enumerate
|
||||
every commit. Exclude the VERSION/CHANGELOG metadata commit (that's this PR's bookkeeping,
|
||||
not a substantive change). Group the remaining commits into logical sections (e.g.,
|
||||
"**Performance**", "**Dead Code Removal**", "**Infrastructure**"). Every substantive commit
|
||||
must appear in at least one section. If a commit's work isn't reflected in the summary,
|
||||
you missed it.>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 7, or "All new code paths have test coverage.">
|
||||
<If Step 7 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 9 code review, or "No issues found.">
|
||||
|
||||
## Design Review
|
||||
<If design review ran: "Design Review (lite): N findings — M auto-fixed, K skipped. AI Slop: clean/N issues.">
|
||||
<If no frontend files changed: "No frontend files changed — design review skipped.">
|
||||
|
||||
## Eval Results
|
||||
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
|
||||
|
||||
## Greptile Review
|
||||
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
|
||||
<If no Greptile comments found: "No Greptile comments.">
|
||||
<If no PR existed during Step 10: omit this section entirely>
|
||||
|
||||
## Scope Drift
|
||||
<If scope drift ran: "Scope Check: CLEAN" or list of drift/creep findings>
|
||||
<If no scope drift: omit this section>
|
||||
|
||||
## Plan Completion
|
||||
<If plan file found: completion checklist summary from Step 8>
|
||||
<If no plan file: "No plan file detected.">
|
||||
<If plan items deferred: list deferred items>
|
||||
|
||||
## Linked Spec
|
||||
<Auto-detect: look for /spec archives matching this branch via:
|
||||
eval "$(${ctx.paths.binDir}/gstack-paths)"
|
||||
eval "$(${ctx.paths.binDir}/gstack-slug)"
|
||||
CURRENT_BRANCH=$(git branch --show-current)
|
||||
SPEC_ARCHIVES="$GSTACK_STATE_ROOT/projects/$SLUG/specs"
|
||||
# Find newest archive whose spec_branch frontmatter matches current branch (or one of its
|
||||
# parents — if spec spawned worktree spec/<slug>-$$, the spawned worktree IS where /ship runs).
|
||||
SPEC_FILE=$(grep -l "^spec_branch: $CURRENT_BRANCH$" "$SPEC_ARCHIVES"/*.md 2>/dev/null | head -1)
|
||||
[ -z "$SPEC_FILE" ] && exit # no spec; omit this section entirely
|
||||
SPEC_ISSUE=$(grep "^spec_issue_number:" "$SPEC_FILE" | cut -d' ' -f2)
|
||||
[ -z "$SPEC_ISSUE" ] && exit # spec archive exists but no issue number; omit
|
||||
|
||||
# CONDITIONAL Closes #N (codex F4): only add when Plan Completion above is "complete".
|
||||
# If the plan completion gate from Step 8 reports any deferred or failed items, emit:
|
||||
# "Linked to #$SPEC_ISSUE (partial delivery — NOT auto-closing; close manually after follow-up)"
|
||||
# If Plan Completion is fully complete, emit:
|
||||
# "Closes #$SPEC_ISSUE"
|
||||
# and include the Closes #N line in the PR body so GitHub auto-closes on merge.>
|
||||
|
||||
<Format:
|
||||
Closes #<N>
|
||||
|
||||
This PR delivers the spec at <archive path relative to repo root>.
|
||||
Spec filed: <spec_filed_at from frontmatter>>
|
||||
|
||||
<If partial delivery, emit instead:
|
||||
Linked to #<N> (partial delivery — not auto-closing).
|
||||
Deferred items: <list from Plan Completion>.
|
||||
Close #<N> manually after follow-up lands.>
|
||||
|
||||
<If no /spec archive matches this branch: omit this entire section.>
|
||||
|
||||
## Verification Results
|
||||
<If verification ran: summary from Step 8.1 (N PASS, M FAIL, K SKIPPED)>
|
||||
<If skipped: reason (no plan, no server, no verification section)>
|
||||
<If not applicable: omit this section>
|
||||
|
||||
## TODOS
|
||||
<If items marked complete: bullet list of completed items with version>
|
||||
<If no items completed: "No TODO items completed in this PR.">
|
||||
<If TODOS.md created or reorganized: note that>
|
||||
<If TODOS.md doesn't exist and user skipped: omit this section>
|
||||
|
||||
## Documentation
|
||||
<Embed the `documentation_section` string returned by Step 18's subagent here, verbatim.>
|
||||
<If Step 18 returned `documentation_section: null` (no docs updated), omit this section entirely.>
|
||||
|
||||
## Test plan
|
||||
- [x] All Rails tests pass (N runs, 0 failures)
|
||||
- [x] All Vitest tests pass (N tests)
|
||||
|
||||
🤖 Generated with [Claude Code](https://claude.com/claude-code)
|
||||
```
|
||||
|
||||
#### Redaction scan (PR body + title) — runs before create AND edit
|
||||
|
||||
The PR body is world-readable on a public repo. Scan-at-sink before sending:
write the composed body to a temp file, scan THAT file with the shared engine,
and pass the same file to `gh`/`glab`. Wrap any Codex / Greptile / eval output
sections in tool-attributed fences (` ```codex-review ` / ` ```greptile `) so the
engine WARN-degrades the example credentials those tools quote instead of blocking
the PR (a live-format credential inside the fence still blocks).
|
||||
|
||||
```bash
REDACT_VIS=$(~/.claude/skills/gstack/bin/gstack-config get redact_repo_visibility 2>/dev/null)
[ -z "$REDACT_VIS" ] && REDACT_VIS=$(gh repo view --json visibility -q .visibility 2>/dev/null | tr 'A-Z' 'a-z')
REDACT_VIS="${REDACT_VIS:-unknown}"
PR_BODY_FILE=$(mktemp)
cat > "$PR_BODY_FILE" <<'PR_BODY_EOF'
<PR body from above>
PR_BODY_EOF
~/.claude/skills/gstack/bin/gstack-redact --from-file "$PR_BODY_FILE" --repo-visibility "$REDACT_VIS" --self-email "$(git config user.email 2>/dev/null)" --json
case $? in
  3) echo "BLOCKED — credential in PR body. Rotate + redact, do not create the PR."; exit 1 ;;
  2) echo "MEDIUM findings — confirm per finding (sterner on public) before proceeding." ;;
esac
# Also scan the title (short, single-line):
printf '%s' "v$NEW_VERSION <type>: <summary>" | ~/.claude/skills/gstack/bin/gstack-redact --repo-visibility "$REDACT_VIS" --json
```
|
||||
|
||||
HIGH blocks (exit 3, no skip). MEDIUM → AskUserQuestion (PII subset offers
`--auto-redact`). Same scan runs before the `gh pr edit --body-file` path (Step 19).
|
||||
|
||||
**If GitHub:** create from the SCANNED file (exact bytes scanned = bytes sent):
|
||||
|
||||
```bash
# PR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
gh pr create --base <base> --title "v$NEW_VERSION <type>: <summary>" --body-file "$PR_BODY_FILE"
rm -f "$PR_BODY_FILE"
```
|
||||
|
||||
**If GitLab:**
|
||||
|
||||
```bash
# MR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
glab mr create -b <base> -t "v$NEW_VERSION <type>: <summary>" -d "$(cat <<'EOF'
<MR body from above>
EOF
)"
```
|
||||
|
||||
**If neither CLI is available:**
|
||||
Print the branch name, remote URL, and instruct the user to create the PR/MR manually via the web UI. Do not stop — the code is pushed and ready.
|
||||
|
||||
**Output the PR/MR URL** — then proceed to Step 20.
|
||||
|
||||
---
|
||||
{{SECTION:pr-body}}
|
||||
|
||||
## Step 20: Persist ship metrics
|
||||
|
||||
|
|
@ -1025,6 +446,16 @@ no-op. The marker guarantees at-most-once per machine. To re-enable:
|
|||
|
||||
---
|
||||
|
||||
## Section self-check (before you finish)
|
||||
|
||||
You ran a carved skill. For your situation, list every section the Section index
named as applying, and confirm you issued a Read for each one. If you executed any
of those steps from memory without reading its section, you skipped the source of
truth — STOP, Read it now, and redo that step. Deterministic version work goes
through `gstack-version-bump`; never hand-roll the VERSION/package.json write.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Never skip tests.** If tests fail, stop.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,168 @@
|
|||
<!-- AUTO-GENERATED from adversarial.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 11: Adversarial review (always-on)
|
||||
|
||||
Every diff gets adversarial review from both Claude and Codex. LOC is not a proxy for risk — a 5-line auth change can be critical.
|
||||
|
||||
**Detect diff size and tool availability:**
|
||||
|
||||
```bash
DIFF_BASE=$(git merge-base origin/<base> HEAD)
DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Legacy opt-out — only gates Codex passes, Claude always runs
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "DIFF_SIZE: $DIFF_TOTAL"
echo "OLD_CFG: ${OLD_CFG:-not_set}"
```
|
||||
|
||||
If `OLD_CFG` is `disabled`: skip Codex passes only. Claude adversarial subagent still runs (it's free and fast). Jump to the "Claude adversarial subagent" section.
|
||||
|
||||
**User override:** If the user explicitly requested "full review", "structured review", or "P1 gate", also run the Codex structured review regardless of diff size.
|
||||
|
||||
---
|
||||
|
||||
### Claude adversarial subagent (always runs)
|
||||
|
||||
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
|
||||
|
||||
Subagent prompt:
|
||||
"Read the diff for this branch with `DIFF_BASE=$(git merge-base origin/<base> HEAD) && git diff "$DIFF_BASE"`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment). After listing findings, end your output with ONE line in the canonical format `Recommendation: <action> because <one-line reason naming the most exploitable finding>` — examples: `Recommendation: Fix the unbounded retry at queue.ts:78 because it'll DoS the worker pool under sustained 429s` or `Recommendation: Ship as-is because the strongest finding is a theoretical race that requires conditions we can't trigger in production`. The reason must point to a specific finding (or no-fix rationale). Generic reasons like 'because it's safer' do not qualify."
|
||||
|
||||
Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
|
||||
|
||||
If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing."
|
||||
|
||||
---
|
||||
|
||||
### Codex adversarial challenge (always runs when available)
|
||||
|
||||
If Codex is available AND `OLD_CFG` is NOT `disabled`:
|
||||
|
||||
```bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
|
||||
codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run DIFF_BASE=$(git merge-base origin/<base> HEAD) && git diff "$DIFF_BASE" to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems. End your output with ONE line in the canonical format `Recommendation: <action> because <one-line reason naming the most exploitable finding>`. Generic reasons like 'because it's safer' do not qualify; the reason must point to a specific finding or no-fix rationale." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_ADV"
|
||||
```
|
||||
|
||||
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
|
||||
```bash
cat "$TMPERR_ADV"
```
|
||||
|
||||
Present the full output verbatim. This is informational — it never blocks shipping.
|
||||
|
||||
**Error handling:** All errors are non-blocking — adversarial review is a quality enhancement, not a prerequisite.
|
||||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \`codex login\` to authenticate."
|
||||
- **Timeout:** "Codex timed out after 5 minutes."
|
||||
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>."
|
||||
|
||||
**Cleanup:** Run `rm -f "$TMPERR_ADV"` after processing.
|
||||
|
||||
If Codex is NOT available: "Codex CLI not found — running Claude adversarial only. Install Codex for cross-model coverage: `npm install -g @openai/codex`"
|
||||
|
||||
---
|
||||
|
||||
### Codex structured review (large diffs only, 200+ lines)
|
||||
|
||||
If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
|
||||
|
||||
```bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch <base>. Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD to see the diff and review only those changes." -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR"
```
|
||||
|
||||
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
|
||||
Check for `[P1]` markers: found → `GATE: FAIL`, not found → `GATE: PASS`.
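
The marker check can be sketched as a small shell function. It assumes the Codex review output has been captured into a string. `codex_gate` is a hypothetical name, not part of gstack or the Codex CLI.

```bash
# Hypothetical helper: derive the gate from the text Codex printed.
codex_gate() {
  # -F matches "[P1]" literally instead of as a bracket expression.
  if printf '%s\n' "$1" | grep -qF '[P1]'; then
    echo "GATE: FAIL"
  else
    echo "GATE: PASS"
  fi
}

# Sample finding text (illustrative only).
codex_gate "[P1] SQL injection in search handler"
```

Using `grep -F` matters here: without it, `[P1]` is a character class that matches any single `P` or `1`, which would fail the gate on almost any output.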
|
||||
|
||||
If GATE is FAIL, use AskUserQuestion:
|
||||
```
Codex found N critical issues in the diff.
|
||||
|
||||
A) Investigate and fix now (recommended)
B) Continue — review will still complete
```
|
||||
|
||||
If A: address the findings. After fixing, re-run tests (Step 5) since code has changed. Re-run `codex review` to verify.
|
||||
|
||||
Read stderr for errors (same error handling as Codex adversarial above).
|
||||
|
||||
**Cleanup:** Run `rm -f "$TMPERR"` after reading stderr.
|
||||
|
||||
If `DIFF_TOTAL < 200`: skip this section silently. The Claude + Codex adversarial passes provide sufficient coverage for smaller diffs.
|
||||
|
||||
---
|
||||
|
||||
### Persist the review result
|
||||
|
||||
After all passes complete, persist:
|
||||
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"always","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
```
|
||||
Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), "skipped" if diff < 200, or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
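
The substitution rules above can be sketched as two helpers. Both names (`pick_gate`, `build_entry`) are hypothetical and not part of gstack. The sketch only prints the JSON payload and does not call `gstack-review-log`. Checking Codex availability before diff size is an assumption, since the text does not say which rule wins when both apply.

```bash
# Hypothetical helpers; neither is part of gstack.
# pick_gate: $1 = "yes" if Codex ran, $2 = DIFF_TOTAL, $3 = structured result (pass/fail)
pick_gate() {
  if [ "$1" != "yes" ]; then
    echo "informational"
  elif [ "$2" -lt 200 ]; then
    echo "skipped"
  else
    echo "$3"
  fi
}

# build_entry: $1 = STATUS, $2 = SOURCE, $3 = GATE, $4 = short commit SHA
build_entry() {
  printf '{"skill":"adversarial-review","timestamp":"%s","status":"%s","source":"%s","tier":"always","gate":"%s","commit":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" "$4"
}

# Sample values (illustrative only).
build_entry "issues_found" "both" "$(pick_gate yes 350 fail)" "abc1234"
```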
|
||||
|
||||
---
|
||||
|
||||
### Cross-model synthesis
|
||||
|
||||
After all passes complete, synthesize findings across all sources:
|
||||
|
||||
```
ADVERSARIAL REVIEW SYNTHESIS (always-on, N lines):
════════════════════════════════════════════════════════════
High confidence (found by multiple sources): [findings agreed on by >1 pass]
Unique to Claude structured review: [from earlier step]
Unique to Claude adversarial: [from subagent]
Unique to Codex: [from codex adversarial or code review, if ran]
Models used: Claude structured ✓ Claude adversarial ✓/✗ Codex ✓/✗
════════════════════════════════════════════════════════════
```
|
||||
|
||||
High-confidence findings (agreed on by multiple sources) should be prioritized for fixes.
|
||||
|
||||
---
|
||||
|
||||
## Capture Learnings
|
||||
|
||||
If you discovered a non-obvious pattern, pitfall, or architectural insight during
|
||||
this session, log it for future sessions:
|
||||
|
||||
```bash
~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"ship","type":"TYPE","key":"SHORT_KEY","insight":"DESCRIPTION","confidence":N,"source":"SOURCE","files":["path/to/relevant/file"]}'
```
|
||||
|
||||
**Types:** `pattern` (reusable approach), `pitfall` (what NOT to do), `preference`
|
||||
(user stated), `architecture` (structural decision), `tool` (library/framework insight),
|
||||
`operational` (project environment/CLI/workflow knowledge).
|
||||
|
||||
**Sources:** `observed` (you found this in the code), `user-stated` (user told you),
|
||||
`inferred` (AI deduction), `cross-model` (both Claude and Codex agree).
|
||||
|
||||
**Confidence:** 1-10. Be honest. An observed pattern you verified in the code is 8-9.
|
||||
An inference you're not sure about is 4-5. A user preference they explicitly stated is 10.
|
||||
|
||||
**files:** Include the specific file paths this learning references. This enables
|
||||
staleness detection: if those files are later deleted, the learning can be flagged.
|
||||
|
||||
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
|
||||
already knows. A good test: would this insight save time in a future session? If yes, log it.
|
||||
|
||||
|
||||
|
||||
### Refresh learnings for the headline feature on this branch
|
||||
|
||||
The top-of-skill learnings pull was keyed to "release ship" broadly. Before the VERSION/CHANGELOG step, re-pull learnings keyed to THIS branch's headline feature so any prior version-bump or CHANGELOG pitfalls for similar features surface.
|
||||
|
||||
Pick ONE keyword that names the headline feature you're shipping. The keyword should be a noun: the primary skill or module name, the central feature noun, or the binary you changed. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.
|
||||
|
||||
Worked examples (ship-specific): good keywords are `learnings-search`, `pacing`, `worktree-ship`. Bad: `the branch headline`, `v1.31.1.0`, `feat: token-or search`.
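
The character rule for keywords can be checked before running the search. This is a minimal sketch; `is_valid_keyword` is a hypothetical helper, not a gstack command.

```bash
# Hypothetical helper: succeed only for keywords made of letters, digits, and hyphens.
is_valid_keyword() {
  case "$1" in
    # Reject the empty string and anything containing a character outside the set.
    ''|*[!A-Za-z0-9-]*) return 1 ;;
    *) return 0 ;;
  esac
}

for kw in "learnings-search" "pacing" "v1.31.1.0" "the branch headline"; do
  if is_valid_keyword "$kw"; then echo "ok:  $kw"; else echo "bad: $kw"; fi
done
```

The check covers only the allowed characters. Whether the keyword is a good noun for the headline feature still needs judgment.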
|
||||
|
||||
```bash
~/.claude/skills/gstack/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
```
|
||||
|
||||
If any learnings come back, name which one applies to the version bump or CHANGELOG framing in one sentence. If none come back, continue without reference — the absence is itself useful information.
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
{{ADVERSARIAL_STEP}}
|
||||
|
||||
{{LEARNINGS_LOG}}
|
||||
|
||||
{{GBRAIN_SAVE_RESULTS}}
|
||||
|
||||
### Refresh learnings for the headline feature on this branch
|
||||
|
||||
The top-of-skill learnings pull was keyed to "release ship" broadly. Before the VERSION/CHANGELOG step, re-pull learnings keyed to THIS branch's headline feature so any prior version-bump or CHANGELOG pitfalls for similar features surface.
|
||||
|
||||
Pick ONE keyword that names the headline feature you're shipping. The keyword should be a noun: the primary skill or module name, the central feature noun, or the binary you changed. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.
|
||||
|
||||
Worked examples (ship-specific): good keywords are `learnings-search`, `pacing`, `worktree-ship`. Bad: `the branch headline`, `v1.31.1.0`, `feat: token-or search`.
|
||||
|
||||
```bash
~/.claude/skills/gstack/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
```
|
||||
|
||||
If any learnings come back, name which one applies to the version bump or CHANGELOG framing in one sentence. If none come back, continue without reference — the absence is itself useful information.
|
||||
|
|
@ -0,0 +1,45 @@
|
|||
<!-- AUTO-GENERATED from changelog.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 13: CHANGELOG (auto-generate)
|
||||
|
||||
1. Read `CHANGELOG.md` header to know the format.
|
||||
|
||||
2. **First, enumerate every commit on the branch:**
|
||||
```bash
git log <base>..HEAD --oneline
```
|
||||
Copy the full list. Count the commits. You will use this as a checklist.
|
||||
|
||||
3. **Read the full diff** to understand what each commit actually changed:
|
||||
```bash
git diff <base>...HEAD
```
|
||||
|
||||
4. **Group commits by theme** before writing anything. Common themes:
|
||||
- New features / capabilities
|
||||
- Performance improvements
|
||||
- Bug fixes
|
||||
- Dead code removal / cleanup
|
||||
- Infrastructure / tooling / tests
|
||||
- Refactoring
|
||||
|
||||
5. **Write the CHANGELOG entry** covering ALL groups:
|
||||
- If existing CHANGELOG entries on the branch already cover some commits, replace them with one unified entry for the new version
|
||||
- Categorize changes into applicable sections:
|
||||
- `### Added` — new features
|
||||
- `### Changed` — changes to existing functionality
|
||||
- `### Fixed` — bug fixes
|
||||
- `### Removed` — removed features
|
||||
- Write concise, descriptive bullet points
|
||||
- Insert after the file header (line 5), dated today
|
||||
- Format: `## [X.Y.Z.W] - YYYY-MM-DD`
|
||||
- **Voice:** Lead with what the user can now **do** that they couldn't before. Use plain language, not implementation details. Never mention TODOS.md, internal tracking, or contributor-facing details.
|
||||
|
||||
6. **Cross-check:** Compare your CHANGELOG entry against the commit list from step 2.
|
||||
Every commit must map to at least one bullet point. If any commit is unrepresented,
|
||||
add it now. If the branch has N commits spanning K themes, the CHANGELOG must
|
||||
reflect all K themes.
|
||||
|
||||
**Do NOT ask the user to describe changes.** Infer from the diff and commit history.
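
The entry heading format from step 5 (`## [X.Y.Z.W] - YYYY-MM-DD`) can be sanity-checked with a regular expression before writing. This is a sketch under the assumption that each version component is a plain decimal number. `is_changelog_heading` is a hypothetical name.

```bash
# Hypothetical helper: check a line against `## [X.Y.Z.W] - YYYY-MM-DD`.
is_changelog_heading() {
  printf '%s\n' "$1" \
    | grep -qE '^## \[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\] - [0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

is_changelog_heading "## [1.55.0.0] - 2026-05-30" && echo "heading ok"
```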
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,3 @@
|
|||
{{CHANGELOG_WORKFLOW}}
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,51 @@
|
|||
<!-- AUTO-GENERATED from greptile.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 10: Address Greptile review comments (if PR exists)
|
||||
|
||||
**Dispatch the fetch + classification as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent pulls every Greptile comment, runs the escalation detection algorithm, and classifies each comment. Parent receives a structured list and handles user interaction + file edits.
|
||||
|
||||
**Subagent prompt:**
|
||||
|
||||
> You are classifying Greptile review comments for a /ship workflow. Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps. Do NOT fix code, do NOT reply to comments, do NOT commit — report only.
|
||||
>
|
||||
> For each comment, assign: `classification` (`valid_actionable`, `already_fixed`, `false_positive`, `suppressed`), `escalation_tier` (1 or 2), the file:line or [top-level] tag, body summary, and permalink URL.
|
||||
>
|
||||
> If no PR exists, `gh` fails, the API errors, or there are zero comments, output: `{"total":0,"comments":[]}` and stop.
|
||||
>
|
||||
> Otherwise, output a single JSON object on the LAST LINE of your response:
|
||||
> `{"total":N,"comments":[{"classification":"...","escalation_tier":N,"ref":"file:line","summary":"...","permalink":"url"},...]}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
Parse the LAST line as JSON.
|
||||
|
||||
If `total` is 0, skip this step silently. Continue to Step 12.
|
||||
|
||||
Otherwise, print: `+ {total} Greptile comments ({valid_actionable} valid, {already_fixed} already fixed, {false_positive} FP)`.
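
The last-line parsing step can be sketched without a JSON parser. This assumes `total` is the first key of the object, as in the format shown in the subagent prompt. `last_line` and `greptile_total` are hypothetical names, and a real implementation would more likely use a proper JSON tool.

```bash
# Hypothetical helpers: work on the subagent's full response passed as $1.
last_line() { printf '%s\n' "$1" | tail -n 1; }

# Extract the integer after "total": from the last line.
greptile_total() {
  last_line "$1" | sed -n 's/^{"total":\([0-9][0-9]*\).*/\1/p'
}

# Sample subagent response (illustrative only).
response='No PR found for this branch.
{"total":0,"comments":[]}'

if [ "$(greptile_total "$response")" = "0" ]; then
  echo "skip Greptile step"
fi
```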
|
||||
|
||||
For each comment in `comments`:
|
||||
|
||||
**VALID & ACTIONABLE:** Use AskUserQuestion with:
|
||||
- The comment (file:line or [top-level] + body summary + permalink URL)
|
||||
- `RECOMMENDATION: Choose A because [one-line reason]`
|
||||
- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive
|
||||
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
|
||||
- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).
|
||||
|
||||
**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
|
||||
- Include what was done and the fixing commit SHA
|
||||
- Save to both per-project and global greptile-history (type: already-fixed)
|
||||
|
||||
**FALSE POSITIVE:** Use AskUserQuestion:
|
||||
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
|
||||
- Options:
|
||||
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
|
||||
- B) Fix it anyway (if trivial)
|
||||
- C) Ignore silently
|
||||
- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)
|
||||
|
||||
**SUPPRESSED:** Skip silently — these are known false positives from previous triage.
|
||||
|
||||
**After all comments are resolved:** If any fixes were applied, the tests from Step 5 are now stale. **Re-run tests** (Step 5) before continuing to Step 12. If no fixes were applied, continue to Step 12.
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,49 @@
|
|||
## Step 10: Address Greptile review comments (if PR exists)
|
||||
|
||||
**Dispatch the fetch + classification as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent pulls every Greptile comment, runs the escalation detection algorithm, and classifies each comment. Parent receives a structured list and handles user interaction + file edits.
|
||||
|
||||
**Subagent prompt:**
|
||||
|
||||
> You are classifying Greptile review comments for a /ship workflow. Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps. Do NOT fix code, do NOT reply to comments, do NOT commit — report only.
|
||||
>
|
||||
> For each comment, assign: `classification` (`valid_actionable`, `already_fixed`, `false_positive`, `suppressed`), `escalation_tier` (1 or 2), the file:line or [top-level] tag, body summary, and permalink URL.
|
||||
>
|
||||
> If no PR exists, `gh` fails, the API errors, or there are zero comments, output: `{"total":0,"comments":[]}` and stop.
|
||||
>
|
||||
> Otherwise, output a single JSON object on the LAST LINE of your response:
|
||||
> `{"total":N,"comments":[{"classification":"...","escalation_tier":N,"ref":"file:line","summary":"...","permalink":"url"},...]}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
Parse the LAST line as JSON.
|
||||
|
||||
If `total` is 0, skip this step silently. Continue to Step 12.
|
||||
|
||||
Otherwise, print: `+ {total} Greptile comments ({valid_actionable} valid, {already_fixed} already fixed, {false_positive} FP)`.
|
||||
|
||||
For each comment in `comments`:
|
||||
|
||||
**VALID & ACTIONABLE:** Use AskUserQuestion with:
|
||||
- The comment (file:line or [top-level] + body summary + permalink URL)
|
||||
- `RECOMMENDATION: Choose A because [one-line reason]`
|
||||
- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive
|
||||
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
|
||||
- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).
|
||||
|
||||
**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
|
||||
- Include what was done and the fixing commit SHA
|
||||
- Save to both per-project and global greptile-history (type: already-fixed)
|
||||
|
||||
**FALSE POSITIVE:** Use AskUserQuestion:
|
||||
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
|
||||
- Options:
|
||||
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
|
||||
- B) Fix it anyway (if trivial)
|
||||
- C) Ignore silently
|
||||
- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)
|
||||
|
||||
**SUPPRESSED:** Skip silently — these are known false positives from previous triage.
|
||||
|
||||
**After all comments are resolved:** If any fixes were applied, the tests from Step 5 are now stale. **Re-run tests** (Step 5) before continuing to Step 12. If no fixes were applied, continue to Step 12.
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,56 @@
|
|||
{
|
||||
"$schema": "https://gstack.dev/schemas/section-manifest.json",
|
||||
"skill": "ship",
|
||||
"version": 1,
|
||||
"note": "PASSIVE registry (v2 plan T9 / CM2). Fields are IDs, file paths, human titles, and human-readable trigger text ONLY. The skeleton's decision-tree prose is the ONLY place that decides WHEN to read a section; required-reads live in the E2E fixtures. No machine predicate here — see docs/designs/v2_PLAN.md:663.",
  "sections": [
    {
      "id": "tests",
      "file": "tests.md",
      "title": "Test bootstrap, run, triage + eval suites",
      "trigger": "running the test suites and (if prompt files changed) the eval suites (Steps 4-6)"
    },
    {
      "id": "test-coverage",
      "file": "test-coverage.md",
      "title": "Test coverage audit (subagent)",
      "trigger": "auditing test coverage of the diff (Step 7)"
    },
    {
      "id": "plan-completion",
      "file": "plan-completion.md",
      "title": "Plan completion + verification audit (subagent)",
      "trigger": "auditing plan completion, verification, and scope drift (Step 8)"
    },
    {
      "id": "review-army",
      "file": "review-army.md",
      "title": "Pre-landing review + specialist army",
      "trigger": "the pre-landing review and specialist dispatch (Step 9)"
    },
    {
      "id": "greptile",
      "file": "greptile.md",
      "title": "Address Greptile review comments",
      "trigger": "addressing Greptile review comments when a PR exists (Step 10)"
    },
    {
      "id": "adversarial",
      "file": "adversarial.md",
      "title": "Adversarial review + learnings refresh",
      "trigger": "the adversarial review and learnings capture (Step 11)"
    },
    {
      "id": "changelog",
      "file": "changelog.md",
      "title": "CHANGELOG entry (release-summary + itemized)",
      "trigger": "writing the CHANGELOG entry (Step 13)"
    },
    {
      "id": "pr-body",
      "file": "pr-body.md",
      "title": "Documentation sync + PR/MR creation",
      "trigger": "syncing docs and creating or updating the PR/MR (Steps 18-19)"
    }
  ]
}
|
||||
|
|
@ -0,0 +1,322 @@
|
|||
<!-- AUTO-GENERATED from plan-completion.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 8: Plan Completion Audit
|
||||
|
||||
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent reads the plan file and every referenced code file in its own fresh context. Parent gets only the conclusion.
|
||||
|
||||
**Subagent prompt:** Pass these instructions to the subagent:
|
||||
|
||||
> You are running a ship-workflow plan completion audit. The base branch is `<base>`. Use `git diff <base>...HEAD` to see what shipped. Do not commit or push — report only.
|
||||
>
|
||||
> ### Plan File Discovery
|
||||
|
||||
1. **Conversation context (primary):** Check if there is an active plan file in this conversation. The host agent's system messages include plan file paths when in plan mode. If found, use it directly — this is the most reliable signal.
|
||||
|
||||
2. **Content-based search (fallback):** If no plan file is referenced in conversation context, search by content:
|
||||
|
||||
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
# Compute project slug for ~/.gstack/projects/ lookup
_PLAN_SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-' | tr -cd 'a-zA-Z0-9._-') || true
_PLAN_SLUG="${_PLAN_SLUG:-$(basename "$PWD" | tr -cd 'a-zA-Z0-9._-')}"
# Search common plan file locations (project designs first, then personal/local)
for PLAN_DIR in "$HOME/.gstack/projects/$_PLAN_SLUG" "$HOME/.claude/plans" "$HOME/.codex/plans" ".gstack/plans"; do
  [ -d "$PLAN_DIR" ] || continue
  PLAN=$(ls -t "$PLAN_DIR"/*.md 2>/dev/null | xargs grep -l "$BRANCH" 2>/dev/null | head -1)
  [ -z "$PLAN" ] && PLAN=$(ls -t "$PLAN_DIR"/*.md 2>/dev/null | xargs grep -l "$REPO" 2>/dev/null | head -1)
  [ -z "$PLAN" ] && PLAN=$(find "$PLAN_DIR" -name '*.md' -mmin -1440 -maxdepth 1 2>/dev/null | xargs ls -t 2>/dev/null | head -1)
  [ -n "$PLAN" ] && break
done
[ -n "$PLAN" ] && echo "PLAN_FILE: $PLAN" || echo "NO_PLAN_FILE"
```
|
||||
|
||||
3. **Validation:** If a plan file was found via content-based search (not conversation context), read the first 20 lines and verify it is relevant to the current branch's work. If it appears to be from a different project or feature, treat as "no plan file found."
|
||||
|
||||
**Error handling:**
|
||||
- No plan file found → skip with "No plan file detected — skipping."
|
||||
- Plan file found but unreadable (permissions, encoding) → skip with "Plan file found but unreadable — skipping."
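
For reference, the slug computation in the discovery script above can be exercised on its own. The sketch wraps the same `sed`/`tr` pipeline in a function so it can take a URL argument instead of calling `git remote get-url`. `plan_slug` is a hypothetical name, and the URLs are sample inputs.

```bash
# Same sed/tr pipeline as the discovery script, wrapped for illustration.
# plan_slug is a hypothetical name; $1 = remote URL.
plan_slug() {
  printf '%s\n' "$1" \
    | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' \
    | tr '/' '-' | tr -cd 'a-zA-Z0-9._-'
}

# HTTPS and SSH remotes reduce to the same owner-repo slug.
plan_slug "https://github.com/garrytan/gstack.git"; echo
plan_slug "git@github.com:garrytan/gstack.git"; echo
```

Both calls print `garrytan-gstack`, so the lookup directory under `~/.gstack/projects/` does not depend on which remote protocol was used to clone.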
|
||||
|
||||
### Actionable Item Extraction
|
||||
|
||||
Read the plan file. Extract every actionable item — anything that describes work to be done. Look for:
|
||||
|
||||
- **Checkbox items:** `- [ ] ...` or `- [x] ...`
|
||||
- **Numbered steps** under implementation headings: "1. Create ...", "2. Add ...", "3. Modify ..."
|
||||
- **Imperative statements:** "Add X to Y", "Create a Z service", "Modify the W controller"
|
||||
- **File-level specifications:** "New file: path/to/file.ts", "Modify path/to/existing.rb"
|
||||
- **Test requirements:** "Test that X", "Add test for Y", "Verify Z"
|
||||
- **Data model changes:** "Add column X to table Y", "Create migration for Z"
|
||||
|
||||
**Ignore:**
|
||||
- Context/Background sections (`## Context`, `## Background`, `## Problem`)
|
||||
- Questions and open items (marked with ?, "TBD", "TODO: decide")
|
||||
- Review report sections (`## GSTACK REVIEW REPORT`)
|
||||
- Explicitly deferred items ("Future:", "Out of scope:", "NOT in scope:", "P2:", "P3:", "P4:")
|
||||
- CEO Review Decisions sections (these record choices, not work items)
|
||||
|
||||
**Cap:** Extract at most 50 items. If the plan has more, note: "Showing top 50 of N plan items — full list in plan file."
|
||||
|
||||
**No items found:** If the plan contains no extractable actionable items, skip with: "Plan file contains no actionable items — skipping completion audit."
|
||||
|
||||
For each item, note:
|
||||
- The item text (verbatim or concise summary)
|
||||
- Its category: CODE | TEST | MIGRATION | CONFIG | DOCS
|
||||
|
||||
### Verification Mode
|
||||
|
||||
Before judging completion, classify HOW each item can be verified. The diff alone cannot prove every kind of work. Items outside the current repo or system are structurally invisible to `git diff`.
|
||||
|
||||
- **DIFF-VERIFIABLE** — A code change in this repo would manifest in `git diff <base>...HEAD`. Examples: "add UserService" (file appears), "validate input X" (validation logic appears), "create users table" (migration file appears).
|
||||
- **CROSS-REPO** — Item names a file or change in a sibling repo (e.g., `domain-hq/docs/dashboard.md`, `~/Development/<other-repo>/...`). The current diff CANNOT prove this.
|
||||
- **EXTERNAL-STATE** — Item names state in an external system: Supabase config/RLS, Cloudflare DNS, Vercel env vars, OAuth provider allowlists, third-party SaaS, DNS records. The current diff CANNOT prove this.
|
||||
- **CONTENT-SHAPE** — Item requires a file to follow a specific convention. If the file is in this repo: diff-verifiable. If in another repo or system: see CROSS-REPO / EXTERNAL-STATE.
|
||||
|
||||
**Verification dispatch:**
|
||||
|
||||
- **DIFF-VERIFIABLE** → cross-reference against diff (next section).
|
||||
- **CROSS-REPO** → if the sibling repo is reachable on disk (try `~/Development/<repo>/`, `~/code/<repo>/`, the parent of the current repo), run `[ -f <path> ]` to check file existence. File exists → DONE (cite path). File missing → NOT DONE (cite path). Path unreachable → UNVERIFIABLE (cite what needs manual check).
|
||||
- **EXTERNAL-STATE** → UNVERIFIABLE. Cite the system and the specific check the user must perform.
|
||||
- **CONTENT-SHAPE in another repo** → if the file exists, run any project-detected validator (see "Validator detection" below) before falling back to UNVERIFIABLE. With a validator: pass → DONE; fail → NOT DONE (cite validator output). No validator available: classify UNVERIFIABLE and cite both the file path and the convention to confirm.
|
||||
|
||||
**Path concreteness rule.** If a plan item names a *concrete filesystem path* (absolute, `~/...`, or `<sibling-repo>/<file>`), it MUST be classified DONE or NOT DONE based on `[ -f <path> ]`. UNVERIFIABLE is only valid when the path is genuinely abstract ("Cloudflare DNS", "Supabase allowlist") or the sibling root is unreachable on this machine. "I don't want to check" is not unreachable.
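
The path concreteness rule can be sketched as a three-way check. This assumes the item has already been split into a sibling repo root and a relative file path. `verify_cross_repo` is a hypothetical helper, and the demo uses a throwaway directory in place of a real sibling repo.

```bash
# Hypothetical helper: classify a CROSS-REPO item that names a concrete path.
# $1 = sibling repo root, $2 = file path relative to that root
verify_cross_repo() {
  if [ ! -d "$1" ]; then
    echo "UNVERIFIABLE: $1 is not reachable on this machine"
  elif [ -f "$1/$2" ]; then
    echo "DONE: $1/$2"
  else
    echo "NOT DONE: $1/$2"
  fi
}

# Demo against a throwaway directory standing in for a sibling repo.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/docs" && : > "$demo_root/docs/dashboard.md"
verify_cross_repo "$demo_root" "docs/dashboard.md"
verify_cross_repo "$demo_root" "docs/missing.md"
verify_cross_repo "$demo_root/absent-repo" "docs/dashboard.md"
rm -rf "$demo_root"
```

UNVERIFIABLE is reached only when the root itself is missing, which mirrors the rule that a reachable path must resolve to DONE or NOT DONE.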
|
||||
|
||||
**Validator detection.** Before falling back to UNVERIFIABLE on a CONTENT-SHAPE item, scan the target repo's `package.json` for any script matching `validate-*`, `lint-wiki`, `check-docs`, or similar. If found, invoke it with the relevant path argument (e.g., `npm run validate-wiki -- <path>`). For multi-target validators (e.g., `validate-wiki --all`), run once and reconcile per-item from the output. A passing validator promotes the item from UNVERIFIABLE to DONE; a failing one demotes to NOT DONE.
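
Validator detection might look like the following sketch. It assumes plain `grep` is acceptable for scanning `package.json`. `find_validator_scripts` is a hypothetical name, and the sample file contents are invented for the demo. The pattern covers only the three names the text lists and matches keys anywhere in the file, not only under `scripts`.

```bash
# Hypothetical helper: list script names in a package.json that look like validators.
# Limitation: matches keys anywhere in the file, not only under "scripts".
find_validator_scripts() {
  grep -oE '"(validate-[A-Za-z0-9:_-]+|lint-wiki|check-docs)"[[:space:]]*:' "$1" \
    | sed 's/^"\([^"]*\)".*/\1/'
}

# Sample package.json (illustrative only).
demo_pkg=$(mktemp)
printf '%s\n' '{ "scripts": { "test": "bun test", "validate-wiki": "node scripts/validate.js" } }' > "$demo_pkg"
find_validator_scripts "$demo_pkg"
rm -f "$demo_pkg"
```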
|
||||
|
||||
**Honesty rule.** Do NOT classify an item as DONE just because related code shipped. Code that *handles* a deliverable is not the deliverable. Shipping a markdown-extraction library is not the same as shipping the markdown file. When in doubt between DONE and UNVERIFIABLE, prefer UNVERIFIABLE — better to surface a confirmation prompt than silently miss a deliverable.
|
||||
|
||||
### Cross-Reference Against Diff
|
||||
|
||||
Run `git diff origin/<base>...HEAD` and `git log origin/<base>..HEAD --oneline` to understand what was implemented.
|
||||
|
||||
For each extracted plan item, run the verification dispatch from the previous section, then classify:
|
||||
|
||||
- **DONE** — Clear evidence the item shipped. Cite the specific file(s) changed in the diff for DIFF-VERIFIABLE items, or the verified path that exists for CROSS-REPO items with a reachable sibling repo.
|
||||
- **PARTIAL** — Some work toward this item exists but is incomplete (e.g., model created but controller missing, function exists but edge cases not handled).
|
||||
- **NOT DONE** — Verification ran and produced negative evidence (file missing, code absent in diff, sibling-repo file confirmed absent).
|
||||
- **CHANGED** — The item was implemented using a different approach than the plan described, but the same goal is achieved. Note the difference.
|
||||
- **UNVERIFIABLE** — The diff and any reachable sibling-repo checks cannot prove or disprove this. Always applies to EXTERNAL-STATE items and to CROSS-REPO items where the sibling repo isn't reachable. Cite the specific manual verification the user must perform (e.g., "check Cloudflare DNS shows DNS-only mode for dashboard.example.com", "confirm /docs/dashboard.md exists in domain-hq repo").
|
||||
|
||||
**Be conservative with DONE** — require clear evidence. A file being touched is not enough; the specific functionality described must be present.
|
||||
**Be generous with CHANGED** — if the goal is met by different means, that counts as addressed.
|
||||
**Be honest with UNVERIFIABLE** — better to surface 5 items the user must manually confirm than silently classify them DONE.
|
||||
|
||||
### Output Format
|
||||
|
||||
```
|
||||
PLAN COMPLETION AUDIT
|
||||
═══════════════════════════════
|
||||
Plan: {plan file path}
|
||||
|
||||
## Implementation Items
|
||||
[DONE] Create UserService — src/services/user_service.rb (+142 lines)
|
||||
[PARTIAL] Add validation — model validates but missing controller checks
|
||||
[NOT DONE] Add caching layer — no cache-related changes in diff
|
||||
[CHANGED] "Redis queue" → implemented with Sidekiq instead
|
||||
|
||||
## Test Items
|
||||
[DONE] Unit tests for UserService — test/services/user_service_test.rb
|
||||
[NOT DONE] E2E test for signup flow
|
||||
|
||||
## Migration Items
|
||||
[DONE] Create users table — db/migrate/20240315_create_users.rb
|
||||
|
||||
## Cross-Repo / External Items
|
||||
[DONE] sibling-repo has /docs/dashboard.md — verified at ~/Development/sibling-repo/docs/dashboard.md
|
||||
[UNVERIFIABLE] Cloudflare DNS-only on api.example.com — external system, manual check required
|
||||
[UNVERIFIABLE] Supabase auth allowlist contains user email — external system, confirm in Supabase dashboard
|
||||
|
||||
─────────────────────────────────
|
||||
COMPLETION: 4/10 DONE, 1 PARTIAL, 2 NOT DONE, 1 CHANGED, 2 UNVERIFIABLE
|
||||
─────────────────────────────────
|
||||
```
### Gate Logic

After producing the completion checklist, evaluate in priority order:

1. **Any NOT DONE items** (highest priority — known missing work). Use AskUserQuestion:
   - Show the completion checklist above
   - "{N} items from the plan are NOT DONE. These were part of the original plan but are missing from the implementation."
   - RECOMMENDATION: depends on item count and severity. If 1-2 minor items (docs, config), recommend B. If core functionality is missing, recommend A.
   - Options:
     A) Stop — implement the missing items before shipping
     B) Ship anyway — defer these to a follow-up (will create P1 TODOs in Step 5.5)
     C) These items were intentionally dropped — remove from scope
   - If A: STOP. List the missing items for the user to implement.
   - If B: Continue. For each NOT DONE item, create a P1 TODO in Step 5.5 with "Deferred from plan: {plan file path}".
   - If C: Continue. Note in PR body: "Plan items intentionally dropped: {list}."

2. **Any UNVERIFIABLE items** (silent gaps — the diff cannot prove them either way). Only fires after NOT DONE is resolved or absent.

   **Per-item confirmation is mandatory.** Do NOT use a single AskUserQuestion to blanket-confirm all UNVERIFIABLE items. Blanket confirmation is the failure mode that surfaced in VAS-449 (user clicks A without opening any file). Instead:

   - Loop through UNVERIFIABLE items one at a time.
   - For each item, use AskUserQuestion with the item's *specific* manual check (e.g., "Confirm: does `~/Development/domain-hq/docs/dashboard.md` exist?", not "Have you checked all items?").
   - Options per item:
     Y) Confirmed done — cite what you verified (free-text, embedded in PR body)
     N) Not done — block ship; treat as NOT DONE and re-enter the priority-1 gate
     D) Intentionally dropped — note in PR body: "Plan item intentionally dropped: {item}"
   - RECOMMENDATION per item: Y if the item is concrete and easily verified; N if it's critical-path (auth, DNS, deliverables to other repos) and the user shows hesitation.

   **Exit conditions:**
   - Any N: STOP. Surface the missing items, suggest re-running /ship after they're addressed.
   - All Y or D: Continue. Embed `## Plan Completion — Manual Verifications` section in PR body listing each Y'd item with the user's free-text evidence and each D'd item with "intentionally dropped".

   **Cap.** If there are more than 5 UNVERIFIABLE items, present them as a numbered list first and ask whether the user wants to (1) confirm each individually, (2) stop and reduce scope, or (3) explicitly accept blanket-confirmation with the warning that this is the VAS-449 failure shape. Default and recommended option is (1).

3. **Only PARTIAL items (no NOT DONE, no UNVERIFIABLE):** Continue with a note in the PR body. Not blocking.

4. **All DONE or CHANGED:** Pass. "Plan completion: PASS — all items addressed." Continue.

**No plan file found:** Skip entirely. "No plan file detected — skipping plan completion audit."
|
||||
|
||||
**Include in PR body (Step 19):** Add a `## Plan Completion` section with the checklist summary.
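
The priority order above can be sketched as a small decision function. This is an illustrative sketch only: `plan_gate` and its outcome labels are hypothetical names, and it assumes the caller has already counted the NOT DONE, UNVERIFIABLE, and PARTIAL items from the checklist.

```bash
# Hypothetical helper: decide which gate fires, in priority order.
# Usage: plan_gate <not_done_count> <unverifiable_count> <partial_count>
plan_gate() {
  local not_done unverifiable partial
  not_done=$1
  unverifiable=$2
  partial=$3

  if [ "$not_done" -gt 0 ]; then
    # Priority 1: known missing work always wins.
    echo "ASK_NOT_DONE"
  elif [ "$unverifiable" -gt 5 ]; then
    # More than 5 items: show the numbered list and offer the cap options.
    echo "ASK_UNVERIFIABLE_CAP"
  elif [ "$unverifiable" -gt 0 ]; then
    # Otherwise confirm each UNVERIFIABLE item individually.
    echo "ASK_UNVERIFIABLE_PER_ITEM"
  elif [ "$partial" -gt 0 ]; then
    # Only PARTIAL items: non-blocking, note it in the PR body.
    echo "CONTINUE_WITH_NOTE"
  else
    echo "PASS"
  fi
}
```

For example, `plan_gate 0 2 1` selects the per-item UNVERIFIABLE gate, because PARTIAL items are only considered once nothing is NOT DONE or UNVERIFIABLE.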
|
||||
>
|
||||
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"unverifiable":N,"summary":"<markdown checklist for PR body>"}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Parse the LAST line of the subagent's output as JSON.
|
||||
2. Store `done`, `deferred`, `unverifiable` for Step 20 metrics; use `summary` in PR body.
|
||||
3. If `deferred > 0` or `unverifiable > 0` and no user override, present the items via the appropriate AskUserQuestion (see Gate Logic priority order above) before continuing.
|
||||
4. Embed `summary` in PR body's `## Plan Completion` section (Step 19). If `unverifiable > 0` and the user answered Y or D for any item in the UNVERIFIABLE gate, also embed `## Plan Completion — Manual Verifications` listing each confirmed (Y) item with its evidence and each intentionally dropped (D) item.
|
||||
|
||||
**If the subagent fails or returns invalid JSON:** Fall back to running the audit inline (parent processes the same plan-extraction + classification logic). If the inline fallback also fails (e.g., plan file unreadable, parser error), do NOT silently pass — surface the failure as an explicit AskUserQuestion: "Plan Completion audit could not run ({reason}). Options: (A) Skip audit and ship anyway — record that the audit was skipped in PR body and Step 20 metrics; (B) Stop and fix the audit." Default and recommended option is (B). Silent fail-open is the failure shape that VAS-449 surfaced.
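
A minimal sketch of the "parse the LAST line" step, assuming `jq` is installed and the subagent's full output arrives on stdin. The function name `parse_audit_result` is hypothetical. A non-zero return is the signal to fall back to the inline audit instead of passing silently.

```bash
# Hypothetical helper: read the subagent output on stdin, parse its last line
# as JSON, and print "<done> <deferred> <unverifiable>".
# Returns non-zero when jq is missing, the line is not valid JSON, or any of
# the three counts is absent or not a number.
parse_audit_result() {
  local last_line
  command -v jq >/dev/null 2>&1 || return 1
  last_line=$(tail -n 1)
  printf '%s\n' "$last_line" | jq -e -r '
    if (.done | type) == "number"
       and (.deferred | type) == "number"
       and (.unverifiable | type) == "number"
    then "\(.done) \(.deferred) \(.unverifiable)"
    else empty
    end
  ' 2>/dev/null
}
```

A caller might use it as `counts=$(parse_audit_result < subagent-output.txt) || run_inline_audit`, where both the file name and `run_inline_audit` are placeholders.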
|
||||
|
||||
---
|
||||
|
||||
## Step 8.1: Plan Verification
|
||||
|
||||
Automatically verify the plan's testing/verification steps using the `/qa-only` skill.
|
||||
|
||||
### 1. Check for verification section
|
||||
|
||||
Using the plan file already discovered in Step 8, look for a verification section. Match any of these headings: `## Verification`, `## Test plan`, `## Testing`, `## How to test`, `## Manual testing`, or any section with verification-flavored items (URLs to visit, things to check visually, interactions to test).
|
||||
|
||||
**If no verification section found:** Skip with "No verification steps found in plan — skipping auto-verification."
|
||||
**If no plan file was found in Step 8:** Skip (already handled).
|
||||
|
||||
### 2. Check for running dev server
|
||||
|
||||
Before invoking browse-based verification, check if a dev server is reachable:
|
||||
|
||||
```bash
|
||||
curl -s -o /dev/null -w '%{http_code}' http://localhost:3000 2>/dev/null || \
|
||||
curl -s -o /dev/null -w '%{http_code}' http://localhost:8080 2>/dev/null || \
|
||||
curl -s -o /dev/null -w '%{http_code}' http://localhost:5173 2>/dev/null || \
|
||||
curl -s -o /dev/null -w '%{http_code}' http://localhost:4000 2>/dev/null || echo "NO_SERVER"
|
||||
```
|
||||
|
||||
**If NO_SERVER:** Skip with "No dev server detected — skipping plan verification. Run /qa separately after deploying."
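
A minimal sketch of the same probe that also records which URL answered, so it can serve as the base URL later. The names `probe` and `detect_dev_server` are hypothetical, and the 2-second timeout is an assumption rather than part of the workflow above.

```bash
# Hypothetical helper: succeed if the URL returns any HTTP response at all.
# --max-time 2 is an assumed timeout so a hung port cannot stall /ship.
probe() {
  curl -s -o /dev/null --max-time 2 "$1" 2>/dev/null
}

# Try the same four ports in the same order; print the first reachable URL,
# or NO_SERVER (with a non-zero return) when none respond.
detect_dev_server() {
  local port
  for port in 3000 8080 5173 4000; do
    if probe "http://localhost:$port"; then
      echo "http://localhost:$port"
      return 0
    fi
  done
  echo "NO_SERVER"
  return 1
}
```

Usage would look like `BASE_URL=$(detect_dev_server) || echo "skipping plan verification"`. Keeping `probe` separate makes the port loop testable without a live server.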
|
||||
|
||||
### 3. Invoke /qa-only inline
|
||||
|
||||
Read the `/qa-only` skill from disk:
|
||||
|
||||
```bash
|
||||
cat "${CLAUDE_SKILL_DIR}/../qa-only/SKILL.md"
|
||||
```
|
||||
|
||||
**If unreadable:** Skip with "Could not load /qa-only — skipping plan verification."
|
||||
|
||||
Follow the /qa-only workflow with these modifications:
|
||||
- **Skip the preamble** (already handled by /ship)
|
||||
- **Use the plan's verification section as the primary test input** — treat each verification item as a test case
|
||||
- **Use the detected dev server URL** as the base URL
|
||||
- **Skip the fix loop** — this is report-only verification during /ship
|
||||
- **Cap at the verification items from the plan** — do not expand into general site QA
|
||||
|
||||
### 4. Gate logic
|
||||
|
||||
- **All verification items PASS:** Continue silently. "Plan verification: PASS."
|
||||
- **Any FAIL:** Use AskUserQuestion:
|
||||
- Show the failures with screenshot evidence
|
||||
- RECOMMENDATION: Choose A if failures indicate broken functionality. Choose B if cosmetic only.
|
||||
- Options:
|
||||
A) Fix the failures before shipping (recommended for functional issues)
|
||||
B) Ship anyway — known issues (acceptable for cosmetic issues)
|
||||
- **No verification section / no server / unreadable skill:** Skip (non-blocking).
|
||||
|
||||
### 5. Include in PR body
|
||||
|
||||
Add a `## Verification Results` section to the PR body (Step 19):
|
||||
- If verification ran: summary of results (N PASS, M FAIL, K SKIPPED)
|
||||
- If skipped: reason for skipping (no plan, no server, no verification section)
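
The `N PASS, M FAIL, K SKIPPED` summary could be produced along these lines. This is a sketch under an assumed input format: one result per line, with the status as the first word. Neither that format nor the name `summarize_verification` comes from /qa-only.

```bash
# Hypothetical helper: read "STATUS description" lines on stdin and print the
# one-line summary used in the Verification Results section.
summarize_verification() {
  local pass fail skipped status _rest
  pass=0
  fail=0
  skipped=0
  while read -r status _rest; do
    case $status in
      PASS) pass=$((pass + 1)) ;;
      FAIL) fail=$((fail + 1)) ;;
      SKIPPED) skipped=$((skipped + 1)) ;;
    esac
  done
  printf '%s PASS, %s FAIL, %s SKIPPED\n' "$pass" "$fail" "$skipped"
}
```

Lines with any other first word are ignored rather than counted, so stray log output does not inflate the totals.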
|
||||
|
||||
## Prior Learnings
|
||||
|
||||
Search for relevant learnings from previous sessions:
|
||||
|
||||
```bash
|
||||
_CROSS_PROJ=$(~/.claude/skills/gstack/bin/gstack-config get cross_project_learnings 2>/dev/null || echo "unset")
|
||||
echo "CROSS_PROJECT: $_CROSS_PROJ"
|
||||
if [ "$_CROSS_PROJ" = "true" ]; then
|
||||
~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" --cross-project 2>/dev/null || true
|
||||
else
|
||||
~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" 2>/dev/null || true
|
||||
fi
|
||||
```
|
||||
|
||||
If `CROSS_PROJECT` is `unset` (first time): Use AskUserQuestion:
|
||||
|
||||
> gstack can search learnings from your other projects on this machine to find
|
||||
> patterns that might apply here. This stays local (no data leaves your machine).
|
||||
> Recommended for solo developers. Skip if you work on multiple client codebases
|
||||
> where cross-contamination would be a concern.
|
||||
|
||||
Options:
|
||||
- A) Enable cross-project learnings (recommended)
|
||||
- B) Keep learnings project-scoped only
|
||||
|
||||
If A: run `~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings true`
|
||||
If B: run `~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings false`
|
||||
|
||||
Then re-run the search with the appropriate flag.
|
||||
|
||||
If learnings are found, incorporate them into your analysis. When a review finding
|
||||
matches a past learning, display:
|
||||
|
||||
**"Prior learning applied: [key] (confidence N/10, from [date])"**
|
||||
|
||||
This makes the compounding visible. The user should see that gstack is getting
|
||||
smarter on their codebase over time.
|
||||
|
||||
## Step 8.2: Scope Drift Detection
|
||||
|
||||
Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
|
||||
|
||||
1. Read `TODOS.md` (if it exists). Read PR description (`gh pr view --json body --jq .body 2>/dev/null || true`).
|
||||
Read commit messages (`git log origin/<base>..HEAD --oneline`).
|
||||
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
|
||||
2. Identify the **stated intent** — what was this branch supposed to accomplish?
|
||||
3. Run `DIFF_BASE=$(git merge-base origin/<base> HEAD) && git diff "$DIFF_BASE" --stat` and compare the files changed against the stated intent.
|
||||
|
||||
4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
|
||||
|
||||
**SCOPE CREEP detection:**
|
||||
- Files changed that are unrelated to the stated intent
|
||||
- New features or refactors not mentioned in the plan
|
||||
- "While I was in there..." changes that expand blast radius
|
||||
|
||||
**MISSING REQUIREMENTS detection:**
|
||||
- Requirements from TODOS.md/PR description not addressed in the diff
|
||||
- Test coverage gaps for stated requirements
|
||||
- Partial implementations (started but not finished)
|
||||
|
||||
5. Output (before the main review begins):
|
||||
   ```
   Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
   Intent: <1-line summary of what was requested>
   Delivered: <1-line summary of what the diff actually does>
   [If drift: list each out-of-scope change]
   [If missing: list each unaddressed requirement]
   ```
|
||||
|
||||
6. This is **INFORMATIONAL** — does not block the review. Proceed to the next step.
|
||||
|
||||
---
|
||||
|
|
@@ -0,0 +1,31 @@
|
|||
## Step 8: Plan Completion Audit
|
||||
|
||||
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent reads the plan file and every referenced code file in its own fresh context. Parent gets only the conclusion.
|
||||
|
||||
**Subagent prompt:** Pass these instructions to the subagent:
|
||||
|
||||
> You are running a ship-workflow plan completion audit. The base branch is `<base>`. Use `git diff <base>...HEAD` to see what shipped. Do not commit or push — report only.
|
||||
>
|
||||
> {{PLAN_COMPLETION_AUDIT_SHIP}}
|
||||
>
|
||||
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"unverifiable":N,"summary":"<markdown checklist for PR body>"}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Parse the LAST line of the subagent's output as JSON.
|
||||
2. Store `done`, `deferred`, `unverifiable` for Step 20 metrics; use `summary` in PR body.
|
||||
3. If `deferred > 0` or `unverifiable > 0` and no user override, present the items via the appropriate AskUserQuestion (see Gate Logic priority order above) before continuing.
|
||||
4. Embed `summary` in PR body's `## Plan Completion` section (Step 19). If `unverifiable > 0` and the user answered Y or D for any item in the UNVERIFIABLE gate, also embed `## Plan Completion — Manual Verifications` listing each confirmed (Y) item with its evidence and each intentionally dropped (D) item.
|
||||
|
||||
**If the subagent fails or returns invalid JSON:** Fall back to running the audit inline (parent processes the same plan-extraction + classification logic). If the inline fallback also fails (e.g., plan file unreadable, parser error), do NOT silently pass — surface the failure as an explicit AskUserQuestion: "Plan Completion audit could not run ({reason}). Options: (A) Skip audit and ship anyway — record that the audit was skipped in PR body and Step 20 metrics; (B) Stop and fix the audit." Default and recommended option is (B). Silent fail-open is the failure shape that VAS-449 surfaced.
|
||||
|
||||
---
|
||||
|
||||
{{PLAN_VERIFICATION_EXEC}}
|
||||
|
||||
{{LEARNINGS_SEARCH:query=release ship version changelog merge pr}}
|
||||
|
||||
{{SCOPE_DRIFT}}
|
||||
|
||||
---
|
||||
|
|
@@ -0,0 +1,207 @@
|
|||
<!-- AUTO-GENERATED from pr-body.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 18: Documentation sync (via subagent, before PR creation)
|
||||
|
||||
**Dispatch /document-release as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent gets a fresh context window — zero rot from the preceding 17 steps. It also runs the **full** `/document-release` workflow (with CHANGELOG clobber protection, doc exclusions, risky-change gates, named staging, race-safe PR body editing) rather than a weaker reimplementation.
|
||||
|
||||
**Sequencing:** This step runs AFTER Step 17 (Push) and BEFORE Step 19 (Create PR). The PR is created once from final HEAD with the `## Documentation` section baked into the initial body. No create-then-re-edit dance.
|
||||
|
||||
**Subagent prompt:**
|
||||
|
||||
> You are executing the /document-release workflow after a code push. Read the full skill file `${HOME}/.claude/skills/gstack/document-release/SKILL.md` and execute its complete workflow end-to-end, including CHANGELOG clobber protection, doc exclusions, risky-change gates, and named staging. Do NOT attempt to edit the PR body — no PR exists yet. Branch: `<branch>`, base: `<base>`.
|
||||
>
|
||||
> After completing the workflow, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"files_updated":["README.md","CLAUDE.md",...],"commit_sha":"abc1234","pushed":true,"documentation_section":"<markdown block for PR body's ## Documentation section>"}`
|
||||
>
|
||||
> If no documentation files needed updating, output:
|
||||
> `{"files_updated":[],"commit_sha":null,"pushed":false,"documentation_section":null}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Parse the LAST line of the subagent's output as JSON.
|
||||
2. Store `documentation_section` — Step 19 embeds it in the PR body (or omits the section if null).
|
||||
3. If `files_updated` is non-empty, print: `Documentation synced: {files_updated.length} files updated, committed as {commit_sha}`.
|
||||
4. If `files_updated` is empty, print: `Documentation is current — no updates needed.`
|
||||
|
||||
**If the subagent fails or returns invalid JSON:** Print a warning and proceed to Step 19 without a `## Documentation` section. Do not block /ship on subagent failure. The user can run `/document-release` manually after the PR lands.
|
||||
|
||||
---
|
||||
|
||||
## Step 19: Create PR/MR
|
||||
|
||||
**Idempotency check:** Check if a PR/MR already exists for this branch.
|
||||
|
||||
**If GitHub:**
|
||||
```bash
|
||||
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
|
||||
```
|
||||
|
||||
**If GitLab:**
|
||||
```bash
|
||||
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
|
||||
```
|
||||
|
||||
If an **open** PR/MR already exists: **update** the PR body using `gh pr edit --body-file "$PR_BODY_FILE"` (GitHub) or `glab mr update -d ...` (GitLab). Always regenerate the PR body from scratch using this run's fresh results (test output, coverage audit, review findings, adversarial review, TODOS summary, documentation_section from Step 18). Never reuse stale PR body content from a prior run. **Run the same redaction scan-at-sink (PR body + title) as the create path (Step 19) before editing — scan the temp file, then `gh pr edit --body-file` from it.**
|
||||
|
||||
**Always update the PR title to start with `v$NEW_VERSION`.** PR titles use the workspace-aware format `v<NEW_VERSION> <type>: <summary>` — version ALWAYS first, no exceptions, no "custom title kept intentionally" escape hatch. The shared helper `bin/gstack-pr-title-rewrite.sh` is the single source of truth for the rule.
|
||||
|
||||
1. Read the current title: `CURRENT=$(gh pr view --json title -q .title)` (or `glab mr view -F json | jq -r .title`).
|
||||
2. Compute the corrected title: `NEW_TITLE=$(~/.claude/skills/gstack/bin/gstack-pr-title-rewrite.sh "$NEW_VERSION" "$CURRENT")`. The helper handles three cases: title already correct (no-op), title has a different `v<X.Y.Z.W>` prefix (replace it), or title has no version prefix (prepend one).
|
||||
3. If `NEW_TITLE` differs from `CURRENT`, run `gh pr edit --title "$NEW_TITLE"` (or `glab mr update -t "$NEW_TITLE"`).
|
||||
4. **Self-check:** re-fetch the title and assert it starts with `v$NEW_VERSION `. If it does not, retry the edit once. If still wrong, surface the failure to the user.
|
||||
|
||||
This keeps the title truthful when Step 12's queue-drift detection rebumps a stale version, and forces the format on PRs that were created without it.
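
The three cases the helper handles can be illustrated with a short sketch. This is not the contents of `bin/gstack-pr-title-rewrite.sh`. The function name `rewrite_title` and the regular expression are assumptions based only on the `v<X.Y.Z.W>` format described above.

```bash
# Illustrative sketch only. Usage: rewrite_title <new_version> <current_title>
rewrite_title() {
  local new_version current rest
  new_version=$1
  current=$2
  # Drop any existing leading v<X.Y.Z.W> prefix (correct or stale).
  # Titles without a version prefix pass through unchanged.
  rest=$(printf '%s' "$current" | sed -E 's/^v[0-9]+(\.[0-9]+){3}[[:space:]]+//')
  # Re-attach the current version so it always comes first.
  printf 'v%s %s\n' "$new_version" "$rest"
}
```

Stripping and then re-attaching the prefix covers all three cases with one code path: an already-correct title comes back unchanged, a stale version is replaced, and a bare title gains the prefix.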
|
||||
|
||||
Print the existing URL and continue to Step 20.
|
||||
|
||||
If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.
|
||||
|
||||
The PR/MR body should contain these sections:
|
||||
|
||||
```
|
||||
## Summary
|
||||
<Summarize ALL changes being shipped. Run `git log <base>..HEAD --oneline` to enumerate
|
||||
every commit. Exclude the VERSION/CHANGELOG metadata commit (that's this PR's bookkeeping,
|
||||
not a substantive change). Group the remaining commits into logical sections (e.g.,
|
||||
"**Performance**", "**Dead Code Removal**", "**Infrastructure**"). Every substantive commit
|
||||
must appear in at least one section. If a commit's work isn't reflected in the summary,
|
||||
you missed it.>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 7, or "All new code paths have test coverage.">
|
||||
<If Step 7 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 9 code review, or "No issues found.">
|
||||
|
||||
## Design Review
|
||||
<If design review ran: "Design Review (lite): N findings — M auto-fixed, K skipped. AI Slop: clean/N issues.">
|
||||
<If no frontend files changed: "No frontend files changed — design review skipped.">
|
||||
|
||||
## Eval Results
|
||||
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
|
||||
|
||||
## Greptile Review
|
||||
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
|
||||
<If no Greptile comments found: "No Greptile comments.">
|
||||
<If no PR existed during Step 10: omit this section entirely>
|
||||
|
||||
## Scope Drift
|
||||
<If scope drift ran: "Scope Check: CLEAN" or list of drift/creep findings>
|
||||
<If no scope drift: omit this section>
|
||||
|
||||
## Plan Completion
|
||||
<If plan file found: completion checklist summary from Step 8>
|
||||
<If no plan file: "No plan file detected.">
|
||||
<If plan items deferred: list deferred items>
|
||||
|
||||
## Linked Spec
|
||||
<Auto-detect: look for /spec archives matching this branch via:
|
||||
eval "$(${ctx.paths.binDir}/gstack-paths)"
|
||||
eval "$(${ctx.paths.binDir}/gstack-slug)"
|
||||
CURRENT_BRANCH=$(git branch --show-current)
|
||||
SPEC_ARCHIVES="$GSTACK_STATE_ROOT/projects/$SLUG/specs"
|
||||
# Find newest archive whose spec_branch frontmatter matches current branch (or one of its
|
||||
# parents — if spec spawned worktree spec/<slug>-$$, the spawned worktree IS where /ship runs).
|
||||
SPEC_FILE=$(grep -l "^spec_branch: $CURRENT_BRANCH$" "$SPEC_ARCHIVES"/*.md 2>/dev/null | head -1)
|
||||
[ -z "$SPEC_FILE" ] && exit # no spec; omit this section entirely
|
||||
SPEC_ISSUE=$(grep "^spec_issue_number:" "$SPEC_FILE" | cut -d' ' -f2)
|
||||
[ -z "$SPEC_ISSUE" ] && exit # spec archive exists but no issue number; omit
|
||||
|
||||
# CONDITIONAL Closes #N (codex F4): only add when Plan Completion above is "complete".
|
||||
# If the plan completion gate from Step 8 reports any deferred or failed items, emit:
|
||||
# "Linked to #$SPEC_ISSUE (partial delivery — NOT auto-closing; close manually after follow-up)"
|
||||
# If Plan Completion is fully complete, emit:
|
||||
# "Closes #$SPEC_ISSUE"
|
||||
# and include the Closes #N line in the PR body so GitHub auto-closes on merge.>
|
||||
|
||||
<Format:
|
||||
Closes #<N>
|
||||
|
||||
This PR delivers the spec at <archive path relative to repo root>.
|
||||
Spec filed: <spec_filed_at from frontmatter>>
|
||||
|
||||
<If partial delivery, emit instead:
|
||||
Linked to #<N> (partial delivery — not auto-closing).
|
||||
Deferred items: <list from Plan Completion>.
|
||||
Close #<N> manually after follow-up lands.>
|
||||
|
||||
<If no /spec archive matches this branch: omit this entire section.>
|
||||
|
||||
## Verification Results
|
||||
<If verification ran: summary from Step 8.1 (N PASS, M FAIL, K SKIPPED)>
|
||||
<If skipped: reason (no plan, no server, no verification section)>
|
||||
<If not applicable: omit this section>
|
||||
|
||||
## TODOS
|
||||
<If items marked complete: bullet list of completed items with version>
|
||||
<If no items completed: "No TODO items completed in this PR.">
|
||||
<If TODOS.md created or reorganized: note that>
|
||||
<If TODOS.md doesn't exist and user skipped: omit this section>
|
||||
|
||||
## Documentation
|
||||
<Embed the `documentation_section` string returned by Step 18's subagent here, verbatim.>
|
||||
<If Step 18 returned `documentation_section: null` (no docs updated), omit this section entirely.>
|
||||
|
||||
## Test plan
|
||||
- [x] All Rails tests pass (N runs, 0 failures)
|
||||
- [x] All Vitest tests pass (N tests)
|
||||
|
||||
🤖 Generated with [Claude Code](https://claude.com/claude-code)
|
||||
```
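
The conditional `Closes #N` rule in the Linked Spec section can be sketched as below. The function name `linked_spec_line` is hypothetical, the partial-delivery wording is abbreviated from the template, and the count of open plan items is assumed to be supplied by the caller from the Step 8 result.

```bash
# Hypothetical helper. Usage: linked_spec_line <issue_number> <open_item_count>
# open_item_count = deferred plus failed plan items, as counted by the caller.
linked_spec_line() {
  local issue open_items
  issue=$1
  open_items=$2
  if [ "$open_items" -gt 0 ]; then
    # Partial delivery: link the issue but do not let the merge auto-close it.
    printf 'Linked to #%s (partial delivery, not auto-closing).\n' "$issue"
  else
    # Fully complete: the Closes keyword auto-closes the issue on merge.
    printf 'Closes #%s\n' "$issue"
  fi
}
```

Emitting `Closes` only when nothing is outstanding keeps a partially delivered spec issue open until the follow-up lands.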
|
||||
|
||||
#### Redaction scan (PR body + title) — runs before create AND edit
|
||||
|
||||
The PR body is world-readable on a public repo. Scan-at-sink before sending:
|
||||
write the composed body to a temp file, scan THAT file with the shared engine,
|
||||
and pass the same file to `gh`/`glab`. Wrap any Codex / Greptile / eval output
|
||||
sections in tool-attributed fences (` ```codex-review ` / ` ```greptile `) so the
|
||||
engine WARN-degrades the example credentials those tools quote instead of blocking
|
||||
the PR (a live-format credential inside the fence still blocks).
|
||||
|
||||
```bash
|
||||
REDACT_VIS=$(~/.claude/skills/gstack/bin/gstack-config get redact_repo_visibility 2>/dev/null)
|
||||
[ -z "$REDACT_VIS" ] && REDACT_VIS=$(gh repo view --json visibility -q .visibility 2>/dev/null | tr 'A-Z' 'a-z')
|
||||
REDACT_VIS="${REDACT_VIS:-unknown}"
|
||||
PR_BODY_FILE=$(mktemp)
|
||||
cat > "$PR_BODY_FILE" <<'PR_BODY_EOF'
|
||||
<PR body from above>
|
||||
PR_BODY_EOF
|
||||
~/.claude/skills/gstack/bin/gstack-redact --from-file "$PR_BODY_FILE" --repo-visibility "$REDACT_VIS" --self-email "$(git config user.email 2>/dev/null)" --json
|
||||
case $? in
|
||||
3) echo "BLOCKED — credential in PR body. Rotate + redact, do not create the PR."; exit 1 ;;
|
||||
2) echo "MEDIUM findings — confirm per finding (sterner on public) before proceeding." ;;
|
||||
esac
|
||||
# Also scan the title (short, single-line):
|
||||
printf '%s' "v$NEW_VERSION <type>: <summary>" | ~/.claude/skills/gstack/bin/gstack-redact --repo-visibility "$REDACT_VIS" --json
|
||||
```
|
||||
|
||||
HIGH blocks (exit 3, no skip). MEDIUM → AskUserQuestion (PII subset offers
`--auto-redact`). The same scan runs before the `gh pr edit --body-file` path (the Step 19 idempotency block).
|
||||
|
||||
**If GitHub:** create from the SCANNED file (exact bytes scanned = bytes sent):
|
||||
|
||||
```bash
|
||||
# PR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
|
||||
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
|
||||
gh pr create --base <base> --title "v$NEW_VERSION <type>: <summary>" --body-file "$PR_BODY_FILE"
|
||||
rm -f "$PR_BODY_FILE"
|
||||
```
|
||||
|
||||
**If GitLab:**
|
||||
|
||||
```bash
|
||||
# MR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
|
||||
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
|
||||
# Send the SCANNED file so the bytes scanned are the bytes sent.
glab mr create -b <base> -t "v$NEW_VERSION <type>: <summary>" -d "$(cat "$PR_BODY_FILE")"
rm -f "$PR_BODY_FILE"
|
||||
```
|
||||
|
||||
**If neither CLI is available:**
|
||||
Print the branch name, remote URL, and instruct the user to create the PR/MR manually via the web UI. Do not stop — the code is pushed and ready.
|
||||
|
||||
**Output the PR/MR URL** — then proceed to Step 20.
|
||||
|
||||
---
|
||||
|
|
@@ -0,0 +1,205 @@
|
|||
## Step 18: Documentation sync (via subagent, before PR creation)
|
||||
|
||||
**Dispatch /document-release as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent gets a fresh context window — zero rot from the preceding 17 steps. It also runs the **full** `/document-release` workflow (with CHANGELOG clobber protection, doc exclusions, risky-change gates, named staging, race-safe PR body editing) rather than a weaker reimplementation.
|
||||
|
||||
**Sequencing:** This step runs AFTER Step 17 (Push) and BEFORE Step 19 (Create PR). The PR is created once from final HEAD with the `## Documentation` section baked into the initial body. No create-then-re-edit dance.
|
||||
|
||||
**Subagent prompt:**
|
||||
|
||||
> You are executing the /document-release workflow after a code push. Read the full skill file `${HOME}/.claude/skills/gstack/document-release/SKILL.md` and execute its complete workflow end-to-end, including CHANGELOG clobber protection, doc exclusions, risky-change gates, and named staging. Do NOT attempt to edit the PR body — no PR exists yet. Branch: `<branch>`, base: `<base>`.
|
||||
>
|
||||
> After completing the workflow, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"files_updated":["README.md","CLAUDE.md",...],"commit_sha":"abc1234","pushed":true,"documentation_section":"<markdown block for PR body's ## Documentation section>"}`
|
||||
>
|
||||
> If no documentation files needed updating, output:
|
||||
> `{"files_updated":[],"commit_sha":null,"pushed":false,"documentation_section":null}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Parse the LAST line of the subagent's output as JSON.
|
||||
2. Store `documentation_section` — Step 19 embeds it in the PR body (or omits the section if null).
|
||||
3. If `files_updated` is non-empty, print: `Documentation synced: {files_updated.length} files updated, committed as {commit_sha}`.
|
||||
4. If `files_updated` is empty, print: `Documentation is current — no updates needed.`
|
||||
|
||||
**If the subagent fails or returns invalid JSON:** Print a warning and proceed to Step 19 without a `## Documentation` section. Do not block /ship on subagent failure. The user can run `/document-release` manually after the PR lands.
|
||||
|
||||
---
|
||||
|
||||
## Step 19: Create PR/MR
|
||||
|
||||
**Idempotency check:** Check if a PR/MR already exists for this branch.
|
||||
|
||||
**If GitHub:**
|
||||
```bash
|
||||
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
|
||||
```
|
||||
|
||||
**If GitLab:**
|
||||
```bash
|
||||
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
|
||||
```
|
||||
|
||||
If an **open** PR/MR already exists: **update** the PR body using `gh pr edit --body-file "$PR_BODY_FILE"` (GitHub) or `glab mr update -d ...` (GitLab). Always regenerate the PR body from scratch using this run's fresh results (test output, coverage audit, review findings, adversarial review, TODOS summary, documentation_section from Step 18). Never reuse stale PR body content from a prior run. **Run the same redaction scan-at-sink (PR body + title) as the create path (Step 19) before editing — scan the temp file, then `gh pr edit --body-file` from it.**
|
||||
|
||||
**Always update the PR title to start with `v$NEW_VERSION`.** PR titles use the workspace-aware format `v<NEW_VERSION> <type>: <summary>` — version ALWAYS first, no exceptions, no "custom title kept intentionally" escape hatch. The shared helper `bin/gstack-pr-title-rewrite.sh` is the single source of truth for the rule.
|
||||
|
||||
1. Read the current title: `CURRENT=$(gh pr view --json title -q .title)` (or `glab mr view -F json | jq -r .title`).
|
||||
2. Compute the corrected title: `NEW_TITLE=$(~/.claude/skills/gstack/bin/gstack-pr-title-rewrite.sh "$NEW_VERSION" "$CURRENT")`. The helper handles three cases: title already correct (no-op), title has a different `v<X.Y.Z.W>` prefix (replace it), or title has no version prefix (prepend one).
|
||||
3. If `NEW_TITLE` differs from `CURRENT`, run `gh pr edit --title "$NEW_TITLE"` (or `glab mr update -t "$NEW_TITLE"`).
|
||||
4. **Self-check:** re-fetch the title and assert it starts with `v$NEW_VERSION `. If it does not, retry the edit once. If still wrong, surface the failure to the user.
|
||||
|
||||
This keeps the title truthful when Step 12's queue-drift detection rebumps a stale version, and forces the format on PRs that were created without it.
|
||||
|
||||
Print the existing URL and continue to Step 20.
|
||||
|
||||
If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.
|
||||
|
||||
The PR/MR body should contain these sections:
|
||||
|
||||
```
|
||||
## Summary
|
||||
<Summarize ALL changes being shipped. Run `git log <base>..HEAD --oneline` to enumerate
|
||||
every commit. Exclude the VERSION/CHANGELOG metadata commit (that's this PR's bookkeeping,
|
||||
not a substantive change). Group the remaining commits into logical sections (e.g.,
|
||||
"**Performance**", "**Dead Code Removal**", "**Infrastructure**"). Every substantive commit
|
||||
must appear in at least one section. If a commit's work isn't reflected in the summary,
|
||||
you missed it.>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 7, or "All new code paths have test coverage.">
|
||||
<If Step 7 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 9 code review, or "No issues found.">
|
||||
|
||||
## Design Review
|
||||
<If design review ran: "Design Review (lite): N findings — M auto-fixed, K skipped. AI Slop: clean/N issues.">
|
||||
<If no frontend files changed: "No frontend files changed — design review skipped.">
|
||||
|
||||
## Eval Results
|
||||
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
|
||||
|
||||
## Greptile Review
|
||||
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
|
||||
<If no Greptile comments found: "No Greptile comments.">
|
||||
<If no PR existed during Step 10: omit this section entirely>
|
||||
|
||||
## Scope Drift
|
||||
<If scope drift ran: "Scope Check: CLEAN" or list of drift/creep findings>
|
||||
<If no scope drift: omit this section>
|
||||
|
||||
## Plan Completion
|
||||
<If plan file found: completion checklist summary from Step 8>
|
||||
<If no plan file: "No plan file detected.">
|
||||
<If plan items deferred: list deferred items>
|
||||
|
||||
## Linked Spec
|
||||
<Auto-detect: look for /spec archives matching this branch via:
|
||||
eval "$(${ctx.paths.binDir}/gstack-paths)"
|
||||
eval "$(${ctx.paths.binDir}/gstack-slug)"
|
||||
CURRENT_BRANCH=$(git branch --show-current)
|
||||
SPEC_ARCHIVES="$GSTACK_STATE_ROOT/projects/$SLUG/specs"
|
||||
# Find newest archive whose spec_branch frontmatter matches current branch (or one of its
|
||||
# parents — if spec spawned worktree spec/<slug>-$$, the spawned worktree IS where /ship runs).
|
||||
SPEC_FILE=$(grep -l "^spec_branch: $CURRENT_BRANCH$" "$SPEC_ARCHIVES"/*.md 2>/dev/null | head -1)
|
||||
[ -z "$SPEC_FILE" ] && exit # no spec; omit this section entirely
|
||||
SPEC_ISSUE=$(grep "^spec_issue_number:" "$SPEC_FILE" | cut -d' ' -f2)
|
||||
[ -z "$SPEC_ISSUE" ] && exit # spec archive exists but no issue number; omit
|
||||
|
||||
# CONDITIONAL Closes #N (codex F4): only add when Plan Completion above is "complete".
|
||||
# If the plan completion gate from Step 8 reports any deferred or failed items, emit:
|
||||
# "Linked to #$SPEC_ISSUE (partial delivery — NOT auto-closing; close manually after follow-up)"
|
||||
# If Plan Completion is fully complete, emit:
|
||||
# "Closes #$SPEC_ISSUE"
|
||||
# and include the Closes #N line in the PR body so GitHub auto-closes on merge.>
|
||||
|
||||
<Format:
|
||||
Closes #<N>
|
||||
|
||||
This PR delivers the spec at <archive path relative to repo root>.
|
||||
Spec filed: <spec_filed_at from frontmatter>>
|
||||
|
||||
<If partial delivery, emit instead:
|
||||
Linked to #<N> (partial delivery — not auto-closing).
|
||||
Deferred items: <list from Plan Completion>.
|
||||
Close #<N> manually after follow-up lands.>
|
||||
|
||||
<If no /spec archive matches this branch: omit this entire section.>
|
||||
|
||||
## Verification Results
|
||||
<If verification ran: summary from Step 8.1 (N PASS, M FAIL, K SKIPPED)>
|
||||
<If skipped: reason (no plan, no server, no verification section)>
|
||||
<If not applicable: omit this section>
|
||||
|
||||
## TODOS
|
||||
<If items marked complete: bullet list of completed items with version>
|
||||
<If no items completed: "No TODO items completed in this PR.">
|
||||
<If TODOS.md created or reorganized: note that>
|
||||
<If TODOS.md doesn't exist and user skipped: omit this section>
|
||||
|
||||
## Documentation
|
||||
<Embed the `documentation_section` string returned by Step 18's subagent here, verbatim.>
|
||||
<If Step 18 returned `documentation_section: null` (no docs updated), omit this section entirely.>
|
||||
|
||||
## Test plan
|
||||
- [x] All Rails tests pass (N runs, 0 failures)
|
||||
- [x] All Vitest tests pass (N tests)
|
||||
|
||||
🤖 Generated with [Claude Code](https://claude.com/claude-code)
|
||||
```
|
||||
|
||||
#### Redaction scan (PR body + title) — runs before create AND edit
|
||||
|
||||
The PR body is world-readable on a public repo. Scan-at-sink before sending:
|
||||
write the composed body to a temp file, scan THAT file with the shared engine,
|
||||
and pass the same file to `gh`/`glab`. Wrap any Codex / Greptile / eval output
|
||||
sections in tool-attributed fences (` ```codex-review ` / ` ```greptile `) so the
|
||||
engine WARN-degrades the example credentials those tools quote instead of blocking
|
||||
the PR (a live-format credential inside the fence still blocks).
|
||||
|
||||
```bash
|
||||
REDACT_VIS=$(~/.claude/skills/gstack/bin/gstack-config get redact_repo_visibility 2>/dev/null)
|
||||
[ -z "$REDACT_VIS" ] && REDACT_VIS=$(gh repo view --json visibility -q .visibility 2>/dev/null | tr 'A-Z' 'a-z')
|
||||
REDACT_VIS="${REDACT_VIS:-unknown}"
|
||||
PR_BODY_FILE=$(mktemp)
|
||||
cat > "$PR_BODY_FILE" <<'PR_BODY_EOF'
|
||||
<PR body from above>
|
||||
PR_BODY_EOF
|
||||
~/.claude/skills/gstack/bin/gstack-redact --from-file "$PR_BODY_FILE" --repo-visibility "$REDACT_VIS" --self-email "$(git config user.email 2>/dev/null)" --json
|
||||
case $? in
|
||||
3) echo "BLOCKED — credential in PR body. Rotate + redact, do not create the PR."; exit 1 ;;
|
||||
2) echo "MEDIUM findings — confirm per finding (sterner on public) before proceeding." ;;
|
||||
esac
|
||||
# Also scan the title (short, single-line):
|
||||
printf '%s' "v$NEW_VERSION <type>: <summary>" | ~/.claude/skills/gstack/bin/gstack-redact --repo-visibility "$REDACT_VIS" --json
|
||||
```
|
||||
|
||||
HIGH blocks (exit 3, no skip). MEDIUM → AskUserQuestion (PII subset offers
|
||||
`--auto-redact`). Same scan runs before the `gh pr edit --body` path (Step 17).
|
||||
|
||||
**If GitHub:** create from the SCANNED file (exact bytes scanned = bytes sent):
|
||||
|
||||
```bash
|
||||
# PR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
|
||||
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
|
||||
gh pr create --base <base> --title "v$NEW_VERSION <type>: <summary>" --body-file "$PR_BODY_FILE"
|
||||
rm -f "$PR_BODY_FILE"
|
||||
```
|
||||
|
||||
**If GitLab:**
|
||||
|
||||
```bash
|
||||
# MR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
|
||||
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
|
||||
# Reuse the scanned file so the MR description matches what gstack-redact checked.
glab mr create -b <base> -t "v$NEW_VERSION <type>: <summary>" -d "$(cat "$PR_BODY_FILE")"
rm -f "$PR_BODY_FILE"
|
||||
```
|
||||
|
||||
**If neither CLI is available:**
|
||||
Print the branch name and remote URL, then instruct the user to create the PR/MR manually via the web UI. Do not stop: the code is pushed and ready.
|
||||
|
||||
**Output the PR/MR URL** — then proceed to Step 20.
|
||||
|
||||
---
|
||||
|
|
@@ -0,0 +1,405 @@
|
|||
<!-- AUTO-GENERATED from review-army.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 9: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
|
||||
1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
|
||||
|
||||
2. Run `git diff origin/<base>` to get the full diff (scoped to feature changes against the freshly-fetched base branch).
|
||||
|
||||
3. Apply the review checklist in two passes:
|
||||
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
|
||||
- **Pass 2 (INFORMATIONAL):** All remaining categories
|
||||
|
||||
## Confidence Calibration
|
||||
|
||||
Every finding MUST include a confidence score (1-10):
|
||||
|
||||
| Score | Meaning | Display rule |
|
||||
|-------|---------|-------------|
|
||||
| 9-10 | Verified by reading specific code. Concrete bug or exploit demonstrated. | Show normally |
|
||||
| 7-8 | High confidence pattern match. Very likely correct. | Show normally |
|
||||
| 5-6 | Moderate. Could be a false positive. | Show with caveat: "Medium confidence, verify this is actually an issue" |
|
||||
| 3-4 | Low confidence. Pattern is suspicious but may be fine. | Suppress from main report. Include in appendix only. |
|
||||
| 1-2 | Speculation. | Only report if severity would be P0. |
|
||||
|
||||
**Finding format:**
|
||||
|
||||
`[SEVERITY] (confidence: N/10) file:line — description`
|
||||
|
||||
Example:
|
||||
`[P1] (confidence: 9/10) app/models/user.rb:42 — SQL injection via string interpolation in where clause`
|
||||
`[P2] (confidence: 5/10) app/controllers/api/v1/users_controller.rb:18 — Possible N+1 query, verify with production logs`
|
||||
|
||||
### Pre-emit verification gate (#1539 — kills the "field doesn't exist" FP class)
|
||||
|
||||
Before any finding is promoted to the report, the gate requires:
|
||||
|
||||
1. **Quote the specific code line that motivates the finding** — file:line plus
|
||||
the verbatim text of the line(s) that triggered it. If the finding is "field
|
||||
X doesn't exist on model Y", quote the lines of class Y where the field
|
||||
would live. If "dict.get() might return None", quote the dict initialization.
|
||||
If "race condition between A and B", quote both A and B.
|
||||
|
||||
2. **If you cannot quote the motivating line(s), the finding is unverified.**
|
||||
Force its confidence to 3-4 (suppressed from the main report). It still goes
|
||||
into the appendix so reviewers can audit calibration, but the user does NOT
|
||||
see it in the critical-pass output. Do not work around this by inventing
|
||||
speculative confidence 7+ — that defeats the gate.
|
||||
|
||||
**Framework-meta nudge:** When the symbol is generated by a framework
|
||||
metaclass, descriptor, ORM Meta inner-class, or migration history (Django
|
||||
`Meta`, Rails `has_many`/`scope`, SQLAlchemy `relationship`/`Column`,
|
||||
TypeORM decorators, Sequelize `init`/`belongsTo`, Prisma generated client),
|
||||
quote the meta-construct (the `Meta` block, the migration, the decorator,
|
||||
the schema file) instead of expecting the literal name in the class body.
|
||||
The verification is "I read the source that creates this symbol", not "I
|
||||
grep'd for the name and didn't find it." Deeper framework-aware verification
|
||||
(model introspection, migration-history-aware checks, ORM dialect detection)
|
||||
is deliberately out of scope for the lighter gate — see the deferred
|
||||
`~/.gstack-dev/plans/1539-framework-aware-review.md` design doc.
|
||||
|
||||
The FP classes the gate kills (measured against Django Sprint 2.5 #1539):
|
||||
|
||||
| FP class | Why the gate catches it |
|
||||
|---|---|
|
||||
| "field doesn't exist on model" | Requires quoting the model class body or Meta; the field's absence becomes obvious |
|
||||
| "dict.get() might be None" | Requires quoting the dict initialization (e.g. Django form's `cleaned_data` is `{}`-initialized) |
|
||||
| "save() might lose fields" | Requires quoting the ORM signature or model definition |
|
||||
| "update_fields might miss X" | Requires quoting the field set; if X doesn't exist, the FP is self-evident |
|
||||
|
||||
**Calibration learning:** If you report a finding with confidence < 7 and the user
|
||||
confirms it IS a real issue, that is a calibration event. Your initial confidence was
|
||||
too low. Log the corrected pattern as a learning so future reviews catch it with
|
||||
higher confidence.
|
||||
|
||||
## Design Review (conditional, diff-scoped)
|
||||
|
||||
Check if the diff touches frontend files using `gstack-diff-scope`:
|
||||
|
||||
```bash
|
||||
source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
|
||||
```
|
||||
|
||||
**If `SCOPE_FRONTEND=false`:** Skip design review silently. No output.
|
||||
|
||||
**If `SCOPE_FRONTEND=true`:**
|
||||
|
||||
1. **Check for DESIGN.md.** If `DESIGN.md` or `design-system.md` exists in the repo root, read it. All design findings are calibrated against it — patterns blessed in DESIGN.md are not flagged. If not found, use universal design principles.
|
||||
|
||||
2. **Read `.claude/skills/review/design-checklist.md`.** If the file cannot be read, skip design review with a note: "Design checklist not found — skipping design review."
|
||||
|
||||
3. **Read each changed frontend file** (full file, not just diff hunks). Frontend files are identified by the patterns listed in the checklist.
|
||||
|
||||
4. **Apply the design checklist** against the changed files. For each item:
|
||||
- **[HIGH] mechanical CSS fix** (`outline: none`, `!important`, `font-size < 16px`): classify as AUTO-FIX
|
||||
- **[HIGH/MEDIUM] design judgment needed**: classify as ASK
|
||||
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /design-review"
|
||||
|
||||
5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow.
|
||||
|
||||
6. **Log the result** for the Review Readiness Dashboard:
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"design-review-lite","timestamp":"TIMESTAMP","status":"STATUS","findings":N,"auto_fixed":M,"commit":"COMMIT"}'
|
||||
```
|
||||
|
||||
Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count, COMMIT = output of `git rev-parse --short HEAD`.
|
||||
|
||||
7. **Codex design voice** (optional, automatic if available):
|
||||
|
||||
```bash
|
||||
command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, run a lightweight design check on the diff:
|
||||
|
||||
```bash
|
||||
TMPERR_DRL=$(mktemp /tmp/codex-drl-XXXXXXXX)
|
||||
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
|
||||
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): 1. Brand/product unmistakable in first screen? 2. One strong visual anchor present? 3. Page understandable by scanning headlines only? 4. Each section has one job? 5. Are cards actually necessary? 6. Does motion improve hierarchy or atmosphere? 7. Would design feel premium with all decorative shadows removed? Flag any hard rejections: 1. Generic SaaS card grid as first impression 2. Beautiful image with weak brand 3. Strong headline with no clear action 4. Busy imagery behind text 5. Sections repeating same mood statement 6. Carousel with no narrative purpose 7. App UI made of stacked cards instead of layout 5 most important design findings only. Reference file:line." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_DRL"
|
||||
```
|
||||
|
||||
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
|
||||
```bash
|
||||
cat "$TMPERR_DRL" && rm -f "$TMPERR_DRL"
|
||||
```
|
||||
|
||||
**Error handling:** All errors are non-blocking. On auth failure, timeout, or empty response — skip with a brief note and continue.
|
||||
|
||||
Present Codex output under a `CODEX (design):` header, merged with the checklist findings above.
|
||||
|
||||
Include any design findings alongside the code review findings. They follow the same Fix-First flow below.
|
||||
|
||||
## Step 9.1: Review Army — Specialist Dispatch
|
||||
|
||||
### Detect stack and scope
|
||||
|
||||
```bash
|
||||
source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null) || true
|
||||
# Detect stack for specialist context
|
||||
STACK=""
|
||||
[ -f Gemfile ] && STACK="${STACK}ruby "
|
||||
[ -f package.json ] && STACK="${STACK}node "
|
||||
[ -f requirements.txt ] || [ -f pyproject.toml ] && STACK="${STACK}python "
|
||||
[ -f go.mod ] && STACK="${STACK}go "
|
||||
[ -f Cargo.toml ] && STACK="${STACK}rust "
|
||||
echo "STACK: ${STACK:-unknown}"
|
||||
DIFF_BASE=$(git merge-base origin/<base> HEAD)
|
||||
DIFF_INS=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
|
||||
DIFF_DEL=$(git diff "$DIFF_BASE" --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
|
||||
DIFF_LINES=$((DIFF_INS + DIFF_DEL))
|
||||
echo "DIFF_LINES: $DIFF_LINES"
|
||||
# Detect test framework for specialist test stub generation
|
||||
TEST_FW=""
|
||||
{ [ -f jest.config.ts ] || [ -f jest.config.js ]; } && TEST_FW="jest"
|
||||
[ -f vitest.config.ts ] && TEST_FW="vitest"
|
||||
{ [ -f spec/spec_helper.rb ] || [ -f .rspec ]; } && TEST_FW="rspec"
|
||||
{ [ -f pytest.ini ] || [ -f conftest.py ]; } && TEST_FW="pytest"
|
||||
[ -f go.mod ] && TEST_FW="go-test"
|
||||
echo "TEST_FW: ${TEST_FW:-unknown}"
|
||||
```
|
||||
|
||||
### Read specialist hit rates (adaptive gating)
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-specialist-stats 2>/dev/null || true
|
||||
```
|
||||
|
||||
### Select specialists
|
||||
|
||||
Based on the scope signals above, select which specialists to dispatch.
|
||||
|
||||
**Always-on (dispatch on every review with 50+ changed lines):**
|
||||
1. **Testing** — read `~/.claude/skills/gstack/review/specialists/testing.md`
|
||||
2. **Maintainability** — read `~/.claude/skills/gstack/review/specialists/maintainability.md`
|
||||
|
||||
**If DIFF_LINES < 50:** Skip all specialists. Print: "Small diff ($DIFF_LINES lines) — specialists skipped." Continue to the Fix-First flow (item 4).
|
||||
|
||||
**Conditional (dispatch if the matching scope signal is true):**
|
||||
3. **Security** — if SCOPE_AUTH=true, OR if SCOPE_BACKEND=true AND DIFF_LINES > 100. Read `~/.claude/skills/gstack/review/specialists/security.md`
|
||||
4. **Performance** — if SCOPE_BACKEND=true OR SCOPE_FRONTEND=true. Read `~/.claude/skills/gstack/review/specialists/performance.md`
|
||||
5. **Data Migration** — if SCOPE_MIGRATIONS=true. Read `~/.claude/skills/gstack/review/specialists/data-migration.md`
|
||||
6. **API Contract** — if SCOPE_API=true. Read `~/.claude/skills/gstack/review/specialists/api-contract.md`
|
||||
7. **Design** — if SCOPE_FRONTEND=true. Use the existing design review checklist at `~/.claude/skills/gstack/review/design-checklist.md`
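
The scope-based selection above can be sketched as a small shell function. This is only an illustration: the name `select_specialists` is hypothetical, and it assumes the `SCOPE_*` variables and `DIFF_LINES` were already set by the detection block. It covers scope selection only, not adaptive gating or force flags.

```bash
# Hypothetical helper: prints the scope-selected specialists, space-separated.
# Assumes DIFF_LINES and the SCOPE_* variables come from the detection step.
select_specialists() {
  local selected=""
  # Small diff: no specialists at all (the caller prints the skip message).
  if [ "${DIFF_LINES:-0}" -lt 50 ]; then
    return 0
  fi
  # Always-on specialists.
  selected="testing maintainability"
  # Security: auth scope, or backend scope with more than 100 changed lines.
  if [ "${SCOPE_AUTH:-false}" = "true" ] ||
     { [ "${SCOPE_BACKEND:-false}" = "true" ] && [ "${DIFF_LINES:-0}" -gt 100 ]; }; then
    selected="$selected security"
  fi
  # Performance: backend or frontend scope.
  if [ "${SCOPE_BACKEND:-false}" = "true" ] || [ "${SCOPE_FRONTEND:-false}" = "true" ]; then
    selected="$selected performance"
  fi
  [ "${SCOPE_MIGRATIONS:-false}" = "true" ] && selected="$selected data-migration"
  [ "${SCOPE_API:-false}" = "true" ] && selected="$selected api-contract"
  [ "${SCOPE_FRONTEND:-false}" = "true" ] && selected="$selected design"
  echo "$selected"
}

# Example: a 240-line backend + API diff.
DIFF_LINES=240 SCOPE_BACKEND=true SCOPE_API=true select_specialists
```

Unset scope variables default to `false` here, so a failed `gstack-diff-scope` run would fall back to the two always-on specialists.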
|
||||
|
||||
### Adaptive gating
|
||||
|
||||
After scope-based selection, apply adaptive gating based on specialist hit rates:
|
||||
|
||||
For each conditional specialist that passed scope gating, check the `gstack-specialist-stats` output above:
|
||||
- If tagged `[GATE_CANDIDATE]` (0 findings in 10+ dispatches): skip it. Print: "[specialist] auto-gated (0 findings in N reviews)."
|
||||
- If tagged `[NEVER_GATE]`: always dispatch regardless of hit rate. Security and data-migration are insurance policy specialists — they should run even when silent.
|
||||
|
||||
**Force flags:** If the user's prompt includes `--security`, `--performance`, `--testing`, `--maintainability`, `--data-migration`, `--api-contract`, `--design`, or `--all-specialists`, force-include that specialist regardless of gating.
|
||||
|
||||
Note which specialists were selected, gated, and skipped. Print the selection:
|
||||
"Dispatching N specialists: [names]. Skipped: [names] (scope not detected). Gated: [names] (0 findings in N+ reviews)."
|
||||
|
||||
---
|
||||
|
||||
### Dispatch specialists in parallel
|
||||
|
||||
For each selected specialist, launch an independent subagent via the Agent tool.
|
||||
**Launch ALL selected specialists in a single message** (multiple Agent tool calls)
|
||||
so they run in parallel. Each subagent has fresh context — no prior review bias.
|
||||
|
||||
**Each specialist subagent prompt:**
|
||||
|
||||
Construct the prompt for each specialist. The prompt includes:
|
||||
|
||||
1. The specialist's checklist content (you already read the file above)
|
||||
2. Stack context: "This is a {STACK} project."
|
||||
3. Past learnings for this domain (if any exist):
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-learnings-search --type pitfall --query "{specialist domain}" --limit 5 2>/dev/null || true
|
||||
```
|
||||
|
||||
If learnings are found, include them: "Past learnings for this domain: {learnings}"
|
||||
|
||||
4. Instructions:
|
||||
|
||||
"You are a specialist code reviewer. Read the checklist below, then run
|
||||
`DIFF_BASE=$(git merge-base origin/<base> HEAD) && git diff "$DIFF_BASE"` to get the full diff. Apply the checklist against the diff.
|
||||
|
||||
For each finding, output a JSON object on its own line:
|
||||
{\"severity\":\"CRITICAL|INFORMATIONAL\",\"confidence\":N,\"path\":\"file\",\"line\":N,\"category\":\"category\",\"summary\":\"description\",\"fix\":\"recommended fix\",\"fingerprint\":\"path:line:category\",\"specialist\":\"name\"}
|
||||
|
||||
Required fields: severity, confidence, path, category, summary, specialist.
|
||||
Optional: line, fix, fingerprint, evidence, test_stub.
|
||||
|
||||
If you can write a test that would catch this issue, include it in the `test_stub` field.
|
||||
Use the detected test framework ({TEST_FW}). Write a minimal skeleton — describe/it/test
|
||||
blocks with clear intent. Skip test_stub for architectural or design-only findings.
|
||||
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
Do not output anything else — no preamble, no summary, no commentary.
|
||||
|
||||
Stack context: {STACK}
|
||||
Past learnings: {learnings or 'none'}
|
||||
|
||||
CHECKLIST:
|
||||
{checklist content}"
|
||||
|
||||
**Subagent configuration:**
|
||||
- Use `subagent_type: "general-purpose"`
|
||||
- Do NOT use `run_in_background` — all specialists must complete before merge
|
||||
- If any specialist subagent fails or times out, log the failure and continue with results from successful specialists. Specialists are additive — partial results are better than no results.
|
||||
|
||||
---
|
||||
|
||||
### Step 9.2: Collect and merge findings
|
||||
|
||||
After all specialist subagents complete, collect their outputs.
|
||||
|
||||
**Parse findings:**
|
||||
For each specialist's output:
|
||||
1. If output is "NO FINDINGS" — skip, this specialist found nothing
|
||||
2. Otherwise, parse each line as a JSON object. Skip lines that are not valid JSON.
|
||||
3. Collect all parsed findings into a single list, tagged with their specialist name.
|
||||
|
||||
**Fingerprint and deduplicate:**
|
||||
For each finding, compute its fingerprint:
|
||||
- If `fingerprint` field is present, use it
|
||||
- Otherwise: `{path}:{line}:{category}` (if line is present) or `{path}:{category}`
|
||||
|
||||
Group findings by fingerprint. For findings sharing the same fingerprint:
|
||||
- Keep the finding with the highest confidence score
|
||||
- Tag it: "MULTI-SPECIALIST CONFIRMED ({specialist1} + {specialist2})"
|
||||
- Boost confidence by +1 (cap at 10)
|
||||
- Note the confirming specialists in the output
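
A minimal sketch of the fingerprint and dedup rules above, under stated assumptions: the helper names `fingerprint` and `merge_findings` are hypothetical, and findings are assumed to be already flattened to tab-separated `fingerprint<TAB>confidence<TAB>specialist` rows rather than raw JSON. It tracks only those three fields, so "keep the finding with the highest confidence" is reduced to keeping the highest score.

```bash
# Hypothetical helper: builds {path}:{line}:{category}, or {path}:{category}
# when no line number is available.
fingerprint() {
  local path="$1" line="$2" category="$3"
  if [ -n "$line" ]; then
    printf '%s:%s:%s\n' "$path" "$line" "$category"
  else
    printf '%s:%s\n' "$path" "$category"
  fi
}

# Hypothetical helper: reads "fingerprint<TAB>confidence<TAB>specialist" rows
# on stdin and prints one row per fingerprint: highest confidence, +1 boost
# (capped at 10) and a confirmation tag when several rows share a fingerprint.
merge_findings() {
  awk -F'\t' '
    {
      fp = $1; conf = $2 + 0; who = $3
      if (!(fp in best)) {
        order[++n] = fp; best[fp] = conf; names[fp] = who; count[fp] = 1
        next
      }
      count[fp]++
      names[fp] = names[fp] " + " who
      if (conf > best[fp]) best[fp] = conf
    }
    END {
      for (k = 1; k <= n; k++) {
        fp = order[k]; c = best[fp]; tag = ""
        if (count[fp] > 1) {
          c = (c + 1 > 10) ? 10 : c + 1
          tag = "MULTI-SPECIALIST CONFIRMED (" names[fp] ")"
        }
        printf "%s\t%d\t%s\n", fp, c, tag
      }
    }'
}

# Example: two specialists flag the same line; a third finding stands alone.
printf '%s\t%s\t%s\n' \
  "$(fingerprint app/models/user.rb 42 sql)" 8 security \
  "$(fingerprint app/models/user.rb 42 sql)" 7 red-team \
  "$(fingerprint README.md "" docs)" 5 maintainability | merge_findings
```

The sketch does not collapse repeated rows from the same specialist; a real merge step would likely want to do that before tagging a finding as multi-specialist confirmed.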
|
||||
|
||||
**Apply confidence gates:**
|
||||
- Confidence 7+: show normally in the findings output
|
||||
- Confidence 5-6: show with caveat "Medium confidence — verify this is actually an issue"
|
||||
- Confidence 3-4: move to appendix (suppress from main findings)
|
||||
- Confidence 1-2: suppress entirely
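
As a rough illustration, the four gates can be expressed as a lookup from score to display bucket. The function name `confidence_bucket` and the bucket labels are hypothetical; the thresholds are the ones listed above for the merge step.

```bash
# Hypothetical helper: maps a 1-10 confidence score to a display bucket.
confidence_bucket() {
  local c="$1"
  if   [ "$c" -ge 7 ]; then echo "show"
  elif [ "$c" -ge 5 ]; then echo "show-with-caveat"
  elif [ "$c" -ge 3 ]; then echo "appendix"
  else                      echo "suppress"
  fi
}

# Example: bucket a handful of scores.
for score in 9 6 4 2; do
  printf '%s -> %s\n' "$score" "$(confidence_bucket "$score")"
done
```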
|
||||
|
||||
**Compute PR Quality Score:**
|
||||
After merging, compute the quality score:
|
||||
`quality_score = max(0, 10 - (critical_count * 2 + informational_count * 0.5))`
|
||||
Cap at 10. Log this in the review result at the end.
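
For reference, the formula can be evaluated in shell with `awk`, since plain shell arithmetic has no fractions. This is a sketch only: the function name `quality_score` is hypothetical, and one decimal place is assumed because the log examples use values like `7.5` and `10.0`.

```bash
# Hypothetical helper: quality_score <critical_count> <informational_count>
quality_score() {
  awk -v c="$1" -v i="$2" 'BEGIN {
    s = 10 - (c * 2 + i * 0.5)
    if (s < 0) s = 0      # floor at 0, per max(0, ...)
    if (s > 10) s = 10    # cap at 10
    printf "%.1f\n", s
  }'
}

quality_score 1 1   # one critical, one informational -> 7.5
quality_score 6 0   # penalties exceed 10, so the floor applies -> 0.0
```

With zero findings the function prints `10.0`, which matches the value the log step uses when specialists were skipped.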
|
||||
|
||||
**Output merged findings:**
|
||||
Present the merged findings in the same format as the current review:
|
||||
|
||||
```
|
||||
SPECIALIST REVIEW: N findings (X critical, Y informational) from Z specialists
|
||||
|
||||
[For each finding, in order: CRITICAL first, then INFORMATIONAL, sorted by confidence descending]
|
||||
[SEVERITY] (confidence: N/10, specialist: name) path:line — summary
|
||||
Fix: recommended fix
|
||||
[If MULTI-SPECIALIST CONFIRMED: show confirmation note]
|
||||
|
||||
PR Quality Score: X/10
|
||||
```
|
||||
|
||||
These findings flow into the Fix-First flow (item 4) alongside the checklist pass (Step 9).
|
||||
The Fix-First heuristic applies identically — specialist findings follow the same AUTO-FIX vs ASK classification.
|
||||
|
||||
**Compile per-specialist stats:**
|
||||
After merging findings, compile a `specialists` object for the review-log persist.
|
||||
For each specialist (testing, maintainability, security, performance, data-migration, api-contract, design, red-team):
|
||||
- If dispatched: `{"dispatched": true, "findings": N, "critical": N, "informational": N}`
|
||||
- If skipped by scope: `{"dispatched": false, "reason": "scope"}`
|
||||
- If skipped by gating: `{"dispatched": false, "reason": "gated"}`
|
||||
- If not applicable (e.g., red-team not activated): omit from the object
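
One way to assemble the entries is with `printf`, shown here as a sketch. The helper name `specialist_entry` is hypothetical, and only the three shapes listed above are handled.

```bash
# Hypothetical helper:
#   specialist_entry dispatched <findings> <critical> <informational>
#   specialist_entry scope | gated
specialist_entry() {
  case "$1" in
    dispatched)
      printf '{"dispatched":true,"findings":%d,"critical":%d,"informational":%d}' "$2" "$3" "$4"
      ;;
    scope|gated)
      printf '{"dispatched":false,"reason":"%s"}' "$1"
      ;;
    *)
      return 1   # unknown state: emit nothing
      ;;
  esac
}

# Example: testing ran and found two informational issues; security was out of scope.
printf '{"testing":%s,"security":%s}\n' \
  "$(specialist_entry dispatched 2 0 2)" \
  "$(specialist_entry scope)"
```

Specialists that are not applicable are left out by simply not adding a key for them.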
|
||||
|
||||
Include the Design specialist even though it uses `design-checklist.md` instead of the specialist schema files.
|
||||
Remember these stats: you will need them for the review-log entry (item 9 of Step 9).
|
||||
|
||||
---
|
||||
|
||||
### Red Team dispatch (conditional)
|
||||
|
||||
**Activation:** Only if DIFF_LINES > 200 OR any specialist produced a CRITICAL finding.
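
The activation rule is a plain OR of two conditions. A minimal sketch follows, with a hypothetical function name and the assumption that the caller has already counted CRITICAL findings from the merged specialist output:

```bash
# Hypothetical helper: should_run_red_team <diff_lines> <critical_count>
# Succeeds (exit 0) when the Red Team should be dispatched.
should_run_red_team() {
  [ "$1" -gt 200 ] || [ "$2" -gt 0 ]
}

# Example: a small diff still triggers the Red Team if a CRITICAL finding exists.
if should_run_red_team 80 1; then
  echo "red team: dispatch"
else
  echo "red team: skip"
fi
```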
|
||||
|
||||
If activated, dispatch one more subagent via the Agent tool (foreground, not background).
|
||||
|
||||
The Red Team subagent receives:
|
||||
1. The red-team checklist from `~/.claude/skills/gstack/review/specialists/red-team.md`
|
||||
2. The merged specialist findings from Step 9.2 (so it knows what was already caught)
|
||||
3. The git diff command
|
||||
|
||||
Prompt: "You are a red team reviewer. The code has already been reviewed by N specialists
|
||||
who found the following issues: {merged findings summary}. Your job is to find what they
|
||||
MISSED. Read the checklist, run `DIFF_BASE=$(git merge-base origin/<base> HEAD) && git diff "$DIFF_BASE"`, and look for gaps.
|
||||
Output findings as JSON objects (same schema as the specialists). Focus on cross-cutting
|
||||
concerns, integration boundary issues, and failure modes that specialist checklists
|
||||
don't cover."
|
||||
|
||||
If the Red Team finds additional issues, merge them into the findings list before
|
||||
the Fix-First flow (item 4). Red Team findings are tagged with `"specialist":"red-team"`.
|
||||
|
||||
If the Red Team returns NO FINDINGS, note: "Red Team review: no additional issues found."
|
||||
If the Red Team subagent fails or times out, skip silently and continue.
|
||||
|
||||
### Step 9.3: Cross-review finding dedup
|
||||
|
||||
Before classifying findings, check if any were previously skipped by the user in a prior review on this branch.
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-read
|
||||
```
|
||||
|
||||
Parse the output: only lines BEFORE `---CONFIG---` are JSONL entries (the output also contains `---CONFIG---` and `---HEAD---` footer sections that are not JSONL — ignore those).
|
||||
|
||||
For each JSONL entry that has a `findings` array:
|
||||
1. Collect all fingerprints where `action: "skipped"`
|
||||
2. Note the `commit` field from that entry
|
||||
|
||||
If skipped fingerprints exist, get the list of files changed since that review:
|
||||
|
||||
```bash
|
||||
git diff --name-only <prior-review-commit> HEAD
|
||||
```
|
||||
|
||||
For each current finding (from both the checklist pass (Step 9) and specialist review (Step 9.1-9.2)), check:
|
||||
- Does its fingerprint match a previously skipped finding?
|
||||
- Is the finding's file path NOT in the changed-files set?
|
||||
|
||||
If both conditions are true: suppress the finding. It was intentionally skipped and the relevant code hasn't changed.
|
||||
|
||||
Print: "Suppressed N findings from prior reviews (previously skipped by user)"
|
||||
|
||||
**Only suppress `skipped` findings — never `fixed` or `auto-fixed`** (those might regress and should be re-checked).
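
The two-condition check can be sketched as follows. The function name `should_suppress` and the two input files are assumptions for illustration: one file holds previously skipped fingerprints (one per line, extracted from the review log), the other holds the output of `git diff --name-only <prior-review-commit> HEAD`.

```bash
# Hypothetical helper:
#   should_suppress <fingerprint> <path> <skipped-fingerprints-file> <changed-files-file>
# Succeeds (exit 0) only when the finding was skipped before AND its file is unchanged.
should_suppress() {
  local fp="$1" path="$2" skipped="$3" changed="$4"
  grep -Fxq -- "$fp" "$skipped" || return 1     # never skipped before: keep it
  grep -Fxq -- "$path" "$changed" && return 1   # file changed since then: re-check it
  return 0
}

# Example with stand-in data.
skipped=$(mktemp); changed=$(mktemp)
printf '%s\n' 'app/models/user.rb:42:sql' > "$skipped"
printf '%s\n' 'app/controllers/home_controller.rb' > "$changed"
if should_suppress 'app/models/user.rb:42:sql' 'app/models/user.rb' "$skipped" "$changed"; then
  echo "suppressed"
fi
rm -f "$skipped" "$changed"
```

Only fingerprints whose logged action was `skipped` belong in the first file; `fixed` and `auto-fixed` entries are left out so regressions are re-reported.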
|
||||
|
||||
If no prior reviews exist or none have a `findings` array, skip this step silently.
|
||||
|
||||
Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
|
||||
|
||||
4. **Classify each finding from both the checklist pass and specialist review (Step 9.1-Step 9.2) as AUTO-FIX or ASK** per the Fix-First Heuristic in
|
||||
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
|
||||
|
||||
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
|
||||
`[AUTO-FIXED] [file:line] Problem → what you did`
|
||||
|
||||
6. **If ASK items remain,** present them in ONE AskUserQuestion:
|
||||
- List each with number, severity, problem, recommended fix
|
||||
- Per-item options: A) Fix B) Skip
|
||||
- Overall RECOMMENDATION
|
||||
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
|
||||
|
||||
7. **After all fixes (auto + user-approved):**
|
||||
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
|
||||
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 12.
|
||||
|
||||
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
|
||||
|
||||
If no issues found: `Pre-Landing Review: No issues found.`
|
||||
|
||||
9. Persist the review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
|
||||
```
|
||||
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
|
||||
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
|
||||
- `quality_score` = the PR Quality Score computed in Step 9.2 (e.g., 7.5). If specialists were skipped (small diff), use `10.0`
|
||||
- `specialists` = the per-specialist stats object compiled in Step 9.2. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}`
|
||||
- `findings` = array of per-finding records. For each finding (from checklist pass and specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"`, `"fixed"` (user approved), or `"skipped"` (user chose Skip).
|
||||
|
||||
Save the review output — it goes into the PR body in Step 19.
|
||||
|
||||
---
|
||||
|
|
@@ -0,0 +1,55 @@
|
|||
## Step 9: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
|
||||
1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
|
||||
|
||||
2. Run `git diff origin/<base>` to get the full diff (scoped to feature changes against the freshly-fetched base branch).
|
||||
|
||||
3. Apply the review checklist in two passes:
|
||||
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
|
||||
- **Pass 2 (INFORMATIONAL):** All remaining categories
|
||||
|
||||
{{CONFIDENCE_CALIBRATION}}
|
||||
|
||||
{{DESIGN_REVIEW_LITE}}
|
||||
|
||||
Include any design findings alongside the code review findings. They follow the same Fix-First flow below.
|
||||
|
||||
{{REVIEW_ARMY}}
|
||||
|
||||
{{CROSS_REVIEW_DEDUP}}
|
||||
|
||||
4. **Classify each finding from both the checklist pass and specialist review (Step 9.1-Step 9.2) as AUTO-FIX or ASK** per the Fix-First Heuristic in
|
||||
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
|
||||
|
||||
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
|
||||
`[AUTO-FIXED] [file:line] Problem → what you did`
|
||||
|
||||
6. **If ASK items remain,** present them in ONE AskUserQuestion:
|
||||
- List each with number, severity, problem, recommended fix
|
||||
- Per-item options: A) Fix B) Skip
|
||||
- Overall RECOMMENDATION
|
||||
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
|
||||
|
||||
7. **After all fixes (auto + user-approved):**
|
||||
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
|
||||
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 12.
|
||||
|
||||
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
|
||||
|
||||
If no issues found: `Pre-Landing Review: No issues found.`
|
||||
|
||||
9. Persist the review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
|
||||
```
|
||||
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
|
||||
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
|
||||
- `quality_score` = the PR Quality Score computed in Step 9.2 (e.g., 7.5). If specialists were skipped (small diff), use `10.0`
|
||||
- `specialists` = the per-specialist stats object compiled in Step 9.2. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}`
|
||||
- `findings` = array of per-finding records. For each finding (from checklist pass and specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"`, `"fixed"` (user approved), or `"skipped"` (user chose Skip).
|
||||
|
||||
Save the review output — it goes into the PR body in Step 19.
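
As a rough illustration of the per-finding record shape described above, the sketch below assembles one record with `printf`. The helper name `finding_record` is hypothetical (it is not part of gstack), and the sketch assumes none of the inputs contain characters that need JSON escaping.

```bash
# Hypothetical helper: emit one per-finding record for the review log.
# Assumes inputs contain no quotes or backslashes (no JSON escaping is done).
finding_record() {
  file=$1 line=$2 category=$3 severity=$4 action=$5
  printf '{"fingerprint":"%s:%s:%s","severity":"%s","action":"%s"}\n' \
    "$file" "$line" "$category" "$severity" "$action"
}

finding_record src/billing.ts 42 null-check CRITICAL auto-fixed
```

Several records joined with commas and wrapped in `[` and `]` would form the `FINDINGS_JSON` value; a real JSON encoder is the safer choice when paths or categories may contain special characters.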
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,259 @@
|
|||
<!-- AUTO-GENERATED from test-coverage.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 7: Test Coverage Audit
|
||||
|
||||
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent runs the coverage audit in a fresh context window — the parent only sees the conclusion, not intermediate file reads. This is context-rot defense.
|
||||
|
||||
**Subagent prompt:** Pass the following instructions to the subagent, with `<base>` substituted with the base branch:
|
||||
|
||||
> You are running a ship-workflow test coverage audit. Run `git diff <base>...HEAD` as needed. Do not commit or push — report only.
|
||||
>
|
||||
> 100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.
|
||||
|
||||
### Test Framework Detection
|
||||
|
||||
Before analyzing coverage, detect the project's test framework:
|
||||
|
||||
1. **Read CLAUDE.md** — look for a `## Testing` section with test command and framework name. If found, use that as the authoritative source.
|
||||
2. **If CLAUDE.md has no testing section, auto-detect:**
|
||||
|
||||
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
{ [ -f requirements.txt ] || [ -f pyproject.toml ]; } && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* cypress.config.* .rspec pytest.ini phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
```
|
||||
|
||||
3. **If no framework detected:** fall through to the Test Framework Bootstrap step (Step 4), which handles full setup.
|
||||
|
||||
**0. Before/after test count:**
|
||||
|
||||
```bash
# Count test files before any generation
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
```
|
||||
|
||||
Store this number for the PR body.
|
||||
|
||||
**1. Trace every codepath changed** using `git diff origin/<base>...HEAD`:
|
||||
|
||||
Read every changed file. For each one, trace how data flows through the code — don't just list functions, actually follow the execution:
|
||||
|
||||
1. **Read the diff.** For each changed file, read the full file (not just the diff hunk) to understand context.
|
||||
2. **Trace data flow.** Starting from each entry point (route handler, exported function, event listener, component render), follow the data through every branch:
|
||||
- Where does input come from? (request params, props, database, API call)
|
||||
- What transforms it? (validation, mapping, computation)
|
||||
- Where does it go? (database write, API response, rendered output, side effect)
|
||||
- What can go wrong at each step? (null/undefined, invalid input, network failure, empty collection)
|
||||
3. **Diagram the execution.** For each changed file, draw an ASCII diagram showing:
|
||||
- Every function/method that was added or modified
|
||||
- Every conditional branch (if/else, switch, ternary, guard clause, early return)
|
||||
- Every error path (try/catch, rescue, error boundary, fallback)
|
||||
- Every call to another function (trace into it — does IT have untested branches?)
|
||||
- Every edge: what happens with null input? Empty array? Invalid type?
|
||||
|
||||
This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test.
|
||||
|
||||
**2. Map user flows, interactions, and error states:**
|
||||
|
||||
Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through:
|
||||
|
||||
- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test.
|
||||
- **Interaction edge cases:** What happens when the user does something unexpected?
|
||||
- Double-click/rapid resubmit
|
||||
- Navigate away mid-operation (back button, close tab, click another link)
|
||||
- Submit with stale data (page sat open for 30 minutes, session expired)
|
||||
- Slow connection (API takes 10 seconds — what does the user see?)
|
||||
- Concurrent actions (two tabs, same form)
|
||||
- **Error states the user can see:** For every error the code handles, what does the user actually experience?
|
||||
- Is there a clear error message or a silent failure?
|
||||
- Can the user recover (retry, go back, fix input) or are they stuck?
|
||||
- What happens with no network? With a 500 from the API? With invalid data from the server?
|
||||
- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input?
|
||||
|
||||
Add these to your diagram alongside the code branches. A user flow with no test is just as much a gap as an untested if/else.
|
||||
|
||||
**3. Check each branch against existing tests:**
|
||||
|
||||
Go through your diagram branch by branch — both code paths AND user flows. For each one, search for a test that exercises it:
|
||||
- Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb`
|
||||
- An if/else → look for tests covering BOTH the true AND false path
|
||||
- An error handler → look for a test that triggers that specific error condition
|
||||
- A call to `helperFn()` that has its own branches → those branches need tests too
|
||||
- A user flow → look for an integration or E2E test that walks through the journey
|
||||
- An interaction edge case → look for a test that simulates the unexpected action
|
||||
|
||||
Quality scoring rubric:
|
||||
- ★★★ Tests behavior with edge cases AND error paths
|
||||
- ★★ Tests correct behavior, happy path only
|
||||
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")
|
||||
|
||||
### E2E Test Decision Matrix
|
||||
|
||||
When checking each branch, also determine whether a unit test or E2E/integration test is the right tool:
|
||||
|
||||
**RECOMMEND E2E (mark as [→E2E] in the diagram):**
|
||||
- Common user flow spanning 3+ components/services (e.g., signup → verify email → first login)
|
||||
- Integration point where mocking hides real failures (e.g., API → queue → worker → DB)
|
||||
- Auth/payment/data-destruction flows — too important to trust unit tests alone
|
||||
|
||||
**RECOMMEND EVAL (mark as [→EVAL] in the diagram):**
|
||||
- Critical LLM call that needs a quality eval (e.g., prompt change → test output still meets quality bar)
|
||||
- Changes to prompt templates, system instructions, or tool definitions
|
||||
|
||||
**STICK WITH UNIT TESTS:**
|
||||
- Pure function with clear inputs/outputs
|
||||
- Internal helper with no side effects
|
||||
- Edge case of a single function (null input, empty array)
|
||||
- Obscure/rare flow that isn't customer-facing
|
||||
|
||||
### REGRESSION RULE (mandatory)
|
||||
|
||||
**IRON RULE:** When the coverage audit identifies a REGRESSION — code that previously worked but the diff broke — a regression test is written immediately. No AskUserQuestion. No skipping. Regressions are the highest-priority test because they prove something broke.
|
||||
|
||||
A regression is when:
|
||||
- The diff modifies existing behavior (not new code)
|
||||
- The existing test suite (if any) doesn't cover the changed path
|
||||
- The change introduces a new failure mode for existing callers
|
||||
|
||||
When uncertain whether a change is a regression, err on the side of writing the test.
|
||||
|
||||
Format: commit as `test: regression test for {what broke}`
|
||||
|
||||
**4. Output ASCII coverage diagram:**
|
||||
|
||||
Include BOTH code paths and user flows in the same diagram. Mark E2E-worthy and eval-worthy paths:
|
||||
|
||||
```
CODE PATHS                                                USER FLOWS
[+] src/services/billing.ts                               [+] Payment checkout
  ├── processPayment()                                      ├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15
  │   ├── [★★★ TESTED] happy + declined + timeout           ├── [GAP] [→E2E] Double-click submit
  │   ├── [GAP] Network timeout                             └── [GAP] Navigate away mid-payment
  │   └── [GAP] Invalid currency
  └── refundPayment()                                     [+] Error states
      ├── [★★ TESTED] Full refund — :89                     ├── [★★ TESTED] Card declined message
      └── [★ TESTED] Partial (non-throw only) — :101        └── [GAP] Network timeout UX

LLM integration: [GAP] [→EVAL] Prompt template change — needs eval test

COVERAGE: 5/13 paths tested (38%) | Code paths: 3/5 (60%) | User flows: 2/8 (25%)
QUALITY: ★★★:2 ★★:2 ★:1 | GAPS: 8 (2 E2E, 1 eval)
```
|
||||
|
||||
Legend: ★★★ behavior + edge + error | ★★ happy path | ★ smoke check
|
||||
[→E2E] = needs integration test | [→EVAL] = needs LLM eval
|
||||
|
||||
**Fast path:** All paths covered → "Step 7: All new code paths have test coverage ✓" Continue.
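
The percentages on the `COVERAGE:` line are plain tested/total ratios. A minimal sketch of that arithmetic, assuming integer truncation is acceptable (the function name `coverage_pct` is illustrative, not part of gstack):

```bash
# Integer coverage percentage from tested/total path counts.
# Prints nothing and returns 1 when total is 0 (percentage undetermined).
coverage_pct() {
  tested=$1 total=$2
  [ "$total" -gt 0 ] || return 1
  echo $(( tested * 100 / total ))
}

coverage_pct 5 13   # prints 38
coverage_pct 3 5    # prints 60
coverage_pct 2 8    # prints 25
```

Returning a non-zero status for a zero total lets a caller treat the percentage as undetermined instead of reporting 0%.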
|
||||
|
||||
**5. Generate tests for uncovered paths:**
|
||||
|
||||
If test framework detected (or bootstrapped in Step 4):
|
||||
- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
|
||||
- Read 2-3 existing test files to match conventions exactly
|
||||
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
|
||||
- For paths marked [→E2E]: generate integration/E2E tests using the project's E2E framework (Playwright, Cypress, Capybara, etc.)
|
||||
- For paths marked [→EVAL]: generate eval tests using the project's eval framework, or flag for manual eval if none exists
|
||||
- Write tests that exercise the specific uncovered path with real assertions
|
||||
- Run each test. Passes → commit as `test: coverage for {feature}`
|
||||
- Fails → fix once. Still fails → revert, note gap in diagram.
|
||||
|
||||
Caps: 30 code paths max, 20 tests generated max (code + user flow combined), 2-min per-test exploration cap.
|
||||
|
||||
If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."
|
||||
|
||||
**Diff is test-only changes:** Skip Step 7 entirely: "No new application code paths to audit."
|
||||
|
||||
**6. After-count and coverage summary:**
|
||||
|
||||
```bash
# Count test files after generation
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
```
|
||||
|
||||
For PR body: `Tests: {before} → {after} (+{delta} new)`
|
||||
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`
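
For illustration, the PR-body count line can be produced from the two stored counts like this. The helper name `tests_line` is hypothetical, and the sketch assumes the after-count is never lower than the before-count.

```bash
# Format the PR-body test count line from the before/after file counts.
# Assumes after >= before; a negative delta would print as "+-N".
tests_line() {
  before=$1 after=$2
  printf 'Tests: %s → %s (+%s new)\n' "$before" "$after" "$(( after - before ))"
}

tests_line 42 47   # prints: Tests: 42 → 47 (+5 new)
```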
|
||||
|
||||
**7. Coverage gate:**
|
||||
|
||||
Before proceeding, check CLAUDE.md for a `## Test Coverage` section with `Minimum:` and `Target:` fields. If found, use those percentages. Otherwise use defaults: Minimum = 60%, Target = 80%.
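
One way to read those thresholds is sketched below. It assumes the fields appear as lines such as `Minimum: 70%` inside the `## Test Coverage` section; the source does not specify the exact layout, and the function name `coverage_threshold` is hypothetical.

```bash
# Read a threshold ("Minimum" or "Target") from the "## Test Coverage"
# section of a CLAUDE.md-style file; print the default if it is absent.
# Assumed field format: a line containing e.g. "Minimum: 70%".
coverage_threshold() {
  file=$1 field=$2 default=$3
  value=$(awk -v field="$field" '
    /^## / { in_section = ($0 ~ /^## Test Coverage[ \t]*$/); next }
    in_section && index($0, field ":") {
      sub(".*" field ":[ \t]*", "")   # drop everything up to the value
      sub(/[^0-9].*$/, "")            # keep only the leading digits
      if ($0 != "") { print; exit }
    }
  ' "$file" 2>/dev/null) || value=
  echo "${value:-$default}"
}

MINIMUM=$(coverage_threshold CLAUDE.md Minimum 60)
TARGET=$(coverage_threshold CLAUDE.md Target 80)
```

Fields found outside the `## Test Coverage` section are ignored, and a missing file or missing field falls back to the default.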
|
||||
|
||||
Using the coverage percentage from the diagram in substep 4 (the `COVERAGE: X/Y (Z%)` line):
|
||||
|
||||
- **>= target:** Pass. "Coverage gate: PASS ({X}%)." Continue.
|
||||
- **>= minimum, < target:** Use AskUserQuestion:
|
||||
- "AI-assessed coverage is {X}%. {N} code paths are untested. Target is {target}%."
|
||||
- RECOMMENDATION: Choose A because untested code paths are where production bugs hide.
|
||||
- Options:
|
||||
A) Generate more tests for remaining gaps (recommended)
|
||||
B) Ship anyway — I accept the coverage risk
|
||||
C) These paths don't need tests — mark as intentionally uncovered
|
||||
- If A: Loop back to substep 5 (generate tests) targeting the remaining gaps. After second pass, if still below target, present AskUserQuestion again with updated numbers. Maximum 2 generation passes total.
|
||||
- If B: Continue. Include in PR body: "Coverage gate: {X}% — user accepted risk."
|
||||
- If C: Continue. Include in PR body: "Coverage gate: {X}% — {N} paths intentionally uncovered."
|
||||
|
||||
- **< minimum:** Use AskUserQuestion:
|
||||
- "AI-assessed coverage is critically low ({X}%). {N} of {M} code paths have no tests. Minimum threshold is {minimum}%."
|
||||
- RECOMMENDATION: Choose A because less than {minimum}% means more code is untested than tested.
|
||||
- Options:
|
||||
A) Generate tests for remaining gaps (recommended)
|
||||
B) Override — ship with low coverage (I understand the risk)
|
||||
- If A: Loop back to substep 5. Maximum 2 passes. If still below minimum after 2 passes, present the override choice again.
|
||||
- If B: Continue. Include in PR body: "Coverage gate: OVERRIDDEN at {X}%."
|
||||
|
||||
**Coverage percentage undetermined:** If the coverage diagram doesn't produce a clear numeric percentage (ambiguous output, parse error), **skip the gate** with: "Coverage gate: could not determine percentage — skipping." Do not default to 0% or block.
|
||||
|
||||
**Test-only diffs:** Skip the gate (same as the existing fast-path).
|
||||
|
||||
**100% coverage:** "Coverage gate: PASS (100%)." Continue.
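
The gate rules above reduce to a three-way comparison plus an "undetermined" escape hatch. A minimal sketch, where the outcome labels (`PASS`, `ASK_BELOW_TARGET`, `ASK_BELOW_MINIMUM`, `SKIP`) and the function name are illustrative and not gstack identifiers:

```bash
# Decide the coverage gate outcome from a percentage and two thresholds.
# Prints PASS, ASK_BELOW_TARGET, ASK_BELOW_MINIMUM, or SKIP (undetermined).
coverage_gate() {
  pct=$1 minimum=${2:-60} target=${3:-80}
  case $pct in
    ''|*[!0-9]*) echo SKIP; return 0 ;;   # not a clear integer percentage
  esac
  if [ "$pct" -ge "$target" ]; then
    echo PASS
  elif [ "$pct" -ge "$minimum" ]; then
    echo ASK_BELOW_TARGET
  else
    echo ASK_BELOW_MINIMUM
  fi
}

coverage_gate 38   # prints ASK_BELOW_MINIMUM with the default 60/80 thresholds
```

An empty or non-numeric percentage maps to `SKIP` rather than 0, matching the rule that an undetermined percentage must not block.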
|
||||
|
||||
### Test Plan Artifact
|
||||
|
||||
After producing the coverage diagram, write a test plan artifact so `/qa` and `/qa-only` can consume it:
|
||||
|
||||
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
USER=$(whoami)
DATETIME=$(date +%Y%m%d-%H%M%S)
```
|
||||
|
||||
Write to `~/.gstack/projects/{slug}/{user}-{branch}-ship-test-plan-{datetime}.md`:
|
||||
|
||||
```markdown
# Test Plan
Generated by /ship on {date}
Branch: {branch}
Repo: {owner/repo}

## Affected Pages/Routes
- {URL path} — {what to test and why}

## Key Interactions to Verify
- {interaction description} on {page}

## Edge Cases
- {edge case} on {page}

## Critical Paths
- {end-to-end flow that must work}
```
|
||||
>
|
||||
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"coverage_pct":N,"gaps":N,"diagram":"<full markdown coverage diagram for PR body>","tests_added":["path",...]}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Read the subagent's final output. Parse the LAST line as JSON.
|
||||
2. Store `coverage_pct` (for Step 20 metrics), `gaps` (user summary), `tests_added` (for the commit).
|
||||
3. Embed `diagram` verbatim in the PR body's `## Test Coverage` section (Step 19).
|
||||
4. Print a one-line summary: `Coverage: {coverage_pct}%, {gaps} gaps. {tests_added.length} tests added.`
|
||||
|
||||
**If the subagent fails, times out, or returns invalid JSON:** Fall back to running the audit inline in the parent. Do not block /ship on subagent failure — partial results are better than none.
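
The "parse the last line, otherwise fall back" rule can be sketched as follows. This is a pattern match under the assumption that the subagent emits compact JSON on one line; it is not a real JSON parser, and the function name `last_line_coverage` is hypothetical. A proper JSON tool is preferable wherever one is available.

```bash
# Pull coverage_pct out of the LAST line of the subagent output.
# Prints the number, or returns 1 so the caller can fall back to an
# inline audit. This is a pattern match, not a full JSON parser.
last_line_coverage() {
  last=$(printf '%s\n' "$1" | sed -n '$p')
  case $last in
    '{'*'}') ;;          # looks like a single JSON object
    *) return 1 ;;
  esac
  pct=$(printf '%s\n' "$last" | sed -n 's/.*"coverage_pct":[ ]*\([0-9][0-9.]*\).*/\1/p')
  [ -n "$pct" ] || return 1
  echo "$pct"
}
```

A caller might use it as `if pct=$(last_line_coverage "$SUBAGENT_OUTPUT"); then ...; else run the audit inline; fi`, where `SUBAGENT_OUTPUT` is an assumed variable holding the subagent's full response.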
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,23 @@
|
|||
## Step 7: Test Coverage Audit
|
||||
|
||||
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent runs the coverage audit in a fresh context window — the parent only sees the conclusion, not intermediate file reads. This is context-rot defense.
|
||||
|
||||
**Subagent prompt:** Pass the following instructions to the subagent, with `<base>` substituted with the base branch:
|
||||
|
||||
> You are running a ship-workflow test coverage audit. Run `git diff <base>...HEAD` as needed. Do not commit or push — report only.
|
||||
>
|
||||
> {{TEST_COVERAGE_AUDIT_SHIP}}
|
||||
>
|
||||
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
|
||||
> `{"coverage_pct":N,"gaps":N,"diagram":"<full markdown coverage diagram for PR body>","tests_added":["path",...]}`
|
||||
|
||||
**Parent processing:**
|
||||
|
||||
1. Read the subagent's final output. Parse the LAST line as JSON.
|
||||
2. Store `coverage_pct` (for Step 20 metrics), `gaps` (user summary), `tests_added` (for the commit).
|
||||
3. Embed `diagram` verbatim in the PR body's `## Test Coverage` section (Step 19).
|
||||
4. Print a one-line summary: `Coverage: {coverage_pct}%, {gaps} gaps. {tests_added.length} tests added.`
|
||||
|
||||
**If the subagent fails, times out, or returns invalid JSON:** Fall back to running the audit inline in the parent. Do not block /ship on subagent failure — partial results are better than none.
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,349 @@
|
|||
<!-- AUTO-GENERATED from tests.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
## Step 4: Test Framework Bootstrap
|
||||
|
||||
|
||||
|
||||
**Detect existing test framework and project runtime:**
|
||||
|
||||
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
{ [ -f requirements.txt ] || [ -f pyproject.toml ]; } && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```
|
||||
|
||||
**If test framework detected** (config files or test directories found):
|
||||
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
|
||||
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
|
||||
Store conventions as prose context for use in Phase 8e.5 or Step 7. **Skip the rest of bootstrap.**
|
||||
|
||||
**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**
|
||||
|
||||
**If NO runtime detected** (no config files found): Use AskUserQuestion:
|
||||
"I couldn't detect your project's language. What runtime are you using?"
|
||||
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
|
||||
If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.
|
||||
|
||||
**If runtime detected but no test framework — bootstrap:**
|
||||
|
||||
### B2. Research best practices
|
||||
|
||||
Use WebSearch to find current best practices for the detected runtime:
|
||||
- `"[runtime] best test framework 2025 2026"`
|
||||
- `"[framework A] vs [framework B] comparison"`
|
||||
|
||||
If WebSearch is unavailable, use this built-in knowledge table:
|
||||
|
||||
| Runtime | Primary recommendation | Alternative |
|---------|----------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |
|
||||
|
||||
### B3. Framework selection
|
||||
|
||||
Use AskUserQuestion:
|
||||
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
|
||||
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
|
||||
B) [Alternative] — [rationale]. Includes: [packages]
|
||||
C) Skip — don't set up testing right now
|
||||
RECOMMENDATION: Choose A because [reason based on project context]"
|
||||
|
||||
If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.
|
||||
|
||||
If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.
|
||||
|
||||
### B4. Install and configure
|
||||
|
||||
1. Install the chosen packages (npm/bun/gem/pip/etc.)
|
||||
2. Create minimal config file
|
||||
3. Create directory structure (test/, spec/, etc.)
|
||||
4. Create one example test matching the project's code to verify setup works
|
||||
|
||||
If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.
|
||||
|
||||
### B4.5. First real tests
|
||||
|
||||
Generate 3-5 real tests for existing code:
|
||||
|
||||
1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
|
||||
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
|
||||
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
|
||||
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
|
||||
5. Generate at least 1 test, cap at 5.
|
||||
|
||||
Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
|
||||
|
||||
### B5. Verify
|
||||
|
||||
```bash
# Run the full test suite to confirm everything works
{detected test command}
```
|
||||
|
||||
If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.
|
||||
|
||||
### B5.5. CI/CD pipeline
|
||||
|
||||
```bash
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
```
|
||||
|
||||
If `.github/` exists (or no CI detected — default to GitHub Actions):
|
||||
Create `.github/workflows/test.yml` with:
|
||||
- `runs-on: ubuntu-latest`
|
||||
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
|
||||
- The same test command verified in B5
|
||||
- Trigger: push + pull_request
|
||||
|
||||
If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
|
||||
|
||||
### B6. Create TESTING.md
|
||||
|
||||
First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.
|
||||
|
||||
Write TESTING.md with:
|
||||
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
|
||||
- Framework name and version
|
||||
- How to run tests (the verified command from B5)
|
||||
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
|
||||
- Conventions: file naming, assertion style, setup/teardown patterns
|
||||
|
||||
### B7. Update CLAUDE.md
|
||||
|
||||
First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.
|
||||
|
||||
Append a `## Testing` section:
|
||||
- Run command and test directory
|
||||
- Reference to TESTING.md
|
||||
- Test expectations:
|
||||
- 100% test coverage is the goal — tests make vibe coding safe
|
||||
- When writing new functions, write a corresponding test
|
||||
- When fixing a bug, write a regression test
|
||||
- When adding error handling, write a test that triggers the error
|
||||
- When adding a conditional (if/else, switch), write tests for BOTH paths
|
||||
- Never commit code that makes existing tests fail
|
||||
|
||||
### B8. Commit
|
||||
|
||||
```bash
git status --porcelain
```
|
||||
|
||||
Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
|
||||
`git commit -m "chore: bootstrap test framework ({framework name})"`
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## Step 5: Run tests (on merged code)
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
`db:test:prepare` internally, which loads the schema into the correct lane database.
|
||||
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
|
||||
|
||||
Run both test suites in parallel:
|
||||
|
||||
```bash
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
```
|
||||
|
||||
After both complete, read the output files and check pass/fail.
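
Because each suite above is piped into `tee`, the status the shell sees for each background job is `tee`'s, which is why the output files have to be read. As an alternative sketch (not the workflow's actual method), redirecting to the log files instead of piping keeps each command's own exit code available to `wait`. The function name `run_parallel` is hypothetical.

```bash
# Run two commands in parallel, log each to its own file, and report
# both exit codes. Redirecting (instead of piping to tee) keeps each
# command's own exit status available to `wait`.
run_parallel() {
  cmd_a=$1 log_a=$2 cmd_b=$3 log_b=$4
  sh -c "$cmd_a" >"$log_a" 2>&1 &
  pid_a=$!
  sh -c "$cmd_b" >"$log_b" 2>&1 &
  pid_b=$!
  wait "$pid_a" && rc_a=0 || rc_a=$?
  wait "$pid_b" && rc_b=0 || rc_b=$?
  echo "$rc_a $rc_b"
}

# Example with the two suites named above (not executed here):
# run_parallel "bin/test-lane" /tmp/ship_tests.txt "npm run test" /tmp/ship_vitest.txt
```

The trade-off is that output is no longer streamed to the terminal while the suites run; it is only available in the log files afterwards.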
|
||||
|
||||
**If any test fails:** Do NOT immediately stop. Apply the Test Failure Ownership Triage:
|
||||
|
||||
## Test Failure Ownership Triage
|
||||
|
||||
When tests fail, do NOT immediately stop. First, determine ownership:
|
||||
|
||||
### Step T1: Classify each failure
|
||||
|
||||
For each failing test:
|
||||
|
||||
1. **Get the files changed on this branch:**
|
||||
```bash
git diff origin/<base>...HEAD --name-only
```
|
||||
|
||||
2. **Classify the failure:**
|
||||
- **In-branch** if: the failing test file itself was modified on this branch, OR the test output references code that was changed on this branch, OR you can trace the failure to a change in the branch diff.
|
||||
- **Likely pre-existing** if: neither the test file nor the code it tests was modified on this branch, AND the failure is unrelated to any branch change you can identify.
|
||||
- **When ambiguous, default to in-branch.** It is safer to stop the developer than to let a broken test ship. Only classify as pre-existing when you are confident.
|
||||
|
||||
This classification is heuristic — use your judgment reading the diff and the test output. You do not have a programmatic dependency graph.
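
Only the first criterion (the failing test file itself was modified) is mechanical enough to script. A minimal sketch of that one check follows; the function name and the `needs-review` label are hypothetical, and a non-match is deliberately not reported as pre-existing because the remaining criteria still require reading the diff and the test output.

```bash
# First-pass ownership check: is the failing test file itself in the
# branch's changed-file list?
#   $1 = failing test path
#   $2 = newline-separated output of `git diff origin/<base>...HEAD --name-only`
# A non-match is NOT proof of a pre-existing failure, hence "needs-review".
classify_failure() {
  if printf '%s\n' "$2" | grep -Fxq -- "$1"; then
    echo in-branch
  else
    echo needs-review
  fi
}
```

`grep -Fx` compares whole lines literally, so `test/billing.test.ts` does not accidentally match `test/billing.test.tsx` or treat dots as wildcards.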
|
||||
|
||||
### Step T2: Handle in-branch failures
|
||||
|
||||
**STOP.** These are your failures. Show them and do not proceed. The developer must fix their own broken tests before shipping.
|
||||
|
||||
### Step T3: Handle pre-existing failures
|
||||
|
||||
Check `REPO_MODE` from the preamble output.
|
||||
|
||||
**If REPO_MODE is `solo`:**
|
||||
|
||||
Use AskUserQuestion:
|
||||
|
||||
> These test failures appear pre-existing (not caused by your branch changes):
|
||||
>
|
||||
> [list each failure with file:line and brief error description]
|
||||
>
|
||||
> Since this is a solo repo, you're the only one who will fix these.
|
||||
>
|
||||
> RECOMMENDATION: Choose A — fix now while the context is fresh. Completeness: 9/10.
|
||||
> A) Investigate and fix now (human: ~2-4h / CC: ~15min) — Completeness: 10/10
|
||||
> B) Add as P0 TODO — fix after this branch lands — Completeness: 7/10
|
||||
> C) Skip — I know about this, ship anyway — Completeness: 3/10
|
||||
|
||||
**If REPO_MODE is `collaborative` or `unknown`:**
|
||||
|
||||
Use AskUserQuestion:
|
||||
|
||||
> These test failures appear pre-existing (not caused by your branch changes):
|
||||
>
|
||||
> [list each failure with file:line and brief error description]
|
||||
>
|
||||
> This is a collaborative repo — these may be someone else's responsibility.
|
||||
>
|
||||
> RECOMMENDATION: Choose B — assign it to whoever broke it so the right person fixes it. Completeness: 9/10.
|
||||
> A) Investigate and fix now anyway — Completeness: 10/10
|
||||
> B) Blame + assign GitHub issue to the author — Completeness: 9/10
|
||||
> C) Add as P0 TODO — Completeness: 7/10
|
||||
> D) Skip — ship anyway — Completeness: 3/10
|
||||
|
||||
### Step T4: Execute the chosen action
|
||||
|
||||
**If "Investigate and fix now":**
|
||||
- Switch to /investigate mindset: root cause first, then minimal fix.
|
||||
- Fix the pre-existing failure.
|
||||
- Commit the fix separately from the branch's changes: `git commit -m "fix: pre-existing test failure in <test-file>"`
|
||||
- Continue with the workflow.
|
||||
|
||||
**If "Add as P0 TODO":**
|
||||
- If `TODOS.md` exists, add the entry following the format in `review/TODOS-format.md` (or `.claude/skills/review/TODOS-format.md`).
|
||||
- If `TODOS.md` does not exist, create it with the standard header and add the entry.
|
||||
- Entry should include: title, the error output, which branch it was noticed on, and priority P0.
|
||||
- Continue with the workflow — treat the pre-existing failure as non-blocking.
|
||||
|
||||
**If "Blame + assign GitHub issue" (collaborative only):**
|
||||
- Find who likely broke it. Check BOTH the test file AND the production code it tests:
|
||||
```bash
# Who last touched the failing test?
git log --format="%an (%ae)" -1 -- <failing-test-file>
# Who last touched the production code the test covers? (often the actual breaker)
git log --format="%an (%ae)" -1 -- <source-file-under-test>
```
|
||||
If these are different people, prefer the production code author — they likely introduced the regression.
|
||||
- Create an issue assigned to that person (use the platform detected in Step 0):
|
||||
- **If GitHub:**
|
||||
```bash
# $'...' (bash/zsh ANSI-C quoting) expands \n and keeps the backticks literal;
# inside double quotes they would be run as command substitution.
gh issue create \
--title "Pre-existing test failure: <test-name>" \
--body $'Found failing on branch <current-branch>. Failure is pre-existing.\n\n**Error:**\n```\n<first 10 lines>\n```\n\n**Last modified by:** <author>\n**Noticed by:** gstack /ship on <date>' \
--assignee "<github-username>"
```
|
||||
- **If GitLab:**
|
||||
```bash
# $'...' (bash/zsh ANSI-C quoting) expands \n and keeps the backticks literal;
# inside double quotes they would be run as command substitution.
glab issue create \
-t "Pre-existing test failure: <test-name>" \
-d $'Found failing on branch <current-branch>. Failure is pre-existing.\n\n**Error:**\n```\n<first 10 lines>\n```\n\n**Last modified by:** <author>\n**Noticed by:** gstack /ship on <date>' \
-a "<gitlab-username>"
```
|
||||
- If neither CLI is available or `--assignee`/`-a` fails (user not in org, etc.), create the issue without assignee and note who should look at it in the body.
|
||||
- Continue with the workflow.
|
||||
|
||||
**If "Skip":**
|
||||
- Continue with the workflow.
|
||||
- Note in output: "Pre-existing test failure skipped: <test-name>"
|
||||
|
||||
**After triage:** If any in-branch failures remain unfixed, **STOP**. Do not proceed. If all failures were pre-existing and handled (fixed, TODOed, assigned, or skipped), continue to Step 6.
|
||||
|
||||
**If all pass:** Continue silently — just note the counts briefly.
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Eval Suites (conditional)
|
||||
|
||||
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
|
||||
|
||||
**1. Check if the diff touches prompt-related files:**
|
||||
|
||||
```bash
|
||||
git diff origin/<base> --name-only
|
||||
```
|
||||
|
||||
Match against these patterns (from CLAUDE.md):
|
||||
- `app/services/*_prompt_builder.rb`
|
||||
- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
|
||||
- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
|
||||
- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
|
||||
- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
|
||||
- `config/system_prompts/*.txt`
|
||||
- `test/evals/**/*` (eval infrastructure changes affect all suites)
|
||||
|
||||
**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 9.
|
||||
|
||||
**2. Identify affected eval suites:**
|
||||
|
||||
Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
|
||||
|
||||
```bash
|
||||
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
|
||||
```
|
||||
|
||||
Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
|
||||
|
||||
**Special cases:**
|
||||
- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
|
||||
- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
|
||||
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
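
The lookup described in this step can be sketched as two small shell functions. This is an illustration only: the function names are hypothetical, and the directory argument exists so the sketch can be tried against a scratch directory instead of the real `test/evals/`.

```shell
# Map an eval runner path to its test file path (naming convention from the text).
runner_to_test() {
  printf '%s\n' "${1%_eval_runner.rb}_eval_test.rb"
}

# Print the eval test files whose runner mentions the changed file's basename.
# Args: changed file path, optional evals directory (defaults to test/evals).
affected_eval_tests() {
  changed_file=$1
  evals_dir=${2:-test/evals}
  base=$(basename "$changed_file")
  for runner in "$evals_dir"/*_eval_runner.rb; do
    [ -f "$runner" ] || continue          # glob matched nothing
    if grep -qF -- "$base" "$runner"; then
      runner_to_test "$runner"
    fi
  done
}

# Usage sketch:
#   git diff origin/<base> --name-only | while read -r f; do affected_eval_tests "$f"; done | sort -u
```

Matching on the basename is deliberately loose: it can over-select suites, which is consistent with the guidance that over-testing is better than missing a regression.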
|
||||
|
||||
**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
|
||||
|
||||
`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
|
||||
|
||||
```bash
|
||||
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
|
||||
```
|
||||
|
||||
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
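
A minimal sketch of the sequential, stop-on-first-failure loop, assuming hypothetical function names (`run_suites_fail_fast`, `run_full_tier_eval`) that are not part of gstack:

```shell
# Run each suite in order with the given runner; stop at the first failure.
# "$1" is the name of a command or function that runs one suite.
run_suites_fail_fast() {
  runner=$1
  shift
  for suite in "$@"; do
    if ! "$runner" "$suite"; then
      echo "Eval suite failed: $suite (remaining suites skipped)" >&2
      return 1
    fi
  done
}

# Hypothetical wrapper around the command shown above.
run_full_tier_eval() {
  EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval "$1"
}

# Usage sketch:
#   run_suites_fail_fast run_full_tier_eval test/evals/a_eval_test.rb test/evals/b_eval_test.rb
```

The wrapper leaves out the `| tee /tmp/ship_evals.txt` pipe on purpose: without `pipefail`, a pipeline reports the exit status of its last command, so the loop would see `tee`'s status rather than the suite's.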
|
||||
|
||||
**4. Check results:**
|
||||
|
||||
- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
|
||||
- **If all pass:** Note pass counts and cost. Continue to Step 9.
|
||||
|
||||
**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 19).
|
||||
|
||||
**Tier reference (for context — /ship always uses `full`):**
|
||||
| Tier | When | Speed (cached) | Cost |
|
||||
|------|------|----------------|------|
|
||||
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
|
||||
| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
|
||||
| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
|
||||
|
||||
---
|
||||
|
|
@ -0,0 +1,93 @@
|
|||
## Step 4: Test Framework Bootstrap
|
||||
|
||||
{{TEST_BOOTSTRAP}}
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Run tests (on merged code)
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
`db:test:prepare` internally, which loads the schema into the correct lane database.
|
||||
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
|
||||
|
||||
Run both test suites in parallel:
|
||||
|
||||
```bash
|
||||
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
|
||||
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
|
||||
wait
|
||||
```
|
||||
|
||||
After both complete, read the output files and check pass/fail.
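
Note that a bare `wait` returns 0 regardless of how the background jobs exited, which is why pass/fail has to be read from the output files. If exit codes are wanted as well, one option is to wait on each job by PID. A minimal sketch, assuming a hypothetical helper `run_both` that redirects to log files instead of piping through `tee`:

```shell
# Run two commands concurrently, log each to its own file, and report both exit codes.
# Args: command A, command B (strings passed to `sh -c`), log file A, log file B.
run_both() {
  sh -c "$1" >"$3" 2>&1 &
  pid_a=$!
  sh -c "$2" >"$4" 2>&1 &
  pid_b=$!
  status_a=0
  status_b=0
  wait "$pid_a" || status_a=$?    # waiting on a PID returns that job's exit status
  wait "$pid_b" || status_b=$?
  echo "first=$status_a second=$status_b"
  [ "$status_a" -eq 0 ] && [ "$status_b" -eq 0 ]
}

# Usage sketch with the commands from this step:
#   run_both 'bin/test-lane' 'npm run test' /tmp/ship_tests.txt /tmp/ship_vitest.txt
```

The trade-off is that output is no longer streamed to the terminal while the suites run; the log files hold the same content afterwards.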
|
||||
|
||||
**If any test fails:** Do NOT immediately stop. Apply the Test Failure Ownership Triage:
|
||||
|
||||
{{TEST_FAILURE_TRIAGE}}
|
||||
|
||||
**After triage:** If any in-branch failures remain unfixed, **STOP**. Do not proceed. If all failures were pre-existing and handled (fixed, TODOed, assigned, or skipped), continue to Step 6.
|
||||
|
||||
**If all pass:** Continue silently — just note the counts briefly.
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Eval Suites (conditional)
|
||||
|
||||
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
|
||||
|
||||
**1. Check if the diff touches prompt-related files:**
|
||||
|
||||
```bash
|
||||
git diff origin/<base> --name-only
|
||||
```
|
||||
|
||||
Match against these patterns (from CLAUDE.md):
|
||||
- `app/services/*_prompt_builder.rb`
|
||||
- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
|
||||
- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
|
||||
- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
|
||||
- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
|
||||
- `config/system_prompts/*.txt`
|
||||
- `test/evals/**/*` (eval infrastructure changes affect all suites)
|
||||
|
||||
**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 9.
|
||||
|
||||
**2. Identify affected eval suites:**
|
||||
|
||||
Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
|
||||
|
||||
```bash
|
||||
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
|
||||
```
|
||||
|
||||
Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
|
||||
|
||||
**Special cases:**
|
||||
- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
|
||||
- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
|
||||
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
|
||||
|
||||
**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
|
||||
|
||||
`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
|
||||
|
||||
```bash
|
||||
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
|
||||
```
|
||||
|
||||
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
|
||||
|
||||
**4. Check results:**
|
||||
|
||||
- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
|
||||
- **If all pass:** Note pass counts and cost. Continue to Step 9.
|
||||
|
||||
**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 19).
|
||||
|
||||
**Tier reference (for context — /ship always uses `full`):**
|
||||
| Tier | When | Speed (cached) | Cost |
|
||||
|------|------|----------------|------|
|
||||
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
|
||||
| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
|
||||
| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
|
||||
|
||||
---
|
||||
|
|
@ -990,6 +990,12 @@ file globs. Run `/sync-gbrain` after meaningful code changes; for ongoing
auto-sync across all worktrees, run `gbrain autopilot --install` once per
machine — gbrain's daemon handles incremental refresh on a schedule.

Safety: don't run `/sync-gbrain` while `gbrain autopilot` is active — the
orchestrator refuses destructive source ops when it detects a running autopilot
to avoid racing it (#1734). Prefer registering user repos with `gbrain sources
add --path <dir>` (no `--url`): URL-managed sources can auto-reclone, and the
sync code walk for them requires an explicit `--allow-reclone` opt-in.

<!-- gstack-gbrain-search-guidance:end -->
```

|
|
|
|||
|
|
@ -295,6 +295,12 @@ file globs. Run `/sync-gbrain` after meaningful code changes; for ongoing
auto-sync across all worktrees, run `gbrain autopilot --install` once per
machine — gbrain's daemon handles incremental refresh on a schedule.

Safety: don't run `/sync-gbrain` while `gbrain autopilot` is active — the
orchestrator refuses destructive source ops when it detects a running autopilot
to avoid racing it (#1734). Prefer registering user repos with `gbrain sources
add --path <dir>` (no `--url`): URL-managed sources can auto-reclone, and the
sync code walk for them requires an explicit `--allow-reclone` opt-in.

<!-- gstack-gbrain-search-guidance:end -->
```

|
|
|
|||
|
|
@ -60,7 +60,9 @@ describe('--catalog-mode=full opt-out behavior (smoke)', () => {
|
|||
test('--catalog-mode=full produces multi-line description in frontmatter', () => {
|
||||
// Save the trimmed state so we can restore it.
|
||||
const trimmedShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
|
||||
expect(trimmedShip).toMatch(/^description: Ship workflow:[^\n]*\(gstack\)\n/m);
|
||||
// #1778: the trimmed ship description has an interior colon ("Ship workflow:")
|
||||
// and is now YAML-quoted — tolerate the optional surrounding quotes.
|
||||
expect(trimmedShip).toMatch(/^description: "?Ship workflow:[^\n]*\(gstack\)"?\n/m);
|
||||
|
||||
try {
|
||||
// Run with --catalog-mode=full. Mutates working tree.
|
||||
|
|
@ -100,7 +102,8 @@ describe('--catalog-mode=full opt-out behavior (smoke)', () => {
|
|||
}
|
||||
// Sanity-check the restored state matches what we saw at the start.
|
||||
const restoredShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
|
||||
expect(restoredShip).toMatch(/^description: Ship workflow:[^\n]*\(gstack\)\n/m);
|
||||
// #1778: restored trim state has the YAML-quoted (interior-colon) description.
|
||||
expect(restoredShip).toMatch(/^description: "?Ship workflow:[^\n]*\(gstack\)"?\n/m);
|
||||
}
|
||||
}, 180_000);
|
||||
|
||||
|
|
|
|||
|
|
@ -227,8 +227,10 @@ Original body content here.
|
|||
const result = applyCatalogTrim(minimalSkill, 'example');
|
||||
expect(result).not.toBeNull();
|
||||
const { content, parts } = result!;
|
||||
// Frontmatter description is now ONE line ending with (gstack)
|
||||
expect(content).toMatch(/^description: Example skill:[^\n]*\(gstack\)\n/m);
|
||||
// Frontmatter description is now ONE line ending with (gstack). #1778: a
|
||||
// description with an interior colon ("Example skill:") is YAML-quoted, so
|
||||
// the value is wrapped in double quotes — tolerate the optional quotes.
|
||||
expect(content).toMatch(/^description: "?Example skill:[^\n]*\(gstack\)"?\n/m);
|
||||
// Body has the When to invoke section
|
||||
expect(content).toContain('## When to invoke this skill');
|
||||
expect(content).toContain('Use when asked to do an example task.');
|
||||
|
|
@ -257,7 +259,8 @@ Original body content here.
|
|||
expect(result).not.toBeNull();
|
||||
expect(result!.content).not.toMatch(/\(gstack\)preamble-tier/);
|
||||
expect(result!.content).not.toMatch(/\(gstack\)allowed-tools/);
|
||||
expect(result!.content).toMatch(/\(gstack\)\n[a-z-]+:/);
|
||||
// #1778: optional closing quote when the description was YAML-quoted.
|
||||
expect(result!.content).toMatch(/\(gstack\)"?\n[a-z-]+:/);
|
||||
});
|
||||
|
||||
test('returns null on content without proper frontmatter', () => {
|
||||
|
|
|
|||
|
|
@ -0,0 +1,57 @@
|
|||
/**
|
||||
* Unit coverage for discoverSectionTemplates — the section-discovery half of the
|
||||
* v2 plan T9 pipeline. Drives it against a temp fixture tree so it doesn't
|
||||
* depend on which skills have been carved in the real repo.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, afterAll } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as os from 'os';
|
||||
import * as path from 'path';
|
||||
import { discoverSectionTemplates } from '../scripts/discover-skills';
|
||||
|
||||
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'sections-disc-'));
|
||||
afterAll(() => { try { fs.rmSync(root, { recursive: true, force: true }); } catch { /* noop */ } });
|
||||
|
||||
// ship/ has two section templates + a non-template file; review/ has none;
|
||||
// hidden + node_modules dirs must be skipped by the shared subdirs() filter.
|
||||
fs.mkdirSync(path.join(root, 'ship', 'sections'), { recursive: true });
|
||||
fs.writeFileSync(path.join(root, 'ship', 'SKILL.md.tmpl'), '---\nname: ship\n---\nbody');
|
||||
fs.writeFileSync(path.join(root, 'ship', 'sections', 'version-bump.md.tmpl'), 'bump');
|
||||
fs.writeFileSync(path.join(root, 'ship', 'sections', 'changelog.md.tmpl'), 'changelog');
|
||||
fs.writeFileSync(path.join(root, 'ship', 'sections', 'manifest.json'), '{}'); // not a .md.tmpl
|
||||
fs.mkdirSync(path.join(root, 'review'), { recursive: true });
|
||||
fs.writeFileSync(path.join(root, 'review', 'SKILL.md.tmpl'), '---\nname: review\n---\nbody');
|
||||
fs.mkdirSync(path.join(root, 'node_modules', 'sections'), { recursive: true });
|
||||
fs.writeFileSync(path.join(root, 'node_modules', 'sections', 'x.md.tmpl'), 'nope');
|
||||
|
||||
describe('discoverSectionTemplates', () => {
|
||||
const found = discoverSectionTemplates(root);
|
||||
|
||||
test('finds only *.md.tmpl files inside <skill>/sections/', () => {
|
||||
expect(found.map(f => f.tmpl)).toEqual([
|
||||
'ship/sections/changelog.md.tmpl',
|
||||
'ship/sections/version-bump.md.tmpl',
|
||||
]);
|
||||
});
|
||||
|
||||
test('strips .tmpl for the output path and records the owning skill dir', () => {
|
||||
const bump = found.find(f => f.tmpl.endsWith('version-bump.md.tmpl'))!;
|
||||
expect(bump.output).toBe('ship/sections/version-bump.md');
|
||||
expect(bump.skillDir).toBe('ship');
|
||||
});
|
||||
|
||||
test('ignores non-template files (manifest.json) and skipped dirs (node_modules)', () => {
|
||||
expect(found.some(f => f.tmpl.includes('manifest.json'))).toBe(false);
|
||||
expect(found.some(f => f.tmpl.includes('node_modules'))).toBe(false);
|
||||
});
|
||||
|
||||
test('returns deterministic (sorted) order', () => {
|
||||
const tmpls = found.map(f => f.tmpl);
|
||||
expect([...tmpls].sort()).toEqual(tmpls);
|
||||
});
|
||||
|
||||
test('skills without a sections/ dir contribute nothing', () => {
|
||||
expect(found.some(f => f.skillDir === 'review')).toBe(false);
|
||||
});
|
||||
});
|
||||
|
|
@ -805,6 +805,10 @@ Only *actions* are idempotent:
|
|||
- Step 19: If PR exists, update the body instead of creating a new PR
|
||||
Never skip a verification step because a prior `/ship` run already performed it.
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Pre-flight
|
||||
|
|
@ -2098,150 +2102,37 @@ If any learnings come back, name which one applies to the version bump or CHANGE
|
|||
|
||||
## Step 12: Version bump (auto-decide)
|
||||
|
||||
**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).
|
||||
|
||||
```bash
|
||||
if ! git rev-parse --verify origin/<base> >/dev/null 2>&1; then
|
||||
echo "ERROR: Unable to resolve origin/<base>. Run 'git fetch origin' or verify the base branch exists."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
|
||||
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
|
||||
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
|
||||
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
|
||||
PKG_VERSION=""
|
||||
PKG_EXISTS=0
|
||||
if [ -f package.json ]; then
|
||||
PKG_EXISTS=1
|
||||
if command -v node >/dev/null 2>&1; then
|
||||
PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
|
||||
PARSE_EXIT=$?
|
||||
elif command -v bun >/dev/null 2>&1; then
|
||||
PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
|
||||
PARSE_EXIT=$?
|
||||
else
|
||||
echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PARSE_EXIT" != "0" ]; then
|
||||
echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"
|
||||
|
||||
if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
|
||||
if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
|
||||
echo "STATE: DRIFT_UNEXPECTED"
|
||||
echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
|
||||
echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
|
||||
exit 1
|
||||
fi
|
||||
echo "STATE: FRESH"
|
||||
else
|
||||
if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
|
||||
echo "STATE: DRIFT_STALE_PKG"
|
||||
else
|
||||
echo "STATE: ALREADY_BUMPED"
|
||||
fi
|
||||
fi
|
||||
```
|
||||
|
||||
Read the `STATE:` line and dispatch:
|
||||
|
||||
- **FRESH** → proceed with the bump action below (steps 1–4).
|
||||
- **ALREADY_BUMPED** → skip the bump by default, BUT check for queue drift first: call `bin/gstack-next-version` with the implied bump level (derived from `CURRENT_VERSION` vs `BASE_VERSION`), compare its `.version` against `CURRENT_VERSION`. If they differ (queue moved since last ship), use **AskUserQuestion**: "VERSION drift detected: you claim v<CURRENT> but next available is v<NEW> (queue moved). A) Rebump to v<NEW> and rewrite CHANGELOG header + PR title (recommended), B) Keep v<CURRENT> — will be rejected by CI version-gate until resolved." If A, treat this as FRESH with `NEW_VERSION=<new>` and run steps 1-4 (which will also trigger Step 13 CHANGELOG header rewrite and Step 19 PR title rewrite). If B, reuse `CURRENT_VERSION` and warn that CI will likely reject. If util is offline, warn and reuse `CURRENT_VERSION`.
|
||||
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body. (Queue check still runs in ALREADY_BUMPED terms after repair.)
|
||||
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.
|
||||
|
||||
1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
|
||||
|
||||
2. **Auto-decide the bump level based on the diff:**
|
||||
- Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
|
||||
- Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or branch name starting with `feat/`
|
||||
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
|
||||
- **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
|
||||
- **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
|
||||
- **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes
|
||||
|
||||
Save the chosen level as `BUMP_LEVEL` (one of `major`, `minor`, `patch`, `micro`). This is the user-intended level. The next step decides *placement* — the level stays the same even if queue-aware allocation has to advance past a claimed slot.
|
||||
|
||||
3. **Queue-aware version pick (workspace-aware ship, v1.6.4.0+).** Call `bin/gstack-next-version` to see what's already claimed by open PRs + active sibling Conductor worktrees, then render the queue state to the user:
|
||||
The deterministic version-state logic is the tested **`gstack-version-bump`** CLI
|
||||
(classify / write / repair). The bump-LEVEL decision and queue-collision handling
|
||||
stay agent judgment; the slot pick stays `gstack-next-version`.
|
||||
|
||||
1. **Classify state** — pure reader, never writes:
|
||||
```bash
|
||||
QUEUE_JSON=$(bun run bin/gstack-next-version \
|
||||
--base <base> \
|
||||
--bump "$BUMP_LEVEL" \
|
||||
--current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
|
||||
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
|
||||
CLAIMED_COUNT=$(echo "$QUEUE_JSON" | jq -r '.claimed | length')
|
||||
ACTIVE_SIBLING_COUNT=$(echo "$QUEUE_JSON" | jq -r '.active_siblings | length')
|
||||
OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
|
||||
REASON=$(echo "$QUEUE_JSON" | jq -r '.reason // ""')
|
||||
bun run $GSTACK_ROOT/bin/gstack-version-bump classify --base <base>
|
||||
```
|
||||
Read the JSON `state` and dispatch:
|
||||
- **FRESH** → do the bump (steps 2-4).
|
||||
- **ALREADY_BUMPED** → skip the bump, but run the queue-drift check (step 3) with the reported `currentVersion`. If the queue moved (next free version differs), **AskUserQuestion**: rebump to the new version (rewrites CHANGELOG header + PR title) or keep current (CI version-gate will reject until resolved).
|
||||
- **DRIFT_STALE_PKG** → run `gstack-version-bump repair` (syncs package.json to VERSION). No re-bump; reuse `currentVersion` for CHANGELOG + PR.
|
||||
- **DRIFT_UNEXPECTED** → **STOP**. package.json disagrees with VERSION while VERSION matches base — a manual edit bypassed /ship. Reconcile manually, then re-run.
|
||||
|
||||
- If `OFFLINE=true` or the util fails (auth expired, no `gh`/`glab`, network): fall back to local `BUMP_LEVEL` arithmetic (bump `BASE_VERSION` at the chosen level). Print `⚠ workspace-aware ship offline — using local bump only`. Continue.
|
||||
- If `CLAIMED_COUNT > 0`: render the queue table to the user so they can see landing order at a glance:
|
||||
```
|
||||
Queue on <base> (vBASE_VERSION):
|
||||
#<pr> <branch> → v<version> [⚠ collision with #<other>]
|
||||
Active sibling workspaces (WIP, not yet PR'd):
|
||||
<path> → v<version> (committed Nh ago)
|
||||
Your branch will claim: vNEW_VERSION (<reason>)
|
||||
```
|
||||
- If `ACTIVE_SIBLING_COUNT > 0` and any active sibling's VERSION is `>= NEW_VERSION`, use **AskUserQuestion**: "Sibling workspace <path> has v<X> committed <N>h ago but hasn't PR'd yet. Wait for them to ship first, or advance past? A) Advance past (recommended for unrelated work), B) Abort /ship and sync up with sibling first."
|
||||
- Validate `NEW_VERSION` matches `MAJOR.MINOR.PATCH.MICRO`. If util returns an empty or malformed version, fall back to local bump.
|
||||
2. **Decide the bump level** from the diff (agent judgment):
|
||||
- **MICRO**: <50 lines, trivial tweaks/config. **PATCH**: 50+ lines, no feature signals.
|
||||
- **MINOR**: **ASK** if any feature signal (new route/page, migration, new module), OR 500+ lines. **MAJOR**: **ASK** — milestones or breaking changes only.
|
||||
Save as `BUMP_LEVEL`. The level is the user-intended bump; queue-aware placement may advance the slot without changing the level.
|
||||
|
||||
4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.
|
||||
3. **Queue-aware pick** (workspace-aware ship):
|
||||
```bash
|
||||
QUEUE_JSON=$(bun run $GSTACK_ROOT/bin/gstack-next-version --base <base> --bump "$BUMP_LEVEL" --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
|
||||
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
|
||||
```
|
||||
If `offline`/util fails: fall back to local `BUMP_LEVEL` arithmetic and print `⚠ workspace-aware ship offline — using local bump only`. If `claimed` is non-empty, render the queue table so the user sees landing order. If an active sibling workspace holds a version `>= NEW_VERSION`, **AskUserQuestion**: advance past (unrelated work) or abort and sync with the sibling.
|
||||
|
||||
```bash
|
||||
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
|
||||
echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
|
||||
exit 1
|
||||
fi
|
||||
echo "$NEW_VERSION" > VERSION
|
||||
if [ -f package.json ]; then
|
||||
if command -v node >/dev/null 2>&1; then
|
||||
node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
|
||||
echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the new idempotency check will detect the drift."
|
||||
exit 1
|
||||
}
|
||||
elif command -v bun >/dev/null 2>&1; then
|
||||
bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
|
||||
echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
|
||||
exit 1
|
||||
}
|
||||
else
|
||||
echo "ERROR: package.json exists but neither node nor bun is available."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
```
|
||||
|
||||
**DRIFT_STALE_PKG repair path** — runs when idempotency reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json.version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.
|
||||
|
||||
```bash
|
||||
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
|
||||
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
|
||||
echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate invalid semver into package.json. Fix VERSION manually, then re-run /ship."
|
||||
exit 1
|
||||
fi
|
||||
if command -v node >/dev/null 2>&1; then
|
||||
node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
|
||||
echo "ERROR: drift repair failed — could not update package.json."
|
||||
exit 1
|
||||
}
|
||||
else
|
||||
bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
|
||||
echo "ERROR: drift repair failed."
|
||||
exit 1
|
||||
}
|
||||
fi
|
||||
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
|
||||
```
|
||||
|
||||
---
|
||||
4. **Write the bump** (FRESH, or an approved rebump):
|
||||
```bash
|
||||
bun run $GSTACK_ROOT/bin/gstack-version-bump write --version "$NEW_VERSION"
|
||||
```
|
||||
The CLI validates the 4-digit `MAJOR.MINOR.PATCH.MICRO` pattern and writes **both** VERSION and package.json. On a half-write (VERSION written, package.json failed) it exits 3 — re-run, and classify will report DRIFT_STALE_PKG for `repair` to fix.
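
The four states and the offline fallback arithmetic can be sketched as pure shell functions. This illustrates the rules described in this step; it is not the implementation of `gstack-version-bump` or `gstack-next-version`. The function names are hypothetical, and resetting the lower digits to zero on a bump is an assumption the text does not state.

```shell
# Classify the version state from three strings.
# Args: base VERSION, current VERSION, package.json version (empty if there is no package.json).
classify_version_state() {
  base=$1; current=$2; pkg=$3
  if [ "$current" = "$base" ]; then
    if [ -n "$pkg" ] && [ "$pkg" != "$current" ]; then
      echo DRIFT_UNEXPECTED      # package.json edited while VERSION still matches base
    else
      echo FRESH
    fi
  elif [ -n "$pkg" ] && [ "$pkg" != "$current" ]; then
    echo DRIFT_STALE_PKG         # VERSION was bumped, package.json was left behind
  else
    echo ALREADY_BUMPED
  fi
}

# Local fallback bump. Args: level (major|minor|patch|micro), MAJOR.MINOR.PATCH.MICRO version.
# Assumes components have no leading zeros; lower digits reset to 0 (an assumption).
bump_version() {
  level=$1; version=$2
  printf '%s' "$version" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' || return 1
  major=${version%%.*}; rest=${version#*.}
  minor=${rest%%.*};    rest=${rest#*.}
  patch=${rest%%.*};    micro=${rest#*.}
  case $level in
    major) major=$((major + 1)); minor=0; patch=0; micro=0 ;;
    minor) minor=$((minor + 1)); patch=0; micro=0 ;;
    patch) patch=$((patch + 1)); micro=0 ;;
    micro) micro=$((micro + 1)) ;;
    *) return 1 ;;
  esac
  echo "$major.$minor.$patch.$micro"
}
```

Keeping both functions free of file and network access is what makes this kind of logic easy to pin with tests, which is the motivation the step gives for moving it into a CLI.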
|
||||
|
||||
## Step 13: CHANGELOG (auto-generate)
|
||||
|
||||
|
|
@ -2746,6 +2637,16 @@ no-op. The marker guarantees at-most-once per machine. To re-enable:
|
|||
|
||||
---
|
||||
|
||||
## Section self-check (before you finish)
|
||||
|
||||
You ran a carved skill. For your situation, list every section the Section index
|
||||
named as applying, and confirm you issued a Read for each one. If you executed any
|
||||
of those steps from memory without reading its section, you skipped the source of
|
||||
truth — STOP, Read it now, and redo that step. Deterministic version work goes
|
||||
through `gstack-version-bump`; never hand-roll the VERSION/package.json write.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Never skip tests.** If tests fail, stop.
|
||||
|
|
|
|||
|
|
@ -807,6 +807,10 @@ Only *actions* are idempotent:
|
|||
- Step 19: If PR exists, update the body instead of creating a new PR
|
||||
Never skip a verification step because a prior `/ship` run already performed it.
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Pre-flight
|
||||
|
|
@ -2476,150 +2480,37 @@ If any learnings come back, name which one applies to the version bump or CHANGE
|
|||
|
||||
## Step 12: Version bump (auto-decide)
|
||||
|
||||
**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).
|
||||
|
||||
```bash
|
||||
if ! git rev-parse --verify origin/<base> >/dev/null 2>&1; then
|
||||
echo "ERROR: Unable to resolve origin/<base>. Run 'git fetch origin' or verify the base branch exists."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
|
||||
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
|
||||
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
|
||||
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
|
||||
PKG_VERSION=""
|
||||
PKG_EXISTS=0
|
||||
if [ -f package.json ]; then
|
||||
PKG_EXISTS=1
|
||||
if command -v node >/dev/null 2>&1; then
|
||||
PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
|
||||
PARSE_EXIT=$?
|
||||
elif command -v bun >/dev/null 2>&1; then
|
||||
PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
|
||||
PARSE_EXIT=$?
|
||||
else
|
||||
echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PARSE_EXIT" != "0" ]; then
|
||||
echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"
|
||||
|
||||
if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
|
||||
if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
|
||||
echo "STATE: DRIFT_UNEXPECTED"
|
||||
echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
|
||||
echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
|
||||
exit 1
|
||||
fi
|
||||
echo "STATE: FRESH"
|
||||
else
|
||||
if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
|
||||
echo "STATE: DRIFT_STALE_PKG"
|
||||
else
|
||||
echo "STATE: ALREADY_BUMPED"
|
||||
fi
|
||||
fi
|
||||
```
|
||||

Read the `STATE:` line and dispatch:

- **FRESH** → proceed with the bump action below (steps 1–4).
- **ALREADY_BUMPED** → skip the bump by default, BUT check for queue drift first: call `bin/gstack-next-version` with the implied bump level (derived from `CURRENT_VERSION` vs `BASE_VERSION`), compare its `.version` against `CURRENT_VERSION`. If they differ (queue moved since last ship), use **AskUserQuestion**: "VERSION drift detected: you claim v<CURRENT> but next available is v<NEW> (queue moved). A) Rebump to v<NEW> and rewrite CHANGELOG header + PR title (recommended), B) Keep v<CURRENT> — will be rejected by CI version-gate until resolved." If A, treat this as FRESH with `NEW_VERSION=<new>` and run steps 1-4 (which will also trigger Step 13 CHANGELOG header rewrite and Step 19 PR title rewrite). If B, reuse `CURRENT_VERSION` and warn that CI will likely reject. If the util is offline, warn and reuse `CURRENT_VERSION`.
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body. (Queue check still runs in ALREADY_BUMPED terms after repair.)
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.

1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)

2. **Auto-decide the bump level based on the diff:**
   - Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
   - Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or branch name starting with `feat/`
   - **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
   - **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
   - **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
   - **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes

   Save the chosen level as `BUMP_LEVEL` (one of `major`, `minor`, `patch`, `micro`). This is the user-intended level. The next step decides *placement* — the level stays the same even if queue-aware allocation has to advance past a claimed slot.

3. **Queue-aware version pick (workspace-aware ship, v1.6.4.0+).** Call `bin/gstack-next-version` to see what's already claimed by open PRs + active sibling Conductor worktrees, then render the queue state to the user:
The deterministic version-state logic is the tested **`gstack-version-bump`** CLI
(classify / write / repair). The bump-LEVEL decision and queue-collision handling
stay agent judgment; the slot pick stays `gstack-next-version`.

1. **Classify state** — pure reader, never writes:
   ```bash
   QUEUE_JSON=$(bun run bin/gstack-next-version \
     --base <base> \
     --bump "$BUMP_LEVEL" \
     --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
   NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
   CLAIMED_COUNT=$(echo "$QUEUE_JSON" | jq -r '.claimed | length')
   ACTIVE_SIBLING_COUNT=$(echo "$QUEUE_JSON" | jq -r '.active_siblings | length')
   OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
   REASON=$(echo "$QUEUE_JSON" | jq -r '.reason // ""')
   bun run $GSTACK_ROOT/bin/gstack-version-bump classify --base <base>
   ```
   Read the JSON `state` and dispatch:
   - **FRESH** → do the bump (steps 2-4).
   - **ALREADY_BUMPED** → skip the bump, but run the queue-drift check (step 3) with the reported `currentVersion`. If the queue moved (next free version differs), **AskUserQuestion**: rebump to the new version (rewrites CHANGELOG header + PR title) or keep current (CI version-gate will reject until resolved).
   - **DRIFT_STALE_PKG** → run `gstack-version-bump repair` (syncs package.json to VERSION). No re-bump; reuse `currentVersion` for CHANGELOG + PR.
   - **DRIFT_UNEXPECTED** → **STOP**. package.json disagrees with VERSION while VERSION matches base — a manual edit bypassed /ship. Reconcile manually, then re-run.

   - If `OFFLINE=true` or the util fails (auth expired, no `gh`/`glab`, network): fall back to local `BUMP_LEVEL` arithmetic (bump `BASE_VERSION` at the chosen level). Print `⚠ workspace-aware ship offline — using local bump only`. Continue.
   - If `CLAIMED_COUNT > 0`: render the queue table to the user so they can see landing order at a glance:
     ```
     Queue on <base> (vBASE_VERSION):
     #<pr> <branch> → v<version> [⚠ collision with #<other>]
     Active sibling workspaces (WIP, not yet PR'd):
     <path> → v<version> (committed Nh ago)
     Your branch will claim: vNEW_VERSION (<reason>)
     ```
   - If `ACTIVE_SIBLING_COUNT > 0` and any active sibling's VERSION is `>= NEW_VERSION`, use **AskUserQuestion**: "Sibling workspace <path> has v<X> committed <N>h ago but hasn't PR'd yet. Wait for them to ship first, or advance past? A) Advance past (recommended for unrelated work), B) Abort /ship and sync up with sibling first."
   - Validate `NEW_VERSION` matches `MAJOR.MINOR.PATCH.MICRO`. If the util returns an empty or malformed version, fall back to local bump.
2. **Decide the bump level** from the diff (agent judgment):
   - **MICRO**: <50 lines, trivial tweaks/config. **PATCH**: 50+ lines, no feature signals.
   - **MINOR**: **ASK** if any feature signal (new route/page, migration, new module), OR 500+ lines. **MAJOR**: **ASK** — milestones or breaking changes only.
   Save as `BUMP_LEVEL`. The level is the user-intended bump; queue-aware placement may advance the slot without changing the level.

4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.
3. **Queue-aware pick** (workspace-aware ship):
   ```bash
   QUEUE_JSON=$(bun run $GSTACK_ROOT/bin/gstack-next-version --base <base> --bump "$BUMP_LEVEL" --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
   NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
   ```
   If `offline`/util fails: fall back to local `BUMP_LEVEL` arithmetic and print `⚠ workspace-aware ship offline — using local bump only`. If `claimed` is non-empty, render the queue table so the user sees landing order. If an active sibling workspace holds a version `>= NEW_VERSION`, **AskUserQuestion**: advance past (unrelated work) or abort and sync with the sibling.

```bash
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
  exit 1
fi
echo "$NEW_VERSION" > VERSION
if [ -f package.json ]; then
  if command -v node >/dev/null 2>&1; then
    node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the new idempotency check will detect the drift."
      exit 1
    }
  elif command -v bun >/dev/null 2>&1; then
    bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
      exit 1
    }
  else
    echo "ERROR: package.json exists but neither node nor bun is available."
    exit 1
  fi
fi
```

**DRIFT_STALE_PKG repair path** — runs when idempotency reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json.version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.

```bash
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate invalid semver into package.json. Fix VERSION manually, then re-run /ship."
  exit 1
fi
if command -v node >/dev/null 2>&1; then
  node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed — could not update package.json."
    exit 1
  }
else
  bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed."
    exit 1
  }
fi
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
```

---
4. **Write the bump** (FRESH, or an approved rebump):
   ```bash
   bun run $GSTACK_ROOT/bin/gstack-version-bump write --version "$NEW_VERSION"
   ```
The CLI validates the 4-digit `MAJOR.MINOR.PATCH.MICRO` pattern and writes **both** VERSION and package.json. On a half-write (VERSION written, package.json failed) it exits 3 — re-run, and classify will report DRIFT_STALE_PKG for `repair` to fix.
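
The state classification and the offline fallback described in the steps above can be sketched as a few pure functions. This is a minimal illustration only, not the `gstack-version-bump` implementation: the names `classifyVersionState`, `localBump`, and `isValidVersion` are hypothetical, and resetting the lower digits to zero on a bump is an assumption the text does not confirm.

```typescript
// Hypothetical sketch of the version-state logic; names are illustrative.
type VersionState = "FRESH" | "ALREADY_BUMPED" | "DRIFT_STALE_PKG" | "DRIFT_UNEXPECTED";
type BumpLevel = "major" | "minor" | "patch" | "micro";

// Same 4-digit MAJOR.MINOR.PATCH.MICRO pattern the bash blocks check with grep.
const VERSION_PATTERN = /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/;

function isValidVersion(v: string): boolean {
  return VERSION_PATTERN.test(v);
}

// pkg is null (or empty) when package.json is absent or has no version field.
function classifyVersionState(base: string, current: string, pkg: string | null): VersionState {
  const pkgDisagrees = pkg !== null && pkg !== "" && pkg !== current;
  if (current === base) {
    // VERSION untouched: a differing package.json means a manual edit.
    return pkgDisagrees ? "DRIFT_UNEXPECTED" : "FRESH";
  }
  // VERSION already moved: a differing package.json means a half-written bump.
  return pkgDisagrees ? "DRIFT_STALE_PKG" : "ALREADY_BUMPED";
}

// Offline fallback: bump the base at the chosen level.
// Assumption: digits to the right of the bumped one reset to 0.
function localBump(base: string, level: BumpLevel): string {
  if (!isValidVersion(base)) throw new Error(`invalid version: ${base}`);
  const parts = base.split(".").map(Number);
  const index = { major: 0, minor: 1, patch: 2, micro: 3 }[level];
  parts[index] += 1;
  for (let i = index + 1; i < parts.length; i++) parts[i] = 0;
  return parts.join(".");
}
```

Keeping these functions free of file and network access mirrors the "pure reader, never writes" framing of the classify step, so the four states can be pinned by unit tests without touching a real repository.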

## Step 13: CHANGELOG (auto-generate)

@ -3124,6 +3015,16 @@ no-op. The marker guarantees at-most-once per machine. To re-enable:

---

## Section self-check (before you finish)

You ran a carved skill. For your situation, list every section the Section index
named as applying, and confirm you issued a Read for each one. If you executed any
of those steps from memory without reading its section, you skipped the source of
truth — STOP, Read it now, and redo that step. Deterministic version work goes
through `gstack-version-bump`; never hand-roll the VERSION/package.json write.

---

## Important Rules

- **Never skip tests.** If tests fail, stop.

@ -204,14 +204,30 @@ describe('gstack-gbrain-install D19 PATH-shadow validation', () => {
  }

  test('passes when install-dir version matches `gbrain --version` on PATH', () => {
    // Version must be >= MIN_GBRAIN_VERSION (0.20.0) floor (#1744).
    const installDir = seedInstallDir('0.41.29');
    const fakeBin = seedFakeGbrainBinary('0.41.29');
    try {
      const r = run(INSTALL, ['--validate-only', '--install-dir', installDir], {
        env: { PATH: `${fakeBin}:${SAFE_PATH}` },
      });
      expect(r.status).toBe(0);
      expect(r.stdout).toContain('installed gbrain 0.41.29');
    } finally {
      fs.rmSync(installDir, { recursive: true, force: true });
      fs.rmSync(fakeBin, { recursive: true, force: true });
    }
  });

  test('hard-fails (exit 3) when the installed gbrain is below the version floor (#1744)', () => {
    const installDir = seedInstallDir('0.18.2');
    const fakeBin = seedFakeGbrainBinary('0.18.2');
    try {
      const r = run(INSTALL, ['--validate-only', '--install-dir', installDir], {
        env: { PATH: `${fakeBin}:${SAFE_PATH}` },
      });
      expect(r.status).toBe(0);
      expect(r.stdout).toContain('installed gbrain 0.18.2');
      expect(r.status).toBe(3);
      expect(r.stderr).toContain('below the minimum gstack-tested version');
    } finally {
      fs.rmSync(installDir, { recursive: true, force: true });
      fs.rmSync(fakeBin, { recursive: true, force: true });
@ -219,8 +235,8 @@ describe('gstack-gbrain-install D19 PATH-shadow validation', () => {
  });

  test('tolerates a leading "v" in `gbrain --version` output', () => {
    const installDir = seedInstallDir('0.18.2');
    const fakeBin = seedFakeGbrainBinary('v0.18.2');
    const installDir = seedInstallDir('0.41.29');
    const fakeBin = seedFakeGbrainBinary('v0.41.29');
    try {
      const r = run(INSTALL, ['--validate-only', '--install-dir', installDir], {
        env: { PATH: `${fakeBin}:${SAFE_PATH}` },

@ -0,0 +1,140 @@
import { describe, test, expect, afterEach } from "bun:test";
import * as fs from "fs";
import * as os from "os";
import { join } from "path";
import {
  detectAutopilot,
  decideSourceRemove,
  decideCodeSync,
  isInside,
  _resetCapabilityMemo,
  type GbrainSourceRow,
} from "../lib/gbrain-guards";

const HOME = os.homedir();
const clonesPath = (name: string) => join(HOME, ".gbrain", "clones", name);

afterEach(() => _resetCapabilityMemo());

// ── #1734 autopilot detection (E1: affirmative multi-signal) ────────────────
describe("detectAutopilot", () => {
  test("refuses on a present lock file (secondary signal)", () => {
    const tmp = fs.mkdtempSync(join(os.tmpdir(), "ap-"));
    const lock = join(tmp, "autopilot.lock");
    fs.writeFileSync(lock, "");
    const r = detectAutopilot(process.env, { lockPaths: [lock], processRunning: () => false });
    expect(r.active).toBe(true);
    expect(r.signal).toContain("lock:");
  });

  test("refuses on a live autopilot process (primary signal)", () => {
    const r = detectAutopilot(process.env, { lockPaths: [], processRunning: () => true });
    expect(r.active).toBe(true);
    expect(r.signal).toBe("process:gbrain autopilot");
  });

  test("proceeds when no signal fires (never blanket-refuses)", () => {
    const r = detectAutopilot(process.env, { lockPaths: [], processRunning: () => false });
    expect(r.active).toBe(false);
    expect(r.signal).toBeNull();
  });
});

// ── #1734 remove safety (E7: fail closed on user-managed without keep-storage) ─
describe("decideSourceRemove", () => {
  const rows = (extra: GbrainSourceRow[] = []): GbrainSourceRow[] => [
    { id: "gbrain-managed", local_path: clonesPath("repo"), config: { remote_url: "https://x/r.git" } },
    { id: "user-managed", local_path: "/tmp/user-repo", config: { remote_url: "https://x/r.git" } },
    { id: "path-managed", local_path: "/tmp/path-repo" }, // no remote_url
    ...extra,
  ];
  const fetchRows = (extra?: GbrainSourceRow[]) => () => rows(extra);

  test("absent source → allow (no-op)", () => {
    const d = decideSourceRemove("nope", process.env, { keepStorage: false, fetchRows: fetchRows() });
    expect(d.allow).toBe(true);
    expect(d.reason).toContain("absent");
  });

  test("user-managed + no --keep-storage → FAIL CLOSED", () => {
    const d = decideSourceRemove("user-managed", process.env, { keepStorage: false, fetchRows: fetchRows() });
    expect(d.allow).toBe(false);
    expect(d.reason).toContain("user-managed");
  });

  test("user-managed + --keep-storage supported → allow with flag", () => {
    const d = decideSourceRemove("user-managed", process.env, { keepStorage: true, fetchRows: fetchRows() });
    expect(d.allow).toBe(true);
    expect(d.extraArgs).toContain("--keep-storage");
  });

  test("gbrain-managed (inside clones) → allow even without keep-storage", () => {
    const d = decideSourceRemove("gbrain-managed", process.env, { keepStorage: false, fetchRows: fetchRows() });
    expect(d.allow).toBe(true);
  });

  test("path-managed without remote_url → allow (normal --path case)", () => {
    const d = decideSourceRemove("path-managed", process.env, { keepStorage: false, fetchRows: fetchRows() });
    expect(d.allow).toBe(true);
  });

  test("sources unreadable → FAIL CLOSED", () => {
    const d = decideSourceRemove("user-managed", process.env, {
      keepStorage: false,
      fetchRows: () => { throw new Error("boom"); },
    });
    expect(d.allow).toBe(false);
    expect(d.reason).toContain("fail closed");
  });
});

// ── #1734 reclone guard (E-level: require --allow-reclone for URL-managed) ───
describe("decideCodeSync", () => {
  const rows: GbrainSourceRow[] = [
    { id: "url-managed", local_path: "/tmp/u", config: { remote_url: "https://x/r.git" } },
    { id: "plain", local_path: "/tmp/p" },
  ];
  const fetch = () => rows;

  test("URL-managed + no --allow-reclone → refuse", () => {
    const d = decideCodeSync("url-managed", process.env, false, fetch);
    expect(d.allow).toBe(false);
    expect(d.reason).toContain("auto-reclone");
  });

  test("URL-managed + --allow-reclone → allow", () => {
    const d = decideCodeSync("url-managed", process.env, true, fetch);
    expect(d.allow).toBe(true);
  });

  test("no remote_url → allow", () => {
    const d = decideCodeSync("plain", process.env, false, fetch);
    expect(d.allow).toBe(true);
  });

  test("sources unreadable → fail OPEN (sync read is non-destructive)", () => {
    const d = decideCodeSync("url-managed", process.env, false, () => { throw new Error("boom"); });
    expect(d.allow).toBe(true);
  });
});

// ── path containment uses realpath (symlink can't smuggle a delete out) ──────
describe("isInside", () => {
  test("plain path inside dir", () => {
    expect(isInside("/a/b/c", "/a/b")).toBe(true);
    expect(isInside("/a/x", "/a/b")).toBe(false);
  });

  test("sibling-prefix is not 'inside' (clonesX vs clones)", () => {
    expect(isInside("/a/clones-evil/x", "/a/clones")).toBe(false);
  });

  test("symlink pointing outside resolves outside", () => {
    const base = fs.mkdtempSync(join(os.tmpdir(), "clones-"));
    const outside = fs.mkdtempSync(join(os.tmpdir(), "outside-"));
    const link = join(base, "sneaky");
    fs.symlinkSync(outside, link);
    // link lives under base, but realpath resolves to `outside` → not inside base.
    expect(isInside(link, base)).toBe(false);
  });
});
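
The containment rule these `isInside` tests pin (compare whole path segments after canonicalizing, so neither a sibling prefix nor a symlink counts as "inside") can be sketched as below. This is an illustrative sketch, not the code in `lib/gbrain-guards`: `isInsideSketch` and the injected `canonicalize` parameter are hypothetical, the sketch assumes absolute POSIX paths, and treating a path equal to the directory as not inside is an assumption. A real caller would presumably pass something like `fs.realpathSync` as the canonicalizer.

```typescript
// Hypothetical sketch of segment-wise path containment; not the real isInside.
type Canonicalize = (p: string) => string;

// Split an absolute POSIX path into its non-empty segments.
function pathSegments(p: string): string[] {
  return p.split("/").filter((s) => s.length > 0);
}

// True only when `child` is strictly below `parent` after canonicalization.
// Comparing segments (not string prefixes) keeps /a/clones-evil out of /a/clones.
function isInsideSketch(
  child: string,
  parent: string,
  canonicalize: Canonicalize = (p) => p,
): boolean {
  const c = pathSegments(canonicalize(child));
  const d = pathSegments(canonicalize(parent));
  if (c.length <= d.length) return false; // assumption: equal paths are not "inside"
  return d.every((seg, i) => c[i] === seg);
}

// Simulated symlink table standing in for a realpath lookup.
const fakeLinks = new Map<string, string>([["/base/sneaky", "/outside"]]);
const fakeRealpath: Canonicalize = (p) => fakeLinks.get(p) ?? p;

const plainInside = isInsideSketch("/a/b/c", "/a/b");
const siblingPrefix = isInsideSketch("/a/clones-evil/x", "/a/clones");
const symlinkEscape = isInsideSketch("/base/sneaky", "/base", fakeRealpath);
```

Injecting the canonicalizer follows the same dependency-injection style the tests use for `processRunning` and `fetchRows`, which lets the symlink case be exercised without creating files.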

@ -0,0 +1,49 @@
import { describe, test, expect } from "bun:test";
import { parseSourcesList } from "../lib/gbrain-sources";

// #1576 hardening: `gbrain sources list --json` has shipped two shapes — a
// wrapped `{ sources: [...] }` object (v0.20+) and a bare top-level array.
// parseSourcesList is the single place that normalizes both, so every reader
// (probeSource, sourcePageCount, sourceLocalPath, the #1734 remote_url audit)
// agrees on the shape. These tests pin both shapes plus the garbage paths.
describe("parseSourcesList", () => {
  const rows = [
    { id: "a", local_path: "/x", page_count: 3 },
    { id: "b", local_path: "/y", config: { remote_url: "https://example.com/r.git" } },
  ];

  test("wrapped { sources: [...] } shape", () => {
    expect(parseSourcesList({ sources: rows })).toEqual(rows);
  });

  test("bare top-level array shape", () => {
    expect(parseSourcesList(rows)).toEqual(rows);
  });

  test("both shapes yield identical rows (shape-independent)", () => {
    expect(parseSourcesList({ sources: rows })).toEqual(parseSourcesList(rows));
  });

  test("null / undefined → empty array (no throw)", () => {
    expect(parseSourcesList(null)).toEqual([]);
    expect(parseSourcesList(undefined)).toEqual([]);
  });

  test("object without sources key → empty array", () => {
    expect(parseSourcesList({ pages: [] })).toEqual([]);
  });

  test("sources key present but not an array → empty array", () => {
    expect(parseSourcesList({ sources: "oops" })).toEqual([]);
  });

  test("scalar garbage → empty array", () => {
    expect(parseSourcesList("nope")).toEqual([]);
    expect(parseSourcesList(42)).toEqual([]);
  });

  test("preserves config.remote_url for the #1734 audit", () => {
    const parsed = parseSourcesList({ sources: rows });
    expect(parsed.find((r) => r.id === "b")?.config?.remote_url).toBe("https://example.com/r.git");
  });
});
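
A normalizer that satisfies the cases pinned above (wrapped object, bare array, and every garbage input collapsing to an empty array) might look like the following. It is a sketch under assumptions rather than the code in `lib/gbrain-sources`: the name `parseSourcesListSketch` and the `SourceRowSketch` field list are hypothetical, inferred only from the rows used in these tests.

```typescript
// Hypothetical row shape, inferred from the test fixtures only.
interface SourceRowSketch {
  id: string;
  local_path?: string;
  page_count?: number;
  config?: { remote_url?: string };
}

// Accepts either `{ sources: [...] }` or a bare array; anything else yields [].
function parseSourcesListSketch(raw: unknown): SourceRowSketch[] {
  if (Array.isArray(raw)) return raw as SourceRowSketch[];
  if (raw !== null && typeof raw === "object") {
    const sources = (raw as { sources?: unknown }).sources;
    if (Array.isArray(sources)) return sources as SourceRowSketch[];
  }
  return [];
}

const sampleRows: SourceRowSketch[] = [
  { id: "a", local_path: "/x", page_count: 3 },
  { id: "b", local_path: "/y", config: { remote_url: "https://example.com/r.git" } },
];
const fromWrapped = parseSourcesListSketch({ sources: sampleRows });
const fromBare = parseSourcesListSketch(sampleRows);
```

Returning an empty array instead of throwing keeps the decision about unreadable input with the caller; the guard tests in this diff make that choice per operation (fail closed for removal, fail open for code sync).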

@ -0,0 +1,45 @@
import { describe, test, expect } from "bun:test";
import * as fs from "fs";
import * as path from "path";

const ROOT = path.resolve(import.meta.dir, "..");
const read = (rel: string) => fs.readFileSync(path.join(ROOT, rel), "utf-8");

// #1731 tripwire. Windows can't spawn the `gbrain` shim (gbrain.cmd) or the bash
// shebang script gstack-brain-sync without a shell; the fix gates `shell: true`
// behind NEEDS_SHELL_ON_WINDOWS. These static checks fail CI if a refactor adds
// a gbrain/brain-sync child spawn without the Windows shell flag, since macOS/
// Linux CI can't exercise the Windows path at runtime.
describe("#1731 gbrain spawns carry the Windows shell flag", () => {
  test("NEEDS_SHELL_ON_WINDOWS is platform-gated in gbrain-exec.ts", () => {
    const src = read("lib/gbrain-exec.ts");
    expect(src).toMatch(/export const NEEDS_SHELL_ON_WINDOWS\s*=\s*process\.platform === "win32"/);
  });

  // Every direct `gbrain` child spawn in these files must be matched by a
  // shell:NEEDS_SHELL_ON_WINDOWS flag. Count openers vs flags as a cheap,
  // refactor-resistant invariant.
  const gbrainSpawnFiles = [
    "lib/gbrain-exec.ts",
    "lib/gbrain-sources.ts",
    "lib/gbrain-local-status.ts",
  ];
  for (const rel of gbrainSpawnFiles) {
    test(`${rel}: every gbrain spawn has shell:NEEDS_SHELL_ON_WINDOWS`, () => {
      const src = read(rel);
      const spawnOpeners = src.match(/(spawnSync|spawn|execFileSync)\("gbrain"/g)?.length ?? 0;
      const shellFlags = src.match(/shell:\s*NEEDS_SHELL_ON_WINDOWS/g)?.length ?? 0;
      expect(spawnOpeners).toBeGreaterThan(0);
      expect(shellFlags).toBeGreaterThanOrEqual(spawnOpeners);
    });
  }

  test("orchestrator brain-sync spawns carry the Windows shell flag", () => {
    const src = read("bin/gstack-gbrain-sync.ts");
    const brainSyncSpawns = src.match(/spawnSync\(brainSyncPath,/g)?.length ?? 0;
    expect(brainSyncSpawns).toBe(2);
    // Both spawnSync(brainSyncPath, ...) blocks must include the shell flag.
    const withShell = src.match(/spawnSync\(brainSyncPath,[\s\S]*?shell:\s*NEEDS_SHELL_ON_WINDOWS/g)?.length ?? 0;
    expect(withShell).toBe(2);
  });
});

@ -8,6 +8,24 @@ import * as os from 'os';
const ROOT = path.resolve(import.meta.dir, '..');
const MAX_SKILL_DESCRIPTION_LENGTH = 1024;

// Carved-skill aware (v2 plan T9): ship is now a skeleton SKILL.md + sections/*.md.
// Read the union so assertions about content that MOVED into a section still pass.
// The skeleton is a subset of the union, so skeleton-only assertions also hold,
// and negative assertions stay safe (the absent phrases live in neither file).
function readSkillUnion(skill: string): string {
  let t = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
  const secDir = path.join(ROOT, skill, 'sections');
  if (fs.existsSync(secDir)) {
    for (const f of fs.readdirSync(secDir).sort()) {
      if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf-8');
    }
  }
  return t;
}
function readShipUnion(): string {
  return readSkillUnion('ship');
}

function extractDescription(content: string): string {
  const fmEnd = content.indexOf('\n---', 4);
  expect(fmEnd).toBeGreaterThan(0);
@ -155,12 +173,39 @@ describe('gen-skill-docs', () => {
    }
  });

  test('every generated SKILL.md has valid YAML frontmatter', () => {
  // #1778: strict YAML parsers (Codex/OpenAI skill loading) reject frontmatter
  // whose plain `description:` scalar contains an interior ": " (read as a nested
  // mapping). Parse EVERY generated frontmatter block with a strict YAML parser,
  // not just string-check that name:/description: exist.
  function frontmatterBlock(content: string): string {
    expect(content.startsWith('---\n')).toBe(true);
    const end = content.indexOf('\n---', 4);
    expect(end).toBeGreaterThan(0);
    return content.slice(4, end);
  }

  test('every generated SKILL.md frontmatter parses as strict YAML', () => {
    for (const skill of CLAUDE_GENERATED_SKILLS) {
      const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
      expect(content.startsWith('---\n')).toBe(true);
      expect(content).toContain('name:');
      expect(content).toContain('description:');
      const fm = frontmatterBlock(content);
      let parsed: any;
      expect(() => { parsed = Bun.YAML.parse(fm); },
        `frontmatter for ${skill.dir} must be valid YAML`).not.toThrow();
      expect(typeof parsed?.name).toBe('string');
      expect(typeof parsed?.description).toBe('string');
    }
  });

  test('every generated Codex (.agents/skills) frontmatter parses as strict YAML', () => {
    const agentsDir = path.join(ROOT, '.agents', 'skills');
    if (!fs.existsSync(agentsDir)) return; // skip if external hosts not generated
    for (const entry of fs.readdirSync(agentsDir, { withFileTypes: true })) {
      if (!entry.isDirectory()) continue;
      const mdPath = path.join(agentsDir, entry.name, 'SKILL.md');
      if (!fs.existsSync(mdPath)) continue;
      const fm = frontmatterBlock(fs.readFileSync(mdPath, 'utf-8'));
      expect(() => Bun.YAML.parse(fm),
        `Codex frontmatter for ${entry.name} must be valid YAML`).not.toThrow();
    }
  });

@ -485,7 +530,7 @@ describe('gen-skill-docs', () => {

describe('BASE_BRANCH_DETECT resolver', () => {
  // Find a generated SKILL.md that uses the placeholder (ship is guaranteed to)
  const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
  const shipContent = readShipUnion();

  test('resolver output contains PR base detection command', () => {
    expect(shipContent).toContain('gh pr view --json baseRefName');
@ -518,7 +563,7 @@ describe('BASE_BRANCH_DETECT resolver', () => {

describe('GitLab support in generated skills', () => {
  const retroContent = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
  const shipSkillContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
  const shipSkillContent = readShipUnion();

  test('retro contains GitLab MR number extraction', () => {
    expect(retroContent).toContain('[#!]');
@ -634,13 +679,13 @@ describe('REVIEW_DASHBOARD resolver', () => {
  }

  test('review dashboard appears in ship generated file', () => {
    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const content = readShipUnion();
    expect(content).toContain('reviews.jsonl');
    expect(content).toContain('REVIEW READINESS DASHBOARD');
  });

  test('dashboard treats review as a valid Eng Review source', () => {
    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const content = readShipUnion();
    expect(content).toContain('plan-eng-review, review, plan-design-review');
    expect(content).toContain('`review` (diff-scoped pre-landing review)');
    expect(content).toContain('`plan-eng-review` (plan-stage architecture review)');
@ -708,7 +753,7 @@ describe('REVIEW_DASHBOARD resolver', () => {
  });

  test('ship does NOT contain review chaining', () => {
    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const content = readShipUnion();
    expect(content).not.toContain('Review Chaining');
  });
});
@ -717,7 +762,7 @@ describe('REVIEW_DASHBOARD resolver', () => {

describe('TEST_COVERAGE_AUDIT placeholders', () => {
  const planSkill = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
  const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
  const shipSkill = readShipUnion();
  const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');

  test('plan and ship modes share codepath tracing methodology', () => {
@ -874,7 +919,7 @@ describe('TEST_COVERAGE_AUDIT placeholders', () => {
// --- {{TEST_FAILURE_TRIAGE}} resolver tests ---

describe('TEST_FAILURE_TRIAGE resolver', () => {
  const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
  const shipSkill = readShipUnion();

  test('contains all 4 triage steps', () => {
    expect(shipSkill).toContain('Step T1: Classify each failure');
@ -938,7 +983,7 @@ describe('PLAN_FILE_REVIEW_REPORT resolver', () => {
// --- {{PLAN_COMPLETION_AUDIT}} resolver tests ---

describe('PLAN_COMPLETION_AUDIT placeholders', () => {
  const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
  const shipSkill = readShipUnion();
  const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');

  test('ship SKILL.md contains plan completion audit step', () => {
@ -989,7 +1034,7 @@ describe('PLAN_COMPLETION_AUDIT placeholders', () => {
|
|||
// --- {{PLAN_VERIFICATION_EXEC}} resolver tests ---
|
||||
|
||||
describe('PLAN_VERIFICATION_EXEC placeholder', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
|
||||
test('ship SKILL.md contains plan verification step', () => {
|
||||
expect(shipSkill).toContain('Step 8.1');
|
||||
|
|
@ -1018,7 +1063,7 @@ describe('PLAN_VERIFICATION_EXEC placeholder', () => {
|
|||
// --- Coverage gate tests ---
|
||||
|
||||
describe('Coverage gate in ship', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
test('ship SKILL.md contains coverage gate with thresholds', () => {
|
||||
|
|
@ -1047,7 +1092,7 @@ describe('Coverage gate in ship', () => {
|
|||
// --- Ship metrics logging ---
|
||||
|
||||
describe('Ship metrics logging', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
|
||||
test('ship SKILL.md contains metrics persistence step', () => {
|
||||
expect(shipSkill).toContain('Step 20');
|
||||
|
|
@ -1063,7 +1108,7 @@ describe('Ship metrics logging', () => {
|
|||
describe('Plan file discovery shared helper', () => {
|
||||
// The shared helper should appear in ship (via PLAN_COMPLETION_AUDIT_SHIP)
|
||||
// and in review (via PLAN_COMPLETION_AUDIT_REVIEW)
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
test('plan file discovery appears in both ship and review', () => {
|
||||
|
|
@ -1276,7 +1321,8 @@ describe('Codex filesystem boundary', () => {
|
|||
|
||||
test('boundary instruction appears in all skills that call codex', () => {
|
||||
for (const skill of CODEX_CALLING_SKILLS) {
|
||||
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
||||
// Union: ship's codex call lives in sections/adversarial.md after the carve.
|
||||
const content = readSkillUnion(skill);
|
||||
expect(content).toContain(BOUNDARY_MARKER);
|
||||
}
|
||||
});
|
||||
|
|
@@ -1462,7 +1508,7 @@ describe('/land skill composition', () => {
 // --- {{CHANGELOG_WORKFLOW}} resolver tests ---

 describe('CHANGELOG_WORKFLOW resolver', () => {
-  const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+  const shipContent = readShipUnion();

   test('ship SKILL.md contains changelog workflow', () => {
     expect(shipContent).toContain('CHANGELOG (auto-generate)');
@@ -1479,10 +1525,13 @@ describe('CHANGELOG_WORKFLOW resolver', () => {
   });

   test('template uses {{CHANGELOG_WORKFLOW}} placeholder', () => {
-    const tmpl = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md.tmpl'), 'utf-8');
-    expect(tmpl).toContain('{{CHANGELOG_WORKFLOW}}');
-    // Should NOT contain the old inline changelog content
-    expect(tmpl).not.toContain('Group commits by theme');
+    // Post-carve (T9): the skeleton points to the changelog section, which carries
+    // the resolver. Neither should inline the old changelog content.
+    const skel = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md.tmpl'), 'utf-8');
+    const changelogSection = fs.readFileSync(path.join(ROOT, 'ship', 'sections', 'changelog.md.tmpl'), 'utf-8');
+    expect(skel).toContain('{{SECTION:changelog}}');
+    expect(changelogSection).toContain('{{CHANGELOG_WORKFLOW}}');
+    expect(skel + changelogSection).not.toContain('Group commits by theme');
   });

   test('changelog workflow includes keep-changelog format', () => {
@@ -1519,7 +1568,7 @@ describe('parameterized resolver support', () => {
 // --- Preamble routing injection tests ---

 describe('preamble routing injection', () => {
-  const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+  const shipContent = readShipUnion();

   test('preamble bash checks for routing section in CLAUDE.md', () => {
     expect(shipContent).toContain('grep -q "## Skill routing" CLAUDE.md');
@@ -1663,7 +1712,7 @@ describe('DESIGN_SKETCH extended with outside voices', () => {
 // --- Extended DESIGN_REVIEW_LITE resolver tests ---

 describe('DESIGN_REVIEW_LITE extended with Codex', () => {
-  const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+  const content = readShipUnion();

   test('contains Codex design voice block', () => {
     expect(content).toContain('Codex design voice');
@@ -1966,7 +2015,7 @@ describe('Codex generation (--host codex)', () => {
   });

   test('Claude output unchanged: ship skill still uses .claude/skills/ paths', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('~/.claude/skills/gstack');
     expect(content).not.toContain('.agents/skills');
     expect(content).not.toContain('~/.codex/');
@@ -2655,7 +2704,7 @@ describe('community fixes wave', () => {

   // #573 — Feature signals: ship/SKILL.md contains feature signal detection
   test('ship/SKILL.md contains feature signal detection in Step 4', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content.toLowerCase()).toContain('feature signal');
   });

@@ -2805,7 +2854,8 @@ describe('codex commands must not use inline $(git rev-parse --show-toplevel) fo
     ];

     for (const rel of checkedFiles) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex/adversarial command moved into sections/adversarial.md (T9 carve).
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
       expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
       expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
     }
@@ -2819,7 +2869,7 @@ describe('LEARNINGS_SEARCH resolver', () => {

   for (const skill of SEARCH_SKILLS) {
     test(`${skill} generated SKILL.md contains learnings search`, () => {
-      const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
+      const content = readSkillUnion(skill); // ship: moved to sections/plan-completion.md
       expect(content).toContain('Prior Learnings');
       expect(content).toContain('gstack-learnings-search');
     });
@@ -2880,7 +2930,7 @@ describe('CONFIDENCE_CALIBRATION resolver', () => {

   for (const skill of CONFIDENCE_SKILLS) {
     test(`${skill} generated SKILL.md contains confidence calibration`, () => {
-      const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
+      const content = readSkillUnion(skill); // ship: moved to sections/review-army.md
       expect(content).toContain('Confidence Calibration');
       expect(content).toContain('confidence score');
     });
@@ -0,0 +1,133 @@
+/**
+ * Tests for the gstack-version-bump CLI (v2 plan T9 hybrid extraction). Covers
+ * the idempotency classifier (pure) + the write/repair mutations (temp fs).
+ * The classifier is the one that prevents re-bumping an already-shipped branch —
+ * the worst /ship footgun — so it gets exhaustive state coverage.
+ */
+
+import { describe, test, expect, afterAll } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import { execFileSync } from 'child_process';
+import { classifyState, VERSION_RE } from '../bin/gstack-version-bump';
+
+const BIN = path.join(import.meta.dir, '..', 'bin', 'gstack-version-bump');
+
+describe('classifyState (idempotency)', () => {
+  test('FRESH when VERSION matches base and pkg agrees', () => {
+    expect(classifyState('1.1.0.0', '1.1.0.0', true, '1.1.0.0')).toBe('FRESH');
+  });
+  test('FRESH when VERSION matches base and no package.json', () => {
+    expect(classifyState('1.1.0.0', '1.1.0.0', false, '')).toBe('FRESH');
+  });
+  test('ALREADY_BUMPED when VERSION moved past base and pkg agrees (re-run)', () => {
+    expect(classifyState('1.2.0.0', '1.1.0.0', true, '1.2.0.0')).toBe('ALREADY_BUMPED');
+  });
+  test('ALREADY_BUMPED when VERSION moved past base, no package.json', () => {
+    expect(classifyState('1.2.0.0', '1.1.0.0', false, '')).toBe('ALREADY_BUMPED');
+  });
+  test('DRIFT_STALE_PKG when VERSION bumped but pkg lagging', () => {
+    expect(classifyState('1.2.0.0', '1.1.0.0', true, '1.1.0.0')).toBe('DRIFT_STALE_PKG');
+  });
+  test('DRIFT_UNEXPECTED when VERSION matches base but pkg diverges (manual edit)', () => {
+    expect(classifyState('1.1.0.0', '1.1.0.0', true, '1.2.0.0')).toBe('DRIFT_UNEXPECTED');
+  });
+});
+
+describe('VERSION_RE', () => {
+  test('accepts 4-digit semver', () => {
+    expect(VERSION_RE.test('1.2.3.4')).toBe(true);
+  });
+  test('rejects 3-digit and garbage', () => {
+    expect(VERSION_RE.test('1.2.3')).toBe(false);
+    expect(VERSION_RE.test('v1.2.3.4')).toBe(false);
+    expect(VERSION_RE.test('1.2.3.4-rc')).toBe(false);
+  });
+});
+
+describe('write (FRESH bump)', () => {
+  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-write-'));
+  afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
+
+  test('writes VERSION + package.json.version, preserving other pkg fields', () => {
+    fs.writeFileSync(path.join(dir, 'VERSION'), '1.0.0.0\n');
+    fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.0.0.0', scripts: { t: 'y' } }, null, 2) + '\n');
+    const out = execFileSync('bun', [BIN, 'write', '--version', '1.1.0.0'], { cwd: dir }).toString();
+    expect(JSON.parse(out)).toEqual({ wrote: '1.1.0.0', packageJson: true });
+    expect(fs.readFileSync(path.join(dir, 'VERSION'), 'utf-8').trim()).toBe('1.1.0.0');
+    const pkg = JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf-8'));
+    expect(pkg.version).toBe('1.1.0.0');
+    expect(pkg.scripts).toEqual({ t: 'y' }); // untouched
+  });
+
+  test('rejects a malformed version with exit 2', () => {
+    let code = 0;
+    try { execFileSync('bun', [BIN, 'write', '--version', '1.2.3'], { cwd: dir, stdio: 'pipe' }); }
+    catch (e: any) { code = e.status; }
+    expect(code).toBe(2);
+  });
+
+  test('VERSION-only repo (no package.json) writes just VERSION', () => {
+    const d2 = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-noPkg-'));
+    fs.writeFileSync(path.join(d2, 'VERSION'), '0.1.0.0\n');
+    const out = execFileSync('bun', [BIN, 'write', '--version', '0.2.0.0'], { cwd: d2 }).toString();
+    expect(JSON.parse(out)).toEqual({ wrote: '0.2.0.0', packageJson: false });
+    expect(fs.readFileSync(path.join(d2, 'VERSION'), 'utf-8').trim()).toBe('0.2.0.0');
+    fs.rmSync(d2, { recursive: true, force: true });
+  });
+});
+
+describe('repair (DRIFT_STALE_PKG)', () => {
+  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-repair-'));
+  afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
+
+  test('syncs package.json.version up to VERSION, no re-bump', () => {
+    fs.writeFileSync(path.join(dir, 'VERSION'), '2.0.0.0\n');
+    fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.9.0.0' }, null, 2) + '\n');
+    const out = execFileSync('bun', [BIN, 'repair'], { cwd: dir }).toString();
+    expect(JSON.parse(out)).toEqual({ repaired: '2.0.0.0' });
+    expect(JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf-8')).version).toBe('2.0.0.0');
+    expect(fs.readFileSync(path.join(dir, 'VERSION'), 'utf-8').trim()).toBe('2.0.0.0'); // unchanged
+  });
+
+  test('refuses to propagate an invalid VERSION (exit 2)', () => {
+    fs.writeFileSync(path.join(dir, 'VERSION'), 'not-a-version\n');
+    let code = 0;
+    try { execFileSync('bun', [BIN, 'repair'], { cwd: dir, stdio: 'pipe' }); }
+    catch (e: any) { code = e.status; }
+    expect(code).toBe(2);
+  });
+});
+
+describe('classify (idempotency over a real git base)', () => {
+  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-classify-'));
+  afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
+
+  // Build a tiny repo with an "origin/main" carrying VERSION=1.0.0.0.
+  const git = (...a: string[]) => execFileSync('git', a, { cwd: dir, stdio: 'pipe' });
+  fs.writeFileSync(path.join(dir, 'VERSION'), '1.0.0.0\n');
+  fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.0.0.0' }, null, 2) + '\n');
+  git('init', '-q', '-b', 'main');
+  git('config', 'user.email', 't@t'); git('config', 'user.name', 't');
+  git('add', '-A'); git('commit', '-q', '-m', 'base');
+  // Fake an "origin/main" remote-tracking ref pointing at this commit.
+  const head = execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir }).toString().trim();
+  fs.mkdirSync(path.join(dir, '.git', 'refs', 'remotes', 'origin'), { recursive: true });
+  fs.writeFileSync(path.join(dir, '.git', 'refs', 'remotes', 'origin', 'main'), head + '\n');
+
+  test('reports FRESH before any bump', () => {
+    const out = execFileSync('bun', [BIN, 'classify', '--base', 'main'], { cwd: dir }).toString();
+    expect(JSON.parse(out).state).toBe('FRESH');
+  });
+
+  test('reports ALREADY_BUMPED after VERSION+pkg move together', () => {
+    fs.writeFileSync(path.join(dir, 'VERSION'), '1.1.0.0\n');
+    fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.1.0.0' }, null, 2) + '\n');
+    const out = execFileSync('bun', [BIN, 'classify', '--base', 'main'], { cwd: dir }).toString();
+    const parsed = JSON.parse(out);
+    expect(parsed.state).toBe('ALREADY_BUMPED');
+    expect(parsed.baseVersion).toBe('1.0.0.0');
+    expect(parsed.currentVersion).toBe('1.1.0.0');
+  });
+});
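The six `classifyState` cases pinned by the test file above amount to a two-question decision table: did VERSION change relative to the base, and does package.json (when present) agree with VERSION? The sketch below is reconstructed only from those test cases. The real classifier lives in `bin/gstack-version-bump` and is not part of this diff, so the names `classifyStateSketch`, `BumpState`, and `VERSION_RE_SKETCH`, the parameter names, and the regex body are all assumptions.

```typescript
// Hypothetical reconstruction from the test table above; not the shipped code.
type BumpState = 'FRESH' | 'ALREADY_BUMPED' | 'DRIFT_STALE_PKG' | 'DRIFT_UNEXPECTED';

// Assumed shape: exactly four dot-separated numeric parts, no prefix or suffix.
const VERSION_RE_SKETCH = /^\d+\.\d+\.\d+\.\d+$/;

function classifyStateSketch(
  currentVersion: string, // VERSION on the working branch
  baseVersion: string,    // VERSION on the base branch
  hasPkg: boolean,        // whether a package.json exists
  pkgVersion: string,     // package.json "version"; '' when there is no package.json
): BumpState {
  // A repo without package.json can never drift, so it always "agrees".
  const pkgAgrees = !hasPkg || pkgVersion === currentVersion;
  if (currentVersion === baseVersion) {
    return pkgAgrees ? 'FRESH' : 'DRIFT_UNEXPECTED';
  }
  // Simplification: any difference from base is treated as "already bumped";
  // this sketch does not check that the version actually increased.
  return pkgAgrees ? 'ALREADY_BUMPED' : 'DRIFT_STALE_PKG';
}
```

Under this reading, only `FRESH` should lead to a new bump, `DRIFT_STALE_PKG` is the case the `repair` subcommand tests exercise, and `DRIFT_UNEXPECTED` signals a manual edit that the tool presumably refuses to guess about.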
@@ -33,6 +33,22 @@ export interface ParityInvariant {
   maxSizeRatio?: number;
   /** Minimum byte size (catches over-stripping cliffs). */
   minBytes?: number;
+  /**
+   * Carved skill (v2 plan T9): the skill is a skeleton SKILL.md plus on-demand
+   * sections/*.md. When true:
+   * - mustContain / mustHaveHeadings run against skeleton + ALL sections unioned,
+   *   so a phrase that moved into a section still counts (content preserved, just
+   *   relocated — that's the whole point of the carve).
+   * - minBytes / maxSizeRatio run against the UNION bytes, not the skeleton alone
+   *   (total behavior must not shrink; the win is what's no longer always-loaded,
+   *   which the union size deliberately does NOT measure — maxSkeletonBytes does).
+   * - maxSkeletonBytes asserts the always-loaded skeleton actually shrank.
+   * Without this, lowering minBytes to fit a 65KB skeleton would make the size
+   * floor toothless (Codex outside-voice #12).
+   */
+  sectioned?: boolean;
+  /** Max bytes for the always-loaded skeleton SKILL.md (carved skills only). */
+  maxSkeletonBytes?: number;
 }

 export interface ParityCheckResult {
@@ -41,6 +57,35 @@ export interface ParityCheckResult {
   failures: string[];
 }

+/**
+ * Read a skill's check text + sizes. For a carved skill, union the skeleton with
+ * every sections/*.md so relocated content still counts and the union size
+ * measures total preserved behavior; skeletonBytes is reported separately so the
+ * always-loaded shrink can be asserted. For a monolith, text == skeleton.
+ */
+export function readSkillForParity(
+  repoRoot: string,
+  skill: string,
+  sectioned: boolean,
+): { text: string; unionBytes: number; skeletonBytes: number } {
+  const skeleton = fs.readFileSync(path.join(repoRoot, skill, 'SKILL.md'), 'utf-8');
+  const skeletonBytes = Buffer.byteLength(skeleton, 'utf-8');
+  if (!sectioned) return { text: skeleton, unionBytes: skeletonBytes, skeletonBytes };
+
+  let text = skeleton;
+  let unionBytes = skeletonBytes;
+  const sectionsDir = path.join(repoRoot, skill, 'sections');
+  if (fs.existsSync(sectionsDir)) {
+    for (const f of fs.readdirSync(sectionsDir).sort()) {
+      if (!f.endsWith('.md')) continue;
+      const sec = fs.readFileSync(path.join(sectionsDir, f), 'utf-8');
+      text += '\n' + sec;
+      unionBytes += Buffer.byteLength(sec, 'utf-8');
+    }
+  }
+  return { text, unionBytes, skeletonBytes };
+}
+
 export function checkSkillParity(
   invariant: ParityInvariant,
   current: SkillBaselineEntry,
@@ -48,38 +93,54 @@ export function checkSkillParity(
   repoRoot: string,
 ): ParityCheckResult {
   const failures: string[] = [];
+  const needText = !!(invariant.mustContain?.length || invariant.mustHaveHeadings?.length);

-  // SIZE checks
+  // Resolve the text + size to check against. Carved skills union skeleton +
+  // sections; monoliths use the skeleton alone. Read on demand so size-only
+  // invariants don't pay for a file read they don't need (monolith path).
+  let checkText: string | null = null;
+  let checkBytes = current.skillMdBytes;
+  if (invariant.sectioned) {
+    try {
+      const r = readSkillForParity(repoRoot, invariant.skill, true);
+      checkText = r.text;
+      checkBytes = r.unionBytes;
+      if (invariant.maxSkeletonBytes !== undefined && r.skeletonBytes > invariant.maxSkeletonBytes) {
+        failures.push(`skeleton ${r.skeletonBytes} > maxSkeletonBytes ${invariant.maxSkeletonBytes}`);
+      }
+    } catch (err) {
+      failures.push(`cannot read carved skill ${invariant.skill}: ${(err as Error).message}`);
+    }
+  } else if (needText) {
+    try {
+      checkText = fs.readFileSync(path.join(repoRoot, invariant.skill, 'SKILL.md'), 'utf-8');
+    } catch (err) {
+      failures.push(`cannot read ${path.join(repoRoot, invariant.skill, 'SKILL.md')}: ${(err as Error).message}`);
+    }
+  }
+
+  // SIZE checks (union bytes for carved skills, skeleton bytes for monoliths)
   if (invariant.maxSizeRatio !== undefined && baseline) {
-    const ratio = current.skillMdBytes / baseline.skillMdBytes;
+    const ratio = checkBytes / baseline.skillMdBytes;
     if (ratio > invariant.maxSizeRatio) {
       failures.push(`size ratio ${ratio.toFixed(3)} > maxSizeRatio ${invariant.maxSizeRatio}`);
     }
   }
-  if (invariant.minBytes !== undefined && current.skillMdBytes < invariant.minBytes) {
-    failures.push(`size ${current.skillMdBytes} < minBytes ${invariant.minBytes}`);
+  if (invariant.minBytes !== undefined && checkBytes < invariant.minBytes) {
+    failures.push(`size ${checkBytes} < minBytes ${invariant.minBytes}`);
   }

-  // CONTENT checks (read live file for fresh content)
-  if (invariant.mustContain?.length || invariant.mustHaveHeadings?.length) {
-    const skillMdPath = path.join(repoRoot, invariant.skill, 'SKILL.md');
-    let content: string | null = null;
-    try {
-      content = fs.readFileSync(skillMdPath, 'utf-8');
-    } catch (err) {
-      failures.push(`cannot read ${skillMdPath}: ${(err as Error).message}`);
-    }
-    if (content) {
-      const lower = content.toLowerCase();
-      for (const phrase of invariant.mustContain ?? []) {
-        if (!lower.includes(phrase.toLowerCase())) {
-          failures.push(`missing required phrase: "${phrase}"`);
-        }
+  // CONTENT checks
+  if (needText && checkText !== null) {
+    const lower = checkText.toLowerCase();
+    for (const phrase of invariant.mustContain ?? []) {
+      if (!lower.includes(phrase.toLowerCase())) {
+        failures.push(`missing required phrase: "${phrase}"`);
       }
-      for (const heading of invariant.mustHaveHeadings ?? []) {
-        if (!content.includes(heading)) {
-          failures.push(`missing required heading: "${heading}"`);
-        }
+    }
+    for (const heading of invariant.mustHaveHeadings ?? []) {
+      if (!checkText.includes(heading)) {
+        failures.push(`missing required heading: "${heading}"`);
       }
     }
   }
@@ -146,7 +207,13 @@ export const PARITY_INVARIANTS: ParityInvariant[] = [
     minBytes: 30_000,
   },
   {
+    // Carved (v2 plan T9): skeleton SKILL.md + sections/*.md. Content checks run
+    // against the union (relocated phrases still count); size floors run against
+    // the union (total behavior preserved); maxSkeletonBytes asserts the
+    // always-loaded skeleton actually shrank from the ~167KB monolith.
     skill: 'ship',
+    sectioned: true,
+    maxSkeletonBytes: 90_000,
     mustContain: [
       'VERSION',
       'CHANGELOG',
@@ -156,7 +223,7 @@ export const PARITY_INVARIANTS: ParityInvariant[] = [
     ],
     mustHaveHeadings: ['## Preamble', '## When to invoke'],
     maxSizeRatio: 1.05,
-    minBytes: 80_000,
+    minBytes: 120_000,
   },
   {
     skill: 'plan-ceo-review',
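The three size rules for a carved skill pull in different directions: the union may not shrink below `minBytes` or grow past `maxSizeRatio`, while the skeleton alone must stay under `maxSkeletonBytes`. The sketch below isolates just that arithmetic as a pure function so the interaction is easy to see. The thresholds are the ones the diff sets for `ship`; the helper name `checkCarvedSizes`, the trimmed-down interfaces, and every byte count in the sample calls are illustrative assumptions, not measurements from the repository.

```typescript
// Illustrative sketch; the interfaces are simplified stand-ins for ParityInvariant.
interface SizeInvariant {
  maxSizeRatio?: number;
  minBytes?: number;
  maxSkeletonBytes?: number;
}

interface SkillSizes {
  skeletonBytes: number; // always-loaded SKILL.md
  unionBytes: number;    // skeleton + every sections/*.md
}

function checkCarvedSizes(inv: SizeInvariant, sizes: SkillSizes, baselineBytes: number): string[] {
  const failures: string[] = [];
  // The skeleton ceiling is the only check that looks at the skeleton alone.
  if (inv.maxSkeletonBytes !== undefined && sizes.skeletonBytes > inv.maxSkeletonBytes) {
    failures.push(`skeleton ${sizes.skeletonBytes} > maxSkeletonBytes ${inv.maxSkeletonBytes}`);
  }
  // Growth ceiling and shrink floor both use the union.
  if (inv.maxSizeRatio !== undefined) {
    const ratio = sizes.unionBytes / baselineBytes;
    if (ratio > inv.maxSizeRatio) {
      failures.push(`size ratio ${ratio.toFixed(3)} > maxSizeRatio ${inv.maxSizeRatio}`);
    }
  }
  if (inv.minBytes !== undefined && sizes.unionBytes < inv.minBytes) {
    failures.push(`size ${sizes.unionBytes} < minBytes ${inv.minBytes}`);
  }
  return failures;
}

// Thresholds taken from the 'ship' invariant in the diff; byte counts are made up.
const shipSizes: SizeInvariant = { maxSizeRatio: 1.05, minBytes: 120_000, maxSkeletonBytes: 90_000 };
const healthy = checkCarvedSizes(shipSizes, { skeletonBytes: 65_000, unionBytes: 170_000 }, 167_000);
const overStripped = checkCarvedSizes(shipSizes, { skeletonBytes: 65_000, unionBytes: 70_000 }, 167_000);
const fatSkeleton = checkCarvedSizes(shipSizes, { skeletonBytes: 100_000, unionBytes: 170_000 }, 167_000);
```

With these sample numbers, `healthy` is empty, `overStripped` trips only the `minBytes` floor, and `fatSkeleton` trips only the skeleton ceiling, which is the separation the `sectioned` doc comment argues for.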
@@ -0,0 +1,40 @@
+/**
+ * requiredReads enforcement (v2 plan T9, mitigation layer 5 — the only CI-failing
+ * layer against silent section-skip).
+ *
+ * Given a /ship run's tool calls and the set of section files the run's SITUATION
+ * required, assert the agent actually Read each one. The required set comes from
+ * the TEST FIXTURE (which situation it set up), NOT from the manifest — the
+ * manifest is passive (CM2). This keeps "when is a section required" in exactly
+ * one machine-checkable place: the eval fixtures.
+ *
+ * Builds on extractSectionReads from transcript-section-logger so section-path
+ * matching (the `/sections/<file>.md` segment, host-layout agnostic) lives in one
+ * place.
+ */
+
+import { extractSectionReads, type TranscriptResultLike } from './transcript-section-logger';
+
+export interface RequiredReadsResult {
+  required: string[];
+  read: string[];
+  missing: string[];
+  ok: boolean;
+}
+
+/**
+ * @param result the skill run (anything with toolCalls)
+ * @param requiredFiles section basenames the situation required, e.g.
+ *   ['version-bump.md','changelog.md'] (or with a sections/
+ *   prefix — normalized to basename here)
+ */
+export function assertRequiredReads(
+  result: TranscriptResultLike,
+  requiredFiles: string[],
+): RequiredReadsResult {
+  const read = extractSectionReads(result);
+  const readSet = new Set(read);
+  const required = requiredFiles.map(f => f.replace(/^.*\//, '')); // tolerate sections/<f>
+  const missing = required.filter(f => !readSet.has(f));
+  return { required, read, missing, ok: missing.length === 0 };
+}
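A hand-built transcript makes the contract of `assertRequiredReads` concrete. The sketch below inlines condensed copies of `extractSectionReads` and `assertRequiredReads` from this diff so it runs on its own; the `FixtureCall` and `FixtureRun` types, the install path, and the fixture's tool calls are hypothetical and only stand in for what an eval fixture might supply.

```typescript
// Hypothetical fixture types; the real code uses TranscriptResultLike.
interface FixtureCall { tool: string; input: unknown }
interface FixtureRun { toolCalls: FixtureCall[] }

// Condensed copy of extractSectionReads: Read calls on .../sections/<name>.md,
// reduced to the basename, deduped, in first-Read order.
function extractSectionReads(run: FixtureRun): string[] {
  const ordered: string[] = [];
  for (const call of run.toolCalls) {
    if (call.tool !== 'Read') continue;
    const input = call.input as { file_path?: unknown } | null;
    const fp = input && typeof input === 'object' ? input.file_path : undefined;
    if (typeof fp !== 'string') continue;
    const name = fp.match(/(?:^|\/)sections\/([A-Za-z0-9._-]+\.md)$/)?.[1];
    if (name && !ordered.includes(name)) ordered.push(name);
  }
  return ordered;
}

// Same logic as assertRequiredReads in the diff above.
function assertRequiredReads(run: FixtureRun, requiredFiles: string[]) {
  const read = extractSectionReads(run);
  const readSet = new Set(read);
  const required = requiredFiles.map(f => f.replace(/^.*\//, ''));
  const missing = required.filter(f => !readSet.has(f));
  return { required, read, missing, ok: missing.length === 0 };
}

// Made-up run: two section reads, one skeleton read, one Bash call.
const run: FixtureRun = {
  toolCalls: [
    { tool: 'Read', input: { file_path: '/home/u/.claude/skills/gstack/ship/sections/version-bump.md' } },
    { tool: 'Bash', input: { command: 'bun test' } },
    { tool: 'Read', input: { file_path: 'ship/SKILL.md' } },
    { tool: 'Read', input: { file_path: 'ship/sections/changelog.md' } },
  ],
};

const verdict = assertRequiredReads(run, ['sections/version-bump.md', 'changelog.md', 'adversarial.md']);
```

Here `verdict.ok` is false and `verdict.missing` contains only `adversarial.md`: the skeleton read and the Bash call are ignored, and the `sections/` prefix on the first required entry is normalized away before comparison.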
@@ -120,7 +120,8 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
   'plan-ceo-mode-routing': ['plan-ceo-review/**', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble.ts', 'test/helpers/claude-pty-runner.ts'],
   'plan-design-with-ui-scope': ['plan-design-review/**', 'test/fixtures/plans/ui-heavy-feature.md', 'test/helpers/claude-pty-runner.ts'],
   'budget-regression-pty': ['test/helpers/eval-store.ts', 'test/skill-budget-regression.test.ts'],
-  'ship-idempotency-pty': ['ship/**', 'bin/gstack-next-version', 'lib/worktree.ts', 'test/helpers/claude-pty-runner.ts'],
+  'ship-idempotency-pty': ['ship/**', 'bin/gstack-next-version', 'bin/gstack-version-bump', 'scripts/resolvers/sections.ts', 'lib/worktree.ts', 'test/helpers/claude-pty-runner.ts'],
+  'ship-section-loading': ['ship/**', 'scripts/resolvers/sections.ts', 'scripts/gen-skill-docs.ts', 'test/helpers/required-reads.ts', 'test/helpers/transcript-section-logger.ts', 'test/helpers/claude-pty-runner.ts'],
   'autoplan-chain-pty': ['autoplan/**', 'plan-ceo-review/**', 'plan-design-review/**', 'plan-eng-review/**', 'plan-devex-review/**', 'test/fixtures/plans/ui-heavy-feature.md', 'test/helpers/claude-pty-runner.ts'],
   'e2e-harness-audit': ['plan-ceo-review/**', 'plan-eng-review/**', 'plan-design-review/**', 'plan-devex-review/**', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/agent-sdk-runner.ts', 'test/helpers/claude-pty-runner.ts'],

@@ -511,6 +512,7 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
   'plan-design-with-ui-scope': 'gate', // ~$0.80/run
   'budget-regression-pty': 'gate', // free, library-only assertion
   'ship-idempotency-pty': 'periodic', // ~$3/run, real /ship in plan mode
+  'ship-section-loading': 'periodic', // ~$3/run, real /ship; asserts section reads
   'autoplan-chain-pty': 'periodic', // ~$8/run, all 3 phases sequential

   // Per-finding count + review-report-at-bottom — periodic because each
@@ -0,0 +1,196 @@
+/**
+ * Transcript section logger (v2 plan T10).
+ *
+ * Two jobs, both pure analysis over a SkillTestResult / NDJSON transcript:
+ *
+ * 1. extractSectionReads() — which `sections/*.md` files a run actually Read.
+ *    Used by the sectioned world (post-carve) to verify the agent opened the
+ *    chapters its situation required.
+ *
+ * 2. extractShipActions() — an observable ACTION fingerprint of a /ship run
+ *    (ran tests, bumped VERSION, wrote CHANGELOG, created PR, ...). This works
+ *    on BOTH the monolith and the sectioned skill, which is the whole point:
+ *    capture a baseline on the current monolith ship FIRST, then assert the
+ *    sectioned ship still performs the same actions. A section-read check alone
+ *    can't catch "agent read the chapter but skipped the step"; the action
+ *    fingerprint can.
+ *
+ * Why baseline-first (Codex outside-voice critique on the T9 plan): a logger
+ * shipped in the same PR as the carve is post-failure telemetry unless it has a
+ * pre-carve reference. captureShipBaseline() records the monolith's action
+ * fingerprint so compareShipActions() can flag a regression introduced by the
+ * carve.
+ *
+ * Pure functions, no I/O except the explicit read/write baseline helpers. The
+ * unit tests drive these with synthetic transcripts — no paid run needed to
+ * validate the logic.
+ */
+
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+
+/** Minimal shape we need from SkillTestResult — kept structural so callers can
+ * pass a full SkillTestResult or a hand-built fixture in unit tests. */
+export interface ToolCallLike {
+  tool: string;
+  input: unknown;
+  output?: string;
+}
+export interface TranscriptResultLike {
+  toolCalls: ToolCallLike[];
+  output?: string;
+}
+
+/** Pull the file_path off a tool-call input, tolerating unknown shapes. */
+function readFilePath(input: unknown): string | null {
+  if (input && typeof input === 'object') {
+    const fp = (input as Record<string, unknown>).file_path;
+    if (typeof fp === 'string') return fp;
+  }
+  return null;
+}
+
+/** Pull the command string off a Bash tool-call input. */
+function bashCommand(input: unknown): string | null {
+  if (input && typeof input === 'object') {
+    const cmd = (input as Record<string, unknown>).command;
+    if (typeof cmd === 'string') return cmd;
+  }
+  return null;
+}
+
+/**
+ * Every `sections/<name>.md` file the run Read, normalized to the section
+ * basename (e.g. "version-bump.md"). Deduped, in first-Read order. Matching is
+ * on the path segment `/sections/<file>.md` so it works regardless of whether
+ * the host resolved a relative, absolute, or prefixed install path.
+ */
+export function extractSectionReads(result: TranscriptResultLike): string[] {
+  const seen = new Set<string>();
+  const ordered: string[] = [];
+  for (const call of result.toolCalls) {
+    if (call.tool !== 'Read') continue;
+    const fp = readFilePath(call.input);
+    if (!fp) continue;
+    const m = fp.match(/(?:^|\/)sections\/([A-Za-z0-9._-]+\.md)$/);
+    if (!m) continue;
+    const name = m[1];
+    if (!seen.has(name)) {
+      seen.add(name);
+      ordered.push(name);
+    }
+  }
+  return ordered;
+}
+
+/**
+ * The canonical /ship action vocabulary. Each action is detected from the Bash
+ * commands the agent ran (plus a couple of Write/Edit signals). Order is the
+ * rough ship sequence; detection is order-independent.
+ *
+ * Keep this list aligned with the ship skeleton's numbered steps. The
+ * section-loading eval asserts the sectioned ship still triggers the same
+ * actions a monolith run did for the same fixture situation.
+ */
+export const SHIP_ACTIONS = [
+  'merged_base', // git merge <base>
+  'ran_tests', // bun test / npm test / the project test cmd
+  'bumped_version', // wrote VERSION / package.json version / ran gstack-version-bump
+  'wrote_changelog', // edited CHANGELOG.md
+  'committed', // git commit
+  'pushed', // git push
+  'opened_pr', // gh pr create / glab mr create
+] as const;
+export type ShipAction = (typeof SHIP_ACTIONS)[number];
+
+const BASH_ACTION_PATTERNS: Array<{ action: ShipAction; re: RegExp }> = [
+  { action: 'merged_base', re: /\bgit\s+merge\b/ },
+  { action: 'ran_tests', re: /\b(bun\s+test|npm\s+(run\s+)?test|yarn\s+test|pytest|go\s+test|cargo\s+test|rspec)\b/ },
+  { action: 'bumped_version', re: /gstack-version-bump\b|gstack-next-version\b|>\s*VERSION\b|npm\s+version\b/ },
+  { action: 'wrote_changelog', re: /CHANGELOG\.md/ },
+  { action: 'committed', re: /\bgit\s+commit\b/ },
+  { action: 'pushed', re: /\bgit\s+push\b/ },
+  { action: 'opened_pr', re: /\bgh\s+pr\s+create\b|\bglab\s+mr\s+create\b/ },
+];
+
+/**
+ * The observable action fingerprint of a ship run. Works on monolith AND
+ * sectioned skills because it reads what the agent DID (Bash + file writes),
+ * not which prose it loaded.
+ */
+export function extractShipActions(result: TranscriptResultLike): ShipAction[] {
+  const found = new Set<ShipAction>();
+  for (const call of result.toolCalls) {
+    if (call.tool === 'Bash') {
+      const cmd = bashCommand(call.input);
+      if (!cmd) continue;
+      for (const { action, re } of BASH_ACTION_PATTERNS) {
+        if (re.test(cmd)) found.add(action);
+      }
+    } else if (call.tool === 'Write' || call.tool === 'Edit') {
+      const fp = readFilePath(call.input);
+      if (fp && /CHANGELOG\.md$/.test(fp)) found.add('wrote_changelog');
+      if (fp && /(?:^|\/)VERSION$/.test(fp)) found.add('bumped_version');
+    }
+  }
+  // Preserve canonical order.
+  return SHIP_ACTIONS.filter(a => found.has(a));
+}
+
+export interface ShipBaseline {
+  tag: string;
+  /** Fixture/situation id this baseline was captured for. */
+  situation: string;
+  /** Action fingerprint observed on the monolith ship. */
+  actions: ShipAction[];
+  /** Section reads observed (empty on the monolith — present after carve). */
+  sectionReads: string[];
+  capturedAt: string;
+}
+
+const DEFAULT_BASELINE_DIR = path.join(os.homedir(), '.gstack-dev', 'ship-baselines');
+
+/** Where a baseline for a given situation lives. */
+export function baselinePath(situation: string, dir = DEFAULT_BASELINE_DIR): string {
+  return path.join(dir, `${situation}.json`);
+}
+
+/** Persist a ship baseline (used once on the monolith, before the carve). */
+export function writeShipBaseline(baseline: ShipBaseline, dir = DEFAULT_BASELINE_DIR): string {
+  fs.mkdirSync(dir, { recursive: true });
+  const p = baselinePath(baseline.situation, dir);
+  fs.writeFileSync(p, JSON.stringify(baseline, null, 2) + '\n');
+  return p;
+}
+
+/** Read a previously-captured baseline, or null if none exists yet. */
+export function readShipBaseline(situation: string, dir = DEFAULT_BASELINE_DIR): ShipBaseline | null {
+  try {
+    return JSON.parse(fs.readFileSync(baselinePath(situation, dir), 'utf-8')) as ShipBaseline;
+  } catch {
+    return null;
+  }
+}
+
export interface ShipActionDiff {
|
||||
/** Actions the baseline performed that the current run did NOT (the regression set). */
|
||||
missing: ShipAction[];
|
||||
/** Actions the current run performed that the baseline did not (usually fine). */
|
||||
added: ShipAction[];
|
||||
/** True when no baseline action was dropped. */
|
||||
ok: boolean;
|
||||
}
|
||||
|
||||
/**
|
||||
* Compare a current sectioned-ship run against the monolith baseline. A dropped
|
||||
* action (in baseline, not in current) is the carve regression we care about:
|
||||
* the sectioned ship stopped doing something the monolith did.
|
||||
*/
|
||||
export function compareShipActions(baseline: ShipBaseline, current: ShipAction[]): ShipActionDiff {
|
||||
const cur = new Set(current);
|
||||
const base = new Set(baseline.actions);
|
||||
const missing = baseline.actions.filter(a => !cur.has(a));
|
||||
const added = current.filter(a => !base.has(a));
|
||||
return { missing, added, ok: missing.length === 0 };
|
||||
}
|
||||
|
|
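The set-difference behind `compareShipActions` can be tried in isolation. The sketch below is illustrative only: it assumes `ShipAction` is string-like (its definition is not part of this hunk), takes the baseline's action list directly instead of a full `ShipBaseline`, and uses made-up action names.

```typescript
// Assumption: ShipAction is string-like in the real module; a plain alias stands in here.
type ShipAction = string;

interface ShipActionDiff {
  missing: ShipAction[];
  added: ShipAction[];
  ok: boolean;
}

// Same set-difference logic as compareShipActions, minus the ShipBaseline wrapper.
function diffActions(baselineActions: ShipAction[], current: ShipAction[]): ShipActionDiff {
  const cur = new Set(current);
  const base = new Set(baselineActions);
  const missing = baselineActions.filter(a => !cur.has(a));
  const added = current.filter(a => !base.has(a));
  return { missing, added, ok: missing.length === 0 };
}

// Hypothetical action names, for illustration only.
const regressed = diffActions(
  ['version-bump', 'changelog', 'push'],
  ['version-bump', 'push', 'open-pr'],
);
console.log(regressed);
```

Only a dropped baseline action flips `ok` to false; an extra action in the current run is reported in `added` but does not fail the comparison.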
@@ -0,0 +1,96 @@
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { execFileSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

const ROOT = path.resolve(import.meta.dir, '..');
const DRIVER = path.join(ROOT, 'bin', 'gstack-jsonl-merge');

let tmpDir: string;

beforeEach(() => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-jsonl-merge-'));
});

afterEach(() => {
  fs.rmSync(tmpDir, { recursive: true, force: true });
});

/**
 * Run the merge driver the way git does: `driver <base> <ours> <theirs>`.
 * The driver writes the merged result back to the <ours> file. Returns that
 * file's content. `base`/`ours`/`theirs` are arrays of JSONL lines (the file
 * is created from them); pass `null` to omit a file entirely (git passes an
 * absent path for an added file, which the driver must tolerate).
 */
function runMerge(
  base: string[] | null,
  ours: string[] | null,
  theirs: string[] | null,
): string {
  const write = (name: string, lines: string[] | null): string => {
    const p = path.join(tmpDir, name);
    if (lines === null) return path.join(tmpDir, `${name}.absent`);
    fs.writeFileSync(p, lines.length ? lines.join('\n') + '\n' : '');
    return p;
  };
  const basePath = write('base', base);
  const oursPath = write('ours', ours);
  const theirsPath = write('theirs', theirs);
  execFileSync(DRIVER, [basePath, oursPath, theirsPath], {
    encoding: 'utf-8',
    timeout: 15000,
  });
  return fs.readFileSync(oursPath, 'utf-8');
}

describe('gstack-jsonl-merge', () => {
  test('equal-ts entries resolve identically regardless of side (convergence)', () => {
    // Two machines append a different event in the same second, then each
    // merges the other's push. Machine A sees its own line as "ours"; machine
    // B sees the same line as "theirs". The merge must produce the same file
    // on both, or the repos diverge and never reconcile.
    const a = '{"ts":"2026-05-28T10:00:00Z","event":"a"}';
    const b = '{"ts":"2026-05-28T10:00:00Z","event":"b"}';

    const machineA = runMerge([], [a], [b]); // a = ours, b = theirs
    const machineB = runMerge([], [b], [a]); // b = ours, a = theirs

    expect(machineA).toBe(machineB);
    // Both lines survive.
    expect(machineA).toContain('"event":"a"');
    expect(machineA).toContain('"event":"b"');
  });

  test('non-timestamped lines also resolve identically regardless of side', () => {
    const a = '{"event":"a"}'; // no ts -> hash-ordered
    const b = '{"event":"b"}';
    expect(runMerge([], [a], [b])).toBe(runMerge([], [b], [a]));
  });

  test('plain (non-JSON) lines resolve identically regardless of side', () => {
    expect(runMerge([], ['zebra'], ['apple'])).toBe(
      runMerge([], ['apple'], ['zebra']),
    );
  });

  test('exact-duplicate lines are deduped', () => {
    const line = '{"ts":"2026-05-28T10:00:00Z","event":"a"}';
    const out = runMerge([line], [line], [line]);
    expect(out.trimEnd().split('\n')).toEqual([line]);
  });

  test('timestamped entries sort ascending by ts', () => {
    const early = '{"ts":"2026-05-28T09:00:00Z","event":"early"}';
    const late = '{"ts":"2026-05-28T11:00:00Z","event":"late"}';
    const out = runMerge([], [late], [early]).trimEnd().split('\n');
    expect(out).toEqual([early, late]);
  });

  test('absent ours/theirs files are tolerated (added-file merge)', () => {
    const a = '{"ts":"2026-05-28T10:00:00Z","event":"a"}';
    const out = runMerge(null, [a], null);
    expect(out.trimEnd()).toBe(a);
  });
});
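The convergence property these tests pin comes down to a merge whose output depends only on the set of lines, never on which side supplied them. The sketch below is a hypothetical illustration of that idea, not the `gstack-jsonl-merge` source: it unions the three inputs, dedupes, and sorts by a total order. It assumes ISO-8601 UTC `ts` strings in one fixed format (so string order matches time order), places untimestamped lines last, and breaks ties by raw string comparison, whereas the test comment indicates the real driver orders untimestamped lines by hash.

```typescript
// Hypothetical sketch; names and tie-break rule are assumptions, not the driver's code.
function tsOf(line: string): string | null {
  try {
    const v: unknown = JSON.parse(line);
    if (typeof v === 'object' && v !== null && 'ts' in v) {
      const ts = (v as { ts: unknown }).ts;
      return typeof ts === 'string' ? ts : null;
    }
  } catch {
    // Not JSON: treated as an untimestamped line.
  }
  return null;
}

function mergeLines(base: string[], ours: string[], theirs: string[]): string[] {
  // A Set drops exact duplicates; the sort below erases any input-order dependence.
  const unique = [...new Set([...base, ...ours, ...theirs])];
  return unique.sort((x, y) => {
    const tx = tsOf(x);
    const ty = tsOf(y);
    if (tx !== null && ty !== null && tx !== ty) return tx < ty ? -1 : 1;
    if ((tx === null) !== (ty === null)) return tx === null ? 1 : -1; // untimestamped last
    return x < y ? -1 : x > y ? 1 : 0; // deterministic tie-break on the raw line
  });
}

const a = '{"ts":"2026-05-28T10:00:00Z","event":"a"}';
const b = '{"ts":"2026-05-28T10:00:00Z","event":"b"}';
const onA = mergeLines([], [a], [b]); // machine A: a = ours, b = theirs
const onB = mergeLines([], [b], [a]); // machine B: b = ours, a = theirs
console.log(JSON.stringify(onA) === JSON.stringify(onB));
```

Because the comparator never consults which argument a line arrived in, swapping `ours` and `theirs` cannot change the result, which is the invariant the first three tests check from the outside.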
@@ -0,0 +1,27 @@
import { describe, test, expect } from "bun:test";
import { resolveImportTimeoutMs } from "../bin/gstack-memory-ingest";

// #1611: the gbrain import timeout is configurable via GSTACK_INGEST_TIMEOUT_MS
// (default 30 min) so big-brain --full ingests aren't SIGTERM'd mid-import.
const DEFAULT = 30 * 60 * 1000;

describe("resolveImportTimeoutMs", () => {
  test("unset → 30 min default", () => {
    expect(resolveImportTimeoutMs(undefined)).toBe(DEFAULT);
    expect(resolveImportTimeoutMs("")).toBe(DEFAULT);
  });

  test("valid override is honored", () => {
    expect(resolveImportTimeoutMs("3600000")).toBe(3_600_000); // 1h
    expect(resolveImportTimeoutMs("60000")).toBe(60_000); // floor
    expect(resolveImportTimeoutMs("86400000")).toBe(86_400_000); // ceiling
  });

  test("invalid / out-of-range → default (no SIGTERM-too-soon footgun)", () => {
    expect(resolveImportTimeoutMs("nope")).toBe(DEFAULT);
    expect(resolveImportTimeoutMs("0")).toBe(DEFAULT);
    expect(resolveImportTimeoutMs("59999")).toBe(DEFAULT); // below 1min floor
    expect(resolveImportTimeoutMs("86400001")).toBe(DEFAULT); // above 24h ceiling
    expect(resolveImportTimeoutMs("-5")).toBe(DEFAULT);
  });
});
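The contract these tests describe (a default when unset, an inclusive 1 min to 24 h window, and fallback to the default for anything else) can be written as a small pure function. This is a sketch consistent with the assertions above, not the implementation in `bin/gstack-memory-ingest`; the name `resolveTimeoutSketch` and the rejection of non-integer values are assumptions.

```typescript
const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000;      // 30 min
const MIN_TIMEOUT_MS = 60 * 1000;               // 1 min floor (inclusive)
const MAX_TIMEOUT_MS = 24 * 60 * 60 * 1000;     // 24 h ceiling (inclusive)

// Hypothetical stand-in for resolveImportTimeoutMs; takes the raw env value.
function resolveTimeoutSketch(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === '') return DEFAULT_TIMEOUT_MS;
  const n = Number(raw);
  // NaN, fractions, and Infinity all fail the integer check (assumed policy).
  if (!Number.isInteger(n)) return DEFAULT_TIMEOUT_MS;
  // Out-of-range values fall back to the default instead of being clamped.
  if (n < MIN_TIMEOUT_MS || n > MAX_TIMEOUT_MS) return DEFAULT_TIMEOUT_MS;
  return n;
}

console.log(resolveTimeoutSketch('3600000'));
console.log(resolveTimeoutSketch('59999'));
```

Falling back to the default rather than clamping means a typo such as `GSTACK_INGEST_TIMEOUT_MS=5` cannot shorten the timeout below the 30-minute default, which matches the "no SIGTERM-too-soon footgun" intent in the test name.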
@@ -0,0 +1,88 @@
/**
 * Unit coverage for the sectioned-parity capability (v2 plan T9, guards the
 * carve). Proves that a carved skill's relocated content still counts (union of
 * skeleton + sections), the always-loaded skeleton shrink is asserted
 * separately (maxSkeletonBytes), and size floors run against the union so they
 * stay meaningful (Codex outside-voice #12). Synthetic fixture — no ship carve
 * needed to validate the logic.
 */

import { describe, test, expect, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import { checkSkillParity, readSkillForParity, type ParityInvariant } from './helpers/parity-harness';
import type { SkillBaselineEntry } from './helpers/capture-parity-baseline';

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'parity-sectioned-'));
afterAll(() => { try { fs.rmSync(root, { recursive: true, force: true }); } catch { /* noop */ } });

// Carved "ship": a small skeleton + two sections holding the relocated prose.
fs.mkdirSync(path.join(root, 'ship', 'sections'), { recursive: true });
fs.writeFileSync(path.join(root, 'ship', 'SKILL.md'),
  '## Preamble\nskeleton body, decision tree, VERSION bump step calls the CLI.\n## When to invoke\n');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'changelog.md'), '# Changelog\nWrite the CHANGELOG entry here.\n');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'review-army.md'), '# Review\nDispatch the pre-landing review army.\n');

// A monolith control skill.
fs.mkdirSync(path.join(root, 'mono'), { recursive: true });
fs.writeFileSync(path.join(root, 'mono', 'SKILL.md'), '## Preamble\nVERSION CHANGELOG review all inline here.\n');

const skeletonBytes = Buffer.byteLength(fs.readFileSync(path.join(root, 'ship', 'SKILL.md'), 'utf-8'), 'utf-8');
const unionBytes = readSkillForParity(root, 'ship', true).unionBytes;
const baseline: SkillBaselineEntry = { skillMdBytes: unionBytes } as SkillBaselineEntry;

describe('readSkillForParity', () => {
  test('unions skeleton + sections for carved skills', () => {
    const r = readSkillForParity(root, 'ship', true);
    expect(r.text).toContain('CHANGELOG'); // from changelog.md
    expect(r.text).toContain('review army'); // from review-army.md
    expect(r.skeletonBytes).toBe(skeletonBytes);
    expect(r.unionBytes).toBeGreaterThan(r.skeletonBytes);
  });
  test('monolith text == skeleton, union == skeleton', () => {
    const r = readSkillForParity(root, 'mono', false);
    expect(r.unionBytes).toBe(r.skeletonBytes);
  });
});

describe('checkSkillParity (sectioned)', () => {
  test('finds phrases that moved into sections (union content check)', () => {
    const inv: ParityInvariant = {
      skill: 'ship', sectioned: true,
      mustContain: ['VERSION', 'CHANGELOG', 'review army'],
      mustHaveHeadings: ['## Preamble', '## When to invoke'],
    };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(true);
  });

  test('maxSkeletonBytes catches a skeleton that did not shrink', () => {
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, maxSkeletonBytes: 10 };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(false);
    expect(res.failures.join()).toContain('maxSkeletonBytes');
  });

  test('minBytes runs against the union, not the skeleton (content preserved)', () => {
    // A floor between skeletonBytes and unionBytes must PASS for sectioned skills,
    // because the union (total behavior) is what must not shrink.
    const floor = Math.floor((skeletonBytes + unionBytes) / 2);
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, minBytes: floor };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(true);
  });

  test('flags a phrase that truly went missing', () => {
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, mustContain: ['this-phrase-is-not-anywhere'] };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(false);
    expect(res.failures.join()).toContain('missing required phrase');
  });

  test('maxSizeRatio uses union bytes vs baseline (carve preserves ~total size)', () => {
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, maxSizeRatio: 1.05 };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(true); // union == baseline here → ratio 1.0
  });
});
@@ -83,9 +83,22 @@ describe("#1539 generated SKILL.md files — gate propagated to all consumers",
     "ship/SKILL.md",
   ];

+  // ship's confidence-calibration gate moved into sections/review-army.md (T9 carve);
+  // read the skeleton+sections union so the gate is still found.
+  const readUnion = (rel: string): string => {
+    let body = fs.readFileSync(path.join(ROOT, rel), "utf-8");
+    const secDir = path.join(ROOT, path.dirname(rel), "sections");
+    if (fs.existsSync(secDir)) {
+      for (const f of fs.readdirSync(secDir).sort()) {
+        if (f.endsWith(".md")) body += "\n" + fs.readFileSync(path.join(secDir, f), "utf-8");
+      }
+    }
+    return body;
+  };
+
   for (const rel of consumers) {
     test(`${rel} carries the Pre-emit verification gate`, () => {
-      const body = fs.readFileSync(path.join(ROOT, rel), "utf-8");
+      const body = readUnion(rel);
       expect(body).toMatch(/Pre-emit verification gate/);
       expect(body).toMatch(/Quote the specific code line/);
     });
@@ -0,0 +1,41 @@
/**
 * Unit tests for assertRequiredReads (v2 plan T9 mitigation layer 5). Pure logic
 * over synthetic tool-call transcripts — the section-loading E2E (paid) drives
 * this against real /ship runs.
 */

import { describe, test, expect } from 'bun:test';
import { assertRequiredReads } from './helpers/required-reads';
import type { ToolCallLike } from './helpers/transcript-section-logger';

const read = (fp: string): ToolCallLike => ({ tool: 'Read', input: { file_path: fp }, output: '' });

describe('assertRequiredReads', () => {
  test('passes when every required section was Read', () => {
    const result = {
      toolCalls: [
        read('/Users/x/.claude/skills/gstack/ship/sections/version-bump.md'),
        read('ship/sections/changelog.md'),
      ],
    };
    const r = assertRequiredReads(result, ['version-bump.md', 'changelog.md']);
    expect(r.ok).toBe(true);
    expect(r.missing).toEqual([]);
  });

  test('flags a required section the agent never opened', () => {
    const result = { toolCalls: [read('ship/sections/changelog.md')] };
    const r = assertRequiredReads(result, ['version-bump.md', 'changelog.md']);
    expect(r.ok).toBe(false);
    expect(r.missing).toEqual(['version-bump.md']);
  });

  test('tolerates a sections/ prefix in the required list', () => {
    const result = { toolCalls: [read('/abs/gstack/ship/sections/review-army.md')] };
    expect(assertRequiredReads(result, ['sections/review-army.md']).ok).toBe(true);
  });

  test('empty required set always passes', () => {
    expect(assertRequiredReads({ toolCalls: [] }, []).ok).toBe(true);
  });
});
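One way to satisfy all four cases above is to compare basenames, so absolute install paths, repo-relative paths, and a `sections/` prefix in the required list all normalize to the same key. The sketch below is an assumption about how such a helper could look, not the code in `./helpers/required-reads`; the `ToolCallLike` shape is inferred from the `read()` factory in the test, and `requiredReadsSketch` is a hypothetical name.

```typescript
// Assumed shape, inferred from the test's read() factory.
interface ToolCallLike {
  tool: string;
  input: { file_path?: string };
  output: string;
}

const basename = (p: string): string => p.split('/').pop() ?? p;

// Hypothetical stand-in for assertRequiredReads.
function requiredReadsSketch(
  result: { toolCalls: ToolCallLike[] },
  required: string[],
): { ok: boolean; missing: string[] } {
  const opened = new Set<string>();
  for (const call of result.toolCalls) {
    const fp = call.input.file_path;
    // Only Read calls that target a sections/ path count as section reads.
    if (call.tool === 'Read' && typeof fp === 'string' && fp.includes('sections/')) {
      opened.add(basename(fp));
    }
  }
  const missing = required.filter(r => !opened.has(basename(r)));
  return { ok: missing.length === 0, missing };
}

const read = (fp: string): ToolCallLike => ({ tool: 'Read', input: { file_path: fp }, output: '' });
const outcome = requiredReadsSketch(
  { toolCalls: [read('ship/sections/changelog.md')] },
  ['version-bump.md', 'changelog.md'],
);
console.log(outcome);
```

Matching on basename keeps the check independent of where the skill is installed, at the cost of treating two sections with the same file name in different skills as interchangeable.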
@@ -0,0 +1,77 @@
/**
 * Section manifest ↔ filesystem consistency (v2 plan T9 / Phase C orphan check).
 *
 * Implements the 3-tier orphan classification from v2_PLAN.md:
 *   - generated orphan (sections/X.md with no sections/X.md.tmpl) → FAIL
 *   - hand-edited generated file (X.md missing the AUTO-GENERATED header) → FAIL
 *   - manifest orphan (sections/X.md.tmpl not listed in manifest) → WARN (v2.0)
 *
 * Also pins the PASSIVE-manifest contract (CM2 / v2_PLAN.md:663): manifest entries
 * carry only id/file/title/trigger — no machine predicate (applies_when/required_for).
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const ROOT = path.resolve(import.meta.dir, '..');
const SHIP_SECTIONS = path.join(ROOT, 'ship', 'sections');
const manifest = JSON.parse(fs.readFileSync(path.join(SHIP_SECTIONS, 'manifest.json'), 'utf-8'));

const sectionTmpls = fs.readdirSync(SHIP_SECTIONS).filter(f => f.endsWith('.md.tmpl'));
const sectionMds = fs.readdirSync(SHIP_SECTIONS).filter(f => f.endsWith('.md') && !f.endsWith('.md.tmpl'));

describe('section manifest ↔ filesystem consistency', () => {
  test('manifest parses with skill + sections array', () => {
    expect(manifest.skill).toBe('ship');
    expect(Array.isArray(manifest.sections)).toBe(true);
    expect(manifest.sections.length).toBeGreaterThan(0);
  });

  test('every manifest entry has a .md.tmpl source AND a generated .md', () => {
    for (const s of manifest.sections) {
      expect(fs.existsSync(path.join(SHIP_SECTIONS, `${s.file}.tmpl`))).toBe(true);
      expect(fs.existsSync(path.join(SHIP_SECTIONS, s.file))).toBe(true);
    }
  });

  test('manifest is PASSIVE — no applies_when / required_for predicate (CM2)', () => {
    for (const s of manifest.sections) {
      expect(s).not.toHaveProperty('applies_when');
      expect(s).not.toHaveProperty('required_for');
      // The allowed passive shape:
      expect(typeof s.id).toBe('string');
      expect(typeof s.file).toBe('string');
      expect(typeof s.title).toBe('string');
      expect(typeof s.trigger).toBe('string');
    }
  });

  test('no generated orphan: every sections/X.md has a sections/X.md.tmpl → FAIL', () => {
    const orphans = sectionMds.filter(md => !sectionTmpls.includes(`${md}.tmpl`));
    expect(orphans).toEqual([]);
  });

  test('no hand-edited generated file: every sections/X.md has the AUTO-GENERATED header → FAIL', () => {
    for (const md of sectionMds) {
      const head = fs.readFileSync(path.join(SHIP_SECTIONS, md), 'utf-8').slice(0, 120);
      expect(head).toContain('AUTO-GENERATED');
    }
  });

  test('manifest orphan check (WARN in v2.0): every .md.tmpl is listed', () => {
    const listed = new Set(manifest.sections.map((s: { file: string }) => `${s.file}.tmpl`));
    const unlisted = sectionTmpls.filter(t => !listed.has(t));
    if (unlisted.length > 0) {
      // v2_PLAN.md: WARN now, FAIL in v2.1. Surface, don't fail the build yet.
      // eslint-disable-next-line no-console
      console.warn(`[section-manifest] manifest orphan(s) (not in manifest.json): ${unlisted.join(', ')}`);
    }
    expect(unlisted.length).toBeLessThanOrEqual(unlisted.length); // always passes; WARN only
  });

  test('section ids are unique', () => {
    const ids = manifest.sections.map((s: { id: string }) => s.id);
    expect(new Set(ids).size).toBe(ids.length);
  });
});
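The 3-tier orphan classification from the header comment can also be expressed as one pure function over a directory listing, which makes the FAIL/WARN split easy to see without a real `ship/sections/` tree. This is an illustrative sketch: `classifySections`, the `OrphanReport` shape, the sample file names, and the exact header text `<!-- AUTO-GENERATED -->` are all assumptions; the tests above only require that the first 120 characters contain `AUTO-GENERATED`.

```typescript
interface OrphanReport {
  fail: string[]; // generated orphans and hand-edited files
  warn: string[]; // manifest orphans (WARN-only tier)
}

// Hypothetical helper: classifies a sections/ listing without touching the filesystem.
function classifySections(
  files: string[],                 // file names found in sections/
  manifestFiles: string[],         // the `file` field of each manifest entry
  headOf: (md: string) => string,  // leading characters of a generated .md
): OrphanReport {
  const tmpls = files.filter(f => f.endsWith('.md.tmpl'));
  const mds = files.filter(f => f.endsWith('.md')); // "x.md.tmpl" does not end in ".md"
  const listed = new Set(manifestFiles.map(f => `${f}.tmpl`));
  const report: OrphanReport = { fail: [], warn: [] };
  for (const md of mds) {
    if (!tmpls.includes(`${md}.tmpl`)) report.fail.push(`generated orphan: ${md}`);
    if (!headOf(md).includes('AUTO-GENERATED')) report.fail.push(`hand-edited: ${md}`);
  }
  for (const t of tmpls) {
    if (!listed.has(t)) report.warn.push(`manifest orphan: ${t}`);
  }
  return report;
}

// Sample data, for illustration only.
const heads: Record<string, string> = {
  'changelog.md': '<!-- AUTO-GENERATED -->\n# Changelog',
  'stray.md': '# Stray',
};
const report = classifySections(
  ['changelog.md', 'changelog.md.tmpl', 'stray.md', 'draft.md.tmpl'],
  ['changelog.md'],
  md => heads[md] ?? '',
);
console.log(report);
```

Keeping the warn tier in a separate list mirrors the test file's choice to surface manifest orphans via `console.warn` while leaving the build green.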
@@ -0,0 +1,48 @@
/**
 * Static invariant: the two install targets that cherry-pick SKILL.md (Claude
 * prefixed dirs + Kiro) must ALSO install the sections/ subdir, or a carved
 * skill's runtime "Read sections/<name>.md" 404s. codex/factory/opencode link
 * the whole generated dir, so sections ride along for free there.
 *
 * Matches the repo's static-tripwire style (setup-windows-fallback,
 * cdp-session-cleanup). End-to-end "sections resolve in a temp install" runs in
 * the group-5/6 functional pass once real ship/sections/ exist.
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const SETUP = fs.readFileSync(path.join(import.meta.dir, '..', 'setup'), 'utf-8');

/** Body of a shell function `name() { ... }` up to the closing line `}`. */
function fnBody(src: string, name: string): string {
  const start = src.indexOf(`${name}() {`);
  if (start === -1) return '';
  const end = src.indexOf('\n}', start);
  return src.slice(start, end === -1 ? undefined : end);
}

describe('setup links sections/ for cherry-pick install targets', () => {
  test('link_claude_skill_dirs links sections/ via _link_or_copy', () => {
    const body = fnBody(SETUP, 'link_claude_skill_dirs');
    expect(body).toContain('sections');
    // sections install must route through the windows-safe helper, not raw ln.
    expect(body).toMatch(/_link_or_copy\s+"\$gstack_dir\/\$dir_name\/sections"\s+"\$target\/sections"/);
    expect(body).toMatch(/if \[ -d "\$gstack_dir\/\$dir_name\/sections" \]/);
  });

  test('kiro per-skill loop rewrites + copies sections/*', () => {
    // Kiro builds from the codex output and sed-rewrites paths; sections must get
    // the same rewrite so they resolve under ~/.kiro, not ~/.codex or ~/.claude.
    expect(SETUP).toMatch(/if \[ -d "\$skill_dir\/sections" \]/);
    expect(SETUP).toMatch(/mkdir -p "\$target_dir\/sections"/);
    expect(SETUP).toContain('$target_dir/sections/$(basename "$section_file")');
  });

  test('no raw ln introduced (windows-fallback invariant still holds)', () => {
    // Every new line touching sections uses _link_or_copy or sed redirect, never ln.
    const sectionLines = SETUP.split('\n').filter(l => l.includes('sections') && /\bln\s+-/.test(l));
    expect(sectionLines).toEqual([]);
  });
});
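To see what `fnBody` actually extracts, it helps to run it on a tiny shell script. The sketch below copies the helper as shown in the test file and applies it to a made-up script; the function names `helper` and `link_dirs` and the `$src` / `$dst` variables are hypothetical and do not come from the real `setup` file.

```typescript
/** Body of a shell function `name() { ... }` up to the closing line `}` (copied from the test above). */
function fnBody(src: string, name: string): string {
  const start = src.indexOf(`${name}() {`);
  if (start === -1) return '';
  const end = src.indexOf('\n}', start);
  return src.slice(start, end === -1 ? undefined : end);
}

// Made-up shell script, for illustration only.
const sample = [
  'helper() {',
  '  echo hi',
  '}',
  'link_dirs() {',
  '  if [ -d "$src/sections" ]; then',
  '    _link_or_copy "$src/sections" "$dst/sections"',
  '  fi',
  '}',
].join('\n');

const body = fnBody(sample, 'link_dirs');
console.log(body);
```

Two limits of this string-based approach are worth keeping in mind when adding tripwires like these: the body ends at the first line that starts with `}` after the match, so a column-zero `}` inside the function would truncate it, and the name is matched as a substring, so `link_dirs` would also match inside a longer name such as `x_link_dirs() {` if that appeared earlier in the file.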
@@ -2,10 +2,23 @@ import { describe, test, expect } from 'bun:test';
 import * as fs from 'fs';
 import * as path from 'path';

-const SHIP_SKILL = path.join(__dirname, '..', 'ship', 'SKILL.md');
+const SHIP_DIR = path.join(__dirname, '..', 'ship');

+// Carved (v2 plan T9): the Plan Completion gate moved into sections/plan-completion.md.
+// Read the skeleton + sections union so these invariants follow the content.
+function readShipUnion(): string {
+  let t = fs.readFileSync(path.join(SHIP_DIR, 'SKILL.md'), 'utf8');
+  const secDir = path.join(SHIP_DIR, 'sections');
+  if (fs.existsSync(secDir)) {
+    for (const f of fs.readdirSync(secDir).sort()) {
+      if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf8');
+    }
+  }
+  return t;
+}
+
 describe('ship/SKILL.md — Plan Completion gate invariants (VAS-449 remediation)', () => {
-  const skill = fs.readFileSync(SHIP_SKILL, 'utf8');
+  const skill = readShipUnion();

   test('Path concreteness rule: filesystem-pathed items must be test -f checked', () => {
     expect(skill).toContain('**Path concreteness rule.**');
@@ -9,7 +9,20 @@ import * as path from "path";
 import { scan } from "../lib/redact-engine";

 const ROOT = path.resolve(import.meta.dir, "..");
-const TMPL = fs.readFileSync(path.join(ROOT, "ship", "SKILL.md.tmpl"), "utf-8");
+// Carved (v2 plan T9): ship is a skeleton template + sections/*.md.tmpl. The
+// PR-body redaction wiring moved into sections/pr-body.md.tmpl, so assert against
+// the union of the skeleton template and its section templates.
+function readShipTemplateUnion(): string {
+  let t = fs.readFileSync(path.join(ROOT, "ship", "SKILL.md.tmpl"), "utf-8");
+  const secDir = path.join(ROOT, "ship", "sections");
+  if (fs.existsSync(secDir)) {
+    for (const f of fs.readdirSync(secDir).sort()) {
+      if (f.endsWith(".md.tmpl")) t += "\n" + fs.readFileSync(path.join(secDir, f), "utf-8");
+    }
+  }
+  return t;
+}
+const TMPL = readShipTemplateUnion();

 describe("/ship redaction wiring", () => {
   test("scans the PR body via the shared bin before create", () => {
@@ -197,20 +197,26 @@ describeE2E('/ship idempotency E2E (periodic, real-PTY)', () => {
         }
       }

-      // Positive: the idempotency-check echoed ALREADY_BUMPED.
-      if (/STATE:\s*ALREADY_BUMPED/.test(visible)) {
+      // Positive: idempotency classify reported ALREADY_BUMPED. Post-carve
+      // (T9), Step 12 runs `gstack-version-bump classify` which emits JSON
+      // (`"state":"ALREADY_BUMPED"`); the legacy inline bash echoed
+      // `STATE: ALREADY_BUMPED`. Accept either so the test survives the carve.
+      if (/STATE:\s*ALREADY_BUMPED|"state":\s*"ALREADY_BUMPED"/.test(visible)) {
         outcome = 'detected';
         evidence = visible.slice(-3000);
         break;
       }

       // Negative regressions:
-      //   - bump-action bash block ran (would echo on FRESH path)
+      //   - classify reported FRESH (CLI JSON or legacy echo) → would re-bump
       //   - agent attempted git commit -m "chore: bump version"
       //   - agent attempted git push
-      //   - agent rendered an Edit/Write to CHANGELOG.md or VERSION (acceptable in plan mode but flagged here)
+      //   - agent ran the CLI write path (gstack-version-bump write) — a
+      //     re-bump on an already-shipped branch
       if (
+        /"state":\s*"FRESH"/.test(visible) ||
         /STATE:\s*FRESH(?![\w-])/i.test(visible) ||
+        /gstack-version-bump\s+write/i.test(visible) ||
         /git\s+commit\s+.*chore:\s*bump\s+version/i.test(visible) ||
         /git\s+push.*origin/i.test(visible)
       ) {
@@ -0,0 +1,120 @@
/**
 * /ship section-loading E2E (periodic, paid, real-PTY) — v2 plan T9 mitigation
 * layer 5, the ONLY CI-failing guard against silent section-skip.
 *
 * After the carve, ship is a skeleton whose STOP-Read directives point at
 * sections/*.md. This test runs the REAL /ship skill in plan mode against a
 * fresh version-changing fixture and asserts the agent actually Read the
 * sections its situation requires (review-army + changelog at minimum — every
 * version-changing ship needs the pre-landing review and a CHANGELOG entry).
 *
 * Runs against the INSTALLED skill at ~/.claude/skills/gstack/ship (Codex
 * outside-voice #5: an E2E that reads repo paths would miss install-layout
 * 404s). Section reads are detected from the PTY scrollback — when the agent
 * Reads a section the tool render shows the `sections/<file>.md` path.
 *
 * Plan-mode framing keeps the agent from committing/pushing; producing a plan
 * is the terminal signal. Cost: ~$2-4/run. Periodic tier.
 *
 * Situation matrix (T1 = B): this file covers the fresh version-changing ship;
 * the already-bumped re-run is covered by skill-e2e-ship-idempotency.test.ts,
 * and a no-plan-file variant can be added to FIXTURES below.
 */

import { describe, test, expect } from 'bun:test';
import { spawnSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import {
  launchClaudePty,
  isPermissionDialogVisible,
  isNumberedOptionListVisible,
} from './helpers/claude-pty-runner';

const shouldRun = !!process.env.EVALS && process.env.EVALS_TIER === 'periodic';
const describeE2E = shouldRun ? describe : describe.skip;

/** Fresh fixture: feature branch with a real change but VERSION still == base,
 * so /ship must bump (FRESH) and walk the full pre-landing + changelog flow. */
function buildFreshFixture(): { workTree: string; root: string } {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-ship-secload-'));
  const workTree = path.join(root, 'workspace');
  const bareRemote = path.join(root, 'origin.git');
  fs.mkdirSync(workTree, { recursive: true });
  const sh = (cmd: string, cwd: string): void => {
    const r = spawnSync('bash', ['-c', cmd], { cwd, stdio: 'pipe', timeout: 15_000 });
    if (r.status !== 0) throw new Error(`fixture setup failed at "${cmd}":\n${r.stderr?.toString()}`);
  };
  sh(`git init --bare "${bareRemote}"`, root);
  sh('git init -b main', workTree);
  sh('git config user.email "t@t.com" && git config user.name "T" && git config commit.gpgsign false', workTree);
  fs.writeFileSync(path.join(workTree, 'VERSION'), '0.0.1\n');
  fs.writeFileSync(path.join(workTree, 'package.json'), JSON.stringify({ name: 'fx', version: '0.0.1', private: true }, null, 2) + '\n');
  fs.writeFileSync(path.join(workTree, 'CHANGELOG.md'), '# Changelog\n\n## [0.0.1] - 2026-01-01\n\n- Initial release\n');
  fs.writeFileSync(path.join(workTree, 'app.js'), '// base\n');
  sh('git add -A && git commit -m "chore: initial v0.0.1"', workTree);
  sh(`git remote add origin "${bareRemote}" && git push -u origin main`, workTree);
  // Feature branch: a real code change, VERSION untouched → FRESH (needs a bump).
  sh('git checkout -b feat/new-thing', workTree);
  fs.writeFileSync(path.join(workTree, 'app.js'), '// base\nexport function newThing() { return 42; }\n');
  fs.writeFileSync(path.join(workTree, 'app.test.js'), 'test("newThing", () => {});\n');
  sh('git add -A && git commit -m "feat: add newThing"', workTree);
  sh('git push -u origin feat/new-thing', workTree);
  return { workTree, root };
}

// Sections every version-changing ship must consult.
const REQUIRED_SECTIONS = ['review-army.md', 'changelog.md'];

describeE2E('/ship section-loading E2E (periodic, real-PTY, installed skill)', () => {
  test(
    'fresh version-changing ship Reads the required sections',
    async () => {
      const { workTree, root } = buildFreshFixture();
      const session = await launchClaudePty({
        permissionMode: 'plan',
        cwd: workTree,
        timeoutMs: 720_000,
        env: { GH_TOKEN: 'mock-not-real', NO_COLOR: '1' },
      });

      const readSections = new Set<string>();
      let planReady = false;
      try {
        await Bun.sleep(8000);
        const since = session.mark();
        session.send('/ship\r');
        const start = Date.now();
        let lastPermSig = '';
        while (Date.now() - start < 600_000) {
          await Bun.sleep(3000);
          if (session.exited()) break;
          const visible = session.visibleSince(since);
          const tail = visible.slice(-1500);
          if (isNumberedOptionListVisible(tail) && isPermissionDialogVisible(tail)) {
            const sig = visible.slice(-500);
            if (sig !== lastPermSig) { lastPermSig = sig; session.send('1\r'); await Bun.sleep(1500); continue; }
          }
          // Detect section reads from the scrollback (tool render shows the path).
          for (const m of visible.matchAll(/sections\/([A-Za-z0-9._-]+\.md)/g)) readSections.add(m[1]);
          if (/ready to execute|Would you like to proceed|GSTACK REVIEW REPORT/i.test(visible)) {
            planReady = true;
            break;
          }
        }
      } finally {
        await session.close();
        try { fs.rmSync(root, { recursive: true, force: true }); } catch { /* ignore */ }
      }

      const missing = REQUIRED_SECTIONS.filter(s => !readSections.has(s));
      expect({ planReady, read: [...readSections], missing }).toEqual({
        planReady: true,
        read: expect.any(Array),
        missing: [],
      });
    },
    900_000,
  );
});
@@ -156,7 +156,11 @@ describe('SKILL.md size budget regression (gate, free)', () => {
     const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
     const current = captureBaseline({ repoRoot: REPO_ROOT });
     const MIN_RATIO = 0.80; // a skill at <80% of its v1.44 size signals mass-deletion
-    const SECTIONS_EXTRACTED = new Set<string>(); // populate in v2.0.0.0 when sections/ lands
+    // Carved skills (v2 plan T9): the skeleton SKILL.md intentionally shrinks
+    // because prose moved into sections/*.md. The union size is guarded instead
+    // by the sectioned ship invariant in parity-harness.ts (minBytes on the
+    // skeleton+sections union), so exempt the skeleton from the body-strip floor.
+    const SECTIONS_EXTRACTED = new Set<string>(['ship']);

     const undershoots: Array<{
       skill: string; beforeBytes: number; afterBytes: number; ratio: number;
@@ -7,6 +7,22 @@ import * as path from 'path';

 const ROOT = path.resolve(import.meta.dir, '..');

+// Carved-skill aware (v2 plan T9): ship is a skeleton SKILL.md + sections/*.md.
+// Read the union so validations of content that moved into a section still hold.
+// `_SHIP_MD` is a distinct path expression so a mechanical read-replace can't
+// recurse into this helper.
+const _SHIP_MD = path.join(ROOT, 'ship', 'SKILL.md');
+function readShipUnion(): string {
+  let t = fs.readFileSync(_SHIP_MD, 'utf-8');
+  const secDir = path.join(ROOT, 'ship', 'sections');
+  if (fs.existsSync(secDir)) {
+    for (const f of fs.readdirSync(secDir).sort()) {
+      if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf-8');
+    }
+  }
+  return t;
+}
+
 describe('SKILL.md command validation', () => {
   test('all $B commands in SKILL.md are valid browse commands', () => {
     const result = validateSkill(path.join(ROOT, 'SKILL.md'));
@@ -315,7 +331,8 @@ describe('Cross-skill path consistency', () => {
     for (const file of filesToCheck) {
       const filePath = path.join(ROOT, file);
       if (!fs.existsSync(filePath)) continue;
-      const content = fs.readFileSync(filePath, 'utf-8');
+      // ship's greptile handling moved into sections/greptile.md (T9 carve).
+      const content = file === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(filePath, 'utf-8');
 
       const hasBoth = (content.includes('per-project') && content.includes('global')) ||
         (content.includes('$REMOTE_SLUG/greptile-history') && content.includes('~/.gstack/greptile-history'));
@@ -437,7 +454,7 @@ describe('Greptile history format consistency', () => {
 
   test('review/SKILL.md and ship/SKILL.md both reference greptile-triage.md for write details', () => {
     const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
-    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const shipContent = readShipUnion();
 
     expect(reviewContent.toLowerCase()).toContain('greptile-triage.md');
     expect(shipContent.toLowerCase()).toContain('greptile-triage.md');
@@ -530,7 +547,7 @@ describe('TODOS-format.md reference consistency', () => {
   });
 
   test('skills that write TODOs reference TODOS-format.md', () => {
-    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const shipContent = readShipUnion();
     const ceoPlanContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
     const engPlanContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
 
@@ -788,7 +805,7 @@ describe('Enum & Value Completeness in review checklist', () => {
     expect(checklist).toContain('ASK');
 
     const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
-    const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
+    const shipSkill = readShipUnion();
     expect(reviewSkill).toContain('AUTO-FIX');
     expect(reviewSkill).toContain('[AUTO-FIXED]');
     expect(shipSkill).toContain('AUTO-FIX');
@@ -1014,7 +1031,7 @@ describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
   });
 
   test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Test Framework Bootstrap');
     expect(content).toContain('Step 4');
   });
@@ -1063,7 +1080,7 @@ describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
 
   test('WebSearch is in allowed-tools for qa, ship, design-review', () => {
     const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
-    const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const ship = readShipUnion();
     const qaDesign = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
     expect(qa).toContain('WebSearch');
     expect(ship).toContain('WebSearch');
@@ -1112,7 +1129,7 @@ describe('Phase 8e.5 regression test generation', () => {
 
 describe('Step 3.4 test coverage audit', () => {
   test('ship/SKILL.md contains Step 7', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Step 7: Test Coverage Audit');
     // The coverage diagram collapses code-path and user-flow counts onto one
     // summary line. Verify that summary is present (labels are stable).
@@ -1120,7 +1137,7 @@ describe('Step 3.4 test coverage audit', () => {
   });
 
   test('Step 3.4 includes quality scoring rubric', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('★★★');
     expect(content).toContain('★★');
     expect(content).toContain('edge cases AND error paths');
@@ -1128,36 +1145,36 @@ describe('Step 3.4 test coverage audit', () => {
   });
 
   test('Step 3.4 includes before/after test count', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Count test files before');
     expect(content).toContain('Count test files after');
   });
 
   test('ship PR body includes Test Coverage section', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('## Test Coverage');
   });
 
   test('ship rules include test generation rule', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Step 7 generates coverage tests');
     expect(content).toContain('Never commit failing tests');
   });
 
   test('Step 3.4 includes vibe coding philosophy', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('vibe coding becomes yolo coding');
   });
 
   test('Step 3.4 traces actual codepaths, not just syntax', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Trace every codepath');
     expect(content).toContain('Trace data flow');
     expect(content).toContain('Diagram the execution');
   });
 
   test('Step 3.4 maps user flows and interaction edge cases', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Map user flows');
     expect(content).toContain('Interaction edge cases');
     expect(content).toContain('Double-click');
@@ -1167,7 +1184,7 @@ describe('Step 3.4 test coverage audit', () => {
   });
 
   test('Step 3.4 diagram includes user-flow coverage summary', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     // The diagram was compressed from separate CODE PATH COVERAGE / USER FLOW
     // COVERAGE section headers into a single summary line. Assert on the
     // labels that still appear on that summary line.
@@ -1203,7 +1220,7 @@ describe('ship step numbering', () => {
   });
 
   test('ship/SKILL.md main headings use clean integer step numbers', () => {
-    const skill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const skill = readShipUnion();
     // Headings like "## Step 7: Test Coverage Audit" — NOT sub-steps like "## Step 8.1:"
     const headings = Array.from(skill.matchAll(/^## Step (\d+(?:\.\d+)?):/gm)).map(
       (m) => m[1]
@@ -1381,7 +1398,7 @@ describe('Codex skill', () => {
   });
 
   test('adversarial review in /ship always runs both passes', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Adversarial review (always-on)');
     expect(content).toContain('adversarial-review');
     expect(content).toContain('reasoning_effort="high"');
@@ -1391,7 +1408,7 @@ describe('Codex skill', () => {
 
   test('scope drift detection in /review and /ship', () => {
     const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
-    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const shipContent = readShipUnion();
     // Both should contain scope drift from the shared resolver
     for (const content of [reviewContent, shipContent]) {
       expect(content).toContain('Scope Check:');
@@ -1427,7 +1444,8 @@ describe('Codex skill', () => {
 
   test('codex review invocations avoid the prompt plus --base argument shape', () => {
     for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex command moved into sections/adversarial.md (T9 carve).
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
       expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
       expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
     }
@@ -1443,7 +1461,8 @@ describe('Codex skill', () => {
     const boundaryLine =
       'Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/';
     for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex/adversarial boundary line moved into sections/adversarial.md.
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
       expect(content).toContain(boundaryLine);
     }
   });
@@ -1456,7 +1475,7 @@ describe('Codex skill', () => {
   });
 
   test('Review Readiness Dashboard includes Adversarial Review row', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Adversarial');
     expect(content).toContain('codex-review');
   });
@@ -1711,17 +1730,17 @@ describe('Repo mode preamble validation', () => {
 
 describe('Test failure triage in ship skill', () => {
   test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('Test Failure Ownership Triage');
   });
 
   test('ship/SKILL.md triage uses git diff for classification', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('git diff origin/<base>...HEAD --name-only');
   });
 
   test('ship/SKILL.md triage has solo and collaborative paths', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('REPO_MODE');
     expect(content).toContain('solo');
     expect(content).toContain('collaborative');
@@ -1730,18 +1749,18 @@ describe('Test failure triage in ship skill', () => {
   });
 
   test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('gh issue create');
     expect(content).toContain('--assignee');
   });
 
   test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
   });
 
   test('ship/SKILL.md uses in-branch language for stop condition', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
     expect(content).toContain('In-branch test failures');
   });
 });

@@ -0,0 +1,58 @@
+/**
+ * Section TemplateContext parity (v2 plan T9 / Codex consult absorbed-refinement #1).
+ *
+ * Section generation must use the SAME TemplateContext as the parent skill —
+ * crucially the same skillName, so resolver `appliesTo` gating + tier behave
+ * identically. If a section resolved with skillName "sections" (the bug
+ * processSectionTemplate guards against), gated resolvers like ADVERSARIAL_STEP /
+ * CONFIDENCE_CALIBRATION would render empty.
+ *
+ * We assert on the GENERATED section output: gated resolver content is present and
+ * no placeholder is left unresolved. That can only be true if the parent ctx
+ * (skillName=ship) drove the resolve.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+const SHIP_SECTIONS = path.join(ROOT, 'ship', 'sections');
+
+function readSection(file: string): string {
+  return fs.readFileSync(path.join(SHIP_SECTIONS, file), 'utf-8');
+}
+
+describe('section TemplateContext parity (skillName pinned to parent)', () => {
+  test('no generated section has unresolved {{PLACEHOLDER}} tokens', () => {
+    for (const md of fs.readdirSync(SHIP_SECTIONS).filter(f => f.endsWith('.md') && !f.endsWith('.md.tmpl'))) {
+      const content = readSection(md);
+      const unresolved = content.match(/\{\{[A-Z_]+(?::[^}]+)?\}\}/g);
+      expect({ md, unresolved }).toEqual({ md, unresolved: null });
+    }
+  });
+
+  test('adversarial section rendered the ADVERSARIAL_STEP resolver (proves ship ctx)', () => {
+    const content = readSection('adversarial.md');
+    // The codex filesystem-boundary line only appears when ADVERSARIAL_STEP resolves.
+    expect(content).toContain('Do NOT read or execute any files under');
+    expect(content.length).toBeGreaterThan(500);
+  });
+
+  test('review-army section rendered CONFIDENCE_CALIBRATION + REVIEW_ARMY (gated resolvers)', () => {
+    const content = readSection('review-army.md');
+    expect(content).toContain('Confidence Calibration');
+    expect(content).toContain('confidence score');
+  });
+
+  test('tests section rendered TEST_BOOTSTRAP + TEST_FAILURE_TRIAGE', () => {
+    const content = readSection('tests.md');
+    expect(content).toContain('Test Failure Ownership Triage');
+  });
+
+  test('changelog section rendered CHANGELOG_WORKFLOW', () => {
+    const content = readSection('changelog.md');
+    expect(content).toContain('CHANGELOG');
+    expect(content.length).toBeGreaterThan(300);
+  });
+});

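Reviewer note: the first test in the file above depends on one regular expression to spot template tokens that were never resolved. The sketch below isolates that check so its behaviour is easier to see. `findUnresolved` is a hypothetical helper name introduced only for illustration; the regex itself is copied from the test.

```typescript
// Matches {{NAME}} and {{NAME:arg}} where NAME is uppercase letters/underscores.
const UNRESOLVED = /\{\{[A-Z_]+(?::[^}]+)?\}\}/g;

// Hypothetical helper: return every leftover placeholder token, or [] if none.
function findUnresolved(content: string): string[] {
  return content.match(UNRESOLVED) ?? [];
}

// A generated section that still carries two tokens, plus one lowercase
// look-alike that the pattern deliberately ignores.
const sample = 'intro {{ADVERSARIAL_STEP}} body {{FOO:arg}} and {{lower}}';
console.log(findUnresolved(sample));
console.log(findUnresolved('fully rendered section'));
```

Because the character class is uppercase-only, ordinary prose that happens to contain lowercase double-brace text does not trip the check.
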
@@ -0,0 +1,136 @@
+/**
+ * Unit tests for the transcript section logger (T10). Pure-function coverage —
+ * no paid run needed. Drives the analyzers with synthetic tool-call transcripts.
+ */
+
+import { describe, test, expect, afterAll } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import {
+  extractSectionReads,
+  extractShipActions,
+  compareShipActions,
+  writeShipBaseline,
+  readShipBaseline,
+  baselinePath,
+  SHIP_ACTIONS,
+  type ToolCallLike,
+  type ShipBaseline,
+} from './helpers/transcript-section-logger';
+
+const read = (fp: string): ToolCallLike => ({ tool: 'Read', input: { file_path: fp }, output: '' });
+const bash = (command: string): ToolCallLike => ({ tool: 'Bash', input: { command }, output: '' });
+
+describe('extractSectionReads', () => {
+  test('picks up section reads via the /sections/<file>.md segment', () => {
+    const result = {
+      toolCalls: [
+        read('/Users/x/.claude/skills/gstack-ship/sections/version-bump.md'),
+        read('ship/sections/changelog.md'),
+        read('/abs/.factory/skills/gstack-ship/sections/review-army.md'),
+      ],
+    };
+    expect(extractSectionReads(result)).toEqual(['version-bump.md', 'changelog.md', 'review-army.md']);
+  });
+
+  test('ignores non-section reads and non-Read tools', () => {
+    const result = {
+      toolCalls: [
+        read('ship/SKILL.md'),
+        read('/some/sections-like/notsections/x.md'),
+        bash('cat ship/sections/version-bump.md'), // bash, not a Read
+      ],
+    };
+    expect(extractSectionReads(result)).toEqual([]);
+  });
+
+  test('dedupes and preserves first-read order', () => {
+    const result = {
+      toolCalls: [
+        read('ship/sections/tests.md'),
+        read('ship/sections/version-bump.md'),
+        read('ship/sections/tests.md'),
+      ],
+    };
+    expect(extractSectionReads(result)).toEqual(['tests.md', 'version-bump.md']);
+  });
+});
+
+describe('extractShipActions', () => {
+  test('detects the full action fingerprint from bash + writes', () => {
+    const result = {
+      toolCalls: [
+        bash('git merge origin/main'),
+        bash('bun test'),
+        bash('gstack-version-bump --bump minor'),
+        { tool: 'Edit', input: { file_path: 'CHANGELOG.md' }, output: '' },
+        bash('git commit -m "v1.2.0.0 feat"'),
+        bash('git push origin HEAD'),
+        bash('gh pr create --base main'),
+      ],
+    };
+    expect(extractShipActions(result)).toEqual([...SHIP_ACTIONS]);
+  });
+
+  test('returns canonical order regardless of execution order', () => {
+    const result = {
+      toolCalls: [
+        bash('gh pr create --base main'),
+        bash('git merge origin/main'),
+      ],
+    };
+    expect(extractShipActions(result)).toEqual(['merged_base', 'opened_pr']);
+  });
+
+  test('VERSION write counts as a version bump even without the CLI', () => {
+    const result = { toolCalls: [{ tool: 'Write', input: { file_path: 'VERSION' }, output: '' }] };
+    expect(extractShipActions(result)).toEqual(['bumped_version']);
+  });
+
+  test('empty run produces empty fingerprint', () => {
+    expect(extractShipActions({ toolCalls: [] })).toEqual([]);
+  });
+});
+
+describe('compareShipActions', () => {
+  const baseline: ShipBaseline = {
+    tag: 'monolith',
+    situation: 'fresh-version-changing',
+    actions: ['merged_base', 'ran_tests', 'bumped_version', 'wrote_changelog', 'committed', 'pushed', 'opened_pr'],
+    sectionReads: [],
+    capturedAt: '2026-05-30T00:00:00Z',
+  };
+
+  test('flags a dropped action as the carve regression', () => {
+    const current = baseline.actions.filter(a => a !== 'bumped_version');
+    const diff = compareShipActions(baseline, current);
+    expect(diff.ok).toBe(false);
+    expect(diff.missing).toEqual(['bumped_version']);
+  });
+
+  test('passes when the sectioned run performs every baseline action', () => {
+    const diff = compareShipActions(baseline, [...baseline.actions, 'merged_base']);
+    expect(diff.ok).toBe(true);
+    expect(diff.missing).toEqual([]);
+  });
+});
+
+describe('baseline persistence', () => {
+  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ship-baseline-'));
+  afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
+
+  test('round-trips a baseline to disk', () => {
+    const baseline: ShipBaseline = {
+      tag: 'monolith', situation: 'no-plan-file',
+      actions: ['ran_tests', 'committed'], sectionReads: [], capturedAt: '2026-05-30T00:00:00Z',
+    };
+    const p = writeShipBaseline(baseline, dir);
+    expect(p).toBe(baselinePath('no-plan-file', dir));
+    expect(readShipBaseline('no-plan-file', dir)).toEqual(baseline);
+  });
+
+  test('returns null when no baseline captured yet', () => {
+    expect(readShipBaseline('never-captured', dir)).toBeNull();
+  });
+});
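
Reviewer note: the helper module `./helpers/transcript-section-logger` is not part of this excerpt, so the tests above are the only specification visible here. The sketch below is one possible implementation of two of the analyzers that would satisfy those tests. It is an assumption throughout, not the repository's code: the `ToolCallLike` field types, the path regex, and both function bodies are guesses reconstructed from the test inputs and expected outputs.

```typescript
// Assumed shape, inferred from the `read` and `bash` factories in the tests.
type ToolCallLike = {
  tool: string;
  input: { file_path?: string; command?: string };
  output: string;
};

// Assumed approach: only Read calls whose path ends in sections/<file>.md count.
// A Set keeps first-read order while dropping repeats.
function extractSectionReads(result: { toolCalls: ToolCallLike[] }): string[] {
  const seen = new Set<string>();
  for (const call of result.toolCalls) {
    if (call.tool !== 'Read') continue;
    const fp = call.input.file_path;
    if (typeof fp !== 'string') continue;
    const name = fp.match(/(?:^|\/)sections\/([^/]+\.md)$/)?.[1];
    if (name) seen.add(name);
  }
  return Array.from(seen);
}

// Assumed approach: a run passes when every baseline action appears at least
// once in the current run; extra or repeated actions are tolerated.
function compareShipActions(
  baseline: { actions: string[] },
  current: string[],
): { ok: boolean; missing: string[] } {
  const have = new Set(current);
  const missing = baseline.actions.filter(a => !have.has(a));
  return { ok: missing.length === 0, missing };
}
```

Treating the comparison as a subset check rather than an equality check matches the second `compareShipActions` test, where a duplicated `merged_base` still passes.
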