Merge origin/main into /spec branch — retag v1.45.0.0 → v1.47.0.0

main moved to v1.46.0.0 (gstack v2 foundation, eval-first floor across
51 skills) while this branch was at v1.45.0.0. v1.46 also reserved
v1.45.0.0 for the design daemon feature. Retag this branch's release
v1.45.0.0 → v1.47.0.0 so it lands cleanly on top of main.

Conflict resolutions:
- VERSION: 1.47.0.0 (MINOR continues on top of main's 1.46.0.0; this
  branch is also a MINOR per scale-aware rules — new skill capability).
- CHANGELOG: rewrite this branch's release header v1.45.0.0 → v1.47.0.0.
  Keep both main entries above main's older history.

Adapts to main's eval-first floor (v1.46.0.0 test/skill-coverage-matrix.ts
+ test/skill-coverage-floor.test.ts):
- Register /spec in SKILL_COVERAGE with 3 gate entries + 2 periodic.
- Skill catalog grows 51 → 52. Floor 6/6 structural checks pass.
- Catalog tokens: 4045 → 4116 (+71 for /spec, within v1.46's ≤7000 budget).
- Trim spec frontmatter description to single-paragraph block form to
  respect v1.46's catalog-trim intent (was 14 lines / ~900 chars,
  now 5 lines / ~350 chars; routing prose stays in body sections).
- 363/363 gate-tier tests pass across skill-coverage-floor (309) +
  skill-coverage-matrix (10) + skill-size-budget (3) + parity-suite (4) +
  spec-template-invariants (35) + spec-template-sync (2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan 2026-05-26 18:54:21 -07:00
commit 3d77d1edd6
132 changed files with 10945 additions and 4270 deletions


@ -116,6 +116,7 @@ jobs:
test/setup-windows-fallback.test.ts \
test/build-script-shell-compat.test.ts \
test/docs-config-keys.test.ts \
test/brain-sync-windows-paths.test.ts \
make-pdf/test/browseClient.test.ts \
make-pdf/test/pdftotext.test.ts
shell: bash


@ -1,14 +1,16 @@
# Changelog
## [1.45.0.0] - 2026-05-26
## [1.47.0.0] - 2026-05-26
## **`/spec` ships: turn vague intent into a precise, executable spec in five phases.** Pipe the spec into a spawned Claude Code agent, dedupe against existing issues, archive locally for the team corpus, and let `/ship` close the source issue on merge.
A precise spec collapses an agent's clarification roundtrips from N to zero. `/spec` is the verb that turns thoughts into commits: five strict phases (why, scope, technical with mandatory code-reading, draft, file), a codex quality gate before file, archive to `$GSTACK_STATE_ROOT/projects/$SLUG/specs/`, and optional pipeline-mode spawn into a fresh worktree. Plan-mode aware: in plan mode `/spec` files the issue and loads the spec into your active plan file; in execution mode it files the issue and spawns `claude -p` in a fresh worktree by default. `/ship` reads the archive frontmatter and auto-closes the source issue on full delivery. Adapted from a community-contributed `/issue` skill (PR #1698 by @jayzalowitz) with rename, race+security hardening, and DX polish.
`/spec` is the first skill registered against the v1.46 eval-first floor (`test/skill-coverage-matrix.ts`), passing all six structural floor checks plus 37 deterministic invariant assertions specific to `/spec`'s contract. Skill catalog count: 51 → 52.
### The numbers that matter
Source: 1 contributor commit + 8 follow-on bundled fixes/expansions on this branch (`git log v1.44.0.0..HEAD --oneline`). Template at `spec/SKILL.md.tmpl` (404 → ~750 lines after expansions), 4 new test files (17 logical scenarios: 14 template invariants + sync check + 2 periodic-tier stubs).
Source: 1 contributor commit + 8 follow-on bundled fixes/expansions on this branch (`git log v1.46.0.0..HEAD --oneline`). Template at `spec/SKILL.md.tmpl` (404 → ~750 lines after expansions), 4 new test files (37 deterministic scenarios + 2 periodic-tier stubs).
| Capability | Without `/spec` | With `/spec` |
|---|---|---|
@ -38,9 +40,10 @@ Type `/spec` on a vague bug; four minutes later you have a filed GitHub issue wi
- `GSTACK_PLAN_MODE` env var: emitted by `{{PREAMBLE}}` based on `CLAUDE_PLAN_FILE` presence. Skills can branch behavior on plan-mode state without parsing system reminders.
- `/spec` entry in the gstack routing block injected into project CLAUDE.md.
- `/ship` PR body integration: reads `spec_issue_number` from archive frontmatter and adds `Closes #N` when the spec is fully delivered per the existing plan-completion gate. Partial delivery emits a "Linked to #N (not auto-closing)" notice instead.
- `/spec` entry in `test/skill-coverage-matrix.ts` (52nd skill, eval-first floor compliance per v1.46 contract).
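The `/ship` integration described above can be sketched as a small pure function. This is an illustration only: `spec_issue_number` and the two output strings come from the entry above, while the `SpecArchiveFrontmatter` type, the `specIssueLine` name, and the `fullyDelivered` flag are hypothetical stand-ins for the real archive reader and plan-completion gate.

```typescript
// Hypothetical shape: only `spec_issue_number` is named in the changelog.
interface SpecArchiveFrontmatter {
  spec_issue_number?: number;
}

// Returns the line to add to the PR body, or null when the archive
// does not reference a source issue.
function specIssueLine(
  frontmatter: SpecArchiveFrontmatter,
  fullyDelivered: boolean, // assumed output of the plan-completion gate
): string | null {
  const issue = frontmatter.spec_issue_number;
  if (issue === undefined) return null;
  return fullyDelivered
    ? `Closes #${issue}`
    : `Linked to #${issue} (not auto-closing)`;
}
```

Keeping the decision in one function means the auto-close keyword only ever appears when the delivery check passed.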
#### Tests
- `test/spec-template-invariants.test.ts`: 14 deterministic invariants covering Phase 1 hard gate, Phase 3 hard-grep mandate, `--dedupe` graceful-skip paths, `--execute` race + security hardening (TOCTOU re-check, SHA pin, unique branch), quality-gate redaction patterns and BLOCKED path, archive atomic write + sync exclusion, plan-mode-aware Phase 5 dispatch.
- `test/spec-template-invariants.test.ts`: 35 deterministic invariants covering Phase 1 hard gate, Phase 3 hard-grep mandate, `--dedupe` graceful-skip paths, `--execute` race + security hardening (TOCTOU re-check, SHA pin, unique branch), quality-gate redaction patterns and BLOCKED path, archive atomic write + sync exclusion, plan-mode-aware Phase 5 dispatch.
- `test/spec-template-sync.test.ts`: regenerates `spec/SKILL.md` and asserts byte-identical output (prevents template-vs-generated drift).
- `test/skill-e2e-spec-execute.test.ts` (periodic-tier): full `/spec --execute` pipeline scaffold registered in `E2E_TIERS`.
- `test/skill-llm-eval-spec.test.ts` (periodic-tier): authored-spec quality eval against the 14-Quality-Standards rubric.
@ -53,6 +56,171 @@ Type `/spec` on a vague bug; four minutes later you have a filed GitHub issue wi
- Plan reviewed across `/plan-ceo-review` (SCOPE EXPANSION, 5 of 6 expansions accepted), `/plan-eng-review` (race + security hardening), and `/plan-devex-review` (persona, magical moment, error-message Tier 1, plan-mode-aware Phase 5).
- 28 codex adversarial findings across 3 review rounds, 23 accepted.
## [1.46.0.0] - 2026-05-26
## **gstack v2 foundation lands. Catalog tokens drop 56%, eval-first floor covers all 51 skills, hard token + dollar caps gate every PR.**
The always-loaded skill catalog — what every Claude Code session pays for at startup before any real work begins — went from ~9,319 tokens to ~4,045 tokens. That's a 56.6% cut to the surface gstack has been criticized for (third-party review, May 2026: "10K+ tokens before any real code is written"). Heavyweight skills like `/ship`, `/plan-ceo-review`, `/office-hours` still ship their full content, but their frontmatter descriptions trim to one sentence each; the routing prose lives in a new "## When to invoke" body section, and a per-run `scripts/proactive-suggestions.json` registry holds the voice-trigger + proactive-suggest text so agents can pull guidance on demand instead of always-loaded.
This is the v2 foundation release. The architectural break — `sections/*.md.tmpl` pattern, mechanical Read enforcement, eval-coverage annotations — lands in v2.0.0.0 as a coordinated launch. v1.46 absorbs every low-risk win, ships the eval-first floor every future skill must pass, and locks in the v1.44.1 reference baseline so reviewers can audit v1→v2 numbers against a real file (`test/fixtures/parity-baseline-v1.44.1.json`).
### The numbers that matter
Source: `bun run scripts/capture-baseline.ts --tag v1.46.0.0` vs the locked v1.44.1 baseline at `test/fixtures/parity-baseline-v1.44.1.json`. Reproduce locally with `bun test test/skill-size-budget.test.ts`.
| Metric | v1.44.1 | v1.46.0.0 | Δ |
|---|---|---|---|
| Catalog tokens (always-loaded system prompt) | ~9,319 | ~4,045 | **56.6%** |
| Total SKILL.md corpus | 2,847 KB | 2,813 KB | 1.2% |
| ship.md | 160 KB | 159 KB | 0.5% |
| plan-ceo-review.md | 128 KB | 127 KB | 0.7% |
| office-hours.md | 108 KB | 108 KB | 0.8% |
| Skills with gate-tier eval coverage | 32 of 51 | **51 of 51** | floor achieved |
| Cathedral parity invariants pinned | 0 | **10** | structural + content |
| Token & dollar budget regressions caught at CI | (none) | **5 new test files** | per-skill, corpus, catalog, eval-cost gate, eval-cost periodic |
The corpus barely moved because the catalog trim MOVES routing prose from frontmatter to a body section — it doesn't delete it. The always-loaded surface drops by more than half because catalog text is what Claude Code reads on every session start; body content only loads when the skill is invoked.
### What this means for you
If you use any gstack skill, every session starts ~5,000 tokens lighter before you type anything. Heavyweight invocations like `/ship` cost about the same as before, but session startup feels snappier. If you've been on the fence about installing gstack because of the "fat" reputation, this is the release that addresses it directly: the always-loaded surface is now competitive with stripped-down skill packs while every skill keeps its full body content.
If you contribute skills, the eval-first floor means a new SKILL.md without an entry in `test/skill-coverage-matrix.ts` fails CI. The minimum entry is one line referencing `test/skill-coverage-floor.test.ts` (the free structural-compliance smoke test). Behavioral E2E coverage gets layered on top per skill.
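A minimal sketch of what such an entry and the floor check might look like. `SKILL_COVERAGE` and the `gate[]` field are named in this release; the `CoverageEntry` shape, the `my-new-skill` key, and the `skillsMissingGate` helper are assumptions for illustration, not the actual contents of `test/skill-coverage-matrix.ts`.

```typescript
// Assumed entry shape: each skill maps to gate-tier and periodic-tier tests.
interface CoverageEntry {
  gate: string[];
  periodic?: string[];
}

const SKILL_COVERAGE: Record<string, CoverageEntry> = {
  // Minimum entry: one gate-tier reference to the structural floor test.
  "my-new-skill": { gate: ["test/skill-coverage-floor.test.ts"] },
};

// Lists skills that would fail the floor: no entry, or an empty gate list.
function skillsMissingGate(
  skills: string[],
  matrix: Record<string, CoverageEntry>,
): string[] {
  return skills.filter((skill) => (matrix[skill]?.gate.length ?? 0) === 0);
}
```

A CI gate built this way would fail whenever `skillsMissingGate` returns a non-empty list.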
If you run gstack in CI, the new `EVALS_BUDGET_HARD_CAP=$30` cap (per-suite: gate $25 / periodic $70) stops runaway eval costs from a model price change or infinite-retry bug. Override path exists for legit-need-more cases: `EVALS_BUDGET_OVERRIDE_REASON="why this is OK"` logs to `~/.gstack/analytics/spend-overrides.jsonl` for audit.
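The cap-and-override behavior can be sketched as follows. The two environment variable names and the $30 default come from this release; everything else (the `Env` type, `checkEvalBudget`, accepting values with or without a leading `$`) is an assumption. The sketch only returns a decision and does not write the audit log.

```typescript
type Env = Record<string, string | undefined>;

interface BudgetDecision {
  allowed: boolean;
  overridden: boolean;
  reason?: string;
}

// Parses "30" or "$30"; falls back when the value is missing or invalid.
function parseDollars(raw: string | undefined, fallback: number): number {
  const cleaned = (raw ?? "").trim().replace(/^\$/, "");
  if (cleaned === "") return fallback;
  const n = Number(cleaned);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

// A run is blocked only when it exceeds the cap and no reason is given.
function checkEvalBudget(spentUsd: number, env: Env): BudgetDecision {
  const cap = parseDollars(env["EVALS_BUDGET_HARD_CAP"], 30);
  if (spentUsd <= cap) return { allowed: true, overridden: false };
  const reason = env["EVALS_BUDGET_OVERRIDE_REASON"]?.trim();
  if (reason) return { allowed: true, overridden: true, reason };
  return { allowed: false, overridden: false };
}
```

In a real gate, a decision with `overridden: true` is the point at which the reason would be appended to the audit trail.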
### Itemized changes
**Added**
- `scripts/capture-baseline.ts` + `test/helpers/capture-parity-baseline.ts` — captures per-skill SKILL.md sizes, token estimates, frontmatter description lengths, and eval coverage flags. Writes JSON snapshots used by the parity and size-budget gates. Locks `test/fixtures/parity-baseline-v1.44.1.json` as the v1→v2 reference.
- `test/helpers/parity-harness.ts` + `test/parity-suite.test.ts` — cathedral parity-eval suite floor. `PARITY_INVARIANTS` registry pins must-preserve phrases per skill family (cso: OWASP/STRIDE; plan-ceo: SCOPE EXPANSION / HOLD SCOPE; ship: VERSION/CHANGELOG/PR) so future compression can't silently strip load-bearing prose.
- `test/skill-coverage-matrix.ts` + `test/skill-coverage-matrix.test.ts` — single source of truth mapping each skill to gate + periodic tests; CI gate asserts every skill has at least one gate-tier entry. 51 skills, 51 entries.
- `test/skill-coverage-floor.test.ts` — per-skill structural-compliance smoke test (file-IO, free). Verifies frontmatter shape, generated header, body non-trivial, no leaked `{{TEMPLATE}}` placeholders, catalog-trim contract on description. 309 assertions across 51 skills.
- `test/skill-size-budget.test.ts` — per-skill SKILL.md byte budget (×1.05 default ratio), total corpus budget, catalog token budget (≤7000 for v1.46). Caught regressions get a per-skill breakdown + override path.
- `test/cso-preserved.test.ts` — pins cso's must-not-strip security guidance phrases (OWASP, STRIDE, daily/comprehensive mode discipline, confidence scoring, active verification). Future compression that hits cso fails CI here.
- `test/helpers/budget-override.ts` — audit-trail logger for `GSTACK_SIZE_BUDGET_OVERRIDE_REASON` and `EVALS_BUDGET_OVERRIDE_REASON`. Append-only JSONL at `~/.gstack/analytics/spend-overrides.jsonl` with timestamp + scope + reason + CI provenance.
- `scripts/proactive-suggestions.json` — per-run registry of routing prose + voice triggers extracted from skill frontmatter during catalog trim. Agents pull on demand instead of paying for it always-loaded.
- `--catalog-mode=full` build flag — restores v1.44 legacy multi-line catalog descriptions. Use when debugging routing regressions or when shipping skills to hosts that depend on the legacy fat catalog.
- `--explain-level=terse` build flag — opt-in compression of `## Writing Style` + `## Completeness Principle` + `## Confusion Protocol` + `## Context Health` preamble sections. Default build keeps the runtime-conditional behavior intact (the model still skips when `EXPLAIN_LEVEL: terse` appears in the preamble echo); terse build makes the compression structural.
- `EVALS_BUDGET_HARD_CAP` environment variable (umbrella $30 default) + per-suite `EVALS_BUDGET_HARD_CAP_GATE=$25`, `EVALS_BUDGET_HARD_CAP_PERIODIC=$70`. Build fails if a single run exceeds; `EVALS_BUDGET_OVERRIDE_REASON` env unblocks + audit-logs.
**Changed**
- Skill frontmatter `description:` blocks across 51 skills trimmed to a single lead sentence + `(gstack)` tag. Routing prose ("Use when asked to...", "Proactively suggest...") and voice triggers moved to a `## When to invoke` body section in each SKILL.md. Always-loaded catalog cost drops ~56%.
- Jargon list (`scripts/jargon-list.json`, 80 terms) no longer inlined into every tier-2+ skill. `## Writing Style` now references the JSON path; agents Read it once per session on first jargon term encountered. Saves ~70 KB of duplicated text across the corpus.
- `ResolverEntry` union type in `scripts/resolvers/types.ts` + `unwrapResolver` helper. Resolvers can now be either bare functions (current behavior) or `{ resolve, appliesTo? }` gated entries. `scripts/gen-skill-docs.ts:444` checks the gate before invocation. Infrastructure for future per-skill resolver gating; all current resolvers stay bare functions and work unchanged.
- `TemplateContext` gains an optional `explainLevel: 'default' | 'terse'` field threaded from the `--explain-level` build flag.
**Fixed**
- Catalog descriptions no longer collide with adjacent YAML fields (initial implementation produced `description: ... (gstack)allowed-tools:` with no newline; fixed by appending `\n` to the replacement).
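To make the trim and the newline fix concrete, here is a rough sketch, not the actual `gen-skill-docs` code. `trimDescription` and `renderDescriptionField` are hypothetical names, and the sentence split is a naive heuristic that assumes the lead sentence ends at the first `.`, `!` or `?` followed by whitespace.

```typescript
interface TrimResult {
  description: string;  // single-line frontmatter value
  whenToInvoke: string; // prose destined for the body section
}

// Splits a multi-line description into a lead sentence and the remainder.
function trimDescription(full: string): TrimResult {
  const flat = full
    .replace(/\s*\(gstack\)\s*/g, " ") // the tag is re-added after the lead
    .replace(/\s+/g, " ")
    .trim();
  const match = /^(.*?[.!?])(\s+|$)/.exec(flat);
  const lead = match?.[1] ?? flat;
  return {
    description: `${lead} (gstack)`,
    whenToInvoke: flat.slice(lead.length).trim(),
  };
}

// The trailing "\n" keeps the next YAML key on its own line.
function renderDescriptionField(trimmed: TrimResult): string {
  return `description: ${trimmed.description}\n`;
}
```

Concatenating `renderDescriptionField(...)` with `allowed-tools:` yields two separate YAML lines, which is the collision the fix addresses.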
**For contributors**
- New skills require an entry in `test/skill-coverage-matrix.ts` — at minimum referencing `test/skill-coverage-floor.test.ts` in `gate[]`. The CI gate at `test/skill-coverage-matrix.test.ts` fails fast on missing entries.
- New must-preserve invariants for a skill family go in `PARITY_INVARIANTS` in `test/helpers/parity-harness.ts`. Adding invariants is additive; removing one is a deliberate scope decision.
- The `scripts/jargon-list.json` is the canonical glossary. Add terms there; gen-skill-docs picks them up automatically on next regen.
- `test/fixtures/parity-baseline-v1.44.1.json` is the locked v1→v2 reference. Do not modify; capture new snapshots at later tags via `bun run scripts/capture-baseline.ts --tag <name>`.
## [1.45.0.0] - 2026-05-25
## **Design boards now live 24 hours, not 10 minutes. One daemon hosts every board, one tab survives the whole day.**
Run `$D compare --serve` and you get a persistent design daemon at `.gstack/design.json` instead of a fresh process per call. Open three design sessions across an afternoon and they all land at `/boards/<id>/` on the same port. The browser tab you opened first still works for the board you published an hour later. The idle timeout went from 10 minutes (the old per-process server) to 24 hours of inactivity (the daemon's lifetime). Submit a board, the URL stays accessible until the daemon idles out, so you can scroll back through the day's design history at `http://127.0.0.1:N/`.
Skill invocations (`/design-shotgun`, `/design-consultation`, `/plan-design-review`, `/design-review`, `/office-hours`) keep calling `$D compare --serve` exactly the same way. The CLI shape is unchanged. What's different is the binary now self-execs into daemon mode under the hood, attaches to a running daemon if one is there, spawns a fresh one if not, and prints `BOARD_PUBLISHED: http://127.0.0.1:N/boards/<id>/` to stderr so the skill can echo the URL. The legacy `--no-daemon` flag preserves the old single-process behavior for tests and debugging.
### The numbers that matter
Source: `bun test design/test/` and `git diff origin/main...HEAD --stat`.
| Metric | Before | After | Δ |
|-----------------------------------------|---------------|---------------|----------------|
| Idle timeout per board | 10 minutes | 24 hours | 144× |
| Server processes for N boards | N | 1 | N× |
| Browser tabs to keep open | one per board | one total | N× |
| Design tests in repo | 16 | 77 | +61 |
| Test paths covered (failure modes) | not enumerated| 38 / 100% | full coverage |
| Plan-review findings absorbed pre-impl  | 2             | 19            | +17 from Codex |
| Component | New lines | Test lines |
|----------------------------|-----------|------------|
| design/src/daemon.ts | ~580 | 34 tests |
| design/src/daemon-client.ts| ~340 | 23 tests |
| design/src/daemon-state.ts | ~180 | (via client + daemon tests; direct stale-lock reclaim coverage) |
| Browser round-trip via HTTP| (existed) | 4 tests |
The compression: 61 new tests cover every endpoint, lifecycle path, LRU eviction, real idle-shutdown behavior (spawn-based, daemon process observed exiting after `IDLE_MS`), the bare-GET-doesn't-reset-idle invariant (poll loop in background, daemon still idles out), the idle-with-active-boards extension path with `MAX_EXTENSIONS` hard ceiling, concurrent-CLIs lock race (two parallel `ensureDaemon` calls converge on one daemon), identity-verified spawn, version mismatch with and without active boards, PID-reuse safety, path traversal rejection, malformed-body negatives on every POST, and cross-board feedback isolation. The plan-review pass caught 2 architectural issues in-house; an outside Codex pass caught 17 more, all absorbed into the implementation before any code was written; the /ship review army caught 1 backwards-compat break in skill resolvers (fixed) + 5 deferred test gaps (filled). The version-mismatch path now refuses to silently kill a daemon with active boards (it prints a warning and exits 1), so upgrading gstack mid-design-session doesn't drop your in-memory board history.
### What this means for the builder
Open `/design-shotgun` Monday morning, work through three rounds of variants, walk away for lunch, come back, click Submit. The board is still there. Open a second `/design-shotgun` for a different feature in the afternoon, get a new URL at `/boards/<another-id>/`, no port churn, your morning board still works. The whole day's worth of design exploration accumulates as a browsable history at the daemon's root. Stop worrying about the 10-minute death clock.
### Itemized changes
#### Added
- **Persistent design daemon** (`design/src/daemon.ts`). Bun HTTP server on `127.0.0.1` hosting many boards under `/boards/<id>/`. Per-board state machine (`serving | regenerating | done`), LRU cap of 50 boards (evicts `done` first, returns 503 when 50 non-done coexist), 24h idle timeout with 1h extensions up to a 28h ceiling when boards are still active, per-board async mutex serializing feedback POST vs reload POST. Index page at `/` lists recent boards newest first.
- **`$D daemon status`** and **`$D daemon stop [--force]`**. The stop sub-command refuses without `--force` when active boards exist, so a casual stop doesn't drop in-flight history.
- **Daemon client** (`design/src/daemon-client.ts`). `ensureDaemon()` handles spawn-or-attach with file-lock-protected spawn (re-reads state inside the lock to close the two-CLIs-race window) and identity-verified SIGTERM (reads `/proc/PID/cmdline` on Linux, `ps -p PID -o command=` on macOS, only signals if `gstack-design-daemon` is in the cmdline). PID-reuse safety: if the state file points at a PID belonging to an unrelated process, no signal is sent and a fresh daemon spawns. Version-mismatch refusal: if a CLI from a newer gstack version arrives while boards are still open in an older daemon, the CLI prints a user-actionable warning and exits 1 instead of silently restarting and losing history.
- **Shared daemon state utilities** (`design/src/daemon-state.ts`). Atomic state-file write (`<tmp>` + `renameSync` at mode `0o600`), `fs.openSync('wx')` exclusive lock, cross-platform cmdline reader, version lookup that falls back through `DESIGN_DAEMON_VERSION` env → `design/dist/.version` baked at build time → source-tree `VERSION` → `"unknown"`.
- **End-to-end round-trip tests against a real spawned daemon** (`design/test/feedback-roundtrip-daemon.test.ts`). HTTP fetch drives publish → submit → regenerate → reload → round-2 submit, asserting `feedback.json` lands at the daemon-derived `sourceDir` with `boardId` and `publishedAt` augmented fields.
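The board cap described in the daemon entry above (evict `done` boards first, refuse when every slot is non-done) can be sketched like this. The state names, the cap of 50, and the 503 come from that entry; the `Board` fields, `admitBoard`, and picking the least recently used `done` board as the victim are assumptions about how the real `daemon.ts` behaves.

```typescript
type BoardState = "serving" | "regenerating" | "done";

interface Board {
  id: string;
  state: BoardState;
  lastUsedAt: number; // epoch ms; hypothetical field
}

type Admission =
  | { ok: true; evicted: string | null }
  | { ok: false; status: 503 };

// Admits a board, evicting the least recently used `done` board when full.
function admitBoard(
  boards: Map<string, Board>,
  incoming: Board,
  cap: number = 50,
): Admission {
  let evicted: string | null = null;
  if (boards.size >= cap) {
    let victim: Board | null = null;
    for (const board of boards.values()) {
      if (board.state !== "done") continue; // never evict active boards
      if (victim === null || board.lastUsedAt < victim.lastUsedAt) victim = board;
    }
    if (victim === null) return { ok: false, status: 503 };
    boards.delete(victim.id);
    evicted = victim.id;
  }
  boards.set(incoming.id, incoming);
  return { ok: true, evicted };
}
```

Refusing rather than evicting an active board trades availability for not dropping in-flight work.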
#### Changed
- **Board JS uses relative URLs** instead of an injected `__GSTACK_SERVER_URL` global. The same generated HTML works at `/` (legacy `--no-daemon`) and `/boards/<id>/` (daemon). `location.protocol` feature-detect keeps the `file://` DOM-only fallback path working.
- **Bare `GET /boards/<id>` returns 301** to `/boards/<id>/`. The trailing slash is load-bearing for relative-URL resolution in the board JS; without it, `fetch('./api/feedback')` would resolve to the wrong scope.
- **Reload guard rejects directory paths**. `design/src/serve.ts:200-212` previously let `resolvedReload === allowedDir` through, which then crashed `readFileSync` with `EISDIR`. Now requires `statSync(resolvedReload).isFile()` with a clear 400 instead.
- **Feedback files carry `boardId` and `publishedAt`** so agents polling `feedback.json` / `feedback-pending.json` in a multi-board world can verify which board produced what.
- **`sourceDir` is derived from `realpath(html)` server-side**, never trusted from the publish POST body.
- **Skill resolvers and templates** (`scripts/resolvers/design.ts`, `design-shotgun/SKILL.md`, `design-consultation/SKILL.md`, `plan-design-review/SKILL.md`, `office-hours/SKILL.md`) updated to parse `BOARD_URL:` from stderr and POST reloads to `${BOARD_URL}api/reload` instead of the legacy port-only `/api/reload`. Legacy `SERVE_STARTED: port=N html=...` line still emitted for back-compat.
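Why the trailing slash matters follows from standard URL resolution, which the snippet below demonstrates with the built-in `URL` class. The port `4000` and board id `abc` are made up, and `canonicalBoardPath` is a hypothetical helper showing one way to compute the redirect target, not the daemon's actual routing code.

```typescript
// A relative "./x" replaces the last path segment of the base URL.
const withoutSlash = new URL("./api/feedback", "http://127.0.0.1:4000/boards/abc");
const withSlash = new URL("./api/feedback", "http://127.0.0.1:4000/boards/abc/");

console.log(withoutSlash.pathname); // "/boards/api/feedback" (wrong scope)
console.log(withSlash.pathname);    // "/boards/abc/api/feedback"

// Returns the redirect target for a bare /boards/<id>, or null otherwise.
function canonicalBoardPath(pathname: string): string | null {
  const m = /^\/boards\/([^/]+)$/.exec(pathname);
  return m ? `/boards/${m[1]}/` : null;
}
```

Redirecting the bare path once lets every relative fetch in the board JS stay scoped to its own board.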
#### Fixed
- **Compiled design binary self-execs as the daemon** via a `--daemon-mode` flag, so the daemon lifecycle works for users installing from `design/dist/design` (not just `bun run` against the source tree).
- **Version lookup** is consistent between client and daemon. Both go through `readVersionString()`, so the version-mismatch refusal path works on the compiled binary instead of always reading `"unknown"` and matching itself.
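The fallback order for the version string can be illustrated with a pure function. The three sources and the `"unknown"` default come from this release's notes; the `VersionSources` type and the `resolveVersion` name are hypothetical, and the real `readVersionString()` presumably reads the environment and files itself rather than taking them as arguments.

```typescript
// Each field holds the raw value from one source, if it was available.
interface VersionSources {
  env?: string;        // DESIGN_DAEMON_VERSION
  bakedFile?: string;  // contents of design/dist/.version
  sourceTree?: string; // contents of the source-tree VERSION file
}

// Returns the first non-empty candidate in priority order.
function resolveVersion(sources: VersionSources): string {
  const candidates = [sources.env, sources.bakedFile, sources.sourceTree];
  for (const candidate of candidates) {
    const value = candidate?.trim();
    if (value) return value;
  }
  return "unknown";
}
```

Because client and daemon share one resolver, both sides compare like with like in the version-mismatch check.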
#### For contributors
- **Test infrastructure split**: `design/test/daemon.test.ts` (30 in-process tests against the exported `fetchHandler`, ~70ms) for fast iteration; `design/test/daemon-discovery.test.ts` (17 real-spawn tests, ~8s) for lifecycle + lock + identity guarantees. Shared helpers in `design/test/daemon-tests-fixtures.ts`.
- **Plan-review process**: this branch ran `/plan-eng-review` twice. Round 1 caught 2 architecture findings. An outside-voice Codex pass after round 1 found 17 more (URL contract self-contradiction, false test-green claim, lock semantics, identity verification, version-mismatch silent data loss, several others). Round 2 absorbed all 17 before implementation started. The full review trail is preserved in the plan file's `## GSTACK REVIEW REPORT` section.
## [1.44.1.0] - 2026-05-24
## **Nine community fixes ship in one bundle.** Office-hours session counter works again, iOS QA tunnels survive macOS 26.x, Windows brain-sync stops dropping artifacts, browse server tells you whether the bind failure was a port collision or a sandbox block.
The fix wave pattern runs its second pass after v1.43.2.0's 15-PR Daegu wave. Nine contributor PRs land in eleven commits plus a merge from new main. Each cherry-pick routes through `git cherry-pick` per-commit so contributor authorship survives in `git log --author`, with `Co-Authored-By` trailers for GitHub's contribution UI. Wave-meta files (VERSION, CHANGELOG, version-only `package.json` bumps) stripped per cherry-pick so the wave owns its own bump cleanly.
The triage caught a real failure mode mid-flight. An initial scope of 18 PRs went through Codex review as outside voice; Codex flagged that 9 of the 18 had already shipped via v1.43.2.0 or sibling commits. Verified against current main (`bin/gstack-gbrain-sync.ts:404` already wraps `{sources:[...]}`, `browser-manager.ts:30` already has `isCustomChromium`, `server.ts:209` already has `ownsTerminalAgent`). Recompute trimmed the wave from 18 to 9, saving nine empty cherry-picks and nine misleading "landed in" close comments to contributors whose work had already merged via another route.
### The numbers that matter
Source: `git log origin/main..HEAD` and `gh pr view --json closingIssuesReferences` per wave PR.
| Metric | Value |
|----------------------------------------------|------------|
| Community PRs landed | 9 |
| Distinct contributors credited | 9 |
| Issues auto-closed by merge | 4 |
| Files changed | 26 |
| Lines added | 1,651 |
| Lines removed | 114 |
| Wave commits (excluding merge) | 11 |
| Already-shipped PRs caught + politely closed | 9 |
| Paid eval suites that ran (all PASS) | 6 |
### What this means for contributors
Your fix lands as a commit with your name in `git log --author=<your-handle>`. If your PR had multiple commits, each lands separately so dates and trailers survive. If your fix was the same as something that shipped via another route in v1.43.2.0, you get a close comment pointing at the CHANGELOG line that credits you by name. The recompute step that catches duplicates is now part of every future fix wave.
### Itemized changes
**Added**
- `/investigate` freeze hook resolves on standalone marketplace installs. Falls back through both bundled and standalone freeze-bin paths instead of crashing on a hardcoded `../freeze/` lookup. Closes #1647. Contributed by @Gujiassh via PR #1648.
- `gstack-next-version --version-path` flag plus `.gstack/version-path` config: monorepo VERSION layouts now work. Contributed by @cfeddersen via PR #1627.
**Fixed**
- `/office-hours` SESSION_COUNT stuck at 0 since v1.0. Writer wrote to legacy `builder-profile.jsonl`, reader read from new `developer-profile.json`. Reader-path auto-migrates existing legacy data on first call; existing users keep their session history. 33 regression tests plus a static-grep invariant pinning the no-legacy-writes contract. Closes #1671, #1677. Contributed by @pryow via PR #1676.
- `gstack-timeline-read --branch "feature/o'hare"` no longer breaks on single-quoted branch names. Filters passed as data, not interpolated into a shell command. Closes #1634. Contributed by @jbetala7 via PR #1635.
- `browse` server localhost bind: distinguishes `EADDRINUSE` (real port collision) from sandbox `EPERM` (Codex/Conductor shell sandbox blocking the bind syscall). Tells the user which one happened. Contributed by @spacegeologist via PR #1664.
- `v1.40.0.0` migration on jq-less machines: defers done-marker until every repair succeeds, instead of writing it unconditionally. Re-runs the migration on next upgrade for users who hit the pre-fix path. 8-case regression test. Closes #1581. Contributed by @stedfn via PR #1589.
- Three Windows brain-sync bugs: backslash vs forward-slash globs, bash-shebang subprocess fail on `cmd.exe`, CRLF on stdout breaking `git add`. Static-invariant tests added to `windows-free-tests.yml`. Contributed by @daveowenatl via PR #1672.
- `gstack-diff-scope` detects `bun.lock` (Bun v1.2+ text lockfile) alongside `bun.lockb`. Without this, eval-select skipped lockfile changes on Bun 1.2+. Contributed by @hiSandog via PR #1649.
- iOS QA on macOS 26.x: `coredevice.local` resolution falls through `xcrun devicectl` → `dns.lookup` → `dns.resolve6` so the tunnel comes up even when mDNSResponder is bypassed. Tunnel keepalive added so long-running QA sessions survive. Contributed by @sternryan via PR #1673.
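As an illustration of the `browse` bind-failure fix listed above, the distinction comes down to branching on the error code. `EADDRINUSE` and `EPERM` are standard system error codes; the `explainBindFailure` name and the message wording are invented for this sketch and are not the actual server output.

```typescript
// Maps a bind error code to a user-facing explanation (hypothetical text).
function explainBindFailure(code: string | undefined, port: number): string {
  switch (code) {
    case "EADDRINUSE":
      return `Port ${port} is already in use by another process.`;
    case "EPERM":
      return `Binding to port ${port} was blocked, likely by a shell sandbox.`;
    default:
      return `Could not bind to port ${port} (${code ?? "unknown error"}).`;
  }
}
```

Separating the two cases matters because the remedies differ: a collision calls for another port, a sandbox block calls for running outside the sandbox.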
## [1.44.0.0] - 2026-05-23
## **Sidebar Claude Code now survives the day.** WebSocket keepalive, transparent re-attach across network blips with scrollback intact, and a restart button that actually kills the old claude before spawning the new one. Outer supervisor opt-in so the browse server itself can crash and recover without you noticing.


@ -2,11 +2,7 @@
name: gstack
preamble-tier: 1
version: 1.1.0
description: |
Fast headless browser for QA testing and site dogfooding. Navigate pages, interact with
elements, verify state, diff before/after, take annotated screenshots, test responsive
layouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or
test a site, verify a deployment, dogfood a user flow, or file a bug with screenshots. (gstack)
description: Fast headless browser for QA testing and site dogfooding. (gstack)
allowed-tools:
- Bash
- Read
@ -21,6 +17,14 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Navigate pages, interact with
elements, verify state, diff before/after, take annotated screenshots, test responsive
layouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or
test a site, verify a deployment, dogfood a user flow, or file a bug with screenshots.
## Preamble (run first)
```bash


@ -1,5 +1,53 @@
# TODOS
## design daemon: follow-ups (filed v1.45.0.0 via /ship review army)
### ✅ DONE (v1.45.0.0): Tighten daemon test coverage
**Resolved in commit `6b037c55` (same PR):** All 5 test gaps filled before
landing. Per-file totals after: serve 16, daemon 34, daemon-discovery 23,
feedback-roundtrip-daemon 4 = 77 (+10 from initial ship). Specifically:
- Idle-shutdown actually fires (spawn-based, daemon process observed exiting,
state file removed).
- Bare GET polling doesn't reset idle (hammers `/api/progress` in background,
daemon still idles out).
- Idle-with-active-boards extends, then force-shuts after MAX_EXTENSIONS
(with `DESIGN_DAEMON_EXTENSION_MS=1500` + `MAX_EXTENSIONS=2`).
- Concurrent `ensureDaemon()` race converges on one daemon (lock wins).
- Stale-lock reclaim (dead PID succeeds, alive unrelated PID refuses).
- Malformed-JSON + non-object + array-body + missing-html negatives for
`POST /api/boards` and `POST /boards/<id>/api/reload`.
### P3: Minor maintainability nits from /ship review
- `design/src/cli.ts` and `design/src/serve.ts` both have a small `openBrowser`
helper with identical darwin/linux/else branches. Extract a shared
`design/src/open-browser.ts`.
- `design/src/daemon-client.ts:320` (`AbortSignal.timeout(2000)`) and `:357`
(`delay(50)`) use bare numeric literals while sibling timeouts are named
constants. Promote to `SHUTDOWN_POST_TIMEOUT_MS` and `ALIVE_POLL_INTERVAL_MS`.
- `design/src/daemon-state.ts:21` `serverPath` field is written
(`daemon.ts:541`) but never read by production code. Either remove or
document the forensic intent.
### P3: Daemon scope deferred from v1.45.0.0 plan
Originally listed in the plan's "TODOs surfaced for later" section:
- Per-daemon scoped auth tokens (only relevant once a tunnel/share use case appears).
- Optional persistent board history on disk in
`~/.gstack/projects/$SLUG/designs/history/` so submitted boards survive
daemon restarts.
- Windows spawn branch lifted from browse (V1 daemon is macOS + Linux;
Windows users fall back to legacy `--no-daemon` per-process server).
- `$D board list` / `$D board stop <id>` per-board ops CLI (V1 has only
`$D daemon status` / `stop`).
- Cross-worktree daemon attach (conductor sibling worktrees of the same
repo currently each spawn their own daemon — matches browse; revisit
if it causes friction).
---
## browse server: terminal-agent teardown follow-ups (filed v1.41 via /plan-eng-review)
### ✅ DONE (v1.44.0.0): Identity-based terminal-agent kill (replace pkill regex with PID)


@ -1 +1 @@
1.45.0.0
1.47.0.0


@ -2,16 +2,7 @@
name: autoplan
preamble-tier: 3
version: 1.0.0
description: |
Auto-review pipeline — reads the full CEO, design, eng, and DX review skills from disk
and runs them sequentially with auto-decisions using 6 decision principles. Surfaces
taste decisions (close approaches, borderline scope, codex disagreements) at a final
approval gate. One command, fully reviewed plan out.
Use when asked to "auto review", "autoplan", "run all reviews", "review this plan
automatically", or "make the decisions for me".
Proactively suggest when the user has a plan file and wants to run the full review
gauntlet without answering 15-30 intermediate questions. (gstack)
Voice triggers (speech-to-text aliases): "auto plan", "automatic review".
description: Auto-review pipeline — reads the full CEO, design, eng, and DX review skills from disk and runs them sequentially with auto-decisions using 6 decision principles. (gstack)
benefits-from: [office-hours]
triggers:
- run all reviews
@ -30,6 +21,19 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Surfaces taste decisions (close approaches, borderline scope, codex disagreements)
at a final approval gate. One command, fully reviewed plan out.
Use when asked to "auto review", "autoplan", "run all reviews", "review this plan
automatically", or "make the decisions for me".
Proactively suggest when the user has a plan file and wants to run the full review
gauntlet without answering 15-30 intermediate questions.
Voice triggers (speech-to-text aliases): "auto plan", "automatic review".
## Preamble (run first)
```bash
@ -570,84 +574,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,14 +2,7 @@
name: benchmark-models
preamble-tier: 1
version: 1.0.0
description: |
Cross-model benchmark for gstack skills. Runs the same prompt through Claude,
GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,
and optionally quality via LLM judge. Answers "which model is actually best
for this skill?" with data instead of vibes. Separate from /benchmark, which
measures web page performance. Use when: "benchmark models", "compare models",
"which model is best for X", "cross-model comparison", "model shootout". (gstack)
Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
description: Cross-model benchmark for gstack skills. (gstack)
triggers:
- cross model benchmark
- compare claude gpt gemini
@ -23,6 +16,18 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Runs the same prompt through Claude,
GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,
and optionally quality via LLM judge. Answers "which model is actually best
for this skill?" with data instead of vibes. Separate from /benchmark, which
measures web page performance. Use when: "benchmark models", "compare models",
"which model is best for X", "cross-model comparison", "model shootout".
Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
## Preamble (run first)
```bash

View File

@ -2,13 +2,7 @@
name: benchmark
preamble-tier: 1
version: 1.0.0
description: |
Performance regression detection using the browse daemon. Establishes
baselines for page load times, Core Web Vitals, and resource sizes.
Compares before/after on every PR. Tracks performance trends over time.
Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
"bundle size", "load time". (gstack)
Voice triggers (speech-to-text aliases): "speed test", "check performance".
description: Performance regression detection using the browse daemon. (gstack)
triggers:
- performance benchmark
- check page speed
@ -23,6 +17,17 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Establishes baselines for page load times, Core Web Vitals, and resource sizes.
Compares before/after on every PR. Tracks performance trends over time.
Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
"bundle size", "load time".
Voice triggers (speech-to-text aliases): "speed test", "check performance".
## Preamble (run first)
```bash

View File

@ -136,7 +136,11 @@ def load_privacy_map(path):
allowlist_globs = load_lines(allowlist_path)
privacy_map = load_privacy_map(privacy_path)
skip_lines = set(load_lines(skip_path))
# Normalize skip entries to the POSIX form queued paths use, so a backslash
# entry in .brain-skip.txt still matches on Windows. The drain is the safety
# boundary that actually stages files, so it must normalize identically to
# discover_new — otherwise an explicitly-skipped file gets committed.
skip_lines = {s.replace(os.sep, "/") for s in load_lines(skip_path)}
# Read queue; collect unique file paths.
queue_paths = set()
@ -253,6 +257,8 @@ subcmd_once() {
# Stage with git add -f (forces past .gitignore=*) explicit paths only.
while IFS= read -r p; do
p="${p%$'\r'}" # Windows: compute_paths_to_stage's python print() emits CRLF;
# a trailing CR makes the pathspec match nothing (silent no-stage).
[ -z "$p" ] && continue
git -C "$GSTACK_HOME" add -f -- "$p" 2>/dev/null || true
done < "$paths_file"
@ -376,10 +382,13 @@ subcmd_discover_new() {
exit 0
fi
# Walk allowlist globs; enqueue any file where mtime+size differs from cursor.
python3 - "$GSTACK_HOME" "$ALLOWLIST" "$DISCOVER_CURSOR" "$SCRIPT_DIR/gstack-brain-enqueue" <<'PYEOF' 2>/dev/null || true
import sys, os, json, glob, fnmatch, subprocess, hashlib
python3 - "$GSTACK_HOME" "$ALLOWLIST" "$DISCOVER_CURSOR" <<'PYEOF' 2>/dev/null || true
import sys, os, json, fnmatch
from datetime import datetime, timezone
gstack_home, allowlist_path, cursor_path, enqueue_bin = sys.argv[1:5]
gstack_home, allowlist_path, cursor_path = sys.argv[1:4]
queue_path = os.path.join(gstack_home, ".brain-queue.jsonl")
skip_path = os.path.join(gstack_home, ".brain-skip.txt")
def load_lines(path):
try:
@ -403,8 +412,12 @@ def save_cursor(path, data):
pass
allowlist = load_lines(allowlist_path)
# Normalize skip entries to the same POSIX form as `rel` below, so a
# backslash entry in .brain-skip.txt still matches a normalized path on Windows.
skip = {s.replace(os.sep, "/") for s in load_lines(skip_path)}
cursor = load_cursor(cursor_path)
new_cursor = dict(cursor)
to_enqueue = []
# Walk all files under gstack_home, match against allowlist.
for root, dirs, files in os.walk(gstack_home):
@ -413,22 +426,54 @@ for root, dirs, files in os.walk(gstack_home):
continue
for name in files:
full = os.path.join(root, name)
rel = os.path.relpath(full, gstack_home)
# Repo paths are POSIX-relative. os.path.relpath yields backslash
# separators on Windows, which never match the forward-slash allowlist
# globs (e.g. "projects/*/learnings.jsonl"), so discovery silently
# enqueued nothing under projects/ on Windows. Normalize to "/".
rel = os.path.relpath(full, gstack_home).replace(os.sep, "/")
if rel.startswith(".brain-"):
continue
matched = any(fnmatch.fnmatchcase(rel, pat) for pat in allowlist)
if not matched:
if not any(fnmatch.fnmatchcase(rel, pat) for pat in allowlist):
continue
if rel in skip:
continue
try:
st = os.stat(full)
key = f"{int(st.st_mtime)}:{st.st_size}"
except OSError:
continue
prev = cursor.get(rel)
if prev != key:
# Enqueue via the shim (respects sync mode + skip list).
subprocess.run([enqueue_bin, rel], check=False)
new_cursor[rel] = key
if cursor.get(rel) != key:
to_enqueue.append((rel, key))
# Append to the queue directly. The previous implementation shelled out to
# gstack-brain-enqueue once per file, but Windows Python cannot exec a
# bash-shebang script (the spawn fails with a fork error), so discovery
# enqueued nothing on Windows even after the path-match fix above.
# Writing the queue line here is platform-agnostic; the drain step
# (compute_paths_to_stage) still re-applies the skip-list + privacy filters.
if to_enqueue:
ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
try:
# One atomic append per record (O_APPEND, each line < PIPE_BUF), matching
# gstack-brain-enqueue's concurrency contract so a writer-shim append
# running in parallel can't interleave mid-record. Buffered text writes
# don't guarantee that. Compact separators match the shim's JSON shape.
fd = os.open(queue_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
try:
for rel, key in to_enqueue:
rec = json.dumps({"file": rel, "ts": ts}, separators=(",", ":"))
os.write(fd, (rec + "\n").encode("utf-8"))
finally:
os.close(fd)
except OSError:
# Queue write failed (disk full, AV file lock). Leave the cursor
# unadvanced so these files are retried on the next discover instead of
# being silently recorded as synced (which loses the change until the
# file next changes).
to_enqueue = []
# Advance the cursor only for records actually written.
for rel, key in to_enqueue:
new_cursor[rel] = key
save_cursor(cursor_path, new_cursor)
PYEOF
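
The path normalization in this hunk can be restated as a short TypeScript sketch for illustration; the script itself does this in Python. `toRepoRelative` and `normalizeSkip` are hypothetical names. `path.win32` is used so the Windows behavior can be reproduced on any platform. Unlike the Python code, which replaces `os.sep`, this sketch always replaces backslashes.

```typescript
import * as path from "node:path";

// Hypothetical helper: compute a repo-relative path and force POSIX separators.
// path.win32.relative yields backslash separators, as os.path.relpath does on Windows.
function toRepoRelative(root: string, full: string): string {
  return path.win32.relative(root, full).split("\\").join("/");
}

// Skip entries get the same normalization, so a backslash entry still matches.
function normalizeSkip(entries: string[]): Set<string> {
  return new Set(entries.map((entry) => entry.split("\\").join("/")));
}

const rel = toRepoRelative(
  "C:\\Users\\me\\.gstack",
  "C:\\Users\\me\\.gstack\\projects\\demo\\learnings.jsonl",
);
const skip = normalizeSkip(["projects\\demo\\learnings.jsonl"]);
console.log(rel, skip.has(rel));
```

Both sides of the comparison pass through the same normalization, which is the property the diff relies on: discovery and the drain step must agree on one path form, or a skipped file can slip through.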

View File

@ -17,6 +17,9 @@
# --check-mismatch detect meaningful gaps between declared and observed.
# --migrate migrate builder-profile.jsonl → developer-profile.json.
# Idempotent; archives the source file on success.
# --log-session append a session entry (from /office-hours) to
# sessions[] and update aggregates. Required fields:
# date, mode. Silent skip on invalid input.
#
# Profile file: ~/.gstack/developer-profile.json (unified schema — see
# docs/designs/PLAN_TUNING_V0.md). Event file: ~/.gstack/projects/{SLUG}/
@ -154,6 +157,65 @@ ensure_profile() {
EOF
}
# -----------------------------------------------------------------------
# Record session: append a session entry from /office-hours to sessions[]
# and update aggregates (signals_accumulated, resources_shown, topics).
# Fix for #1671: the writer side of the v1.0.0.0 migration. Reader and
# writer now share the same file.
# Silent skip on invalid input (matches gstack-timeline-log:22-26 pattern).
# -----------------------------------------------------------------------
do_log_session() {
local INPUT="${1:-}"
if [ -z "$INPUT" ]; then
return 0
fi
# Validate: input must be parseable JSON with required fields (date, mode).
if ! printf '%s' "$INPUT" | bun -e "
const j = JSON.parse(await Bun.stdin.text());
if (!j.date || !j.mode) process.exit(1);
" 2>/dev/null; then
return 0
fi
ensure_profile
local TMPOUT
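# Keep the X's at the end of the template: BSD/macOS mktemp only substitutes
# trailing X's, so a ".tmp" suffix after them may yield a fixed filename.
TMPOUT=$(mktemp "$GSTACK_HOME/developer-profile.json.tmp.XXXXXX")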
trap 'rm -f "$TMPOUT"' EXIT
PROFILE_FILE_PATH="$PROFILE_FILE" RECORD_INPUT="$INPUT" TMPOUT_PATH="$TMPOUT" bun -e "
const fs = require('fs');
const entry = JSON.parse(process.env.RECORD_INPUT);
if (!entry.ts) entry.ts = new Date().toISOString();
const profile = JSON.parse(fs.readFileSync(process.env.PROFILE_FILE_PATH, 'utf-8'));
profile.sessions = profile.sessions || [];
profile.sessions.push(entry);
profile.signals_accumulated = profile.signals_accumulated || {};
for (const s of (entry.signals || [])) {
profile.signals_accumulated[s] = (profile.signals_accumulated[s] || 0) + 1;
}
profile.resources_shown = profile.resources_shown || [];
const resSet = new Set(profile.resources_shown);
for (const r of (entry.resources_shown || [])) resSet.add(r);
profile.resources_shown = Array.from(resSet);
profile.topics = profile.topics || [];
const topicSet = new Set(profile.topics);
for (const t of (entry.topics || [])) topicSet.add(t);
profile.topics = Array.from(topicSet);
fs.writeFileSync(process.env.TMPOUT_PATH, JSON.stringify(profile, null, 2));
"
mv "$TMPOUT" "$PROFILE_FILE"
trap - EXIT
"$SCRIPT_DIR/gstack-brain-enqueue" "developer-profile.json" 2>/dev/null &
}
# -----------------------------------------------------------------------
# Read: emit legacy KEY: VALUE output for /office-hours compat.
# -----------------------------------------------------------------------
@ -168,14 +230,19 @@ do_read() {
else if (count >= 4) tier = 'regular';
else if (count >= 1) tier = 'welcome_back';
const last = sessions[count - 1] || {};
const prev = sessions[count - 2] || {};
// LAST_* / CROSS_PROJECT must reflect real sessions, not resource-tracking
// events (the Phase 6 auto-append). Without this filter, a session's
// resources entry written immediately after the real session would clobber
// LAST_PROJECT/LAST_ASSIGNMENT/LAST_DESIGN_TITLE.
const realSessions = sessions.filter(e => e.mode !== 'resources');
const last = realSessions[realSessions.length - 1] || {};
const prev = realSessions[realSessions.length - 2] || {};
const crossProject = prev.project_slug && last.project_slug
? prev.project_slug !== last.project_slug
: false;
const designs = sessions.map(e => e.design_doc || '').filter(Boolean);
const designTitles = sessions
const designs = realSessions.map(e => e.design_doc || '').filter(Boolean);
const designTitles = realSessions
.map(e => (e.design_doc ? (e.project_slug || 'unknown') : ''))
.filter(Boolean);
@ -441,6 +508,7 @@ case "$CMD" in
--vibe) do_vibe ;;
--check-mismatch) do_check_mismatch ;;
--migrate) do_migrate ;;
--log-session) do_log_session "$@" ;;
--help|-h) sed -n '1,/^set -euo/p' "$0" | sed 's|^# \?||' ;;
*)
echo "gstack-developer-profile: unknown subcommand '$CMD'" >&2
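
The aggregate update that `--log-session` performs inline can be sketched as a pure function. This is an illustration under assumptions, not the script's code: `applySession`, `Session`, and `Profile` are hypothetical names, and the `ts` stamping and file I/O from the diff are left out.

```typescript
type Session = {
  date: string;
  mode: string;
  signals?: string[];
  resources_shown?: string[];
  topics?: string[];
};

type Profile = {
  sessions?: Session[];
  signals_accumulated?: Record<string, number>;
  resources_shown?: string[];
  topics?: string[];
};

// Merge two string lists, dropping duplicates while keeping first-seen order.
function union(a: string[] = [], b: string[] = []): string[] {
  return Array.from(new Set(a.concat(b)));
}

// Hypothetical pure version of the update: append the entry, count its
// signals, and union its resources and topics into the profile aggregates.
function applySession(profile: Profile, entry: Session): Profile {
  const signals: Record<string, number> = { ...(profile.signals_accumulated ?? {}) };
  for (const s of entry.signals ?? []) signals[s] = (signals[s] ?? 0) + 1;
  return {
    ...profile,
    sessions: [...(profile.sessions ?? []), entry],
    signals_accumulated: signals,
    resources_shown: union(profile.resources_shown, entry.resources_shown),
    topics: union(profile.topics, entry.topics),
  };
}
```

Signals are counted per occurrence, while resources and topics are deduplicated, so logging the same session twice would double its signal counts but leave the other two lists unchanged.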

View File

@ -57,7 +57,7 @@ while IFS= read -r f; do
*.md) DOCS=true ;;
# Config
package.json|package-lock.json|yarn.lock|bun.lockb) CONFIG=true ;;
package.json|package-lock.json|yarn.lock|bun.lock|bun.lockb) CONFIG=true ;;
Gemfile|Gemfile.lock) CONFIG=true ;;
*.yml|*.yaml) CONFIG=true ;;
.github/*) CONFIG=true ;;

View File

@ -10,7 +10,14 @@
//
// Usage:
// gstack-next-version --base <branch> --bump <major|minor|patch|micro> \
// --current-version <X.Y.Z.W> [--workspace-root <path>|null] [--json]
// --current-version <X.Y.Z.W> [--workspace-root <path>|null] \
// [--version-path <path>] [--json]
//
// VERSION path resolution (monorepo support):
// 1. --version-path <path> CLI flag (highest priority)
// 2. .gstack/version-path file at the repo root (single-line relative path,
// committed so all collaborators benefit)
// 3. "VERSION" at the repo root (default, backward-compatible)
//
// Exit codes:
// 0 — emitted JSON successfully (may include "offline":true or "host":"unknown")
@ -45,6 +52,7 @@ type Output = {
version: string;
current_version: string;
base_version: string;
version_path: string;
bump: Bump;
host: "github" | "gitlab" | "unknown";
offline: boolean;
@ -114,6 +122,28 @@ function runCommand(cmd: string, args: string[], timeoutMs = 15000): { ok: boole
};
}
// VERSION-path resolution for monorepos. Priority: CLI flag > .gstack/version-path
// at repo root > "VERSION". Pure function; takes the repo root as an argument so
// tests can drive it with a fixture dir without mocking git.
function resolveVersionPath(override: string | undefined, repoRoot: string): string {
if (override) return override.trim();
const configFile = join(repoRoot, ".gstack", "version-path");
if (existsSync(configFile)) {
try {
const firstLine = readFileSync(configFile, "utf8").split("\n")[0]?.trim() ?? "";
if (firstLine) return firstLine;
} catch {
// fall through to default
}
}
return "VERSION";
}
function repoToplevel(): string {
const r = runCommand("git", ["rev-parse", "--show-toplevel"]);
return r.ok ? r.stdout.trim() : process.cwd();
}
function detectHost(): "github" | "gitlab" | "unknown" {
const remote = runCommand("git", ["remote", "get-url", "origin"]);
if (remote.ok) {
@ -128,19 +158,19 @@ function detectHost(): "github" | "gitlab" | "unknown" {
return "unknown";
}
function readBaseVersion(base: string, warnings: string[]): string {
function readBaseVersion(base: string, versionPath: string, warnings: string[]): string {
// git fetch is best-effort; we tolerate failure and fall back to whatever
// origin/<base> currently points at.
runCommand("git", ["fetch", "origin", base, "--quiet"], 10000);
const r = runCommand("git", ["show", `origin/${base}:VERSION`]);
const r = runCommand("git", ["show", `origin/${base}:${versionPath}`]);
if (!r.ok) {
warnings.push(`could not read VERSION at origin/${base}; assuming 0.0.0.0`);
warnings.push(`could not read ${versionPath} at origin/${base}; assuming 0.0.0.0`);
return "0.0.0.0";
}
return r.stdout.trim();
}
async function fetchGithubClaimed(base: string, excludePR: number | null, warnings: string[]): Promise<{ claimed: ClaimedPR[]; offline: boolean }> {
async function fetchGithubClaimed(base: string, versionPath: string, excludePR: number | null, warnings: string[]): Promise<{ claimed: ClaimedPR[]; offline: boolean }> {
const list = runCommand("gh", [
"pr",
"list",
@ -187,14 +217,18 @@ async function fetchGithubClaimed(base: string, excludePR: number | null, warnin
const pr = queue.shift();
if (!pr) return;
// gh passes branch name via argv, not shell — safe.
// encodeURI handles spaces in subproject paths (e.g. "Tinas Second Brain/...")
// while leaving "/" untouched so the GitHub Contents API gets the path intact.
const content = runCommand("gh", [
"api",
`repos/{owner}/{repo}/contents/VERSION?ref=${encodeURIComponent(pr.headRefName)}`,
`repos/{owner}/{repo}/contents/${encodeURI(versionPath)}?ref=${encodeURIComponent(pr.headRefName)}`,
"-q",
".content",
]);
if (!content.ok) {
warnings.push(`PR #${pr.number}: could not fetch VERSION (fork or private)`);
warnings.push(
`PR #${pr.number}: could not fetch ${versionPath} (fork, private, or wrong path — try --version-path or .gstack/version-path)`,
);
continue;
}
let versionStr: string;
@ -215,7 +249,7 @@ async function fetchGithubClaimed(base: string, excludePR: number | null, warnin
return { claimed: results, offline: false };
}
async function fetchGitlabClaimed(base: string, excludePR: number | null, warnings: string[]): Promise<{ claimed: ClaimedPR[]; offline: boolean }> {
async function fetchGitlabClaimed(base: string, versionPath: string, excludePR: number | null, warnings: string[]): Promise<{ claimed: ClaimedPR[]; offline: boolean }> {
const list = runCommand("glab", [
"mr",
"list",
@ -243,12 +277,15 @@ async function fetchGitlabClaimed(base: string, excludePR: number | null, warnin
}
const results: ClaimedPR[] = [];
for (const mr of mrs) {
// GitLab files API takes the full path URL-encoded (slashes become %2F).
const content = runCommand("glab", [
"api",
`projects/:id/repository/files/VERSION?ref=${encodeURIComponent(mr.source_branch)}`,
`projects/:id/repository/files/${encodeURIComponent(versionPath)}?ref=${encodeURIComponent(mr.source_branch)}`,
]);
if (!content.ok) {
warnings.push(`MR !${mr.iid}: could not fetch VERSION`);
warnings.push(
`MR !${mr.iid}: could not fetch ${versionPath} (wrong path? — try --version-path or .gstack/version-path)`,
);
continue;
}
try {
@ -285,7 +322,7 @@ function currentRepoSlug(): string {
return m ? m[1] : "";
}
function scanSiblings(root: string | null, claimed: ClaimedPR[], warnings: string[]): Sibling[] {
function scanSiblings(root: string | null, versionPath: string, claimed: ClaimedPR[], warnings: string[]): Sibling[] {
if (!root || !existsSync(root)) return [];
const mySlug = currentRepoSlug();
if (!mySlug) {
@ -308,7 +345,7 @@ function scanSiblings(root: string | null, claimed: ClaimedPR[], warnings: strin
continue;
}
if (!existsSync(join(p, ".git")) && !existsSync(join(p, ".git/HEAD"))) continue;
const versionFile = join(p, "VERSION");
const versionFile = join(p, versionPath);
if (!existsSync(versionFile)) continue;
let version: string;
try {
@ -346,12 +383,13 @@ function markActiveSiblings(siblings: Sibling[], baseVersion: Version): Sibling[
});
}
function parseArgs(argv: string[]): { base: string; bump: Bump; current: string; workspaceRoot?: string; excludePR: number | null; help: boolean } {
function parseArgs(argv: string[]): { base: string; bump: Bump; current: string; workspaceRoot?: string; excludePR: number | null; versionPath?: string; help: boolean } {
let base = "";
let bump: Bump | "" = "";
let current = "";
let workspaceRoot: string | undefined;
let excludePR: number | null = null;
let versionPath: string | undefined;
let help = false;
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
@ -359,6 +397,7 @@ function parseArgs(argv: string[]): { base: string; bump: Bump; current: string;
else if (a === "--bump") bump = (argv[++i] ?? "") as Bump;
else if (a === "--current-version") current = argv[++i] ?? "";
else if (a === "--workspace-root") workspaceRoot = argv[++i];
else if (a === "--version-path") versionPath = argv[++i];
else if (a === "--exclude-pr") {
const n = Number(argv[++i]);
excludePR = Number.isFinite(n) && n > 0 ? n : null;
@ -375,7 +414,7 @@ function parseArgs(argv: string[]): { base: string; bump: Bump; current: string;
console.error(`Error: --bump must be major|minor|patch|micro (got ${bump})`);
process.exit(2);
}
return { base, bump: bump as Bump, current, workspaceRoot, excludePR, help: false };
return { base, bump: bump as Bump, current, workspaceRoot, excludePR, versionPath, help: false };
}
// Auto-detect: if --exclude-pr wasn't passed, check whether the current branch
@ -392,13 +431,14 @@ async function main() {
const args = parseArgs(process.argv.slice(2));
if (args.help) {
console.log(
"Usage: gstack-next-version --base <branch> --bump <level> --current-version <X.Y.Z.W> [--workspace-root <path|null>]",
"Usage: gstack-next-version --base <branch> --bump <level> --current-version <X.Y.Z.W> [--workspace-root <path|null>] [--version-path <path>]",
);
process.exit(0);
}
const warnings: string[] = [];
const host = detectHost();
const baseVersion = args.current || readBaseVersion(args.base, warnings);
const versionPath = resolveVersionPath(args.versionPath, repoToplevel());
const baseVersion = args.current || readBaseVersion(args.base, versionPath, warnings);
const baseParsed = parseVersion(baseVersion);
if (!baseParsed) {
console.error(`Error: could not parse base version '${baseVersion}'`);
@ -413,9 +453,9 @@ async function main() {
let claimed: ClaimedPR[] = [];
let offline = false;
if (host === "github") {
({ claimed, offline } = await fetchGithubClaimed(args.base, excludePR, warnings));
({ claimed, offline } = await fetchGithubClaimed(args.base, versionPath, excludePR, warnings));
} else if (host === "gitlab") {
({ claimed, offline } = await fetchGitlabClaimed(args.base, excludePR, warnings));
({ claimed, offline } = await fetchGitlabClaimed(args.base, versionPath, excludePR, warnings));
} else {
warnings.push("host unknown; queue-awareness unavailable");
}
@ -433,7 +473,7 @@ async function main() {
const { version: picked, reason } = pickNextSlot(baseParsed, claimedVersions, args.bump);
const workspaceRoot = resolveWorkspaceRoot(args.workspaceRoot);
const siblings = markActiveSiblings(scanSiblings(workspaceRoot, claimed, warnings), baseParsed);
const siblings = markActiveSiblings(scanSiblings(workspaceRoot, versionPath, claimed, warnings), baseParsed);
const activeSiblings = siblings.filter((s) => s.is_active);
// If an active sibling outranks our pick, bump past it (same bump level).
@ -453,6 +493,7 @@ async function main() {
version: fmtVersion(finalVersion),
current_version: args.current || baseVersion,
base_version: baseVersion,
version_path: versionPath,
bump: args.bump,
host,
offline,
@ -466,7 +507,7 @@ async function main() {
}
// Pure-function exports for testing
export { parseVersion, fmtVersion, bumpVersion, cmpVersion, pickNextSlot, markActiveSiblings };
export { parseVersion, fmtVersion, bumpVersion, cmpVersion, pickNextSlot, markActiveSiblings, resolveVersionPath };
// Only run main() when invoked as a script, not when imported by tests.
if (import.meta.main) {
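
The two hosts need different encodings for the same `versionPath`, which is easy to get backwards. The sketch below mirrors the two URL shapes built in this diff. `githubContentsPath` and `gitlabFilesPath` are hypothetical helper names used only for illustration.

```typescript
// GitHub Contents API: keep "/" as a path separator, escape spaces and the like.
function githubContentsPath(versionPath: string, ref: string): string {
  return `repos/{owner}/{repo}/contents/${encodeURI(versionPath)}?ref=${encodeURIComponent(ref)}`;
}

// GitLab files API: the whole file path is one URL segment, so "/" becomes %2F.
function gitlabFilesPath(versionPath: string, ref: string): string {
  return `projects/:id/repository/files/${encodeURIComponent(versionPath)}?ref=${encodeURIComponent(ref)}`;
}

console.log(githubContentsPath("Tinas Second Brain/VERSION", "feat/x"));
// repos/{owner}/{repo}/contents/Tinas%20Second%20Brain/VERSION?ref=feat%2Fx
console.log(gitlabFilesPath("Tinas Second Brain/VERSION", "feat/x"));
// projects/:id/repository/files/Tinas%20Second%20Brain%2FVERSION?ref=feat%2Fx
```

One caveat with this approach: `encodeURI` leaves `?` and `#` unescaped, so a version path containing either character would still produce a malformed GitHub URL.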

View File

@ -29,11 +29,13 @@ if [ ! -f "$TIMELINE_FILE" ]; then
exit 0
fi
cat "$TIMELINE_FILE" 2>/dev/null | bun -e "
cat "$TIMELINE_FILE" 2>/dev/null | GSTACK_TIMELINE_SINCE="$SINCE" GSTACK_TIMELINE_BRANCH="$BRANCH" GSTACK_TIMELINE_LIMIT="$LIMIT" bun -e "
const lines = (await Bun.stdin.text()).trim().split('\n').filter(Boolean);
const since = '${SINCE}';
const branch = '${BRANCH}';
const limit = ${LIMIT};
const since = process.env.GSTACK_TIMELINE_SINCE || '';
const branch = process.env.GSTACK_TIMELINE_BRANCH || '';
const limitRaw = process.env.GSTACK_TIMELINE_LIMIT || '20';
const parsedLimit = Number.parseInt(limitRaw, 10);
const limit = Number.isSafeInteger(parsedLimit) && parsedLimit > 0 ? parsedLimit : 20;
let sinceMs = 0;
if (since) {
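
The limit handling in this hunk can be isolated as a small function, which makes the effect of the change easier to see: the value arrives as data through the environment and is parsed, rather than being spliced into the script source. `parseLimit` is a hypothetical name; the default of 20 matches the diff.

```typescript
// Hypothetical extraction of the limit parsing shown in the diff.
// Anything that is not a positive safe integer falls back to the default.
function parseLimit(raw: string | undefined, fallback = 20): number {
  const parsed = Number.parseInt(raw || String(fallback), 10);
  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : fallback;
}

console.log(parseLimit("50"));
console.log(parseLimit("abc"));
console.log(parseLimit("5); process.exit(1); ("));
```

The last call shows why this matters: with the old `${LIMIT}` interpolation that string would have been executed as code, whereas here `parseInt` reads the leading digits and ignores the rest.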

View File

@ -2,13 +2,7 @@
name: browse
preamble-tier: 1
version: 1.1.0
description: |
Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
elements, verify page state, diff before/after actions, take annotated screenshots, check
responsive layouts, test forms and uploads, handle dialogs, and assert element states.
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
site", "take a screenshot", or "dogfood this". (gstack)
description: Fast headless browser for QA testing and site dogfooding. (gstack)
triggers:
- browse a page
- headless browser
@ -22,6 +16,16 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Navigate any URL, interact with
elements, verify page state, diff before/after actions, take annotated screenshots, check
responsive layouts, test forms and uploads, handle dialogs, and assert element states.
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
site", "take a screenshot", or "dogfood this".
## Preamble (run first)
```bash

View File

@ -650,6 +650,8 @@ export const __testInternals__ = {
idleCheckTick,
setTunnelActive: (v: boolean) => { tunnelActive = v; },
setLastActivity: (t: number) => { lastActivity = t; },
formatExplicitPortUnavailableError,
formatRandomPortUnavailableError,
// Reset the module-level shutdown latch so tests that drive shutdown to
// completion (process.exit-stubbed) can be followed by tests that also
// need shutdown to fire. Without this, the second test's shutdown
@ -752,41 +754,124 @@ let activeBrowserManager: BrowserManager = browserManager;
browserManager.onDisconnect = (code) => activeShutdown?.(code ?? 2);
let isShuttingDown = false;
type PortCheckResult =
| { available: true }
| { available: false; code?: string; message: string };
type FailedPortAttempt = {
port: number;
result: Extract<PortCheckResult, { available: false }>;
};
const RANDOM_PORT_MIN = 10000;
const RANDOM_PORT_MAX = 60000;
const RANDOM_PORT_RETRIES = 5;
function normalizePortError(err: unknown): Extract<PortCheckResult, { available: false }> {
const maybeNodeError = err as NodeJS.ErrnoException | undefined;
return {
available: false,
code: maybeNodeError?.code,
message: maybeNodeError?.message || String(err),
};
}
function isOccupiedPort(result: Extract<PortCheckResult, { available: false }>): boolean {
return result.code === 'EADDRINUSE';
}
function formatPortFailureDetail(attempt: FailedPortAttempt): string {
const { code, message } = attempt.result;
return code ? `${attempt.port} (${code}: ${message})` : `${attempt.port} (${message})`;
}
function formatExplicitPortUnavailableError(
port: number,
result: Extract<PortCheckResult, { available: false }>
): Error {
if (isOccupiedPort(result)) {
return new Error(`[browse] Port ${port} (from BROWSE_PORT env) is in use`);
}
const detail = result.code ? `${result.code}: ${result.message}` : result.message;
return new Error(
`[browse] Cannot bind BROWSE_PORT=${port} on 127.0.0.1 (${detail}). ` +
`This usually means localhost port binding is blocked by the current sandbox or OS permissions, ` +
`not that the port is occupied. Allow localhost binding, or run browse from an unrestricted terminal.`
);
}
function formatRandomPortUnavailableError(attempts: FailedPortAttempt[]): Error {
const blockingAttempts = attempts.filter((attempt) => !isOccupiedPort(attempt.result));
if (blockingAttempts.length > 0) {
const last = blockingAttempts[blockingAttempts.length - 1];
return new Error(
`[browse] Cannot bind localhost ports after ${attempts.length} attempts in range ` +
`${RANDOM_PORT_MIN}-${RANDOM_PORT_MAX}. Last error: ${formatPortFailureDetail(last)}. ` +
`This usually means the current sandbox or OS permissions are blocking localhost port binding, ` +
`not that every sampled port is occupied. Allow localhost binding, set BROWSE_PORT to an approved ` +
`port, or run browse from an unrestricted terminal.`
);
}
return new Error(
`[browse] No available port after ${RANDOM_PORT_RETRIES} attempts in range ` +
`${RANDOM_PORT_MIN}-${RANDOM_PORT_MAX}; every sampled port was already in use`
);
}
// Test if a port is available by binding and immediately releasing.
// Uses net.createServer instead of Bun.serve to avoid a race condition
// in the Node.js polyfill where listen/close are async but the caller
// expects synchronous bind semantics. See: #486
function isPortAvailable(port: number, hostname: string = '127.0.0.1'): Promise<boolean> {
function checkPortAvailable(port: number, hostname: string = '127.0.0.1'): Promise<PortCheckResult> {
return new Promise((resolve) => {
const srv = net.createServer();
srv.once('error', () => resolve(false));
srv.listen(port, hostname, () => {
srv.close(() => resolve(true));
});
let settled = false;
const finish = (result: PortCheckResult) => {
if (settled) return;
settled = true;
resolve(result);
};
srv.once('error', (err) => finish(normalizePortError(err)));
try {
srv.listen(port, hostname, () => {
srv.close(() => finish({ available: true }));
});
} catch (err) {
finish(normalizePortError(err));
}
});
}
function isPortAvailable(port: number, hostname: string = '127.0.0.1'): Promise<boolean> {
return checkPortAvailable(port, hostname).then((result) => result.available);
}
// Find port: explicit BROWSE_PORT, or random in 10000-60000
async function findPort(): Promise<number> {
// Explicit port override (for debugging)
if (BROWSE_PORT) {
if (await isPortAvailable(BROWSE_PORT)) {
const result = await checkPortAvailable(BROWSE_PORT);
if (result.available) {
return BROWSE_PORT;
}
throw new Error(`[browse] Port ${BROWSE_PORT} (from BROWSE_PORT env) is in use`);
throw formatExplicitPortUnavailableError(BROWSE_PORT, result);
}
// Random port with retry
const MIN_PORT = 10000;
const MAX_PORT = 60000;
const MAX_RETRIES = 5;
for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
const port = MIN_PORT + Math.floor(Math.random() * (MAX_PORT - MIN_PORT));
if (await isPortAvailable(port)) {
const attempts: FailedPortAttempt[] = [];
for (let attempt = 0; attempt < RANDOM_PORT_RETRIES; attempt++) {
const port = RANDOM_PORT_MIN + Math.floor(Math.random() * (RANDOM_PORT_MAX - RANDOM_PORT_MIN));
const result = await checkPortAvailable(port);
if (result.available) {
return port;
}
attempts.push({ port, result });
}
throw new Error(`[browse] No available port after ${MAX_RETRIES} attempts in range ${MIN_PORT}-${MAX_PORT}`);
throw formatRandomPortUnavailableError(attempts);
}
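The structured result that `checkPortAvailable` resolves with can be sketched as follows. This is a minimal illustration, not the code in `src/server`: the `PortCheckResult` shape is inferred from the fixtures used in the tests for this change (`available`, `code`, `message`), `normalizePortErrorSketch` and `isBindDenied` are hypothetical names, and treating `EPERM`/`EACCES` as "bind denied" is an assumption.

```typescript
// Shape inferred from the test fixtures in this commit; the real type may differ.
type PortCheckResult =
  | { available: true }
  | { available: false; code?: string; message: string };

// Hypothetical helper: reduce whatever the server emitted or threw to a plain result.
function normalizePortErrorSketch(err: unknown): PortCheckResult {
  const e = (typeof err === 'object' && err !== null ? err : {}) as {
    code?: unknown;
    message?: unknown;
  };
  return {
    available: false,
    code: typeof e.code === 'string' ? e.code : undefined,
    message: typeof e.message === 'string' ? e.message : String(err),
  };
}

// Assumption: EPERM/EACCES mean the environment refused the bind,
// whereas EADDRINUSE means another process already holds the port.
function isBindDenied(result: PortCheckResult): boolean {
  return !result.available && (result.code === 'EPERM' || result.code === 'EACCES');
}
```

Keeping the error code next to the boolean is what lets the caller choose between the "port is in use" message and the "binding is blocked" message instead of collapsing both into `false`.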
/**


@ -1,6 +1,7 @@
import { describe, test, expect } from 'bun:test';
import * as net from 'net';
import * as path from 'path';
import { __testInternals__ } from '../src/server';
const polyfillPath = path.resolve(import.meta.dir, '../src/bun-polyfill.cjs');
@ -28,6 +29,47 @@ function getFreePort(): Promise<number> {
}
describe('findPort / isPortAvailable', () => {
test('explicit BROWSE_PORT diagnostic distinguishes bind denial from occupied port', () => {
const blocked = __testInternals__.formatExplicitPortUnavailableError(34567, {
available: false,
code: 'EPERM',
message: 'operation not permitted',
}).message;
expect(blocked).toContain('Cannot bind BROWSE_PORT=34567');
expect(blocked).toContain('localhost port binding is blocked');
expect(blocked).toContain('not that the port is occupied');
const occupied = __testInternals__.formatExplicitPortUnavailableError(34567, {
available: false,
code: 'EADDRINUSE',
message: 'address already in use',
}).message;
expect(occupied).toBe('[browse] Port 34567 (from BROWSE_PORT env) is in use');
});
test('random port diagnostic calls out sandbox-style bind denial', () => {
const message = __testInternals__.formatRandomPortUnavailableError([
{ port: 11001, result: { available: false, code: 'EADDRINUSE', message: 'address already in use' } },
{ port: 12002, result: { available: false, code: 'EPERM', message: 'operation not permitted' } },
]).message;
expect(message).toContain('Cannot bind localhost ports after 2 attempts');
expect(message).toContain('Last error: 12002 (EPERM: operation not permitted)');
expect(message).toContain('not that every sampled port is occupied');
expect(message).toContain('set BROWSE_PORT to an approved port');
});
test('random port diagnostic preserves old busy-port meaning when all attempts are occupied', () => {
const message = __testInternals__.formatRandomPortUnavailableError([
{ port: 11001, result: { available: false, code: 'EADDRINUSE', message: 'address already in use' } },
{ port: 12002, result: { available: false, code: 'EADDRINUSE', message: 'address already in use' } },
]).message;
expect(message).toContain('No available port after 5 attempts');
expect(message).toContain('every sampled port was already in use');
});
test('isPortAvailable returns true for a free port', async () => {
// Use the same isPortAvailable logic from server.ts


@ -2,12 +2,7 @@
name: canary
preamble-tier: 2
version: 1.0.0
description: |
Post-deploy canary monitoring. Watches the live app for console errors,
performance regressions, and page failures using the browse daemon. Takes
periodic screenshots, compares against pre-deploy baselines, and alerts
on anomalies. Use when: "monitor deploy", "canary", "post-deploy check",
"watch production", "verify deploy". (gstack)
description: Post-deploy canary monitoring. (gstack)
allowed-tools:
- Bash
- Read
@ -22,6 +17,15 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Watches the live app for console errors,
performance regressions, and page failures using the browse daemon. Takes
periodic screenshots, compares against pre-deploy baselines, and alerts
on anomalies. Use when: "monitor deploy", "canary", "post-deploy check",
"watch production", "verify deploy".
## Preamble (run first)
```bash
@ -562,84 +566,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
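The "read once, then treat `terms` as canonical" behaviour described above could be sketched like this. It is only an illustration under assumptions: the skill text confirms a `terms` array, but the element type (plain strings) is assumed, and `parseJargonTerms` and `createJargonLoader` are hypothetical names rather than anything shipped in gstack.

```typescript
// Assumption: the file is a JSON object whose `terms` field is an array of strings.
function parseJargonTerms(json: string): string[] {
  const parsed: unknown = JSON.parse(json);
  const terms = (parsed as { terms?: unknown } | null)?.terms;
  if (!Array.isArray(terms)) {
    throw new Error('jargon list: expected a top-level "terms" array');
  }
  // Ignore any non-string entries rather than failing the whole load.
  return terms.filter((t): t is string => typeof t === 'string');
}

// Wrap a reader so the underlying file is read at most once per session.
function createJargonLoader(readFile: () => string): () => string[] {
  let cached: string[] | null = null;
  return () => {
    if (cached === null) {
      cached = parseJargonTerms(readFile());
    }
    return cached;
  };
}
```

In a real session the `readFile` callback would read `~/.claude/skills/gstack/scripts/jargon-list.json`; it is injected here so the caching logic stays independent of the file system.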
## Completeness Principle — Boil the Lake

View File

@ -1,12 +1,7 @@
---
name: careful
version: 0.1.0
description: |
Safety guardrails for destructive commands. Warns before rm -rf, DROP TABLE,
force-push, git reset --hard, kubectl delete, and similar destructive operations.
User can override each warning. Use when touching prod, debugging live systems,
or working in a shared environment. Use when asked to "be careful", "safety mode",
"prod mode", or "careful mode". (gstack)
description: Safety guardrails for destructive commands. (gstack)
triggers:
- be careful
- warn before destructive
@ -25,6 +20,15 @@ hooks:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Warns before rm -rf, DROP TABLE,
force-push, git reset --hard, kubectl delete, and similar destructive operations.
User can override each warning. Use when touching prod, debugging live systems,
or working in a shared environment. Use when asked to "be careful", "safety mode",
"prod mode", or "careful mode".
# /careful — Destructive Command Guardrails
Safety mode is now **active**. Every bash command will be checked for destructive


@ -2,13 +2,7 @@
name: codex
preamble-tier: 3
version: 1.0.0
description: |
OpenAI Codex CLI wrapper — three modes. Code review: independent diff review via
codex review with pass/fail gate. Challenge: adversarial mode that tries to break
your code. Consult: ask codex anything with session continuity for follow-ups.
The "200 IQ autistic developer" second opinion. Use when asked to "codex review",
"codex challenge", "ask codex", "second opinion", or "consult codex". (gstack)
Voice triggers (speech-to-text aliases): "code x", "code ex", "get another opinion".
description: OpenAI Codex CLI wrapper — three modes. (gstack)
triggers:
- codex review
- second opinion
@ -24,6 +18,17 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Code review: independent diff review via
codex review with pass/fail gate. Challenge: adversarial mode that tries to break
your code. Consult: ask codex anything with session continuity for follow-ups.
The "200 IQ autistic developer" second opinion. Use when asked to "codex review",
"codex challenge", "ask codex", "second opinion", or "consult codex".
Voice triggers (speech-to-text aliases): "code x", "code ex", "get another opinion".
## Preamble (run first)
```bash
@ -564,84 +569,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: context-restore
preamble-tier: 2
version: 1.0.0
description: |
Restore working context saved earlier by /context-save. Loads the most recent
saved state (across all branches by default) so you can pick up where you
left off — even across Conductor workspace handoffs.
Use when asked to "resume", "restore context", "where was I", or
"pick up where I left off". Pair with /context-save.
Formerly /checkpoint resume — renamed because Claude Code treats /checkpoint
as a native rewind alias in current environments. (gstack)
description: Restore working context saved earlier by /context-save. (gstack)
allowed-tools:
- Bash
- Read
@ -26,6 +19,17 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Loads the most recent
saved state (across all branches by default) so you can pick up where you
left off — even across Conductor workspace handoffs.
Use when asked to "resume", "restore context", "where was I", or
"pick up where I left off". Pair with /context-save.
Formerly /checkpoint resume — renamed because Claude Code treats /checkpoint
as a native rewind alias in current environments.
## Preamble (run first)
```bash
@ -566,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: context-save
preamble-tier: 2
version: 1.0.0
description: |
Save working context. Captures git state, decisions made, and remaining work
so any future session can pick up without losing a beat.
Use when asked to "save progress", "save state", "context save", or
"save my work". Pair with /context-restore to resume later.
Formerly /checkpoint — renamed because Claude Code treats /checkpoint as a
native rewind alias in current environments, which was shadowing this skill.
(gstack)
description: Save working context. (gstack)
allowed-tools:
- Bash
- Read
@ -26,6 +19,16 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Captures git state, decisions made, and remaining work
so any future session can pick up without losing a beat.
Use when asked to "save progress", "save state", "context save", or
"save my work". Pair with /context-restore to resume later.
Formerly /checkpoint — renamed because Claude Code treats /checkpoint as a
native rewind alias in current environments, which was shadowing this skill.
## Preamble (run first)
```bash
@ -566,84 +569,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: cso
preamble-tier: 2
version: 2.0.0
description: |
Chief Security Officer mode. Infrastructure-first security audit: secrets archaeology,
dependency supply chain, CI/CD pipeline security, LLM/AI security, skill supply chain
scanning, plus OWASP Top 10, STRIDE threat modeling, and active verification.
Two modes: daily (zero-noise, 8/10 confidence gate) and comprehensive (monthly deep
scan, 2/10 bar). Trend tracking across audit runs.
Use when: "security audit", "threat model", "pentest review", "OWASP", "CSO review". (gstack)
Voice triggers (speech-to-text aliases): "see-so", "see so", "security review", "security check", "vulnerability scan", "run security".
description: Chief Security Officer mode. (gstack)
allowed-tools:
- Bash
- Read
@ -27,6 +20,18 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Infrastructure-first security audit: secrets archaeology,
dependency supply chain, CI/CD pipeline security, LLM/AI security, skill supply chain
scanning, plus OWASP Top 10, STRIDE threat modeling, and active verification.
Two modes: daily (zero-noise, 8/10 confidence gate) and comprehensive (monthly deep
scan, 2/10 bar). Trend tracking across audit runs.
Use when: "security audit", "threat model", "pentest review", "OWASP", "CSO review".
Voice triggers (speech-to-text aliases): "see-so", "see so", "security review", "security check", "vulnerability scan", "run security".
## Preamble (run first)
```bash
@ -567,84 +572,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: design-consultation
preamble-tier: 3
version: 1.0.0
description: |
Design consultation: understands your product, researches the landscape, proposes a
complete design system (aesthetic, typography, color, layout, spacing, motion), and
generates font+color preview pages. Creates DESIGN.md as your project's design source
of truth. For existing sites, use /plan-design-review to infer the system instead.
Use when asked to "design system", "brand guidelines", or "create DESIGN.md".
Proactively suggest when starting a new project's UI with no existing
design system or DESIGN.md. (gstack)
description: Design consultation: understands your product, researches the landscape, proposes a complete design system (aesthetic, typography, color, layout, spacing, motion), and generates font+color preview... (gstack)
allowed-tools:
- Bash
- Read
@ -50,6 +43,15 @@ gbrain:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Creates DESIGN.md as your project's design source
of truth. For existing sites, use /plan-design-review to infer the system instead.
Use when asked to "design system", "brand guidelines", or "create DESIGN.md".
Proactively suggest when starting a new project's UI with no existing
design system or DESIGN.md.
## Preamble (run first)
```bash
@ -590,84 +592,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake
@ -1321,8 +1246,12 @@ This command generates the board HTML, starts an HTTP server on a random port,
and opens it in the user's default browser. **Run it in the background** with `&`
because the server needs to stay running while the user interacts with the board.
Parse the port from stderr output: `SERVE_STARTED: port=XXXXX`. You need this
for the board URL and for reloading during regeneration cycles.
Parse the board URL from stderr output. Default daemon path:
`BOARD_URL: http://127.0.0.1:N/boards/<id>/` (already includes the per-board
path; use this for the AskUserQuestion URL AND as the base for the reload
endpoint). Legacy `--no-daemon` path emits `SERVE_STARTED: port=XXXXX` and
serves a single board at `/`, with reload at `/api/reload` — only relevant
when an external caller explicitly passes `--no-daemon`.
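Parsing the two stderr formats described above might look like the following sketch. The marker strings `BOARD_URL:` and `SERVE_STARTED: port=` come from the skill text; the function name `parseBoardEndpoint`, the `BoardEndpoint` type, and the assumption that each marker starts its own line are illustrative only.

```typescript
type BoardEndpoint =
  | { mode: 'daemon'; boardUrl: string }
  | { mode: 'legacy'; boardUrl: string; port: number };

// Returns null when neither marker has appeared in the captured stderr yet.
function parseBoardEndpoint(stderr: string): BoardEndpoint | null {
  // Default daemon path: the line already carries the full per-board URL.
  const daemon = stderr.match(/^BOARD_URL:\s*(\S+)/m);
  if (daemon) {
    return { mode: 'daemon', boardUrl: daemon[1] };
  }
  // Legacy --no-daemon path: only a port is reported and the board is served at "/".
  const legacy = stderr.match(/^SERVE_STARTED:\s*port=(\d+)/m);
  if (legacy) {
    const port = Number(legacy[1]);
    return { mode: 'legacy', boardUrl: `http://127.0.0.1:${port}/`, port };
  }
  return null;
}
```

Checking for `BOARD_URL:` first mirrors the text above, where the daemon path is the default and the legacy output only appears when `--no-daemon` was passed explicitly.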
**PRIMARY WAIT: AskUserQuestion with board URL**
@ -1330,11 +1259,14 @@ After the board is serving, use AskUserQuestion to wait for the user. Include th
board URL so they can click it if they lost the browser tab:
"I've opened a comparison board with the design variants:
http://127.0.0.1:<PORT>/ — Rate them, leave comments, remix
<BOARD_URL> — Rate them, leave comments, remix
elements you like, and click Submit when you're done. Let me know when you've
submitted your feedback (or paste your preferences here). If you clicked
Regenerate or Remix on the board, tell me and I'll generate new variants."
Substitute `<BOARD_URL>` with the URL parsed from stderr (the daemon path
emits `BOARD_URL: http://127.0.0.1:N/boards/<id>/`).
**Do NOT use AskUserQuestion to ask which variant the user prefers.** The comparison
board IS the chooser. AskUserQuestion is just the blocking wait mechanism.
@ -1378,8 +1310,13 @@ the approved variant.
2. If `regenerateAction` is `"remix"`, read `remixSpec` (e.g. `{"layout":"A","colors":"B"}`)
3. Generate new variants with `$D iterate` or `$D variants` using updated brief
4. Create new board: `$D compare --images "..." --output "$_DESIGN_DIR/design-board.html"`
5. Reload the board in the user's browser (same tab):
`curl -s -X POST http://127.0.0.1:PORT/api/reload -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'`
5. Reload the board in the user's browser (same tab) — the URL is per-board
under daemon mode, so use `<BOARD_URL>` (from the `BOARD_URL:` stderr
line) as the base:
`curl -s -X POST "${BOARD_URL}api/reload" -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'`
Under `--no-daemon` the reload endpoint is `/api/reload` at the legacy
port; this path only matters if the caller explicitly opted out of the
daemon.
6. The board auto-refreshes. **AskUserQuestion again** with the same board URL to
wait for the next round of feedback. Repeat until `feedback.json` appears.
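The reload call in step 5 can be sketched as a small request builder. The `api/reload` path, the `Content-Type` header, and the `{"html": ...}` payload are taken from the curl command above; `buildReloadRequest` and `ReloadRequest` are hypothetical names, and the trailing-slash normalisation is an added assumption. Building the body with `JSON.stringify` avoids hand-quoting the path inside a shell string.

```typescript
interface ReloadRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// boardUrl is the value parsed from stderr, e.g. the per-board URL in daemon mode
// or http://127.0.0.1:<port>/ in legacy --no-daemon mode.
function buildReloadRequest(boardUrl: string, htmlPath: string): ReloadRequest {
  // Make sure the base ends in "/" so "api/reload" is appended to the board path.
  const base = boardUrl.endsWith('/') ? boardUrl : `${boardUrl}/`;
  return {
    url: `${base}api/reload`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ html: htmlPath }),
  };
}
```

The same builder covers both modes because only the base differs: a per-board URL under the daemon, and the server root under `--no-daemon`.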


@ -2,16 +2,7 @@
name: design-html
preamble-tier: 2
version: 1.0.0
description: |
Design finalization: generates production-quality Pretext-native HTML/CSS.
Works with approved mockups from /design-shotgun, CEO plans from /plan-ceo-review,
design review context from /plan-design-review, or from scratch with a user
description. Text actually reflows, heights are computed, layouts are dynamic.
30KB overhead, zero deps. Smart API routing: picks the right Pretext patterns
for each design type. Use when: "finalize this design", "turn this into HTML",
"build me a page", "implement this design", or after any planning skill.
Proactively suggest when user has approved a design or has a plan ready. (gstack)
Voice triggers (speech-to-text aliases): "build the design", "code the mockup", "make it real".
description: Design finalization: generates production-quality Pretext-native HTML/CSS. (gstack)
triggers:
- build the design
- code the mockup
@ -29,6 +20,19 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Works with approved mockups from /design-shotgun, CEO plans from /plan-ceo-review,
design review context from /plan-design-review, or from scratch with a user
description. Text actually reflows, heights are computed, layouts are dynamic.
30KB overhead, zero deps. Smart API routing: picks the right Pretext patterns
for each design type. Use when: "finalize this design", "turn this into HTML",
"build me a page", "implement this design", or after any planning skill.
Proactively suggest when user has approved a design or has a plan ready.
Voice triggers (speech-to-text aliases): "build the design", "code the mockup", "make it real".
## Preamble (run first)
```bash
@ -569,84 +573,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: design-review
preamble-tier: 4
version: 2.0.0
description: |
Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems,
AI slop patterns, and slow interactions — then fixes them. Iteratively fixes issues
in source code, committing each fix atomically and re-verifying with before/after
screenshots. For plan-mode design review (before implementation), use /plan-design-review.
Use when asked to "audit the design", "visual QA", "check if it looks good", or "design polish".
Proactively suggest when the user mentions visual inconsistencies or
wants to polish the look of a live site. (gstack)
description: Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems, AI slop patterns, and slow interactions — then fixes them. (gstack)
allowed-tools:
- Bash
- Read
@ -27,6 +20,16 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Iteratively fixes issues
in source code, committing each fix atomically and re-verifying with before/after
screenshots. For plan-mode design review (before implementation), use /plan-design-review.
Use when asked to "audit the design", "visual QA", "check if it looks good", or "design polish".
Proactively suggest when the user mentions visual inconsistencies or
wants to polish the look of a live site.
## Preamble (run first)
```bash
@ -567,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,13 +2,7 @@
name: design-shotgun
preamble-tier: 2
version: 1.0.0
description: |
Design shotgun: generate multiple AI design variants, open a comparison board,
collect structured feedback, and iterate. Standalone design exploration you can
run anytime. Use when: "explore designs", "show me options", "design variants",
"visual brainstorm", or "I don't like how this looks".
Proactively suggest when the user describes a UI feature but hasn't seen
what it could look like. (gstack)
description: "Design shotgun: generate multiple AI design variants, open a comparison board, collect structured feedback, and iterate. (gstack)"
triggers:
- explore design variants
- show me design options
@ -44,6 +38,15 @@ gbrain:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Standalone design exploration you can
run anytime. Use when: "explore designs", "show me options", "design variants",
"visual brainstorm", or "I don't like how this looks".
Proactively suggest when the user describes a UI feature but hasn't seen
what it could look like.
## Preamble (run first)
```bash
@ -584,84 +587,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake
@ -1207,8 +1133,12 @@ This command generates the board HTML, starts an HTTP server on a random port,
and opens it in the user's default browser. **Run it in the background** with `&`
because the server needs to stay running while the user interacts with the board.
Parse the port from stderr output: `SERVE_STARTED: port=XXXXX`. You need this
for the board URL and for reloading during regeneration cycles.
Parse the board URL from stderr output. Default daemon path:
`BOARD_URL: http://127.0.0.1:N/boards/<id>/` (already includes the per-board
path; use this for the AskUserQuestion URL AND as the base for the reload
endpoint). Legacy `--no-daemon` path emits `SERVE_STARTED: port=XXXXX` and
serves a single board at `/`, with reload at `/api/reload` — only relevant
when an external caller explicitly passes `--no-daemon`.
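A minimal sketch of the stderr parsing described above, assuming the two line shapes quoted in this section (`BOARD_URL: <url>` and the legacy `SERVE_STARTED: port=N`). The helper name `parseBoardEndpoints` and the `BoardEndpoints` type are hypothetical and not part of gstack; the sketch prefers `BOARD_URL` because the daemon path emits both lines.

```typescript
// Hypothetical helper: derives the board URL and reload endpoint from CLI stderr.
interface BoardEndpoints {
  boardUrl: string;
  reloadUrl: string;
}

function parseBoardEndpoints(stderr: string): BoardEndpoints | null {
  // Daemon path: the URL already contains the per-board path.
  const daemon = stderr.match(/^BOARD_URL: (\S+)/m);
  if (daemon) {
    // Keep the trailing slash so "api/reload" resolves under the board path.
    const boardUrl = daemon[1].endsWith("/") ? daemon[1] : `${daemon[1]}/`;
    return { boardUrl, reloadUrl: new URL("api/reload", boardUrl).href };
  }
  // Legacy --no-daemon path: a single board served at "/".
  const legacy = stderr.match(/^SERVE_STARTED: port=(\d+)/m);
  if (legacy) {
    const boardUrl = `http://127.0.0.1:${legacy[1]}/`;
    return { boardUrl, reloadUrl: `${boardUrl}api/reload` };
  }
  return null;
}
```

Checking `BOARD_URL` first matters because a port-only parse of the daemon output would point at `/`, where no board is hosted.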
**PRIMARY WAIT: AskUserQuestion with board URL**
@ -1216,11 +1146,14 @@ After the board is serving, use AskUserQuestion to wait for the user. Include th
board URL so they can click it if they lost the browser tab:
"I've opened a comparison board with the design variants:
http://127.0.0.1:<PORT>/ — Rate them, leave comments, remix
<BOARD_URL> — Rate them, leave comments, remix
elements you like, and click Submit when you're done. Let me know when you've
submitted your feedback (or paste your preferences here). If you clicked
Regenerate or Remix on the board, tell me and I'll generate new variants."
Substitute `<BOARD_URL>` with the URL parsed from stderr (the daemon path
emits `BOARD_URL: http://127.0.0.1:N/boards/<id>/`).
**Do NOT use AskUserQuestion to ask which variant the user prefers.** The comparison
board IS the chooser. AskUserQuestion is just the blocking wait mechanism.
@ -1264,8 +1197,13 @@ the approved variant.
2. If `regenerateAction` is `"remix"`, read `remixSpec` (e.g. `{"layout":"A","colors":"B"}`)
3. Generate new variants with `$D iterate` or `$D variants` using updated brief
4. Create new board: `$D compare --images "..." --output "$_DESIGN_DIR/design-board.html"`
5. Reload the board in the user's browser (same tab):
`curl -s -X POST http://127.0.0.1:PORT/api/reload -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'`
5. Reload the board in the user's browser (same tab) — the URL is per-board
under daemon mode, so use `<BOARD_URL>` (from the `BOARD_URL:` stderr
line) as the base:
`curl -s -X POST "${BOARD_URL}api/reload" -H 'Content-Type: application/json' -d "{\"html\":\"$_DESIGN_DIR/design-board.html\"}"`
Under `--no-daemon` the reload endpoint is `/api/reload` at the legacy
port; this path only matters if the caller explicitly opted out of the
daemon.
6. The board auto-refreshes. **AskUserQuestion again** with the same board URL to
wait for the next round of feedback. Repeat until `feedback.json` appears.


@ -25,8 +25,19 @@ import { evolve } from "./evolve";
import { generateDesignToCodePrompt } from "./design-to-code";
import { serve } from "./serve";
import { gallery } from "./gallery";
import {
daemonStatus as daemonStatusClient,
ensureDaemon,
publishBoard,
shutdownDaemon,
} from "./daemon-client";
import { spawn as nodeSpawn } from "child_process";
function parseArgs(argv: string[]): { command: string; flags: Record<string, string | boolean> } {
function parseArgs(argv: string[]): {
command: string;
flags: Record<string, string | boolean>;
positionals: string[];
} {
const args = argv.slice(2); // skip bun/node and script path
if (args.length === 0) {
printUsage();
@ -35,6 +46,7 @@ function parseArgs(argv: string[]): { command: string; flags: Record<string, str
const command = args[0];
const flags: Record<string, string | boolean> = {};
const positionals: string[] = [];
for (let i = 1; i < args.length; i++) {
const arg = args[i];
@ -47,10 +59,12 @@ function parseArgs(argv: string[]): { command: string; flags: Record<string, str
} else {
flags[key] = true;
}
} else {
positionals.push(arg);
}
}
return { command, flags };
return { command, flags, positionals };
}
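Review note: the hunks above elide the middle of `parseArgs`, so the following is only a sketch of how flags and positionals might separate, assuming `--key value` and bare `--key` forms. `parseArgsSketch` and `ParsedArgs` are hypothetical names; the real parser may differ in the parts not shown.

```typescript
interface ParsedArgs {
  command: string;
  flags: Record<string, string | boolean>;
  positionals: string[];
}

// Hypothetical stand-in for parseArgs; takes argv with the runtime and script path already removed.
function parseArgsSketch(args: string[]): ParsedArgs {
  const [command = "", ...rest] = args;
  const flags: Record<string, string | boolean> = {};
  const positionals: string[] = [];
  for (let i = 0; i < rest.length; i++) {
    const arg = rest[i];
    if (arg.startsWith("--")) {
      const key = arg.slice(2);
      const next = rest[i + 1];
      if (next !== undefined && !next.startsWith("--")) {
        flags[key] = next; // "--key value" form
        i++; // consume the value
      } else {
        flags[key] = true; // bare "--key" form
      }
    } else {
      positionals.push(arg); // e.g. the "stop" in "daemon stop"
    }
  }
  return { command, flags, positionals };
}
```

Under these assumptions a boolean flag placed before a positional swallows it: `daemon --force stop` yields `flags.force === "stop"` and no positionals. If the real parser behaves the same way, `daemon stop --force` is the safer ordering.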
function printUsage(): void {
@ -108,7 +122,7 @@ async function runSetup(): Promise<void> {
}
async function main(): Promise<void> {
const { command, flags } = parseArgs(process.argv);
const { command, flags, positionals } = parseArgs(process.argv);
if (!COMMANDS.has(command)) {
console.error(`Unknown command: ${command}`);
@ -139,12 +153,24 @@ async function main(): Promise<void> {
const images = await resolveImagePaths(imagesArg);
const outputPath = (flags.output as string) || "/tmp/gstack-design-board.html";
compare({ images, output: outputPath });
// If --serve flag is set, start HTTP server for the board
// If --serve flag is set, publish the board.
// Default: ensure the persistent daemon is up, POST the board, open
// the browser, exit. The daemon survives the CLI and hosts every
// board the user has published this day at stable URLs.
// --no-daemon: legacy single-process server in serve.ts (kept for
// tests / Windows / explicit debugging).
if (flags.serve) {
await serve({
html: outputPath,
timeout: flags.timeout ? parseInt(flags.timeout as string) : 600,
});
if (flags["no-daemon"]) {
await serve({
html: outputPath,
timeout: flags.timeout ? parseInt(flags.timeout as string) : 600,
});
} else {
await publishToDaemon({
html: outputPath,
title: flags.title as string | undefined,
});
}
}
break;
}
@ -247,11 +273,108 @@ async function main(): Promise<void> {
break;
case "serve":
await serve({
html: flags.html as string,
timeout: flags.timeout ? parseInt(flags.timeout as string) : 600,
});
if (flags["no-daemon"]) {
await serve({
html: flags.html as string,
timeout: flags.timeout ? parseInt(flags.timeout as string) : 600,
});
} else {
await publishToDaemon({
html: flags.html as string,
title: flags.title as string | undefined,
});
}
break;
case "daemon": {
// Sub-commands: `$D daemon status` and `$D daemon stop [--force]`.
const sub = positionals[0] || "status";
if (sub === "status") {
const s = await daemonStatusClient();
if (!s.running) {
console.log(JSON.stringify({ running: false }, null, 2));
process.exit(0);
}
console.log(JSON.stringify(s, null, 2));
break;
}
if (sub === "stop") {
const r = await shutdownDaemon({ force: !!flags.force });
if (r.stopped) {
console.log(JSON.stringify({ stopped: true, reason: r.reason }, null, 2));
process.exit(0);
}
console.error(
`Refused to stop daemon: ${r.reason} (activeBoards=${r.activeBoards ?? 0})`,
);
console.error(
`Submit/close active boards first, or pass --force to drop in-memory history.`,
);
process.exit(1);
}
console.error(`Unknown daemon sub-command: ${sub}. Use 'status' or 'stop'.`);
process.exit(2);
}
}
}
/**
* Default `$D compare --serve` path: ensure the persistent daemon is up,
* publish the board, open the browser to its URL, then exit. The daemon
* survives.
*
* Stderr lines (in order):
* - "DAEMON_STARTED port=N version=V" (or "DAEMON_ATTACHED port=N ..."
* if a daemon was already running)
* - "BOARD_PUBLISHED: http://127.0.0.1:N/boards/<id>/"
* - "BOARD_URL: <same url>" (alias for grep-friendliness)
* - "SERVE_STARTED: port=N html=<path>" (legacy back-compat alias for
* any external script that scraped the pre-daemon output. Note: the
* daemon hosts boards under /boards/<id>/, not /, so scripts that
* ALSO POSTed /api/reload at the parsed port need to switch to
* BOARD_URL + ./api/reload to work end-to-end. Emitting the legacy
* line keeps port-only consumers from breaking outright.)
*/
async function publishToDaemon(opts: { html: string; title?: string }): Promise<void> {
if (!opts.html) {
console.error("--html is required (compare --serve provides --output as the html)");
process.exit(1);
}
const ensured = await ensureDaemon({});
console.error(
`${ensured.spawned ? "DAEMON_STARTED" : "DAEMON_ATTACHED"} port=${ensured.port} version=${ensured.version}`,
);
const result = await publishBoard({
port: ensured.port,
html: opts.html,
title: opts.title,
});
console.error(`BOARD_PUBLISHED: ${result.url}`);
console.error(`BOARD_URL: ${result.url}`);
// Legacy alias so anything still grepping `SERVE_STARTED: port=` gets the
// port. The full back-compat story requires the caller to ALSO learn the
// per-board path; see publishToDaemon docstring above.
console.error(`SERVE_STARTED: port=${ensured.port} html=${opts.html}`);
console.log(JSON.stringify({ id: result.id, url: result.url, sourceDir: result.sourceDir }, null, 2));
openBrowser(result.url);
// Short-lived publisher process exits; daemon keeps serving.
}
/** Open a URL in the default browser. Stays cross-platform with serve.ts. */
function openBrowser(url: string): void {
const platform = process.platform;
let cmd: string;
if (platform === "darwin") cmd = "open";
else if (platform === "linux") cmd = "xdg-open";
else {
console.error(`Open this URL in your browser: ${url}`);
return;
}
try {
const child = nodeSpawn(cmd, [url], { stdio: "ignore", detached: true });
child.unref();
} catch {
console.error(`Open this URL in your browser: ${url}`);
}
}
@ -280,7 +403,19 @@ async function resolveImagePaths(input: string): Promise<string[]> {
return input.split(",").map(p => p.trim());
}
main().catch(err => {
console.error(err.message || err);
process.exit(1);
});
// Self-execution shortcut: when invoked with --daemon-mode, this same
// binary runs as the persistent design daemon instead of the CLI. Keeps
// the production install to a single executable; daemon-client.ts spawns
// `<this binary> --daemon-mode` (or `bun run cli.ts --daemon-mode` in dev)
// rather than relying on a separate daemon.ts file at a known path.
if (process.argv.includes("--daemon-mode")) {
const { start } = await import("./daemon");
start();
// start() binds Bun.serve and registers signal handlers; this branch
// never falls through to main(). Process stays alive on the bound port.
} else {
main().catch((err) => {
console.error(err.message || err);
process.exit(1);
});
}


@ -36,8 +36,8 @@ export const COMMANDS = new Map<string, {
}],
["compare", {
description: "Generate HTML comparison board for user review",
usage: "compare --images /path/*.png --output /path/board.html [--serve]",
flags: ["--images", "--output", "--serve", "--timeout"],
usage: "compare --images /path/*.png --output /path/board.html [--serve [--no-daemon] [--title \"...\"]]",
flags: ["--images", "--output", "--serve", "--no-daemon", "--title", "--timeout"],
}],
["diff", {
description: "Visual diff between two mockups",
@ -71,8 +71,13 @@ export const COMMANDS = new Map<string, {
}],
["serve", {
description: "Serve comparison board over HTTP and collect user feedback",
usage: "serve --html /path/board.html [--timeout 600]",
flags: ["--html", "--timeout"],
usage: "serve --html /path/board.html [--no-daemon] [--title \"...\"] [--timeout 600]",
flags: ["--html", "--no-daemon", "--title", "--timeout"],
}],
["daemon", {
description: "Manage the persistent design board daemon (sub-commands: status, stop)",
usage: "daemon status | daemon stop [--force]",
flags: ["--force"],
}],
["setup", {
description: "Guided API key setup + smoke test",


@ -391,6 +391,17 @@ export function generateCompareHtml(images: string[]): string {
<div id="feedback-result"></div>
<script>
// Feature-detect: are we being served over HTTP (by serve.ts or the
// daemon), or opened directly as a file:// URL? In file:// mode the
// board JS falls through to a DOM-only success path with no server
// round-trips. Using location.protocol instead of an injected global
// means the same generated HTML works at both / (legacy --no-daemon)
// and /boards/<id>/ (daemon) — relative URLs resolve against
// location.pathname in both cases.
function hasServer() {
return location.protocol === 'http:' || location.protocol === 'https:';
}
// View toggle
document.querySelectorAll('.view-toggle button').forEach(function(btn) {
btn.addEventListener('click', function() {
@ -465,8 +476,8 @@ export function generateCompareHtml(images: string[]): string {
});
function postFeedback(feedback) {
if (!window.__GSTACK_SERVER_URL) return Promise.resolve(null);
return fetch(window.__GSTACK_SERVER_URL + '/api/feedback', {
if (!hasServer()) return Promise.resolve(null);
return fetch('./api/feedback', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(feedback),
@ -509,7 +520,7 @@ export function generateCompareHtml(images: string[]): string {
}
function startProgressPolling() {
if (!window.__GSTACK_SERVER_URL) return;
if (!hasServer()) return;
var pollCount = 0;
var maxPolls = 150; // 5 min at 2s intervals
var pollInterval = setInterval(function() {
@ -523,7 +534,7 @@ export function generateCompareHtml(images: string[]): string {
'</div>';
return;
}
fetch(window.__GSTACK_SERVER_URL + '/api/progress')
fetch('./api/progress')
.then(function(r) { return r.json(); })
.then(function(data) {
if (data.status === 'serving') {
@ -563,7 +574,7 @@ export function generateCompareHtml(images: string[]): string {
postFeedback(feedback).then(function(result) {
if (result && result.received) {
showRegeneratingState();
} else if (window.__GSTACK_SERVER_URL) {
} else if (hasServer()) {
showPostFailure(feedback);
}
});
@ -578,7 +589,7 @@ export function generateCompareHtml(images: string[]): string {
postFeedback(feedback).then(function(result) {
if (result && result.received) {
showPostSubmitState();
} else if (window.__GSTACK_SERVER_URL) {
} else if (hasServer()) {
showPostFailure(feedback);
} else {
// DOM-only mode (legacy / test)
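Review note: the switch to relative `./api/...` URLs relies on standard URL resolution against the page address. A small sketch of that resolution using the WHATWG `URL` constructor; the helper name `resolveBoardApi`, the port, and the board id are illustrative only.

```typescript
// Hypothetical helper: resolves a relative API path the way the browser would for a page at pageUrl.
function resolveBoardApi(pageUrl: string, relative: string): string {
  return new URL(relative, pageUrl).href;
}

// Daemon mode: the board lives under /boards/<id>/ (trailing slash present).
const daemonFeedback = resolveBoardApi("http://127.0.0.1:5123/boards/abc/", "./api/feedback");
// Legacy --no-daemon mode: the board lives at /.
const legacyFeedback = resolveBoardApi("http://127.0.0.1:5123/", "./api/feedback");
// Without the trailing slash the last path segment is dropped during resolution.
const missingSlash = resolveBoardApi("http://127.0.0.1:5123/boards/abc", "./api/feedback");

console.log(daemonFeedback); // http://127.0.0.1:5123/boards/abc/api/feedback
console.log(legacyFeedback); // http://127.0.0.1:5123/api/feedback
console.log(missingSlash);   // http://127.0.0.1:5123/boards/api/feedback
```

The third case shows why the per-board URL has to end in `/`: without it, relative requests would land one directory too high.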

design/src/daemon-client.ts (new file, 419 lines)

@ -0,0 +1,419 @@
/**
* CLI-side client for the design daemon.
*
* Responsible for the lifecycle dance that `$D compare --serve` (default
* path) goes through:
*
* ensureDaemon() → publishBoard(html, opts) → openBrowser(url) → exit 0
*
* Mirrors browse/src/cli.ts:317-415: same health-check-first attach
* decision, same fs.openSync('wx') lock, same re-read-under-lock guard.
* Adds two design-specific safety properties Codex flagged on the daemon
* plan:
*
* 1. Identity verification before any SIGTERM. Browse signals on PID
* alone; here we require the cmdline to contain CMDLINE_MARKER so a
* stale state file pointing at a reused PID doesn't kill an
* unrelated process.
*
* 2. Refuse-to-kill on version mismatch with active boards. Browse will
* restart on version drift; here in-memory boards would be lost, so
* we exit 1 with a user-actionable message instead of silent loss.
*
* Spawn uses Node's child_process.spawn with detached: true + stdio
* pointed at a log file. Bun.spawn().unref() has macOS session-detach
* quirks browse already discovered (browse/src/cli.ts:225-275).
*/
import { spawn as nodeSpawn } from "child_process";
import fs from "fs";
import path from "path";
import { setTimeout as delay } from "timers/promises";
import {
acquireLock,
CMDLINE_MARKER,
healthCheck,
isProcessAlive,
readStateFile,
readVersionString,
resolveLockFilePath,
resolveStartupLogPath,
resolveStateFilePath,
verifyIdentity,
} from "./daemon-state";
const MAX_START_WAIT_MS = parseInt(
process.env.DESIGN_DAEMON_START_TIMEOUT_MS || "8000",
10,
);
const POLL_INTERVAL_MS = 100;
const SIGTERM_GRACE_MS = 2000;
export interface EnsureDaemonOptions {
/** Default: package version. Used for version-match check. */
version?: string;
/** Default: `<repo>/design/src/daemon.ts`. */
daemonScript?: string;
/** Extra env vars passed to the spawned daemon. */
daemonEnv?: Record<string, string>;
/** Print noisy progress to stderr. Default true. */
verbose?: boolean;
/**
* Override the state-file path. Default: resolveStateFilePath() (env
* DESIGN_DAEMON_STATE_FILE or .gstack/design.json under the git root /
* cwd). Tests inject a per-test path; the same path is forwarded to the
* spawned daemon via env so client + daemon agree.
*/
stateFile?: string;
}
export interface EnsureDaemonResult {
port: number;
version: string;
spawned: boolean;
}
function log(verbose: boolean, msg: string): void {
if (verbose) process.stderr.write(`[design-daemon] ${msg}\n`);
}
/**
* Ensure a design daemon is reachable on the project's state file. Returns
* the port to talk to. Spawns a new daemon under an exclusive lock when
* needed; attaches to an existing healthy daemon otherwise.
*
* Exits with code 1 (not throws) on the refuse-kill-with-active-boards
* branch; that's a user-actionable situation, not a programming error.
*/
export async function ensureDaemon(
opts: EnsureDaemonOptions = {},
): Promise<EnsureDaemonResult> {
const verbose = opts.verbose !== false;
const expectedVersion = opts.version ?? readPackageVersion();
const stateFile = opts.stateFile ?? resolveStateFilePath();
const existing = readStateFile(stateFile);
if (existing) {
const health = await healthCheck(existing.port);
if (health) {
if (health.version === expectedVersion) {
log(verbose, `attached to existing daemon pid=${existing.pid} port=${existing.port}`);
return { port: existing.port, version: health.version, spawned: false };
}
// Version mismatch: refuse if active boards exist (Codex finding).
if (health.activeBoards > 0) {
process.stderr.write(
`[design-daemon] WARNING: existing daemon is gstack ${health.version}; this CLI is ${expectedVersion}.\n` +
`[design-daemon] ${health.activeBoards} active board(s) detected. Refusing to auto-kill.\n` +
`[design-daemon] Submit or close the open boards, then re-run.\n` +
`[design-daemon] Or force restart: $D daemon stop --force (will drop in-memory history).\n`,
);
process.exit(1);
}
// No active boards — safe to graceful-shutdown and respawn.
log(verbose, `daemon version mismatch (${health.version} vs ${expectedVersion}); shutting down`);
await gracefulShutdownExistingDaemon(existing.port);
await killByPidWithIdentity(existing.pid, existing.cmdlineMarker, verbose);
} else {
// State file points at an unresponsive port. Either the daemon
// crashed or the PID got reused. Identity-verify before any SIGTERM
// so we don't kill an unrelated process (Codex finding).
log(verbose, `state file present (pid=${existing.pid}) but /health unresponsive`);
await killByPidWithIdentity(existing.pid, existing.cmdlineMarker, verbose);
}
}
// Spawn under exclusive lock; re-read state INSIDE the lock so we don't
// race a concurrent CLI that won the lock first.
const lockPath = resolveLockFilePath(stateFile);
const release = acquireLock(lockPath);
if (!release) {
// Another process is starting the daemon. Wait for it.
log(verbose, "another CLI is spawning the daemon; waiting…");
const start = Date.now();
while (Date.now() - start < MAX_START_WAIT_MS) {
const fresh = readStateFile(stateFile);
if (fresh) {
const h = await healthCheck(fresh.port);
if (h) return { port: fresh.port, version: h.version, spawned: false };
}
await delay(POLL_INTERVAL_MS);
}
throw new Error("Timed out waiting for concurrent daemon spawn");
}
try {
// Re-read under lock. Another caller may have already finished spawning
// between our first check and our lock acquisition.
const fresh = readStateFile(stateFile);
if (fresh) {
const h = await healthCheck(fresh.port);
if (h && h.version === expectedVersion) {
log(verbose, `another CLI won the lock; attaching pid=${fresh.pid} port=${fresh.port}`);
return { port: fresh.port, version: h.version, spawned: false };
}
}
log(verbose, "spawning new daemon");
const port = await spawnDaemon({
script: opts.daemonScript,
env: { ...opts.daemonEnv, DESIGN_DAEMON_STATE_FILE: stateFile },
stateFile,
expectedVersion,
});
return { port, version: expectedVersion, spawned: true };
} finally {
release();
}
}
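Review note: the attach-or-spawn branching in `ensureDaemon` can be read as a small decision table. The sketch below restates it as a pure function for illustration; `decideAttach`, `HealthSnapshot`, and the decision labels are hypothetical names, and the real function performs the side effects (exit, shutdown, kill, spawn) inline rather than returning a label.

```typescript
interface HealthSnapshot {
  version: string;
  activeBoards: number;
}

type AttachDecision =
  | "spawn"             // no state file: nothing to attach to
  | "cleanup-and-spawn" // state file present but /health unresponsive
  | "attach"            // healthy daemon on the expected version
  | "refuse"            // version mismatch with active boards: exit 1
  | "restart";          // version mismatch, no active boards: shut down and respawn

function decideAttach(
  hasStateFile: boolean,
  health: HealthSnapshot | null,
  expectedVersion: string,
): AttachDecision {
  if (!hasStateFile) return "spawn";
  if (!health) return "cleanup-and-spawn";
  if (health.version === expectedVersion) return "attach";
  return health.activeBoards > 0 ? "refuse" : "restart";
}
```

Only the `refuse` outcome stops the CLI; every other non-attach outcome continues to the lock-protected spawn path.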
/**
* Publish a board to the daemon and return its URL. Wraps the HTTP POST
* with a friendlier error surface than raw fetch.
*/
export interface PublishBoardOptions {
port: number;
html: string;
title?: string;
publisherPid?: number;
}
export interface PublishBoardResult {
id: string;
url: string;
sourceDir: string;
}
export async function publishBoard(opts: PublishBoardOptions): Promise<PublishBoardResult> {
const body: Record<string, unknown> = {
html: opts.html,
publisherPid: opts.publisherPid ?? process.pid,
};
if (opts.title) body.title = opts.title;
const resp = await fetch(`http://127.0.0.1:${opts.port}/api/boards`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
});
if (!resp.ok) {
let errText: string;
try {
const j = (await resp.json()) as { error?: string; existing?: { id: string; url: string } };
if (j.existing) {
// 409: surface the existing-board URL so the caller can reuse it
return { id: j.existing.id, url: j.existing.url, sourceDir: "" };
}
errText = j.error || `HTTP ${resp.status}`;
} catch {
errText = `HTTP ${resp.status}`;
}
throw new Error(`Daemon refused publish: ${errText}`);
}
return (await resp.json()) as PublishBoardResult;
}
// ─── Internals ───────────────────────────────────────────────────
function readPackageVersion(): string {
return readVersionString();
}
function defaultDaemonScript(): string {
// design/src/daemon-client.ts → daemon.ts is a sibling. Only used in dev
// when this process is `bun run cli.ts`; the compiled-binary path
// self-execs instead (see resolveSpawnCommand).
return path.join(import.meta.dir, "daemon.ts");
}
/**
* Compute the argv to spawn the daemon. Two modes:
*
* Compiled binary (`design/dist/design`): re-exec ourselves with
* --daemon-mode. process.execPath IS the compiled design binary;
* spawning it again with the flag runs the daemon (see the
* --daemon-mode branch at the bottom of cli.ts).
*
* Dev (`bun run design/src/cli.ts`): process.execPath is bun, so we
* invoke `bun run <daemon.ts> --marker ...` directly.
*
* Tests can override the dev script via opts.script.
*/
function resolveSpawnCommand(scriptOverride: string | undefined): {
command: string;
args: string[];
} {
const execBase = path.basename(process.execPath).toLowerCase();
const isCompiledHost = execBase !== "bun" && execBase !== "bun.exe" && execBase !== "node";
if (isCompiledHost && !scriptOverride) {
return {
command: process.execPath,
args: ["--daemon-mode", "--marker", CMDLINE_MARKER],
};
}
const script = scriptOverride ?? defaultDaemonScript();
return {
command: "bun",
args: ["run", script, "--marker", CMDLINE_MARKER],
};
}
interface SpawnDaemonOpts {
script?: string;
env?: Record<string, string>;
stateFile: string;
expectedVersion: string;
}
async function spawnDaemon(opts: SpawnDaemonOpts): Promise<number> {
const logPath = resolveStartupLogPath();
fs.mkdirSync(path.dirname(logPath), { recursive: true });
// Truncate the startup log on each spawn so a later read finds only THIS
// attempt's output (mirrors browse's per-spawn log truncation).
fs.writeFileSync(logPath, "");
const logFd = fs.openSync(logPath, "a");
const { command, args } = resolveSpawnCommand(opts.script);
const child = nodeSpawn(command, args, {
detached: true,
stdio: ["ignore", logFd, logFd],
env: {
...process.env,
DESIGN_DAEMON_VERSION: opts.expectedVersion,
...(opts.env ?? {}),
},
});
child.unref();
fs.closeSync(logFd);
// Poll the state file + /health until the daemon is up, or until timeout.
const deadline = Date.now() + MAX_START_WAIT_MS;
while (Date.now() < deadline) {
const fresh = readStateFile(opts.stateFile);
if (fresh) {
const h = await healthCheck(fresh.port);
if (h) return fresh.port;
}
await delay(POLL_INTERVAL_MS);
}
// Timed out — surface the startup log so the user sees the actual error
// instead of "daemon failed silently."
let tail = "";
try {
tail = fs.readFileSync(logPath, "utf-8").trim();
} catch {
// log file may not exist
}
throw new Error(
`Design daemon failed to start within ${MAX_START_WAIT_MS}ms.\n` +
`Startup log (${logPath}):\n${tail || "(empty)"}`,
);
}
async function gracefulShutdownExistingDaemon(port: number): Promise<void> {
try {
await fetch(`http://127.0.0.1:${port}/shutdown`, {
method: "POST",
signal: AbortSignal.timeout(2000),
});
} catch {
// Daemon may have already exited or be unresponsive — fall through
// to the SIGTERM path with identity verification.
}
}
/**
* Send SIGTERM (then SIGKILL) to `pid`, but ONLY if the running cmdline
* contains `marker`. Prevents a stale state file from causing us to signal
* an unrelated process that inherited the PID.
*/
async function killByPidWithIdentity(
pid: number,
marker: string,
verbose: boolean,
): Promise<void> {
if (!pid || pid <= 0) return;
if (!isProcessAlive(pid)) return;
if (!verifyIdentity(pid, marker || CMDLINE_MARKER)) {
log(
verbose,
`pid ${pid} is alive but cmdline doesn't match marker '${marker || CMDLINE_MARKER}'; skipping signal (possible PID reuse)`,
);
return;
}
try {
process.kill(pid, "SIGTERM");
} catch {
// already gone
return;
}
// Give it a grace period; SIGKILL if still alive AND still ours.
const deadline = Date.now() + SIGTERM_GRACE_MS;
while (Date.now() < deadline) {
if (!isProcessAlive(pid)) return;
await delay(50);
}
if (isProcessAlive(pid) && verifyIdentity(pid, marker || CMDLINE_MARKER)) {
log(verbose, `pid ${pid} survived SIGTERM; SIGKILL`);
try {
process.kill(pid, "SIGKILL");
} catch {
// raced with exit
}
}
}
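Review note: the identity guard above depends on matching a marker inside the target process's command line. A minimal sketch of that matching step, assuming the raw cmdline is either NUL-separated (as Linux `/proc/<pid>/cmdline` provides) or a single space-joined string. `cmdlineHasMarker` is a hypothetical helper, not the `verifyIdentity` implementation, which is not shown in this section.

```typescript
// Hypothetical helper: true when any argv entry of the raw cmdline contains the marker.
function cmdlineHasMarker(rawCmdline: string, marker: string): boolean {
  if (!marker) return false; // an empty marker would match every process
  const argv = rawCmdline.split("\0").filter((part) => part.length > 0);
  return argv.some((part) => part.includes(marker));
}

const ours = "bun\0run\0daemon.ts\0--marker\0gstack-design-daemon\0";
const stranger = "/usr/bin/vim\0notes.txt\0";

console.log(cmdlineHasMarker(ours, "gstack-design-daemon"));     // true
console.log(cmdlineHasMarker(stranger, "gstack-design-daemon")); // false
```

Returning `false` for an empty marker or an unreadable (empty) cmdline keeps the failure mode conservative: when identity cannot be confirmed, no signal is sent.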
/**
* Public: $D daemon stop. Posts /shutdown when there are no active boards
* or when `force` is set; otherwise reports the refusal. Used by the
* `daemon stop` CLI sub-command.
*/
export async function shutdownDaemon(opts: { force?: boolean } = {}): Promise<{
stopped: boolean;
reason?: string;
activeBoards?: number;
}> {
const stateFile = resolveStateFilePath();
const existing = readStateFile(stateFile);
if (!existing) return { stopped: false, reason: "no daemon running" };
const health = await healthCheck(existing.port);
if (!health) {
// unresponsive: try SIGTERM via identity-checked path
await killByPidWithIdentity(existing.pid, existing.cmdlineMarker, true);
return { stopped: true, reason: "unresponsive daemon killed via SIGTERM" };
}
if (health.activeBoards > 0 && !opts.force) {
return {
stopped: false,
reason: "active boards present",
activeBoards: health.activeBoards,
};
}
await gracefulShutdownExistingDaemon(existing.port);
// Best-effort: SIGTERM if /shutdown didn't take effect
if (isProcessAlive(existing.pid)) {
await killByPidWithIdentity(existing.pid, existing.cmdlineMarker, true);
}
return { stopped: true };
}
/** $D daemon status — for the CLI sub-command. */
export async function daemonStatus(): Promise<
| { running: false }
| { running: true; port: number; pid: number; version: string; boards: number; activeBoards: number; uptime: number }
> {
const existing = readStateFile();
if (!existing) return { running: false };
const h = await healthCheck(existing.port);
if (!h) return { running: false };
return {
running: true,
port: existing.port,
pid: existing.pid,
version: h.version,
boards: h.boards,
activeBoards: h.activeBoards,
uptime: h.uptime,
};
}

design/src/daemon-state.ts (new file, 220 lines)

@ -0,0 +1,220 @@
/**
* Pure utilities for design-daemon discovery.
*
* Shared between daemon.ts (writes/removes the state file) and
* daemon-client.ts (reads state, decides spawn-vs-attach). Mirrors
* browse/src/cli.ts:109-315: same atomic-write + fs.openSync 'wx' lock
* pattern, with an added cmdline-based identity check to guard against
* SIGTERM hitting a reused PID (Codex finding on the daemon plan).
*/
import { execFileSync } from "child_process";
import fs from "fs";
import os from "os";
import path from "path";
export interface DaemonState {
pid: number;
port: number;
startedAt: string; // ISO 8601
version: string;
serverPath: string;
cmdlineMarker: string;
}
// String we grep for in the spawned daemon's cmdline to confirm a pid is
// ours before sending any signal. Must appear in argv at spawn time.
export const CMDLINE_MARKER = "gstack-design-daemon";
export function resolveStateFilePath(): string {
// Env override has highest precedence so tests can point both client and
// spawned daemon at a per-test path without a shared cwd.
const envOverride = process.env.DESIGN_DAEMON_STATE_FILE;
if (envOverride) return envOverride;
try {
const root = execFileSync("git", ["rev-parse", "--show-toplevel"], {
encoding: "utf8",
stdio: ["ignore", "pipe", "ignore"],
}).trim();
if (root) return path.join(root, ".gstack", "design.json");
} catch {
// not in a git repo — fall through
}
return path.join(process.cwd(), ".gstack", "design.json");
}
export function resolveLockFilePath(stateFile: string = resolveStateFilePath()): string {
return `${stateFile}.lock`;
}
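Review note: the `fs.openSync` `'wx'` lock pattern mentioned in the file header can be sketched as follows. This is an illustration under assumptions, not the `acquireLock` shipped in this file (its body is not shown here); `acquireLockSketch` is a hypothetical name, and the only contract taken from the diff is that the caller receives a release function or a falsy value.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch: exclusive-create lock. Returns a release function, or null if already held.
function acquireLockSketch(lockPath: string): (() => void) | null {
  try {
    // "wx" = open for writing, fail with EEXIST if the file already exists.
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(process.pid)); // record the holder for debugging
    fs.closeSync(fd);
  } catch (e: unknown) {
    if ((e as { code?: string }).code === "EEXIST") return null;
    throw e; // unexpected failure, e.g. missing directory or permissions
  }
  return () => {
    try {
      fs.unlinkSync(lockPath);
    } catch {
      // already removed
    }
  };
}

// Usage against a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-sketch-"));
const lockPath = path.join(dir, "design.json.lock");
const first = acquireLockSketch(lockPath);  // acquires the lock
const second = acquireLockSketch(lockPath); // null: lock is held
first?.();                                  // release
const third = acquireLockSketch(lockPath);  // acquires again
third?.();
fs.rmSync(dir, { recursive: true, force: true });
```

Creation with `wx` is atomic at the filesystem level, which is what lets two CLIs race for the lock without a separate check-then-create step. A crashed holder leaves the file behind, so a real implementation likely needs stale-lock handling that this sketch omits.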
export function resolveDaemonLogPath(): string {
return path.join(os.homedir(), ".gstack", "design-daemon.log");
}
export function resolveStartupLogPath(): string {
return path.join(os.homedir(), ".gstack", "design-daemon-startup.log");
}
/**
* Read the gstack version both client and daemon should agree on. Looks
* (in order): DESIGN_DAEMON_VERSION env, design/dist/.version baked at
* build time, VERSION at the source-tree root (dev), then "unknown".
*
* Compiled binaries lose the source-tree relative path at runtime, so we
* try the dist/.version sidecar (which build.sh writes) before falling
* back. This keeps client.expectedVersion and daemon.VERSION coherent.
*/
export function readVersionString(): string {
const env = process.env.DESIGN_DAEMON_VERSION;
if (env) return env;
const candidates = [
// Compiled binary: design/dist/design lives alongside design/dist/.version
path.join(path.dirname(process.execPath), ".version"),
// Dev: design/src/* → repo root is two levels up
path.join(import.meta.dir, "..", "..", "VERSION"),
// Defensive: design/dist sibling of source tree
path.join(import.meta.dir, "..", "dist", ".version"),
];
for (const p of candidates) {
try {
const v = fs.readFileSync(p, "utf-8").trim();
if (v) return v;
} catch {
// try next
}
}
return "unknown";
}
export function readStateFile(stateFile: string = resolveStateFilePath()): DaemonState | null {
try {
return JSON.parse(fs.readFileSync(stateFile, "utf-8")) as DaemonState;
} catch {
return null;
}
}
export function writeStateFile(
state: DaemonState,
stateFile: string = resolveStateFilePath(),
): void {
fs.mkdirSync(path.dirname(stateFile), { recursive: true });
const tmp = `${stateFile}.tmp.${process.pid}.${Math.random().toString(36).slice(2)}`;
fs.writeFileSync(tmp, JSON.stringify(state, null, 2), { mode: 0o600 });
fs.renameSync(tmp, stateFile);
}
export function removeStateFile(stateFile: string = resolveStateFilePath()): void {
try {
fs.unlinkSync(stateFile);
} catch {
// already gone
}
}
export interface HealthOk {
ok: true;
version: string;
uptime: number;
boards: number;
activeBoards: number;
}
export async function healthCheck(
port: number,
timeoutMs: number = 2000,
): Promise<HealthOk | null> {
try {
const resp = await fetch(`http://127.0.0.1:${port}/health`, {
signal: AbortSignal.timeout(timeoutMs),
});
if (!resp.ok) return null;
const body = (await resp.json()) as Partial<HealthOk> | null;
if (body && body.ok === true && typeof body.version === "string") {
return body as HealthOk;
}
return null;
} catch {
return null;
}
}
export function isProcessAlive(pid: number): boolean {
if (!pid || pid <= 0) return false;
try {
process.kill(pid, 0);
return true;
} catch (e: unknown) {
// EPERM means it exists, we just can't signal it. ESRCH means it's gone.
const code = (e as NodeJS.ErrnoException | undefined)?.code;
return code === "EPERM";
}
}
/**
* Read the cmdline of a running process. Returns "" on any error, and on
* platforms other than Linux and macOS.
* Linux: /proc/<pid>/cmdline (NUL-separated argv). macOS: `ps -p PID -o command=`.
*/
export function readCmdline(pid: number): string {
if (!isProcessAlive(pid)) return "";
try {
if (process.platform === "linux") {
const raw = fs.readFileSync(`/proc/${pid}/cmdline`, "utf-8");
return raw.replace(/\0/g, " ").trim();
}
if (process.platform === "darwin") {
return execFileSync("ps", ["-p", String(pid), "-o", "command="], {
encoding: "utf8",
stdio: ["ignore", "pipe", "ignore"],
}).trim();
}
return "";
} catch {
return "";
}
}
/**
* True only when the process at `pid` has `marker` in its cmdline. Used to
* avoid SIGTERMing an unrelated process that happens to have inherited a
* PID from a stale state file (the Codex PID-reuse concern). On systems
* where readCmdline is unsupported (or fails), this returns false: it is safer
* to skip the signal than to risk killing the wrong process.
*/
export function verifyIdentity(pid: number, marker: string): boolean {
if (!marker) return false;
return readCmdline(pid).includes(marker);
}
/**
* Acquire an exclusive lock on `lockPath`. Returns a release function, or
* null if held by another live process. Stale locks (PID dead) are reclaimed
* once; if reclaim also fails the caller waits and retries via state re-read.
*/
export function acquireLock(lockPath: string): (() => void) | null {
try {
fs.mkdirSync(path.dirname(lockPath), { recursive: true });
// 'wx' = create exclusive, fail if exists. Atomic check-and-create.
const fd = fs.openSync(lockPath, "wx");
fs.writeSync(fd, `${process.pid}\n`);
fs.closeSync(fd);
return () => {
try {
fs.unlinkSync(lockPath);
} catch {
// already gone
}
};
} catch {
// Held — check if holder is alive
try {
const holderPid = parseInt(fs.readFileSync(lockPath, "utf-8").trim(), 10);
if (holderPid && isProcessAlive(holderPid)) return null;
// Stale, reclaim
fs.unlinkSync(lockPath);
return acquireLock(lockPath);
} catch {
return null;
}
}
}
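
A minimal sketch of the exclusive-create ("wx") lock pattern that acquireLock relies on, shown in isolation. The helper name tryLock and the temp-directory paths are hypothetical and not part of this diff, and the stale-holder reclaim step is deliberately left out, so this illustrates the idea rather than the module's actual behavior.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper (not in the diff): create the lock file exclusively.
// "wx" makes openSync fail if the file already exists, so the existence
// check and the create happen as one atomic step.
function tryLock(lockPath: string): (() => void) | null {
  try {
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, `${process.pid}\n`); // record the holder for later liveness checks
    fs.closeSync(fd);
    return () => fs.rmSync(lockPath, { force: true });
  } catch {
    return null; // someone else holds the lock (or the path is unwritable)
  }
}

// Assumed demo layout: a throwaway directory standing in for .gstack/.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-demo-"));
const lockPath = path.join(dir, "design.json.lock");

const first = tryLock(lockPath);  // acquires the lock
const second = tryLock(lockPath); // refused while the first holder exists
first?.();                        // release
const third = tryLock(lockPath);  // acquirable again after release
third?.();

fs.rmSync(dir, { recursive: true, force: true });
```

In this sketch a refused caller just gets null; per the doc comment above, acquireLock additionally reclaims a lock whose recorded PID is no longer alive.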

design/src/daemon.ts Normal file

@ -0,0 +1,582 @@
/**
* Persistent design board daemon.
*
* One process hosts many boards under /boards/<id>/. Spawned by
* daemon-client.ts when the project's discovery file (.gstack/design.json)
* does not point at a live daemon. Replaces the per-invocation server in
* serve.ts as the default for `$D compare --serve`; serve.ts is kept as
* the --no-daemon legacy/test path.
*
* Endpoints (see plan docs/designs path for full table):
* GET  /                          index of boards
* GET  /health                    liveness + version (unauth)
* POST /api/boards                publish a new board
* POST /shutdown                  graceful exit (refused if active)
* GET  /boards/<id>               301 → /boards/<id>/
* GET  /boards/<id>/              render board HTML
* GET  /boards/<id>/api/progress  state machine status
* POST /boards/<id>/api/feedback  submit/regenerate
* POST /boards/<id>/api/reload    swap board HTML
*
* Lifecycle:
* start → bind 127.0.0.1:N → write state file → serve until 24h idle or
* explicit /shutdown → remove state file → exit 0
*
* The daemon refuses /shutdown when boards are non-done; the idle timer
* extends rather than killing in that case (up to a 28h hard ceiling).
* Both are Codex-flagged guards against silent loss of in-memory history.
*/
import fs from "fs";
import path from "path";
import {
CMDLINE_MARKER,
DaemonState,
readVersionString,
removeStateFile,
resolveDaemonLogPath,
writeStateFile,
} from "./daemon-state";
// ─── Tunables (env overrides for tests) ──────────────────────────
const DEFAULT_IDLE_MS = 24 * 60 * 60 * 1000; // 24h
const IDLE_MS = parseInt(
process.env.DESIGN_DAEMON_IDLE_MS || String(DEFAULT_IDLE_MS),
10,
);
const IDLE_EXTENSION_MS = parseInt(
process.env.DESIGN_DAEMON_EXTENSION_MS || String(60 * 60 * 1000), // 1h
10,
);
const MAX_EXTENSIONS = parseInt(process.env.DESIGN_DAEMON_MAX_EXTENSIONS || "4", 10);
const IDLE_CHECK_INTERVAL_MS = parseInt(
process.env.DESIGN_DAEMON_CHECK_MS || "60000",
10,
);
const MAX_BOARDS = parseInt(process.env.DESIGN_DAEMON_MAX_BOARDS || "50", 10);
const VERSION = readVersionString();
// ─── Per-board state ─────────────────────────────────────────────
export type BoardState = "serving" | "regenerating" | "done";
export interface Board {
id: string;
htmlContent: string;
sourceDir: string; // realpath of the dir feedback files write to
allowedDir: string; // realpath anchor for path-traversal guard
state: BoardState;
publishedAt: number;
lastTouched: number;
publisherPid: number;
title?: string;
}
// In-memory: keyed by board id.
const boards = new Map<string, Board>();
// Per-board mutex chain — serializes feedback POST vs reload POST on the
// same board so the daemon doesn't race a state mutation against an HTML swap.
const boardMutex = new Map<string, Promise<void>>();
let lastMeaningfulActivity = Date.now();
let idleExtensions = 0;
let shuttingDown = false;
let serverRef: ReturnType<typeof Bun.serve> | null = null;
let idleInterval: ReturnType<typeof setInterval> | null = null;
const startTime = Date.now();
const daemonLog = openDaemonLog();
function openDaemonLog(): fs.WriteStream | null {
try {
const p = resolveDaemonLogPath();
fs.mkdirSync(path.dirname(p), { recursive: true });
return fs.createWriteStream(p, { flags: "a" });
} catch {
return null;
}
}
function dlog(...args: unknown[]): void {
const line = `[${new Date().toISOString()}] ${args.map(String).join(" ")}\n`;
if (daemonLog) daemonLog.write(line);
process.stderr.write(line);
}
// ─── Helpers ─────────────────────────────────────────────────────
function newBoardId(): string {
const now = new Date();
const y = now.getUTCFullYear().toString().padStart(4, "0");
const mo = (now.getUTCMonth() + 1).toString().padStart(2, "0");
const d = now.getUTCDate().toString().padStart(2, "0");
const hh = now.getUTCHours().toString().padStart(2, "0");
const mm = now.getUTCMinutes().toString().padStart(2, "0");
const ss = now.getUTCSeconds().toString().padStart(2, "0");
const rand = Math.random().toString(36).slice(2, 8).padEnd(6, "0");
return `b-${y}${mo}${d}-${hh}${mm}${ss}-${rand}`;
}
async function withBoardMutex<T>(id: string, fn: () => Promise<T>): Promise<T> {
const prev = boardMutex.get(id) || Promise.resolve();
let release!: () => void;
const next = new Promise<void>((r) => {
release = r;
});
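// Keep a handle on the stored tail: the map holds prev.then(...), not `next`,
// so comparing against `next` would never match and entries would never be
// cleaned up.
const tail = prev.then(() => next);
boardMutex.set(id, tail);
await prev;
try {
return await fn();
} finally {
release();
if (boardMutex.get(id) === tail) boardMutex.delete(id);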
}
}
function markMeaningfulActivity(): void {
lastMeaningfulActivity = Date.now();
idleExtensions = 0;
}
function nonDoneCount(): number {
let n = 0;
for (const b of boards.values()) if (b.state !== "done") n += 1;
return n;
}
function hasActiveBoards(): boolean {
return nonDoneCount() > 0;
}
// LRU eviction. Prefers `done` boards as victims so an active regen doesn't
// vanish mid-flight. Returns the evicted id, or null when the map fits.
function evictOne(): string | null {
if (boards.size <= MAX_BOARDS) return null;
let oldestDone: Board | null = null;
let oldestAny: Board | null = null;
for (const b of boards.values()) {
if (b.state === "done") {
if (!oldestDone || b.lastTouched < oldestDone.lastTouched) oldestDone = b;
}
if (!oldestAny || b.lastTouched < oldestAny.lastTouched) oldestAny = b;
}
const victim = oldestDone || oldestAny;
if (!victim) return null;
boards.delete(victim.id);
boardMutex.delete(victim.id);
dlog(`evicted board ${victim.id} state=${victim.state}`);
return victim.id;
}
function evictUntilUnderCap(): void {
while (boards.size > MAX_BOARDS) {
if (!evictOne()) break;
}
}
function findActiveBoardForSourceDir(sourceDir: string): Board | null {
for (const b of boards.values()) {
if (b.sourceDir === sourceDir && b.state !== "done") return b;
}
return null;
}
function escapeHtml(s: string): string {
return s.replace(/[&<>"']/g, (c) =>
({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]!),
);
}
// ─── Shutdown ─────────────────────────────────────────────────────
async function gracefulShutdown(exitCode = 0): Promise<void> {
if (shuttingDown) return;
shuttingDown = true;
dlog(`shutting down boards=${boards.size} code=${exitCode}`);
if (idleInterval) clearInterval(idleInterval);
try {
serverRef?.stop();
} catch {
// already stopped
}
removeStateFile();
if (daemonLog) daemonLog.end();
setTimeout(() => process.exit(exitCode), 50);
}
export function idleCheckTick(): void {
if (shuttingDown) return;
const idle = Date.now() - lastMeaningfulActivity;
if (idle < IDLE_MS) return;
if (hasActiveBoards()) {
if (idleExtensions >= MAX_EXTENSIONS) {
dlog(`idle past hard ceiling with ${nonDoneCount()} active boards — forcing shutdown`);
gracefulShutdown(0);
return;
}
idleExtensions += 1;
// Push lastMeaningfulActivity forward by an extension window without
// marking real activity (so the count stays correct).
lastMeaningfulActivity = Date.now() - IDLE_MS + IDLE_EXTENSION_MS;
dlog(
`idle with ${nonDoneCount()} active boards — extending ${IDLE_EXTENSION_MS / 60000}min (${idleExtensions}/${MAX_EXTENSIONS})`,
);
return;
}
dlog(`idle for ${Math.floor(idle / 1000)}s — shutting down`);
gracefulShutdown(0);
}
// ─── Handlers ─────────────────────────────────────────────────────
function handleHealth(): Response {
return Response.json({
ok: true,
version: VERSION,
uptime: Math.floor((Date.now() - startTime) / 1000),
boards: boards.size,
activeBoards: nonDoneCount(),
});
}
function handleIndex(): Response {
const sorted = [...boards.values()].sort((a, b) => b.publishedAt - a.publishedAt);
const rows = sorted
.map((b) => {
const ts = new Date(b.publishedAt).toISOString();
const titleSuffix = b.title ? `${escapeHtml(b.title)}` : "";
return `<li><a href="/boards/${b.id}/">${b.id}</a> <span class="state state-${b.state}">${b.state}</span> <time>${ts}</time>${titleSuffix}</li>`;
})
.join("\n");
const empty = `<p class="empty">No boards yet. Run <code>$D compare --serve</code> to publish one.</p>`;
const list = sorted.length === 0 ? empty : `<ul>\n${rows}\n</ul>`;
const html = `<!DOCTYPE html><html lang="en"><head>
<meta charset="utf-8"><title>gstack design boards</title><style>
body{font:14px/1.5 -apple-system,system-ui,sans-serif;max-width:720px;margin:32px auto;padding:0 16px;color:#1a1a1a}
h1{font-size:20px;margin-bottom:4px}
.meta{color:#666;margin-bottom:24px;font-size:13px}
ul{padding:0;list-style:none}
li{padding:10px 0;border-bottom:1px solid #eee;display:flex;align-items:center;gap:12px;flex-wrap:wrap}
a{color:#0070f3;text-decoration:none;font-family:ui-monospace,monospace}
a:hover{text-decoration:underline}
.state{font-size:11px;padding:2px 8px;border-radius:10px;background:#eef;color:#335}
.state-done{background:#efe;color:#353}
.state-regenerating{background:#ffe;color:#553}
time{color:#888;font-size:12px}
.empty{color:#888;font-style:italic}
code{font-family:ui-monospace,monospace;background:#f5f5f5;padding:2px 6px;border-radius:3px}
</style></head><body>
<h1>gstack design boards</h1>
<p class="meta">daemon up ${Math.floor((Date.now() - startTime) / 1000)}s · ${boards.size} board(s) · ${nonDoneCount()} active</p>
${list}
</body></html>`;
return new Response(html, { headers: { "Content-Type": "text/html; charset=utf-8" } });
}
async function handlePublish(req: Request, origin: string): Promise<Response> {
let body: any;
try {
body = await req.json();
} catch {
return Response.json({ error: "Invalid JSON" }, { status: 400 });
}
if (!body || typeof body !== "object") {
return Response.json({ error: "Expected JSON object" }, { status: 400 });
}
const htmlPath = typeof body.html === "string" ? body.html : "";
if (!htmlPath) return Response.json({ error: "Missing 'html' field" }, { status: 400 });
if (!fs.existsSync(htmlPath)) {
return Response.json({ error: `HTML file not found: ${htmlPath}` }, { status: 400 });
}
let resolvedHtml: string;
let sourceDir: string;
try {
resolvedHtml = fs.realpathSync(path.resolve(htmlPath));
sourceDir = fs.realpathSync(path.dirname(resolvedHtml));
} catch (e: any) {
return Response.json({ error: `Cannot resolve path: ${e.message}` }, { status: 400 });
}
if (!fs.statSync(resolvedHtml).isFile()) {
return Response.json(
{ error: `'html' must be a file, not a directory: ${htmlPath}` },
{ status: 400 },
);
}
// sourceDir comes from realpath(html), not from the body — Codex finding:
// body-supplied sourceDir is a local trust boundary the daemon shouldn't cross.
const existing = findActiveBoardForSourceDir(sourceDir);
if (existing) {
return Response.json(
{
error: "Source directory already in use by an active board",
existing: {
id: existing.id,
url: `${origin}/boards/${existing.id}/`,
state: existing.state,
},
},
{ status: 409 },
);
}
if (nonDoneCount() >= MAX_BOARDS) {
return Response.json(
{
error: `Cannot publish: ${MAX_BOARDS} non-done boards already exist. Submit or close some first.`,
},
{ status: 503 },
);
}
const id = newBoardId();
const htmlContent = fs.readFileSync(resolvedHtml, "utf-8");
const now = Date.now();
const board: Board = {
id,
htmlContent,
sourceDir,
allowedDir: sourceDir,
state: "serving",
publishedAt: now,
lastTouched: now,
publisherPid: typeof body.publisherPid === "number" ? body.publisherPid : 0,
title: typeof body.title === "string" ? body.title : undefined,
};
boards.set(id, board);
evictUntilUnderCap();
markMeaningfulActivity();
dlog(`published board ${id} sourceDir=${sourceDir} pid=${board.publisherPid}`);
return Response.json({
id,
url: `${origin}/boards/${id}/`,
sourceDir,
});
}
function handleBoardGet(board: Board): Response {
board.lastTouched = Date.now();
// No __GSTACK_SERVER_URL injection — board JS uses relative URLs that
// resolve against /boards/<id>/ (the trailing slash is load-bearing here;
// the 301 from the bare /boards/<id> form ensures it).
return new Response(board.htmlContent, {
headers: { "Content-Type": "text/html; charset=utf-8" },
});
}
function handleBoardProgress(board: Board): Response {
// NOT meaningful activity — bare progress polling shouldn't keep the
// daemon alive forever (Codex finding on idle-immortality).
board.lastTouched = Date.now();
return Response.json({ status: board.state });
}
async function handleBoardFeedback(board: Board, req: Request): Promise<Response> {
let body: any;
try {
body = await req.json();
} catch {
return Response.json({ error: "Invalid JSON" }, { status: 400 });
}
if (!body || typeof body !== "object") {
return Response.json({ error: "Expected JSON object" }, { status: 400 });
}
const isSubmit = body.regenerated === false;
const isRegen = body.regenerated === true;
// Augment with boardId + publishedAt so multi-board agents can disambiguate
// which board produced a given feedback.json.
const augmented = {
...body,
boardId: board.id,
publishedAt: new Date(board.publishedAt).toISOString(),
};
const feedbackFile = isSubmit ? "feedback.json" : "feedback-pending.json";
const feedbackPath = path.join(board.sourceDir, feedbackFile);
try {
fs.writeFileSync(feedbackPath, JSON.stringify(augmented, null, 2));
} catch (e: any) {
dlog(`feedback write failed for ${board.id}: ${e.message}`);
return Response.json(
{ error: `Cannot write ${feedbackFile}: ${e.message}` },
{ status: 500 },
);
}
board.lastTouched = Date.now();
markMeaningfulActivity();
if (isSubmit) {
board.state = "done";
dlog(`board ${board.id} submitted → ${feedbackPath}`);
return Response.json({ received: true, action: "submitted" });
}
if (isRegen) {
board.state = "regenerating";
dlog(`board ${board.id} regenerate → ${feedbackPath}`);
return Response.json({ received: true, action: "regenerate" });
}
return Response.json({ received: true, action: "unknown" });
}
async function handleBoardReload(board: Board, req: Request): Promise<Response> {
let body: any;
try {
body = await req.json();
} catch {
return Response.json({ error: "Invalid JSON" }, { status: 400 });
}
const newHtmlPath = typeof body?.html === "string" ? body.html : "";
if (!newHtmlPath || !fs.existsSync(newHtmlPath)) {
return Response.json({ error: `HTML file not found: ${newHtmlPath}` }, { status: 400 });
}
const resolvedReload = fs.realpathSync(path.resolve(newHtmlPath));
if (!resolvedReload.startsWith(board.allowedDir + path.sep)) {
return Response.json(
{ error: `Path must be within: ${board.allowedDir}` },
{ status: 403 },
);
}
if (!fs.statSync(resolvedReload).isFile()) {
return Response.json(
{ error: `Path must be a file, not a directory: ${newHtmlPath}` },
{ status: 400 },
);
}
board.htmlContent = fs.readFileSync(resolvedReload, "utf-8");
board.state = "serving";
board.lastTouched = Date.now();
markMeaningfulActivity();
dlog(`board ${board.id} reloaded from ${resolvedReload}`);
return Response.json({ reloaded: true });
}
function boardExpiredHtml(id: string): string {
return `<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Board expired — gstack</title>
<style>body{font:14px/1.5 -apple-system,system-ui,sans-serif;max-width:600px;margin:80px auto;padding:0 20px;color:#1a1a1a;text-align:center}
h1{font-size:20px}.id{font-family:ui-monospace,monospace;color:#888;font-size:13px}
a{color:#0070f3;text-decoration:none}a:hover{text-decoration:underline}</style></head><body>
<h1>Board expired</h1>
<p>Board <span class="id">${escapeHtml(id)}</span> is no longer hosted by this daemon (evicted or the daemon restarted).</p>
<p><a href="/">See active boards</a></p>
</body></html>`;
}
// ─── Router ──────────────────────────────────────────────────────
const BOARD_RE = /^\/boards\/([A-Za-z0-9_-]+)(\/.*)?$/;
export async function fetchHandler(req: Request): Promise<Response> {
const url = new URL(req.url);
const origin = url.origin;
if (req.method === "GET" && url.pathname === "/health") return handleHealth();
if (req.method === "GET" && url.pathname === "/") return handleIndex();
if (req.method === "POST" && url.pathname === "/api/boards") return handlePublish(req, origin);
if (req.method === "POST" && url.pathname === "/shutdown") {
if (hasActiveBoards()) {
return Response.json(
{
error: "Refusing /shutdown: daemon has active boards. Submit or close them first.",
activeBoards: nonDoneCount(),
},
{ status: 409 },
);
}
setTimeout(() => gracefulShutdown(0), 50);
return Response.json({ shuttingDown: true });
}
const m = url.pathname.match(BOARD_RE);
if (m) {
const id = m[1]!;
const subpath = m[2] || "";
const board = boards.get(id);
if (!board) {
return new Response(boardExpiredHtml(id), {
status: 404,
headers: { "Content-Type": "text/html; charset=utf-8" },
});
}
// Bare /boards/<id> → 301 to /boards/<id>/ so relative URLs in board JS
// resolve against the right base (./api/feedback → /boards/<id>/api/feedback).
if (req.method === "GET" && subpath === "") {
return new Response(null, {
status: 301,
headers: { Location: `/boards/${id}/` },
});
}
if (req.method === "GET" && subpath === "/") return handleBoardGet(board);
if (req.method === "GET" && subpath === "/api/progress") return handleBoardProgress(board);
if (req.method === "POST" && subpath === "/api/feedback") {
return withBoardMutex(id, () => handleBoardFeedback(board, req));
}
if (req.method === "POST" && subpath === "/api/reload") {
return withBoardMutex(id, () => handleBoardReload(board, req));
}
}
return new Response("Not found", { status: 404 });
}
// ─── Startup ─────────────────────────────────────────────────────
export function start(): { port: number } {
const portArg = process.env.DESIGN_DAEMON_PORT;
const port = portArg ? parseInt(portArg, 10) : 0;
serverRef = Bun.serve({
port,
hostname: "127.0.0.1",
fetch: fetchHandler,
});
const actualPort = serverRef.port;
const state: DaemonState = {
pid: process.pid,
port: actualPort,
startedAt: new Date().toISOString(),
version: VERSION,
serverPath: process.argv[1] || "",
cmdlineMarker: CMDLINE_MARKER,
};
writeStateFile(state);
dlog(`DAEMON_STARTED port=${actualPort} pid=${process.pid} version=${VERSION}`);
// Stdout line the spawning CLI parses to learn the port quickly.
console.log(`DAEMON_STARTED port=${actualPort}`);
idleInterval = setInterval(idleCheckTick, IDLE_CHECK_INTERVAL_MS);
process.on("SIGTERM", () => {
void gracefulShutdown(0);
});
process.on("SIGINT", () => {
void gracefulShutdown(0);
});
process.on("uncaughtException", (e) => {
dlog(`uncaughtException: ${(e as Error).stack || (e as Error).message}`);
void gracefulShutdown(1);
});
return { port: actualPort };
}
if (import.meta.main) {
start();
}
// Exported for tests. Keep this small and stable.
export const __testInternals__ = {
boards,
fetchHandler,
idleCheckTick,
markMeaningfulActivity,
resetForTest: (): void => {
boards.clear();
boardMutex.clear();
lastMeaningfulActivity = Date.now();
idleExtensions = 0;
shuttingDown = false;
},
};
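
A standalone sketch of the per-board promise-chain mutex described above, to show why a feedback POST and a reload POST on the same board cannot interleave. The names withKeyMutex, chains, sleep and demo are hypothetical and exist only for this illustration; the board id "b-1" and the delays are made up.

```typescript
// Hypothetical standalone version of a per-key promise-chain mutex.
const chains = new Map<string, Promise<void>>();

async function withKeyMutex<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = chains.get(key) ?? Promise.resolve();
  let release!: () => void;
  const next = new Promise<void>((r) => {
    release = r;
  });
  // The stored tail resolves only after every earlier holder has released.
  const tail = prev.then(() => next);
  chains.set(key, tail);
  await prev; // wait for earlier callers on the same key
  try {
    return await fn();
  } finally {
    release();
    // Drop the entry only if nobody queued behind us.
    if (chains.get(key) === tail) chains.delete(key);
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Two overlapping operations on the same key: the slower one starts first,
// and the faster one must still wait for it to finish.
async function demo(): Promise<string[]> {
  const order: string[] = [];
  await Promise.all([
    withKeyMutex("b-1", async () => {
      order.push("feedback:start");
      await sleep(20);
      order.push("feedback:end");
    }),
    withKeyMutex("b-1", async () => {
      order.push("reload:start");
      await sleep(1);
      order.push("reload:end");
    }),
  ]);
  return order;
}
```

Calling demo() yields the feedback steps followed by the reload steps, even though the reload task is much shorter. Operations on different keys are not serialized against each other.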


@ -1,12 +1,18 @@
/**
* HTTP server for the design comparison board feedback loop.
*
* Replaces the broken file:// + DOM polling approach. The server:
* 1. Serves the comparison board HTML over HTTP
* 2. Injects __GSTACK_SERVER_URL so the board POSTs feedback here
* 3. Prints feedback JSON to stdout (agent reads it)
* 4. Stays alive across regeneration rounds (stateful)
* 5. Auto-opens in the user's default browser
* Legacy single-process path: spawned by `$D compare --serve --no-daemon`.
* The daemon (`design/src/daemon.ts`) handles default invocations and hosts
* multiple boards under `/boards/<id>/`; this file stays as the escape hatch
* for tests and debugging. Board JS uses relative URLs and a
* location.protocol feature-detect, so the same generated HTML works at
* both `/` (here) and `/boards/<id>/` (daemon).
*
* The server:
* 1. Serves the comparison board HTML over HTTP at `/`
* 2. Prints feedback JSON to stdout (agent reads it)
* 3. Stays alive across regeneration rounds (stateful)
* 4. Auto-opens in the user's default browser
*
* State machine:
*
@ -69,17 +75,14 @@ export async function serve(options: ServeOptions): Promise<void> {
fetch(req) {
const url = new URL(req.url);
// Serve the comparison board HTML
// Serve the comparison board HTML. The board JS uses relative paths
// (./api/feedback, ./api/progress) and a location.protocol
// feature-detect, so no per-request injection is needed.
if (
req.method === "GET" &&
(url.pathname === "/" || url.pathname === "/index.html")
) {
// Inject the server URL so the board can POST feedback
const injected = htmlContent.replace(
"</head>",
`<script>window.__GSTACK_SERVER_URL = ${JSON.stringify(url.origin)};</script>\n</head>`,
);
return new Response(injected, {
return new Response(htmlContent, {
headers: { "Content-Type": "text/html; charset=utf-8" },
});
}
@ -194,19 +197,25 @@ export async function serve(options: ServeOptions): Promise<void> {
);
}
// Security: resolve symlinks and validate the reload path is within the
// allowed directory (anchored to the initial HTML file's parent).
// Prevents path traversal via /api/reload reading arbitrary files.
// Security: resolve symlinks and validate the reload path is a FILE
// inside the allowed directory (anchored to the initial HTML file's
// parent). Prevents path traversal via /api/reload reading arbitrary
// files. A path resolving to the allowedDir itself (a directory) used
// to pass the guard and then crash readFileSync with EISDIR — reject
// it explicitly with a clear 400 instead.
const resolvedReload = fs.realpathSync(path.resolve(newHtmlPath));
if (
!resolvedReload.startsWith(allowedDir + path.sep) &&
resolvedReload !== allowedDir
) {
if (!resolvedReload.startsWith(allowedDir + path.sep)) {
return Response.json(
{ error: `Path must be within: ${allowedDir}` },
{ status: 403 },
);
}
if (!fs.statSync(resolvedReload).isFile()) {
return Response.json(
{ error: `Path must be a file, not a directory: ${newHtmlPath}` },
{ status: 400 },
);
}
// Swap the HTML content
htmlContent = fs.readFileSync(resolvedReload, "utf-8");
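
The guard order in this hunk (resolve symlinks, require the path to sit strictly inside the allowed directory, then require a regular file) can be sketched as a small standalone function. The helper checkReloadPath, the ReloadCheck type and the temp-directory layout are hypothetical; only the status codes follow the hunk above.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

type ReloadCheck = { ok: true; resolved: string } | { ok: false; status: number };

// Hypothetical helper mirroring the checks in the hunk above.
function checkReloadPath(allowedDir: string, candidate: string): ReloadCheck {
  if (!fs.existsSync(candidate)) return { ok: false, status: 400 };
  // realpath collapses ".." segments and symlinks before the prefix check.
  const resolved = fs.realpathSync(path.resolve(candidate));
  // Appending path.sep rejects both siblings like "<dir>-evil" and the
  // allowed directory itself.
  if (!resolved.startsWith(allowedDir + path.sep)) return { ok: false, status: 403 };
  if (!fs.statSync(resolved).isFile()) return { ok: false, status: 400 };
  return { ok: true, resolved };
}

// Assumed demo layout: <tmp>/boards is the allowed directory.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), "reload-demo-")));
const allowed = path.join(root, "boards");
fs.mkdirSync(path.join(allowed, "sub"), { recursive: true });
fs.writeFileSync(path.join(allowed, "board.html"), "<html></html>");
fs.writeFileSync(path.join(root, "secret.txt"), "not for serving");

const inside = checkReloadPath(allowed, path.join(allowed, "board.html"));
const outside = checkReloadPath(allowed, path.join(allowed, "..", "secret.txt"));
const dirItself = checkReloadPath(allowed, allowed);
const subDir = checkReloadPath(allowed, path.join(allowed, "sub"));

fs.rmSync(root, { recursive: true, force: true });
```

In this sketch a file inside the directory passes, a traversal outside it and the directory itself are rejected with 403, and a subdirectory is rejected with 400 before any read is attempted.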


@ -0,0 +1,580 @@
/**
* Out-of-process tests for daemon-client.ts.
*
* Spawns real daemon subprocesses (via the fixtures helper) so we can
* exercise: state-file discovery, /health attach vs spawn, the lock +
* re-read-under-lock race, identity-verified SIGTERM, version mismatch
* with and without active boards, startup-error log surfacing, and the
* concurrent-CLIs race (two real subprocesses, one wins the lock).
*
* These tests are slower than daemon.test.ts (each spawn is ~200ms) so
* they're kept in a separate file to keep the in-process suite fast.
*/
import { afterEach, beforeEach, describe, expect, test } from "bun:test";
import { spawn } from "child_process";
import fs from "fs";
import os from "os";
import path from "path";
import {
daemonStatus,
ensureDaemon,
publishBoard,
shutdownDaemon,
} from "../src/daemon-client";
import {
acquireLock,
CMDLINE_MARKER,
isProcessAlive,
readStateFile,
resolveLockFilePath,
verifyIdentity,
} from "../src/daemon-state";
import {
DAEMON_SCRIPT,
makeBoardHtml,
makeTmpDir,
spawnDaemonForTest,
type SpawnedDaemon,
} from "./daemon-tests-fixtures";
let workDir: string;
let stateFile: string;
let activeDaemons: SpawnedDaemon[] = [];
beforeEach(() => {
workDir = makeTmpDir("discovery");
stateFile = path.join(workDir, "design.json");
// Each test gets a private state-file path; env var ensures both the
// client's resolver and any spawned daemons converge on the same file.
process.env.DESIGN_DAEMON_STATE_FILE = stateFile;
});
afterEach(async () => {
for (const d of activeDaemons.splice(0)) {
try { await d.stop(); } catch {}
}
// Tear down any state file left around so the next test starts clean.
try { fs.unlinkSync(stateFile); } catch {}
try { fs.unlinkSync(resolveLockFilePath(stateFile)); } catch {}
delete process.env.DESIGN_DAEMON_STATE_FILE;
try { fs.rmSync(workDir, { recursive: true, force: true }); } catch {}
});
async function spawn1(idleMs = 60_000): Promise<SpawnedDaemon> {
const d = await spawnDaemonForTest({ stateFile, idleMs });
activeDaemons.push(d);
return d;
}
// ─── healthCheck + readStateFile basics ──────────────────────────
describe("daemon-state helpers", () => {
test("readStateFile returns null when missing", () => {
expect(readStateFile(stateFile)).toBeNull();
});
test("spawned daemon writes a usable state file", async () => {
const d = await spawn1();
const state = readStateFile(stateFile);
expect(state).not.toBeNull();
expect(state!.pid).toBe(d.proc.pid);
expect(state!.port).toBe(d.port);
expect(state!.cmdlineMarker).toBe(CMDLINE_MARKER);
expect(state!.version).toBe("test-version");
});
test("verifyIdentity matches a real spawned daemon's cmdline", async () => {
const d = await spawn1();
expect(verifyIdentity(d.proc.pid!, CMDLINE_MARKER)).toBe(true);
// wrong marker → false
expect(verifyIdentity(d.proc.pid!, "some-other-marker-xyz")).toBe(false);
});
test("verifyIdentity returns false for dead pids", async () => {
expect(verifyIdentity(999_999_999, CMDLINE_MARKER)).toBe(false);
});
});
// ─── ensureDaemon ────────────────────────────────────────────────
describe("ensureDaemon", () => {
test("with no state file: spawns a fresh daemon", async () => {
const result = await ensureDaemon({
version: "test-version",
stateFile,
verbose: false,
});
expect(result.spawned).toBe(true);
expect(result.port).toBeGreaterThan(0);
expect(result.version).toBe("test-version");
const state = readStateFile(stateFile);
expect(state).not.toBeNull();
expect(isProcessAlive(state!.pid)).toBe(true);
// Track for cleanup
activeDaemons.push({
proc: { pid: state!.pid } as any,
port: state!.port,
stateFile,
stop: async () => {
try { process.kill(state!.pid, "SIGTERM"); } catch {}
},
});
});
test("with a healthy daemon already running: attaches without spawning", async () => {
const existing = await spawn1();
const result = await ensureDaemon({
version: "test-version",
stateFile,
verbose: false,
});
expect(result.spawned).toBe(false);
expect(result.port).toBe(existing.port);
});
test("with a stale state file (PID dead): spawns fresh, overwrites state", async () => {
// Synthesize a stale state file pointing at a definitely-dead pid.
fs.mkdirSync(path.dirname(stateFile), { recursive: true });
fs.writeFileSync(stateFile, JSON.stringify({
pid: 999_999_998,
port: 1, // bogus port — /health will fail fast
startedAt: "2020-01-01T00:00:00Z",
version: "ancient",
serverPath: "/nope",
cmdlineMarker: CMDLINE_MARKER,
}));
const result = await ensureDaemon({
version: "test-version",
stateFile,
verbose: false,
});
expect(result.spawned).toBe(true);
// State file should now point at the live daemon.
const fresh = readStateFile(stateFile);
expect(fresh!.pid).not.toBe(999_999_998);
expect(isProcessAlive(fresh!.pid)).toBe(true);
activeDaemons.push({
proc: { pid: fresh!.pid } as any,
port: fresh!.port,
stateFile,
stop: async () => { try { process.kill(fresh!.pid, "SIGTERM"); } catch {} },
});
});
test("PID-reuse safety: stale state with an unrelated alive PID → identity-verify blocks signal, daemon spawned", async () => {
// Use the current test process's PID — definitely alive, definitely
// does NOT have CMDLINE_MARKER in its cmdline (it's the Bun test runner).
fs.mkdirSync(path.dirname(stateFile), { recursive: true });
fs.writeFileSync(stateFile, JSON.stringify({
pid: process.pid, // alive but NOT a daemon
port: 1,
startedAt: "2020-01-01T00:00:00Z",
version: "ancient",
serverPath: "/nope",
cmdlineMarker: CMDLINE_MARKER,
}));
// ensureDaemon should NOT signal process.pid (we'd kill ourselves);
// verifyIdentity catches the cmdline mismatch and skips the kill.
const result = await ensureDaemon({
version: "test-version",
stateFile,
verbose: false,
});
// We're still alive (didn't get killed)
expect(isProcessAlive(process.pid)).toBe(true);
expect(result.spawned).toBe(true);
const fresh = readStateFile(stateFile);
expect(fresh!.pid).not.toBe(process.pid);
activeDaemons.push({
proc: { pid: fresh!.pid } as any,
port: fresh!.port,
stateFile,
stop: async () => { try { process.kill(fresh!.pid, "SIGTERM"); } catch {} },
});
});
test("version mismatch with NO active boards: gracefully shuts existing down and respawns", async () => {
const existing = await spawn1();
// The existing daemon's version is "test-version" (set by fixture env).
// ensureDaemon with a DIFFERENT version → should /shutdown the existing
// (no active boards) and spawn fresh.
const result = await ensureDaemon({
version: "different-version",
stateFile,
verbose: false,
});
expect(result.spawned).toBe(true);
expect(result.version).toBe("different-version");
// existing.proc.pid should be gone by now (or soon)
// Give it a moment for the /shutdown + SIGTERM to take effect
await new Promise((r) => setTimeout(r, 200));
expect(isProcessAlive(existing.proc.pid!)).toBe(false);
// New daemon recorded
const fresh = readStateFile(stateFile);
expect(fresh!.pid).not.toBe(existing.proc.pid);
activeDaemons.push({
proc: { pid: fresh!.pid } as any,
port: fresh!.port,
stateFile,
stop: async () => { try { process.kill(fresh!.pid, "SIGTERM"); } catch {} },
});
});
test("version mismatch WITH active boards: refuses to kill, exits 1 with user-actionable error", async () => {
// Run the ensureDaemon-that-would-exit-1 in a subprocess so we can
// observe the exit code and stderr without killing the test runner.
const existing = await spawn1();
// Publish a board so activeBoards > 0
const html = makeBoardHtml(workDir);
await publishBoard({ port: existing.port, html });
// Sanity: status should reflect the active board
const statusResp = await fetch(`http://127.0.0.1:${existing.port}/health`);
const status = (await statusResp.json()) as any;
expect(status.activeBoards).toBe(1);
// Now run a tiny script that calls ensureDaemon with a mismatched
// version. It should print the WARNING + exit 1.
const scriptPath = path.join(workDir, "ensure-mismatch.ts");
fs.writeFileSync(scriptPath, `
import { ensureDaemon } from "${path.resolve(import.meta.dir, "..", "src", "daemon-client.ts").replace(/\\/g, "/")}";
await ensureDaemon({
version: "totally-different-version",
stateFile: ${JSON.stringify(stateFile)},
verbose: true,
});
console.log("REACHED_AFTER_ENSURE — should not happen");
`);
const child = spawn("bun", ["run", scriptPath], {
env: { ...process.env, DESIGN_DAEMON_STATE_FILE: stateFile },
stdio: ["ignore", "pipe", "pipe"],
});
const stderrChunks: Buffer[] = [];
const stdoutChunks: Buffer[] = [];
child.stderr.on("data", (c) => stderrChunks.push(c));
child.stdout.on("data", (c) => stdoutChunks.push(c));
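// Wait for "close" rather than "exit": "exit" can fire while the stdio
// pipes are still open, which would leave stderr/stdout incomplete.
const exitCode = await new Promise<number>((resolve) => {
child.on("close", (code) => resolve(code ?? -1));
});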
const stderr = Buffer.concat(stderrChunks).toString();
const stdout = Buffer.concat(stdoutChunks).toString();
expect(exitCode).toBe(1);
expect(stderr).toContain("active board");
expect(stderr).toContain("Refusing to auto-kill");
// We must NOT have reached the post-ensure line
expect(stdout).not.toContain("REACHED_AFTER_ENSURE");
// And the existing daemon should still be alive
expect(isProcessAlive(existing.proc.pid!)).toBe(true);
}, 15_000);
});
// ─── publishBoard ────────────────────────────────────────────────
describe("publishBoard", () => {
test("publishes a board through the real HTTP path and returns id+url+sourceDir", async () => {
const d = await spawn1();
const htmlPath = makeBoardHtml(workDir, "<p>via-client</p>");
const result = await publishBoard({ port: d.port, html: htmlPath });
expect(result.id).toMatch(/^b-/);
expect(result.url).toBe(`http://127.0.0.1:${d.port}/boards/${result.id}/`);
expect(result.sourceDir).toBe(fs.realpathSync(workDir));
// Confirm the board is actually fetchable at the returned URL
const r = await fetch(result.url);
expect(r.status).toBe(200);
const html = await r.text();
expect(html).toContain("via-client");
});
test("409 surfaces existing board's id+url (returned object, no throw)", async () => {
const d = await spawn1();
const htmlPath = makeBoardHtml(workDir);
const first = await publishBoard({ port: d.port, html: htmlPath });
const htmlPath2 = makeBoardHtml(workDir, "<p>second</p>");
const second = await publishBoard({ port: d.port, html: htmlPath2 });
// Same sourceDir → 409 with `existing` field; publishBoard returns it
// so the caller can attach to the existing board.
expect(second.id).toBe(first.id);
expect(second.url).toBe(first.url);
});
});
// ─── shutdownDaemon / daemonStatus ───────────────────────────────
describe("shutdownDaemon + daemonStatus", () => {
test("status reports not-running when no state file", async () => {
const s = await daemonStatus();
expect(s.running).toBe(false);
});
test("status reports running with port + version + counts when daemon alive", async () => {
const d = await spawn1();
const s = await daemonStatus();
expect(s.running).toBe(true);
if (s.running) {
expect(s.port).toBe(d.port);
expect(s.pid).toBe(d.proc.pid);
expect(s.version).toBe("test-version");
expect(s.boards).toBe(0);
expect(s.activeBoards).toBe(0);
}
});
test("shutdownDaemon succeeds when no active boards", async () => {
const d = await spawn1();
const r = await shutdownDaemon();
expect(r.stopped).toBe(true);
// Give it a moment to die
await new Promise((res) => setTimeout(res, 300));
expect(isProcessAlive(d.proc.pid!)).toBe(false);
});
test("shutdownDaemon refuses (without force) when active boards present", async () => {
const d = await spawn1();
await publishBoard({ port: d.port, html: makeBoardHtml(workDir) });
const r = await shutdownDaemon();
expect(r.stopped).toBe(false);
expect(r.reason).toContain("active");
expect(r.activeBoards).toBe(1);
// Daemon still running
expect(isProcessAlive(d.proc.pid!)).toBe(true);
});
test("shutdownDaemon with force=true ignores active boards", async () => {
const d = await spawn1();
await publishBoard({ port: d.port, html: makeBoardHtml(workDir) });
const r = await shutdownDaemon({ force: true });
expect(r.stopped).toBe(true);
});
});
// ─── Real idle-shutdown behavior (spawned daemon, fast clock) ───
//
// The lastMeaningfulActivity timestamp is not observable from outside the
// daemon process, so the only way to prove "bare GETs do not reset the
// idle timer" is to spawn a real daemon with a short idle window, hit
// progress polls in a loop, and watch the process exit anyway.
//
// These tests aim for ~3-5s real time per test by setting IDLE_MS=2000
// and CHECK_MS=200. The idle-with-active-boards extension path needs a
// board in `serving` state to exercise.
describe("daemon idle-shutdown behavior (real process)", () => {
// Wait for a child process to exit, with a deadline. Resolves true on
// observed exit, false on timeout. Doesn't kill on timeout — caller does.
async function waitForExit(pid: number, timeoutMs: number): Promise<boolean> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
if (!isProcessAlive(pid)) return true;
await new Promise((r) => setTimeout(r, 100));
}
return false;
}
test("idle daemon (no boards) shuts itself down after IDLE_MS + CHECK_MS", async () => {
const d = await spawnDaemonForTest({
stateFile,
idleMs: 2_000,
checkMs: 200,
});
// Don't push to activeDaemons; the daemon should self-exit and the
// afterEach SIGTERM would race with that. Track manually.
try {
// No boards published. lastMeaningfulActivity is the startup time.
// Wait IDLE_MS + a couple CHECK_MS intervals for the timer to fire.
const exited = await waitForExit(d.proc.pid!, 5_000);
expect(exited).toBe(true);
// State file removed by gracefulShutdown
expect(readStateFile(stateFile)).toBeNull();
} finally {
if (isProcessAlive(d.proc.pid!)) {
try { d.proc.kill("SIGKILL"); } catch {}
}
}
}, 10_000);
test("bare GET polling does NOT prevent idle shutdown (progress polls don't reset idle)", async () => {
const d = await spawnDaemonForTest({
stateFile,
idleMs: 2_000,
checkMs: 200,
});
let polling = true;
let pollCount = 0;
const boardDir = makeTmpDir("idle-poll");
try {
const board = await publishBoard({
port: d.port,
html: makeBoardHtml(boardDir),
});
// Submit so the board becomes `done` — non-done would trigger the
// 1h extension path and keep the daemon alive past IDLE_MS.
await fetch(`${board.url}api/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ regenerated: false, preferred: "A" }),
});
// Hammer /api/progress every 200ms in the background. If bare GETs
// reset meaningful activity, the daemon would never idle out.
const pollLoop = (async () => {
while (polling) {
try {
await fetch(`${board.url}api/progress`);
pollCount += 1;
} catch {
// daemon went away
break;
}
await new Promise((r) => setTimeout(r, 200));
}
})();
const exited = await waitForExit(d.proc.pid!, 6_000);
polling = false;
await pollLoop;
expect(exited).toBe(true);
// We polled at least a few times before the daemon idled out
expect(pollCount).toBeGreaterThan(3);
expect(readStateFile(stateFile)).toBeNull();
} finally {
polling = false;
if (isProcessAlive(d.proc.pid!)) {
try { d.proc.kill("SIGKILL"); } catch {}
}
try { fs.rmSync(boardDir, { recursive: true, force: true }); } catch {}
}
}, 15_000);
test("idle with active (non-done) boards triggers extension instead of shutdown", async () => {
// With non-done boards, the daemon should NOT shut down on the first
// idle check after IDLE_MS — it extends. Verify it's still alive past
// the would-be-shutdown deadline. The MAX_EXTENSIONS=4 hard ceiling
// would take 4 * 1h = 4h to exercise with default extension window,
// so we shrink both IDLE and EXTENSION via env to test it in seconds.
const d = await spawnDaemonForTest({
stateFile,
idleMs: 1_500,
checkMs: 200,
env: {
DESIGN_DAEMON_EXTENSION_MS: "1500",
DESIGN_DAEMON_MAX_EXTENSIONS: "2",
},
});
const boardDir = makeTmpDir("idle-active");
try {
await publishBoard({ port: d.port, html: makeBoardHtml(boardDir) });
// Daemon has 1 non-done board. After IDLE_MS, idleCheckTick should
// extend rather than shut down. So at IDLE_MS + small margin, it's
// still alive.
await new Promise((r) => setTimeout(r, 2_500));
expect(isProcessAlive(d.proc.pid!)).toBe(true);
expect(readStateFile(stateFile)).not.toBeNull();
// After MAX_EXTENSIONS extension windows (2 * 1500ms = 3000ms more),
// the hard ceiling kicks in and force-shutdown fires. Total budget from
// publish: IDLE_MS(1500) + EXT*MAX(3000) + slack(1000) = ~5500ms. We've
// already waited 2500ms, so ~3000ms should remain; the 5500ms timeout
// below leaves headroom for check-interval granularity.
const exited = await waitForExit(d.proc.pid!, 5_500);
expect(exited).toBe(true);
expect(readStateFile(stateFile)).toBeNull();
} finally {
if (isProcessAlive(d.proc.pid!)) {
try { d.proc.kill("SIGKILL"); } catch {}
}
try { fs.rmSync(boardDir, { recursive: true, force: true }); } catch {}
}
}, 15_000);
});
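
The timing in the extension test above can be reasoned about with a small model. This is a sketch under stated assumptions, not the daemon's actual idleCheckTick: it assumes the idle deadline starts at IDLE_MS after startup, that each extension pushes the deadline to "check time + EXTENSION_MS", and that the daemon exits on the first check after the deadline once the extension budget is spent. The names `idleDecision` and `simulateShutdownAt` are hypothetical.

```typescript
type IdleState = { deadline: number; extensionsUsed: number };
type Decision = "keep" | "extend" | "shutdown";

interface IdleConfig {
  idleMs: number;
  checkMs: number;
  extensionMs: number;
  maxExtensions: number;
  nonDoneBoards: number;
}

// One idle-check tick (assumed semantics, see the note above).
function idleDecision(now: number, state: IdleState, cfg: IdleConfig): Decision {
  if (now < state.deadline) return "keep";
  // Idle window elapsed: exit unless a non-done board earns an extension.
  if (cfg.nonDoneBoards === 0) return "shutdown";
  if (state.extensionsUsed >= cfg.maxExtensions) return "shutdown";
  state.deadline = now + cfg.extensionMs;
  state.extensionsUsed += 1;
  return "extend";
}

// Step a virtual clock in checkMs increments until the model shuts down.
// Terminates because the number of extensions is bounded.
function simulateShutdownAt(cfg: IdleConfig): number {
  const state: IdleState = { deadline: cfg.idleMs, extensionsUsed: 0 };
  for (let now = cfg.checkMs; ; now += cfg.checkMs) {
    if (idleDecision(now, state, cfg) === "shutdown") return now;
  }
}

// The parameters used by the extension test above.
const modeledExit = simulateShutdownAt({
  idleMs: 1_500,
  checkMs: 200,
  extensionMs: 1_500,
  maxExtensions: 2,
  nonDoneBoards: 1,
});
console.log(`modeled shutdown at ~${modeledExit}ms`);
```

Under these assumptions the modeled exit falls after the test's 2500ms liveness check and well inside its total wait, which is the property the real test depends on. If the daemon measures extensions from the previous deadline instead of from the check time, the numbers shift by at most one check interval per extension.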
// ─── Concurrent ensureDaemon race (one wins the lock) ───────────
describe("concurrent ensureDaemon race", () => {
test("two parallel ensureDaemon() calls converge on one daemon (one spawned, one attached)", async () => {
// Fire two ensureDaemon calls in parallel against the same empty
// stateFile. The fs.openSync('wx') lock should make exactly one win
// the spawn race; the loser waits for the first to write the state
// file, then attaches.
const [a, b] = await Promise.all([
ensureDaemon({ version: "test-version", stateFile, verbose: false }),
ensureDaemon({ version: "test-version", stateFile, verbose: false }),
]);
// Both got the same port (same daemon)
expect(a.port).toBe(b.port);
// Exactly one spawned, one attached
const spawnedCount = [a.spawned, b.spawned].filter(Boolean).length;
expect(spawnedCount).toBe(1);
// Exactly one daemon process is alive at that port
const state = readStateFile(stateFile);
expect(state).not.toBeNull();
expect(isProcessAlive(state!.pid)).toBe(true);
// Lock file cleaned up (the winner released it on exit from the try block)
expect(fs.existsSync(resolveLockFilePath(stateFile))).toBe(false);
// Track for cleanup
activeDaemons.push({
proc: { pid: state!.pid } as any,
port: state!.port,
stateFile,
stop: async () => {
try { process.kill(state!.pid, "SIGTERM"); } catch {}
},
});
}, 15_000);
});
// ─── Stale-lock reclaim ──────────────────────────────────────────
describe("acquireLock stale-lock reclaim", () => {
test("reclaims a lockfile owned by a dead PID and writes our PID", () => {
const lockPath = resolveLockFilePath(stateFile);
// Plant a lockfile owned by a definitely-dead PID
fs.mkdirSync(path.dirname(lockPath), { recursive: true });
fs.writeFileSync(lockPath, "999999998\n");
const release = acquireLock(lockPath);
expect(release).not.toBeNull();
// Lock file now contains our PID
expect(fs.readFileSync(lockPath, "utf-8").trim()).toBe(String(process.pid));
release!();
// Released = lock file gone
expect(fs.existsSync(lockPath)).toBe(false);
});
test("refuses to reclaim a lockfile owned by an alive (unrelated) PID", () => {
const lockPath = resolveLockFilePath(stateFile);
fs.mkdirSync(path.dirname(lockPath), { recursive: true });
// Use this test process's own PID — it's alive AND unrelated to a daemon.
// acquireLock should refuse and return null without unlinking the lock.
fs.writeFileSync(lockPath, `${process.pid}\n`);
const release = acquireLock(lockPath);
expect(release).toBeNull();
// Lock file is untouched
expect(fs.readFileSync(lockPath, "utf-8").trim()).toBe(String(process.pid));
// Cleanup
try { fs.unlinkSync(lockPath); } catch {}
});
});
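
For readers who want to see the locking technique these tests exercise, here is a minimal sketch of an exclusive-create lock with stale-owner reclaim. It is an illustration only: `tryAcquireLock` and `pidIsAlive` are hypothetical names, and the real acquireLock in daemon-client.ts may differ (for example in how it verifies the owner's identity). The only behavior taken from the tests is that the lock file holds the owner's PID, that a dead owner is reclaimed, and that a live owner is left untouched.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Signal 0 probes for existence without delivering a signal.
function pidIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical helper: returns a release function, or null when a live
// process owns the lock.
function tryAcquireLock(lockPath: string): (() => void) | null {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" fails with EEXIST if the file already exists, so only one
      // caller can create it.
      const fd = fs.openSync(lockPath, "wx");
      fs.writeSync(fd, `${process.pid}\n`);
      fs.closeSync(fd);
      return () => {
        try { fs.unlinkSync(lockPath); } catch { /* already gone */ }
      };
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    // Lock exists: reclaim it only if the recorded owner is dead.
    let owner = Number.NaN;
    try {
      owner = Number.parseInt(fs.readFileSync(lockPath, "utf-8").trim(), 10);
    } catch {
      continue; // lock vanished between open and read; retry the create
    }
    if (Number.isInteger(owner) && owner > 0 && pidIsAlive(owner)) return null;
    try { fs.unlinkSync(lockPath); } catch { /* someone else reclaimed it */ }
  }
  return null;
}

// Usage: plant a lock owned by a PID assumed to be dead, then reclaim it.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-sketch-"));
const demoLock = path.join(demoDir, "daemon.lock");
fs.writeFileSync(demoLock, "999999998\n");
const demoRelease = tryAcquireLock(demoLock);
console.log(demoRelease !== null ? "reclaimed stale lock" : "lock is held");
demoRelease?.();
```

One limitation of this sketch: the unlink-then-retry step is not atomic, so two processes reclaiming the same stale lock at the same moment can interfere with each other. A production helper would need an extra guard there.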


@ -0,0 +1,135 @@
/**
* Shared helpers for daemon + daemon-client tests.
*
* Two test styles live here:
* - In-process: import fetchHandler from daemon.ts and call it with a
* synthesized Request. Fast, no spawn, no HTTP. Covers routing +
* handler semantics. Used by most of daemon.test.ts.
* - Out-of-process: spawn `bun run design/src/daemon.ts` with a tmp
* state file + env overrides, then HTTP against the bound port.
* Slow but only path that proves real spawn + state file + signal
* handling work. Used by daemon-discovery.test.ts.
*/
import { spawn, type ChildProcess } from "child_process";
import fs from "fs";
import os from "os";
import path from "path";
import { __testInternals__ } from "../src/daemon";
export const DAEMON_SCRIPT = path.join(import.meta.dir, "..", "src", "daemon.ts");
export function makeTmpDir(prefix = "design-daemon-test"): string {
return fs.mkdtempSync(path.join(os.tmpdir(), `${prefix}-`));
}
export function makeBoardHtml(tmpDir: string, body = "<p>Test board</p>"): string {
const p = path.join(tmpDir, "design-board.html");
fs.writeFileSync(
p,
`<!DOCTYPE html><html><head></head><body>${body}</body></html>`,
);
return p;
}
/** Reset the in-process daemon state between tests. */
export function resetDaemon(): void {
__testInternals__.resetForTest();
}
/** Build a Request for the in-process fetchHandler tests. */
export function req(method: string, urlPath: string, body?: unknown): Request {
const init: RequestInit = { method };
if (body !== undefined) {
init.body = typeof body === "string" ? body : JSON.stringify(body);
init.headers = { "Content-Type": "application/json" };
}
return new Request(`http://127.0.0.1:1234${urlPath}`, init);
}
export interface SpawnedDaemon {
proc: ChildProcess;
port: number;
stateFile: string;
stop: () => Promise<void>;
}
/**
* Spawn a real daemon process pointed at a per-test state file, with an
* aggressive idle window so idle-shutdown tests don't take 24h. Resolves
* when stdout emits `DAEMON_STARTED port=<N>`.
*/
export async function spawnDaemonForTest(
opts: { stateFile?: string; idleMs?: number; checkMs?: number; env?: Record<string, string> } = {},
): Promise<SpawnedDaemon> {
const stateFile = opts.stateFile ?? path.join(makeTmpDir("daemon-state"), "design.json");
const env: Record<string, string> = {
...(process.env as Record<string, string>),
// DESIGN_DAEMON_STATE_FILE points both daemon and any same-process
// discovery at this test's state file (overrides resolveStateFilePath).
DESIGN_DAEMON_STATE_FILE: stateFile,
DESIGN_DAEMON_IDLE_MS: String(opts.idleMs ?? 60_000),
DESIGN_DAEMON_CHECK_MS: String(opts.checkMs ?? 1000),
DESIGN_DAEMON_VERSION: "test-version",
...(opts.env ?? {}),
};
// Spawn with a marker in argv so cmdline-based identity verification
// exercises the real CMDLINE_MARKER ("gstack-design-daemon").
const proc = spawn(
"bun",
["run", DAEMON_SCRIPT, "--marker", "gstack-design-daemon"],
{
env,
stdio: ["ignore", "pipe", "pipe"],
cwd: path.dirname(stateFile),
},
);
const port = await new Promise<number>((resolve, reject) => {
const onTimeout = setTimeout(() => {
proc.kill("SIGKILL");
reject(new Error("Daemon failed to emit DAEMON_STARTED within 5s"));
}, 5000);
proc.stdout!.on("data", (chunk: Buffer) => {
const line = chunk.toString();
const m = line.match(/DAEMON_STARTED port=(\d+)/);
if (m) {
clearTimeout(onTimeout);
resolve(parseInt(m[1]!, 10));
}
});
proc.on("error", (e) => {
clearTimeout(onTimeout);
reject(e);
});
proc.on("exit", (code) => {
clearTimeout(onTimeout);
reject(new Error(`Daemon exited early with code ${code}`));
});
});
return {
proc,
port,
stateFile,
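stop: async () => {
// Already exited (e.g. idle shutdown): no "exit" event will fire again,
// so return instead of waiting out the 2s fallback.
if (proc.exitCode !== null || proc.signalCode !== null) return;
await new Promise<void>((r) => {
const t = setTimeout(() => {
try {
proc.kill("SIGKILL");
} catch {
// gone
}
r();
}, 2000);
proc.once("exit", () => {
clearTimeout(t);
r();
});
proc.kill("SIGTERM");
});
},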
};
}
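
A note on the startup handshake in spawnDaemonForTest: it matches DAEMON_STARTED against each stdout chunk independently, which works as long as the marker arrives in a single chunk. The sketch below shows a line-buffered alternative that tolerates a marker split across chunks. It assumes the daemon terminates the marker line with a newline, which the fixture does not confirm, and `createPortScanner` is a hypothetical helper, not part of the fixtures.

```typescript
// Feed stdout chunks in; get the port back once a complete
// "DAEMON_STARTED port=<N>" line has arrived, otherwise null.
function createPortScanner(): (chunk: string) => number | null {
  let pending = "";
  return (chunk: string): number | null => {
    pending += chunk;
    const lines = pending.split("\n");
    // The last element is an unterminated line (or ""); keep it for the
    // next chunk so a partial marker is never parsed early.
    pending = lines.pop() ?? "";
    for (const line of lines) {
      const m = line.match(/DAEMON_STARTED port=(\d+)/);
      if (m) return Number.parseInt(m[1]!, 10);
    }
    return null;
  };
}

// Usage: the marker is split in the middle of the port number.
const scanChunk = createPortScanner();
for (const chunk of ["booting\nDAEMON_STARTED po", "rt=54", "321\n"]) {
  const port = scanChunk(chunk);
  if (port !== null) console.log(`daemon listening on port ${port}`);
}
```

Only complete lines are parsed, so a port number cut off at a chunk boundary cannot be reported as a shorter, wrong port. The trade-off is that a marker with no trailing newline would never be seen, which is why the newline assumption matters.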

534
design/test/daemon.test.ts Normal file

@ -0,0 +1,534 @@
/**
* In-process tests for design daemon endpoints + lifecycle helpers.
*
* Uses the exported fetchHandler directly (no Bun.serve spawn) so the suite
* is fast and deterministic. Spawn-based tests live in
* daemon-discovery.test.ts.
*/
import { afterEach, beforeEach, describe, expect, test } from "bun:test";
import fs from "fs";
import path from "path";
import { __testInternals__, fetchHandler, idleCheckTick } from "../src/daemon";
import { makeBoardHtml, makeTmpDir, req, resetDaemon } from "./daemon-tests-fixtures";
const { markMeaningfulActivity } = __testInternals__;
let tmpDir: string;
beforeEach(() => {
resetDaemon();
tmpDir = makeTmpDir();
});
afterEach(() => {
try {
fs.rmSync(tmpDir, { recursive: true, force: true });
} catch {
// already gone
}
});
async function publishTestBoard(opts: { dir?: string; body?: string; title?: string } = {}) {
const dir = opts.dir ?? tmpDir;
const htmlPath = makeBoardHtml(dir, opts.body ?? "<p>Test</p>");
const r = await fetchHandler(
req("POST", "/api/boards", { html: htmlPath, title: opts.title }),
);
expect(r.status).toBe(200);
const body = (await r.json()) as { id: string; url: string; sourceDir: string };
return { ...body, htmlPath, dir };
}
// ─── /health ─────────────────────────────────────────────────────
describe("daemon /health", () => {
test("returns ok=true with version + boards counts", async () => {
const r = await fetchHandler(req("GET", "/health"));
expect(r.status).toBe(200);
const body = (await r.json()) as any;
expect(body.ok).toBe(true);
expect(typeof body.version).toBe("string");
expect(body.boards).toBe(0);
expect(body.activeBoards).toBe(0);
expect(typeof body.uptime).toBe("number");
});
test("activeBoards counts non-done after publish", async () => {
await publishTestBoard();
const r = await fetchHandler(req("GET", "/health"));
const body = (await r.json()) as any;
expect(body.boards).toBe(1);
expect(body.activeBoards).toBe(1);
});
});
// ─── POST /api/boards (publish) ─────────────────────────────────
describe("daemon /api/boards (publish)", () => {
test("publishes a board and returns id + url + derived sourceDir", async () => {
const htmlPath = makeBoardHtml(tmpDir);
const r = await fetchHandler(req("POST", "/api/boards", { html: htmlPath }));
expect(r.status).toBe(200);
const body = (await r.json()) as any;
expect(body.id).toMatch(/^b-\d{8}-\d{6}-[a-z0-9]{6}$/);
expect(body.url).toMatch(/\/boards\/b-\d{8}-\d{6}-[a-z0-9]{6}\/$/); // trailing slash
expect(body.sourceDir).toBe(fs.realpathSync(tmpDir));
});
test("rejects when html field missing", async () => {
const r = await fetchHandler(req("POST", "/api/boards", { title: "noop" }));
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("Missing 'html'");
});
test("rejects when html file does not exist", async () => {
const r = await fetchHandler(
req("POST", "/api/boards", { html: "/tmp/does-not-exist.html" }),
);
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("not found");
});
test("rejects when html points at a directory", async () => {
const r = await fetchHandler(req("POST", "/api/boards", { html: tmpDir }));
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("must be a file");
});
test("ignores body-supplied sourceDir; derives from realpath(html) instead", async () => {
const htmlPath = makeBoardHtml(tmpDir);
const otherDir = makeTmpDir("sneaky");
try {
const r = await fetchHandler(
req("POST", "/api/boards", { html: htmlPath, sourceDir: otherDir }),
);
expect(r.status).toBe(200);
const body = (await r.json()) as any;
// The daemon used the realpath of the HTML's dir, NOT the body field.
expect(body.sourceDir).toBe(fs.realpathSync(tmpDir));
expect(body.sourceDir).not.toBe(fs.realpathSync(otherDir));
} finally {
try {
fs.rmSync(otherDir, { recursive: true, force: true });
} catch {
// already gone
}
}
});
test("409 when a non-done board already claims the same sourceDir", async () => {
const first = await publishTestBoard();
const htmlPath = makeBoardHtml(tmpDir, "<p>Second attempt</p>");
const r = await fetchHandler(req("POST", "/api/boards", { html: htmlPath }));
expect(r.status).toBe(409);
const body = (await r.json()) as any;
expect(body.error).toContain("already in use");
expect(body.existing.id).toBe(first.id);
expect(body.existing.url).toContain(`/boards/${first.id}/`);
});
test("allows publish to same sourceDir after the prior board is done", async () => {
const first = await publishTestBoard();
// Submit the first board so it becomes done
await fetchHandler(
req("POST", `/boards/${first.id}/api/feedback`, { regenerated: false }),
);
const htmlPath = makeBoardHtml(tmpDir, "<p>Round two</p>");
const r = await fetchHandler(req("POST", "/api/boards", { html: htmlPath }));
expect(r.status).toBe(200);
});
});
// ─── GET /boards/<id> trailing-slash redirect ────────────────────
describe("daemon /boards/<id> trailing-slash redirect", () => {
test("GET /boards/<id> returns 301 with Location /boards/<id>/", async () => {
const board = await publishTestBoard();
const r = await fetchHandler(req("GET", `/boards/${board.id}`));
expect(r.status).toBe(301);
expect(r.headers.get("Location")).toBe(`/boards/${board.id}/`);
});
test("GET /boards/<id>/ renders the board's HTML", async () => {
const board = await publishTestBoard({ body: "<p>Hello from board</p>" });
const r = await fetchHandler(req("GET", `/boards/${board.id}/`));
expect(r.status).toBe(200);
expect(r.headers.get("Content-Type") || "").toContain("text/html");
const html = await r.text();
expect(html).toContain("Hello from board");
// No __GSTACK_SERVER_URL injection (board JS uses relative paths)
expect(html).not.toContain("__GSTACK_SERVER_URL");
});
test("404 on unknown board id (shows expired page)", async () => {
const r = await fetchHandler(req("GET", "/boards/b-nonexistent/"));
expect(r.status).toBe(404);
const html = await r.text();
expect(html).toContain("Board expired");
});
});
// ─── POST /boards/<id>/api/feedback ──────────────────────────────
describe("daemon /boards/<id>/api/feedback", () => {
test("submit writes feedback.json to derived sourceDir with boardId + publishedAt", async () => {
const board = await publishTestBoard();
const feedback = { preferred: "A", ratings: { A: 5 }, regenerated: false };
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/feedback`, feedback),
);
expect(r.status).toBe(200);
expect(((await r.json()) as any).action).toBe("submitted");
const written = JSON.parse(
fs.readFileSync(path.join(board.sourceDir, "feedback.json"), "utf-8"),
);
expect(written.preferred).toBe("A");
expect(written.regenerated).toBe(false);
expect(written.boardId).toBe(board.id);
expect(typeof written.publishedAt).toBe("string");
expect(written.publishedAt).toMatch(/^\d{4}-\d{2}-\d{2}T/);
});
test("regenerate writes feedback-pending.json and flips state to regenerating", async () => {
const board = await publishTestBoard();
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/feedback`, {
regenerated: true,
regenerateAction: "more_like_A",
}),
);
expect(r.status).toBe(200);
expect(((await r.json()) as any).action).toBe("regenerate");
expect(fs.existsSync(path.join(board.sourceDir, "feedback-pending.json"))).toBe(true);
expect(fs.existsSync(path.join(board.sourceDir, "feedback.json"))).toBe(false);
const progress = await fetchHandler(
req("GET", `/boards/${board.id}/api/progress`),
);
expect(((await progress.json()) as any).status).toBe("regenerating");
});
test("cross-board isolation: feedback writes only into that board's sourceDir", async () => {
const dirA = makeTmpDir("board-a");
const dirB = makeTmpDir("board-b");
try {
const htmlA = makeBoardHtml(dirA);
const htmlB = makeBoardHtml(dirB);
const a = (await (await fetchHandler(
req("POST", "/api/boards", { html: htmlA }),
)).json()) as any;
const b = (await (await fetchHandler(
req("POST", "/api/boards", { html: htmlB }),
)).json()) as any;
expect(a.id).not.toBe(b.id);
await fetchHandler(
req("POST", `/boards/${a.id}/api/feedback`, { preferred: "A", regenerated: false }),
);
expect(fs.existsSync(path.join(a.sourceDir, "feedback.json"))).toBe(true);
// Board B's directory must not have been touched
expect(fs.existsSync(path.join(b.sourceDir, "feedback.json"))).toBe(false);
expect(fs.existsSync(path.join(b.sourceDir, "feedback-pending.json"))).toBe(false);
} finally {
try { fs.rmSync(dirA, { recursive: true, force: true }); } catch {}
try { fs.rmSync(dirB, { recursive: true, force: true }); } catch {}
}
});
test("rejects malformed JSON body", async () => {
const board = await publishTestBoard();
const bad = new Request(`http://127.0.0.1/boards/${board.id}/api/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: "{not json",
});
const r = await fetchHandler(bad);
expect(r.status).toBe(400);
});
});
// ─── POST /boards/<id>/api/reload ────────────────────────────────
describe("daemon /boards/<id>/api/reload", () => {
test("swaps HTML in place; subsequent GET returns new content", async () => {
const board = await publishTestBoard({ body: "<p>round 1</p>" });
// makeBoardHtml always writes design-board.html, which would overwrite the
// published file. Write round 2 to a distinct path instead, so the final
// assertion can only pass if the reload actually swapped the content.
const reloadPath = path.join(tmpDir, "round2.html");
fs.writeFileSync(reloadPath, "<html><body><p>round 2</p></body></html>");
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/reload`, { html: reloadPath }),
);
expect(r.status).toBe(200);
const page = await fetchHandler(req("GET", `/boards/${board.id}/`));
expect(await page.text()).toContain("round 2");
});
test("rejects path traversal outside allowedDir", async () => {
const board = await publishTestBoard();
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/reload`, { html: "/etc/passwd" }),
);
expect(r.status).toBe(403);
});
test("rejects directory path (Codex finding regression guard)", async () => {
const board = await publishTestBoard();
const sub = path.join(tmpDir, "subdir");
fs.mkdirSync(sub, { recursive: true });
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/reload`, { html: sub }),
);
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("must be a file");
});
test("rejects symlink pointing out of allowedDir", async () => {
const board = await publishTestBoard();
const linkPath = path.join(tmpDir, "evil.html");
try {
fs.symlinkSync("/etc/passwd", linkPath);
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/reload`, { html: linkPath }),
);
expect(r.status).toBe(403);
} finally {
try { fs.unlinkSync(linkPath); } catch {}
}
});
});
// ─── GET / (index) ───────────────────────────────────────────────
describe("daemon / (index)", () => {
test("empty state shows the no-boards message", async () => {
const r = await fetchHandler(req("GET", "/"));
expect(r.status).toBe(200);
const html = await r.text();
expect(html).toContain("No boards yet");
});
test("lists boards newest first with state badges", async () => {
const a = await publishTestBoard({ title: "first" });
// Small wait so publishedAt differs
await new Promise((r) => setTimeout(r, 5));
const dirB = makeTmpDir("index-b");
try {
const htmlB = makeBoardHtml(dirB);
const b = (await (await fetchHandler(
req("POST", "/api/boards", { html: htmlB, title: "second" }),
)).json()) as any;
const html = await (await fetchHandler(req("GET", "/"))).text();
const idxA = html.indexOf(a.id);
const idxB = html.indexOf(b.id);
// Newest first: b appears before a
expect(idxB).toBeGreaterThanOrEqual(0);
expect(idxA).toBeGreaterThan(idxB);
// State badge present
expect(html).toMatch(/state-serving/);
} finally {
try { fs.rmSync(dirB, { recursive: true, force: true }); } catch {}
}
});
});
// ─── /shutdown ───────────────────────────────────────────────────
describe("daemon /shutdown", () => {
test("refuses /shutdown when boards are non-done", async () => {
await publishTestBoard();
const r = await fetchHandler(req("POST", "/shutdown"));
expect(r.status).toBe(409);
const body = (await r.json()) as any;
expect(body.error).toContain("active boards");
expect(body.activeBoards).toBe(1);
});
test("accepts /shutdown when no active boards (graceful path)", async () => {
// Publish then submit so state=done
const board = await publishTestBoard();
await fetchHandler(
req("POST", `/boards/${board.id}/api/feedback`, { regenerated: false }),
);
// Now non-done count is 0 — handler should return shuttingDown:true.
// We DON'T let the real gracefulShutdown timer fire (it calls process.exit
// after 50ms which would tear down the test runner); instead we just
// observe the immediate response.
const r = await fetchHandler(req("POST", "/shutdown"));
expect(r.status).toBe(200);
const body = (await r.json()) as any;
expect(body.shuttingDown).toBe(true);
// Reset state for subsequent tests; the shutdown timer will be a no-op
// because the next resetForTest flips shuttingDown back to false.
resetDaemon();
});
});
// ─── LRU + non-done protection ───────────────────────────────────
describe("daemon LRU eviction", () => {
test("evicts done boards in preference to non-done", async () => {
// Seed the map directly so we don't have to publish 50 real boards.
// Setup: 10 done (oldest) + 40 serving (newer) = 50 total, 40 non-done.
// Publishing a 51st board: nonDoneCount(40) < MAX(50) → accepts, inserts,
// size=51, then evictUntilUnderCap kicks out the LRU done.
const boards = __testInternals__.boards;
const mk = (id: string, state: "serving" | "done", lastTouched: number) => {
boards.set(id, {
id,
htmlContent: "<p>seeded</p>",
sourceDir: `/tmp/seeded-${id}`,
allowedDir: `/tmp/seeded-${id}`,
state,
publishedAt: lastTouched,
lastTouched,
publisherPid: 0,
});
};
for (let i = 0; i < 10; i++) mk(`b-done-${i}`, "done", 1000 + i);
for (let i = 0; i < 40; i++) mk(`b-active-${i}`, "serving", 2000 + i);
expect(boards.size).toBe(50);
const htmlPath = makeBoardHtml(tmpDir);
const r = await fetchHandler(req("POST", "/api/boards", { html: htmlPath }));
expect(r.status).toBe(200);
expect(boards.size).toBeLessThanOrEqual(50);
// At least one of the (oldest) done boards is gone; non-done untouched.
let doneGoneCount = 0;
for (let i = 0; i < 10; i++) if (!boards.has(`b-done-${i}`)) doneGoneCount += 1;
expect(doneGoneCount).toBeGreaterThanOrEqual(1);
// All non-done preserved
for (let i = 0; i < 40; i++) {
expect(boards.has(`b-active-${i}`)).toBe(true);
}
});
test("503 when 50 non-done boards already exist", async () => {
const boards = __testInternals__.boards;
for (let i = 0; i < 50; i++) {
boards.set(`b-busy-${i}`, {
id: `b-busy-${i}`,
htmlContent: "<p>busy</p>",
sourceDir: `/tmp/busy-${i}`,
allowedDir: `/tmp/busy-${i}`,
state: "serving",
publishedAt: i,
lastTouched: i,
publisherPid: 0,
});
}
const htmlPath = makeBoardHtml(tmpDir);
const r = await fetchHandler(req("POST", "/api/boards", { html: htmlPath }));
expect(r.status).toBe(503);
});
});
// ─── Idle + meaningful activity ──────────────────────────────────
//
// The behavioral tests for idle shutdown — actual process exit, bare-GET-
// doesn't-reset-idle, MAX_EXTENSIONS hard ceiling — live in
// daemon-discovery.test.ts because they require a real spawned daemon
// (lastMeaningfulActivity isn't observable in-process). The earlier
// in-process version of these tests was a smoke test that the testing
// specialist correctly flagged as misleading, so it was removed.
describe("daemon idle + activity tracking (smoke)", () => {
test("idleCheckTick on a freshly-touched daemon does not throw or shut down", () => {
markMeaningfulActivity();
expect(() => idleCheckTick()).not.toThrow();
// boards map shouldn't have been wiped (no graceful shutdown happened)
expect(typeof __testInternals__.boards.size).toBe("number");
});
});
// ─── Malformed body negatives ────────────────────────────────────
describe("daemon malformed body handling", () => {
test("POST /api/boards rejects invalid JSON body with 400", async () => {
const bad = new Request("http://127.0.0.1:1234/api/boards", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: "{not json",
});
const r = await fetchHandler(bad);
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("Invalid JSON");
});
test("POST /api/boards rejects non-object body (e.g. JSON null) with 400", async () => {
// JS quirk: `typeof [] === "object"`, so arrays slip past the
// !body || typeof body !== "object" guard and fail at the missing-html
// check below. The "Expected JSON object" path only fires for genuinely
// non-object values like null, numbers, strings.
const bad = new Request("http://127.0.0.1:1234/api/boards", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: "null",
});
const r = await fetchHandler(bad);
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("Expected JSON object");
});
test("POST /api/boards: array body falls through to missing-html 400", async () => {
// Documents the actual behavior — arrays bypass the type guard but get
// caught by the html-field check. If we ever tighten the type check to
// reject arrays explicitly, this test will surface the change.
const r = await fetchHandler(req("POST", "/api/boards", [1, 2, 3] as any));
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("Missing 'html'");
});
test("POST /boards/<id>/api/reload rejects invalid JSON body with 400", async () => {
const board = await publishTestBoard();
const bad = new Request(
`http://127.0.0.1:1234/boards/${board.id}/api/reload`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: "{nope",
},
);
const r = await fetchHandler(bad);
expect(r.status).toBe(400);
});
test("POST /boards/<id>/api/reload rejects body missing html field with 400", async () => {
const board = await publishTestBoard();
const r = await fetchHandler(
req("POST", `/boards/${board.id}/api/reload`, { somethingElse: true }),
);
expect(r.status).toBe(400);
const body = (await r.json()) as any;
expect(body.error).toContain("HTML file not found");
});
});
// ─── Unknown routes ──────────────────────────────────────────────
describe("daemon unknown routes", () => {
test("404 on unknown path", async () => {
const r = await fetchHandler(req("GET", "/some/unknown/path"));
expect(r.status).toBe(404);
});
test("GET /api/boards (wrong method on publish endpoint) returns 404", async () => {
const r = await fetchHandler(req("GET", "/api/boards"));
expect(r.status).toBe(404);
});
});


@ -0,0 +1,257 @@
/**
* End-to-end daemon round-trip test.
*
* Spawns a real design daemon and walks the full publish → submit /
* regenerate / reload cycle via HTTP fetch (the same calls the board JS
* makes). Proves what design-shotgun and the rest of the design skills
* depend on:
*
* - $D compare --serve attaches to OR spawns a single shared daemon.
* - Two boards published into the same daemon get independent paths
* under /boards/<id>/, with no port churn and no second process.
* - Submit writes feedback.json into the board's sourceDir with
* boardId + publishedAt fields the skill can poll for.
* - Regenerate writes feedback-pending.json, flips state to
* regenerating, /api/progress reflects it.
* - /api/reload swaps HTML in place; a second GET returns new content.
* - Even with two concurrent boards in flight, feedback for one does
* not contaminate the other's sourceDir.
*
* Browser-driven round-trip (feedback-roundtrip.test.ts) covers the same
* flow at the click level for the legacy --no-daemon path; this file is
* the daemon-path equivalent.
*/
import { afterEach, beforeEach, describe, expect, test } from "bun:test";
import fs from "fs";
import path from "path";
import { publishBoard } from "../src/daemon-client";
import { readStateFile } from "../src/daemon-state";
import {
makeBoardHtml,
makeTmpDir,
spawnDaemonForTest,
type SpawnedDaemon,
} from "./daemon-tests-fixtures";
let workDir: string;
let stateFile: string;
let daemons: SpawnedDaemon[] = [];
beforeEach(() => {
workDir = makeTmpDir("roundtrip-daemon");
stateFile = path.join(workDir, "design.json");
process.env.DESIGN_DAEMON_STATE_FILE = stateFile;
});
afterEach(async () => {
for (const d of daemons.splice(0)) {
try { await d.stop(); } catch {}
}
try { fs.unlinkSync(stateFile); } catch {}
delete process.env.DESIGN_DAEMON_STATE_FILE;
try { fs.rmSync(workDir, { recursive: true, force: true }); } catch {}
});
async function spawn1(): Promise<SpawnedDaemon> {
const d = await spawnDaemonForTest({ stateFile, idleMs: 60_000 });
daemons.push(d);
return d;
}
// ─── Submit round-trip ───────────────────────────────────────────
describe("daemon round-trip: publish → submit → feedback.json", () => {
test("Submit feedback lands at sourceDir with boardId + publishedAt", async () => {
const d = await spawn1();
const boardDir = makeTmpDir("board-submit");
try {
const htmlPath = makeBoardHtml(boardDir, "<p>round-trip board</p>");
const board = await publishBoard({ port: d.port, html: htmlPath });
expect(board.url).toBe(`http://127.0.0.1:${d.port}/boards/${board.id}/`);
expect(board.sourceDir).toBe(fs.realpathSync(boardDir));
// GET the board URL — same path the browser would hit
const page = await fetch(board.url);
expect(page.status).toBe(200);
const pageHtml = await page.text();
expect(pageHtml).toContain("round-trip board");
// POST submit (mirrors what the board JS does on Submit click)
const submit = await fetch(`${board.url}api/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
preferred: "A",
ratings: { A: 5, B: 3 },
comments: { A: "love it" },
overall: "ship A",
regenerated: false,
}),
});
expect(submit.status).toBe(200);
const submitBody = (await submit.json()) as any;
expect(submitBody.action).toBe("submitted");
// The skill side polls for feedback.json in the source directory
const feedbackPath = path.join(board.sourceDir, "feedback.json");
expect(fs.existsSync(feedbackPath)).toBe(true);
const written = JSON.parse(fs.readFileSync(feedbackPath, "utf-8"));
expect(written.preferred).toBe("A");
expect(written.ratings).toEqual({ A: 5, B: 3 });
expect(written.regenerated).toBe(false);
// Augmented fields the daemon adds
expect(written.boardId).toBe(board.id);
expect(typeof written.publishedAt).toBe("string");
expect(written.publishedAt).toMatch(/^\d{4}-\d{2}-\d{2}T/);
// The board's URL stays accessible after submit (history view)
const after = await fetch(board.url);
expect(after.status).toBe(200);
// Progress endpoint reflects done state
const progress = await fetch(`${board.url}api/progress`);
expect(((await progress.json()) as any).status).toBe("done");
} finally {
try { fs.rmSync(boardDir, { recursive: true, force: true }); } catch {}
}
});
test("GET /boards/<id> (no trailing slash) returns 301 to /boards/<id>/", async () => {
const d = await spawn1();
const boardDir = makeTmpDir("board-redir");
try {
const board = await publishBoard({
port: d.port,
html: makeBoardHtml(boardDir),
});
// Use redirect: 'manual' so we observe the 301 response itself
const res = await fetch(`http://127.0.0.1:${d.port}/boards/${board.id}`, {
redirect: "manual",
});
expect(res.status).toBe(301);
expect(res.headers.get("Location")).toBe(`/boards/${board.id}/`);
} finally {
try { fs.rmSync(boardDir, { recursive: true, force: true }); } catch {}
}
});
});
// ─── Regenerate + reload round-trip ──────────────────────────────
describe("daemon round-trip: publish → regenerate → reload → submit round 2", () => {
test("Full regen cycle: feedback-pending.json then reload swaps HTML", async () => {
const d = await spawn1();
const boardDir = makeTmpDir("board-regen");
try {
const r1Path = makeBoardHtml(boardDir, "<p>round 1 variants</p>");
const board = await publishBoard({ port: d.port, html: r1Path });
// Skill issues a regenerate via the board JS path
const regen = await fetch(`${board.url}api/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
preferred: "A",
ratings: { A: 4 },
regenerated: true,
regenerateAction: "more_like_A",
}),
});
expect(regen.status).toBe(200);
expect(((await regen.json()) as any).action).toBe("regenerate");
// Pending file exists, final feedback file does not
expect(fs.existsSync(path.join(board.sourceDir, "feedback-pending.json"))).toBe(true);
expect(fs.existsSync(path.join(board.sourceDir, "feedback.json"))).toBe(false);
// Progress reflects regenerating state
const prog1 = await fetch(`${board.url}api/progress`);
expect(((await prog1.json()) as any).status).toBe("regenerating");
// Agent generates round 2, writes a new HTML file, calls /api/reload
const r2Path = path.join(boardDir, "round2.html");
fs.writeFileSync(r2Path, "<!DOCTYPE html><html><body><p>round 2 variants</p></body></html>");
const reload = await fetch(`${board.url}api/reload`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ html: r2Path }),
});
expect(reload.status).toBe(200);
// Same URL now serves the round-2 content (no port change, no
// new browser tab — the user's existing tab can reload in place)
const r2Page = await fetch(board.url);
expect(await r2Page.text()).toContain("round 2 variants");
expect(((await (await fetch(`${board.url}api/progress`)).json()) as any).status).toBe(
"serving",
);
// User submits round 2
const finalSubmit = await fetch(`${board.url}api/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
preferred: "B",
ratings: { B: 5 },
regenerated: false,
}),
});
expect(finalSubmit.status).toBe(200);
const written = JSON.parse(
fs.readFileSync(path.join(board.sourceDir, "feedback.json"), "utf-8"),
);
expect(written.preferred).toBe("B");
expect(written.boardId).toBe(board.id);
} finally {
try { fs.rmSync(boardDir, { recursive: true, force: true }); } catch {}
}
});
});
// ─── Two-board, one-daemon attach behavior ───────────────────────
describe("daemon round-trip: two concurrent publishes share one daemon", () => {
test("Second publish attaches to the same daemon (no new spawn)", async () => {
const d = await spawn1();
const dirA = makeTmpDir("two-a");
const dirB = makeTmpDir("two-b");
try {
const a = await publishBoard({ port: d.port, html: makeBoardHtml(dirA) });
const b = await publishBoard({ port: d.port, html: makeBoardHtml(dirB) });
// Same daemon process — state file pid is stable
const state = readStateFile(stateFile);
expect(state!.pid).toBe(d.proc.pid);
// Two distinct board ids
expect(a.id).not.toBe(b.id);
// Both URLs serve their own content
const pageA = await fetch(a.url);
const pageB = await fetch(b.url);
expect(pageA.status).toBe(200);
expect(pageB.status).toBe(200);
// Feedback isolation: submit to A only affects A's sourceDir
await fetch(`${a.url}api/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ regenerated: false, preferred: "A" }),
});
expect(fs.existsSync(path.join(a.sourceDir, "feedback.json"))).toBe(true);
expect(fs.existsSync(path.join(b.sourceDir, "feedback.json"))).toBe(false);
// Index page lists both
const idx = await fetch(`http://127.0.0.1:${d.port}/`);
const idxHtml = await idx.text();
expect(idxHtml).toContain(a.id);
expect(idxHtml).toContain(b.id);
} finally {
try { fs.rmSync(dirA, { recursive: true, force: true }); } catch {}
try { fs.rmSync(dirB, { recursive: true, force: true }); } catch {}
}
});
});


@ -55,7 +55,7 @@ beforeAll(async () => {
serverState = 'serving';
// This server mirrors the real serve.ts behavior:
// - Injects __GSTACK_SERVER_URL into the HTML
// - Serves board HTML at / (board JS uses relative URLs)
// - Handles POST /api/feedback with file writes
// - Handles GET /api/progress for regeneration polling
// - Handles POST /api/reload for board swapping
@ -67,11 +67,7 @@ beforeAll(async () => {
const url = new URL(req.url);
if (req.method === 'GET' && (url.pathname === '/' || url.pathname === '/index.html')) {
const injected = currentHtml.replace(
'</head>',
`<script>window.__GSTACK_SERVER_URL = '${url.origin}';</script>\n</head>`
);
return new Response(injected, {
return new Response(currentHtml, {
headers: { 'Content-Type': 'text/html; charset=utf-8' },
});
}
@ -140,14 +136,15 @@ describe('Submit: browser click → feedback.json on disk', () => {
if (fs.existsSync(feedbackPath)) fs.unlinkSync(feedbackPath);
serverState = 'serving';
// Navigate to the board (served with __GSTACK_SERVER_URL injected)
// Navigate to the board (board JS uses relative URLs + location.protocol detect)
await handleWriteCommand('goto', [baseUrl], bm);
// Verify __GSTACK_SERVER_URL was injected
const hasServerUrl = await handleReadCommand('js', [
'!!window.__GSTACK_SERVER_URL'
// Verify the board detects HTTP mode (so postFeedback will actually fetch
// instead of falling into the file:// DOM-only path)
const httpDetected = await handleReadCommand('js', [
"location.protocol === 'http:' || location.protocol === 'https:'"
], bm);
expect(hasServerUrl).toBe('true');
expect(httpDetected).toBe('true');
// User picks variant A, rates it 5 stars
await handleReadCommand('js', [


@ -65,11 +65,9 @@ describe('Serve HTTP endpoints', () => {
const url = new URL(req.url);
if (req.method === 'GET' && url.pathname === '/') {
const injected = htmlContent.replace(
'</head>',
`<script>window.__GSTACK_SERVER_URL = '${url.origin}';</script>\n</head>`
);
return new Response(injected, {
// Board JS uses relative URLs (./api/feedback, ./api/progress)
// and a location.protocol feature-detect; no injection needed.
return new Response(htmlContent, {
headers: { 'Content-Type': 'text/html; charset=utf-8' },
});
}
@ -118,12 +116,17 @@ describe('Serve HTTP endpoints', () => {
server.stop();
});
test('GET / serves HTML with injected __GSTACK_SERVER_URL', async () => {
test('GET / serves HTML with relative-path board JS (no injection)', async () => {
const res = await fetch(baseUrl);
expect(res.status).toBe(200);
const html = await res.text();
expect(html).toContain('__GSTACK_SERVER_URL');
expect(html).toContain(baseUrl);
// No more per-origin URL injection; board JS uses relative paths.
expect(html).not.toContain('__GSTACK_SERVER_URL');
expect(html).not.toContain(baseUrl);
// Board JS calls relative endpoints so the same HTML works at / and at
// /boards/<id>/ (daemon mode).
expect(html).toContain("fetch('./api/feedback'");
expect(html).toContain("fetch('./api/progress')");
expect(html).toContain('Design Exploration');
});
@ -308,9 +311,12 @@ describe('Serve /api/reload — path traversal protection', () => {
}
// Production path validation — same as design/src/serve.ts
const resolvedReload = fs.realpathSync(path.resolve(body.html));
if (!resolvedReload.startsWith(allowedDir + path.sep) && resolvedReload !== allowedDir) {
if (!resolvedReload.startsWith(allowedDir + path.sep)) {
return Response.json({ error: `Path must be within: ${allowedDir}` }, { status: 403 });
}
if (!fs.statSync(resolvedReload).isFile()) {
return Response.json({ error: `Path must be a file, not a directory: ${body.html}` }, { status: 400 });
}
htmlContent = fs.readFileSync(resolvedReload, 'utf-8');
return Response.json({ reloaded: true });
})();
@ -369,6 +375,39 @@ describe('Serve /api/reload — path traversal protection', () => {
const page = await fetch(baseUrl);
expect(await page.text()).toContain('Safe reload');
});
// Regression for the directory-instead-of-file guard (Codex finding).
// Before: resolvedReload === allowedDir passed the guard and then
// readFileSync threw EISDIR with no helpful message.
test('blocks reload when path resolves to the allowed directory itself', async () => {
const res = await fetch(`${baseUrl}/api/reload`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ html: tmpDir }),
});
// tmpDir does not satisfy startsWith(allowedDir + sep), so the within-dir
// check rejects with 403 — but importantly, no EISDIR crash.
expect(res.status).toBe(403);
});
test('blocks reload when path is a subdirectory (not a file)', async () => {
const subdir = path.join(tmpDir, 'subdir-not-a-file');
fs.mkdirSync(subdir, { recursive: true });
try {
const res = await fetch(`${baseUrl}/api/reload`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ html: subdir }),
});
// Inside allowedDir but a directory — must fail before readFileSync,
// with a clear "must be a file" error instead of EISDIR.
expect(res.status).toBe(400);
const data = await res.json();
expect(data.error).toContain('must be a file');
} finally {
try { fs.rmSync(subdir, { recursive: true, force: true }); } catch {}
}
});
});
// ─── Full lifecycle: regeneration round-trip ──────────────────────


@ -2,15 +2,7 @@
name: devex-review
preamble-tier: 3
version: 1.0.0
description: |
Live developer experience audit. Uses the browse tool to actually TEST the
developer experience: navigates docs, tries the getting started flow, times
TTHW, screenshots error messages, evaluates CLI help text. Produces a DX
scorecard with evidence. Compares against /plan-devex-review scores if they
exist (the boomerang: plan said 3 minutes, reality says 8). Use when asked to
"test the DX", "DX audit", "developer experience test", or "try the
onboarding". Proactively suggest after shipping a developer-facing feature. (gstack)
Voice triggers (speech-to-text aliases): "dx audit", "test the developer experience", "try the onboarding", "developer experience test".
description: Live developer experience audit. (gstack)
triggers:
- live dx audit
- test developer experience
@ -27,6 +19,19 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Uses the browse tool to actually TEST the
developer experience: navigates docs, tries the getting started flow, times
TTHW, screenshots error messages, evaluates CLI help text. Produces a DX
scorecard with evidence. Compares against /plan-devex-review scores if they
exist (the boomerang: plan said 3 minutes, reality says 8). Use when asked to
"test the DX", "DX audit", "developer experience test", or "try the
onboarding". Proactively suggest after shipping a developer-facing feature.
Voice triggers (speech-to-text aliases): "dx audit", "test the developer experience", "try the onboarding", "developer experience test".
## Preamble (run first)
```bash
@ -567,84 +572,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -0,0 +1,81 @@
# Fix #1671: `/office-hours` always reports SESSION_COUNT: 0
**Status:** SHIPPED
**Branch:** fix-1671-profile-migration
**Date:** 2026-05-23
**Issue:** https://github.com/garrytan/gstack/issues/1671
**Original PR that introduced the bug:** garrytan/gstack#1039 / commit `0a803f9` / v1.0.0.0 / 2026-04-18
## The problem
`/office-hours` reports `SESSION_COUNT: 0` and `TIER: introduction` on every invocation, even for users who have run the skill many times. The `welcome_back` tier (`bin/gstack-developer-profile:165-169`), which exists to skip the closing pitch for returning users, is unreachable. The bug has been live for roughly 5 weeks and affects every fresh-`$HOME` user since v1.0.0.0.
## Root cause
The v1.0.0.0 migration moved the read path to `~/.gstack/developer-profile.json` but left the writer in `office-hours/SKILL.md.tmpl` writing to the legacy `~/.gstack/builder-profile.jsonl`. The `ensure_profile` stub created on first read has `sessions: []`; subsequent writes go to a file the reader never re-reads. Reader and writer disagree on storage.
Full root-cause analysis (including RC2/RC3 follow-ups): https://github.com/garrytan/gstack/issues/1671
## The fix
Make the writer use the same file the reader does.
### Changes
1. **`bin/gstack-developer-profile`** — add `--log-session '<json>'` subcommand:
- Validates required fields (`date`, `mode`), silent-skip on invalid input (matches `bin/gstack-timeline-log:22-26`).
- Reads existing `developer-profile.json` via `bun -e`.
- Appends entry to `sessions[]`. Updates `signals_accumulated` (per-signal-string increment, same as `do_migrate:67-69`), unions `resources_shown` and `topics`.
- Atomic mktemp+mv write (matches existing pattern at line 54).
- Calls `gstack-brain-enqueue "developer-profile.json"` after write, mirroring `bin/gstack-timeline-log:40`.
2. **`bin/gstack-developer-profile:do_read`** — filter `mode:"resources"` entries when picking LAST_PROJECT / LAST_ASSIGNMENT / LAST_DESIGN_TITLE / CROSS_PROJECT / DESIGN_*. The Phase 6 resources auto-append happens after the real session in the same /office-hours invocation; without the filter, that resources entry clobbers real-session state for the user's next session. Latent bug that was masked by the broken writer; activated by the fix.
3. **`office-hours/SKILL.md.tmpl`** — swap writers at lines 490 and 893:
- From: `echo '{...}' >> "$GSTACK_STATE_ROOT/builder-profile.jsonl"`
- To: `~/.claude/skills/gstack/bin/gstack-developer-profile --log-session '{...}' 2>/dev/null || true`
- Run `bun run gen:skill-docs` to regenerate `office-hours/SKILL.md`.
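The append-and-aggregate step described for `--log-session` can be sketched as a pure function. This is an illustrative sketch only, not the shipped shell/`bun -e` implementation: the profile field names (`sessions`, `signals_accumulated`, `resources_shown`, `topics`) come from the plan above, while the `signals` field on the entry, the `logSession` name, and the `now` parameter are assumptions. The atomic mktemp+mv write and the `gstack-brain-enqueue` call are left out.

```typescript
// Hypothetical profile shape; only the four field names are taken from the plan.
interface DeveloperProfile {
  sessions: Array<Record<string, unknown>>;
  signals_accumulated: Record<string, number>;
  resources_shown: string[];
  topics: string[];
}

// Keep only string members; anything else is ignored rather than rejected.
function asStrings(value: unknown): string[] {
  return Array.isArray(value)
    ? value.filter((v): v is string => typeof v === "string")
    : [];
}

// Returns a new profile with the entry appended, or the same profile object
// when the input is invalid (the "silent skip" behaviour).
function logSession(
  profile: DeveloperProfile,
  rawJson: string,
  now: string,
): DeveloperProfile {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return profile; // invalid JSON: skip silently
  }
  if (typeof parsed !== "object" || parsed === null) return profile;
  const entry = parsed as Record<string, unknown>;
  // Required fields per the plan: date and mode.
  if (typeof entry["date"] !== "string" || typeof entry["mode"] !== "string") {
    return profile;
  }
  // Inject ts when missing, preserve it when the caller set one.
  const ts = typeof entry["ts"] === "string" ? entry["ts"] : now;

  // Per-signal-string increment (the "signals" entry field is an assumption).
  const signals = { ...profile.signals_accumulated };
  for (const s of asStrings(entry["signals"])) {
    signals[s] = (signals[s] ?? 0) + 1;
  }
  const union = (a: string[], b: string[]): string[] =>
    Array.from(new Set([...a, ...b]));

  return {
    sessions: [...profile.sessions, { ...entry, ts }],
    signals_accumulated: signals,
    resources_shown: union(profile.resources_shown, asStrings(entry["resources_shown"])),
    topics: union(profile.topics, asStrings(entry["topics"])),
  };
}
```

Keeping the aggregation pure means the validation and union rules can be tested without touching `$HOME`; the file read and atomic write would wrap this function.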
### What's NOT in the fix (intentionally)
- **No new binary.** The owner binary for `developer-profile.json` is `gstack-developer-profile`; the writer belongs there as a subcommand. `--log-session` sits with the binary's existing write-side subcommands (`--migrate` / `--derive`), not with the `gstack-*-log` event-writer family, although its verb name still matches `gstack-*-log`.
- **No mkdir-locks.** Concurrent /office-hours calls have a read-modify-write race on `developer-profile.json`. The codebase accepts the same race in `gstack-config` (r-m-w on YAML, no lock). Not introduced by this fix; out of scope.
- **No schema bump.** Schema stays at `schema_version: 1`. The fix doesn't change the schema, just makes the writer use it.
- **No auto-reconcile for affected users.** Existing users with stranded `builder-profile.jsonl` entries don't get their past history auto-merged into `developer-profile.json`. On their next /office-hours run, the first new session lands in `welcome_back`; past data stays in the legacy file (still readable by other tools during deprecation). Most affected users have only a handful of stranded sessions, so the loss is mostly aesthetic. The one-release-only reconcile pathway was dropped as net noise, in line with Garry's "right-sized diff" voice.
- **No autoplan timeline rollup (RC2).** Separate concern, separate PR.
- **No project-scope opt-in (RC3).** Separate concern, separate PR.
- **No gbrain glob change.** The office-hours manifest still globs `~/.gstack/builder-profile.jsonl` for context; once new writes stop landing there, the snapshot goes cold. Update in a follow-up if it becomes a UX issue.
### Tests (all gate-tier, free, deterministic)
1. **Regression test** in `test/gstack-developer-profile.test.ts`:
- Fresh `$HOME`.
- Run /office-hours preamble: gstack-developer-profile creates empty stub.
- Call `--log-session` with a startup-mode JSON.
- Run `--read` again. Assert `SESSION_COUNT: 1`, `TIER: welcome_back`.
- Fails on current main (subcommand doesn't exist). Passes with fix.
2. **`do_read` mode filter test:** after recording a startup session followed by a resources entry, `--read` returns LAST_PROJECT / LAST_ASSIGNMENT / LAST_DESIGN_TITLE from the real session, not from the resources entry. RESOURCES_SHOWN still aggregates correctly.
3. **Validation + aggregation tests:** `--log-session` silently skips invalid JSON / missing required fields, injects `ts` if missing, preserves user-set `ts`, correctly aggregates signals/resources/topics across multiple sessions.
4. **Static-grep invariant** in `test/static-no-legacy-writes.test.ts` (new): walks every skill dir, asserts no production code path writes to `builder-profile.jsonl` except allowlisted readers (`gstack-developer-profile`, `gstack-memory-ingest.ts`, `gstack-artifacts-init`, doc files). Prevents future writers from regressing onto the legacy file.
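The `do_read` mode filter exercised by test 2 amounts to "walk the sessions from newest to oldest and take the first one that is not a resources entry". A minimal sketch of that selection, assuming a hypothetical `Session` shape (only `mode` and the `"resources"` value come from the plan; `project` and the function name are illustrative):

```typescript
// Hypothetical session shape for illustration.
interface Session {
  mode: string;
  project?: string;
}

// Pick the most recent session that is a real session, skipping the
// resources auto-append entries that follow it in the same invocation.
function lastRealSession(sessions: Session[]): Session | undefined {
  for (let i = sessions.length - 1; i >= 0; i--) {
    const s = sessions[i];
    if (s !== undefined && s.mode !== "resources") return s;
  }
  return undefined;
}
```

Fields such as LAST_PROJECT would then be read from the returned session, while aggregates like RESOURCES_SHOWN would still be computed over every entry.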
### Acceptance criteria
- Second `/office-hours` invocation on a fresh `$HOME` returns `TIER: welcome_back`.
- `bun test` passes on the touched files in isolation.
- `bun run gen:skill-docs` produces clean diff matching the `.tmpl` edits.
### Rollout
- One commit. PATCH version bump per CHANGELOG style guide.
- CHANGELOG entry written by `/ship`. User-facing voice: lead with what users experience now that they didn't before (welcome_back tier kicks in on second visit).
## Follow-up TODOs
- Deprecate `builder-profile.jsonl` entirely (writer + shim + memory-ingest type) after one release.
- Fix RC2 (autoplan inlines sub-skills, bypassing their timeline-log preambles).
- Add `GSTACK_PROFILE_SCOPE` opt-in for power users with multiple agent identities (RC3).
- /plan-tune doesn't currently call `--derive`, so `inferred`/`gap` can drift (pre-existing, unrelated to #1671).
- `mode:"resources"` entries inflate SESSION_COUNT under the existing tier aggregator (pre-existing, unrelated to #1671 root cause).

docs/designs/v2_PLAN.md Normal file

@ -0,0 +1,755 @@
# gstack v2 — the lightest opinionated skill pack
## Context
gstack has an externally documented reputation for being "fat." Third-party reviews (dev.to, May 2026) explicitly say gstack "can feel bloated when all roles are turned on... potentially consuming 10K+ tokens before any real code is written, and daily usage burns through tokens fast... making even straightforward tasks feel sluggish and redundant." Anthropic's own canonical Skills guidance prescribes the "progressive disclosure" pattern (`SKILL.md` skeleton + `references/` loaded on demand) — gstack diverges from this.
The numbers back the criticism:
- 31 skills, 2.1MB total generated SKILL.md corpus
- 28 of 31 skills exceed the 40KB soft ceiling (~10K tokens each)
- ship.md is 164KB (~41K tokens); ship.md.tmpl is only 48KB — **115KB is resolver-injected**, the highest-leverage compression target
- Catalog in always-loaded system prompt: 50+ skills × multi-paragraph descriptions, voice triggers, proactive-suggest paragraphs
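The size figures above follow from simple arithmetic. A minimal sketch, assuming roughly 4 bytes per token (the ratio implied by "40KB ... ~10K tokens") and decimal kilobytes; the function names are hypothetical and not part of gstack:

```typescript
// Assumption: about 4 bytes of generated SKILL.md text per token.
const BYTES_PER_TOKEN = 4;

// Rough token estimate for a file size given in KB (1 KB = 1000 bytes here).
function estimateTokens(sizeKb: number): number {
  return Math.round((sizeKb * 1000) / BYTES_PER_TOKEN);
}

// Whatever the generated file has beyond its template came from resolvers.
function resolverInjectedKb(generatedKb: number, templateKb: number): number {
  return generatedKb - templateKb;
}

const shipTokens = estimateTokens(164);          // about 41K tokens
const ceilingTokens = estimateTokens(40);        // about 10K tokens
const shipInjected = resolverInjectedKb(164, 48); // 116 on the rounded figures
```

On the rounded sizes the injected share comes out at 116KB, which is consistent with the roughly 115KB quoted; the exact value depends on the unrounded byte counts.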
This plan ships gstack v2 in two coordinated releases: v1.45.0.0 lands the foundation + low-risk wins, then v2.0.0.0 ships the architectural break + marketing-grade repositioning 2-4 weeks later. The split came out of cross-model review: Codex argued v2 looks like posturing without real breakage; the hybrid shape gives the genuinely-breaking sections/ pattern the major bump it earns, while letting the risk-free wins ship immediately.
## Release shape
```
v1.45.0.0 (Foundation Release)            v2.0.0.0 (gstack v2 Launch)
─────────────────────────────             ─────────────────────────────
~1-2 weeks of CC work                     2-4 weeks later, coordinated
Phase 0: Eval coverage matrix             Phase B: sections/ pattern
  gate + periodic for all 31 skills         on 5 heavyweights
                                            (ship, plan-ceo, office-hours,
Phase A: Build-time compression              plan-eng, plan-design)
  conditional resolver injection
  jargon dedup                            Phase C: Eval annotations
  terse-mode actually compresses            + CI orphan check (WARN→FAIL)
Catalog trim (Codex high-leverage win)    Lighter-touch migration
  one-line skill descriptions               release note + auto-regenerate
  drop voice triggers/proactive blocks      on /gstack-upgrade
Hard token budgets defined                Marketing-grade CHANGELOG
  enforced via budget-regression            v1 vs v2 numbers table
                                          README v2 banner
Normal release voice                        "lightest opinionated skill pack"
```
## Premise check (Step 0A findings)
1. **Is this the right problem?** YES — externally validated. The bloat criticism is quotable and represents real user pain (token cost, sluggish sessions). Doing nothing means losing users to Cursor/Codex for their "lighter touch" reputation.
2. **Doing nothing:** the criticism compounds. Recent releases (v1.38 → v1.44) all added features; no release has gone the other direction. Without an explicit reversal, the reputation calcifies.
3. **Risk of acting:** the lazy-section pattern introduces silent-behavior-loss as a new failure class. Mitigated by the eval-first foundation + mechanical enforcement + canary rollout (see Phase B integrity section).
## What already exists (reuse-first audit)
| Asset | Reuse |
|---|---|
| `scripts/gen-skill-docs.ts` lines 439-450 | Already does string substitution and per-host suppression; extend with `appliesTo` resolver gate (~15 LOC) |
| `scripts/resolvers/types.ts` | Add `ResolverEntry` union type |
| `scripts/resolvers/preamble.ts` | Already does tier-gated composition (1-4); add per-resolver gating |
| `scripts/jargon-list.json` | Already a single file; just stop inlining it 37× |
| `test/skill-e2e-budget-regression.test.ts` (existing gate-tier) | Extend with per-skill hard budgets |
| Real-PTY harness from v1.13.2.0 | Reuse for behavioral-contract evals (~$0.50/eval) |
| SDK harness | Reuse for cheap shape evals (~$0/eval where possible) |
| `gstack-upgrade/migrations/` | Pattern exists for state-format migrations; reuse for v2 auto-regenerate |
| `~/.gstack/analytics/skill-usage.jsonl` | Already collected; powers deferred `gstack budget` CLI |
We are catching up to Anthropic's canonical Skills pattern, not inventing one.
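The `appliesTo` resolver gate mentioned in the table could take roughly the following shape. This is a sketch under stated assumptions, not the code in `scripts/gen-skill-docs.ts`: the `{{NAME}}` placeholder syntax, the `ResolverContext` fields, and the exact `ResolverEntry` union are all hypothetical.

```typescript
// Hypothetical context passed to each resolver.
interface ResolverContext {
  skillName: string;
  host: string;
}

type ResolverFn = (ctx: ResolverContext) => string;

// Assumed union: a bare function always applies; an object form adds a gate.
type ResolverEntry =
  | ResolverFn
  | { resolve: ResolverFn; appliesTo: (ctx: ResolverContext) => boolean };

// Substitute {{NAME}} placeholders, skipping resolvers whose gate says no.
function renderTemplate(
  template: string,
  resolvers: Record<string, ResolverEntry>,
  ctx: ResolverContext,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) => {
    const entry = resolvers[name];
    if (entry === undefined) return match; // unknown placeholder: leave untouched
    if (typeof entry === "function") return entry(ctx);
    return entry.appliesTo(ctx) ? entry.resolve(ctx) : "";
  });
}
```

With a gate like this, a resolver that only matters to a few skills injects nothing into the rest, which is where the build-time compression would come from.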
## Dream state delta
```
TODAY                             v1.45.0.0                       v2.0.0.0
──────                            ─────────                       ────────
2.1MB corpus                      ~1.3MB corpus (-40%)            ~700KB corpus (-67%)
ship.md: 164KB                    ship.md: ~80KB (-50%)           ship.md: ~15KB skeleton
                                                                  + 5×~5KB sections
28/31 over 40KB ceiling           ~10/31 over ceiling             ~3/31 over ceiling
                                                                  (cso, document-release,
                                                                   design-consultation
                                                                   kept as monoliths)
Catalog: multi-paragraph          Catalog: one-line per skill     Catalog: one-line per skill
descriptions, voice triggers      (~70% catalog cut)              (same)
No eval coverage matrix           Every skill: ≥1 gate eval       Section-level eval
                                  + ≥1 periodic eval              annotations + CI orphan check
"Fat" reputation in third-party   "Compressed, eval-protected"    "Lightest opinionated skill
reviews                           internally measured             pack" externally measured
```
## Phase 0 — Eval coverage matrix (v1.45.0.0)
**Goal:** every skill in gstack ships with at least one gate-tier eval AND one periodic-tier eval that asserts a must-have behavior. The eval suite becomes the design spec. This is the load-bearing claim of the plan — must come first.
**Cross-model tension noted:** Codex argued this is a procrastination trap and shape-asserts are shallow. User explicitly chose full tiered coverage anyway (D9 = A), with rationale: "the eval suite IS the design spec; that commitment is the load-bearing claim of the whole plan." We accept the larger upfront investment.
**Mitigation of Codex's "shape vs quality" critique:** for orchestration/judgment skills (plan-ceo, office-hours, autoplan), the must-have isn't deterministic output — it's structural compliance (does it call AskUserQuestion in the right shape? does it follow the section order? does it persist artifacts?). Eval design must capture structural contracts, not output content. Where structural eval is impossible, that section is explicitly noted as "judgment-dependent, not eval-protected" — Codex's #2 critique is honored by NOT then stripping unprotected judgment prose.
**Skills currently lacking dedicated E2E coverage** (eval-writing target):
| Skill | Gate eval (target) | Periodic eval (target) | Est. cost/run |
|---|---|---|---|
| qa-only | report-only flag triggers | full QA flow with fix-loop disabled | $0.30 / $1.50 |
| retro | weekly aggregate runs without error | full retro produces ranked output | $0.20 / $2.00 |
| document-release | reads CHANGELOG, produces Diataxis map | full post-ship doc update | $0.30 / $1.80 |
| document-generate | generates 4 doc types from prompt | E2E generation passes quality bar | $0.30 / $2.00 |
| context-save | persists state to expected path | round-trip restore preserves context | $0.10 / $0.50 |
| context-restore | reads latest save, applies to session | cross-workspace restore works | $0.10 / $0.50 |
| gstack-upgrade | detects install type, runs upgrade | full upgrade + migration round-trip | $0.20 / $1.00 |
| sync-gbrain | refreshes index without error | full sync produces searchable corpus | $0.20 / $1.50 |
| setup-gbrain | path 1-4 detection works | end-to-end setup for each path | $0.20 / $2.00 |
| setup-browser-cookies | picker UI loads without error | cookie import round-trip | $0.20 / $1.00 |
| setup-deploy | detects config, writes expected files | full deploy config setup | $0.20 / $1.00 |
| design-consultation | DESIGN.md template renders | full design system generation | $0.30 / $2.50 |
| design-shotgun | variants generated and saved | full multi-variant exploration | $0.30 / $2.00 |
| open-gstack-browser | launches browser without error | sidebar attaches and shows activity | $0.20 / $0.80 |
| pair-agent | setup key generated, instructions printed | full pair flow with second agent | $0.20 / $1.50 |
| land-and-deploy | merge gates check correctly | full merge → deploy → canary | $0.30 / $3.00 |
| canary | post-deploy loop runs, exits cleanly | full canary cycle with alert simulation | $0.20 / $1.50 |
| benchmark | runs and produces score | full regression detection | $0.20 / $2.00 |
| plan-devex-review | mode routing works | full DX review with scoring | $0.40 / $3.00 |
| devex-review | live DX audit produces scorecard | E2E DX measurement vs plan baseline | $0.40 / $2.50 |
Estimated added CI cost: **~$5/run gate, ~$30/run periodic.** Combined with existing E2E suite (~$15/gate, ~$30/periodic), total: ~$20/gate (every PR), ~$60/periodic (weekly). Acceptable.
**Eval matrix lives at:** `test/skill-coverage-matrix.ts`, a single source of truth mapping each skill to its gate + periodic eval test files. The CI check in `test/skill-coverage-matrix.test.ts` fails the build if any skill is missing an entry.
**Critical files to add:**
- `test/skill-coverage-matrix.ts` — registry mapping skill → eval paths
- `test/skill-e2e-*.test.ts` — 20 new test files (gate-tier subset starts in gate config, periodic-tier subset in periodic config)
- `test/helpers/touchfiles.ts` — register new tests for diff-based selection
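A minimal sketch of what the registry and its CI check could look like. The `CoverageEntry` shape, the example test paths, and the `findCoverageGaps` helper are assumptions for illustration; the plan only states that each skill maps to its gate and periodic eval files.

```typescript
// Hypothetical entry shape: the plan names the registry file but not its fields.
type CoverageEntry = {
  gate: string[];     // gate-tier eval test files (run on every PR)
  periodic: string[]; // periodic-tier eval test files (run weekly)
};

type CoverageMatrix = Record<string, CoverageEntry>;

// Example entries only; these file names are illustrative, not real paths.
const SKILL_COVERAGE: CoverageMatrix = {
  'context-save': {
    gate: ['test/skill-e2e-context-save.test.ts'],
    periodic: ['test/skill-e2e-context-save-roundtrip.test.ts'],
  },
  retro: {
    gate: ['test/skill-e2e-retro.test.ts'],
    periodic: [],
  },
};

// Returns one readable problem per skill that lacks an entry or a tier.
function findCoverageGaps(skills: string[], matrix: CoverageMatrix): string[] {
  const gaps: string[] = [];
  for (const skill of skills) {
    const entry = matrix[skill];
    if (!entry) {
      gaps.push(`${skill}: no matrix entry`);
      continue;
    }
    if (entry.gate.length === 0) gaps.push(`${skill}: missing gate eval`);
    if (entry.periodic.length === 0) gaps.push(`${skill}: missing periodic eval`);
  }
  return gaps;
}
```

A CI test built on this would fail whenever `findCoverageGaps` returns a non-empty list for the full set of skill directories.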
## Phase A — Build-time compression (v1.45.0.0)
**A.1 Conditional resolver injection** — extend `scripts/gen-skill-docs.ts` and `scripts/resolvers/`:
```ts
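// scripts/resolvers/types.ts
export type ResolverFn = (ctx: TemplateContext, args?: string[]) => string;

// A resolver is either a bare function (always applies) or an object whose
// optional `appliesTo` predicate gates it per skill at build time.
export type ResolverEntry = ResolverFn | {
  resolve: ResolverFn;
  appliesTo?: (ctx: TemplateContext) => boolean;
};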
```
```ts
// scripts/resolvers/index.ts — gate the heavy ones
QUESTION_TUNING: {
resolve: generateQuestionTuning,
appliesTo: (ctx) => ['plan-ceo-review','plan-eng-review','office-hours'].includes(ctx.skillName),
},
REVIEW_ARMY: {
resolve: generateReviewArmy,
appliesTo: (ctx) => ['ship','review'].includes(ctx.skillName),
},
REVIEW_DASHBOARD: {
resolve: generateReviewDashboard,
appliesTo: (ctx) => ['ship','plan-ceo-review','plan-eng-review','plan-design-review','plan-devex-review','devex-review'].includes(ctx.skillName),
},
// ... audit all 21 resolvers, gate per actual usage
```
```ts
// scripts/gen-skill-docs.ts (~line 444) — check the gate
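const entry = RESOLVERS[resolverName];
// Bare functions have no gate; object entries may carry an `appliesTo` predicate.
const resolver = typeof entry === 'function' ? entry : entry.resolve;
const gate = typeof entry === 'function' ? undefined : entry.appliesTo;
// Gated out for this skill: emit nothing instead of the resolver's output.
if (gate && !gate(ctx)) return '';
return args.length > 0 ? resolver(ctx, args) : resolver(ctx);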
```
**A.2 Jargon-list dedup** — currently `scripts/resolvers/preamble/generate-writing-style.ts` inlines the full 1.8KB jargon glossary into 37 skills. Replace inline with a reference: "For the canonical jargon list, Read `~/.claude/skills/gstack/scripts/jargon-list.json` on first use." Saves ~66KB total corpus.
**A.3 Terse-mode actually compresses** — read `~/.gstack/config.yaml` once in `gen-skill-docs.ts`, pass `explainLevel` into `TemplateContext`, and have `generate-writing-style.ts` / `generate-completeness.ts` / `generate-confusion-protocol.ts` / `generate-context-health.ts` return `''` when terse. Today the bytes ship regardless of config — the flag only changes runtime model behavior. Add `--explain-level=terse` build flag for benchmarking.
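As a rough sketch of the terse-mode change, assuming the config key is `explain_level` and that the only values that matter are `terse` and everything else: the `ExplainLevel` type, the `readExplainLevel` helper, and the placeholder resolver body are hypothetical, and the trimmed `TemplateContext` here stands in for the real one.

```typescript
// Assumed value set; the plan only names the terse level explicitly.
type ExplainLevel = 'default' | 'terse';

// Trimmed stand-in for the real TemplateContext.
interface TemplateContext {
  skillName: string;
  explainLevel: ExplainLevel;
}

// Pulls `explain_level` out of the YAML text with a line-anchored regex.
// A real build would likely use a YAML parser; this avoids a dependency.
function readExplainLevel(configText: string): ExplainLevel {
  const match = configText.match(/^explain_level:\s*(\S+)\s*$/m);
  if (!match) return 'default';
  const value = match[1].replace(/^["']|["']$/g, '');
  return value === 'terse' ? 'terse' : 'default';
}

// A preamble resolver that ships zero bytes when the build is terse.
function generateWritingStyle(ctx: TemplateContext): string {
  if (ctx.explainLevel === 'terse') return '';
  return '## Writing style\n(placeholder for the full writing-style prose)\n';
}
```

Reading the level once and threading it through the context keeps every resolver a pure function of its input, which makes the terse and default outputs easy to diff in a benchmark.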
**A.4 Catalog trim** (moved up per Codex #6) — shorten skill descriptions in the always-loaded system prompt to one line per skill. Voice triggers move from catalog descriptions into in-skill content. Proactive-suggest paragraphs move to a separate `~/.claude/skills/gstack/scripts/proactive-suggestions.json` loaded only when the agent needs routing guidance. Per-skill description format:
```
- <skill-name>: <one-line outcome description, 80 chars> (gstack)
```
Estimated catalog cut: ~70% (largest single always-loaded reduction).
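One way the one-line format could be generated and enforced, sketched under the assumption that the 80-character limit applies to the description alone. `formatCatalogLine` and `MAX_DESCRIPTION_CHARS` are illustrative names, not existing code.

```typescript
// Assumed to bound the description only, not the whole catalog line.
const MAX_DESCRIPTION_CHARS = 80;

// Builds one catalog line; throws so the build fails loudly on a bad description.
function formatCatalogLine(skillName: string, description: string): string {
  // Collapse newlines and runs of whitespace so the entry stays on one line.
  const oneLine = description.replace(/\s+/g, ' ').trim();
  if (oneLine.length === 0) {
    throw new Error(`${skillName}: missing one-line description`);
  }
  if (oneLine.length > MAX_DESCRIPTION_CHARS) {
    throw new Error(
      `${skillName}: description is ${oneLine.length} chars, limit is ${MAX_DESCRIPTION_CHARS}`,
    );
  }
  return `- ${skillName}: ${oneLine} (gstack)`;
}
```

Throwing instead of truncating keeps the author in control of the wording and matches the plan's intent that a missing description is a build error.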
**A.5 cso/ targeted compression** (Codex #9) — cso gets resolver dedup + catalog trim. Security guidance prose stays as an uncompressed monolith until the Phase B audit shows that specific sections can safely move to sections/ with eval coverage. Not "exempt" — just sequenced last.
**A.6 Hard token budgets** (Codex #10) — define and enforce in `test/skill-e2e-budget-regression.test.ts`:
| Budget | v1.44 actual | v1.45 target | v2.0 target |
|---|---|---|---|
| Max system-prompt catalog tokens | ~25K | ~8K | ~6K |
| Max per-skill SKILL.md size | 164KB (ship) | 100KB | 30KB (heavyweights) |
| Max corpus total | 2.1MB | 1.3MB | 700KB |
| Max first-invocation latency (heavyweight) | ~immediate | ~immediate | <500ms section reads |
CI fails if any budget exceeded. Tracked over time via existing budget-regression jsonl.
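The size rows of the table could be enforced with a check along these lines. This is a sketch only: the `SizeBudget` shape and `checkSizeBudgets` are hypothetical, and treating KB/MB as decimal units is an assumption the plan does not settle.

```typescript
// Numbers mirror the v1.45 column of the budget table; decimal units are assumed.
interface SizeBudget {
  maxSkillBytes: number;
  maxCorpusBytes: number;
}

const V1_45_BUDGET: SizeBudget = {
  maxSkillBytes: 100000,   // 100KB per generated SKILL.md
  maxCorpusBytes: 1300000, // 1.3MB across all generated SKILL.md files
};

// Returns one violation message per exceeded budget; an empty list means pass.
function checkSizeBudgets(sizes: Record<string, number>, budget: SizeBudget): string[] {
  const violations: string[] = [];
  let total = 0;
  for (const [skill, bytes] of Object.entries(sizes)) {
    total += bytes;
    if (bytes > budget.maxSkillBytes) {
      violations.push(`${skill}: ${bytes} bytes exceeds per-skill budget ${budget.maxSkillBytes}`);
    }
  }
  if (total > budget.maxCorpusBytes) {
    violations.push(`corpus: ${total} bytes exceeds corpus budget ${budget.maxCorpusBytes}`);
  }
  return violations;
}
```

Keeping the budget as data makes the v2.0 targets a second constant, not a second code path.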
## Phase B — sections/ pattern for heavyweights (v2.0.0.0)
Convert 5 heavyweights to Anthropic-canon skeleton + `sections/*.md`:
```
ship/
├── SKILL.md              # 12-15KB decision-tree skeleton + section manifest
├── SKILL.md.tmpl         # source for the skeleton
├── sections/
│   ├── manifest.json     # NEW: structured section registry (Codex #3 mitigation)
│   ├── version-bump.md
│   ├── changelog.md
│   ├── review-army.md
│   ├── todos-cleanup.md
│   ├── pr-body.md
│   └── ...
```
**Silent-behavior-loss mitigations** (Codex #3) — layered defense, not just self-check:
1. **Section manifest** (`sections/manifest.json`) — structured registry: `{section_file, applies_when, required_for}`. Decision-tree skeleton references entries by ID, not free-form prose.
2. **Imperative skeleton phrasing** — "STOP. Read `sections/version-bump.md` before computing the bump." Not "see ... for details."
3. **Top-of-file section index table** — situation → section file mapping.
4. **End-of-skill self-check** — "Confirm you Read every section your decision tree pointed to. List them." (weakest layer, kept as fallback.)
5. **Eval harness `requiredReads` declaration** — E2E test asserts which sections must appear in transcript Read calls for a given fixture. Mechanical enforcement at the test layer, not just prompt layer.
6. **Transcript inspection in canary cohort** — first week post-ship, log which sections actually get read by real sessions; alert on Read-miss for marked-required sections.
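Layers 1 and 5 can be sketched together: a typed manifest entry and a check that compares the sections a fixture requires against the Read calls seen in a transcript. The field types, the idea that `required_for` holds fixture IDs, and `findReadMisses` are assumptions; only the three field names come from the plan.

```typescript
// Field names follow the manifest description; their types are assumed.
interface SectionEntry {
  section_file: string;   // e.g. 'sections/version-bump.md'
  applies_when: string;   // free-text condition mirrored in the skeleton
  required_for: string[]; // fixture or workflow IDs that must Read this section
}

// Returns the sections a fixture required that never appear in the Read paths.
// Suffix matching tolerates absolute paths in the transcript.
function findReadMisses(
  manifest: SectionEntry[],
  fixtureId: string,
  readPaths: string[],
): string[] {
  return manifest
    .filter((entry) => entry.required_for.includes(fixtureId))
    .map((entry) => entry.section_file)
    .filter((file) => !readPaths.some((path) => path.endsWith(file)));
}
```

The same function could serve both the E2E assertion (expect an empty result) and the canary alert (report any non-empty result), so the test layer and the production check cannot drift apart.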
**Conversion order** (one at a time, validate each before next):
1. `ship/` — most invocations, biggest cost, riskiest. Land alone, observe 1 week.
2. `plan-ceo-review/` — conversational; risk of breaking flow. Land second, observe carefully.
3. `office-hours/` — most conversational. Land third only if 1+2 went clean.
4. `plan-eng-review/` and `plan-design-review/` — bundle, similar shape.
**Do not convert** unless explicitly approved later: `autoplan` (orchestrator that already chains skills), `design-review` (UI flow already tight), `qa` (single-purpose), `investigate` (single-purpose).
## Phase C — Eval annotations + CI orphan check (v2.0.0.0)
Per Codex #4 — warn-before-fail progression, not immediate strict gate.
```md
<!-- eval: test/skill-e2e-ship-version-bump.test.ts -->
<!-- coverage: asserts the queue-aware bump picks the next available version when the claimed version is taken -->
```
Annotations include **coverage semantics** (what behavior is protected) per Codex #5, not just paths. Path-only would be false confidence.
CI check in `gen-skill-docs.ts` walker:
- v2.0.0.0 ships in WARN mode — orphans logged to PR summary but build passes
- v2.1.0.0 (or 2 release cycles after v2.0): WARN escalates to FAIL
- Waiver: `<!-- eval: none — accept loss, reviewed YYYY-MM-DD by @user -->`
This avoids "maintenance theater" of mandatory annotations with no semantics, and gives users a transition window.
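A possible shape for the orphan walker's core logic, assuming the annotation comments look exactly like the example above and that a path without a `coverage:` line counts as an orphan. `classifySection` and `orphanCheck` are hypothetical names.

```typescript
type AnnotationStatus = 'covered' | 'waived' | 'orphan';

// Classifies one section file by its eval annotation comments.
function classifySection(markdown: string): AnnotationStatus {
  const evalMatch = markdown.match(/<!--\s*eval:\s*(.+?)\s*-->/);
  if (!evalMatch) return 'orphan';
  // An explicit waiver starts with the word "none".
  if (/^none\b/.test(evalMatch[1])) return 'waived';
  // A path alone is not enough: require a non-empty coverage statement too.
  const hasCoverage = /<!--\s*coverage:\s*\S.*?-->/.test(markdown);
  return hasCoverage ? 'covered' : 'orphan';
}

// WARN mode always passes but still reports orphans; FAIL mode passes only if none exist.
function orphanCheck(
  sections: Record<string, string>,
  mode: 'warn' | 'fail',
): { orphans: string[]; ok: boolean } {
  const orphans = Object.entries(sections)
    .filter(([, body]) => classifySection(body) === 'orphan')
    .map(([name]) => name);
  return { orphans, ok: mode === 'warn' || orphans.length === 0 };
}
```

Escalating from WARN to FAIL is then a one-argument change at the call site, which fits the staged rollout described above.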
## Migration approach (v2.0.0.0, lighter touch per D11)
- Release note in v2.0.0.0 CHANGELOG explains the sections/ format change and concrete user impact: forks/copy-pasted SKILL.md files need re-fetch; first-invocation of heavyweight skills has ~200-500ms section-read latency added.
- `/gstack-upgrade` auto-regenerates on next invocation. No interactive migration prompts.
- Vendored installs get a single one-line warning at session start on first v2 contact (re-use existing vendored-install warning pattern in skill preamble).
- `gstack-upgrade --explain-v2` flag for users who want the full explanation on demand.
## Forks / customization compatibility (Codex #11)
Documented in v2.0.0.0 release note:
- Anyone who reads/copies/edits a heavyweight SKILL.md file directly: the file is now a skeleton; behavior lives in `sections/*.md`. They need to either treat the skill as a black box (recommended) or fork the full `skill/` directory including `sections/`.
- Anyone with local SKILL.md.tmpl edits in a fork: the templates are smaller; conflicts likely on regenerate. Fork docs updated with migration guidance.
- Anyone with docs/blog posts linking to specific lines of a generated SKILL.md: line numbers will shift; recommend linking to template + section name instead.
## Rollout strategy (Codex #12)
v1.45.0.0:
- Land in one PR; existing budget-regression test catches any per-skill size regression; eval matrix CI check catches any skill missing its evals.
- Dogfood: 1 week active use across all of Garry's workspaces before announcing.
v2.0.0.0:
- **Canary cohort**: ship to dogfood users (Garry + active agents) first via a v2.0.0-rc.1 tag. Real-PTY harness logs section Reads for top 5 workflows (`/ship`, `/qa`, `/review`, `/plan-ceo-review`, `/autoplan`); alert on Read-miss for required sections.
- **Manual verification**: top 5 workflows manually run before tagging v2.0.0.0 final, with before/after transcripts saved as eval baselines.
- **Regression dashboard**: existing `bun run eval:summary` extended with v1 vs v2 per-skill token + behavioral compliance comparison.
- **Rollback**: revert PR + `bun run gen:skill-docs` regenerates old shape. Documented in CONTRIBUTING.md.
## Review-section findings (Sections 1-11, condensed)
| Section | Findings | Status |
|---|---|---|
| 1. Architecture | Lazy-section silent-loss risk; mitigated via 6-layer defense above | Findings addressed in plan |
| 2. Errors/Rescues | gen-skill-docs gate-fail loud; missing sections fall back to skeleton; CI orphan check loud | Findings addressed |
| 3. Security | cso targeted dedup not blanket exemption (Codex #9); migration script runs at user-shell trust boundary, same as existing migrations | Findings addressed |
| 4. Data/UX edge cases | v1→v2 muscle-memory break warned in release note; vendored installs get one-line warning; concurrent dev-symlink sessions risk is existing CLAUDE.md caveat | Findings addressed |
| 5. Code quality | ~150 LOC additive across gen-skill-docs/types/index; ~20 new eval test files; sections/ extraction is mechanical | OK |
| 6. Tests | Phase 0 IS the test plan. Coverage matrix CI gate enforces every skill has its evals | Findings addressed |
| 7. Performance | Build time <2× current; runtime adds 200-500ms first-invocation for sectioned heavyweights; catalog trim reduces always-loaded prompt size on every session | Documented |
| 8. Observability | budget-regression test already exists; canary cohort transcript logging in Phase B; migration outcome logged to ~/.gstack/analytics/migrations.jsonl | Findings addressed |
| 9. Deployment | Two-release split + warn-before-fail eval annotations + rollback via revert | Findings addressed |
| 10. Long-term trajectory | Reversibility 3/5; sections/ pattern becomes template for future skills; deferred TODOs extend v2 narrative for v2.1+ | OK |
| 11. Design/UX | README v2 banner + CHANGELOG numbers table land in v2.0.0.0; concrete numbers, gstack voice, no AI slop | OK |
## NOT in scope
- **Skill removals.** User said "keep all functions." qa-only, design-shotgun, pair-agent, open-gstack-browser all stay. They get evals + catalog trim like everyone else.
- **Skill renames.** No `qa` → `qa-fix` collapses. Keep CLI surface stable.
- **gstack lite/pro install profiles.** Deferred to TODOS for post-v2.
- **gstack budget CLI.** Deferred to TODOS for post-v2.
- **Per-skill eval coverage badge in README.** Deferred to TODOS.
- **Cross-tool portability test/demo (Codex/Cursor compat).** Deferred to TODOS.
- **Token-cost preview on invocation.** Deferred to TODOS.
- **Skill autoload telemetry.** Deferred to TODOS.
- **gstack diff PR comment.** Deferred to TODOS.
## TODOS.md updates (deferred items, recommend bulk-add post-merge)
| TODO | Priority | Effort (human / CC) | Depends on |
|---|---|---|---|
| `gstack lite` install profile (5-skill core) | P2 | 2 days / 3-4 hrs | v2.0.0.0 |
| `gstack pro` opt-in upgrade path | P2 | 1 day / 1 hr | gstack lite |
| `gstack budget` CLI (per-skill token usage telemetry) | P2 | 1 day / 1 hr | v1.45.0.0 |
| Per-skill eval coverage badge in `gstack-skills list` + README | P3 | 1 day / 1 hr | Phase 0 |
| Cross-tool portability test/demo (Codex CLI, Cursor) | P3 | 2 days / 2 hrs | v2.0.0.0 |
| Token-cost preview on skill invocation | P3 | 1 day / 1 hr | gstack budget CLI |
| Skill autoload telemetry (dead-weight detection) | P3 | 2 days / 2 hrs | v1.45.0.0 |
| `gstack diff` PR comment (per-PR budget delta) | P3 | 1 day / 1 hr | budget-regression extended |
| Section-level eval annotations visible to user (confidence signal) | P3 | half day / 30 min | Phase C |
## Critical files
| Path | Change | Phase |
|---|---|---|
| `scripts/gen-skill-docs.ts` | Add resolver gate check (~line 444); read explain_level from config; add CI orphan walker | A, C |
| `scripts/resolvers/types.ts` | Add `ResolverEntry` union type | A |
| `scripts/resolvers/index.ts` | Wrap heavy resolvers with `appliesTo` predicates (audit all 21) | A |
| `scripts/resolvers/preamble/generate-writing-style.ts` | Replace inline jargon; return `''` on terse | A |
| `scripts/resolvers/preamble/generate-completeness.ts` | Return `''` on terse | A |
| `scripts/resolvers/preamble/generate-confusion-protocol.ts` | Return `''` on terse | A |
| `scripts/resolvers/preamble/generate-context-health.ts` | Return `''` on terse | A |
| `scripts/skill-catalog.ts` (new or in gen-skill-docs) | One-line catalog generator + voice-triggers JSON splitter | A.4 |
| `scripts/proactive-suggestions.json` (new) | Voice triggers + proactive suggestions, loaded on demand | A.4 |
| `test/skill-coverage-matrix.ts` (new) | Single-source-of-truth eval registry | Phase 0 |
| `test/skill-coverage-matrix.test.ts` (new) | CI gate: every skill has entries | Phase 0 |
| `test/skill-e2e-*.test.ts` (~20 new files) | New evals for skills currently lacking coverage | Phase 0 |
| `test/skill-e2e-budget-regression.test.ts` | Extend with per-skill hard budgets | A.6 |
| `test/helpers/touchfiles.ts` | Register new tests for diff-based selection | Phase 0 |
| `ship/SKILL.md.tmpl` → `ship/sections/manifest.json` + `ship/sections/*.md` | Skeleton extraction | B |
| `plan-ceo-review/SKILL.md.tmpl` → sections/ | Skeleton extraction | B |
| `office-hours/SKILL.md.tmpl` → sections/ | Skeleton extraction | B |
| `plan-eng-review/SKILL.md.tmpl` → sections/ | Skeleton extraction | B |
| `plan-design-review/SKILL.md.tmpl` → sections/ | Skeleton extraction | B |
| `gstack-upgrade/migrations/v2.0.0.0.sh` (new) | Auto-regenerate + vendored-install warning | B |
| `CHANGELOG.md` | v1.45.0.0 entry (normal), v2.0.0.0 entry (marketing-grade w/ numbers table) | A, B |
| `README.md` | v2.0.0.0 banner; "lightest opinionated skill pack" positioning | B |
| `CONTRIBUTING.md` | Document sections/ pattern + rollback procedure | B |
## Verification
**v1.45.0.0:**
1. `bun run gen:skill-docs` succeeds with no errors
2. `bun test` passes (skill-validation, gen-skill-docs.test.ts, browse integration, NEW skill-coverage-matrix.test.ts)
3. `bun run test:evals` passes — all new gate evals green; no regression on existing evals
4. `bun run test:evals:periodic` passes — all new periodic evals green
5. Catalog system-prompt size measured: target ≤8K tokens (vs ~25K current). Capture before/after in PR body.
6. Total SKILL.md corpus byte count: target ≤1.3MB (vs 2.1MB). Capture in PR body.
7. Top 3 heaviest skills under 100KB.
8. Manual smoke: invoke `/ship`, `/plan-ceo-review`, `/office-hours` in fresh Claude Code sessions; confirm no missing behavior. Save transcripts as v1.45 baselines.
**v2.0.0.0:**
1. All v1.45 checks pass
2. Sectioned skills: total corpus ≤700KB; heavyweight skeletons ≤30KB each
3. `test/skill-e2e-ship-section-loading.test.ts` (new): asserts `/ship` Reads expected sections per decision tree
4. Canary cohort: 1 week dogfood at v2.0.0-rc.1 with transcript logging; zero Read-miss for marked-required sections
5. Top 5 workflows manually verified; transcripts compared against v1.45 baselines
6. Migration: `gstack-upgrade` on a v1.45 install successfully regenerates without prompts; vendored-install warning appears once
7. CHANGELOG numbers table matches measured reality
8. WARN-mode orphan check: PR summary shows orphan list; build passes
## Cross-model agreements baked in
Items from Codex's review accepted and integrated above:
- #4 Warn-before-fail eval annotations (Phase C)
- #5 Coverage semantics in annotation comments, not just paths
- #6 Catalog trim moved up to Phase A (was buried after sections/)
- #9 cso gets resolver dedup + catalog trim (not blanket exempt)
- #10 Hard token budgets defined + enforced (Phase A.6)
- #11 Forks/customization compatibility documented (Forks / customization compatibility section)
- #12 Rollout strategy with canary cohort + manual top-5-workflows verification (Rollout section)
Items from Codex's review explicitly rejected by user (D9, D10):
- #1 Eval-first scope: user kept full tiered coverage. Mitigated by structural-eval guidance (not output-content) for orchestration/judgment skills.
- #7 v2.0.0.0 vs v1.x: user chose HYBRID. v1.45 absorbs low-risk wins; v2.0.0.0 carries the genuinely-breaking sections/ change.
Item where user accepted Codex over original pick:
- #8 Migration approach: user moved from hard-cut (D7) to lighter touch (D11) once v1.45 absorbed the low-risk work.
## Implementation Tasks
Synthesized from this review's findings. Each task derives from a specific phase/finding above. T1-T8 land in v1.45.0.0; T9-T16 land in v2.0.0.0.
- [ ] **T1 (P1, human: ~3 days / CC: ~7 hours)** — Phase 0 / coverage matrix — write gate+periodic evals for all 20 skills lacking coverage
- Surfaced by: Phase 0 section
- Files: `test/skill-coverage-matrix.ts`, `test/skill-coverage-matrix.test.ts`, ~20 new `test/skill-e2e-*.test.ts`, `test/helpers/touchfiles.ts`
- Verify: `bun test test/skill-coverage-matrix.test.ts` and `bun run test:evals` both pass with new evals
- [ ] **T2 (P1, human: ~1 day / CC: ~1 hour)** — A.1 conditional resolver injection — add `appliesTo` gate
- Surfaced by: Phase A section, Codex #10 (measurement before architecture)
- Files: `scripts/resolvers/types.ts`, `scripts/gen-skill-docs.ts:444`, `scripts/resolvers/index.ts`
- Verify: `bun run gen:skill-docs` produces smaller SKILL.md files; `bun test` passes
- [ ] **T3 (P1, human: ~half day / CC: ~30 min)** — A.2 + A.3 jargon dedup + terse-mode gen-time compression
- Surfaced by: Phase A section
- Files: `scripts/resolvers/preamble/generate-writing-style.ts`, `generate-completeness.ts`, `generate-confusion-protocol.ts`, `generate-context-health.ts`
- Verify: jargon-list no longer appears inlined in generated SKILL.md; `gstack-config set explain_level terse && bun run gen:skill-docs` produces shorter files
- [ ] **T4 (P1, human: ~1 day / CC: ~2 hours)** — A.4 catalog trim — one-line skill descriptions; voice triggers + proactive paragraphs moved to JSON
- Surfaced by: Codex #6 (highest-leverage), Phase A.4
- Files: `scripts/skill-catalog.ts` (new), `scripts/proactive-suggestions.json` (new), per-skill SKILL.md.tmpl frontmatter for one-line description field
- Verify: catalog system-prompt size <8K tokens; voice-triggered invocation still works
- [ ] **T5 (P1, human: ~half day / CC: ~30 min)** — A.6 hard token budgets in budget-regression
- Surfaced by: Codex #10
- Files: `test/skill-e2e-budget-regression.test.ts`
- Verify: budget-regression fails when artificially inflated test SKILL.md exceeds budget
- [ ] **T6 (P1, human: ~1 day / CC: ~1 hour)** — A.5 cso resolver dedup + catalog trim (NOT broader compression)
- Surfaced by: Codex #9
- Files: `cso/SKILL.md.tmpl` (no structural change, only resolver gate audit)
- Verify: cso SKILL.md size drops 20-30%; cso E2E evals still pass
- [ ] **T7 (P1, human: ~1 day / CC: ~1 hour)** — Regenerate all SKILL.md atomically + measure
- Surfaced by: Phase A
- Files: all `*/SKILL.md` regenerated
- Verify: PR body includes before/after corpus size, top 10 skill sizes, catalog size; budget-regression confirms targets met
- [ ] **T8 (P2, human: ~half day / CC: ~30 min)** — v1.45.0.0 CHANGELOG entry (normal voice; note that Phase 0 + Phase A landed)
- Surfaced by: Release shape section
- Files: `CHANGELOG.md`, `VERSION`
- Verify: CHANGELOG lints clean; reverse-chrono order preserved; entry covers the diff
- [ ] **T9 (P1, human: ~2 days / CC: ~3 hours)** — Phase B.1 convert ship/ to skeleton + sections/
- Surfaced by: Phase B section
- Files: `ship/SKILL.md.tmpl` → skeleton; `ship/sections/manifest.json` + `ship/sections/*.md`
- Verify: new `test/skill-e2e-ship-section-loading.test.ts` asserts expected Reads per decision tree; existing ship evals pass; ship.md skeleton <15KB
- [ ] **T10 (P1, human: ~1 day / CC: ~1 hour)** — Canary cohort for ship/ (1 week dogfood at v2.0.0-rc.1)
- Surfaced by: Rollout strategy section, Codex #12
- Files: `test/helpers/transcript-section-logger.ts` (new)
- Verify: zero Read-miss on marked-required sections in dogfood transcripts
- [ ] **T11 (P1, human: ~2 days / CC: ~3 hours)** — Phase B.2 convert plan-ceo-review/ (after ship/ proven)
- Surfaced by: Phase B section
- Files: `plan-ceo-review/SKILL.md.tmpl` + `plan-ceo-review/sections/`
- Verify: section-loading test green; plan-ceo evals pass
- [ ] **T12 (P2, human: ~3 days / CC: ~4 hours)** — Phase B.3 + B.4 convert office-hours/ + plan-eng-review/ + plan-design-review/
- Surfaced by: Phase B section
- Files: respective `SKILL.md.tmpl` + `sections/` directories
- Verify: section-loading tests green; respective evals pass
- [ ] **T13 (P1, human: ~1 day / CC: ~1 hour)** — Phase C eval annotations + WARN-mode CI orphan check
- Surfaced by: Phase C section, Codex #4 + #5
- Files: `scripts/gen-skill-docs.ts` (orphan walker), all `sections/*.md` (annotations with coverage semantics)
- Verify: orphan check reports correctly in PR summary; build still passes in WARN mode
- [ ] **T14 (P1, human: ~half day / CC: ~30 min)** — `gstack-upgrade/migrations/v2.0.0.0.sh` lighter-touch auto-regenerate
- Surfaced by: Migration approach section
- Files: `gstack-upgrade/migrations/v2.0.0.0.sh`
- Verify: upgrade from v1.45 install produces clean v2 state without prompts; vendored install gets one-line warning
- [ ] **T15 (P1, human: ~half day / CC: ~1 hour)** — v2.0.0.0 marketing-grade CHANGELOG with v1 vs v2 numbers table
- Surfaced by: D5, Release shape, Codex #7 (real breakage documented)
- Files: `CHANGELOG.md`, `VERSION`, `README.md` (v2 banner)
- Verify: numbers table matches measured corpus; release note documents concrete breakage (sections/ format change, first-invocation latency, vendored-install deprecation); positioning treats the bloat reputation as past tense
- [ ] **T16 (P2, human: ~1 day / CC: ~1 hour)** — Bulk-add 9 deferred TODOS to TODOS.md (gstack lite, gstack budget, etc.)
- Surfaced by: TODOS.md updates section
- Files: `TODOS.md`
- Verify: TODOS format matches `.claude/skills/review/TODOS-format.md`
## Failure Modes Registry
| Codepath | Failure mode | Rescued? | Test? | User sees | Logged |
|---|---|---|---|---|---|
| gen-skill-docs.ts gate check | resolver `appliesTo` throws | Y — try/catch logs + skips resolver | Y (test/gen-skill-docs.test.ts extended) | "resolver X errored, skipped" in build output | stderr |
| sections/ Read at runtime | section file missing | Y — agent falls back to skeleton-only behavior | Y (test/skill-e2e-ship-section-loading.test.ts) | warning in agent prose | session transcript |
| CI orphan walker | sections/*.md missing eval annotation | WARN mode v2.0; FAIL v2.1+ | Y (test/skill-coverage-matrix.test.ts) | PR summary lists orphans | PR comment |
| Migration script v2.0.0.0.sh | regenerate fails on damaged install | Y — script aborts, prints repair steps | Y (migration test) | clear error + repair steps | ~/.gstack/analytics/migrations.jsonl |
| Catalog one-line generator | skill missing one-line description in frontmatter | Y — gen-skill-docs fails build loudly | Y (gen-skill-docs.test.ts extended) | build error | stderr |
| Canary section-Read logger | logger missing for a heavyweight skill | Y — silently skipped, gap visible in dashboard | Y (transcript-logger test) | none directly; surfaced in canary dashboard | ~/.gstack/analytics/section-reads.jsonl |
No critical gaps — every failure mode has a rescue, a test, and visibility.
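The first row of the registry (an `appliesTo` predicate that throws) could be handled roughly as follows. This is a sketch under assumptions: `runResolver` and its injected `log` callback are hypothetical, and the types are repeated here only to keep the example self-contained.

```typescript
type TemplateContext = { skillName: string };
type ResolverFn = (ctx: TemplateContext, args?: string[]) => string;
type ResolverEntry =
  | ResolverFn
  | { resolve: ResolverFn; appliesTo?: (ctx: TemplateContext) => boolean };

// Runs one resolver behind its gate. A throwing gate is logged and the
// resolver is skipped, so one bad predicate cannot break the whole build.
function runResolver(
  name: string,
  entry: ResolverEntry,
  ctx: TemplateContext,
  args: string[],
  log: (message: string) => void,
): string {
  const resolver = typeof entry === 'function' ? entry : entry.resolve;
  const gate = typeof entry === 'function' ? undefined : entry.appliesTo;
  try {
    if (gate && !gate(ctx)) return '';
  } catch {
    log(`resolver ${name} errored, skipped`);
    return '';
  }
  return args.length > 0 ? resolver(ctx, args) : resolver(ctx);
}
```

Only the gate is wrapped in the try/catch here; whether resolver bodies should get the same treatment is a separate decision the registry does not cover.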
## Diagrams
System architecture (build pipeline):
```
CONFIG (~/.gstack/config.yaml)
        |
        v
+-----------------+      +--------------------+
| gen-skill-docs  | <--- | resolvers/*.ts     |
| (with gate)     |      | (w/ appliesTo)     |
+-----------------+      +--------------------+
        |
        v
+--------------------------+
| SKILL.md.tmpl per skill  |
| + sections/manifest.json |  (heavyweights only, v2)
| + sections/*.md          |  (heavyweights only, v2)
+--------------------------+
        |
        v
+--------------------+         +--------------------------+
| generated SKILL.md | <-----> | scripts/jargon-list.json |
| (skeleton for      |         | (referenced, not inlined)|
| heavyweights v2)   |         +--------------------------+
+--------------------+
        |
        v
+-------------------+      +----------------------+
| catalog (system   | <--- | proactive-suggestions|
| prompt, one-line  |      | .json (loaded on     |
| per skill)        |      | demand only)         |
+-------------------+      +----------------------+
```
Section-Read flow (v2 runtime):
```
USER /ship
     |
     v
+-----------------------+
| ship/SKILL.md         |
| (12-15KB skeleton)    |
| reads:                |
| - manifest.json       |
| - decision tree       |
+-----------------------+
     |
     v  Agent walks decision tree, identifies which sections apply
     |
     +-----> Read sections/version-bump.md (if bumping)
     +-----> Read sections/changelog.md (if writing entry)
     +-----> Read sections/review-army.md (if pre-ship review)
     +-----> ... only sections that apply
     |
     v
+-------------------------+
| end-of-skill self-check |
| "list sections I read"  |
+-------------------------+
     |
     v  Canary cohort: transcript-section-logger compares
     |  actual Reads vs manifest's required_for declarations
     |  alerts on miss
```
## Stale diagram audit
ASCII diagrams in CLAUDE.md / ARCHITECTURE.md that this plan affects:
| Diagram | File | Still accurate post-v2? |
|---|---|---|
| Sidebar message flow | `docs/designs/SIDEBAR_MESSAGE_FLOW.md` | YES (unrelated subsystem) |
| Dual-listener tunnel architecture | `ARCHITECTURE.md` | YES (unrelated) |
| Unicode sanitization at server egress | `ARCHITECTURE.md` | YES (unrelated) |
| (none for skill build pipeline) | — | New diagrams above are NEW, not updates |
No stale diagrams to fix.
## Completion summary
```
+====================================================================+
|                MEGA PLAN REVIEW — COMPLETION SUMMARY               |
+====================================================================+
| Mode selected       | SCOPE EXPANSION                              |
| System Audit        | bloat externally documented; prior design    |
|                     | doc unrelated; budget-regression infra exists|
| Step 0              | EXPANSION + Approach C + eval-first +        |
|                     | hybrid v1.45/v2.0 split + lighter migration  |
| Section 1 (Arch)    | 1 finding — silent-loss risk, 6-layer mit    |
| Section 2 (Errors)  | 6 failure modes mapped, 0 CRITICAL GAPS      |
| Section 3 (Security)| cso targeted dedup (Codex #9 absorbed)       |
| Section 4 (Data/UX) | v1→v2 muscle memory warned, vendored noted   |
| Section 5 (Quality) | ~150 LOC additive, mechanical extraction     |
| Section 6 (Tests)   | Phase 0 IS the test plan                     |
| Section 7 (Perf)    | <2× build time; +200-500ms first-invoke v2   |
| Section 8 (Observ)  | budget-regression + canary + migrations.jsonl|
| Section 9 (Deploy)  | 2-release split + warn-before-fail + revert  |
| Section 10 (Future) | Reversibility 3/5; sections/ becomes template|
| Section 11 (Design) | README banner + numbers table                |
+--------------------------------------------------------------------+
| NOT in scope         | written (9 items deferred)                  |
| What already exists  | written (9 reuse points)                    |
| Dream state delta    | written (TODAY / v1.45 / v2.0)              |
| Error/rescue registry| 6 modes, 0 CRITICAL GAPS                    |
| Failure modes        | covered in registry                         |
| TODOS.md updates     | 9 items, bulk-add post-merge                |
| Scope proposals      | 3 surfaced, 1 accepted (launch positioning) |
| CEO plan             | this plan IS the CEO plan                   |
| Outside voice        | ran (codex); 3 tensions surfaced            |
| Lake Score           | 11/11 recommendations chose complete option |
| Diagrams produced    | 2 (build pipeline, section-read flow)       |
| Stale diagrams found | 0                                           |
| Unresolved decisions | 0                                           |
+====================================================================+
```
## Eng-review additions (from /plan-eng-review session)
### Architectural decisions locked in
- **D1 (manifest format):** `sections/manifest.json` is the structured per-heavyweight registry (JSON, machine-readable for gen-skill-docs CI checks). SKILL.md skeleton is markdown headers + imperative prose blocks ("STOP. If X, Read `sections/Y.md`"). Matches Anthropic's documented `references/` style. No invented DSL.
- **D2 (drift control):** `sections/*.md.tmpl` is the source of truth; `sections/*.md` is generated. gen-skill-docs walks `<skill>/sections/*.tmpl` and writes `<skill>/sections/*.md` using the same resolver pipeline as SKILL.md. Cost: ~30 LOC in `scripts/gen-skill-docs.ts`. Eliminates the drift class that `test/ship-version-sync.test.ts` already suffers from (TODOS:1120).
- **D3 (CI cost cap):** `EVALS_BUDGET_HARD_CAP=$30` env var enforced by `test/skill-e2e-budget-regression.test.ts`; the build fails if a single run exceeds the cap. Section-loading tests (Phase B) use minimal-bash fixtures (~$0.30 each) because they assert STRUCTURAL behavior (was the right file Read?), not output quality.
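The D3 cap can be sketched as a pure check over per-test costs. This is a minimal illustration, not the contents of `test/skill-e2e-budget-regression.test.ts`: the `EvalCost` shape, `parseCap`, and `checkBudget` are hypothetical names, and the only value taken from the plan is the $30 default.

```typescript
// Hypothetical shape: one entry per eval test in a single CI run.
interface EvalCost {
  test: string;
  usd: number;
}

interface BudgetVerdict {
  ok: boolean;
  totalUsd: number;
  capUsd: number;
  // Most expensive tests first, for the cost breakdown in the build error.
  breakdown: EvalCost[];
}

// Parse a cap such as "$30" or "30"; fall back to the default when unset or malformed.
function parseCap(raw: string | undefined, fallbackUsd: number = 30): number {
  if (raw === undefined) return fallbackUsd;
  const value = Number(raw.replace(/^\$/, ""));
  return Number.isFinite(value) && value > 0 ? value : fallbackUsd;
}

// Sum the run's costs and compare the total against the cap.
function checkBudget(costs: EvalCost[], capUsd: number): BudgetVerdict {
  const totalUsd = costs.reduce((sum, c) => sum + c.usd, 0);
  const breakdown = [...costs].sort((a, b) => b.usd - a.usd);
  return { ok: totalUsd <= capUsd, totalUsd, capUsd, breakdown };
}
```

A caller would presumably read the cap from the environment, run `checkBudget` once per CI run, and print `breakdown` when `ok` is false.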
### Adjacent TODOS surfaced (informational, not blocking)
- **TODOS:161** — planned "resolver injection at session start" for browser-skills (P2). Has architectural overlap with this plan's `appliesTo` predicate. Decision: keep separate for now — browser-skill resolver injection is runtime (session-start hostname matching); our `appliesTo` is build-time (gen-skill-docs.ts). Different lifecycles, different concerns. Revisit only if the browser-skills work needs the same predicate shape.
- **TODOS:1120**: `test/ship-version-sync.test.ts` reimplements ship/SKILL.md.tmpl Step 12 bash. D2 (sections/*.md.tmpl pipeline) is the structural fix. Phase B work obviates this TODO; mark as resolved when ship/ extraction lands.
- **TODOS:1136**: `git show` fallback in ship/SKILL.md.tmpl Step 12 line 409. Phase B touches this; bundle the `git rev-parse --verify` fix into the version-bump section extraction.
### Test plan artifact
Test plan written to `~/.gstack/projects/garrytan-gstack/garrytan-garrytan-slim-skill-tokens-eng-review-test-plan-<timestamp>.md`. `/qa` and `/qa-only` consume this as primary test input. Covers: per-phase test coverage targets, fixture design for section-loading tests, CI budget enforcement check, migration round-trip test.
### Failure modes additions
Adding to the registry from §Failure Modes (already complete; new rows):
| Codepath | Failure mode | Rescued? | Test? | User sees | Logged |
|---|---|---|---|---|---|
| sections/*.md.tmpl generator | template references missing resolver | Y — gen-skill-docs fails build loudly | Y (gen-skill-docs.test.ts extended) | build error | stderr |
| Manifest ↔ filesystem consistency | manifest references section file that doesn't exist | Y — CI check fails | Y (new `test/section-manifest-consistency.test.ts`) | build error | PR summary |
| Manifest ↔ filesystem consistency | section file exists but not in manifest (orphan) | WARN v2.0; FAIL v2.1+ | Y (same test) | PR summary | PR comment |
| Budget cap exceeded | single test or aggregate exceeds `EVALS_BUDGET_HARD_CAP` | Y — CI fails | Y (budget-regression extended) | build error w/ cost breakdown | stderr |
Still 0 critical gaps. All new failure modes have rescue + test + visibility.
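The two manifest ↔ filesystem rows above reduce to a pair of set differences. The sketch below is one way to express that, assuming the caller has already collected the paths; `checkManifest` and `ConsistencyIssue` are hypothetical names, not the API of `test/section-manifest-consistency.test.ts`.

```typescript
type Severity = "FAIL" | "WARN";

interface ConsistencyIssue {
  file: string;
  kind: "missing-file" | "orphan-file";
  severity: Severity;
}

// manifestFiles: section paths listed in manifest.json.
// diskFiles: section paths actually present on disk.
// orphansFail: false for v2.0 (warn only), true for v2.1+.
function checkManifest(
  manifestFiles: string[],
  diskFiles: string[],
  orphansFail: boolean,
): ConsistencyIssue[] {
  const onDisk = new Set(diskFiles);
  const inManifest = new Set(manifestFiles);
  const issues: ConsistencyIssue[] = [];
  // A manifest entry without a file always fails the build.
  for (const file of manifestFiles) {
    if (!onDisk.has(file)) {
      issues.push({ file, kind: "missing-file", severity: "FAIL" });
    }
  }
  // A file without a manifest entry warns in v2.0 and fails afterwards.
  for (const file of diskFiles) {
    if (!inManifest.has(file)) {
      issues.push({ file, kind: "orphan-file", severity: orphansFail ? "FAIL" : "WARN" });
    }
  }
  return issues;
}
```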
### Execution sequencing (sequential v1.45, integration-branch v2.0)
v1.45 runs **sequentially** in a single branch, T1 → T8. The parallelization map was reconsidered after codex's second-pass critique flagged that T2 (gen-skill-docs.ts TemplateContext changes) and T4 (catalog frontmatter additions) almost certainly touch each other at compile time — both branches passing alone, failing at integration. Sequential lands cleaner and avoids 3-way merge surprise. AI compression makes the wall-clock cost of sequential acceptable.
| Step | Modules touched | Depends on |
|---|---|---|
| T1 Phase 0 evals (~20 files) | `test/skill-e2e-*.test.ts`, `test/skill-coverage-matrix.ts`, `test/helpers/touchfiles.ts` | — |
| T2 conditional resolver gate | `scripts/gen-skill-docs.ts`, `scripts/resolvers/types.ts`, `scripts/resolvers/index.ts` | T1 |
| T3 jargon dedup + terse compression | `scripts/resolvers/preamble/*` | T2 |
| T4 catalog trim | `scripts/skill-catalog.ts`, `scripts/proactive-suggestions.json`, all SKILL.md.tmpl frontmatter | T2 |
| T5 hard token budgets + override path | `test/skill-e2e-budget-regression.test.ts` (per-suite caps + `EVALS_BUDGET_OVERRIDE_REASON`) | T1 |
| T6 cso targeted dedup | `cso/SKILL.md.tmpl` | T2, T3 |
| T7 regenerate all SKILL.md atomically | all `*/SKILL.md` | T1-T6 |
| T8 v1.45 CHANGELOG | `CHANGELOG.md`, `VERSION` | T7 |
| **— v1.45.0.0 ship boundary —** | | |
| T9 ship/ sections/ extraction | `ship/SKILL.md.tmpl`, `ship/sections/*`, gen-skill-docs (sections pipeline w/ TemplateContext contract) | T8 + sections-pipeline (T2/D2) |
| T10 ship/ canary cohort | `test/helpers/transcript-section-logger.ts` | T9 |
| T11 plan-ceo-review sections/ | `plan-ceo-review/SKILL.md.tmpl` + sections | T10 (ship/ proven) |
| T12 office-hours + plan-eng + plan-design sections/ | respective directories | T11 |
| T13 Phase C eval annotations + 3-tier orphan check | gen-skill-docs.ts orphan walker, all sections/*.md | T9-T12 |
| T14 migration script | `gstack-upgrade/migrations/v2.0.0.0.sh` | T13 |
| T15 v2.0.0.0 CHANGELOG + README banner | `CHANGELOG.md`, `README.md`, `VERSION` | T14 |
| T16 TODOS bulk-add | `TODOS.md` | — anytime |
**Execution recommendation:** single-worktree sequential for both v1.45 (T1→T8) and v2.0 (T9→T15). T16 lands whenever. The CC speedup comes from per-step compression (each step is ~1 hour vs human-days), not from parallel branches.
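As a sanity check on the sequencing, the "Depends on" column for T1 through T8 can be encoded and verified mechanically. This is an illustrative sketch only: the `deps` map is transcribed from the table above, and `firstOrderViolation` is a hypothetical helper, not part of the repo.

```typescript
// Dependencies for the v1.45 steps, transcribed from the table above.
const deps: Record<string, string[]> = {
  T1: [],
  T2: ["T1"],
  T3: ["T2"],
  T4: ["T2"],
  T5: ["T1"],
  T6: ["T2", "T3"],
  T7: ["T1", "T2", "T3", "T4", "T5", "T6"],
  T8: ["T7"],
};

// Returns the first violation found, or null when every step runs after its dependencies.
function firstOrderViolation(order: string[], graph: Record<string, string[]>): string | null {
  const done = new Set<string>();
  for (const step of order) {
    for (const dep of graph[step] ?? []) {
      if (!done.has(dep)) return `${step} scheduled before ${dep}`;
    }
    done.add(step);
  }
  return null;
}
```

Running it over `["T1", "T2", ..., "T8"]` should return `null`, which is the property the single-branch sequential order relies on.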
## Codex consult additions (second pass, post eng-review)
### Cathedral parity-eval suite (Phase 0 add-on, expanded to "11")
User said "do it like 11, not just 10. max it out and then some." Maxed-out scope:
- **ALL 31 skills** get golden-baseline transcripts (not just top 5)
- **Multiple fixtures per skill** (3-5 representative invocation paths each)
- **Quantitative + qualitative scoring:** LLM-as-judge similarity score (1-10) AND transcript-diff highlights (added/removed sections, missing nuance)
- **Token-efficiency ratio measured:** quality-per-token = judge_score / tokens_consumed (forces v2 to be measurably MORE efficient, not just smaller)
- **"Quality budget" alongside "token budget":** both enforced in CI. A v2 skill that compressed to half size but dropped from 9/10 quality to 6/10 fails the gate.
- **Side-by-side PR comment:** every PR that touches a heavyweight skill auto-posts a v1.45-baseline vs current parity comparison in the PR summary
- **Public benchmark page:** `gstack.benchmarks.md` (new), continuously updated. Quotable: "v2 average parity score: 9.2/10, average token reduction: 67%."
- **Continuous monitoring:** parity suite runs weekly on main; alerts if any skill drifts below baseline (Discord webhook or similar)
- **Baseline-capture script:** `test/helpers/capture-parity-baseline.ts` — run once at v1.44 HEAD to lock in golden transcripts before any Phase A work lands
Effort: human ~3-4 days / CC ~6-8 hours one-time + ~$30/week ongoing for continuous monitoring. Cost is justified — this is the ONLY mechanism that catches "looks green, feels worse" silent regression that section-loading and budget tests both miss. Adds new tasks T0a (baseline capture) and T0b (parity eval harness) BEFORE T1.
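The two budgets described above can be combined into one gate. The sketch below is a rough illustration under stated assumptions: `ParityResult`, `qualityPerToken`, and `parityGate` are hypothetical names, and the allowed score drop is a parameter rather than a value fixed by the plan.

```typescript
interface ParityResult {
  skill: string;
  judgeScore: number; // LLM-as-judge similarity score, 1-10
  tokensConsumed: number;
}

// quality-per-token = judge_score / tokens_consumed
function qualityPerToken(r: ParityResult): number {
  return r.judgeScore / r.tokensConsumed;
}

interface GateVerdict {
  passes: boolean;
  reasons: string[];
}

// Enforce both budgets: the score may not drop by more than maxScoreDrop,
// and quality-per-token must be strictly better than the baseline.
function parityGate(baseline: ParityResult, current: ParityResult, maxScoreDrop: number): GateVerdict {
  const reasons: string[] = [];
  const drop = baseline.judgeScore - current.judgeScore;
  if (drop > maxScoreDrop) {
    reasons.push(`${current.skill} dropped ${drop} points vs baseline`);
  }
  if (qualityPerToken(current) <= qualityPerToken(baseline)) {
    reasons.push(`${current.skill} quality-per-token did not improve`);
  }
  return { passes: reasons.length === 0, reasons };
}
```

With a baseline of 9/10 at 40,000 tokens, a candidate at 6/10 and 20,000 tokens improves quality-per-token yet still fails on the score drop when `maxScoreDrop` is 2, which is the "half size, worse output" case the quality budget exists to catch.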
### Absorbed refinements from codex consult (no further user decision needed)
1. **TemplateContext contract for sections pipeline (codex D2 critique):** explicit spec required in T9. Section generation uses the SAME `TemplateContext` as SKILL.md generation — same `skillName`, same host suppression, same `explainLevel`, same tier gating. Documented in code comments + asserted by `test/template-context-parity.test.ts` (new).
2. **3-tier orphan classification (codex orphan-semantics critique):** the CI check (T13) distinguishes:
- **Generated orphan** (`sections/foo.md` exists, no `sections/foo.md.tmpl`) → FAIL immediately, every release
- **Manifest orphan** (`sections/foo.md.tmpl` exists, not in `manifest.json`) → WARN in v2.0, FAIL in v2.1+
- **Hand-edited generated file** (`sections/foo.md` diverges from what regen would produce) → FAIL immediately, with "this file is generated, edit `.tmpl` instead" message
3. **Budget cap override path (codex D3 critique):** `EVALS_BUDGET_HARD_CAP=$30` becomes the default; per-suite caps via `EVALS_BUDGET_HARD_CAP_GATE=$25`, `EVALS_BUDGET_HARD_CAP_PERIODIC=$70`; override path `EVALS_BUDGET_OVERRIDE_REASON="<text>"` env required to exceed cap (CI prints the reason in build output for audit trail); daily org-level spend alert via existing analytics (`~/.gstack/analytics/skill-usage.jsonl` aggregator).
4. **Manifest as passive data (codex D1 critique):** `manifest.json` fields are IDs, file paths, and human-readable trigger text ONLY. No `applies_when` predicate. The skill skeleton's decision-tree prose is the ONLY place "when to read X" is decided. Avoids inventing a fourth condition language alongside tier-gating + `appliesTo` + `requiredReads`.
5. **T7 as integration-branch flow (codex parallelization critique, now obviated by sequential):** sequential execution means T7 is just "atomic regenerate within the single v1.45 branch." Integration-branch dance not needed. The critique's intent (no 3-way merge surprise) is honored by collapsing to sequential.
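The 3-tier orphan classification in item 2 can be written as a small decision function. This is a sketch under assumptions: `SectionState`, `classify`, and `actionFor` are hypothetical names, and the precedence between tiers when several apply at once is a guess, since the plan does not specify it.

```typescript
type OrphanClass = "generated-orphan" | "manifest-orphan" | "hand-edited" | "ok";
type Action = "FAIL" | "WARN" | "PASS";

interface SectionState {
  hasGenerated: boolean; // sections/foo.md exists
  hasTemplate: boolean; // sections/foo.md.tmpl exists
  inManifest: boolean; // listed in manifest.json
  matchesRegen: boolean; // sections/foo.md equals what regeneration would produce
}

// Assumed precedence: generated orphan, then manifest orphan, then hand-edited.
function classify(s: SectionState): OrphanClass {
  if (s.hasGenerated && !s.hasTemplate) return "generated-orphan";
  if (s.hasTemplate && !s.inManifest) return "manifest-orphan";
  if (s.hasGenerated && !s.matchesRegen) return "hand-edited";
  return "ok";
}

// Map each class to the CI action for a given release line.
function actionFor(c: OrphanClass, release: "v2.0" | "v2.1+"): Action {
  switch (c) {
    case "generated-orphan":
      return "FAIL";
    case "hand-edited":
      return "FAIL";
    case "manifest-orphan":
      return release === "v2.0" ? "WARN" : "FAIL";
    case "ok":
      return "PASS";
  }
}
```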
### New failure modes (additions to registry)
| Codepath | Failure mode | Rescued? | Test? | User sees | Logged |
|---|---|---|---|---|---|
| Sections pipeline TemplateContext | sections generated with divergent ctx (e.g. wrong skillName) | Y — parity test fails | Y (`test/template-context-parity.test.ts`) | build error | stderr |
| Hand-edited generated section | user edits `sections/foo.md` directly instead of `.tmpl` | Y — CI fails with explicit message | Y (orphan-check 3-tier classification) | "this file is generated, edit `.tmpl` instead" | PR summary |
| Quality budget exceeded | v2 skill compressed but dropped >2 points on LLM-judge parity | Y — CI fails | Y (parity-eval suite) | "v2 X.md dropped from 9.2 to 6.4 vs v1.45 baseline" | PR comment with diff |
| Budget cap override audit | EVALS_BUDGET_OVERRIDE_REASON used | N (intentional escape valve) | Y (audit-log test) | reason printed in CI output, logged to spend-audit jsonl | analytics/spend-overrides.jsonl |
| Parity baseline drift on main | weekly continuous monitor detects regression | Y — Discord alert + ticket | Y (continuous-monitor test) | alert in team channel | analytics/parity-drift.jsonl |
Still 0 critical gaps.
## v2 launch copy specs (from /plan-devex-review)
These drafts become the source of truth for v2.0.0.0 launch tone. T15 implements them verbatim (unless workshopping at ship time produces a measurably better take, in which case update both plan and implementation in lockstep).
### JUST_UPGRADED notice (Persona A — existing user upgrading)
Triggered by `gstack-update-check` showing `JUST_UPGRADED v1.x v2.0.0.0`. Replaces the generic v1 "Running gstack v{to} (just updated!)" with persona-A-aware copy that names the perceived speed win AND signals "your muscle memory still works."
```
Running gstack v2.0.0.0 (just updated!). Your sessions are now ~67% lighter.
Heavyweight skills load only the sections they need; the catalog dropped to
one line per skill. Everything still works the same way: your /ship, /qa,
/review commands haven't changed. Run `/gstack-upgrade --explain-v2` for the
full migration story, or just keep working.
```
Voice rules honored: lead with the win ("67% lighter"); concrete numbers; reassurance that workflows are unchanged ("everything still works the same way"); escape hatch (`--explain-v2`). No em dashes. Aimed at a 5-second read.
Implementation: update `~/.claude/skills/gstack/gstack-upgrade/SKILL.md.tmpl` Inline upgrade flow with v2-aware message; existing `JUST_UPGRADED <from> <to>` detection in skill preamble fires it.
### CHANGELOG numbers table (Persona A's magical moment + Persona B's evaluation evidence)
Lands in `## [v2.0.0.0]` entry of CHANGELOG.md, immediately under the headline. Compare measured v1.44 actuals (baseline captured by `test/helpers/capture-parity-baseline.ts` BEFORE Phase A starts) vs v2.0.0.0 measured. Numbers must be REAL, not estimated; replace placeholders during T15.
| Metric | v1.44.1 (baseline) | v2.0.0.0 (measured) | Δ |
|---|---|---|---|
| Total SKILL.md corpus | 2.1 MB | ~700 KB | **67%** |
| ship.md (heaviest) | 164 KB | ~15 KB skeleton + 5×~5 KB sections | **76% first-Read** |
| plan-ceo-review.md | 131 KB | ~12 KB skeleton + sections on demand | **68% first-Read** |
| office-hours.md | 111 KB | ~10 KB skeleton + sections on demand | **71% first-Read** |
| Catalog tokens (always-loaded system prompt) | ~25K tokens | ~6K tokens | **76%** |
| Per-invocation tokens (typical /ship session) | ~41K | ~14K skeleton + on-demand sections | **~60% drop** |
| Eval coverage (skills with E2E protection) | ~16 of 31 | **31 of 31 + parity baselines** | quality gate enabled |
| Parity score vs v1.44 baseline (LLM judge, all 31 skills) | — | **≥9.0/10 floor** | (CI-enforced; see parity-eval suite) |
Below the table, one paragraph in gstack voice: "v1 was the heaviest opinionated skill pack. v2 is the lightest. The compression isn't free — every skill ships with both gate-tier and periodic-tier E2E evals, and a continuous parity-monitor catches silent quality regressions. The numbers above are measured against `test/helpers/parity-baseline-v1.44.1/` and reproduced by `bun run eval:parity`."
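The Δ column is simple arithmetic, so it is worth recomputing once real measurements replace the placeholders. The sketch below reruns three of the draft rows; `reductionPct` is a hypothetical helper, and treating 2.1 MB as 2100 KB is an assumption about units.

```typescript
// Percentage reduction from `before` to `after`, rounded to the nearest whole percent.
function reductionPct(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

// Draft figures from the table above (placeholders until T15 measures real values).
const corpusPct = reductionPct(2100, 700); // KB, assuming 2.1 MB = 2100 KB -> 67
const shipPct = reductionPct(164, 15 + 5 * 5); // skeleton plus five ~5 KB sections -> 76
const catalogPct = reductionPct(25000, 6000); // tokens -> 76
```

Note that the ship.md draft figure of 76% is reproduced only when all five sections are counted alongside the skeleton; the skeleton alone would give a larger reduction, so the row label may need to say which case was measured.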
### README v2 banner
Placement: top of README.md, immediately under the existing Karpathy pull-quote, above "When I heard Karpathy say this..." Stays in place for 60 days post-launch, then collapses to a one-line "v2 released May 2026" entry in the Quick start section.
```markdown
> **gstack v2.0.0.0: the lightest opinionated skill pack (May 2026)**
>
> Heavyweight skills now load only the sections they need. Total SKILL.md
> corpus dropped from 2.1 MB to ~700 KB. Every skill ships with E2E eval
> protection and a continuous parity-monitor against v1.44 baselines.
> See the [v2.0.0.0 release notes](CHANGELOG.md) for per-skill numbers and
> the migration story. Existing users: `/gstack-upgrade` auto-regenerates.
```
Voice rules honored: lead with the position ("lightest opinionated skill pack"); concrete numbers (2.1 MB → 700 KB); proof of rigor (eval protection + parity monitor); migration path explicit. No em dashes. Aimed at a 10-second read.
### Implementation notes (for T15)
- Lock the actual v1.44 baseline numbers into `test/helpers/parity-baseline-v1.44.1/` BEFORE Phase A regeneration starts. The "v1 vs v2" delta only quotes accurately if v1.44 was measured in the same units (token count via `tiktoken`, byte count via `wc -c`, eval coverage via `test/skill-coverage-matrix.ts`).
- If the measured v2 numbers come in LESS impressive than the drafts above (e.g., ship.md ends up at 25 KB instead of 15 KB), update the drafts to reflect reality. Never invent numbers; the marketing-grade ship moment dies the moment readers find a number they can disprove with `wc -c`.
- The JUST_UPGRADED notice fires automatically via existing `gstack-upgrade` detection — no new mechanism required.
- The README banner placement directly under the existing Karpathy pull-quote, above the "When I heard Karpathy say this..." paragraph, is intentional: persona B (new evaluator) sees the v2 win BEFORE the Karpathy framing, anchoring "this is May 2026's most-current gstack."
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | SCOPE_EXPANSION mode; 3 expansion proposals (1 accepted: v2 launch positioning; 2 deferred: gstack lite, gstack budget); 11/11 sections reviewed; 0 critical gaps |
| Codex Review | `/codex review` | Independent 2nd opinion (outside voice) | 1 | issues_found | 12 challenges surfaced; 7 absorbed into plan (#4, #5, #6, #9, #10, #11, #12); 3 surfaced as user-decision (#1 user kept original pick, #7 hybrid split adopted, #8 user accepted codex) |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR | 3 architectural decisions locked (D1 JSON manifest, D2 sections/*.md.tmpl pipeline, D3 CI cost cap); 4 new failure modes added (all rescued+tested); test plan artifact written; parallelization map produced (3 lanes parallel in v1.45, sequential in v2.0); 0 critical gaps; 0 unresolved decisions |
| Codex Consult (2nd pass) | `/codex` (consult on eng-review additions) | Independent challenge of D1/D2/D3 + parallelization | 1 | issues_found | 7 additional findings on eng-review additions; 5 absorbed (TemplateContext contract, 3-tier orphan classification, budget cap override path, manifest as passive data not predicates, T7 as integration-flow obviated by sequential); 2 surfaced as user-decision (attention-architecture risk → cathedral parity-eval suite added at "11"; parallelization collapsed to sequential v1.45 per codex critique) |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | not required (no significant UI scope; README/CHANGELOG only) |
| DX Review | `/plan-devex-review` | Developer experience gaps | 1 | CLEAR | DX POLISH mode; product type = Claude Code Skill; 2 personas tracked equally (existing-user upgrader + new-user evaluator); initial 7.9/10 → 9.0/10 after launch-copy specs added to plan (JUST_UPGRADED notice, CHANGELOG numbers table, README v2 banner all drafted as T15 deliverables); all 8 passes evaluated; skill DX checklist passes |
**CODEX:** First pass (CEO): 12 findings, 7 absorbed, 3 cross-model user-decided, 2 baked into tasks. Second pass (post eng-review): 7 findings on the new D1/D2/D3 additions, 5 absorbed, 2 user-decided. Both passes preserved as audit trail. 19 total codex findings → 12 absorbed without friction, 5 user-decided across both passes, 2 quality-of-life refinements baked into tasks. DX review skipped fresh codex pass (3 prior passes already covered structural blind spots; remaining DX work is copy-craft, where codex adds less value than user taste).
**CROSS-MODEL:** Strong agreement on (a) phasing (catalog trim early, sections/ later), (b) measurement-first (hard token budgets + override audit trail), (c) forks/rollout-strategy gaps. Tensions resolved across all passes: eval-first scope (user kept), v2 vs v1.x (HYBRID adopted), migration heaviness (lighter touch adopted), parallelization (user accepted codex's sequential critique), attention-architecture risk (user expanded scope to cathedral parity-eval suite covering ALL 31 skills with quality budget alongside token budget), launch copy artifacts (user drafted all three in plan vs deferring to T15 implementation).
**UNRESOLVED:** 0 decisions outstanding across all 5 reviews.
**VERDICT:** CEO + ENG + CODEX×2 + DX CLEARED — ready to implement. The hybrid v1.45/v2.0 split de-risks the bloat-reputation fix; the sections/*.md.tmpl pipeline (D2) prevents drift; the CI cost cap with override audit (D3 + codex absorbed refinement) prevents runaway eval spend; the cathedral parity-eval suite (codex 2nd pass) catches silent attention-architecture regressions that section-loading + budget tests alone would miss; sequential v1.45 execution (codex absorbed) trades wall-clock for integration safety; v2 launch copy specs (DX review) make the marketing-grade ship moment land for both persona A (existing upgrader) and persona B (new evaluator). Plan is now executable.


@ -2,13 +2,7 @@
name: document-generate
preamble-tier: 2
version: 1.0.0
description: |
Generate missing documentation from scratch for a feature, module, or entire project.
Uses the Diataxis framework (tutorial / how-to / reference / explanation) to produce
complete, structured documentation. Can be invoked standalone or called by
/document-release when it finds coverage gaps. Use when asked to "write docs",
"generate documentation", "document this feature", "create a tutorial", or
"explain this module". (gstack)
description: Generate missing documentation from scratch for a feature, module, or entire project. (gstack)
allowed-tools:
- Bash
- Read
@ -29,6 +23,15 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Uses the Diataxis framework (tutorial / how-to / reference / explanation) to produce
complete, structured documentation. Can be invoked standalone or called by
/document-release when it finds coverage gaps. Use when asked to "write docs",
"generate documentation", "document this feature", "create a tutorial", or
"explain this module".
## Preamble (run first)
```bash
@ -569,84 +572,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: document-release
preamble-tier: 2
version: 1.0.0
description: |
Post-ship documentation update. Reads all project docs, cross-references the
diff, builds a Diataxis coverage map (reference/how-to/tutorial/explanation),
updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped,
detects architecture diagram drift, polishes CHANGELOG voice with a sell-test
rubric, cleans up TODOS, and optionally bumps VERSION. Surfaces documentation
debt in the PR body. Use when asked to "update the docs", "sync documentation",
or "post-ship docs". Proactively suggest after a PR is merged or code is shipped. (gstack)
description: Post-ship documentation update. (gstack)
allowed-tools:
- Bash
- Read
@ -26,6 +19,17 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Reads all project docs, cross-references the
diff, builds a Diataxis coverage map (reference/how-to/tutorial/explanation),
updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped,
detects architecture diagram drift, polishes CHANGELOG voice with a sell-test
rubric, cleans up TODOS, and optionally bumps VERSION. Surfaces documentation
debt in the PR body. Use when asked to "update the docs", "sync documentation",
or "post-ship docs". Proactively suggest after a PR is merged or code is shipped.
## Preamble (run first)
```bash
@ -566,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -1,12 +1,7 @@
---
name: freeze
version: 0.1.0
description: |
Restrict file edits to a specific directory for the session. Blocks Edit and
Write outside the allowed path. Use when debugging to prevent accidentally
"fixing" unrelated code, or when you want to scope changes to one module.
Use when asked to "freeze", "restrict edits", "only edit this folder",
or "lock down edits". (gstack)
description: Restrict file edits to a specific directory for the session. (gstack)
triggers:
- freeze edits to directory
- lock editing scope
@ -31,6 +26,15 @@ hooks:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Blocks Edit and
Write outside the allowed path. Use when debugging to prevent accidentally
"fixing" unrelated code, or when you want to scope changes to one module.
Use when asked to "freeze", "restrict edits", "only edit this folder",
or "lock down edits".
# /freeze — Restrict Edits to a Directory
Lock file edits to a specific directory. Any Edit or Write operation targeting


@ -1,11 +1,7 @@
---
name: gstack-upgrade
version: 1.1.0
description: |
Upgrade gstack to the latest version. Detects global vs vendored install,
runs the upgrade, and shows what's new. Use when asked to "upgrade gstack",
"update gstack", or "get latest version".
Voice triggers (speech-to-text aliases): "upgrade the tools", "update the tools", "gee stack upgrade", "g stack upgrade".
description: Upgrade gstack to the latest version.
triggers:
- upgrade gstack
- update gstack version
@ -19,6 +15,15 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Detects global vs vendored install,
runs the upgrade, and shows what's new. Use when asked to "upgrade gstack",
"update gstack", or "get latest version".
Voice triggers (speech-to-text aliases): "upgrade the tools", "update the tools", "gee stack upgrade", "g stack upgrade".
# /gstack-upgrade
Upgrade gstack to the latest version and show what's new.


@ -13,6 +13,12 @@
#
# Idempotent: each insertion is gated on `not already present` so re-running
# the migration is a no-op.
#
# Done-marker discipline (#1581): the marker is only written when every
# required repair either succeeded or was provably unnecessary. Tracking
# happens via the `incomplete` flag; on any failure path (missing jq, broken
# JSON, append failure, mv failure) we set `incomplete=1` and skip the touch
# so the migration runner retries on the next /gstack-upgrade.
set -u
@ -34,19 +40,30 @@ NEW_PATTERNS=(
)
added_any=0
incomplete=0
# ----- .brain-allowlist ---------------------------------------------------
if [ -f "${ALLOWLIST}" ]; then
for PATTERN in "${NEW_PATTERNS[@]}"; do
if ! grep -Fq -- "${PATTERN}" "${ALLOWLIST}" 2>/dev/null; then
if grep -q '^# ---- USER ADDITIONS BELOW' "${ALLOWLIST}" 2>/dev/null; then
sed -i.bak "/^# ---- USER ADDITIONS BELOW/i\\
if sed -i.bak "/^# ---- USER ADDITIONS BELOW/i\\
${PATTERN}
" "${ALLOWLIST}" && rm -f "${ALLOWLIST}.bak"
added_any=1
" "${ALLOWLIST}" 2>/dev/null; then
rm -f "${ALLOWLIST}.bak"
added_any=1
else
echo " [v1.40.0.0] WARN: failed to insert ${PATTERN} into ${ALLOWLIST}; will retry on next upgrade." >&2
rm -f "${ALLOWLIST}.bak" 2>/dev/null || true
incomplete=1
fi
else
printf '%s\n' "${PATTERN}" >> "${ALLOWLIST}"
added_any=1
if printf '%s\n' "${PATTERN}" >> "${ALLOWLIST}" 2>/dev/null; then
added_any=1
else
echo " [v1.40.0.0] WARN: failed to append ${PATTERN} to ${ALLOWLIST}; will retry on next upgrade." >&2
incomplete=1
fi
fi
fi
done
@ -55,19 +72,39 @@ fi
# ----- .brain-privacy-map.json -------------------------------------------
if [ -f "${PRIVACY}" ]; then
if command -v jq >/dev/null 2>&1; then
for PATTERN in "${NEW_PATTERNS[@]}"; do
if ! jq -e --arg p "${PATTERN}" 'map(select(.pattern == $p)) | length > 0' "${PRIVACY}" >/dev/null 2>&1; then
if jq --arg p "${PATTERN}" '. += [{"pattern": $p, "class": "artifact"}]' "${PRIVACY}" > "${PRIVACY}.tmp" 2>/dev/null; then
mv "${PRIVACY}.tmp" "${PRIVACY}"
added_any=1
else
rm -f "${PRIVACY}.tmp"
echo " [v1.40.0.0] WARN: jq failed to patch ${PRIVACY}; skipping pattern ${PATTERN}." >&2
# Validate JSON shape up front. We won't try to repair a corrupt file —
# bail out and leave for manual fix.
if ! jq -e . "${PRIVACY}" >/dev/null 2>&1; then
echo " [v1.40.0.0] WARN: ${PRIVACY} is not valid JSON; skipping privacy-map repair. Fix manually or run gstack-artifacts-init." >&2
incomplete=1
else
for PATTERN in "${NEW_PATTERNS[@]}"; do
if ! jq -e --arg p "${PATTERN}" 'map(select(.pattern == $p)) | length > 0' "${PRIVACY}" >/dev/null 2>&1; then
tmp=$(mktemp "${PRIVACY}.tmp.XXXXXX" 2>/dev/null)
if [ -z "${tmp}" ] || [ ! -f "${tmp}" ]; then
echo " [v1.40.0.0] WARN: failed to create tempfile for ${PRIVACY}; skipping pattern ${PATTERN}." >&2
incomplete=1
continue
fi
if jq --arg p "${PATTERN}" '. += [{"pattern": $p, "class": "artifact"}]' "${PRIVACY}" > "${tmp}" 2>/dev/null; then
if mv "${tmp}" "${PRIVACY}" 2>/dev/null; then
added_any=1
else
echo " [v1.40.0.0] WARN: failed to rewrite ${PRIVACY}; skipping pattern ${PATTERN}." >&2
rm -f "${tmp}"
incomplete=1
fi
else
echo " [v1.40.0.0] WARN: jq mutation failed for ${PRIVACY}; skipping pattern ${PATTERN}." >&2
rm -f "${tmp}"
incomplete=1
fi
fi
fi
done
done
fi
else
echo " [v1.40.0.0] WARN: jq not found; skipping privacy-map repair. Install jq and re-run gstack-upgrade, or run gstack-artifacts-init manually." >&2
incomplete=1
fi
fi
@ -76,19 +113,27 @@ if [ -f "${GITATTRS}" ]; then
for PATTERN in "${NEW_PATTERNS[@]}"; do
RULE="${PATTERN} merge=union"
if ! grep -Fq -- "${RULE}" "${GITATTRS}" 2>/dev/null; then
printf '%s\n' "${RULE}" >> "${GITATTRS}"
added_any=1
if printf '%s\n' "${RULE}" >> "${GITATTRS}" 2>/dev/null; then
added_any=1
else
echo " [v1.40.0.0] WARN: failed to append rule to ${GITATTRS}; will retry on next upgrade." >&2
incomplete=1
fi
fi
done
fi
# Mark done even if no patches needed — a fresh-init user's
# bin/gstack-artifacts-init now writes the pattern directly, so re-runs
# should no-op. The touchfile keeps the migration runner from looping.
touch "${DONE}"
if [ "${added_any}" = "1" ]; then
echo " [v1.40.0.0] allowlist/privacy-map/gitattributes patched for /plan-eng-review test plans (idempotent)" >&2
if [ "${incomplete}" = "0" ]; then
# Mark done — every required repair either succeeded or was provably
# unnecessary. A fresh-init user's bin/gstack-artifacts-init now writes the
# pattern directly, so re-runs no-op. The touchfile keeps the migration
# runner from looping.
touch "${DONE}"
if [ "${added_any}" = "1" ]; then
echo " [v1.40.0.0] allowlist/privacy-map/gitattributes patched for /plan-eng-review test plans (idempotent)" >&2
fi
else
echo " [v1.40.0.0] INFO: marker not written; gstack-upgrade will retry once prerequisites are met." >&2
fi
# NEVER `git commit + push` from this migration. The user controls when the
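The repair flow in this hunk (validate the JSON, write to a tempfile, rename over the original, and only write the done-marker when nothing was left incomplete) can be sketched outside of shell. The following is an illustrative TypeScript rendering, not the migration itself: `ensurePattern`, `runMigration`, and the `PrivacyEntry` shape are hypothetical names, and unlike the `jq -e .` check it assumes the privacy map is a JSON array.

```typescript
import { readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";

// Assumed entry shape, mirroring the {"pattern", "class"} objects appended by jq.
interface PrivacyEntry {
  pattern: string;
  class: string;
}

type RepairResult = "added" | "present" | "failed";

// Add `pattern` to the JSON array in `file` unless it is already there.
// The new content goes to a sibling tempfile first and is renamed into place,
// so a failed write never leaves a half-written privacy map behind.
export function ensurePattern(file: string, pattern: string): RepairResult {
  let entries: PrivacyEntry[];
  try {
    const parsed: unknown = JSON.parse(readFileSync(file, "utf-8"));
    if (!Array.isArray(parsed)) return "failed"; // do not try to repair an unexpected shape
    entries = parsed as PrivacyEntry[];
  } catch {
    return "failed"; // unreadable or invalid JSON: leave it for a manual fix
  }
  if (entries.some((e) => e.pattern === pattern)) return "present";

  const tmp = `${file}.tmp.${process.pid}`;
  try {
    const next = [...entries, { pattern, class: "artifact" }];
    writeFileSync(tmp, JSON.stringify(next, null, 2));
    renameSync(tmp, file);
    return "added";
  } catch {
    rmSync(tmp, { force: true }); // clean up the tempfile on any failure
    return "failed";
  }
}

// Run every repair, then write the marker only if none of them failed.
// Returns true when the marker was written.
export function runMigration(file: string, patterns: string[], marker: string): boolean {
  let incomplete = false;
  for (const p of patterns) {
    if (ensurePattern(file, p) === "failed") incomplete = true;
  }
  if (!incomplete) writeFileSync(marker, "");
  return !incomplete;
}
```

Because the marker is withheld on any failure, a later run retries the whole set; patterns that already landed come back as `"present"` and are not duplicated.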


@ -1,12 +1,7 @@
---
name: guard
version: 0.1.0
description: |
Full safety mode: destructive command warnings + directory-scoped edits.
Combines /careful (warns before rm -rf, DROP TABLE, force-push, etc.) with
/freeze (blocks edits outside a specified directory). Use for maximum safety
when touching prod or debugging live systems. Use when asked to "guard mode",
"full safety", "lock it down", or "maximum safety". (gstack)
description: Full safety mode: destructive command warnings + directory-scoped edits. (gstack)
triggers:
- full safety mode
- guard against mistakes
@ -36,6 +31,14 @@ hooks:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Combines /careful (warns before rm -rf, DROP TABLE, force-push, etc.) with
/freeze (blocks edits outside a specified directory). Use for maximum safety
when touching prod or debugging live systems. Use when asked to "guard mode",
"full safety", "lock it down", or "maximum safety".
# /guard — Full Safety Mode
Activates both destructive command warnings and directory-scoped edit restrictions.


@ -2,12 +2,7 @@
name: health
preamble-tier: 2
version: 1.0.0
description: |
Code quality dashboard. Wraps existing project tools (type checker, linter,
test runner, dead code detector, shell linter), computes a weighted composite
0-10 score, and tracks trends over time. Use when: "health check",
"code quality", "how healthy is the codebase", "run all checks",
"quality score". (gstack)
description: Code quality dashboard. (gstack)
triggers:
- code health check
- quality dashboard
@ -24,6 +19,15 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Wraps existing project tools (type checker, linter,
test runner, dead code detector, shell linter), computes a weighted composite
0-10 score, and tracks trends over time. Use when: "health check",
"code quality", "how healthy is the codebase", "run all checks",
"quality score".
## Preamble (run first)
```bash
@ -564,84 +568,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,14 +2,7 @@
name: investigate
preamble-tier: 2
version: 1.0.0
description: |
Systematic debugging with root cause investigation. Four phases: investigate,
analyze, hypothesize, implement. Iron Law: no fixes without root cause.
Use when asked to "debug this", "fix this bug", "why is this broken",
"investigate this error", or "root cause analysis".
Proactively invoke this skill (do NOT debug directly) when the user reports
errors, 500 errors, stack traces, unexpected behavior, "it was working
yesterday", or is troubleshooting why something stopped working. (gstack)
description: Systematic debugging with root cause investigation. (gstack)
allowed-tools:
- Bash
- Read
@ -30,12 +23,12 @@ hooks:
- matcher: "Edit"
hooks:
- type: command
command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
command: 'bash -c ''S="${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"; [ -x "$S" ] || S="${CLAUDE_SKILL_DIR}/../gstack-freeze/bin/check-freeze.sh"; [ -x "$S" ] && bash "$S" || exit 0'''
statusMessage: "Checking debug scope boundary..."
- matcher: "Write"
hooks:
- type: command
command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
command: 'bash -c ''S="${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"; [ -x "$S" ] || S="${CLAUDE_SKILL_DIR}/../gstack-freeze/bin/check-freeze.sh"; [ -x "$S" ] && bash "$S" || exit 0'''
statusMessage: "Checking debug scope boundary..."
gbrain:
schema: 1
@ -63,6 +56,17 @@ gbrain:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Four phases: investigate,
analyze, hypothesize, implement. Iron Law: no fixes without root cause.
Use when asked to "debug this", "fix this bug", "why is this broken",
"investigate this error", or "root cause analysis".
Proactively invoke this skill (do NOT debug directly) when the user reports
errors, 500 errors, stack traces, unexpected behavior, "it was working
yesterday", or is troubleshooting why something stopped working.
## Preamble (run first)
```bash
@ -603,84 +607,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake
@ -888,7 +815,9 @@ If any learnings come back, name which one applies to your investigation in one
After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep.
```bash
[ -x "${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
_FREEZE_SCRIPT="${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
[ -x "$_FREEZE_SCRIPT" ] || _FREEZE_SCRIPT="${CLAUDE_SKILL_DIR}/../gstack-freeze/bin/check-freeze.sh"
[ -x "$_FREEZE_SCRIPT" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
```
**If FREEZE_AVAILABLE:** Identify the narrowest directory containing the affected files. Write it to the freeze state file:


@ -30,12 +30,12 @@ hooks:
- matcher: "Edit"
hooks:
- type: command
command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
command: 'bash -c ''S="${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"; [ -x "$S" ] || S="${CLAUDE_SKILL_DIR}/../gstack-freeze/bin/check-freeze.sh"; [ -x "$S" ] && bash "$S" || exit 0'''
statusMessage: "Checking debug scope boundary..."
- matcher: "Write"
hooks:
- type: command
command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
command: 'bash -c ''S="${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"; [ -x "$S" ] || S="${CLAUDE_SKILL_DIR}/../gstack-freeze/bin/check-freeze.sh"; [ -x "$S" ] && bash "$S" || exit 0'''
statusMessage: "Checking debug scope boundary..."
gbrain:
schema: 1
@ -118,7 +118,9 @@ If any learnings come back, name which one applies to your investigation in one
After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep.
```bash
[ -x "${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
_FREEZE_SCRIPT="${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
[ -x "$_FREEZE_SCRIPT" ] || _FREEZE_SCRIPT="${CLAUDE_SKILL_DIR}/../gstack-freeze/bin/check-freeze.sh"
[ -x "$_FREEZE_SCRIPT" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
```
**If FREEZE_AVAILABLE:** Identify the narrowest directory containing the affected files. Write it to the freeze state file:


@ -2,15 +2,7 @@
name: ios-clean
preamble-tier: 3
version: 1.0.0
description: |
Remove the DebugBridge SPM package and all #if DEBUG wiring from an iOS
app. Cleans up StateServer, DebugOverlay, accessor codegen output, and
app-side hooks installed by /ios-qa. This is a convenience wrapper —
the structural Release-build guard (Package.swift conditional + CI
swift build -c release check) is the safety-critical path.
Use when asked to "clean the iOS debug bridge", "remove DebugBridge",
or "strip the gstack iOS instrumentation". (gstack)
Voice triggers (speech-to-text aliases): "clean the iOS debug bridge", "remove DebugBridge", "strip the gstack iOS instrumentation".
description: Remove the DebugBridge SPM package and all #if DEBUG wiring from an iOS app. (gstack)
allowed-tools:
- Bash
- Read
@ -26,6 +18,18 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Cleans up StateServer, DebugOverlay, accessor codegen output, and
app-side hooks installed by /ios-qa. This is a convenience wrapper —
the structural Release-build guard (Package.swift conditional + CI
swift build -c release check) is the safety-critical path.
Use when asked to "clean the iOS debug bridge", "remove DebugBridge",
or "strip the gstack iOS instrumentation".
Voice triggers (speech-to-text aliases): "clean the iOS debug bridge", "remove DebugBridge", "strip the gstack iOS instrumentation".
## Preamble (run first)
```bash
@ -566,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,17 +2,7 @@
name: ios-design-review
preamble-tier: 3
version: 1.0.0
description: |
Visual design audit for iOS apps on real hardware. Connects to a real
iPhone via the same StateServer as /ios-qa, screenshots every screen,
evaluates against Apple HIG, DESIGN.md, and design best practices. Scores
each dimension 0-10 with "what would make it a 10" framing — mirrors
/plan-design-review for browser. For plan-stage design review (before
implementation), use /plan-design-review. For live web visual audits, use
/design-review.
Use when asked to "review the iOS design", "audit the iPhone app's
visuals", or "design QA the iOS app". (gstack)
Voice triggers (speech-to-text aliases): "review the iOS design", "audit the iPhone app's visuals", "design QA the iPhone app".
description: Visual design audit for iOS apps on real hardware. (gstack)
allowed-tools:
- Bash
- Read
@ -27,6 +17,21 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Connects to a real
iPhone via the same StateServer as /ios-qa, screenshots every screen,
evaluates against Apple HIG, DESIGN.md, and design best practices. Scores
each dimension 0-10 with "what would make it a 10" framing — mirrors
/plan-design-review for browser. For plan-stage design review (before
implementation), use /plan-design-review. For live web visual audits, use
/design-review.
Use when asked to "review the iOS design", "audit the iPhone app's
visuals", or "design QA the iOS app".
Voice triggers (speech-to-text aliases): "review the iOS design", "audit the iPhone app's visuals", "design QA the iPhone app".
## Preamble (run first)
```bash
@ -567,84 +572,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,16 +2,7 @@
name: ios-fix
preamble-tier: 3
version: 1.0.0
description: |
Autonomous iOS bug fixer. Takes a bug found by /ios-qa, reads the source,
writes the fix, rebuilds, redeploys, and verifies the fix on the real
device. Closes the loop: find bug → fix bug → confirm fix — zero human
intervention. Captures the pre-bug state snapshot as a regression test
fixture, so the bug can never recur silently.
Use when /ios-qa reports a bug and you want it fixed automatically, or
when asked to "fix this iOS bug", "patch the iPhone app", or "auto-fix
the iOS issue". (gstack)
Voice triggers (speech-to-text aliases): "fix the iOS bug", "patch the iPhone app", "auto-fix the iOS issue".
description: Autonomous iOS bug fixer. (gstack)
allowed-tools:
- Bash
- Read
@ -28,6 +19,20 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Takes a bug found by /ios-qa, reads the source,
writes the fix, rebuilds, redeploys, and verifies the fix on the real
device. Closes the loop: find bug → fix bug → confirm fix — zero human
intervention. Captures the pre-bug state snapshot as a regression test
fixture, so the bug can never recur silently.
Use when /ios-qa reports a bug and you want it fixed automatically, or
when asked to "fix this iOS bug", "patch the iPhone app", or "auto-fix
the iOS issue".
Voice triggers (speech-to-text aliases): "fix the iOS bug", "patch the iPhone app", "auto-fix the iOS issue".
## Preamble (run first)
```bash
@ -568,84 +573,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,17 +2,7 @@
name: ios-qa
preamble-tier: 3
version: 1.0.0
description: |
Live-device iOS QA for SwiftUI apps. Connects to a real iPhone via USB
CoreDevice IPv6 tunnel, reads Swift source to understand every screen, then
runs a vision-driven agent loop: screenshot → analyze → decide → act →
verify → repeat. All interaction happens via HTTP to an embedded
StateServer in the app under test. Optionally exposes the device over
Tailscale so remote agents (OpenClaw, Codex, any HTTP-capable agent) can
run iOS QA from anywhere without touching the hardware.
Use when asked to "ios qa", "test my iPhone app", "find bugs on the device",
or "qa the iOS app". (gstack)
Voice triggers (speech-to-text aliases): "iOS quality check", "test the iPhone app", "run iOS QA".
description: Live-device iOS QA for SwiftUI apps. (gstack)
allowed-tools:
- Bash
- Read
@ -31,6 +21,21 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Connects to a real iPhone via USB
CoreDevice IPv6 tunnel, reads Swift source to understand every screen, then
runs a vision-driven agent loop: screenshot → analyze → decide → act →
verify → repeat. All interaction happens via HTTP to an embedded
StateServer in the app under test. Optionally exposes the device over
Tailscale so remote agents (OpenClaw, Codex, any HTTP-capable agent) can
run iOS QA from anywhere without touching the hardware.
Use when asked to "ios qa", "test my iPhone app", "find bugs on the device",
or "qa the iOS app".
Voice triggers (speech-to-text aliases): "iOS quality check", "test the iPhone app", "run iOS QA".
## Preamble (run first)
```bash
@ -571,84 +576,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -27,7 +27,32 @@ export interface ResolveImpl {
const defaultSpawn: SpawnImpl = (cmd, args) => spawnSync(cmd, args, { stdio: 'pipe', timeout: 60_000 });
/**
* Default resolver. Uses `dns.lookup` (getaddrinfo, goes through mDNSResponder
* on macOS) instead of `dns.resolve6` (libresolv, does NOT consult mDNS on
* recent macOS and returns ESERVFAIL for `*.coredevice.local`).
*
* Prefer the IPv6 record but fall back to whatever getaddrinfo returns.
*/
const defaultResolve: ResolveImpl = async (hostname) => {
const dns = await import('dns');
return new Promise((resolve, reject) => {
dns.lookup(hostname, { family: 6, all: true }, (err, addrs) => {
if (err) { reject(err); return; }
const ipv6 = (addrs ?? []).filter((a) => a.family === 6).map((a) => a.address);
if (ipv6.length === 0) { reject(new Error(`no IPv6 records for ${hostname}`)); return; }
resolve(ipv6);
});
});
};
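A promise-based variant of the same lookup is sketched below, assuming Node's `dns/promises` API is available. `onlyIPv6` and `lookupIPv6` are hypothetical names used for illustration; the filtering step is split out so it can be exercised without touching the network.

```typescript
import { lookup } from "node:dns/promises";

// One entry as returned by lookup(..., { all: true }).
interface AddressEntry {
  address: string;
  family: number;
}

// Keep only the IPv6 entries and return their address strings.
export function onlyIPv6(entries: AddressEntry[]): string[] {
  return entries.filter((e) => e.family === 6).map((e) => e.address);
}

// Resolve a hostname through getaddrinfo and return its IPv6 addresses.
// Rejects when the lookup fails or yields no IPv6 record.
export async function lookupIPv6(hostname: string): Promise<string[]> {
  const entries = await lookup(hostname, { family: 6, all: true });
  const ipv6 = onlyIPv6(entries);
  if (ipv6.length === 0) throw new Error(`no IPv6 records for ${hostname}`);
  return ipv6;
}
```

Keeping the filter even with `family: 6` set matches the defensive style of `defaultResolve`: the result is checked rather than trusted.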
/**
* Last-resort resolver using `dns.resolve6`. Kept for backwards compatibility
* and for environments where mDNSResponder is not in the resolver chain. On
* macOS 26.x (Darwin 25.x) this typically fails with ESERVFAIL; see the
* comment on `defaultResolve` above.
*/
const legacyResolve6: ResolveImpl = async (hostname) => {
const dns = await import('dns');
return new Promise((resolve, reject) => {
dns.resolve6(hostname, (err, addrs) => {
@ -69,6 +94,89 @@ export function listDevices(spawn: SpawnImpl = defaultSpawn): DeviceEntry[] {
}
}
/**
* Resolve the CoreDevice tunnel's IPv6 address from `devicectl device info
* details --json-output`. This is the most reliable path on macOS 26.x: the
* tunnel IPv6 lives in `result.connectionProperties.tunnelIPAddress` and is
* authoritative (it's what CoreDevice itself uses to route).
*
* A side effect of running `devicectl device info details` is that it forces
* CoreDevice to bring up / refresh the tunnel session, which is why we prefer
* this over mDNS even on machines where mDNS works.
*
* Returns null when the device isn't found, isn't tunneled, or devicectl
* fails; callers should fall through to mDNS resolution.
*/
export function getDeviceTunnelIPv6FromDevicectl(
udid: string,
spawn: SpawnImpl = defaultSpawn,
): string | null {
const tmp = join(tmpdir(), `devicectl-details-${process.pid}-${Date.now()}.json`);
try {
const r = spawn('xcrun', ['devicectl', 'device', 'info', 'details', '--device', udid, '--json-output', tmp]);
if (r.status !== 0) return null;
const raw = readFileSync(tmp, 'utf-8');
const obj = JSON.parse(raw);
// `result.connectionProperties.tunnelIPAddress` is the canonical location.
// Some Xcode/CoreDevice versions also surface it under `result.tunnel.ipAddress`
// — accept either.
const conn = obj?.result?.connectionProperties as Record<string, unknown> | undefined;
const tunnel = obj?.result?.tunnel as Record<string, unknown> | undefined;
const addr = (conn?.tunnelIPAddress ?? tunnel?.ipAddress) as string | undefined;
if (typeof addr === 'string' && addr.includes(':')) return addr;
return null;
} catch {
return null;
} finally {
try { rmSync(tmp, { force: true }); } catch { /* ignore */ }
}
}
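The JSON handling in `getDeviceTunnelIPv6FromDevicectl` can be isolated as a pure function, which makes the two accepted locations easy to check against fixtures. This is a sketch: `extractTunnelIPv6` is a hypothetical helper, and the addresses used with it are made-up values, not real devicectl output.

```typescript
// Pull the tunnel IPv6 out of already-parsed `devicectl device info details`
// JSON. Accepts `result.connectionProperties.tunnelIPAddress` first, then
// `result.tunnel.ipAddress`, and rejects anything that is not a string
// containing a colon (a cheap "looks like IPv6" check, as in the source).
export function extractTunnelIPv6(parsed: unknown): string | null {
  const result = (parsed as { result?: Record<string, unknown> } | null)?.result;
  const conn = result?.connectionProperties as Record<string, unknown> | undefined;
  const tunnel = result?.tunnel as Record<string, unknown> | undefined;
  const addr = conn?.tunnelIPAddress ?? tunnel?.ipAddress;
  return typeof addr === "string" && addr.includes(":") ? addr : null;
}
```

For example, `extractTunnelIPv6({ result: { connectionProperties: { tunnelIPAddress: "fd00::1234" } } })` yields the address, while an IPv4-looking string or a missing `result` yields `null`.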
/**
* Start a periodic devicectl `info details` poll that keeps the CoreDevice
* tunnel session alive. Xcode 26's CoreDevice only holds the tunnel up while
* a devicectl command is in-flight or Xcode itself is debugging. Without
* something poking it, the tunnel IPv6 becomes unroutable within seconds:
* `curl` to the address times out even though the address looks valid.
*
* Implementation note: we chose `device info details` (cheap, ~10ms of CPU
* per tick, no persistent child process) over `device console` (which would
* keep the tunnel up continuously but spams stdout, can wedge on backpressure,
* and is harder to kill cleanly). The 5-second interval is comfortably under
* the empirically-observed tunnel teardown timeout (~10-15s of idle).
*
* Returns a `stop()` function that cancels the timer. Safe to call multiple
* times.
*/
export function startTunnelKeepalive(
udid: string,
opts: { intervalMs?: number; spawn?: SpawnImpl } = {},
): { stop: () => void } {
const intervalMs = opts.intervalMs ?? 5_000;
const spawn = opts.spawn ?? defaultSpawn;
let stopped = false;
const tick = () => {
if (stopped) return;
// Fire-and-forget: ignore result, the side-effect of the spawn is what
// keeps the tunnel up. We deliberately do not use the JSON output here.
try {
const tmp = join(tmpdir(), `devicectl-keepalive-${process.pid}-${Date.now()}.json`);
spawn('xcrun', ['devicectl', 'device', 'info', 'details', '--device', udid, '--json-output', tmp]);
try { rmSync(tmp, { force: true }); } catch { /* ignore */ }
} catch { /* ignore — next tick will retry */ }
};
const handle = setInterval(tick, intervalMs);
// Don't keep the event loop alive just for this — daemon owns the lifecycle.
if (typeof handle.unref === 'function') handle.unref();
return {
stop: () => {
if (stopped) return;
stopped = true;
clearInterval(handle);
},
};
}
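The stop-once keepalive pattern above can be sketched with the timer injected, which lets the tick and stop behaviour be checked without waiting on a real clock. `startKeepalive`, `Timers`, and `poke` are hypothetical names; the real function calls devicectl where this sketch calls `poke`, and it also `unref`s the timer, which is omitted here.

```typescript
// Minimal timer surface so a fake can stand in for the real clock.
interface Timers {
  setInterval: (fn: () => void, ms: number) => unknown;
  clearInterval: (handle: unknown) => void;
}

const realTimers: Timers = {
  setInterval: (fn, ms) => setInterval(fn, ms),
  clearInterval: (h) => clearInterval(h as ReturnType<typeof setInterval>),
};

// Call `poke` every `intervalMs` until stop() is called.
// Errors thrown by `poke` are swallowed so the next tick can retry.
export function startKeepalive(
  poke: () => void,
  intervalMs = 5_000,
  timers: Timers = realTimers,
): { stop: () => void } {
  let stopped = false;
  const handle = timers.setInterval(() => {
    if (stopped) return; // a tick that fires after stop() is a no-op
    try {
      poke();
    } catch {
      /* ignore: the next tick will retry */
    }
  }, intervalMs);
  return {
    stop: () => {
      if (stopped) return; // safe to call more than once
      stopped = true;
      timers.clearInterval(handle);
    },
  };
}
```

The `stopped` flag guards both directions: repeated `stop()` calls clear the timer only once, and a tick already queued when `stop()` runs does nothing.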
/**
* Resolve the CoreDevice tunnel's IPv6 address for a device. The hostname is
* derived from the device name as printed by `devicectl list devices`. The
@ -95,6 +203,43 @@ export async function getDeviceTunnelIPv6(
}
}
/**
* Resolve a device's tunnel IPv6 using every strategy we know, in order of
* decreasing reliability:
*
* 1. `devicectl device info details --json-output` (most reliable on
* macOS 26.x; also has the useful side-effect of bumping the tunnel).
* 2. mDNS via `dns.lookup` (the getaddrinfo path, which does consult
* mDNSResponder on macOS, unlike `dns.resolve6`).
* 3. mDNS via `dns.resolve6` (legacy path, kept for backwards
* compatibility; fails with ESERVFAIL on recent macOS).
*
* Returns the first address that any strategy yields, or null.
*/
export async function resolveTunnelIPv6(opts: {
udid: string;
deviceName: string;
spawn?: SpawnImpl;
resolve?: ResolveImpl;
legacyResolve?: ResolveImpl;
}): Promise<string | null> {
const spawn = opts.spawn ?? defaultSpawn;
const resolveLookup = opts.resolve ?? defaultResolve;
const resolveLegacy = opts.legacyResolve ?? legacyResolve6;
// 1. devicectl-based
const fromDevicectl = getDeviceTunnelIPv6FromDevicectl(opts.udid, spawn);
if (fromDevicectl) return fromDevicectl;
// 2. mDNS via dns.lookup
const fromLookup = await getDeviceTunnelIPv6(opts.deviceName, resolveLookup);
if (fromLookup) return fromLookup;
// 3. last-resort: legacy dns.resolve6
const fromLegacy = await getDeviceTunnelIPv6(opts.deviceName, resolveLegacy);
return fromLegacy;
}
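The ordered-fallback shape of `resolveTunnelIPv6` generalises to "try each strategy in turn, return the first non-null result". A minimal sketch follows; `firstResolved` and `Strategy` are hypothetical names, and treating a thrown error as a miss is an assumption of this sketch rather than something the function above does itself.

```typescript
// A strategy may be synchronous or asynchronous and signals a miss with null.
type Strategy<T> = () => Promise<T | null> | T | null;

// Run strategies in order and return the first non-null value.
// Later strategies are never invoked once one succeeds.
export async function firstResolved<T>(strategies: Strategy<T>[]): Promise<T | null> {
  for (const strategy of strategies) {
    try {
      const value = await strategy();
      if (value !== null) return value;
    } catch {
      /* assumption: a throwing strategy counts as a miss */
    }
  }
  return null;
}
```

Ordering the list from most to least reliable keeps the cheap, authoritative path first and leaves the legacy resolver as a last resort, matching the numbered list in the doc comment.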
/**
* Check whether a specific bundle ID has a running process on the device.
*/


@ -21,6 +21,7 @@ import { mintForCaller } from './auth-mint';
import { classifyRoute, proxyToDevice, type DeviceTunnel } from './proxy';
import { writeAudit, writeAttempt, sanitizeReplacer } from './audit';
import { bootstrapTunnel } from './tunnel-bootstrap';
import { startTunnelKeepalive } from './devicectl';
import type { Capability } from './types';
interface DaemonOptions {
@ -402,6 +403,12 @@ if (import.meta.main) {
// Default tunnelProvider: when GSTACK_IOS_TARGET_UDID (or a default with
// any connected paired device) is set, bootstrap a real CoreDevice tunnel.
// Otherwise return null (proxy will return 503 device_not_connected).
//
// After a successful bootstrap we spawn a periodic devicectl `info details`
// call to keep the CoreDevice tunnel session alive — Xcode 26's CoreDevice
// only holds the tunnel up while a devicectl command is in-flight, so
// without a poke every few seconds the IPv6 becomes unroutable.
let keepalive: { stop: () => void } | null = null;
const realTunnelProvider = async () => {
const result = await bootstrapTunnel({
udid: targetUDID,
@ -411,9 +418,18 @@ if (import.meta.main) {
process.stderr.write(`bootstrap error: ${result.error}${result.detail ? ' — ' + result.detail : ''}\n`);
return null;
}
if (keepalive) keepalive.stop();
keepalive = startTunnelKeepalive(result.tunnel.udid);
return result.tunnel;
};
const shutdown = () => {
if (keepalive) { keepalive.stop(); keepalive = null; }
};
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);
process.on('exit', shutdown);
startDaemon({
loopbackPort: port,
tailnetEnabled: tailnet,

View File

@ -17,7 +17,7 @@ import { randomBytes } from 'crypto';
import type { DeviceTunnel } from './proxy';
import {
listDevices,
getDeviceTunnelIPv6,
resolveTunnelIPv6,
isAppRunning,
launchApp,
copyFileFromAppContainer,
@ -97,8 +97,21 @@ export async function bootstrapTunnel(opts: BootstrapOptions): Promise<Bootstrap
}
}
// Step 3: resolve tunnel IPv6
const ipv6 = await getDeviceTunnelIPv6(target.name, resolve);
// Step 3: resolve tunnel IPv6. Try devicectl `info details` first (most
// reliable on macOS 26.x), fall through to mDNS via dns.lookup, then
// dns.resolve6 as a last-ditch fallback. See devicectl.ts:resolveTunnelIPv6
// for the rationale.
// When tests inject `resolve`, use it for both the mDNS-lookup path AND the
// legacy resolve6 path — otherwise the legacy path would make a real DNS
// call. In production, only `resolve` is set (to the dns.lookup-based
// default) and the legacy path uses the real dns.resolve6.
const ipv6 = await resolveTunnelIPv6({
udid: target.identifier,
deviceName: target.name,
spawn,
resolve,
legacyResolve: resolve,
});
if (!ipv6) {
return { ok: false, error: 'resolve_failed', detail: target.name };
}

View File

@ -4,7 +4,12 @@
import { describe, test, expect } from 'bun:test';
import { bootstrapTunnel } from '../src/tunnel-bootstrap';
import type { SpawnImpl } from '../src/devicectl';
import {
getDeviceTunnelIPv6FromDevicectl,
resolveTunnelIPv6,
startTunnelKeepalive,
type SpawnImpl,
} from '../src/devicectl';
import { writeFileSync } from 'fs';
interface ScriptedCall {
@ -142,6 +147,12 @@ describe('bootstrapTunnel', () => {
jsonOutput: { result: { runningProcesses: [{ executable: 'file:///private/var/containers/Bundle/Application/.../com.test.app/com.test', processIdentifier: 1234 }] } },
stdout: 'com.test',
},
{
// devicectl device info details (devicectl-based IPv6 resolution).
// Return no tunnelIPAddress so we fall through to the injected resolver.
argsMatch: /devicectl device info details/,
jsonOutput: { result: { connectionProperties: {} } },
},
]);
const r = await bootstrapTunnel({
bundleId: 'com.test',
@ -173,6 +184,12 @@ describe('bootstrapTunnel', () => {
jsonOutput: { result: { runningProcesses: [{ executable: 'file:///var/containers/Bundle/Application/X/com.test.app/com.test', processIdentifier: 5678 }] } },
stdout: '/com.test.app/',
},
{
// devicectl-based IPv6 resolution succeeds — returns the tunnel
// address directly, so the injected resolveImpl is never called.
argsMatch: /devicectl device info details/,
jsonOutput: { result: { connectionProperties: { tunnelIPAddress: 'fd99::beef' } } },
},
{
argsMatch: /devicectl device copy from/,
destOutput: 'BOOT-TOKEN-XYZ-123\n',
@ -233,6 +250,11 @@ describe('bootstrapTunnel', () => {
// jsonOutput body contains the bundle id path, so isAppRunning() returns true.
jsonOutput: { result: { runningProcesses: [{ executable: 'file:///var/containers/Bundle/Application/X/com.test.app/com.test' }] } },
},
{
// devicectl device info details returns no tunnel address.
argsMatch: /devicectl device info details/,
jsonOutput: { result: { connectionProperties: {} } },
},
]);
const r = await bootstrapTunnel({
bundleId: 'com.test',
@ -258,6 +280,10 @@ describe('bootstrapTunnel', () => {
argsMatch: /devicectl device info processes -d B/,
jsonOutput: { result: { runningProcesses: [{ executable: 'file:///var/containers/Bundle/Application/X/com.test.app/com.test' }] } },
},
{
argsMatch: /devicectl device info details --device B/,
jsonOutput: { result: { connectionProperties: { tunnelIPAddress: 'fd00::b' } } },
},
{
argsMatch: /devicectl device copy from --device B/,
destOutput: 'TOKEN\n',
@ -274,3 +300,132 @@ describe('bootstrapTunnel', () => {
if (r.ok) expect(r.tunnel.udid).toBe('B');
});
});
describe('getDeviceTunnelIPv6FromDevicectl', () => {
  test('extracts tunnelIPAddress from connectionProperties', () => {
    const spawn = makeSpawn([
      {
        argsMatch: /devicectl device info details --device TEST-UDID/,
        jsonOutput: { result: { connectionProperties: { tunnelIPAddress: 'fde4:2827:528e::1' } } },
      },
    ]);
    expect(getDeviceTunnelIPv6FromDevicectl('TEST-UDID', spawn)).toBe('fde4:2827:528e::1');
  });
  test('falls back to result.tunnel.ipAddress when connectionProperties absent', () => {
    const spawn = makeSpawn([
      {
        argsMatch: /devicectl device info details/,
        jsonOutput: { result: { tunnel: { ipAddress: 'fd00::dead:beef' } } },
      },
    ]);
    expect(getDeviceTunnelIPv6FromDevicectl('UDID', spawn)).toBe('fd00::dead:beef');
  });
  test('returns null when devicectl exits non-zero', () => {
    const spawn = makeSpawn([
      { argsMatch: /devicectl device info details/, exitCode: 1, stderr: 'no such device' },
    ]);
    expect(getDeviceTunnelIPv6FromDevicectl('UDID', spawn)).toBeNull();
  });
  test('returns null when tunnelIPAddress missing or non-string', () => {
    const spawn = makeSpawn([
      { argsMatch: /devicectl device info details/, jsonOutput: { result: { connectionProperties: {} } } },
    ]);
    expect(getDeviceTunnelIPv6FromDevicectl('UDID', spawn)).toBeNull();
  });
});
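The extraction these tests exercise can be sketched as a pure function over the parsed JSON. This is an illustration only: `field` and `extractTunnelIPv6` are hypothetical names, and the two JSON paths are taken from the fixtures above rather than from a documented devicectl schema.

```typescript
// Safe property read on an unknown value; returns undefined for non-objects.
function field(value: unknown, key: string): unknown {
  return typeof value === "object" && value !== null
    ? (value as Record<string, unknown>)[key]
    : undefined;
}

// Hypothetical extractor mirroring the fixtures: prefer
// result.connectionProperties.tunnelIPAddress, then result.tunnel.ipAddress.
function extractTunnelIPv6(json: unknown): string | null {
  const result = field(json, "result");
  const candidates = [
    field(field(result, "connectionProperties"), "tunnelIPAddress"),
    field(field(result, "tunnel"), "ipAddress"),
  ];
  for (const candidate of candidates) {
    // Missing and non-string values are both treated as "no address".
    if (typeof candidate === "string" && candidate.length > 0) return candidate;
  }
  return null;
}
```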
describe('resolveTunnelIPv6 fallback chain', () => {
  test('prefers devicectl-based resolution', async () => {
    const spawn = makeSpawn([
      {
        argsMatch: /devicectl device info details/,
        jsonOutput: { result: { connectionProperties: { tunnelIPAddress: 'fd11::1' } } },
      },
    ]);
    let resolveCalled = false;
    const addr = await resolveTunnelIPv6({
      udid: 'U',
      deviceName: 'Test',
      spawn,
      resolve: async () => { resolveCalled = true; return ['fd99::99']; },
      legacyResolve: async () => { resolveCalled = true; return ['fdAA::AA']; },
    });
    expect(addr).toBe('fd11::1');
    expect(resolveCalled).toBe(false);
  });
  test('falls through to dns.lookup when devicectl yields no address', async () => {
    const spawn = makeSpawn([
      { argsMatch: /devicectl device info details/, jsonOutput: { result: { connectionProperties: {} } } },
    ]);
    let legacyCalled = false;
    const addr = await resolveTunnelIPv6({
      udid: 'U',
      deviceName: 'Test',
      spawn,
      resolve: async () => ['fd22::2'],
      legacyResolve: async () => { legacyCalled = true; return ['fdAA::AA']; },
    });
    expect(addr).toBe('fd22::2');
    expect(legacyCalled).toBe(false);
  });
  test('falls through to legacy resolve6 when both devicectl and dns.lookup fail', async () => {
    const spawn = makeSpawn([
      { argsMatch: /devicectl device info details/, exitCode: 1 },
    ]);
    const addr = await resolveTunnelIPv6({
      udid: 'U',
      deviceName: 'Test',
      spawn,
      resolve: async () => { throw new Error('ESERVFAIL'); },
      legacyResolve: async () => ['fd33::3'],
    });
    expect(addr).toBe('fd33::3');
  });
  test('returns null when all three strategies fail', async () => {
    const spawn = makeSpawn([
      { argsMatch: /devicectl device info details/, exitCode: 1 },
    ]);
    const addr = await resolveTunnelIPv6({
      udid: 'U',
      deviceName: 'Test',
      spawn,
      resolve: async () => { throw new Error('ESERVFAIL'); },
      legacyResolve: async () => { throw new Error('ESERVFAIL'); },
    });
    expect(addr).toBeNull();
  });
});
describe('startTunnelKeepalive', () => {
  test('invokes devicectl on each interval tick', async () => {
    const calls: string[] = [];
    const spawn: SpawnImpl = ((cmd: string, args: string[]) => {
      calls.push(`${cmd} ${args.slice(0, 4).join(' ')}`);
      return makeReturn(0, '{}', '');
    }) as SpawnImpl;
    const ka = startTunnelKeepalive('UDID-X', { intervalMs: 20, spawn });
    await new Promise((res) => setTimeout(res, 75));
    ka.stop();
    const before = calls.length;
    // After stop, no more calls.
    await new Promise((res) => setTimeout(res, 50));
    expect(calls.length).toBe(before);
    expect(before).toBeGreaterThanOrEqual(2);
    expect(calls[0]).toContain('devicectl');
    expect(calls[0]).toContain('device info details');
  });
  test('stop() is idempotent', () => {
    const spawn: SpawnImpl = (() => makeReturn(0, '', '')) as SpawnImpl;
    const ka = startTunnelKeepalive('U', { intervalMs: 1_000, spawn });
    ka.stop();
    ka.stop();
    // no throw
  });
});
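The contract these tests pin down (one poke per interval tick, no pokes after `stop()`, and a `stop()` that is safe to call twice) can be sketched with a plain interval timer. This is not the devicectl.ts implementation: `startKeepalive`, `poke`, and the 5000 ms default are assumptions chosen for illustration.

```typescript
interface Keepalive {
  stop: () => void;
}

// Hypothetical keepalive: calls `poke` every `intervalMs` until stopped.
// The 5000 ms default is an assumed value, not taken from the source.
function startKeepalive(poke: () => void, intervalMs = 5000): Keepalive {
  let timer: ReturnType<typeof setInterval> | null = setInterval(() => {
    try {
      poke();
    } catch {
      // A failed poke must not kill the timer; the next tick retries.
    }
  }, intervalMs);
  return {
    stop() {
      // Idempotent: a second call finds the timer already cleared.
      if (timer === null) return;
      clearInterval(timer);
      timer = null;
    },
  };
}
```

Nulling the handle after `clearInterval` is what makes repeated `stop()` calls harmless, which matters when the same shutdown function is registered for several process events.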

View File

@ -2,14 +2,7 @@
name: ios-sync
preamble-tier: 3
version: 1.0.0
description: |
Regenerate the iOS debug bridge against the latest upstream gstack
templates. Updates StateServer.swift, DebugOverlay.swift, Package.swift,
and the typed @Observable state accessors. Use after you upgrade gstack
or add new ViewModels/properties that need accessor coverage.
Use when asked to "resync the iOS debug bridge", "regenerate iOS
accessors", or "update the gstack iOS instrumentation". (gstack)
Voice triggers (speech-to-text aliases): "resync the iOS debug bridge", "regenerate iOS accessors", "update the gstack iOS instrumentation".
description: Regenerate the iOS debug bridge against the latest upstream gstack templates. (gstack)
allowed-tools:
- Bash
- Read
@ -26,6 +19,17 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Updates StateServer.swift, DebugOverlay.swift, Package.swift,
and the typed @Observable state accessors. Use after you upgrade gstack
or add new ViewModels/properties that need accessor coverage.
Use when asked to "resync the iOS debug bridge", "regenerate iOS
accessors", or "update the gstack iOS instrumentation".
Voice triggers (speech-to-text aliases): "resync the iOS debug bridge", "regenerate iOS accessors", "update the gstack iOS instrumentation".
## Preamble (run first)
```bash
@ -566,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
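As a rough illustration of the "read once, gloss on first use" rule, the sketch below parses the file contents and tracks which terms have already been glossed. Only the top-level `terms` array comes from the text above; `parseJargonTerms`, `shouldGloss`, and the case-insensitive matching are assumptions.

```typescript
// Hypothetical reader: pulls the `terms` array out of the JSON text and
// ignores any entry that is not a string.
function parseJargonTerms(jsonText: string): string[] {
  const parsed: unknown = JSON.parse(jsonText);
  const terms = (parsed as { terms?: unknown } | null)?.terms;
  if (!Array.isArray(terms)) return [];
  return terms.filter((term): term is string => typeof term === "string");
}

// Returns true only the first time a listed term is seen in a session.
// Assumption: matching is case-insensitive.
function shouldGloss(term: string, terms: string[], seen: Set<string>): boolean {
  const key = term.toLowerCase();
  if (seen.has(key)) return false;
  if (!terms.some((candidate) => candidate.toLowerCase() === key)) return false;
  seen.add(key);
  return true;
}
```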
## Completeness Principle — Boil the Lake

View File

@ -2,11 +2,7 @@
name: land-and-deploy
preamble-tier: 4
version: 1.0.0
description: |
Land and deploy workflow. Merges the PR, waits for CI and deploy,
verifies production health via canary checks. Takes over after /ship
creates the PR. Use when: "merge", "land", "deploy", "merge and verify",
"land it", "ship it to production". (gstack)
description: Land and deploy workflow. (gstack)
allowed-tools:
- Bash
- Read
@ -21,6 +17,14 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Merges the PR, waits for CI and deploy,
verifies production health via canary checks. Takes over after /ship
creates the PR. Use when: "merge", "land", "deploy", "merge and verify",
"land it", "ship it to production".
## Preamble (run first)
```bash
@ -561,84 +565,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -1,12 +1,7 @@
---
name: landing-report
version: 0.1.0
description: |
Read-only queue dashboard for workspace-aware ship. Shows which VERSION slots
are currently claimed by open PRs, which sibling Conductor workspaces have
WIP work likely to ship soon, and what slot /ship would pick next. No
mutations — just a snapshot. Use when asked to "landing report", "what's in
the queue", "show me open PRs", or "which version do I claim next". (gstack)
description: Read-only queue dashboard for workspace-aware ship. (gstack)
triggers:
- landing report
- version queue
@ -20,6 +15,15 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Shows which VERSION slots
are currently claimed by open PRs, which sibling Conductor workspaces have
WIP work likely to ship soon, and what slot /ship would pick next. No
mutations — just a snapshot. Use when asked to "landing report", "what's in
the queue", "show me open PRs", or "which version do I claim next".
# /landing-report — Version Queue Dashboard
## Preamble (run first)
@ -562,84 +566,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,12 +2,7 @@
name: learn
preamble-tier: 2
version: 1.0.0
description: |
Manage project learnings. Review, search, prune, and export what gstack
has learned across sessions. Use when asked to "what have we learned",
"show learnings", "prune stale learnings", or "export learnings".
Proactively suggest when the user asks about past patterns or wonders
"didn't we fix this before?"
description: Manage project learnings.
triggers:
- show learnings
- what have we learned
@ -24,6 +19,15 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Review, search, prune, and export what gstack
has learned across sessions. Use when asked to "what have we learned",
"show learnings", "prune stale learnings", or "export learnings".
Proactively suggest when the user asks about past patterns or wonders
"didn't we fix this before?"
## Preamble (run first)
```bash
@ -564,84 +568,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,13 +2,7 @@
name: make-pdf
preamble-tier: 1
version: 1.0.0
description: |
Turn any markdown file into a publication-quality PDF. Proper 1in margins,
intelligent page breaks, page numbers, cover pages, running headers, curly
quotes and em dashes, clickable TOC, diagonal DRAFT watermark. Not a draft
artifact — a finished artifact. Use when asked to "make a PDF", "export to
PDF", "turn this markdown into a PDF", or "generate a document". (gstack)
Voice triggers (speech-to-text aliases): "make this a pdf", "make it a pdf", "export to pdf", "turn this into a pdf", "turn this markdown into a pdf", "generate a pdf", "make a pdf from", "pdf this markdown".
description: Turn any markdown file into a publication-quality PDF. (gstack)
triggers:
- markdown to pdf
- generate pdf
@ -22,6 +16,17 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Proper 1in margins,
intelligent page breaks, page numbers, cover pages, running headers, curly
quotes and em dashes, clickable TOC, diagonal DRAFT watermark. Not a draft
artifact — a finished artifact. Use when asked to "make a PDF", "export to
PDF", "turn this markdown into a PDF", or "generate a document".
Voice triggers (speech-to-text aliases): "make this a pdf", "make it a pdf", "export to pdf", "turn this into a pdf", "turn this markdown into a pdf", "generate a pdf", "make a pdf from", "pdf this markdown".
## Preamble (run first)
```bash

View File

@ -2,18 +2,7 @@
name: office-hours
preamble-tier: 3
version: 2.0.0
description: |
YC Office Hours — two modes. Startup mode: six forcing questions that expose
demand reality, status quo, desperate specificity, narrowest wedge, observation,
and future-fit. Builder mode: design thinking brainstorming for side projects,
hackathons, learning, and open source. Saves a design doc.
Use when asked to "brainstorm this", "I have an idea", "help me think through
this", "office hours", or "is this worth building".
Proactively invoke this skill (do NOT answer directly) when the user describes
a new product idea, asks whether something is worth building, wants to think
through design decisions for something that doesn't exist yet, or is exploring
a concept before any code is written.
Use before /plan-ceo-review or /plan-eng-review. (gstack)
description: YC Office Hours — two modes. (gstack)
allowed-tools:
- Bash
- Read
@ -59,6 +48,21 @@ gbrain:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Startup mode: six forcing questions that expose
demand reality, status quo, desperate specificity, narrowest wedge, observation,
and future-fit. Builder mode: design thinking brainstorming for side projects,
hackathons, learning, and open source. Saves a design doc.
Use when asked to "brainstorm this", "I have an idea", "help me think through
this", "office hours", or "is this worth building".
Proactively invoke this skill (do NOT answer directly) when the user describes
a new product idea, asks whether something is worth building, wants to think
through design decisions for something that doesn't exist yet, or is exploring
a concept before any code is written.
Use before /plan-ceo-review or /plan-eng-review.
## Preamble (run first)
```bash
@ -599,84 +603,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake
@ -1428,8 +1355,11 @@ If the JSON contains `"regenerated": true`:
1. Read `regenerateAction` (or `remixSpec` for remix requests)
2. Generate new variants with `$D iterate` or `$D variants` using updated brief
3. Create new board with `$D compare`
4. POST the new HTML to the running server via `curl -X POST http://localhost:PORT/api/reload -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'`
(parse the port from stderr: look for `SERVE_STARTED: port=XXXXX`)
4. POST the new HTML to the running board. Parse the board URL from stderr
(`BOARD_URL: http://127.0.0.1:N/boards/<id>/` — the daemon path) or fall
back to the legacy port (`SERVE_STARTED: port=N` — only emitted under
`--no-daemon`, hits `/api/reload` root). Daemon path:
`curl -X POST "${BOARD_URL}api/reload" -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'`
5. Board auto-refreshes in the same tab
If `"regenerated": false`: proceed with the approved variant.
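The stderr parsing in step 4 can be sketched as a small function that prefers the daemon marker and falls back to the legacy one. The two markers and the `api/reload` paths come from the step above; the function name `reloadEndpoint` and returning `null` when neither marker is present are assumptions.

```typescript
// Hypothetical parser for the board server's stderr output.
function reloadEndpoint(stderr: string): string | null {
  // Daemon path: "BOARD_URL: http://127.0.0.1:N/boards/<id>/"
  const board = stderr.match(/BOARD_URL:\s*(\S+)/);
  if (board) {
    const base = board[1].endsWith("/") ? board[1] : `${board[1]}/`;
    return `${base}api/reload`;
  }
  // Legacy path, only emitted under --no-daemon: "SERVE_STARTED: port=N"
  const legacy = stderr.match(/SERVE_STARTED:\s*port=(\d+)/);
  if (legacy) return `http://localhost:${legacy[1]}/api/reload`;
  return null;
}
```

Checking the daemon marker first matches the order described in the step: the legacy port is only a fallback.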
@ -1551,12 +1481,9 @@ Count the signals. You'll use this count in Phase 6 to determine which tier of c
### Builder Profile Append
After counting signals, append a session entry to the builder profile. This is the single
source of truth for all closing state (tier, resource dedup, journey tracking).
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
mkdir -p "$GSTACK_STATE_ROOT"
```
source of truth for all closing state (tier, resource dedup, journey tracking). The
`gstack-developer-profile --log-session` binary creates its own directories and
writes `~/.gstack/developer-profile.json` atomically (mktemp, then mv).
Append one JSON line with these fields (substitute actual values from this session):
- `date`: current ISO 8601 timestamp
@ -1570,12 +1497,12 @@ Append one JSON line with these fields (substitute actual values from this sessi
- `topics`: array of 2-3 topic keywords that describe what this session was about
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
echo '{"date":"TIMESTAMP","mode":"MODE","project_slug":"SLUG","signal_count":N,"signals":SIGNALS_ARRAY,"design_doc":"DOC_PATH","assignment":"ASSIGNMENT_TEXT","resources_shown":[],"topics":TOPICS_ARRAY}' >> "$GSTACK_STATE_ROOT/builder-profile.jsonl"
~/.claude/skills/gstack/bin/gstack-developer-profile --log-session '{"date":"TIMESTAMP","mode":"MODE","project_slug":"SLUG","signal_count":N,"signals":SIGNALS_ARRAY,"design_doc":"DOC_PATH","assignment":"ASSIGNMENT_TEXT","resources_shown":[],"topics":TOPICS_ARRAY}' 2>/dev/null || true
```
This entry is append-only. The `resources_shown` field will be updated via a second append
after resource selection in Phase 6 Beat 3.5.
The session entry is appended to `developer-profile.json`'s `sessions[]` array. A second
session entry with `mode: "resources"` is appended via `--log-session` after resource
selection in Phase 6 Beat 3.5.
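For reference, the payload passed to `--log-session` can be modeled as a typed record and serialized with `JSON.stringify` rather than assembled by hand. The field names come from the command above; the `SessionEntry` type, the `buildSessionEntry` helper, the element types of the arrays, and the defaults are assumptions for illustration.

```typescript
// Field names follow the --log-session payload; element types are assumed.
interface SessionEntry {
  date: string;
  mode: string;
  project_slug: string;
  signal_count: number;
  signals: string[];
  design_doc: string;
  assignment: string;
  resources_shown: string[];
  topics: string[];
}

// Hypothetical builder: fills unset fields with empty defaults and returns
// the JSON text to pass as the --log-session argument.
function buildSessionEntry(
  fields: Partial<SessionEntry> & { mode: string },
  now: Date = new Date(),
): string {
  const signals = fields.signals ?? [];
  const entry: SessionEntry = {
    // Match the shell form `date -u +%Y-%m-%dT%H:%M:%SZ` (no milliseconds).
    date: fields.date ?? now.toISOString().replace(/\.\d{3}Z$/, "Z"),
    mode: fields.mode,
    project_slug: fields.project_slug ?? "unknown",
    signal_count: fields.signal_count ?? signals.length,
    signals,
    design_doc: fields.design_doc ?? "",
    assignment: fields.assignment ?? "",
    resources_shown: fields.resources_shown ?? [],
    topics: fields.topics ?? [],
  };
  return JSON.stringify(entry);
}
```

Serializing through `JSON.stringify` keeps quotes and special characters in free-text fields such as `assignment` from producing invalid JSON.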
---
@ -2032,8 +1959,8 @@ PAUL GRAHAM ESSAYS:
1. Log the selected resource URLs to the builder profile (single source of truth).
Append a resource-tracking entry:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
echo '{"date":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","mode":"resources","project_slug":"'"${SLUG:-unknown}"'","signal_count":0,"signals":[],"design_doc":"","assignment":"","resources_shown":["URL1","URL2","URL3"],"topics":[]}' >> "$GSTACK_STATE_ROOT/builder-profile.jsonl"
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || true)"
~/.claude/skills/gstack/bin/gstack-developer-profile --log-session '{"date":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","mode":"resources","project_slug":"'"${SLUG:-unknown}"'","signal_count":0,"signals":[],"design_doc":"","assignment":"","resources_shown":["URL1","URL2","URL3"],"topics":[]}' 2>/dev/null || true
```
2. Log the selection to analytics:

View File

@ -471,12 +471,9 @@ Count the signals. You'll use this count in Phase 6 to determine which tier of c
### Builder Profile Append
After counting signals, append a session entry to the builder profile. This is the single
source of truth for all closing state (tier, resource dedup, journey tracking).
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
mkdir -p "$GSTACK_STATE_ROOT"
```
source of truth for all closing state (tier, resource dedup, journey tracking). The
`gstack-developer-profile --log-session` binary creates its own directories and
writes `~/.gstack/developer-profile.json` atomically (mktemp, then mv).
Append one JSON line with these fields (substitute actual values from this session):
- `date`: current ISO 8601 timestamp
@ -490,12 +487,12 @@ Append one JSON line with these fields (substitute actual values from this sessi
- `topics`: array of 2-3 topic keywords that describe what this session was about
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
echo '{"date":"TIMESTAMP","mode":"MODE","project_slug":"SLUG","signal_count":N,"signals":SIGNALS_ARRAY,"design_doc":"DOC_PATH","assignment":"ASSIGNMENT_TEXT","resources_shown":[],"topics":TOPICS_ARRAY}' >> "$GSTACK_STATE_ROOT/builder-profile.jsonl"
~/.claude/skills/gstack/bin/gstack-developer-profile --log-session '{"date":"TIMESTAMP","mode":"MODE","project_slug":"SLUG","signal_count":N,"signals":SIGNALS_ARRAY,"design_doc":"DOC_PATH","assignment":"ASSIGNMENT_TEXT","resources_shown":[],"topics":TOPICS_ARRAY}' 2>/dev/null || true
```
This entry is append-only. The `resources_shown` field will be updated via a second append
after resource selection in Phase 6 Beat 3.5.
The session entry is appended to `developer-profile.json`'s `sessions[]` array. A second
session entry with `mode: "resources"` is appended via `--log-session` after resource
selection in Phase 6 Beat 3.5.
---
@ -892,8 +889,8 @@ PAUL GRAHAM ESSAYS:
1. Log the selected resource URLs to the builder profile (single source of truth).
Append a resource-tracking entry:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
echo '{"date":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","mode":"resources","project_slug":"'"${SLUG:-unknown}"'","signal_count":0,"signals":[],"design_doc":"","assignment":"","resources_shown":["URL1","URL2","URL3"],"topics":[]}' >> "$GSTACK_STATE_ROOT/builder-profile.jsonl"
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || true)"
~/.claude/skills/gstack/bin/gstack-developer-profile --log-session '{"date":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","mode":"resources","project_slug":"'"${SLUG:-unknown}"'","signal_count":0,"signals":[],"design_doc":"","assignment":"","resources_shown":["URL1","URL2","URL3"],"topics":[]}' 2>/dev/null || true
```
2. Log the selection to analytics:

View File

@ -1,13 +1,7 @@
---
name: open-gstack-browser
version: 0.2.0
description: |
Launch GStack Browser — AI-controlled Chromium with the sidebar extension baked in.
Opens a visible browser window where you can watch every action in real time.
The sidebar shows a live activity feed and chat. Anti-bot stealth built in.
Use when asked to "open gstack browser", "launch browser", "connect chrome",
"open chrome", "real browser", "launch chrome", "side panel", or "control my browser".
Voice triggers (speech-to-text aliases): "show me the browser".
description: Launch GStack Browser — AI-controlled Chromium with the sidebar extension baked in.
triggers:
- open gstack browser
- launch chromium
@ -21,6 +15,16 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Opens a visible browser window where you can watch every action in real time.
The sidebar shows a live activity feed and chat. Anti-bot stealth built in.
Use when asked to "open gstack browser", "launch browser", "connect chrome",
"open chrome", "real browser", "launch chrome", "side panel", or "control my browser".
Voice triggers (speech-to-text aliases): "show me the browser".
## Preamble (run first)
```bash
@ -561,84 +565,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -1,6 +1,6 @@
{
"name": "gstack",
"version": "1.44.0.0",
"version": "1.46.0.0",
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
"license": "MIT",
"type": "module",

View File

@ -1,14 +1,7 @@
---
name: pair-agent
version: 0.1.0
description: |
Pair a remote AI agent with your browser. One command generates a setup key and
prints instructions the other agent can follow to connect. Works with OpenClaw,
Hermes, Codex, Cursor, or any agent that can make HTTP requests. The remote agent
gets its own tab with scoped access (read+write by default, admin on request).
Use when asked to "pair agent", "connect agent", "share browser", "remote browser",
"let another agent use my browser", or "give browser access". (gstack)
Voice triggers (speech-to-text aliases): "pair agent", "connect agent", "share my browser", "remote browser access".
description: Pair a remote AI agent with your browser. (gstack)
triggers:
- pair with agent
- connect remote agent
@ -22,6 +15,18 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
One command generates a setup key and
prints instructions the other agent can follow to connect. Works with OpenClaw,
Hermes, Codex, Cursor, or any agent that can make HTTP requests. The remote agent
gets its own tab with scoped access (read+write by default, admin on request).
Use when asked to "pair agent", "connect agent", "share browser", "remote browser",
"let another agent use my browser", or "give browser access".
Voice triggers (speech-to-text aliases): "pair agent", "connect agent", "share my browser", "remote browser access".
## Preamble (run first)
```bash
@ -562,84 +567,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -3,15 +3,7 @@ name: plan-ceo-review
preamble-tier: 3
interactive: true
version: 1.0.0
description: |
CEO/founder-mode plan review. Rethink the problem, find the 10-star product,
challenge premises, expand scope when it creates a better product. Four modes:
SCOPE EXPANSION (dream big), SELECTIVE EXPANSION (hold scope + cherry-pick
expansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials).
Use when asked to "think bigger", "expand scope", "strategy review", "rethink this",
or "is this ambitious enough".
Proactively suggest when the user is questioning scope or ambition of a plan,
or when the plan feels like it could be thinking bigger. (gstack)
description: CEO/founder-mode plan review. (gstack)
benefits-from: [office-hours]
allowed-tools:
- Read
@ -53,6 +45,18 @@ gbrain:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Rethink the problem, find the 10-star product,
challenge premises, expand scope when it creates a better product. Four modes:
SCOPE EXPANSION (dream big), SELECTIVE EXPANSION (hold scope + cherry-pick
expansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials).
Use when asked to "think bigger", "expand scope", "strategy review", "rethink this",
or "is this ambitious enough".
Proactively suggest when the user is questioning scope or ambition of a plan,
or when the plan feels like it could be thinking bigger.
## Preamble (run first)
```bash
@ -593,84 +597,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -3,14 +3,7 @@ name: plan-design-review
preamble-tier: 3
interactive: true
version: 2.0.0
description: |
Designer's eye plan review — interactive, like CEO and Eng review.
Rates each design dimension 0-10, explains what would make it a 10,
then fixes the plan to get there. Works in plan mode. For live site
visual audits, use /design-review. Use when asked to "review the design plan"
or "design critique".
Proactively suggest when the user has a plan with UI/UX components that
should be reviewed before implementation. (gstack)
description: Designer's eye plan review — interactive, like CEO and Eng review. (gstack)
allowed-tools:
- Read
- Edit
@ -26,6 +19,16 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Rates each design dimension 0-10, explains what would make it a 10,
then fixes the plan to get there. Works in plan mode. For live site
visual audits, use /design-review. Use when asked to "review the design plan"
or "design critique".
Proactively suggest when the user has a plan with UI/UX components that
should be reviewed before implementation.
## Preamble (run first)
```bash
@ -566,84 +569,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake
@ -1145,8 +1071,12 @@ This command generates the board HTML, starts an HTTP server on a random port,
and opens it in the user's default browser. **Run it in the background** with `&`
because the server needs to stay running while the user interacts with the board.
Parse the port from stderr output: `SERVE_STARTED: port=XXXXX`. You need this
for the board URL and for reloading during regeneration cycles.
Parse the board URL from stderr output. Default daemon path:
`BOARD_URL: http://127.0.0.1:N/boards/<id>/` (already includes the per-board
path; use this for the AskUserQuestion URL AND as the base for the reload
endpoint). Legacy `--no-daemon` path emits `SERVE_STARTED: port=XXXXX` and
serves a single board at `/`, with reload at `/api/reload` — only relevant
when an external caller explicitly passes `--no-daemon`.
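A minimal sketch of the parsing step described above, assuming the command's stderr was captured to a file and that each marker starts its own line. The function name `parse_board_url` and the log-file argument are hypothetical; only the `BOARD_URL:` and `SERVE_STARTED: port=` line formats come from the text. For the legacy case it rebuilds the root URL from the port.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the board URL from captured stderr.
# Prefers the daemon-mode `BOARD_URL:` line; falls back to the legacy
# `SERVE_STARTED: port=N` line and rebuilds the single-board root URL.
parse_board_url() {
  local log_file="$1" url port
  url=$(sed -n 's/^BOARD_URL: *//p' "$log_file" | head -n 1)
  if [ -n "$url" ]; then
    printf '%s\n' "$url"
    return 0
  fi
  port=$(sed -n 's/^SERVE_STARTED: port=\([0-9][0-9]*\).*/\1/p' "$log_file" | head -n 1)
  if [ -n "$port" ]; then
    printf 'http://127.0.0.1:%s/\n' "$port"
    return 0
  fi
  return 1  # neither marker found
}
```

A caller might run `BOARD_URL=$(parse_board_url "$stderr_log")` and treat a non-zero exit as "the server has not started yet".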
**PRIMARY WAIT: AskUserQuestion with board URL**
@ -1154,11 +1084,14 @@ After the board is serving, use AskUserQuestion to wait for the user. Include th
board URL so they can click it if they lost the browser tab:
"I've opened a comparison board with the design variants:
http://127.0.0.1:<PORT>/ — Rate them, leave comments, remix
<BOARD_URL> — Rate them, leave comments, remix
elements you like, and click Submit when you're done. Let me know when you've
submitted your feedback (or paste your preferences here). If you clicked
Regenerate or Remix on the board, tell me and I'll generate new variants."
Substitute `<BOARD_URL>` with the URL parsed from stderr (the daemon path
emits `BOARD_URL: http://127.0.0.1:N/boards/<id>/`).
**Do NOT use AskUserQuestion to ask which variant the user prefers.** The comparison
board IS the chooser. AskUserQuestion is just the blocking wait mechanism.
@ -1202,8 +1135,13 @@ the approved variant.
2. If `regenerateAction` is `"remix"`, read `remixSpec` (e.g. `{"layout":"A","colors":"B"}`)
3. Generate new variants with `$D iterate` or `$D variants` using updated brief
4. Create new board: `$D compare --images "..." --output "$_DESIGN_DIR/design-board.html"`
5. Reload the board in the user's browser (same tab):
`curl -s -X POST http://127.0.0.1:PORT/api/reload -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'`
5. Reload the board in the user's browser (same tab) — the URL is per-board
under daemon mode, so use `<BOARD_URL>` (from the `BOARD_URL:` stderr
line) as the base:
`curl -s -X POST "${BOARD_URL}api/reload" -H 'Content-Type: application/json' -d "{\"html\":\"$_DESIGN_DIR/design-board.html\"}"`
Under `--no-daemon` the reload endpoint is `/api/reload` at the legacy
port; this path only matters if the caller explicitly opted out of the
daemon.
6. The board auto-refreshes. **AskUserQuestion again** with the same board URL to
wait for the next round of feedback. Repeat until `feedback.json` appears.
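The reload call in step 5 can be sketched as below, under two assumptions: the base URL is the one printed on the `BOARD_URL:` line (or the legacy root URL under `--no-daemon`), and the endpoint accepts the JSON body shown in that step. The helper names are hypothetical. Because a shell does not expand `$_DESIGN_DIR` inside single quotes, this sketch builds the payload with `printf` so the real path reaches the server.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the reload step.

# Build the reload endpoint from the per-board base URL.
# Adds the trailing slash if the caller's URL lacks one.
reload_endpoint() {
  local base="$1"
  case "$base" in
    */) ;;
    *) base="$base/" ;;
  esac
  printf '%sapi/reload\n' "$base"
}

# Build the JSON body. Assumes the path contains no double quotes or backslashes.
reload_payload() {
  printf '{"html":"%s"}\n' "$1"
}

# Post the regenerated board; the curl flags mirror the command in step 5.
reload_board() {
  curl -s -X POST "$(reload_endpoint "$1")" \
    -H 'Content-Type: application/json' \
    -d "$(reload_payload "$2")"
}
```

Usage would look like `reload_board "$BOARD_URL" "$_DESIGN_DIR/design-board.html"`. Normalizing the trailing slash keeps the same helper usable for both the per-board daemon URL and the legacy root URL.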

View File

@ -3,16 +3,7 @@ name: plan-devex-review
preamble-tier: 3
interactive: true
version: 2.0.0
description: |
Interactive developer experience plan review. Explores developer personas,
benchmarks against competitors, designs magical moments, and traces friction
points before scoring. Three modes: DX EXPANSION (competitive advantage),
DX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).
Use when asked to "DX review", "developer experience audit", "devex review",
or "API design review".
Proactively suggest when the user has a plan for developer-facing products
(APIs, CLIs, SDKs, libraries, platforms, docs). (gstack)
Voice triggers (speech-to-text aliases): "dx review", "developer experience review", "devex review", "devex audit", "API design review", "onboarding review".
description: Interactive developer experience plan review. (gstack)
benefits-from: [office-hours]
allowed-tools:
- Read
@ -30,6 +21,20 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Explores developer personas,
benchmarks against competitors, designs magical moments, and traces friction
points before scoring. Three modes: DX EXPANSION (competitive advantage),
DX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).
Use when asked to "DX review", "developer experience audit", "devex review",
or "API design review".
Proactively suggest when the user has a plan for developer-facing products
(APIs, CLIs, SDKs, libraries, platforms, docs).
Voice triggers (speech-to-text aliases): "dx review", "developer experience review", "devex review", "devex audit", "API design review", "onboarding review".
## Preamble (run first)
```bash
@ -570,84 +575,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -3,14 +3,7 @@ name: plan-eng-review
preamble-tier: 3
interactive: true
version: 1.0.0
description: |
Eng manager-mode plan review. Lock in the execution plan — architecture,
data flow, diagrams, edge cases, test coverage, performance. Walks through
issues interactively with opinionated recommendations. Use when asked to
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation. (gstack)
Voice triggers (speech-to-text aliases): "tech review", "technical review", "plan engineering review".
description: Eng manager-mode plan review. (gstack)
benefits-from: [office-hours]
allowed-tools:
- Read
@ -28,6 +21,18 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Lock in the execution plan — architecture,
data flow, diagrams, edge cases, test coverage, performance. Walks through
issues interactively with opinionated recommendations. Use when asked to
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
Voice triggers (speech-to-text aliases): "tech review", "technical review", "plan engineering review".
## Preamble (run first)
```bash
@ -568,84 +573,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,19 +2,7 @@
name: plan-tune
preamble-tier: 2
version: 1.0.0
description: |
Self-tuning question sensitivity + developer psychographic for gstack (v1: observational).
Review which AskUserQuestion prompts fire across gstack skills, set per-question preferences
(never-ask / always-ask / ask-only-for-one-way), inspect the dual-track
profile (what you declared vs what your behavior suggests), and enable/disable
question tuning. Conversational interface — no CLI syntax required.
Use when asked to "tune questions", "stop asking me that", "too many questions",
"show my profile", "what questions have I been asked", "show my vibe",
"developer profile", or "turn off question tuning". (gstack)
Proactively suggest when the user says the same gstack question has come up before,
or when they explicitly override a recommendation for the Nth time.
description: Self-tuning question sensitivity + developer psychographic for gstack (v1: observational). (gstack)
triggers:
- tune questions
- stop asking me that
@ -35,6 +23,21 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Review which AskUserQuestion prompts fire across gstack skills, set per-question preferences
(never-ask / always-ask / ask-only-for-one-way), inspect the dual-track
profile (what you declared vs what your behavior suggests), and enable/disable
question tuning. Conversational interface — no CLI syntax required.
Use when asked to "tune questions", "stop asking me that", "too many questions",
"show my profile", "what questions have I been asked", "show my vibe",
"developer profile", or "turn off question tuning".
Proactively suggest when the user says the same gstack question has come up before,
or when they explicitly override a recommendation for the Nth time.
## Preamble (run first)
```bash
@ -575,84 +578,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,13 +2,7 @@
name: qa-only
preamble-tier: 4
version: 1.0.0
description: |
Report-only QA testing. Systematically tests a web application and produces a
structured report with health score, screenshots, and repro steps — but never
fixes anything. Use when asked to "just report bugs", "qa report only", or
"test but don't fix". For the full test-fix-verify loop, use /qa instead.
Proactively suggest when the user wants a bug report without any code changes. (gstack)
Voice triggers (speech-to-text aliases): "bug report", "just check for bugs".
description: Report-only QA testing. (gstack)
allowed-tools:
- Bash
- Read
@ -23,6 +17,17 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Systematically tests a web application and produces a
structured report with health score, screenshots, and repro steps — but never
fixes anything. Use when asked to "just report bugs", "qa report only", or
"test but don't fix". For the full test-fix-verify loop, use /qa instead.
Proactively suggest when the user wants a bug report without any code changes.
Voice triggers (speech-to-text aliases): "bug report", "just check for bugs".
## Preamble (run first)
```bash
@ -563,84 +568,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,16 +2,7 @@
name: qa
preamble-tier: 4
version: 2.0.0
description: |
Systematically QA test a web application and fix bugs found. Runs QA testing,
then iteratively fixes bugs in source code, committing each fix atomically and
re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs",
"test and fix", or "fix what's broken".
Proactively suggest when the user says a feature is ready for testing
or asks "does this work?". Three tiers: Quick (critical/high only),
Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,
fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only. (gstack)
Voice triggers (speech-to-text aliases): "quality check", "test the app", "run QA".
description: Systematically QA test a web application and fix bugs found. (gstack)
allowed-tools:
- Bash
- Read
@ -29,6 +20,20 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Runs QA testing,
then iteratively fixes bugs in source code, committing each fix atomically and
re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs",
"test and fix", or "fix what's broken".
Proactively suggest when the user says a feature is ready for testing
or asks "does this work?". Three tiers: Quick (critical/high only),
Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,
fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
Voice triggers (speech-to-text aliases): "quality check", "test the app", "run QA".
## Preamble (run first)
```bash
@ -569,84 +574,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,12 +2,7 @@
name: retro
preamble-tier: 2
version: 2.0.0
description: |
Weekly engineering retrospective. Analyzes commit history, work patterns,
and code quality metrics with persistent history and trend tracking.
Team-aware: breaks down per-person contributions with praise and growth areas.
Use when asked to "weekly retro", "what did we ship", or "engineering retrospective".
Proactively suggest at the end of a work week or sprint. (gstack)
description: Weekly engineering retrospective. (gstack)
allowed-tools:
- Bash
- Read
@ -41,6 +36,15 @@ gbrain:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Analyzes commit history, work patterns,
and code quality metrics with persistent history and trend tracking.
Team-aware: breaks down per-person contributions with praise and growth areas.
Use when asked to "weekly retro", "what did we ship", or "engineering retrospective".
Proactively suggest at the end of a work week or sprint.
## Preamble (run first)
```bash
@ -581,84 +585,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -2,11 +2,7 @@
name: review
preamble-tier: 4
version: 1.0.0
description: |
Pre-landing PR review. Analyzes diff against the base branch for SQL safety, LLM trust
boundary violations, conditional side effects, and other structural issues. Use when
asked to "review this PR", "code review", "pre-landing review", or "check my diff".
Proactively suggest when the user is about to merge or land code changes. (gstack)
description: Pre-landing PR review. (gstack)
allowed-tools:
- Bash
- Read
@ -26,6 +22,14 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Analyzes diff against the base branch for SQL safety, LLM trust
boundary violations, conditional side effects, and other structural issues. Use when
asked to "review this PR", "code review", "pre-landing review", or "check my diff".
Proactively suggest when the user is about to merge or land code changes.
## Preamble (run first)
```bash
@ -566,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -1,13 +1,7 @@
---
name: scrape
version: 1.0.0
description: |
Pull data from a web page. First call on a new intent prototypes the flow
via $B primitives and returns JSON. Subsequent calls on a matching intent
route to a codified browser-skill and return in ~200ms. Read-only — for
mutating flows (form fills, clicks, submissions), use /automate.
Use when asked to "scrape", "get data from", "pull", "extract from", or
"what's on" a page. (gstack)
description: Pull data from a web page. (gstack)
allowed-tools:
- Bash
- Read
@ -22,6 +16,16 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
First call on a new intent prototypes the flow
via $B primitives and returns JSON. Subsequent calls on a matching intent
route to a codified browser-skill and return in ~200ms. Read-only — for
mutating flows (form fills, clicks, submissions), use /automate.
Use when asked to "scrape", "get data from", "pull", "extract from", or
"what's on" a page.
## Preamble (run first)
```bash
@ -562,84 +566,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake


@ -0,0 +1,54 @@
#!/usr/bin/env bun
/**
* CLI for capturing a parity baseline snapshot.
*
* Usage:
* bun run scripts/capture-baseline.ts # default path
* bun run scripts/capture-baseline.ts --tag v1.44.1 # tag the snapshot
* bun run scripts/capture-baseline.ts --out path/to/baseline.json
*
* The default output path is test/fixtures/parity-baseline-<tag>.json,
* or test/fixtures/parity-baseline-current.json when no tag is given.
*/
import * as fs from 'fs';
import * as path from 'path';
import { captureBaseline } from '../test/helpers/capture-parity-baseline';
const ROOT = path.resolve(import.meta.dir, '..');
function arg(name: string): string | undefined {
const i = process.argv.indexOf(name);
if (i === -1) return undefined;
return process.argv[i + 1];
}
const tag = arg('--tag');
const outOverride = arg('--out');
const defaultOut = path.join(
ROOT,
'test',
'fixtures',
`parity-baseline-${tag ?? 'current'}.json`,
);
const outPath = outOverride ? path.resolve(outOverride) : defaultOut;
const baseline = captureBaseline({ repoRoot: ROOT, tag });
fs.mkdirSync(path.dirname(outPath), { recursive: true });
fs.writeFileSync(outPath, JSON.stringify(baseline, null, 2) + '\n');
const totalKB = Math.round(baseline.totalCorpusBytes / 1024);
const top3 = baseline.topHeaviest.slice(0, 3);
console.log(`Parity baseline captured: ${outPath}`);
console.log(` tag: ${baseline.tag}`);
console.log(` commit: ${baseline.capturedFromCommit}`);
console.log(` branch: ${baseline.capturedFromBranch}`);
console.log(` skills: ${baseline.totalSkills}`);
console.log(` total corpus: ${totalKB} KB`);
console.log(` catalog tokens: ~${baseline.estTotalCatalogTokens}`);
console.log(` top 3 heaviest:`);
for (const s of top3) {
const kb = Math.round(s.skillMdBytes / 1024);
console.log(` ${s.skill.padEnd(28)} ${kb} KB (${s.skillMdLines} lines, ~${s.estTokens} tokens)`);
}
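Review note: the script's `arg` helper returns whatever token follows the flag, so `--tag --out x` would yield a tag of `--out`, and `--tag=v1` is not recognized. A possible stricter variant is sketched below; `argValue` and `defaultOutPath` are hypothetical names, not part of this commit, and the default-path logic only mirrors what the script's header comment describes.

```typescript
import * as path from 'path';

// Hypothetical variant of the script's `arg` helper. The name `argValue`
// and the extra checks are assumptions for illustration.
function argValue(argv: string[], name: string): string | undefined {
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a.startsWith(name + '=')) return a.slice(name.length + 1);
    if (a === name) {
      const next = argv[i + 1];
      // Treat a following flag (or nothing at all) as "no value supplied".
      return next !== undefined && !next.startsWith('--') ? next : undefined;
    }
  }
  return undefined;
}

// Mirrors the documented default: parity-baseline-<tag>.json, or
// parity-baseline-current.json when no tag is given.
function defaultOutPath(root: string, tag?: string): string {
  return path.join(root, 'test', 'fixtures', `parity-baseline-${tag ?? 'current'}.json`);
}
```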


@ -16,7 +16,7 @@ import { writeLlmsTxt } from './gen-llms-txt';
import * as fs from 'fs';
import * as path from 'path';
import type { Host, TemplateContext } from './resolvers/types';
import { HOST_PATHS } from './resolvers/types';
import { HOST_PATHS, unwrapResolver } from './resolvers/types';
import { RESOLVERS } from './resolvers/index';
import { externalSkillName, extractHookSafetyProse as _extractHookSafetyProse, extractNameAndDescription as _extractNameAndDescription, condenseOpenAIShortDescription as _condenseOpenAIShortDescription, generateOpenAIYaml as _generateOpenAIYaml } from './resolvers/codex-helpers';
import { generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './resolvers/review';
@ -59,6 +59,41 @@ const MODEL_ARG_VAL: Model = (() => {
return resolved;
})();
// ─── Catalog Mode (v1.45.0.0 T4) ────────────────────────────
// 'trim' (default): shorten frontmatter description to lead sentence,
// move routing/voice prose into a "## When to invoke" body section, and
// emit scripts/proactive-suggestions.json (single file across all skills).
// 'full': legacy v1.44 behavior — full description stays in frontmatter.
const CATALOG_MODE_ARG = process.argv.find(a => a.startsWith('--catalog-mode'));
const CATALOG_MODE: 'trim' | 'full' = (() => {
if (!CATALOG_MODE_ARG) return 'trim';
const val = CATALOG_MODE_ARG.includes('=')
? CATALOG_MODE_ARG.split('=')[1]
: process.argv[process.argv.indexOf(CATALOG_MODE_ARG) + 1];
if (val !== 'trim' && val !== 'full') {
throw new Error(`Unknown catalog mode: ${val}. Use 'trim' (default) or 'full'.`);
}
return val;
})();
// ─── Explain-level Overlay ──────────────────────────────────
// --explain-level=terse compresses preamble prose (writing-style, completeness,
// confusion-protocol, context-health) to a single pointer line at gen time.
// Default keeps the runtime-conditional behavior (sections render unconditionally,
// the model skips them when EXPLAIN_LEVEL: terse appears in the preamble echo).
// Opt-in via the build flag so most users get the runtime-flexible default.
const EXPLAIN_LEVEL_ARG = process.argv.find(a => a.startsWith('--explain-level'));
const EXPLAIN_LEVEL: 'default' | 'terse' = (() => {
if (!EXPLAIN_LEVEL_ARG) return 'default';
const val = EXPLAIN_LEVEL_ARG.includes('=')
? EXPLAIN_LEVEL_ARG.split('=')[1]
: process.argv[process.argv.indexOf(EXPLAIN_LEVEL_ARG) + 1];
if (val !== 'default' && val !== 'terse') {
throw new Error(`Unknown explain level: ${val}. Use 'default' or 'terse'.`);
}
return val;
})();
// HostPaths, HOST_PATHS, and TemplateContext imported from ./resolvers/types (line 7-8)
// Design constants (AI_SLOP_BLACKLIST, OPENAI_HARD_REJECTIONS, OPENAI_LITMUS_CHECKS)
// live in ./resolvers/constants and are consumed by resolvers directly.
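Review note: the `--catalog-mode` and `--explain-level` blocks in the hunk above share the same parse-and-validate shape. A shared helper might look like the sketch below. `parseEnumFlag` is a hypothetical name that does not exist in the commit. It also matches the flag exactly (or with `=`), whereas the `startsWith` check in the commit would accept any argument that merely begins with the flag name.

```typescript
// Hypothetical helper: `parseEnumFlag` is not part of the commit.
function parseEnumFlag<T extends string>(
  argv: string[],
  name: string,
  allowed: readonly T[],
  fallback: T,
): T {
  const idx = argv.findIndex(a => a === name || a.startsWith(name + '='));
  if (idx === -1) return fallback;
  const token = argv[idx];
  // Support both `--flag=value` and `--flag value`.
  const val = token.includes('=') ? token.slice(token.indexOf('=') + 1) : argv[idx + 1];
  if (!(allowed as readonly string[]).includes(val)) {
    throw new Error(`Unknown value for ${name}: ${val}. Allowed: ${allowed.join(', ')}.`);
  }
  return val as T;
}

const catalogMode = parseEnumFlag(['--catalog-mode=full'], '--catalog-mode', ['trim', 'full'] as const, 'trim');
const explainLevel = parseEnumFlag(['--explain-level', 'terse'], '--explain-level', ['default', 'terse'] as const, 'default');
```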
@ -172,6 +207,169 @@ function processVoiceTriggers(content: string): string {
// Export for testing
export { extractVoiceTriggers, processVoiceTriggers };
// ─── Catalog Trim (v1.45.0.0 T4) ─────────────────────────────
//
// Frontmatter `description:` blocks today pack: a one-line outcome, "Use when
// asked to..." voice triggers, "Proactively..." routing guidance, and a
// "(gstack)" tag. This pile is the always-loaded catalog surface — every
// session pays for the full text. The catalog trim splits the description
// into a one-line catalog entry (lead sentence + "(gstack)") that stays in
// the frontmatter, and a "## When to invoke" body section that holds the
// routing/voice triggers prose for in-skill discovery. A registry written
// to scripts/proactive-suggestions.json (one entry per skill) makes routing
// available to agents that need it without paying the always-loaded cost.
//
// Opt-out: `--catalog-mode=full` keeps v1.44 behavior (no trim, full
// description in frontmatter). Use when debugging routing regressions or
// when shipping skills to hosts that depend on the legacy fat catalog.
export interface CatalogParts {
lead: string; // First sentence — kept in catalog
routingProse: string; // "Use when asked to...", "Proactively..." paragraphs
voiceLine: string | null; // "Voice triggers (speech-to-text aliases): ..." line if present
hasGstackTag: boolean;
}
export function splitCatalogDescription(description: string): CatalogParts {
// Voice triggers line (folded in by processVoiceTriggers earlier)
const voiceMatch = description.match(/Voice triggers \(speech-to-text aliases\):[^\n]+/);
const voiceLine = voiceMatch ? voiceMatch[0] : null;
let working = voiceLine ? description.replace(voiceLine, '').trim() : description.trim();
const hasGstackTag = /\(gstack\)/.test(working);
if (hasGstackTag) working = working.replace(/\(gstack\)/, '').trim();
// Lead = first sentence: text up to the first ".", "!" or "?" that is followed
// by whitespace or end of text.
// Caveat: `[^.!?]*` cannot cross punctuation, so if the first "." sits inside
// a token (a URL, "v1.45.0.0") the regex does not match at all and the
// 20-word fallback below is used instead.
// The text is collapsed to a single line for detection; line breaks are restored later.
const collapsed = working.replace(/\s+/g, ' ').trim();
const sentenceMatch = collapsed.match(/^([^.!?]*[.!?])(?:\s|$)/);
// sentenceLead is the FULL first sentence (no truncation). We compute routing
// from this position, then optionally truncate the displayed lead afterwards.
// Truncating first then computing routing was the v1.45.0.0 bug — when the
// first sentence exceeded 200 chars, the routing extraction would lose the
// entire tail of the description (design-consultation's "Use when..."
// routing prose silently dropped).
const sentenceLead = sentenceMatch ? sentenceMatch[1].trim() : collapsed.split(/\s/).slice(0, 20).join(' ');
// Routing prose: everything AFTER the first sentence boundary in the collapsed view.
const leadInCollapsed = collapsed.indexOf(sentenceLead);
const routingCollapsed = leadInCollapsed >= 0
? collapsed.slice(leadInCollapsed + sentenceLead.length).trim()
: '';
// Now produce the displayed lead — truncated if too long. The original
// sentenceLead is preserved for routing extraction below.
let lead = sentenceLead;
if (lead.length > 200) {
const trunc = lead.slice(0, 197);
const lastSpace = trunc.lastIndexOf(' ');
lead = (lastSpace > 60 ? trunc.slice(0, lastSpace) : trunc) + '...';
}
// Restore line breaks for routing prose by mapping back to original layout.
// Use original whitespace structure where possible; fall back to collapsed.
// Anchor recovery on sentenceLead (the untruncated first sentence) — not
// `lead` (which may have a "..." suffix and won't substring-match `working`).
let routingProse = routingCollapsed;
const collapsedLeadIdx = working.replace(/\s+/g, ' ').indexOf(sentenceLead);
if (collapsedLeadIdx >= 0) {
let consumed = 0;
let cut = 0;
for (let i = 0; i < working.length && consumed < collapsedLeadIdx + sentenceLead.length; i++) {
if (/\s/.test(working[i])) {
if (i === 0 || /\s/.test(working[i - 1])) continue;
consumed += 1;
} else {
consumed += 1;
}
cut = i + 1;
}
const tail = working.slice(cut).trim();
if (tail.length > 0) routingProse = tail;
}
return { lead, routingProse, voiceLine, hasGstackTag };
}
/** Build the catalog-trimmed `description:` block. */
export function buildTrimmedDescription(parts: CatalogParts): string {
const lead = parts.lead.trim();
const suffix = parts.hasGstackTag ? ' (gstack)' : '';
return `${lead}${suffix}`;
}
/** Build the body section that holds the routing/voice prose. */
export function buildWhenToInvokeSection(parts: CatalogParts): string {
const lines: string[] = ['## When to invoke this skill', ''];
if (parts.routingProse) {
lines.push(parts.routingProse);
lines.push('');
}
if (parts.voiceLine) {
lines.push(parts.voiceLine);
lines.push('');
}
return lines.join('\n');
}
/**
* Apply catalog trim to a SKILL.md body:
* - shorten frontmatter `description:` to lead + (gstack)
* - insert "## When to invoke" body section AFTER the generated header
* (so it lands near the top of body content, where routing guidance
* belongs)
*
* Returns the rewritten content plus the parts (used for proactive-suggestions
* JSON aggregation at the end of the run).
*/
export function applyCatalogTrim(content: string, skillName: string): { content: string; parts: CatalogParts } | null {
// Locate description block in frontmatter
if (!content.startsWith('---\n')) return null;
const fmEnd = content.indexOf('\n---', 4);
if (fmEnd === -1) return null;
const frontmatter = content.slice(4, fmEnd);
// Match `description: |` block + indented body lines
const descMatch = frontmatter.match(/^description:\s*\|?\s*\n((?:\s{2,}.*(?:\n|$))+)/m)
|| frontmatter.match(/^description:\s+(.+)$/m);
if (!descMatch) return null;
// Extract full description text
let descText: string;
if (descMatch[0].startsWith('description: |') || /^description:\s*\|/.test(descMatch[0])) {
descText = descMatch[1].split('\n').map(l => l.replace(/^\s{2}/, '')).join('\n').trim();
} else {
descText = descMatch[1].trim();
}
// Skip skills with very short descriptions (already trimmed or no routing prose).
// Below ~120 chars, splitting adds no value.
if (descText.length < 120) return null;
const parts = splitCatalogDescription(descText);
// If lead + (gstack) is already most of the text, no trim needed.
const trimmedLen = buildTrimmedDescription(parts).length;
if (trimmedLen >= descText.length - 20) return null;
// Replace description in frontmatter — keep trailing newline so the next
// YAML field doesn't collide on the same line as the description value.
const newDesc = buildTrimmedDescription(parts);
const newFrontmatter = frontmatter.replace(descMatch[0], `description: ${newDesc}\n`);
let newContent = '---\n' + newFrontmatter + content.slice(fmEnd);
// Insert body section after frontmatter (after the closing ---\n and any
// existing GENERATED header). We insert before the first non-comment line.
const bodyStart = newContent.indexOf('\n---\n') + 5;
const whenToInvoke = '\n' + buildWhenToInvokeSection(parts).trim() + '\n';
// Skip past the generated header if present (it lives after frontmatter close)
const headerMatch = newContent.slice(bodyStart).match(/^(<!--[^>]*-->\s*\n)+/);
const insertAt = bodyStart + (headerMatch ? headerMatch[0].length : 0);
newContent = newContent.slice(0, insertAt) + whenToInvoke + '\n' + newContent.slice(insertAt);
return { content: newContent, parts };
}
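Review note: a reduced sketch of the lead/routing split, to make the intended result concrete. It is not the commit's implementation: it omits voice-line handling, truncation, and line-break recovery, and it swaps in a lazy `.*?` match where the commit uses `[^.!?]*`, so a period inside a token such as a version string does not prevent a match. `splitLead` and `SplitResult` are hypothetical names.

```typescript
// Simplified, hypothetical restatement of the lead/routing split.
interface SplitResult {
  lead: string;
  routing: string;
  hasGstackTag: boolean;
}

function splitLead(description: string): SplitResult {
  const hasGstackTag = description.includes('(gstack)');
  const collapsed = description.replace('(gstack)', '').replace(/\s+/g, ' ').trim();
  // Lazy match: stop at the first ., ! or ? that is followed by whitespace
  // or end of text, so "v1.45.0.0" does not end the sentence early.
  const m = collapsed.match(/^(.*?[.!?])(?:\s|$)/);
  const lead = m ? m[1] : collapsed;
  const routing = collapsed.slice(lead.length).trim();
  return { lead, routing, hasGstackTag };
}

const parts = splitLead(
  'Pre-landing PR review. Analyzes diff against the base branch.\n' +
  'Use when asked to "review this PR". (gstack)',
);
const versioned = splitLead('Ships in v1.45.0.0 today. More text.');
```

With this input, `parts.lead` is the one-line catalog entry and `parts.routing` is the prose that would move into the "When to invoke" body section.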
const OPENAI_SHORT_DESCRIPTION_LIMIT = 120;
function condenseOpenAIShortDescription(description: string): string {
@ -401,7 +599,7 @@ function processExternalHost(
return { content: result, outputPath, outputDir, symlinkLoop };
}
function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath: string; content: string; symlinkLoop?: boolean } {
function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath: string; content: string; symlinkLoop?: boolean; catalogParts?: CatalogParts | null } {
const tmplContent = fs.readFileSync(tmplPath, 'utf-8');
const relTmplPath = path.relative(ROOT, tmplPath);
let outputPath = tmplPath.replace(/\.tmpl$/, '');
@ -430,7 +628,7 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host], preambleTier, model: MODEL_ARG_VAL, interactive };
const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host], preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL };
// Replace placeholders (supports parameterized: {{NAME:arg1:arg2}})
// Config-driven: suppressedResolvers return empty string for this host
@ -441,9 +639,11 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
const resolverName = parts[0];
const args = parts.slice(1);
if (suppressed.has(resolverName)) return '';
const resolver = RESOLVERS[resolverName];
if (!resolver) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
return args.length > 0 ? resolver(ctx, args) : resolver(ctx);
const entry = RESOLVERS[resolverName];
if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
const { resolve, appliesTo } = unwrapResolver(entry);
if (appliesTo && !appliesTo(ctx)) return '';
return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
});
// Check for any remaining unresolved placeholders
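Review note: `unwrapResolver` is imported from `./resolvers/types`, which this diff does not show. One plausible reading of the call site is that a registry entry may be either a bare function or an object carrying an `appliesTo` gate. The sketch below illustrates that reading only; every type and name in it (`Ctx`, `ResolverEntry`, `unwrapSketch`, `render`) is an assumption.

```typescript
// Assumed shapes: the real definitions live in ./resolvers/types and are
// not visible in this diff.
interface Ctx { host: string; }
type ResolveFn = (ctx: Ctx, args?: string[]) => string;
interface GatedResolver { resolve: ResolveFn; appliesTo?: (ctx: Ctx) => boolean; }
type ResolverEntry = ResolveFn | GatedResolver;

// Normalize both entry shapes to { resolve, appliesTo? }.
function unwrapSketch(entry: ResolverEntry): GatedResolver {
  return typeof entry === 'function' ? { resolve: entry } : entry;
}

function render(entry: ResolverEntry, ctx: Ctx, args: string[] = []): string {
  const { resolve, appliesTo } = unwrapSketch(entry);
  if (appliesTo && !appliesTo(ctx)) return ''; // gated out for this context
  return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
}

const plain: ResolverEntry = () => 'always';
const claudeOnly: ResolverEntry = {
  resolve: () => 'claude-only',
  appliesTo: ctx => ctx.host === 'claude',
};
```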
@ -483,7 +683,17 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
content = header + content;
}
return { outputPath, content, symlinkLoop };
// Catalog trim (Claude only — external hosts have their own frontmatter shapes)
let catalogParts: CatalogParts | null = null;
if (host === 'claude' && CATALOG_MODE === 'trim') {
const trimmed = applyCatalogTrim(content, skillName);
if (trimmed) {
content = trimmed.content;
catalogParts = trimmed.parts;
}
}
return { outputPath, content, symlinkLoop, catalogParts };
}
// ─── Main ───────────────────────────────────────────────────
@ -503,6 +713,14 @@ for (const currentHost of hostsToRun) {
let hasChanges = false;
const tokenBudget: Array<{ skill: string; lines: number; tokens: number }> = [];
// T4 catalog trim: collect routing/voice parts across all Claude skills,
// then write scripts/proactive-suggestions.json once per gen-skill-docs run.
const proactiveAggregate: Record<string, {
lead: string;
routing: string;
voice_line: string | null;
}> = {};
const currentHostConfig = getHostConfig(currentHost);
for (const tmplPath of findTemplates()) {
const dir = path.basename(path.dirname(tmplPath));
@ -516,7 +734,24 @@ for (const currentHost of hostsToRun) {
if (currentHostConfig.generation.skipSkills.includes(dir)) continue;
}
const { outputPath, content, symlinkLoop } = processTemplate(tmplPath, currentHost);
const { outputPath, content, symlinkLoop, catalogParts } = processTemplate(tmplPath, currentHost);
if (catalogParts) {
// Root-skill detection: when the template lives at ROOT/SKILL.md.tmpl,
// path.basename(path.dirname(tmplPath)) returns the repo's directory
// name (e.g. "seville-v3" in a Conductor worktree, "gstack" on CI).
// That's non-deterministic across machines and breaks CI freshness
// checks. Use the frontmatter `name` field as the registry key — the
// root SKILL.md.tmpl declares `name: gstack` explicitly. For all other
// skills, `dir` matches the directory name which matches the
// frontmatter name by convention.
const isRoot = path.dirname(tmplPath) === ROOT;
const key = isRoot ? 'gstack' : dir;
proactiveAggregate[key] = {
lead: catalogParts.lead,
routing: catalogParts.routingProse,
voice_line: catalogParts.voiceLine,
};
}
const relOutput = path.relative(ROOT, outputPath);
if (symlinkLoop) {
@ -620,6 +855,40 @@ The orchestrator will persist the plan link to its own memory/knowledge store.
failures.push({ host: currentHost, error: new Error('Stale files detected') });
}
// T4 catalog trim: write aggregated proactive-suggestions.json (Claude only).
// The JSON registry lets agents pull voice triggers / routing prose for any
// skill on demand instead of paying for it always-loaded in the catalog.
//
// No timestamp field — keeps the file content-deterministic across runs so
// CI dry-run freshness checks don't flap on regen. If a per-run timestamp
// is ever needed for debugging, write it to a separate `.gen-stamp` file.
if (currentHost === 'claude' && CATALOG_MODE === 'trim' && Object.keys(proactiveAggregate).length > 0 && !DRY_RUN) {
const proactivePath = path.join(ROOT, 'scripts', 'proactive-suggestions.json');
// Sort keys alphabetically so the serialized JSON is identical across
// machines regardless of filesystem-iteration order. Without this, CI
// freshness checks fail when the local dev machine and CI runner
// discover templates in different orders.
const sortedSkills: typeof proactiveAggregate = {};
for (const key of Object.keys(proactiveAggregate).sort()) {
sortedSkills[key] = proactiveAggregate[key];
}
const payload = {
$schema: 'https://gstack.dev/schemas/proactive-suggestions.json',
catalog_mode: 'trim',
note: 'Routing / voice-trigger prose extracted from SKILL.md frontmatter descriptions during catalog trim. Loaded on demand when routing guidance is needed.',
skills: sortedSkills,
};
const serialized = JSON.stringify(payload, null, 2) + '\n';
// Only write if content actually changed — prevents needless touches that
// would flap CI freshness checks. Read existing file, compare, skip write
// when identical.
let existing = '';
try { existing = fs.readFileSync(proactivePath, 'utf-8'); } catch { /* first run */ }
if (existing !== serialized) {
fs.writeFileSync(proactivePath, serialized);
}
}
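Review note: the two determinism measures in the block above (alphabetical key order, write only when content differs) can be exercised in isolation. The sketch below uses a temporary directory; `sortKeys` and `writeIfChanged` are hypothetical helper names, not functions from the commit.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical helpers; names are assumptions for illustration.
function sortKeys<T>(obj: Record<string, T>): Record<string, T> {
  const out: Record<string, T> = {};
  for (const key of Object.keys(obj).sort()) out[key] = obj[key];
  return out;
}

// Returns true when the file was (re)written, false when content matched.
function writeIfChanged(file: string, serialized: string): boolean {
  let existing: string | null = null;
  try { existing = fs.readFileSync(file, 'utf-8'); } catch { /* missing file */ }
  if (existing === serialized) return false;
  fs.writeFileSync(file, serialized);
  return true;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'proactive-'));
const file = path.join(dir, 'proactive-suggestions.json');
// Same entries discovered in two different orders serialize identically.
const a = JSON.stringify({ skills: sortKeys({ review: 1, browse: 2 }) }, null, 2) + '\n';
const b = JSON.stringify({ skills: sortKeys({ browse: 2, review: 1 }) }, null, 2) + '\n';
const wroteFirst = writeIfChanged(file, a);
const wroteSecond = writeIfChanged(file, b);
```

The second call skips the write, which is the property the CI freshness check relies on.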
// Print token budget summary
if (!DRY_RUN && tokenBudget.length > 0) {
tokenBudget.sort((a, b) => b.lines - a.lines);


@ -0,0 +1,267 @@
{
"$schema": "https://gstack.dev/schemas/proactive-suggestions.json",
"catalog_mode": "trim",
"note": "Routing / voice-trigger prose extracted from SKILL.md frontmatter descriptions during catalog trim. Loaded on demand when routing guidance is needed.",
"skills": {
"autoplan": {
"lead": "Auto-review pipeline — reads the full CEO, design, eng, and DX review skills from disk and runs them sequentially with auto-decisions using 6 decision principles.",
"routing": "Surfaces\ntaste decisions (close approaches, borderline scope, codex disagreements) at a final\napproval gate. One command, fully reviewed plan out.\nUse when asked to \"auto review\", \"autoplan\", \"run all reviews\", \"review this plan\nautomatically\", or \"make the decisions for me\".\nProactively suggest when the user has a plan file and wants to run the full review\ngauntlet without answering 15-30 intermediate questions.",
"voice_line": "Voice triggers (speech-to-text aliases): \"auto plan\", \"automatic review\"."
},
"benchmark": {
"lead": "Performance regression detection using the browse daemon.",
"routing": "Establishes\nbaselines for page load times, Core Web Vitals, and resource sizes.\nCompares before/after on every PR. Tracks performance trends over time.\nUse when: \"performance\", \"benchmark\", \"page speed\", \"lighthouse\", \"web vitals\",\n\"bundle size\", \"load time\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"speed test\", \"check performance\"."
},
"benchmark-models": {
"lead": "Cross-model benchmark for gstack skills.",
"routing": "Runs the same prompt through Claude,\nGPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,\nand optionally quality via LLM judge. Answers \"which model is actually best\nfor this skill?\" with data instead of vibes. Separate from /benchmark, which\nmeasures web page performance. Use when: \"benchmark models\", \"compare models\",\n\"which model is best for X\", \"cross-model comparison\", \"model shootout\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"compare models\", \"model shootout\", \"which model is best\"."
},
"browse": {
"lead": "Fast headless browser for QA testing and site dogfooding.",
"routing": "Navigate any URL, interact with\nelements, verify page state, diff before/after actions, take annotated screenshots, check\nresponsive layouts, test forms and uploads, handle dialogs, and assert element states.\n~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a\nuser flow, or file a bug with evidence. Use when asked to \"open in browser\", \"test the\nsite\", \"take a screenshot\", or \"dogfood this\".",
"voice_line": null
},
"canary": {
"lead": "Post-deploy canary monitoring.",
"routing": "Watches the live app for console errors,\nperformance regressions, and page failures using the browse daemon. Takes\nperiodic screenshots, compares against pre-deploy baselines, and alerts\non anomalies. Use when: \"monitor deploy\", \"canary\", \"post-deploy check\",\n\"watch production\", \"verify deploy\".",
"voice_line": null
},
"careful": {
"lead": "Safety guardrails for destructive commands.",
"routing": "Warns before rm -rf, DROP TABLE,\nforce-push, git reset --hard, kubectl delete, and similar destructive operations.\nUser can override each warning. Use when touching prod, debugging live systems,\nor working in a shared environment. Use when asked to \"be careful\", \"safety mode\",\n\"prod mode\", or \"careful mode\".",
"voice_line": null
},
"codex": {
"lead": "OpenAI Codex CLI wrapper — three modes.",
"routing": "Code review: independent diff review via\ncodex review with pass/fail gate. Challenge: adversarial mode that tries to break\nyour code. Consult: ask codex anything with session continuity for follow-ups.\nThe \"200 IQ autistic developer\" second opinion. Use when asked to \"codex review\",\n\"codex challenge\", \"ask codex\", \"second opinion\", or \"consult codex\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"code x\", \"code ex\", \"get another opinion\"."
},
"context-restore": {
"lead": "Restore working context saved earlier by /context-save.",
"routing": "Loads the most recent\nsaved state (across all branches by default) so you can pick up where you\nleft off — even across Conductor workspace handoffs.\nUse when asked to \"resume\", \"restore context\", \"where was I\", or\n\"pick up where I left off\". Pair with /context-save.\nFormerly /checkpoint resume — renamed because Claude Code treats /checkpoint\nas a native rewind alias in current environments.",
"voice_line": null
},
"context-save": {
"lead": "Save working context.",
"routing": "Captures git state, decisions made, and remaining work\nso any future session can pick up without losing a beat.\nUse when asked to \"save progress\", \"save state\", \"context save\", or\n\"save my work\". Pair with /context-restore to resume later.\nFormerly /checkpoint — renamed because Claude Code treats /checkpoint as a\nnative rewind alias in current environments, which was shadowing this skill.",
"voice_line": null
},
"cso": {
"lead": "Chief Security Officer mode.",
"routing": "Infrastructure-first security audit: secrets archaeology,\ndependency supply chain, CI/CD pipeline security, LLM/AI security, skill supply chain\nscanning, plus OWASP Top 10, STRIDE threat modeling, and active verification.\nTwo modes: daily (zero-noise, 8/10 confidence gate) and comprehensive (monthly deep\nscan, 2/10 bar). Trend tracking across audit runs.\nUse when: \"security audit\", \"threat model\", \"pentest review\", \"OWASP\", \"CSO review\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"see-so\", \"see so\", \"security review\", \"security check\", \"vulnerability scan\", \"run security\"."
},
"design-consultation": {
"lead": "Design consultation: understands your product, researches the landscape, proposes a complete design system (aesthetic, typography, color, layout, spacing, motion), and generates font+color preview...",
"routing": "Creates DESIGN.md as your project's design source\nof truth. For existing sites, use /plan-design-review to infer the system instead.\nUse when asked to \"design system\", \"brand guidelines\", or \"create DESIGN.md\".\nProactively suggest when starting a new project's UI with no existing\ndesign system or DESIGN.md.",
"voice_line": null
},
"design-html": {
"lead": "Design finalization: generates production-quality Pretext-native HTML/CSS.",
"routing": "Works with approved mockups from /design-shotgun, CEO plans from /plan-ceo-review,\ndesign review context from /plan-design-review, or from scratch with a user\ndescription. Text actually reflows, heights are computed, layouts are dynamic.\n30KB overhead, zero deps. Smart API routing: picks the right Pretext patterns\nfor each design type. Use when: \"finalize this design\", \"turn this into HTML\",\n\"build me a page\", \"implement this design\", or after any planning skill.\nProactively suggest when user has approved a design or has a plan ready.",
"voice_line": "Voice triggers (speech-to-text aliases): \"build the design\", \"code the mockup\", \"make it real\"."
},
"design-review": {
"lead": "Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems, AI slop patterns, and slow interactions — then fixes them.",
"routing": "Iteratively fixes issues\nin source code, committing each fix atomically and re-verifying with before/after\nscreenshots. For plan-mode design review (before implementation), use /plan-design-review.\nUse when asked to \"audit the design\", \"visual QA\", \"check if it looks good\", or \"design polish\".\nProactively suggest when the user mentions visual inconsistencies or\nwants to polish the look of a live site.",
"voice_line": null
},
"design-shotgun": {
"lead": "Design shotgun: generate multiple AI design variants, open a comparison board, collect structured feedback, and iterate.",
"routing": "Standalone design exploration you can\nrun anytime. Use when: \"explore designs\", \"show me options\", \"design variants\",\n\"visual brainstorm\", or \"I don't like how this looks\".\nProactively suggest when the user describes a UI feature but hasn't seen\nwhat it could look like.",
"voice_line": null
},
"devex-review": {
"lead": "Live developer experience audit.",
"routing": "Uses the browse tool to actually TEST the\ndeveloper experience: navigates docs, tries the getting started flow, times\nTTHW, screenshots error messages, evaluates CLI help text. Produces a DX\nscorecard with evidence. Compares against /plan-devex-review scores if they\nexist (the boomerang: plan said 3 minutes, reality says 8). Use when asked to\n\"test the DX\", \"DX audit\", \"developer experience test\", or \"try the\nonboarding\". Proactively suggest after shipping a developer-facing feature.",
"voice_line": "Voice triggers (speech-to-text aliases): \"dx audit\", \"test the developer experience\", \"try the onboarding\", \"developer experience test\"."
},
"document-generate": {
"lead": "Generate missing documentation from scratch for a feature, module, or entire project.",
"routing": "Uses the Diataxis framework (tutorial / how-to / reference / explanation) to produce\ncomplete, structured documentation. Can be invoked standalone or called by\n/document-release when it finds coverage gaps. Use when asked to \"write docs\",\n\"generate documentation\", \"document this feature\", \"create a tutorial\", or\n\"explain this module\".",
"voice_line": null
},
"document-release": {
"lead": "Post-ship documentation update.",
"routing": "Reads all project docs, cross-references the\ndiff, builds a Diataxis coverage map (reference/how-to/tutorial/explanation),\nupdates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped,\ndetects architecture diagram drift, polishes CHANGELOG voice with a sell-test\nrubric, cleans up TODOS, and optionally bumps VERSION. Surfaces documentation\ndebt in the PR body. Use when asked to \"update the docs\", \"sync documentation\",\nor \"post-ship docs\". Proactively suggest after a PR is merged or code is shipped.",
"voice_line": null
},
"freeze": {
"lead": "Restrict file edits to a specific directory for the session.",
"routing": "Blocks Edit and\nWrite outside the allowed path. Use when debugging to prevent accidentally\n\"fixing\" unrelated code, or when you want to scope changes to one module.\nUse when asked to \"freeze\", \"restrict edits\", \"only edit this folder\",\nor \"lock down edits\".",
"voice_line": null
},
"gstack": {
"lead": "Fast headless browser for QA testing and site dogfooding.",
"routing": "Navigate pages, interact with\nelements, verify state, diff before/after, take annotated screenshots, test responsive\nlayouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or\ntest a site, verify a deployment, dogfood a user flow, or file a bug with screenshots.",
"voice_line": null
},
"gstack-upgrade": {
"lead": "Upgrade gstack to the latest version.",
"routing": "Detects global vs vendored install,\nruns the upgrade, and shows what's new. Use when asked to \"upgrade gstack\",\n\"update gstack\", or \"get latest version\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"upgrade the tools\", \"update the tools\", \"gee stack upgrade\", \"g stack upgrade\"."
},
"guard": {
"lead": "Full safety mode: destructive command warnings + directory-scoped edits.",
"routing": "Combines /careful (warns before rm -rf, DROP TABLE, force-push, etc.) with\n/freeze (blocks edits outside a specified directory). Use for maximum safety\nwhen touching prod or debugging live systems. Use when asked to \"guard mode\",\n\"full safety\", \"lock it down\", or \"maximum safety\".",
"voice_line": null
},
"health": {
"lead": "Code quality dashboard.",
"routing": "Wraps existing project tools (type checker, linter,\ntest runner, dead code detector, shell linter), computes a weighted composite\n0-10 score, and tracks trends over time. Use when: \"health check\",\n\"code quality\", \"how healthy is the codebase\", \"run all checks\",\n\"quality score\".",
"voice_line": null
},
"investigate": {
"lead": "Systematic debugging with root cause investigation.",
"routing": "Four phases: investigate,\nanalyze, hypothesize, implement. Iron Law: no fixes without root cause.\nUse when asked to \"debug this\", \"fix this bug\", \"why is this broken\",\n\"investigate this error\", or \"root cause analysis\".\nProactively invoke this skill (do NOT debug directly) when the user reports\nerrors, 500 errors, stack traces, unexpected behavior, \"it was working\nyesterday\", or is troubleshooting why something stopped working.",
"voice_line": null
},
"ios-clean": {
"lead": "Remove the DebugBridge SPM package and all #if DEBUG wiring from an iOS app.",
"routing": "Cleans up StateServer, DebugOverlay, accessor codegen output, and\napp-side hooks installed by /ios-qa. This is a convenience wrapper —\nthe structural Release-build guard (Package.swift conditional + CI\nswift build -c release check) is the safety-critical path.\nUse when asked to \"clean the iOS debug bridge\", \"remove DebugBridge\",\nor \"strip the gstack iOS instrumentation\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"clean the iOS debug bridge\", \"remove DebugBridge\", \"strip the gstack iOS instrumentation\"."
},
"ios-design-review": {
"lead": "Visual design audit for iOS apps on real hardware.",
"routing": "Connects to a real\niPhone via the same StateServer as /ios-qa, screenshots every screen,\nevaluates against Apple HIG, DESIGN.md, and design best practices. Scores\neach dimension 0-10 with \"what would make it a 10\" framing — mirrors\n/plan-design-review for browser. For plan-stage design review (before\nimplementation), use /plan-design-review. For live web visual audits, use\n/design-review.\nUse when asked to \"review the iOS design\", \"audit the iPhone app's\nvisuals\", or \"design QA the iOS app\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"review the iOS design\", \"audit the iPhone app's visuals\", \"design QA the iPhone app\"."
},
"ios-fix": {
"lead": "Autonomous iOS bug fixer.",
"routing": "Takes a bug found by /ios-qa, reads the source,\nwrites the fix, rebuilds, redeploys, and verifies the fix on the real\ndevice. Closes the loop: find bug → fix bug → confirm fix — zero human\nintervention. Captures the pre-bug state snapshot as a regression test\nfixture, so the bug can never recur silently.\nUse when /ios-qa reports a bug and you want it fixed automatically, or\nwhen asked to \"fix this iOS bug\", \"patch the iPhone app\", or \"auto-fix\nthe iOS issue\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"fix the iOS bug\", \"patch the iPhone app\", \"auto-fix the iOS issue\"."
},
"ios-qa": {
"lead": "Live-device iOS QA for SwiftUI apps.",
"routing": "Connects to a real iPhone via USB\nCoreDevice IPv6 tunnel, reads Swift source to understand every screen, then\nruns a vision-driven agent loop: screenshot → analyze → decide → act →\nverify → repeat. All interaction happens via HTTP to an embedded\nStateServer in the app under test. Optionally exposes the device over\nTailscale so remote agents (OpenClaw, Codex, any HTTP-capable agent) can\nrun iOS QA from anywhere without touching the hardware.\nUse when asked to \"ios qa\", \"test my iPhone app\", \"find bugs on the device\",\nor \"qa the iOS app\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"iOS quality check\", \"test the iPhone app\", \"run iOS QA\"."
},
"ios-sync": {
"lead": "Regenerate the iOS debug bridge against the latest upstream gstack templates.",
"routing": "Updates StateServer.swift, DebugOverlay.swift, Package.swift,\nand the typed @Observable state accessors. Use after you upgrade gstack\nor add new ViewModels/properties that need accessor coverage.\nUse when asked to \"resync the iOS debug bridge\", \"regenerate iOS\naccessors\", or \"update the gstack iOS instrumentation\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"resync the iOS debug bridge\", \"regenerate iOS accessors\", \"update the gstack iOS instrumentation\"."
},
"land-and-deploy": {
"lead": "Land and deploy workflow.",
"routing": "Merges the PR, waits for CI and deploy,\nverifies production health via canary checks. Takes over after /ship\ncreates the PR. Use when: \"merge\", \"land\", \"deploy\", \"merge and verify\",\n\"land it\", \"ship it to production\".",
"voice_line": null
},
"landing-report": {
"lead": "Read-only queue dashboard for workspace-aware ship.",
"routing": "Shows which VERSION slots\nare currently claimed by open PRs, which sibling Conductor workspaces have\nWIP work likely to ship soon, and what slot /ship would pick next. No\nmutations — just a snapshot. Use when asked to \"landing report\", \"what's in\nthe queue\", \"show me open PRs\", or \"which version do I claim next\".",
"voice_line": null
},
"learn": {
"lead": "Manage project learnings.",
"routing": "Review, search, prune, and export what gstack\nhas learned across sessions. Use when asked to \"what have we learned\",\n\"show learnings\", \"prune stale learnings\", or \"export learnings\".\nProactively suggest when the user asks about past patterns or wonders\n\"didn't we fix this before?\"",
"voice_line": null
},
"make-pdf": {
"lead": "Turn any markdown file into a publication-quality PDF.",
"routing": "Proper 1in margins,\nintelligent page breaks, page numbers, cover pages, running headers, curly\nquotes and em dashes, clickable TOC, diagonal DRAFT watermark. Not a draft\nartifact — a finished artifact. Use when asked to \"make a PDF\", \"export to\nPDF\", \"turn this markdown into a PDF\", or \"generate a document\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"make this a pdf\", \"make it a pdf\", \"export to pdf\", \"turn this into a pdf\", \"turn this markdown into a pdf\", \"generate a pdf\", \"make a pdf from\", \"pdf this markdown\"."
},
"office-hours": {
"lead": "YC Office Hours — two modes.",
"routing": "Startup mode: six forcing questions that expose\ndemand reality, status quo, desperate specificity, narrowest wedge, observation,\nand future-fit. Builder mode: design thinking brainstorming for side projects,\nhackathons, learning, and open source. Saves a design doc.\nUse when asked to \"brainstorm this\", \"I have an idea\", \"help me think through\nthis\", \"office hours\", or \"is this worth building\".\nProactively invoke this skill (do NOT answer directly) when the user describes\na new product idea, asks whether something is worth building, wants to think\nthrough design decisions for something that doesn't exist yet, or is exploring\na concept before any code is written.\nUse before /plan-ceo-review or /plan-eng-review.",
"voice_line": null
},
"open-gstack-browser": {
"lead": "Launch GStack Browser — AI-controlled Chromium with the sidebar extension baked in.",
"routing": "Opens a visible browser window where you can watch every action in real time.\nThe sidebar shows a live activity feed and chat. Anti-bot stealth built in.\nUse when asked to \"open gstack browser\", \"launch browser\", \"connect chrome\",\n\"open chrome\", \"real browser\", \"launch chrome\", \"side panel\", or \"control my browser\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"show me the browser\"."
},
"pair-agent": {
"lead": "Pair a remote AI agent with your browser.",
"routing": "One command generates a setup key and\nprints instructions the other agent can follow to connect. Works with OpenClaw,\nHermes, Codex, Cursor, or any agent that can make HTTP requests. The remote agent\ngets its own tab with scoped access (read+write by default, admin on request).\nUse when asked to \"pair agent\", \"connect agent\", \"share browser\", \"remote browser\",\n\"let another agent use my browser\", or \"give browser access\".",
"voice_line": "Voice triggers (speech-to-text aliases): \"pair agent\", \"connect agent\", \"share my browser\", \"remote browser access\"."
},
"plan-ceo-review": {
"lead": "CEO/founder-mode plan review.",
"routing": "Rethink the problem, find the 10-star product,\nchallenge premises, expand scope when it creates a better product. Four modes:\nSCOPE EXPANSION (dream big), SELECTIVE EXPANSION (hold scope + cherry-pick\nexpansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials).\nUse when asked to \"think bigger\", \"expand scope\", \"strategy review\", \"rethink this\",\nor \"is this ambitious enough\".\nProactively suggest when the user is questioning scope or ambition of a plan,\nor when the plan feels like it could be thinking bigger.",
"voice_line": null
},
"plan-design-review": {
"lead": "Designer's eye plan review — interactive, like CEO and Eng review.",
"routing": "Rates each design dimension 0-10, explains what would make it a 10,\nthen fixes the plan to get there. Works in plan mode. For live site\nvisual audits, use /design-review. Use when asked to \"review the design plan\"\nor \"design critique\".\nProactively suggest when the user has a plan with UI/UX components that\nshould be reviewed before implementation.",
"voice_line": null
},
"plan-devex-review": {
"lead": "Interactive developer experience plan review.",
"routing": "Explores developer personas,\nbenchmarks against competitors, designs magical moments, and traces friction\npoints before scoring. Three modes: DX EXPANSION (competitive advantage),\nDX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).\nUse when asked to \"DX review\", \"developer experience audit\", \"devex review\",\nor \"API design review\".\nProactively suggest when the user has a plan for developer-facing products\n(APIs, CLIs, SDKs, libraries, platforms, docs).",
"voice_line": "Voice triggers (speech-to-text aliases): \"dx review\", \"developer experience review\", \"devex review\", \"devex audit\", \"API design review\", \"onboarding review\"."
},
"plan-eng-review": {
"lead": "Eng manager-mode plan review.",
"routing": "Lock in the execution plan — architecture,\ndata flow, diagrams, edge cases, test coverage, performance. Walks through\nissues interactively with opinionated recommendations. Use when asked to\n\"review the architecture\", \"engineering review\", or \"lock in the plan\".\nProactively suggest when the user has a plan or design doc and is about to\nstart coding — to catch architecture issues before implementation.",
"voice_line": "Voice triggers (speech-to-text aliases): \"tech review\", \"technical review\", \"plan engineering review\"."
},
"plan-tune": {
"lead": "Self-tuning question sensitivity + developer psychographic for gstack (v1: observational).",
"routing": "Review which AskUserQuestion prompts fire across gstack skills, set per-question preferences\n(never-ask / always-ask / ask-only-for-one-way), inspect the dual-track\nprofile (what you declared vs what your behavior suggests), and enable/disable\nquestion tuning. Conversational interface — no CLI syntax required.\n\nUse when asked to \"tune questions\", \"stop asking me that\", \"too many questions\",\n\"show my profile\", \"what questions have I been asked\", \"show my vibe\",\n\"developer profile\", or \"turn off question tuning\". \n\nProactively suggest when the user says the same gstack question has come up before,\nor when they explicitly override a recommendation for the Nth time.",
"voice_line": null
},
"qa": {
"lead": "Systematically QA test a web application and fix bugs found.",
"routing": "Runs QA testing,\nthen iteratively fixes bugs in source code, committing each fix atomically and\nre-verifying. Use when asked to \"qa\", \"QA\", \"test this site\", \"find bugs\",\n\"test and fix\", or \"fix what's broken\".\nProactively suggest when the user says a feature is ready for testing\nor asks \"does this work?\". Three tiers: Quick (critical/high only),\nStandard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,\nfix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.",
"voice_line": "Voice triggers (speech-to-text aliases): \"quality check\", \"test the app\", \"run QA\"."
},
"qa-only": {
"lead": "Report-only QA testing.",
"routing": "Systematically tests a web application and produces a\nstructured report with health score, screenshots, and repro steps — but never\nfixes anything. Use when asked to \"just report bugs\", \"qa report only\", or\n\"test but don't fix\". For the full test-fix-verify loop, use /qa instead.\nProactively suggest when the user wants a bug report without any code changes.",
"voice_line": "Voice triggers (speech-to-text aliases): \"bug report\", \"just check for bugs\"."
},
"retro": {
"lead": "Weekly engineering retrospective.",
"routing": "Analyzes commit history, work patterns,\nand code quality metrics with persistent history and trend tracking.\nTeam-aware: breaks down per-person contributions with praise and growth areas.\nUse when asked to \"weekly retro\", \"what did we ship\", or \"engineering retrospective\".\nProactively suggest at the end of a work week or sprint.",
"voice_line": null
},
"review": {
"lead": "Pre-landing PR review.",
"routing": "Analyzes diff against the base branch for SQL safety, LLM trust\nboundary violations, conditional side effects, and other structural issues. Use when\nasked to \"review this PR\", \"code review\", \"pre-landing review\", or \"check my diff\".\nProactively suggest when the user is about to merge or land code changes.",
"voice_line": null
},
"scrape": {
"lead": "Pull data from a web page.",
"routing": "First call on a new intent prototypes the flow\nvia $B primitives and returns JSON. Subsequent calls on a matching intent\nroute to a codified browser-skill and return in ~200ms. Read-only — for\nmutating flows (form fills, clicks, submissions), use /automate.\nUse when asked to \"scrape\", \"get data from\", \"pull\", \"extract from\", or\n\"what's on\" a page.",
"voice_line": null
},
"setup-browser-cookies": {
"lead": "Import cookies from your real Chromium browser into the headless browse session.",
"routing": "Opens an interactive picker UI where you select which cookie domains to import.\nUse before QA testing authenticated pages. Use when asked to \"import cookies\",\n\"login to the site\", or \"authenticate the browser\".",
"voice_line": null
},
"setup-deploy": {
"lead": "Configure deployment settings for /land-and-deploy.",
"routing": "Detects your deploy\nplatform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom),\nproduction URL, health check endpoints, and deploy status commands. Writes\nthe configuration to CLAUDE.md so all future deploys are automatic.\nUse when: \"setup deploy\", \"configure deployment\", \"set up land-and-deploy\",\n\"how do I deploy with gstack\", \"add deploy config\".",
"voice_line": null
},
"setup-gbrain": {
"lead": "Set up gbrain for this coding agent: install the CLI, initialize a local PGLite or Supabase brain, register MCP, capture per-remote trust policy.",
"routing": "One command from zero to \"gbrain is running, and this agent\ncan call it.\" Use when: \"setup gbrain\", \"connect gbrain\", \"start\ngbrain\", \"install gbrain\", \"configure gbrain for this machine\".",
"voice_line": null
},
"ship": {
"lead": "Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR.",
"routing": "Use when asked to \"ship\", \"deploy\",\n\"push to main\", \"create a PR\", \"merge and push\", or \"get it deployed\".\nProactively invoke this skill (do NOT push/PR directly) when the user says code\nis ready, asks about deploying, wants to push code up, or asks to create a PR.",
"voice_line": null
},
"skillify": {
"lead": "Codify the most recent successful /scrape flow into a permanent browser-skill on disk.",
"routing": "Future /scrape calls with the same intent run\nthe codified script in ~200ms instead of re-driving the page. Walks\nback through the conversation, synthesizes script.ts + script.test.ts\n+ fixture, runs the test in a temp dir, and asks before committing.\nUse when asked to \"skillify\", \"codify\", \"save this scrape\", or\n\"make this permanent\".",
"voice_line": null
},
"sync-gbrain": {
"lead": "Keep gbrain current with this repo's code and refresh agent search guidance in CLAUDE.md. Wraps the gstack-gbrain-sync orchestrator with state",
"routing": "probing, native code-surface registration, capability checks,\nand a verdict block. Re-runnable, idempotent. Use when: \"sync gbrain\",\n\"refresh gbrain\", \"re-index this repo\", \"gbrain search isn't finding\nthings\".",
"voice_line": null
},
"unfreeze": {
"lead": "Clear the freeze boundary set by /freeze, allowing edits to all directories again.",
"routing": "Use when you want to widen edit scope without ending the session.\nUse when asked to \"unfreeze\", \"unlock edits\", \"remove freeze\", or\n\"allow all edits\".",
"voice_line": null
}
}
}
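
For orientation, each catalog entry above has the same three fields: `lead`, `routing`, and `voice_line` (a string or `null`). The following is a minimal sketch of how a consumer might type an entry and flatten it into a single description string. The interface name, the function name, and the flattening rule are assumptions for illustration, not the repository's actual code.

```typescript
// Hypothetical shape inferred from the entries above; the real type may differ.
interface SkillCatalogEntry {
  lead: string;
  routing: string;
  voice_line: string | null;
}

// Hypothetical helper: collapse hard-wrapped routing prose and join the
// non-empty parts into one line.
function renderDescription(entry: SkillCatalogEntry): string {
  const routing = entry.routing.replace(/\s*\n\s*/g, ' ').trim();
  return [entry.lead, routing, entry.voice_line]
    .filter((part): part is string => typeof part === 'string' && part.length > 0)
    .join(' ');
}
```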

View File

@ -891,8 +891,11 @@ If the JSON contains \`"regenerated": true\`:
1. Read \`regenerateAction\` (or \`remixSpec\` for remix requests)
2. Generate new variants with \`$D iterate\` or \`$D variants\` using updated brief
3. Create new board with \`$D compare\`
4. POST the new HTML to the running server via \`curl -X POST http://localhost:PORT/api/reload -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'\`
(parse the port from stderr: look for \`SERVE_STARTED: port=XXXXX\`)
4. POST the new HTML to the running board. Parse the board URL from stderr
(\`BOARD_URL: http://127.0.0.1:N/boards/<id>/\` — the daemon path) or fall
back to the legacy port (\`SERVE_STARTED: port=N\` — only emitted under
\`--no-daemon\`, hits \`/api/reload\` root). Daemon path:
\`curl -X POST "\${BOARD_URL}api/reload" -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'\`
5. Board auto-refreshes in the same tab
If \`"regenerated": false\`: proceed with the approved variant.
@ -919,8 +922,12 @@ This command generates the board HTML, starts an HTTP server on a random port,
and opens it in the user's default browser. **Run it in the background** with \`&\`
because the server needs to stay running while the user interacts with the board.
Parse the port from stderr output: \`SERVE_STARTED: port=XXXXX\`. You need this
for the board URL and for reloading during regeneration cycles.
Parse the board URL from stderr output. Default daemon path:
\`BOARD_URL: http://127.0.0.1:N/boards/<id>/\` (already includes the per-board
path; use this for the AskUserQuestion URL AND as the base for the reload
endpoint). Legacy \`--no-daemon\` path emits \`SERVE_STARTED: port=XXXXX\` and
serves a single board at \`/\`, with reload at \`/api/reload\` — only relevant
when an external caller explicitly passes \`--no-daemon\`.
**PRIMARY WAIT: AskUserQuestion with board URL**
@ -928,11 +935,14 @@ After the board is serving, use AskUserQuestion to wait for the user. Include th
board URL so they can click it if they lost the browser tab:
"I've opened a comparison board with the design variants:
http://127.0.0.1:<PORT>/ — Rate them, leave comments, remix
<BOARD_URL> - Rate them, leave comments, remix
elements you like, and click Submit when you're done. Let me know when you've
submitted your feedback (or paste your preferences here). If you clicked
Regenerate or Remix on the board, tell me and I'll generate new variants."
Substitute \`<BOARD_URL>\` with the URL parsed from stderr (the daemon path
emits \`BOARD_URL: http://127.0.0.1:N/boards/<id>/\`).
**Do NOT use AskUserQuestion to ask which variant the user prefers.** The comparison
board IS the chooser. AskUserQuestion is just the blocking wait mechanism.
@ -976,8 +986,13 @@ the approved variant.
2. If \`regenerateAction\` is \`"remix"\`, read \`remixSpec\` (e.g. \`{"layout":"A","colors":"B"}\`)
3. Generate new variants with \`$D iterate\` or \`$D variants\` using updated brief
4. Create new board: \`$D compare --images "..." --output "$_DESIGN_DIR/design-board.html"\`
5. Reload the board in the user's browser (same tab):
\`curl -s -X POST http://127.0.0.1:PORT/api/reload -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'\`
5. Reload the board in the user's browser (same tab). The URL is per-board
under daemon mode, so use \`<BOARD_URL>\` (from the \`BOARD_URL:\` stderr
line) as the base:
\`curl -s -X POST "\${BOARD_URL}api/reload" -H 'Content-Type: application/json' -d '{"html":"$_DESIGN_DIR/design-board.html"}'\`
Under \`--no-daemon\` the reload endpoint is \`/api/reload\` at the legacy
port; this path only matters if the caller explicitly opted out of the
daemon.
6. The board auto-refreshes. **AskUserQuestion again** with the same board URL to
wait for the next round of feedback. Repeat until \`feedback.json\` appears.
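
The stderr parsing described in this template can be sketched as follows. The two line formats (`BOARD_URL: ...` for the daemon path, `SERVE_STARTED: port=N` for `--no-daemon`) come from the text above. The type name, the function name, and the use of `127.0.0.1` as the legacy host are assumptions made for illustration.

```typescript
// Sketch only: names here are hypothetical, not part of the design tool.
type BoardEndpoints = {
  boardUrl: string;
  reloadUrl: string;
  mode: 'daemon' | 'legacy';
};

function parseBoardEndpoints(stderr: string): BoardEndpoints | null {
  // Daemon path: "BOARD_URL: http://127.0.0.1:N/boards/<id>/"
  const daemon = stderr.match(/^BOARD_URL:\s*(\S+)/m);
  if (daemon) {
    // Ensure a trailing slash so "api/reload" appends cleanly.
    const boardUrl = daemon[1].endsWith('/') ? daemon[1] : `${daemon[1]}/`;
    return { boardUrl, reloadUrl: `${boardUrl}api/reload`, mode: 'daemon' };
  }
  // Legacy path (--no-daemon): "SERVE_STARTED: port=N", single board at "/".
  const legacy = stderr.match(/^SERVE_STARTED:\s*port=(\d+)/m);
  if (legacy) {
    const boardUrl = `http://127.0.0.1:${legacy[1]}/`; // host is an assumption
    return { boardUrl, reloadUrl: `${boardUrl}api/reload`, mode: 'legacy' };
  }
  return null; // neither marker seen yet
}
```

Checking the daemon marker first mirrors the template's ordering: the legacy line is only expected when a caller explicitly passed `--no-daemon`.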

View File

@ -1,9 +1,20 @@
/**
* RESOLVERS record maps {{PLACEHOLDER}} names to generator functions.
* RESOLVERS record maps {{PLACEHOLDER}} names to generator functions
* or gated entries.
*
* Each resolver takes a TemplateContext and returns the replacement string.
* Resolvers may be either a bare function (always fires) or a gated entry
* ({ resolve, appliesTo }) where appliesTo can return false to skip the
* resolver for a given skill. See ./types.ts: ResolverEntry.
*
* Most resolvers don't need a gate: the {{NAME}} placeholder system is
* already conditional at the template level (the resolver only fires for
* skills that reference it). Use a gate when you want a structural
* guardrail that says "this placeholder is meaningful only in skills X, Y, Z"
* even if someone later adds {{NAME}} to skill W.
*/
import type { TemplateContext, ResolverFn } from './types';
import type { TemplateContext, ResolverFn, ResolverValue } from './types';
// Domain modules
import { generatePreamble } from './preamble';
@ -24,7 +35,7 @@ import { generateQuestionPreferenceCheck, generateQuestionLog, generateInlineTun
import { generateMakePdfSetup } from './make-pdf';
import { generateTasksSectionEmit, generateTasksSectionAggregate } from './tasks-section';
export const RESOLVERS: Record<string, ResolverFn> = {
export const RESOLVERS: Record<string, ResolverValue> = {
SLUG_EVAL: generateSlugEval,
SLUG_SETUP: generateSlugSetup,
COMMAND_REFERENCE: generateCommandReference,
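
The header comment distinguishes bare resolvers from gated entries (`{ resolve, appliesTo }`). A minimal sketch of how a caller might dispatch over the two forms is below. The real definitions live in `./types.ts` and are not shown in this diff, so every type here is a simplified stand-in, and `skillName`, `runResolver`, and the `null` return for a skipped gate are assumptions.

```typescript
// Simplified stand-ins for the types in ./types.ts (assumed, not copied).
interface TemplateContext { skillName: string }
type ResolverFn = (ctx: TemplateContext) => string;
interface ResolverEntry {
  resolve: ResolverFn;
  appliesTo: (ctx: TemplateContext) => boolean;
}
type ResolverValue = ResolverFn | ResolverEntry;

// Hypothetical dispatcher: bare functions always fire, gated entries
// fire only when appliesTo(ctx) is true.
function runResolver(value: ResolverValue, ctx: TemplateContext): string | null {
  if (typeof value === 'function') return value(ctx);
  return value.appliesTo(ctx) ? value.resolve(ctx) : null;
}

// Usage: a gate that limits a placeholder to two skills.
const gated: ResolverValue = {
  resolve: () => 'gated output',
  appliesTo: (ctx) => ['qa', 'ship'].includes(ctx.skillName),
};
```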

View File

@ -109,10 +109,10 @@ export function generatePreamble(ctx: TemplateContext): string {
...(tier >= 2 ? [
generateContextRecovery(ctx),
generateWritingStyle(ctx),
generateCompletenessSection(),
generateConfusionProtocol(),
generateCompletenessSection(ctx),
generateConfusionProtocol(ctx),
generateContinuousCheckpoint(),
generateContextHealth(),
generateContextHealth(ctx),
generateQuestionTuning(ctx),
] : []),
...(tier >= 3 ? [generateRepoModeSection(), generateSearchBeforeBuildingSection(ctx)] : []),

View File

@ -1,6 +1,7 @@
import type { TemplateContext } from '../types';
export function generateCompletenessSection(): string {
export function generateCompletenessSection(ctx?: TemplateContext): string {
if (ctx?.explainLevel === 'terse') return '';
return `## Completeness Principle — Boil the Lake
AI makes completeness cheap. Recommend complete lakes (tests, edge cases, error paths); flag oceans (rewrites, multi-quarter migrations).

View File

@ -1,4 +1,7 @@
export function generateConfusionProtocol(): string {
import type { TemplateContext } from '../types';
export function generateConfusionProtocol(ctx?: TemplateContext): string {
if (ctx?.explainLevel === 'terse') return '';
return `## Confusion Protocol
For high-stakes ambiguity (architecture, data model, destructive scope, missing context), STOP. Name it in one sentence, present 2-3 options with tradeoffs, and ask. Do not use for routine coding or obvious changes.`;

View File

@ -1,6 +1,7 @@
import type { TemplateContext } from '../types';
export function generateContextHealth(): string {
export function generateContextHealth(ctx?: TemplateContext): string {
if (ctx?.explainLevel === 'terse') return '';
return `## Context Health (soft directive)
During long-running skill sessions, periodically write a brief \`[PROGRESS]\` summary: done, next, surprises.
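
The three generators above share one pattern: an optional `ctx` parameter and an early empty-string return when `explainLevel` is `'terse'`. The sketch below shows how that pattern could compose in a section list. The trimmed-down `TemplateContext`, the section bodies, and the filter-then-join step are assumptions, since the diff does not show how the preamble treats empty sections.

```typescript
// Trimmed-down stand-in for the real TemplateContext (assumed).
interface TemplateContext { explainLevel?: 'default' | 'terse' }

// Same gate as in the diff: an omitted ctx behaves like 'default'.
function generateConfusionProtocol(ctx?: TemplateContext): string {
  if (ctx?.explainLevel === 'terse') return '';
  return '## Confusion Protocol\n\nFor high-stakes ambiguity, STOP and ask.';
}

// Hypothetical assembly step: drop empty sections before joining so a
// terse build leaves no stray blank blocks.
function buildPreambleSketch(ctx?: TemplateContext): string {
  const sections = [generateConfusionProtocol(ctx), '## Always Included'];
  return sections.filter((s) => s.length > 0).join('\n\n');
}
```

Making `ctx` optional keeps any existing zero-argument call sites compiling while new callers pass the context through.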

View File

@ -1,25 +1,24 @@
import * as fs from 'fs';
import * as path from 'path';
import type { TemplateContext } from '../types';
function loadJargonList(): string[] {
const jargonPath = path.join(__dirname, '..', '..', 'jargon-list.json');
try {
const raw = fs.readFileSync(jargonPath, 'utf-8');
const data = JSON.parse(raw);
if (Array.isArray(data?.terms)) return data.terms.filter((t: unknown): t is string => typeof t === 'string');
} catch {
// Missing or malformed: fall back to empty list. Writing Style block still fires,
// but with no terms to gloss — graceful degradation.
/**
* Writing Style preamble section.
*
* v1.45.0.0 changes (T3):
* - Jargon list is referenced by path, not inlined. The 80-term list was
* duplicated into every tier-2+ skill (~1.5-2 KB × 48 skills = ~80 KB
* across the corpus). The pointer asks the agent to Read the JSON on
* first jargon term encountered: one extra Read per session, but the
* per-corpus payload is ~30 bytes.
* - When `ctx.explainLevel === 'terse'`, the entire section is replaced
* with a one-line pointer. Saves ~1.5 KB per tier-2+ skill in the
* opt-in terse build.
*/
export function generateWritingStyle(ctx: TemplateContext): string {
if (ctx.explainLevel === 'terse') {
return `## Writing Style\n\nTerse mode (build-time): skip jargon glossing, outcome-framing layer, and decision-impact closers. Lead with the answer.\n`;
}
return [];
}
export function generateWritingStyle(_ctx: TemplateContext): string {
const terms = loadJargonList();
const jargonBlock = terms.length > 0
? `Jargon list, gloss on first use if the term appears:\n${terms.map(t => `- ${t}`).join('\n')}`
: `Jargon list unavailable. Skip jargon glossing until \`scripts/jargon-list.json\` is restored.`;
const jargonPath = `${ctx.paths.skillRoot}/scripts/jargon-list.json`;
return `## Writing Style (skip entirely if \`EXPLAIN_LEVEL: terse\` appears in the preamble echo OR the user's current message explicitly requests terse / no-explanations output)
@ -32,6 +31,6 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
${jargonBlock}
Curated jargon list lives at \`${jargonPath}\` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the \`terms\` array as the canonical list. The list is repo-owned and may grow between releases.
`;
}
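The pointer-versus-inline trade-off described in the comment above can be sketched as follows. This is a minimal illustration, not the repository's generator: `SketchContext`, `writingStylePointer`, and `writingStyleInlined` are hypothetical names, and the section text is shortened.

```typescript
// Hypothetical, trimmed-down context type; the real TemplateContext has more fields.
interface SketchContext {
  explainLevel?: 'default' | 'terse';
  paths: { skillRoot: string };
}

// New shape: reference the jargon list by path, or collapse entirely in terse builds.
function writingStylePointer(ctx: SketchContext): string {
  if (ctx.explainLevel === 'terse') {
    return '## Writing Style\n\nTerse mode (build-time): lead with the answer.\n';
  }
  const jargonPath = `${ctx.paths.skillRoot}/scripts/jargon-list.json`;
  return `## Writing Style\n\nCurated jargon list lives at \`${jargonPath}\`. Read it once per session.\n`;
}

// Old shape, for comparison: every term is inlined into every generated skill.
function writingStyleInlined(terms: string[]): string {
  return `## Writing Style\n\nJargon list:\n${terms.map((t) => `- ${t}`).join('\n')}\n`;
}
```

The pointer form stays the same size however long the list grows, which is the property the change relies on; the cost is the one extra file Read per session noted in the comment.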

View File

@ -62,7 +62,56 @@ export interface TemplateContext {
preambleTier?: number; // 1-4, controls which preamble sections are included
model?: Model; // model family for behavioral overlay. Omitted/undefined → no overlay.
interactive?: boolean; // true → emit plan-mode handshake in preamble. Generator-only, not written to SKILL.md.
/**
* Build-time compression mode. Defaults to 'default'.
*
* - 'default': full preamble prose ships as today (writing style, completeness,
* confusion protocol, context health are all present).
* - 'terse': writing-style + completeness + confusion-protocol + context-health
* sections are compressed to a one-line pointer at gen time. Saves ~3-5 KB
* per tier-2+ skill. Opt-in via `--explain-level=terse` build flag for
* users who want shipped skills to match their runtime preference and
* avoid the per-session terse-mode prose.
*
* Default builds keep the runtime-conditional behavior intact (Writing Style
* section says "skip entirely if EXPLAIN_LEVEL: terse appears in preamble echo").
* Terse builds make the compression structural: the bytes never ship in the first place.
*/
explainLevel?: 'default' | 'terse';
}
/** Resolver function signature. args is populated for parameterized placeholders like {{INVOKE_SKILL:name}}. */
export type ResolverFn = (ctx: TemplateContext, args?: string[]) => string;
/**
* Optional gated resolver. When the gate returns false, the resolver is
* skipped (substituted with empty string), the same effect as the placeholder
* not being referenced. Use when a resolver's output is only meaningful for
* a known subset of skills, so future template authors get a structural
* guardrail instead of relying on social knowledge.
*
* Most resolvers don't need this; the {{NAME}} placeholder system is
* already conditional at the template level. Use only when a resolver
* lives inside another resolver (e.g. via preamble composition) AND must
* be conditionalized, or when a top-level resolver has a small, well-defined
* audience.
*/
export interface ResolverEntry {
resolve: ResolverFn;
appliesTo?: (ctx: TemplateContext) => boolean;
}
/** Anything the RESOLVERS map accepts — either a bare function or a gated entry. */
export type ResolverValue = ResolverFn | ResolverEntry;
/**
* Type-narrowing helper for the gen-skill-docs lookup.
* Returns { resolve, appliesTo } so callers can do appliesTo?.(ctx) before invoking.
*/
export function unwrapResolver(entry: ResolverValue): {
resolve: ResolverFn;
appliesTo?: (ctx: TemplateContext) => boolean;
} {
if (typeof entry === 'function') return { resolve: entry };
return { resolve: entry.resolve, appliesTo: entry.appliesTo };
}
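A caller of `unwrapResolver` might apply the gate roughly like this. The types are trimmed-down copies of the ones in the diff so the sketch runs on its own, and `runResolver` plus the two sample resolvers are hypothetical names, not code from gen-skill-docs.

```typescript
// Trimmed-down copies of the types above (assumption: only these fields matter here).
interface TemplateContext {
  preambleTier?: number;
  explainLevel?: 'default' | 'terse';
}
type ResolverFn = (ctx: TemplateContext, args?: string[]) => string;
interface ResolverEntry {
  resolve: ResolverFn;
  appliesTo?: (ctx: TemplateContext) => boolean;
}
type ResolverValue = ResolverFn | ResolverEntry;

function unwrapResolver(entry: ResolverValue): ResolverEntry {
  if (typeof entry === 'function') return { resolve: entry };
  return { resolve: entry.resolve, appliesTo: entry.appliesTo };
}

// Hypothetical lookup helper: a gate that returns false yields an empty string,
// which matches the "skipped (substituted with empty string)" wording above.
function runResolver(entry: ResolverValue, ctx: TemplateContext, args?: string[]): string {
  const { resolve, appliesTo } = unwrapResolver(entry);
  if (appliesTo && !appliesTo(ctx)) return '';
  return resolve(ctx, args);
}

// A bare function always runs; a gated entry runs only for its audience.
const bare: ResolverValue = () => 'always';
const gated: ResolverValue = {
  resolve: () => 'tier 2+ only',
  appliesTo: (ctx) => (ctx.preambleTier ?? 1) >= 2,
};
```

Keeping the bare-function form valid means existing resolvers need no changes; only resolvers with a narrow audience opt into the `{ resolve, appliesTo }` shape.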

View File

@ -2,11 +2,7 @@
name: setup-browser-cookies
preamble-tier: 1
version: 1.0.0
description: |
Import cookies from your real Chromium browser into the headless browse session.
Opens an interactive picker UI where you select which cookie domains to import.
Use before QA testing authenticated pages. Use when asked to "import cookies",
"login to the site", or "authenticate the browser". (gstack)
description: Import cookies from your real Chromium browser into the headless browse session. (gstack)
triggers:
- import browser cookies
- login to test site
@ -19,6 +15,13 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Opens an interactive picker UI where you select which cookie domains to import.
Use before QA testing authenticated pages. Use when asked to "import cookies",
"login to the site", or "authenticate the browser".
## Preamble (run first)
```bash

View File

@ -2,13 +2,7 @@
name: setup-deploy
preamble-tier: 2
version: 1.0.0
description: |
Configure deployment settings for /land-and-deploy. Detects your deploy
platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom),
production URL, health check endpoints, and deploy status commands. Writes
the configuration to CLAUDE.md so all future deploys are automatic.
Use when: "setup deploy", "configure deployment", "set up land-and-deploy",
"how do I deploy with gstack", "add deploy config".
description: Configure deployment settings for /land-and-deploy.
triggers:
- configure deploy
- setup deployment
@ -25,6 +19,16 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Detects your deploy
platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom),
production URL, health check endpoints, and deploy status commands. Writes
the configuration to CLAUDE.md so all future deploys are automatic.
Use when: "setup deploy", "configure deployment", "set up land-and-deploy",
"how do I deploy with gstack", "add deploy config".
## Preamble (run first)
```bash
@ -565,84 +569,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,12 +2,7 @@
name: setup-gbrain
preamble-tier: 2
version: 1.0.0
description: |
Set up gbrain for this coding agent: install the CLI, initialize a
local PGLite or Supabase brain, register MCP, capture per-remote trust
policy. One command from zero to "gbrain is running, and this agent
can call it." Use when: "setup gbrain", "connect gbrain", "start
gbrain", "install gbrain", "configure gbrain for this machine". (gstack)
description: Set up gbrain for this coding agent: install the CLI, initialize a local PGLite or Supabase brain, register MCP, capture per-remote trust policy. (gstack)
triggers:
- setup gbrain
- install gbrain
@ -26,6 +21,13 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
One command from zero to "gbrain is running, and this agent
can call it." Use when: "setup gbrain", "connect gbrain", "start
gbrain", "install gbrain", "configure gbrain for this machine".
## Preamble (run first)
```bash
@ -566,84 +568,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,12 +2,7 @@
name: ship
preamble-tier: 4
version: 1.0.0
description: |
Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION,
update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy",
"push to main", "create a PR", "merge and push", or "get it deployed".
Proactively invoke this skill (do NOT push/PR directly) when the user says code
is ready, asks about deploying, wants to push code up, or asks to create a PR. (gstack)
description: Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. (gstack)
allowed-tools:
- Bash
- Read
@ -27,6 +22,14 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Use when asked to "ship", "deploy",
"push to main", "create a PR", "merge and push", or "get it deployed".
Proactively invoke this skill (do NOT push/PR directly) when the user says code
is ready, asks about deploying, wants to push code up, or asks to create a PR.
## Preamble (run first)
```bash
@ -567,84 +570,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -1,14 +1,7 @@
---
name: skillify
version: 1.0.0
description: |
Codify the most recent successful /scrape flow into a permanent
browser-skill on disk. Future /scrape calls with the same intent run
the codified script in ~200ms instead of re-driving the page. Walks
back through the conversation, synthesizes script.ts + script.test.ts
+ fixture, runs the test in a temp dir, and asks before committing.
Use when asked to "skillify", "codify", "save this scrape", or
"make this permanent". (gstack)
description: Codify the most recent successful /scrape flow into a permanent browser-skill on disk. (gstack)
allowed-tools:
- Bash
- Read
@ -23,6 +16,16 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Future /scrape calls with the same intent run
the codified script in ~200ms instead of re-driving the page. Walks
back through the conversation, synthesizes script.ts + script.test.ts
+ fixture, runs the test in a temp dir, and asks before committing.
Use when asked to "skillify", "codify", "save this scrape", or
"make this permanent".
## Preamble (run first)
```bash
@ -563,84 +566,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -1,18 +1,7 @@
---
name: spec
version: 0.1.0
description: |
Turn vague intent into a precise, executable spec in five phases. Pipe the spec
into a spawned Claude Code agent with `--execute`, dedupe against existing issues
with `--dedupe`, or hand off to GitHub. Every spec passes a codex quality gate
before file. Interrogates the user in strict phases — why, scope, technical,
draft, final — and refuses to produce an issue until ambiguity is gone. Use
after /office-hours has settled the shape of an idea, or any time the user
describes work that's not yet backlog-ready.
Use when asked to "spec this out", "file an issue", "write up a ticket", "make
this a GitHub issue", or "turn this into a backlog item". e.g., type `/spec` on
a vague bug → 5-phase interrogation → filed issue → spawned agent in ~4 minutes.
(gstack)
description: Turn vague intent into a precise, executable spec in five phases. (gstack)
allowed-tools:
- Bash
- Read
@ -30,6 +19,14 @@ triggers:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
Files the issue,
optionally spawns a Claude Code agent in a fresh worktree, and lets /ship close
the source issue on merge. Use when asked to "spec this out", "file an issue",
"write up a ticket", "make this a GitHub issue", or "turn this into a backlog item".
## Preamble (run first)
```bash
@ -570,84 +567,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake
@ -1554,84 +1474,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -2,16 +2,10 @@
name: spec
version: 0.1.0
description: |
Turn vague intent into a precise, executable spec in five phases. Pipe the spec
into a spawned Claude Code agent with `--execute`, dedupe against existing issues
with `--dedupe`, or hand off to GitHub. Every spec passes a codex quality gate
before file. Interrogates the user in strict phases — why, scope, technical,
draft, final — and refuses to produce an issue until ambiguity is gone. Use
after /office-hours has settled the shape of an idea, or any time the user
describes work that's not yet backlog-ready.
Use when asked to "spec this out", "file an issue", "write up a ticket", "make
this a GitHub issue", or "turn this into a backlog item". e.g., type `/spec` on
a vague bug → 5-phase interrogation → filed issue → spawned agent in ~4 minutes.
Turn vague intent into a precise, executable spec in five phases. Files the issue,
optionally spawns a Claude Code agent in a fresh worktree, and lets /ship close
the source issue on merge. Use when asked to "spec this out", "file an issue",
"write up a ticket", "make this a GitHub issue", or "turn this into a backlog item".
(gstack)
allowed-tools:
- Bash

View File

@ -2,13 +2,7 @@
name: sync-gbrain
preamble-tier: 2
version: 1.0.0
description: |
Keep gbrain current with this repo's code and refresh agent search
guidance in CLAUDE.md. Wraps the gstack-gbrain-sync orchestrator with
state probing, native code-surface registration, capability checks,
and a verdict block. Re-runnable, idempotent. Use when: "sync gbrain",
"refresh gbrain", "re-index this repo", "gbrain search isn't finding
things". (gstack)
description: Keep gbrain current with this repo's code and refresh agent search guidance in CLAUDE.md. Wraps the gstack-gbrain-sync orchestrator with state (gstack)
triggers:
- sync gbrain
- refresh gbrain
@ -26,6 +20,14 @@ allowed-tools:
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## When to invoke this skill
probing, native code-surface registration, capability checks,
and a verdict block. Re-runnable, idempotent. Use when: "sync gbrain",
"refresh gbrain", "re-index this repo", "gbrain search isn't finding
things".
## Preamble (run first)
```bash
@ -566,84 +568,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
Jargon list, gloss on first use if the term appears:
- idempotent
- idempotency
- race condition
- deadlock
- cyclomatic complexity
- N+1
- N+1 query
- backpressure
- memoization
- eventual consistency
- CAP theorem
- CORS
- CSRF
- XSS
- SQL injection
- prompt injection
- DDoS
- rate limit
- throttle
- circuit breaker
- load balancer
- reverse proxy
- SSR
- CSR
- hydration
- tree-shaking
- bundle splitting
- code splitting
- hot reload
- tombstone
- soft delete
- cascade delete
- foreign key
- composite index
- covering index
- OLTP
- OLAP
- sharding
- replication lag
- quorum
- two-phase commit
- saga
- outbox pattern
- inbox pattern
- optimistic locking
- pessimistic locking
- thundering herd
- cache stampede
- bloom filter
- consistent hashing
- virtual DOM
- reconciliation
- closure
- hoisting
- tail call
- GIL
- zero-copy
- mmap
- cold start
- warm start
- green-blue deploy
- canary deploy
- feature flag
- kill switch
- dead letter queue
- fan-out
- fan-in
- debounce
- throttle (UI)
- hydration mismatch
- memory leak
- GC pause
- heap fragmentation
- stack overflow
- null pointer
- dangling pointer
- buffer overflow
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
## Completeness Principle — Boil the Lake

View File

@ -0,0 +1,66 @@
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
// Static invariants guarding Windows artifact-sync (bin/gstack-brain-sync).
//
// These are deliberately static, not behavioral. The brain-sync integration
// suite (test/brain-sync.test.ts) spawns the bin/ scripts directly, which
// Node/Bun cannot exec on Windows (they are bash-shebang scripts), so that
// suite is excluded from the Windows CI lane. Instead we assert the source
// keeps the properties that make `--discover-new` and the `--once` drain work
// on Windows. Each maps to a confirmed, separately-reproduced failure:
//
// 1. os.path.relpath yields BACKSLASH separators on Windows, which never
// match the forward-slash allowlist globs (e.g. "projects/*/learnings.jsonl"),
// so nested artifacts were silently never discovered.
// 2. discover-new enqueued via subprocess.run([bash-shim]); Windows Python
// cannot exec a shebang script, so it enqueued nothing even once matched.
// 3. compute_paths_to_stage's python print() emits CRLF on Windows; the bash
// `read -r` keeps the trailing \r, so `git add -- "path\r"` matches
// nothing and the drain silently stages/commits nothing.
//
// Plus two robustness properties (independent codex review, both [P2]):
// 4. the inline enqueue must append one atomic record at a time (O_APPEND),
// or a concurrent writer-shim append can interleave mid-record and produce
// a malformed queue line that the drain silently drops.
// 5. the skip-list must be normalized to the same separator form as `rel`,
// or a backslash entry in .brain-skip.txt stops matching and a file the
// user explicitly skipped gets synced.
const ROOT = path.resolve(import.meta.dir, '..');
const SRC = fs.readFileSync(path.join(ROOT, 'bin', 'gstack-brain-sync'), 'utf-8');
describe('gstack-brain-sync — Windows path/exec invariants', () => {
test('discover-new normalizes relpath separators before fnmatch (bug 1)', () => {
expect(SRC).toContain('os.path.relpath(full, gstack_home).replace(os.sep, "/")');
});
test('no python subprocess exec — Windows cannot exec the bash shims (bug 2)', () => {
// The whole script must never shell out to a bin/ bash script from Python;
// that is the exec failure that left discover enqueuing nothing on Windows.
expect(SRC).not.toContain('subprocess');
});
test('drain loop strips trailing CR before git add (bug 3)', () => {
const CR_STRIP = "p=\"${p%$'\\r'}\"";
expect(SRC).toContain(CR_STRIP);
// The strip must precede the staging call, or the pathspec still carries \r.
expect(SRC.indexOf(CR_STRIP)).toBeLessThan(SRC.indexOf('add -f -- "$p"'));
});
test('inline enqueue appends one atomic record at a time (codex P2 #1)', () => {
expect(SRC).toContain('os.O_APPEND');
expect(SRC).toContain('os.write(fd');
// No buffered batch write to the queue (the interleave-corruption shape).
expect(SRC).not.toContain('open(queue_path, "a"');
});
test('skip-list is normalized on BOTH discover and drain sides (codex P2 #2)', () => {
// The drain (compute_paths_to_stage) is the real staging boundary, so it
// must normalize skip entries identically to discover_new — otherwise a
// backslash .brain-skip.txt entry is honored at discovery but bypassed at
// commit, syncing a file the user explicitly skipped.
const NORM = 's.replace(os.sep, "/") for s in load_lines(skip_path)';
expect(SRC.split(NORM).length - 1).toBeGreaterThanOrEqual(2);
});
});
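Failures 1 and 3 in the header comment both come down to string normalization, which can be illustrated outside the real script. The sketch below is TypeScript rather than the Python and bash that `bin/gstack-brain-sync` uses, and `globToRegExp`, `toForwardSlashes`, and `stripTrailingCR` are hypothetical helpers; the glob matcher is a simplification in which `*` stays within one path segment.

```typescript
// Convert a forward-slash glob into a RegExp. Simplification: `*` matches
// within a single path segment, and every other character is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\?]/g, '\\$&')
    .replace(/\*/g, '[^/]*');
  return new RegExp(`^${escaped}$`);
}

// Bug 1: a Windows relative path uses backslashes, so normalize before matching.
function toForwardSlashes(rel: string): string {
  return rel.replace(/\\/g, '/');
}

// Bug 3: a CRLF-terminated line read without its "\n" still carries "\r".
function stripTrailingCR(line: string): string {
  return line.replace(/\r$/, '');
}

const allow = globToRegExp('projects/*/learnings.jsonl');
const windowsRel = 'projects\\demo\\learnings.jsonl';

const matchesRaw = allow.test(windowsRel);
const matchesNormalized = allow.test(toForwardSlashes(windowsRel));
const cleaned = stripTrailingCR('projects/demo/learnings.jsonl\r');
```

Here `matchesRaw` is false and `matchesNormalized` is true, which mirrors the silent non-discovery the first invariant guards against.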

View File

@ -0,0 +1,118 @@
/**
* Gap B (v1.46.0.0): --catalog-mode=full opt-out behavior.
*
* The catalog trim is the default. The opt-out (`--catalog-mode=full`)
* preserves v1.44 multi-line frontmatter descriptions for users / hosts
* that depend on the legacy fat catalog. Without this test, someone could
* break the conditional `if (host === 'claude' && CATALOG_MODE === 'trim')`
* and silently turn the opt-out path into a no-op: users with the flag
* still get trim'd output, the v1.44 behavior is gone.
*
* Two layers:
* 1. Static: the CATALOG_MODE flag is wired into gen-skill-docs.ts and
* the conditional gate is in the pipeline.
* 2. Smoke: running with --catalog-mode=full produces a frontmatter
* `description: |` block (multi-line) instead of the trim'd one-line
* `description: ...(gstack)` form.
*
* The smoke test mutates the working tree mid-run. It restores the default
* trim'd state in a finally block so a crash mid-test still leaves a clean
* working tree.
*/
import { describe, test, expect } from 'bun:test';
import { spawnSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
const REPO_ROOT = path.resolve(import.meta.dir, '..');
const GEN_SKILL_DOCS = path.join(REPO_ROOT, 'scripts', 'gen-skill-docs.ts');
const SHIP_SKILL = path.join(REPO_ROOT, 'ship', 'SKILL.md');
describe('--catalog-mode=full opt-out wiring (static)', () => {
test('CATALOG_MODE_ARG parsing is wired into gen-skill-docs.ts', () => {
const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
expect(src).toContain('CATALOG_MODE_ARG');
expect(src).toContain("a.startsWith('--catalog-mode')");
});
test('CATALOG_MODE accepts only "trim" or "full" — anything else throws', () => {
const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
expect(src).toMatch(/val !== 'trim' && val !== 'full'/);
expect(src).toContain('Unknown catalog mode');
});
test('catalog trim only fires when CATALOG_MODE === "trim"', () => {
const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
// The applyCatalogTrim call is gated by both host and CATALOG_MODE checks.
expect(src).toMatch(/CATALOG_MODE === 'trim'/);
expect(src).toContain('applyCatalogTrim(content, skillName)');
});
test('default CATALOG_MODE is "trim" (opt-out, not opt-in)', () => {
const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
// The const initializer falls back to 'trim' when --catalog-mode is unset.
expect(src).toMatch(/if \(!CATALOG_MODE_ARG\) return 'trim'/);
});
});

describe('--catalog-mode=full opt-out behavior (smoke)', () => {
  test('--catalog-mode=full produces multi-line description in frontmatter', () => {
    // Precondition: the working tree starts in the default trimmed state
    // (one-line description). The finally block regenerates this state.
    const trimmedShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
    expect(trimmedShip).toMatch(/^description: Ship workflow:[^\n]*\(gstack\)\n/m);
    try {
      // Run with --catalog-mode=full. Mutates working tree.
      const result = spawnSync('bun', ['run', 'gen:skill-docs', '--catalog-mode=full'], {
        cwd: REPO_ROOT,
        stdio: ['ignore', 'pipe', 'pipe'],
        timeout: 60_000,
      });
      expect(result.status).toBe(0);
      // After --catalog-mode=full, frontmatter description is the legacy
      // multi-line block, not the trimmed one-line form.
      const fullShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
      expect(fullShip).toMatch(/^description: \|\s*$/m); // YAML block scalar
      // Legacy multi-line content includes "Use when asked to..." in the
      // frontmatter (in trim mode this lives in the body section).
      const fmEnd = fullShip.indexOf('\n---', 4);
      // Guard: without a closing frontmatter delimiter, the slices below
      // would silently cover the wrong ranges.
      expect(fmEnd).toBeGreaterThan(0);
      const fm = fullShip.slice(0, fmEnd);
      expect(fm).toMatch(/Use when asked to/i);
      // "When to invoke" body section should NOT be present in full mode
      // (because the routing prose stayed in frontmatter).
      const body = fullShip.slice(fmEnd);
      expect(body).not.toContain('## When to invoke this skill');
    } finally {
      // Restore default trim state regardless of test outcome.
      const restore = spawnSync('bun', ['run', 'gen:skill-docs'], {
        cwd: REPO_ROOT,
        stdio: ['ignore', 'pipe', 'pipe'],
        timeout: 60_000,
      });
      if (restore.status !== 0) {
        // eslint-disable-next-line no-console
        console.error(
          'CRITICAL: failed to restore default trim state. Run `bun run gen:skill-docs` to clean up.',
        );
      }
      // Sanity-check the restored state matches what we saw at the start.
      const restoredShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
      expect(restoredShip).toMatch(/^description: Ship workflow:[^\n]*\(gstack\)\n/m);
    }
  }, 180_000);

  test('--catalog-mode=invalid throws a clear error', () => {
    const result = spawnSync('bun', ['run', 'gen:skill-docs', '--catalog-mode=invalid'], {
      cwd: REPO_ROOT,
      stdio: ['ignore', 'pipe', 'pipe'],
      timeout: 30_000,
    });
    expect(result.status).not.toBe(0);
    const stderr = result.stderr?.toString() ?? '';
    expect(stderr).toMatch(/Unknown catalog mode/);
    expect(stderr).toMatch(/invalid/);
  });
});

Some files were not shown because too many files have changed in this diff