gstack/test/helpers
Garry Tan f8bb59094d
v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733)
* feat(issue): add /issue skill for backlog-ready GitHub issue authoring

Interrogates an ambiguous request through five strict phases (why, scope,
technical, draft, final) and produces a GitHub issue precise enough that an
unfamiliar engineer or AI agent can execute it without follow-up. Slots in
after /office-hours (when the idea has passed the "worth building" bar) and
before /plan-eng-review (which assumes a plan already exists).

- issue/SKILL.md.tmpl + generated SKILL.md
- routing entry in root SKILL.md.tmpl
- llms.txt regenerated to include the new skill

* chore(spec): rename /issue → /spec + fix duplicate analytics block

Foundation commit for the /spec skill (extends PR #1698 by @jayzalowitz).

- Renames issue/ → spec/ (template + generated)
- Removes the hand-rolled analytics block in spec/SKILL.md.tmpl (lines 46-49 of the original); {{PREAMBLE}} already emits the analytics write with the telemetry opt-out guard, so the duplicate would have bypassed gstack-config set telemetry off
- Updates frontmatter (name: spec, expanded description with magical-moment preview, triggers reordered to lead with "spec this out")
- Updates root SKILL.md.tmpl routing entry → /spec
- Regenerates spec/SKILL.md and gstack/llms.txt via bun run gen:skill-docs

Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(spec): expansions — flags, archive, quality gate, plan-mode-aware Phase 5, /ship integration, tests

Builds on the @jayzalowitz foundation (commit a4e6ee38) with the full
expansion set from CEO + Eng + DX review (24 user decisions + 23 of 28
codex adversarial findings).

spec/SKILL.md.tmpl additions:
- Flag reference table (--dedupe / --no-gate / --audit / --execute /
  --no-execute / --file-only / --plan-file / --sync-archive).
- Phase 1b --dedupe (default ON): gh issue list --search with graceful
  skip on gh-not-installed / unauthed / rate-limited / other errors.
  AskUserQuestion when matches found (merge / file-new / cancel).
- Phase 3 HARD requirement: agent MUST grep/read at least one piece of
  evidence before asking. Project-level fallback prose for prompts with
  no concrete file mapping. Greenfield escape clause.
- Phase 4.5 quality gate (default ON): codex adversarial dispatch with
  fail-closed redaction (AWS/GitHub/Anthropic/OpenAI/private-key regex),
  hard <<<USER_SPEC>>> delimiters + instruction boundary (prompt-injection
  defense), score 0-10 with <7 block, up to 3 iterations, AskUserQuestion
  escape on persistent <7 (ship anyway / save draft / one more try).
- Phase 5 plan-mode-aware dispatch: reads GSTACK_PLAN_MODE env. Active
  → file-only + load into plan file. Inactive → file + --execute spawn
  by default. CLI overrides for explicit control.
- Archive block via eval $(gstack-paths) → $GSTACK_STATE_ROOT/projects/
  $SLUG/specs/<datetime>-<pid>-<slug>.md. Atomic .tmp/mv write. Sync
  excluded by default; --sync-archive to opt in.
- --execute path: dirty-worktree gate (porcelain check + 3-option AUQ
  continue/stash/cancel), TOCTOU re-check after AUQ answer, SHA pin
  via git rev-parse HEAD, unique branch spec/<slug>-$$ + PID-suffixed
  worktree, mandatory final-confirm gate, stash policy with restore
  safety (preserve ref, never auto-drop).
- TTHW timestamps captured at Phase 1 / first citation / file-or-spawn,
  emitted as ttfc_ms + tthw_ms in preamble telemetry envelope.

Cross-system plumbing:
- scripts/resolvers/preamble/generate-preamble-bash.ts: emit
  GSTACK_PLAN_MODE=active|inactive based on CLAUDE_PLAN_FILE presence.
- scripts/resolvers/preamble/generate-routing-injection.ts: add /spec
  to the routing block injected into project CLAUDE.md.
- ship/SKILL.md.tmpl: new "Linked Spec" PR-body section. Reads archive
  frontmatter spec_issue_number and adds Closes #N when full delivery
  confirmed by existing plan-completion gate (codex F4 — conditional).
  Branch-name inference NOT used (codex F3 — fragile under rebase).

Tests (W7):
- test/spec-template-invariants.test.ts: 35 deterministic assertions
  covering Phase 1 hard gate, Phase 3 hard-grep mandate, --dedupe
  graceful-skip paths, --execute race + security hardening (TOCTOU,
  SHA pin, unique branch), quality-gate redaction + BLOCKED path,
  archive atomic write + sync exclusion, plan-mode-aware Phase 5.
- test/spec-template-sync.test.ts: regen + byte-identical check.
- test/skill-e2e-spec-execute.test.ts (periodic-tier scaffold).
- test/skill-llm-eval-spec.test.ts (periodic-tier scaffold).
- test/helpers/touchfiles.ts: register both periodics in E2E_TIERS +
  LLM_JUDGE_TOUCHFILES.

37/37 /spec tests pass. Full bun test exit 0 (pre-existing
url-validation timeout unrelated to /spec).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: v1.45.0.0 — regen all SKILL.md, bump VERSION, CHANGELOG entry

Mechanical regen pulling in two template-side changes:
- /spec expansion (spec/SKILL.md picks up ~1100 new lines)
- {{PREAMBLE}} now echoes GSTACK_PLAN_MODE env (every skill picks up
  the new echo line in the preamble bash block)

VERSION 1.44.0.0 → 1.45.0.0 (MINOR per scale-aware rules: substantial
new capability — /spec skill with 5 CLI flags + race/security
hardening + plan-mode-aware Phase 5 + /ship integration).

CHANGELOG entry frames /spec as agent feedstock with the two-line
headline, "numbers that matter" table, and "what this means for
builders" close. Credits @jayzalowitz for the foundation contribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): register /spec in scripts/proactive-suggestions.json

Auto-generated by bun run gen:skill-docs after the v1.46 catalog-trim
contract picked up /spec's frontmatter. lead + routing extracted from
spec/SKILL.md.tmpl description: block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): TODOS deferrals + package.json sync for v1.47.0.0

- TODOS.md: add P2 entry for /spec --epic mode (deferred from CEO SCOPE
  EXPANSION review), P3 entry for --dedupe semantic matching upgrade.
  Both have full context blocks so future picker can resume cold.
- package.json: bump 1.46.0.0 → 1.47.0.0 to match VERSION (was stale
  from the main merge; /ship Step 12 idempotency caught it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: register /spec skill in README, AGENTS, CLAUDE.md project tree

Adds /spec to the three discoverability surfaces it was missing:
- README.md sprint skills table (between /autoplan and /learn)
- AGENTS.md plan-mode reviews table
- CLAUDE.md project structure tree (between /investigate and /retro)

/spec shipped in v1.47.0.0 with CHANGELOG coverage but the entry-point
docs hadn't been updated; a user landing on README or AGENTS would not
discover the skill exists without reading CHANGELOG.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:36:53 -07:00
..
providers v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391) 2026-05-09 08:06:47 -07:00
agent-sdk-runner.ts v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252) 2026-05-01 07:21:28 -07:00
benchmark-judge.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
benchmark-runner.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
budget-override.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
budget-override.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
capture-parity-baseline.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
capture-parity-baseline.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
claude-pty-runner.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
claude-pty-runner.unit.test.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
codex-session-runner.ts fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391) 2026-03-23 08:44:08 -07:00
e2e-helpers.ts v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534) 2026-05-16 12:32:33 -07:00
eval-store.test.ts feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83) 2026-03-15 23:55:39 -05:00
eval-store.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
gemini-session-runner.test.ts feat: Gemini CLI E2E tests (v0.9.2.0) (#252) 2026-03-20 08:30:09 -07:00
gemini-session-runner.ts feat: Gemini CLI E2E tests (v0.9.2.0) (#252) 2026-03-20 08:30:09 -07:00
llm-judge.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
observability.test.ts fix: never clean up observability artifacts — partial file persists after finalize 2026-03-14 12:37:38 -05:00
parity-harness.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
pricing.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
secret-sink-harness.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
session-runner.test.ts feat: stream-json NDJSON parser for real-time E2E progress 2026-03-14 03:49:36 -05:00
session-runner.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-parser.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
tool-map.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
touchfiles.ts v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733) 2026-05-26 21:36:53 -07:00