gstack/test/fixtures
Garry Tan 33b016712a
test(brain): fake-CLI agent-obedience E2E for /office-hours writeback
test/skill-e2e-office-hours-brain-writeback.test.ts (~210 LOC,
periodic-tier, ~$0.50-1/run):

Drives /office-hours via runSkillTest against a deterministic fixture
brief (pixel.fund founder pitch). The workdir has:
  - A regenerated office-hours/SKILL.md with the compressed brain blocks
    (generated via gen-skill-docs --respect-detection against a temp
    GSTACK_HOME, then restored to canonical post-snapshot)
  - A fake gbrain shell script on PATH that uses printf %q quoting to
    preserve --content "$(cat <<'EOF' ... EOF)" heredoc payloads
    intact (naive `echo "$@"` would lose argv boundaries)
  - The docs/gbrain-write-surfaces.md the resolver points to

Asserts:
  - gbrain-calls.log contains `gbrain put office-hours/pixel-fund`
  - Payload file at gbrain-payloads/office-hours/pixel-fund.md exists
    with valid YAML frontmatter (title: + tags: + design-doc tag)
  - At least one gbrain put entities/<name> call (entity stub
    enrichment is best-effort, soft warning if absent)

Covers agent obedience to the SAVE_RESULTS instruction. Out of scope:
gbrain CLI persistence contract (T11 covers that with real PGLite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:34:33 -07:00
..
golden v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740) 2026-05-26 23:43:07 -07:00
ios-qa/FixtureApp v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574) 2026-05-21 16:09:26 -07:00
mode-posture feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065) 2026-04-19 05:44:39 +08:00
office-hours-brain-writeback test(brain): fake-CLI agent-obedience E2E for /office-hours writeback 2026-05-27 08:34:33 -07:00
plans v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
coverage-audit-fixture.ts feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259) 2026-03-22 11:28:16 -07:00
eval-baselines.json fix: rewrite session-runner to claude -p subprocess, lower flaky baselines 2026-03-14 02:34:10 -05:00
forcing-finding-seeds.ts v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740) 2026-05-26 23:43:07 -07:00
golden-ship-claude.md fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847) 2026-04-06 00:47:04 -07:00
overlay-nudges.ts feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166) 2026-04-23 18:42:58 -07:00
parity-baseline-v1.44.1.json v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
parity-baseline-v1.46.0.0.json v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
parity-baseline-v1.47.0.0.json v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740) 2026-05-26 23:43:07 -07:00
qa-eval-checkout-ground-truth.json fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds 2026-03-14 07:17:17 -05:00
qa-eval-ground-truth.json fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds 2026-03-14 07:17:17 -05:00
qa-eval-spa-ground-truth.json fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds 2026-03-14 07:17:17 -05:00
review-army-migration.sql feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
review-army-n-plus-one.rb feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
review-eval-design-slop.css feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) 2026-03-17 20:12:55 -05:00
review-eval-design-slop.html feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) 2026-03-17 20:12:55 -05:00
review-eval-enum-diff.rb feat: contributor mode, session awareness, recommendation format (#90) 2026-03-16 01:45:50 -05:00
review-eval-enum.rb feat: contributor mode, session awareness, recommendation format (#90) 2026-03-16 01:45:50 -05:00
review-eval-vuln.rb feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) 2026-03-14 01:17:36 -05:00