gstack/test/fixtures
Garry Tan 3aee5a7476
test: gate-tier AskUserQuestion floor tests for all plan-* review skills
Adds 4 finding-floor tests (one per plan-* skill) that catch the May
2026 transcript-bug class, in which the model wrote a plan and called
ExitPlanMode without firing any review-phase AskUserQuestion (AUQ).
Each test asserts via runPlanSkillFloorCheck that at least one
non-permission AUQ render fires before the agent reaches plan_ready.
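
The floor assertion described above can be sketched roughly as follows. This is not the body of runPlanSkillFloorCheck, which this message does not show; the SessionEvent type, its field names, and the floorSatisfied function are hypothetical, and the choice to treat a session that never reaches plan_ready as a failure is an assumption.

```typescript
// Hypothetical event shape; the real PTY runner's types are not shown here.
type SessionEvent =
  | { kind: "auq"; isPermissionPrompt: boolean }
  | { kind: "plan_ready" }
  | { kind: "output" };

// Returns true when at least one non-permission AskUserQuestion render
// appears before the first plan_ready event.
function floorSatisfied(events: SessionEvent[]): boolean {
  for (const ev of events) {
    // Reaching plan_ready first means the review phase asked nothing.
    if (ev.kind === "plan_ready") return false;
    // Permission prompts do not count toward the floor.
    if (ev.kind === "auq" && !ev.isPermissionPrompt) return true;
  }
  // Assumption: a session that ends without any qualifying AUQ also fails.
  return false;
}
```

Scanning in order and returning at the first decisive event keeps the check independent of anything that happens after plan_ready.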

Verified:
- Eng floor: passed in 59s
- CEO floor: passed in 197s
- Design floor: passed
- Devex floor: passed
- Cost: roughly $2-6 in total per CI run. The tests trigger only on a
  diff against the 4 plan-* templates, the shared resolver review.ts,
  the seeds fixture, or the PTY runner helper.

Fixtures live in test/fixtures/forcing-finding-seeds.ts, one constant
per skill. Each seed is engineered to force at least one obvious
finding under that skill's review focus (architectural smell for eng,
scope-creep for ceo, UI-slop for design, painful onboarding for devex).
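
As a rough illustration of the "one constant per skill" layout, a fixture file of this kind might look like the sketch below. The constant names, the seed text, and the lookup helper are all hypothetical; the actual contents of forcing-finding-seeds.ts are not reproduced in this message.

```typescript
type PlanSkill = "eng" | "ceo" | "design" | "devex";

// Hypothetical seeds: each one plants an obvious problem in the area
// that the matching review skill is meant to flag.
const ENG_SEED =
  "Plan: put request routing, billing, and email delivery in a single module with shared global state.";
const CEO_SEED =
  "Plan: ship a login fix, and also add a marketplace, a mobile app, and a plugin system in the same release.";
const DESIGN_SEED =
  "Plan: landing page with five gradients, three font families, and a carousel of stock photos.";
const DEVEX_SEED =
  "Plan: onboarding requires twelve manual setup steps and an undocumented environment variable.";

// Lookup table so a test can fetch the seed for the skill under test.
const SEEDS_BY_SKILL: Record<PlanSkill, string> = {
  eng: ENG_SEED,
  ceo: CEO_SEED,
  design: DESIGN_SEED,
  devex: DEVEX_SEED,
};

function seedFor(skill: PlanSkill): string {
  return SEEDS_BY_SKILL[skill];
}
```

Typing the table as Record<PlanSkill, string> makes the compiler reject a fixture file that is missing a seed for any of the four skills.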

Touchfiles wiring:
- E2E_TOUCHFILES: 4 plan-*-finding-floor entries with deps on the
  matching skill template, the shared resolver, the seeds fixture,
  and the PTY runner helper
- E2E_TIERS: all 4 entries marked 'gate'
- touchfiles.test.ts: count assertion bumped 21→22 with explicit
  plan-ceo-finding-floor containment check
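
The wiring listed above can be pictured with the sketch below. E2E_TOUCHFILES, E2E_TIERS, 'gate', plan-ceo-finding-floor, and test/fixtures/forcing-finding-seeds.ts come from this message; everything else is an assumption made for illustration, including the other three entry names (inferred from the plan-*-finding-floor pattern), the template, resolver, and PTY helper paths, the 'periodic' tier, the exact-path matching, and the selectTests helper.

```typescript
type Tier = "gate" | "periodic"; // 'periodic' is an assumed second tier

// Dependencies shared by all four entries. Only the seeds fixture path is
// taken from the commit message; the other two paths are hypothetical.
const SHARED_DEPS: string[] = [
  "scripts/resolvers/review.ts",
  "test/fixtures/forcing-finding-seeds.ts",
  "test/helpers/pty-runner.ts",
];

const SKILLS = ["eng", "ceo", "design", "devex"] as const;

const E2E_TOUCHFILES: Record<string, string[]> = {};
const E2E_TIERS: Record<string, Tier> = {};

for (const skill of SKILLS) {
  const name = `plan-${skill}-finding-floor`;
  // Hypothetical template path for the matching skill.
  E2E_TOUCHFILES[name] = [`plan-${skill}-review/SKILL.md.tmpl`, ...SHARED_DEPS];
  E2E_TIERS[name] = "gate";
}

// Select the tests whose dependency list includes any changed file.
// Assumption: exact path matching; a real selector may use glob patterns.
function selectTests(changedFiles: string[]): string[] {
  return Object.keys(E2E_TOUCHFILES)
    .filter((name) => E2E_TOUCHFILES[name].some((dep) => changedFiles.includes(dep)))
    .sort();
}
```

Under these assumptions, a change to one skill template selects only that skill's floor test, while a change to a shared dependency such as the seeds fixture selects all four.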

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:48:23 -07:00
golden v1.26.3.0 feat: /sync-gbrain skill + native code-surface orchestrator (#1314) 2026-05-04 09:29:48 -07:00
mode-posture feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065) 2026-04-19 05:44:39 +08:00
plans v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
coverage-audit-fixture.ts feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259) 2026-03-22 11:28:16 -07:00
eval-baselines.json fix: rewrite session-runner to claude -p subprocess, lower flaky baselines 2026-03-14 02:34:10 -05:00
forcing-finding-seeds.ts test: gate-tier AskUserQuestion floor tests for all plan-* review skills 2026-05-06 19:48:23 -07:00
golden-ship-claude.md fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847) 2026-04-06 00:47:04 -07:00
overlay-nudges.ts feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166) 2026-04-23 18:42:58 -07:00
qa-eval-checkout-ground-truth.json fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds 2026-03-14 07:17:17 -05:00
qa-eval-ground-truth.json fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds 2026-03-14 07:17:17 -05:00
qa-eval-spa-ground-truth.json fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds 2026-03-14 07:17:17 -05:00
review-army-migration.sql feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
review-army-n-plus-one.rb feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
review-eval-design-slop.css feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) 2026-03-17 20:12:55 -05:00
review-eval-design-slop.html feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) 2026-03-17 20:12:55 -05:00
review-eval-enum-diff.rb feat: contributor mode, session awareness, recommendation format (#90) 2026-03-16 01:45:50 -05:00
review-eval-enum.rb feat: contributor mode, session awareness, recommendation format (#90) 2026-03-16 01:45:50 -05:00
review-eval-vuln.rb feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) 2026-03-14 01:17:36 -05:00