gstack/test
Garry Tan bc8cab2b5b
feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection
Two new E2E tests:
- review-pre-existing-bug: plants a SQL injection in the base branch, verifies
  Step 5.7 classifies it as INFORMATIONAL and recommends /debug
- ship-reverted-qa-commits: creates branch with reverted fix(qa): commits,
  verifies /ship detects them and recommends /debug
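
The reverted-commit check described above can be illustrated with a small sketch. This is not the detection logic /ship actually uses, which is not shown here. It assumes the reverts were created with `git revert`, whose default subject is `Revert "<original subject>"`, and the helper name `findRevertedQaCommits` is hypothetical.

```typescript
// Hypothetical helper, not taken from the gstack source.
// Given commit subjects (for example from `git log --format=%s`),
// return the fix(qa): subjects that another commit in the list reverts.
function findRevertedQaCommits(subjects: string[]): string[] {
  // Default subject written by `git revert`: Revert "<original subject>"
  const revertPattern = /^Revert "(.+)"$/;
  const all = new Set(subjects);
  const reverted: string[] = [];
  for (const subject of subjects) {
    const match = revertPattern.exec(subject);
    if (!match) continue;
    const original = match[1];
    // Only report QA fixes whose original commit is also in the list.
    if (original.startsWith("fix(qa):") && all.has(original)) {
      reverted.push(original);
    }
  }
  return reverted;
}

const subjects = [
  'Revert "fix(qa): handle empty cart"',
  "feat: add checkout page",
  "fix(qa): handle empty cart",
  "fix(qa): escape search input",
];
console.log(findRevertedQaCommits(subjects));
// prints [ 'fix(qa): handle empty cart' ]
```

Matching on subjects alone misses reverts whose message was edited by hand; a stricter variant could parse the `This reverts commit <hash>` line that `git revert` writes into the commit body.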

Also fixes qa-debug-prompt-logic to use correct workingDirectory, and
ensures test repo init uses -b main for portability.
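
As a rough sketch of the portability fix: without `-b main`, the initial branch name depends on the machine's `init.defaultBranch` setting or Git's built-in default, so a test that refers to `main` can fail on some machines. The helper names below (`repoInitCommands`, `initTestRepo`) and the placeholder identity are assumptions for illustration, not the actual test helpers.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper names; the real helpers in this repo are not shown here.
// Returns the git argument lists used to set up a throwaway repo.
function repoInitCommands(): string[][] {
  return [
    ["init", "-b", "main"], // -b (--initial-branch) needs Git 2.28 or newer
    ["config", "user.email", "test@example.com"], // placeholder identity
    ["config", "user.name", "Test"],
    ["commit", "--allow-empty", "-m", "initial"],
  ];
}

// Creates a temporary directory and runs each command inside it.
function initTestRepo(): string {
  const dir = mkdtempSync(join(tmpdir(), "e2e-repo-"));
  for (const args of repoInitCommands()) {
    execFileSync("git", args, { cwd: dir, stdio: "pipe" });
  }
  return dir;
}
```

Calling `initTestRepo()` returns the path of a repository whose first branch is `main` regardless of local Git configuration. Setting the identity locally keeps the commit from failing on machines with no global `user.name` or `user.email`.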

All 4 debug-related evals pass: $0.34 total, 94s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 14:49:57 -07:00
fixtures feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) 2026-03-17 20:12:55 -05:00
helpers feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection 2026-03-18 14:49:57 -07:00
gen-skill-docs.test.ts feat: interactive /plan-design-review + CEO invokes designer + 100% coverage (v0.6.4) (#149) 2026-03-17 22:48:48 -05:00
skill-e2e.test.ts feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection 2026-03-18 14:49:57 -07:00
skill-llm-eval.test.ts feat: add debug escalation tests (validation + LLM judge + E2E) 2026-03-18 11:13:12 -07:00
skill-parser.test.ts feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
skill-validation.test.ts feat: add debug escalation tests (validation + LLM judge + E2E) 2026-03-18 11:13:12 -07:00
touchfiles.test.ts feat: add debug escalation tests (validation + LLM judge + E2E) 2026-03-18 11:13:12 -07:00