gstack/test
Garry Tan 64bbbb2198
fix: plan-design-review-audit eval — bump turns to 30, add efficiency hints
The test was flaky at 20 turns because the agent reads a 300-line SKILL.md,
navigates, extracts design data, and writes a report. Added hints to skip the
preamble, batch commands, and write the report early, while still testing the
real SKILL.md. The eval now completes in ~13 turns consistently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 14:41:13 -07:00
fixtures feat: contributor mode, session awareness, recommendation format (#90) 2026-03-16 01:45:50 -05:00
helpers chore: bump version and changelog (v0.6.1.0) 2026-03-17 11:30:23 -07:00
gen-skill-docs.test.ts feat: SELECTIVE EXPANSION + smarter ship gates (v0.5.3) (#134) 2026-03-17 12:22:10 -05:00
skill-e2e.test.ts fix: plan-design-review-audit eval — bump turns to 30, add efficiency hints 2026-03-17 14:41:13 -07:00
skill-llm-eval.test.ts feat: diff-based test selection for E2E and LLM-judge evals 2026-03-17 11:28:03 -07:00
skill-parser.test.ts feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
skill-validation.test.ts feat: Test Bootstrap + Regression Tests + Coverage Audit (v0.6.0) (#136) 2026-03-17 13:05:18 -05:00
touchfiles.test.ts chore: bump version and changelog (v0.6.1.0) 2026-03-17 11:30:23 -07:00