gstack/test/fixtures
Garry Tan c35e933c7d
fix: rewrite session-runner to claude -p subprocess, lower flaky baselines
Session runner now spawns `claude -p` as a subprocess instead of using
Agent SDK query(), which fixes E2E tests hanging inside Claude Code.
Also lowers command_reference completeness baseline to 3 (flaky oscillation),
adds test:e2e script, and updates CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 02:34:10 -05:00
..
eval-baselines.json fix: rewrite session-runner to claude -p subprocess, lower flaky baselines 2026-03-14 02:34:10 -05:00
qa-eval-checkout-ground-truth.json feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) 2026-03-14 01:17:36 -05:00
qa-eval-ground-truth.json feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) 2026-03-14 01:17:36 -05:00
qa-eval-spa-ground-truth.json feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) 2026-03-14 01:17:36 -05:00
review-eval-vuln.rb feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) 2026-03-14 01:17:36 -05:00