gstack/test
Garry Tan 3492f98e82
docs: add E2E eval failure blame protocol
"Not related to our changes" is an extraordinary claim that requires
extraordinary proof. When evals fail during /ship:

1. Run the same eval on main — prove it fails there too
2. If it passes on main, it IS your change — trace the blame
3. If you can't verify, say "unverified" not "pre-existing"
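
The three steps above can be sketched as a small decision function. This is an illustrative sketch only, not code from gstack: the names `EvalOutcome`, `Blame`, and `classifyFailure` are hypothetical, and it assumes the eval has already failed on the branch being shipped.

```typescript
// Hypothetical types for illustration; none of these names come from gstack.
// Outcome of re-running the same failing eval on main.
type EvalOutcome = "pass" | "fail" | "not-run";

// The label we are allowed to attach to the branch failure.
type Blame = "pre-existing" | "caused-by-change" | "unverified";

// Assumes the eval already failed on the branch being shipped.
function classifyFailure(onMain: EvalOutcome): Blame {
  if (onMain === "fail") {
    // Step 1: the failure reproduces on main, so it is proven pre-existing.
    return "pre-existing";
  }
  if (onMain === "pass") {
    // Step 2: main is green, so the branch introduced the failure.
    return "caused-by-change";
  }
  // Step 3: no run on main means no proof either way.
  return "unverified";
}

console.log(classifyFailure("not-run")); // prints "unverified"
```

The point of the third branch is that "pre-existing" is only ever returned when the failure has actually been reproduced on main.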

Added to CLAUDE.md and as a comment in skill-e2e.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 11:18:27 -05:00
fixtures                  | feat: contributor mode, session awareness, recommendation format (#90) | 2026-03-16 01:45:50 -05:00
helpers                   | feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83) | 2026-03-15 23:55:39 -05:00
gen-skill-docs.test.ts    | fix: dynamic base branch detection across all SKILL templates (v0.3.10) (#81) | 2026-03-16 10:59:13 -05:00
skill-e2e.test.ts         | docs: add E2E eval failure blame protocol | 2026-03-16 11:18:27 -05:00
skill-llm-eval.test.ts    | fix: lower planted-bug detection baselines and LLM judge thresholds for reliability | 2026-03-14 05:16:17 -05:00
skill-parser.test.ts      | feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) | 2026-03-13 21:08:12 -07:00
skill-validation.test.ts  | test: add deterministic contributor mode preamble validation | 2026-03-16 11:17:47 -05:00