gstack

Garry Tan 03a6270b9c feat: eval efficiency metrics — turns, duration, commentary across all surfaces Add generateCommentary() for natural-language delta interpretation, per-test turns/duration in comparison and summary output, judgePassed unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>		2026-03-15 21:17:12 -05:00
fixtures	fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds	2026-03-14 07:17:17 -05:00
helpers	feat: eval efficiency metrics — turns, duration, commentary across all surfaces	2026-03-15 21:17:12 -05:00
gen-skill-docs.test.ts	feat: qa-only skill, qa fix loop, plan-to-QA artifact flow	2026-03-15 21:17:06 -05:00
skill-e2e.test.ts	feat: eval efficiency metrics — turns, duration, commentary across all surfaces	2026-03-15 21:17:12 -05:00
skill-llm-eval.test.ts	fix: lower planted-bug detection baselines and LLM judge thresholds for reliability	2026-03-14 05:16:17 -05:00
skill-parser.test.ts	feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)	2026-03-13 21:08:12 -07:00
skill-validation.test.ts	feat: qa-only skill, qa fix loop, plan-to-QA artifact flow	2026-03-15 21:17:06 -05:00