gstack/test
Garry Tan 0d1d2e970b
test: E2E + LLM-judge evals for deploy skills
- 4 E2E tests: land-and-deploy (Fly.io detection + deploy report),
  canary (monitoring report structure), benchmark (perf report schema),
  setup-deploy (platform detection → CLAUDE.md config)
- 4 LLM-judge evals: workflow quality for all 4 new skills
- Touchfile entries for diff-based test selection (E2E + LLM-judge)
- 460 free tests pass, 0 fail
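
For readers unfamiliar with the touchfile idea, here is a minimal sketch of diff-based test selection, assuming a touchfile is a map from a test name to the path prefixes that should trigger it. The `TOUCHFILES` map, its paths, and the `selectTests` helper are all hypothetical illustrations. They are not taken from gstack's actual helpers.

```typescript
// Hypothetical touchfile map: test name -> path prefixes that should trigger it.
// These names and paths are illustrative assumptions, not gstack's real entries.
const TOUCHFILES: Record<string, string[]> = {
  "land-and-deploy": ["land-and-deploy/", "test/helpers/"],
  "canary": ["canary/"],
  "benchmark": ["benchmark/"],
  "setup-deploy": ["setup-deploy/"],
};

// Return the (sorted) tests whose prefixes match at least one changed path.
function selectTests(
  changedFiles: string[],
  touchfiles: Record<string, string[]> = TOUCHFILES,
): string[] {
  return Object.entries(touchfiles)
    .filter(([, prefixes]) =>
      changedFiles.some((file) => prefixes.some((prefix) => file.startsWith(prefix))),
    )
    .map(([name]) => name)
    .sort();
}
```

With this sketch, a diff touching only `canary/SKILL.md` would select just the `canary` test, and a diff touching no listed prefix would select nothing, so the paid E2E and LLM-judge runs could be skipped.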

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 07:16:45 -07:00
fixtures feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) 2026-03-17 20:12:55 -05:00
helpers test: E2E + LLM-judge evals for deploy skills 2026-03-20 07:16:45 -07:00
analytics.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
codex-e2e.test.ts feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226) 2026-03-19 18:20:50 -07:00
gen-skill-docs.test.ts fix: plan mode exception for review log + telemetry writes (v0.9.0.1) (#234) 2026-03-19 23:10:26 -07:00
hook-scripts.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
skill-e2e.test.ts test: E2E + LLM-judge evals for deploy skills 2026-03-20 07:16:45 -07:00
skill-llm-eval.test.ts test: E2E + LLM-judge evals for deploy skills 2026-03-20 07:16:45 -07:00
skill-parser.test.ts feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
skill-routing-e2e.test.ts fix: security hardening + issue triage (v0.8.3) (#205) 2026-03-19 01:58:43 -05:00
skill-validation.test.ts feat: /setup-deploy skill + platform-specific deploy verification 2026-03-20 07:14:32 -07:00
telemetry.test.ts feat: opt-in usage telemetry + community intelligence platform (v0.8.6) (#210) 2026-03-19 17:21:05 -07:00
touchfiles.test.ts fix: /qa never refuses browser testing on backend-only changes (#202) 2026-03-19 00:31:26 -05:00