gstack/test
Garry Tan 4ad73f7362
feat: unified gstack eval CLI with list, compare, push, cache, cost
- lib/cli-eval.ts: routes to list/compare/summary/push/cost/cache/watch
  subcommands. Ports logic from 4 separate scripts into unified entry.
  Adds ANSI color for TTY (respects NO_COLOR), --limit flag for list.
- bin/gstack-eval: bash wrapper matching bin/gstack-sync pattern
- package.json: eval:* scripts now point to lib/cli-eval.ts
- supabase/migrations/004_eval_costs.sql: per-model cost tracking + RLS
- docs/eval-result-format.md: public format spec for any language
- test/lib-eval-cli.test.ts: integration tests (spawn CLI subprocess)
  including 3 push failure modes (file-not-found, invalid schema,
  sync unavailable)
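
The color behavior described above (ANSI color on a TTY, disabled by NO_COLOR) could be sketched roughly as follows. This is a minimal illustration, not the code in lib/cli-eval.ts: the names `ColorContext`, `colorEnabled`, and `paint` are hypothetical, and it assumes the common NO_COLOR convention that any non-empty value disables color.

```typescript
// Hypothetical helper; names are illustrative and not taken from lib/cli-eval.ts.
interface ColorContext {
  isTTY: boolean; // whether the output stream is a terminal
  env: Record<string, string | undefined>; // environment variables to consult
}

// Color is on only when writing to a TTY and NO_COLOR is unset or empty
// (assumed convention: any non-empty NO_COLOR value disables color).
function colorEnabled(ctx: ColorContext): boolean {
  const noColor = ctx.env["NO_COLOR"];
  const disabled = noColor !== undefined && noColor !== "";
  return ctx.isTTY && !disabled;
}

// Wrap text in an ANSI SGR sequence (e.g. 32 = green) when color is enabled;
// otherwise return the text unchanged.
function paint(text: string, sgrCode: number, ctx: ColorContext): string {
  return colorEnabled(ctx) ? `\x1b[${sgrCode}m${text}\x1b[0m` : text;
}
```

Passing the TTY flag and environment in explicitly, rather than reading globals inside the helper, keeps it easy to exercise from tests that spawn no subprocess. A caller would typically supply the real stream state and environment at the call site.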

215 tests passing across 13 files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 09:39:36 -05:00
| Name | Last commit | Date |
| --- | --- | --- |
| fixtures | fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds | 2026-03-14 07:17:17 -05:00 |
| helpers | feat: hook eval-store sync, use shared utils, add 30 lib tests | 2026-03-15 02:02:54 -05:00 |
| gen-skill-docs.test.ts | feat: template-ify all skills + E2E tests for plan-ceo-review, plan-eng-review, retro | 2026-03-14 07:28:02 -05:00 |
| lib-eval-cache.test.ts | feat: add SHA-based eval caching with EVAL_CACHE=0 bypass | 2026-03-15 09:39:26 -05:00 |
| lib-eval-cli.test.ts | feat: unified gstack eval CLI with list, compare, push, cache, cost | 2026-03-15 09:39:36 -05:00 |
| lib-eval-cost.test.ts | feat: add eval format validation, tier selection, cost tracking | 2026-03-15 09:39:18 -05:00 |
| lib-eval-format.test.ts | feat: add eval format validation, tier selection, cost tracking | 2026-03-15 09:39:18 -05:00 |
| lib-eval-tier.test.ts | feat: add eval format validation, tier selection, cost tracking | 2026-03-15 09:39:18 -05:00 |
| lib-sync-config.test.ts | feat: hook eval-store sync, use shared utils, add 30 lib tests | 2026-03-15 02:02:54 -05:00 |
| lib-sync.test.ts | feat: hook eval-store sync, use shared utils, add 30 lib tests | 2026-03-15 02:02:54 -05:00 |
| lib-util.test.ts | feat: add listEvalFiles, loadEvalResults, formatTimestamp to lib/util.ts | 2026-03-15 09:39:09 -05:00 |
| skill-e2e.test.ts | fix: harden planted-bug eval prompt for reliable form testing | 2026-03-14 13:28:18 -05:00 |
| skill-llm-eval.test.ts | fix: lower planted-bug detection baselines and LLM judge thresholds for reliability | 2026-03-14 05:16:17 -05:00 |
| skill-parser.test.ts | feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) | 2026-03-13 21:08:12 -07:00 |
| skill-validation.test.ts | feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61) | 2026-03-14 20:15:11 -07:00 |