gstack/test
Garry Tan d442aadf4a
perf: add model pinning infrastructure + rate-limit telemetry to E2E runner
Default E2E model changed from Opus to Sonnet (5x faster, 5x cheaper).
Session runner now accepts `model` option with EVALS_MODEL env var override.
Added timing telemetry (first_response_ms, max_inter_turn_ms) and wall_clock_ms
to eval-store for diagnosing rate-limit impact. Added EVALS_FAST test filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 13:12:17 -07:00
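The model override and timing fields described in the commit message above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `helpers`: only the `model` option, `EVALS_MODEL`, `first_response_ms`, `max_inter_turn_ms`, and `wall_clock_ms` come from the commit message. The function names, the `"sonnet"` placeholder, the precedence order (environment variable, then option, then default), and the way each timing is derived are all hypothetical.

```typescript
// Hypothetical sketch; the function names and DEFAULT_MODEL value are assumptions.
const DEFAULT_MODEL = "sonnet"; // placeholder, not a real model identifier

interface RunOptions {
  model?: string;
}

type Env = Record<string, string | undefined>;

// Assumed precedence: EVALS_MODEL env var, then the `model` option, then the default.
function resolveModel(options: RunOptions, env: Env): string {
  const fromEnv = env["EVALS_MODEL"]?.trim();
  if (fromEnv) return fromEnv;
  return options.model ?? DEFAULT_MODEL;
}

interface TimingTelemetry {
  first_response_ms: number | null; // null when no turn was ever received
  max_inter_turn_ms: number;        // largest gap between consecutive turns
  wall_clock_ms: number;            // total duration of the session
}

// Derive the three timing fields from a start time, the arrival time of each
// turn, and an end time (all in milliseconds on the same clock).
function computeTiming(
  startMs: number,
  turnTimestampsMs: number[],
  endMs: number,
): TimingTelemetry {
  let maxGap = 0;
  for (let i = 1; i < turnTimestampsMs.length; i++) {
    maxGap = Math.max(maxGap, turnTimestampsMs[i] - turnTimestampsMs[i - 1]);
  }
  return {
    first_response_ms:
      turnTimestampsMs.length > 0 ? turnTimestampsMs[0] - startMs : null,
    max_inter_turn_ms: maxGap,
    wall_clock_ms: endMs - startMs,
  };
}
```

In this sketch, a long `max_inter_turn_ms` relative to the other gaps is the kind of signal that would point at rate limiting rather than slow model output. Passing the environment in as a parameter keeps the resolver easy to test without touching the real process environment.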
| Name | Last commit | Date |
| --- | --- | --- |
| fixtures | feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142) | 2026-03-17 20:12:55 -05:00 |
| helpers | perf: add model pinning infrastructure + rate-limit telemetry to E2E runner | 2026-03-21 13:12:17 -07:00 |
| analytics.test.ts | feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) | 2026-03-18 23:57:59 -05:00 |
| codex-e2e.test.ts | perf: enable test.concurrent for 31 independent E2E tests | 2026-03-21 10:01:55 -07:00 |
| gen-skill-docs.test.ts | feat: adversarial spec review loop + skill chaining (v0.9.1.0) (#249) | 2026-03-20 06:24:22 -07:00 |
| hook-scripts.test.ts | feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) | 2026-03-18 23:57:59 -05:00 |
| skill-e2e-browse.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-deploy.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-design.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-plan.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-qa-bugs.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-qa-workflow.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-review.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-e2e-workflow.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |
| skill-llm-eval.test.ts | test: E2E + LLM-judge evals for deploy skills | 2026-03-20 07:16:45 -07:00 |
| skill-parser.test.ts | feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) | 2026-03-13 21:08:12 -07:00 |
| skill-routing-e2e.test.ts | perf: enable test.concurrent for 31 independent E2E tests | 2026-03-21 10:01:55 -07:00 |
| skill-validation.test.ts | merge: resolve conflicts with origin/main (v0.9.1.0 → v0.9.1) | 2026-03-20 07:28:44 -07:00 |
| telemetry.test.ts | feat: opt-in usage telemetry + community intelligence platform (v0.8.6) (#210) | 2026-03-19 17:21:05 -07:00 |
| touchfiles.test.ts | perf: split monolithic E2E test into 8 parallel files | 2026-03-21 10:37:37 -07:00 |