mirror of https://github.com/garrytan/gstack.git
Cathedral parity-eval suite primitive. captureBaseline() walks every top-level SKILL.md and records bytes, lines, estimated tokens, frontmatter description length, and eval coverage. diffBaselines() reports per-skill delta + total corpus delta + catalog tokens delta. Locks the v1.44.1 reference snapshot at test/fixtures/parity-baseline-v1.44.1.json. After Phase A+B+C land, scripts/capture-baseline.ts --tag v1.45.0.0 produces a comparable snapshot; diff supplies the real numbers the v2 CHANGELOG quotes. Never invent baseline numbers; ship them only if they came from a real run. v1.44.1 numbers captured this commit: - 51 skills - 2,847 KB total corpus - ~9,319 catalog tokens (sum of description bytes / 4) - top 3: ship 160 KB, plan-ceo-review 128 KB, office-hours 108 KB Test plan: - bun test test/helpers/capture-parity-baseline.test.ts passes 4/4 - The baseline JSON file is committed so reviewers can audit v1→v2 numbers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| providers | ||
| agent-sdk-runner.ts | ||
| benchmark-judge.ts | ||
| benchmark-runner.ts | ||
| capture-parity-baseline.test.ts | ||
| capture-parity-baseline.ts | ||
| claude-pty-runner.ts | ||
| claude-pty-runner.unit.test.ts | ||
| codex-session-runner.ts | ||
| e2e-helpers.ts | ||
| eval-store.test.ts | ||
| eval-store.ts | ||
| gemini-session-runner.test.ts | ||
| gemini-session-runner.ts | ||
| llm-judge.ts | ||
| observability.test.ts | ||
| pricing.ts | ||
| secret-sink-harness.ts | ||
| session-runner.test.ts | ||
| session-runner.ts | ||
| skill-parser.ts | ||
| tool-map.ts | ||
| touchfiles.ts | ||