mirror of https://github.com/garrytan/gstack.git
Cathedral parity-eval suite primitive. captureBaseline() walks every top-level SKILL.md and records bytes, lines, estimated tokens, frontmatter description length, and eval coverage. diffBaselines() reports per-skill delta + total corpus delta + catalog tokens delta. Locks the v1.44.1 reference snapshot at test/fixtures/parity-baseline-v1.44.1.json. After Phase A+B+C land, scripts/capture-baseline.ts --tag v1.45.0.0 produces a comparable snapshot; diff supplies the real numbers the v2 CHANGELOG quotes. Never invent baseline numbers; ship them only if they came from a real run. v1.44.1 numbers captured this commit: - 51 skills - 2,847 KB total corpus - ~9,319 catalog tokens (sum of description bytes / 4) - top 3: ship 160 KB, plan-ceo-review 128 KB, office-hours 108 KB Test plan: - bun test test/helpers/capture-parity-baseline.test.ts passes 4/4 - The baseline JSON file is committed so reviewers can audit v1→v2 numbers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| golden | ||
| ios-qa/FixtureApp | ||
| mode-posture | ||
| plans | ||
| coverage-audit-fixture.ts | ||
| eval-baselines.json | ||
| forcing-finding-seeds.ts | ||
| golden-ship-claude.md | ||
| overlay-nudges.ts | ||
| parity-baseline-v1.44.1.json | ||
| qa-eval-checkout-ground-truth.json | ||
| qa-eval-ground-truth.json | ||
| qa-eval-spa-ground-truth.json | ||
| review-army-migration.sql | ||
| review-army-n-plus-one.rb | ||
| review-eval-design-slop.css | ||
| review-eval-design-slop.html | ||
| review-eval-enum-diff.rb | ||
| review-eval-enum.rb | ||
| review-eval-vuln.rb | ||