Add tests for three untested surfaces from the v1.46.0.0 work. All three
suites would have caught real bugs we shipped (and fixed) on this branch.
1. test/helpers/budget-override.test.ts — 7 tests pin the audit-trail
contract for EVALS_BUDGET_OVERRIDE_REASON and
GSTACK_SIZE_BUDGET_OVERRIDE_REASON. Without this, the audit logger
could silently drop events, leaving overrides invisible. Tests
cover: required fields per JSONL line, CI provenance capture
(CI/GITHUB_ACTIONS/branch/commit), local-runner defaults,
append-only behavior, missing-directory recovery, and unwritable-
path resilience (logs warning instead of throwing).
2. test/terse-build.test.ts — 16 tests pin --explain-level=terse
behavior across the 4 gated resolvers and the composed preamble.
The default, terse, and undefined-ctx cases are all asserted. Without this, a
refactor that breaks the explainLevel threading silently regresses
the opt-in compression path; the runtime EXPLAIN_LEVEL: terse gate
still works, so users wouldn't notice. The tier-1 invariant is also
pinned (terse-only-affects-tier-2+).
3. test/gen-skill-docs-idempotency.test.ts — 2 tests catch the class
of bug behind the v1.45.0.0 timestamp flap. Two consecutive
gen-skill-docs runs must produce byte-identical outputs across
STABLE_OUTPUTS (proactive-suggestions.json, SKILL.md, ship/SKILL.md,
plan-ceo-review/SKILL.md, office-hours/SKILL.md, gstack/llms.txt).
A --dry-run after a fresh gen must also report zero stale files. CI
freshness regressions now surface as test failures BEFORE a PR is opened.
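
For reviewers, the idempotency property from item 3 can be sketched
roughly as below. This is an illustration only, not the code in
test/gen-skill-docs-idempotency.test.ts: the Outputs type, the
findUnstableOutputs helper, and both toy generators are hypothetical,
and comparing strings stands in for the byte-level comparison.

```typescript
// Hypothetical shape: a generator returns a map of output path -> contents.
type Outputs = Map<string, string>;

// Run the generator twice and return every path whose contents differ
// between the two runs. An empty result means the outputs are stable.
function findUnstableOutputs(
  generate: () => Outputs,
  stablePaths: string[],
): string[] {
  const first = generate();
  const second = generate();
  return stablePaths.filter((path) => first.get(path) !== second.get(path));
}

// Toy generators for illustration; neither is the real gen-skill-docs.
let runCount = 0;
const flappingGenerator = (): Outputs => {
  runCount += 1;
  return new Map([
    ["SKILL.md", "# Skill\n"],
    // A value that changes per run, standing in for an embedded timestamp.
    ["gstack/llms.txt", `generated on run ${runCount}\n`],
  ]);
};

const stableGenerator = (): Outputs =>
  new Map([
    ["SKILL.md", "# Skill\n"],
    ["gstack/llms.txt", "static index\n"],
  ]);

const paths = ["SKILL.md", "gstack/llms.txt"];
console.log(findUnstableOutputs(flappingGenerator, paths)); // ["gstack/llms.txt"]
console.log(findUnstableOutputs(stableGenerator, paths)); // []
```

Returning the list of differing paths, rather than a boolean, lets a
failing assertion name the file that flapped.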
Test plan:
- bun test test/helpers/budget-override.test.ts: 7 pass
- bun test test/terse-build.test.ts: 16 pass
- bun test test/gen-skill-docs-idempotency.test.ts: 2 pass
- Full focused suite (15 test files): 1179 pass, 0 fail (+45 new tests
vs the pre-fill baseline of 1134)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>