mirror of https://github.com/garrytan/gstack.git
Plan-tune cathedral T16 (per D12: all 5 in the gate tier). One consolidated
file holds five describeIfSelected scenarios, each selectable by its own
touchfile entry, so a scenario runs only when its relevant code changes
(EVALS_ALL=1 forces all of them):
  plan-tune-hook-capture   PostToolUse hook fires → question-log fills
  plan-tune-enforcement    never-ask + marker + 2-way → deny+reason
                           + auto-decided event logged
  plan-tune-annotation     declared profile + memory nugget
                           → additionalContext surfaced on defer
  plan-tune-codex-import   synthetic JSONL → import bin → log with
                           source=codex-import-marker
  plan-tune-dream-cycle    apply proposal → re-fire question
                           → memory injected via additionalContext
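The plan-tune-enforcement rule can be pictured as a pure decision function. This is a hypothetical sketch, not the cathedral's code: the types, the field names (`preference`, `hasMarker`, `door`), the reason string, and the reading of "2-way" as a reversible (two-way door) decision are all assumptions.

```typescript
// Hypothetical shapes; the cathedral's real preference/event types are not shown in this commit.
type Preference = "never-ask" | "ask";
type Door = "one-way" | "two-way"; // assumption: "2-way" means a reversible decision

interface QuestionContext {
  questionId: string;
  preference: Preference;
  hasMarker: boolean; // assumption: the question carries the plan-tune marker
  door: Door;
}

interface AutoDecidedEvent {
  kind: "auto-decided";
  questionId: string;
  reason: string;
}

type Decision =
  | { action: "allow" }
  | { action: "deny"; reason: string; event: AutoDecidedEvent };

// Deny (and emit an auto-decided event) only when all three conditions hold;
// otherwise the question goes through to the user.
function enforce(ctx: QuestionContext): Decision {
  if (ctx.preference === "never-ask" && ctx.hasMarker && ctx.door === "two-way") {
    const reason = `never-ask preference for ${ctx.questionId}; two-way door, auto-decided`;
    return {
      action: "deny",
      reason,
      event: { kind: "auto-decided", questionId: ctx.questionId, reason },
    };
  }
  return { action: "allow" };
}
```

The scenario asserts both halves of the deny branch: the reason returned to the caller and the event written to the log.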
Each scenario sets up an isolated git repo plus bins, scripts, and hooks
under a tmp directory, then exercises the cathedral chain end-to-end
against the real on-disk binaries (no mocks at the bin layer). Setting
GSTACK_STATE_ROOT keeps the user's real ~/.gstack untouched.
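A minimal sketch of the isolation pattern, assuming only that the bins honor GSTACK_STATE_ROOT from their environment. `makeFixture` and `runIsolated` are hypothetical helpers; the real scenarios also `git init` the repo and copy bins, scripts, and hooks in, which is omitted here.

```typescript
import { mkdtempSync, mkdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { spawnSync } from "node:child_process";

interface Fixture {
  root: string;
  repoDir: string;
  stateRoot: string;
  cleanup: () => void;
}

// Build a throwaway layout under the OS tmp dir: one dir standing in for the
// repo, one for the gstack state root.
function makeFixture(): Fixture {
  const root = mkdtempSync(join(tmpdir(), "plan-tune-"));
  const repoDir = join(root, "repo");
  const stateRoot = join(root, "state");
  mkdirSync(repoDir);
  mkdirSync(stateRoot);
  return {
    root,
    repoDir,
    stateRoot,
    cleanup: () => rmSync(root, { recursive: true, force: true }),
  };
}

// Run a child process with GSTACK_STATE_ROOT pointed at the fixture so
// nothing touches the real ~/.gstack. `cmd` stands in for an on-disk bin.
function runIsolated(fx: Fixture, cmd: string, args: string[]) {
  return spawnSync(cmd, args, {
    cwd: fx.repoDir,
    env: { ...process.env, GSTACK_STATE_ROOT: fx.stateRoot },
    encoding: "utf8",
  });
}
```

Because the override is applied per spawn rather than to the test process, a failing scenario cannot leak state into the developer's home directory.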
These five complement the existing unit tests by proving that the full
sub-process chain works, not just individual functions in isolation.
They DON'T spawn claude -p: the cathedral's substrate behavior is
deterministic, so agent compliance is no longer the variable under test.
The existing test/skill-e2e-plan-tune.test.ts (plan-tune-inspect) still
covers the LLM-driven intent-routing behavior.
Cost: each scenario runs in ~1s and costs $0, since there are no claude -p
invocations. Touchfile-gated, so they only run on PRs that touch cathedral code.
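The gating logic amounts to "select a scenario if any changed file matches one of its touchfile entries, or select everything when EVALS_ALL=1". The sketch below is a hypothetical illustration: the `TouchfileMap` shape, the prefix matching (the real entries may be globs), and `selectScenarios` are assumptions, not the API of touchfiles.ts.

```typescript
// Hypothetical touchfile map: scenario name -> path prefixes it depends on.
type TouchfileMap = Record<string, string[]>;

function selectScenarios(
  touchfiles: TouchfileMap,
  changedFiles: string[],
  env: Record<string, string | undefined>,
): string[] {
  const all = Object.keys(touchfiles);
  if (env.EVALS_ALL === "1") return all; // force every scenario
  // Keep a scenario if any changed file falls under any of its prefixes.
  return all.filter((name) =>
    touchfiles[name].some((prefix) => changedFiles.some((f) => f.startsWith(prefix))),
  );
}
```

A PR that only touches docs selects nothing, which is why the gate-tier cost stays at zero for unrelated changes.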
Also fixes a bug found by the E2E: question-log-hook didn't pass the
incoming tool call's cwd to spawnSync when invoking gstack-question-log,
so the bin ran in the hook process's cwd (the repo root) instead of the
session's cwd. As a result, log writes landed in the wrong project bucket.
The fix mirrors the cwd-passing pattern already used in question-preference-hook.
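A sketch of the cwd-passing pattern, assuming the hook's stdin payload carries a `cwd` field for the session. `HookPayload`, `questionLogSpawnOptions`, `runQuestionLog`, and the fallback to the hook's own cwd are hypothetical; the actual hook may handle a missing `cwd` differently.

```typescript
import { spawnSync } from "node:child_process";
import type { SpawnSyncOptions } from "node:child_process";

// Assumed shape of the hook's stdin payload; only `cwd` matters here.
interface HookPayload {
  cwd?: string;
  tool_name?: string;
}

// Build spawn options for the question-log bin. Without `cwd`, spawnSync
// runs the child in the hook process's directory (the repo root), which is
// the bug described above.
function questionLogSpawnOptions(payload: HookPayload): SpawnSyncOptions {
  return {
    cwd: payload.cwd ?? process.cwd(), // session cwd; hypothetical fallback
    encoding: "utf8",
  };
}

function runQuestionLog(binPath: string, args: string[], payload: HookPayload) {
  return spawnSync(binPath, args, questionLogSpawnOptions(payload));
}
```

Passing `cwd` explicitly is what lets the bin resolve the session's project, and therefore the correct log bucket, instead of inheriting wherever the hook happened to start.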
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Name |
|---|
| providers |
| agent-sdk-runner.ts |
| benchmark-judge.ts |
| benchmark-runner.ts |
| budget-override.test.ts |
| budget-override.ts |
| capture-parity-baseline.test.ts |
| capture-parity-baseline.ts |
| claude-pty-runner.ts |
| claude-pty-runner.unit.test.ts |
| codex-session-runner.ts |
| e2e-helpers.ts |
| eval-store.test.ts |
| eval-store.ts |
| gemini-session-runner.test.ts |
| gemini-session-runner.ts |
| llm-judge.ts |
| observability.test.ts |
| parity-harness.ts |
| pricing.ts |
| secret-sink-harness.ts |
| session-runner.test.ts |
| session-runner.ts |
| skill-parser.ts |
| tool-map.ts |
| touchfiles.ts |