From 6dc113838a3f964f6ab22e44e3d06c80f2f9baf3 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Wed, 27 May 2026 07:33:57 -0700
Subject: [PATCH] docs(spikes): Claude hook mutation + Codex session format

Plan-tune cathedral T4 (per D5/D10). Two Phase 1 design spikes that
downstream tasks (T3, T5, T6, T8, T9) depend on.

claude-code-hook-mutation.md
- Confirms PreToolUse allow + updatedInput is supported and is the right
  mechanism for substituting an auto-decided answer.
- Pins stdin/stdout JSON schemas with field-by-field reference.
- Documents matcher regex syntax for
  "(AskUserQuestion|mcp__.*__AskUserQuestion)" so Conductor's MCP-routed
  AUQ is covered.
- Captures parallel-hook merge order caveat and our settings.json snippet.

codex-session-format.md
- Maps the on-disk ~/.codex/sessions//rollout-*.jsonl schema by event
  type (response_item 76%, event_msg 19%, turn_context, session_meta).
- Critical finding: Codex has NO AskUserQuestion tool. Gstack AUQ-shaped
  Decision Briefs surface as agent_message text; answer is the next
  user_message. Two-tier recovery: marker-first (D18), then pattern
  fallback for hash-only logging.
- Confirms logs_2.sqlite is internal telemetry, not session content.
- Lists open questions to answer during T9 implementation.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/spikes/claude-code-hook-mutation.md | 193 +++++++++++++++++++++++
 docs/spikes/codex-session-format.md      | 171 ++++++++++++++++++++
 2 files changed, 364 insertions(+)
 create mode 100644 docs/spikes/claude-code-hook-mutation.md
 create mode 100644 docs/spikes/codex-session-format.md

diff --git a/docs/spikes/claude-code-hook-mutation.md b/docs/spikes/claude-code-hook-mutation.md
new file mode 100644
index 000000000..70a4ae18a
--- /dev/null
+++ b/docs/spikes/claude-code-hook-mutation.md
@@ -0,0 +1,193 @@
+# Spike: Claude Code hook mutation for plan-tune cathedral
+
+**Status:** complete (2026-05-27)
+**Surfaces:** D10 (does PreToolUse allow mutating AUQ input?), D19/Codex (matcher must cover MCP variants)
+**Downstream consumers:** T3, T5, T6, T8
+
+## Question this spike answers
+
+Can a PreToolUse hook on `AskUserQuestion` actually substitute the user's
+answer via `updatedInput`? If yes, what's the exact protocol?
+
+## Answer
+
+**Yes.** `updatedInput` is the supported mechanism. Source:
+https://code.claude.com/docs/en/hooks (confirmed 2026-04 reference).
+
+## Hook stdin schema (PreToolUse + PostToolUse)
+
+```json
+{
+  "session_id": "abc123",
+  "transcript_path": "/path/to/transcript.jsonl",
+  "cwd": "/current/working/dir",
+  "permission_mode": "default",
+  "effort": { "level": "medium" },
+  "hook_event_name": "PreToolUse",
+  "tool_name": "AskUserQuestion",
+  "tool_input": { /* tool-specific */ },
+  "tool_use_id": "unique-id-12345"
+}
+```
+
+Optional in subagent context: `agent_id`, `agent_type`.
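A hook has to parse this payload before it can decide anything, so a minimal sketch of typing and validating it may help. The field names come from the stdin schema above; the `HookInput` interface, the `parseHookInput` helper, and the choice to treat every sampled string field as required are assumptions made for illustration, not something the Claude Code docs state.

```typescript
// Field names mirror the stdin schema above. Treating them all as required
// strings is an assumption of this sketch.
interface HookInput {
  session_id: string;
  transcript_path: string;
  cwd: string;
  permission_mode: string;
  hook_event_name: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
  tool_use_id: string;
  effort?: { level: string }; // treated as optional here (assumption)
  agent_id?: string;          // subagent context only
  agent_type?: string;        // subagent context only
}

const REQUIRED_STRING_FIELDS: string[] = [
  "session_id",
  "transcript_path",
  "cwd",
  "permission_mode",
  "hook_event_name",
  "tool_name",
  "tool_use_id",
];

// Hypothetical helper: parse one stdin payload and reject malformed input
// early, so the caller can fall back to pass-through on any error.
function parseHookInput(raw: string): HookInput {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("hook input must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  REQUIRED_STRING_FIELDS.forEach((field) => {
    if (typeof record[field] !== "string") {
      throw new Error("missing or non-string field: " + field);
    }
  });
  const toolInput = record["tool_input"];
  if (typeof toolInput !== "object" || toolInput === null || Array.isArray(toolInput)) {
    throw new Error("tool_input must be a JSON object");
  }
  return record as unknown as HookInput;
}
```

Keeping the parser a pure function of the raw string leaves the stdin read itself, which depends on the runtime, outside the part that needs unit tests.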
+
+## PreToolUse hook stdout schema for `allow + updatedInput`
+
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "allow",
+    "permissionDecisionReason": "auto-decided by plan-tune preference",
+    "updatedInput": { /* shallow-merged into original tool_input */ },
+    "additionalContext": "optional context for Claude"
+  }
+}
+```
+
+**permissionDecision values:**
+- `"allow"`: proceed, optionally with `updatedInput`
+- `"deny"`: block (feedback to Claude, NOT a synthetic answer per Codex
+  correction in D-prefixed decisions)
+- `"ask"`: escalate to user
+- `"defer"`: let permission flow continue
+
+**`updatedInput` semantics:** shallow merge of fields present in the returned
+object onto the original `tool_input`. Only valid with
+`permissionDecision: "allow"`. This is what lets us substitute an
+auto-decided answer for `never-ask` preferences.
+
+## Matcher schema
+
+The `matcher` field in `~/.claude/settings.json` supports JS-regex syntax
+**when it contains regex metacharacters**. A matcher with only letters and
+underscores is an exact match.
+
+To cover both native + MCP `AskUserQuestion`:
+```json
+"matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)"
+```
+
+Conductor disables native `AskUserQuestion` via `--disallowedTools` and
+routes through `mcp__conductor__AskUserQuestion`, so the matcher's MCP
+alternative is required for our hook to fire there.
+
+## Multiple-hook concurrency caveat
+
+> All matching hooks run in parallel, and identical handlers are
+> deduplicated automatically.
+
+**For our use case:**
+- gstack registers exactly one PreToolUse hook and one PostToolUse hook on
+  AUQ-shaped tool names.
+- If a user has THEIR own hook that also returns `updatedInput` on
+  AskUserQuestion, the merge order is undefined.
+- Mitigation: document this constraint in the `bin/gstack-settings-hook`
+  install prompt. The user can detect the conflict from the diff preview
+  before accepting.
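The shallow-merge and matcher rules described in this spike are easy to get subtly wrong, so here is a small sketch of both. `applyUpdatedInput` and `matcherApplies` are hypothetical names, and anchoring the regex to the whole tool name is an assumption: the spike does not say whether Claude Code performs a full match or a substring search.

```typescript
type ToolInput = Record<string, unknown>;

// Shallow merge: top-level keys from updatedInput overwrite the original.
// Nested objects and arrays are replaced whole, never merged key by key.
function applyUpdatedInput(original: ToolInput, updatedInput: ToolInput): ToolInput {
  return { ...original, ...updatedInput };
}

// Matcher rule as described in this spike: a matcher made only of letters and
// underscores is an exact match; anything else is treated as a JS regex.
// ASSUMPTION: the regex is anchored to the full tool name.
function matcherApplies(matcher: string, toolName: string): boolean {
  const isPlainName = /^[A-Za-z_]+$/.test(matcher);
  if (isPlainName) {
    return matcher === toolName;
  }
  return new RegExp("^(?:" + matcher + ")$").test(toolName);
}
```

Because the merge is shallow, a hook that wants to change one question has to return the complete `questions` array; returning only the changed element would drop the rest.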
+
+**`permissionDecision` precedence (when multiple hooks decide):**
+`deny > ask > allow > defer`: the most restrictive decision wins.
+
+## Implementation hookSpecificOutput examples
+
+**Auto-decide (PreToolUse, `never-ask` preference + non-one-way):**
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "allow",
+    "permissionDecisionReason": "plan-tune: never-ask preference on ship-test-failure-triage",
+    "updatedInput": {
+      "questions": [{ /* same as input, but with auto-selected answer */ }]
+    }
+  }
+}
+```
+
+**Pass-through (no preference, or one-way safety override):**
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "defer"
+  }
+}
+```
+
+**PostToolUse capture (always):**
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PostToolUse"
+  }
+}
+```
+(PostToolUse hooks can also set `additionalContext` to append to the tool
+result; we don't need this for v1 capture.)
+
+## Settings.json snippet for T8 hook installer
+
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-preference-hook",
+            "timeout": 5
+          }
+        ]
+      }
+    ],
+    "PostToolUse": [
+      {
+        "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-log-hook",
+            "timeout": 5
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+Hook commands invoke `bun` under the hood; Claude Code's hook runner
+requires absolute paths (or `$CLAUDE_PROJECT_DIR` substitution). The hooks
+themselves are TypeScript files that a bash wrapper runs with bun.
+
+## Open questions deferred to implementation
+
+1. **Recommended-option parsing scope.** D2 says parse `(recommended)`
+   label first.
+   The label is on the option's `label` field per
+   AskUserQuestion Format. Implementation will need to walk
+   `tool_input.questions[*].options[*]` looking for the label suffix. Worked
+   examples: ship/SKILL.md.tmpl emits options like
+   `"A) Fix now" (recommended)`.
+
+2. **Auto-decided event tagging.** When the hook returns `updatedInput`, the
+   PostToolUse hook will see the resolved input and log a normal event.
+   Need an extra field on the PostToolUse payload (e.g.,
+   `was_auto_decided: true`) that the hook can set via session state
+   tracking: write a marker file in `~/.gstack/sessions//.auto-decided-`
+   from PreToolUse, read it from PostToolUse, delete on read.
+
+3. **Timeout behavior.** Default hook timeout is 60s but the docs are
+   thin on what happens at timeout. Set explicit `timeout: 5` so the
+   user never waits >5s on a hook misfire. Falls back to pass-through.
+
+## References
+
+- https://code.claude.com/docs/en/hooks (canonical, latest as of 2026-04)
+- WebSearch results 2026-05-27
+- Existing `bin/gstack-settings-hook` (SessionStart-only impl, to be
+  superseded by T3 schema-aware rewrite)
diff --git a/docs/spikes/codex-session-format.md b/docs/spikes/codex-session-format.md
new file mode 100644
index 000000000..323bdff29
--- /dev/null
+++ b/docs/spikes/codex-session-format.md
@@ -0,0 +1,171 @@
+# Spike: Codex session storage format for plan-tune cathedral
+
+**Status:** complete (2026-05-27)
+**Surfaces:** D5 (Codex import parses structured files, not regex)
+**Downstream consumers:** T9 (gstack-codex-session-import)
+
+## Question this spike answers
+
+What's the actual on-disk format of Codex sessions, and how do we recover
+AskUserQuestion-shaped events from it for `gstack-codex-session-import`?
+
+## Storage layout
+
+```
+~/.codex/
+├── auth.json                  # Codex auth (do not touch)
+├── config.toml                # User config
+├── goals_1.sqlite             # ~24KB, internal goals DB (not relevant)
+├── logs_2.sqlite              # ~16MB, structured logs (target=*, see schema)
+├── history.jsonl              # ~9KB, command history
+└── sessions/
+    └── 2026/05/27/
+        └── rollout--.jsonl    # per-session transcript
+```
+
+Session files: one JSONL per `codex exec` or interactive session. Cwd path
+embedded in the `session_meta` event. CLI version recorded.
+
+## Session JSONL event types (measured on Garry's machine, 2026-05-27)
+
+| type            | count | meaning |
+|-----------------|------:|---------|
+| `response_item` |   382 | model's response stream (~76%) |
+| `event_msg`     |    97 | high-level session events (~19%) |
+| `turn_context`  |     6 | per-turn context snapshot |
+| `session_meta`  |     6 | session header (one per session) |
+
+### response_item subtypes
+
+| subtype                | count | meaning |
+|------------------------|------:|---------|
+| `function_call`        |   148 | model invoked a tool |
+| `function_call_output` |   148 | tool result returned to model |
+| `reasoning`            |    44 | reasoning summary |
+| `message`              |    40 | text message (input_text or output_text) |
+| `web_search_call`      |     2 | web search tool call |
+
+### event_msg subtypes
+
+| subtype          | count | meaning |
+|------------------|------:|---------|
+| `token_count`    |    55 | per-step token accounting |
+| `agent_message`  |    22 | agent's prose output |
+| `user_message`   |     6 | user's prose input |
+| `task_started`   |     6 | task start (one per top-level task) |
+| `task_complete`  |     6 | task complete |
+| `web_search_end` |     2 | web search completion |
+
+## Critical finding: Codex has no `AskUserQuestion` tool
+
+Codex doesn't surface AskUserQuestion as a tool call in the `response_item`
+stream. Gstack skills running on Codex emit AskUserQuestion-shaped
+Decision Briefs as plain prose inside `agent_message` events (the
+`AskUserQuestion Format` from preamble).
+The user's answer comes back in
+the next `user_message`.
+
+This means importing AUQ events from Codex sessions is structurally
+different from importing them from Claude Code (where they ARE
+tool calls):
+
+- **Claude Code:** hook captures structured `tool_input`/`tool_output`
+  for `AskUserQuestion`. Question + options + answer all separated.
+- **Codex:** parser must extract from `agent_message.text` body, detect
+  the D-numbered Decision Brief pattern, then match against the
+  subsequent `user_message` for the answer.
+
+## Recovery strategy for `gstack-codex-session-import`
+
+**Two-tier extraction:**
+
+1. **Marker-first (D18 mechanism).** Search `agent_message` text for the
+   `` marker. If present, we have an exact question_id
+   and can reliably recover. (Will work once T14 adds markers to the top
+   10 registry questions and Codex starts emitting them via the
+   host-aware preamble path.)
+
+2. **Pattern fallback.** When no marker, parse for:
+   - `D` line (D-number from AskUserQuestion Format)
+   - `Recommendation: ...` line
+   - Option block `A) ...`, `B) ...`, etc.
+   - Next `user_message` event for the chosen option label
+
+   Use this only to populate hash-based question_id (the same
+   `hook-<sha1(skill+text+sorted_options)[:10]>` shape Layer 1 uses on
+   Claude). Tagged `source: "codex-pattern-fallback"`, never used as
+   preference key (per D18 hash drift guidance).
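The two-tier extraction above can be sketched as a single function over a simplified event list. Nearly everything here is an assumption made for illustration: the flat `SessionEvent` shape (real rollout JSONL nests its fields differently), the caller-supplied `markerPattern` (the actual marker syntax is not reproduced in this spike), and the `^D<number>` test for the Decision Brief line. Hash-based question_id generation is left out.

```typescript
// Simplified event shape. ASSUMPTION: the importer has already flattened each
// JSONL record into a subtype plus its prose text.
interface SessionEvent {
  subtype: string; // e.g. "agent_message" or "user_message"
  text: string;
}

interface RecoveredQuestion {
  tier: "marker" | "pattern";   // hypothetical tag, not the final `source` value
  questionId: string | null;    // only the marker tier yields an exact id
  decisionLine: string | null;
  recommendation: string | null;
  options: string[];
  answer: string | null;        // text of the next user_message, if any
}

// Return the first line matching the pattern, or null.
function firstLineMatching(lines: string[], pattern: RegExp): string | null {
  for (let i = 0; i < lines.length; i++) {
    if (pattern.test(lines[i])) return lines[i];
  }
  return null;
}

// markerPattern must be non-global and capture the question id in group 1.
function extractQuestion(
  events: SessionEvent[],
  index: number,
  markerPattern: RegExp
): RecoveredQuestion | null {
  const event = events[index];
  if (event === undefined || event.subtype !== "agent_message") return null;

  const lines = event.text.split("\n").map((line) => line.trim());
  const options = lines.filter((line) => /^[A-Z]\) /.test(line));
  const decisionLine = firstLineMatching(lines, /^D\d+\b/);
  const recommendation = firstLineMatching(lines, /^Recommendation:/);

  // The answer is taken from the next user_message after the brief.
  let answer: string | null = null;
  for (let i = index + 1; i < events.length; i++) {
    if (events[i].subtype === "user_message") {
      answer = events[i].text.trim();
      break;
    }
  }

  // Tier 1: an explicit marker gives an exact question id.
  const marker = markerPattern.exec(event.text);
  if (marker !== null && marker[1] !== undefined) {
    return { tier: "marker", questionId: marker[1], decisionLine, recommendation, options, answer };
  }

  // Tier 2: pattern fallback. Requiring a D-line and at least two options is
  // a threshold chosen for this sketch.
  if (decisionLine === null || options.length < 2) return null;
  return { tier: "pattern", questionId: null, decisionLine, recommendation, options, answer };
}
```

Passing the marker regex in as a parameter keeps the sketch independent of the final marker syntax, and returning `null` for non-matching messages lets the importer skip ordinary agent prose without special cases.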
+
+## Schema we'll write to question-log.jsonl from Codex import
+
+Per existing `bin/gstack-question-log` schema, augmented with:
+- `source: "codex-import-marker"` (when qid marker found)
+- `source: "codex-import-pattern"` (when fallback regex used)
+- `codex_session_id` (UUID from session_meta)
+- `codex_cwd` (working dir from session_meta; disambiguates project)
+- `codex_ts` (timestamp from event)
+
+## Sqlite logs_2.sqlite schema
+
+```sql
+CREATE TABLE logs (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  ts INTEGER NOT NULL,
+  ts_nanos INTEGER NOT NULL,
+  level TEXT NOT NULL,
+  target TEXT NOT NULL,
+  feedback_log_body TEXT,
+  module_path TEXT,
+  file TEXT,
+  line INTEGER,
+  thread_id TEXT,
+  process_uuid TEXT,
+  estimated_bytes INTEGER NOT NULL DEFAULT 0
+);
+```
+
+`logs_2.sqlite` is internal telemetry, not session content. **Don't use
+for AUQ extraction.** Sessions JSONL is authoritative.
+
+## Project-slug derivation
+
+From `session_meta.payload.cwd`: derive via the existing
+`bin/gstack-slug` logic on the cwd path. Conductor worktrees have their
+own slug naming convention encoded in cwd; the bin already handles this.
+
+## Versioning safety
+
+`session_meta.payload.cli_version` records the Codex CLI version (e.g.
+`0.130.0`). When the importer encounters an unknown version, log a
+warning to stderr but continue, since schema additions are typically
+backwards-compatible in JSONL.
+
+If `type` or `payload.type` values change in a future version, we'll see
+them as `unknown` in the importer's audit log. Add a guarded
+`KNOWN_VERSIONS = ["0.130.x", "0.131.x", ...]` constant in the importer
+and bump explicitly when re-testing.
+
+## Open questions for implementation
+
+1. **Where does Codex store the "user's answer" exactly?** Need to test
+   with a real `codex exec` run that triggers a Decision Brief and inspect
+   the next event. Likely `event_msg` of subtype `user_message` or a
+   `response_item` of subtype `message` with `role: "user"`.
+   Confirm during T9 implementation.
+
+2. **Free-text extraction for "Other".** The Decision Brief prose
+   doesn't structurally separate "Other" responses from named options.
+   Pattern fallback will need to detect "Other: <text>" wording in the
+   answer. T10 (dream cycle distill) only fires on this when source is
+   `codex-import-marker` so we can trust the data.
+
+3. **Conductor cwd handling.** Conductor worktrees share project state
+   but have distinct cwds. The import should bucket events by the
+   project slug, not the cwd directly, so events from sibling worktrees
+   accumulate into the same project view.
+
+## References
+
+- Live inspection of `~/.codex/sessions/2026/05/*/`
+- `sqlite3 ~/.codex/logs_2.sqlite ".schema"` (2026-05-27)
+- Codex CLI 0.130.0 (current at spike time)
+- See also: D5 cross-model tension decision in plan file.