docs(spikes): Claude hook mutation + Codex session format

Plan-tune cathedral T4 (per D5/D10). Two Phase 1 design spikes that
downstream tasks (T3, T5, T6, T8, T9) depend on.

claude-code-hook-mutation.md
- Confirms PreToolUse allow + updatedInput is supported and is the right
  mechanism for substituting an auto-decided answer.
- Pins stdin/stdout JSON schemas with field-by-field reference.
- Documents matcher regex syntax for
  "(AskUserQuestion|mcp__.*__AskUserQuestion)" so Conductor's MCP-routed
  AUQ is covered.
- Captures parallel-hook merge order caveat and our settings.json snippet.

codex-session-format.md
- Maps the on-disk ~/.codex/sessions/<date>/rollout-*.jsonl schema by
  event type (response_item 76%, event_msg 19%, turn_context,
  session_meta).
- Critical finding: Codex has NO AskUserQuestion tool. Gstack AUQ-shaped
  Decision Briefs surface as agent_message text; answer is the next
  user_message. Two-tier recovery: marker-first (D18), then pattern
  fallback for hash-only logging.
- Confirms logs_2.sqlite is internal telemetry, not session content.
- Lists open questions to answer during T9 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:33:57 -07:00 · 2026-05-27 07:33:57 -07:00 · 6dc113838a
parent 9e0e185fe2
commit 6dc113838a
2 changed files with 364 additions and 0 deletions
--- a/docs/spikes/claude-code-hook-mutation.md
+++ b/docs/spikes/claude-code-hook-mutation.md
@@ -0,0 +1,193 @@
+# Spike: Claude Code hook mutation for plan-tune cathedral
+
+**Status:** complete (2026-05-27)
+**Surfaces:** D10 (does PreToolUse allow mutating AUQ input?), D19/Codex (matcher must cover MCP variants)
+**Downstream consumers:** T3, T5, T6, T8
+
+## Question this spike answers
+
+Can a PreToolUse hook on `AskUserQuestion` actually substitute the user's
+answer via `updatedInput`? If yes, what's the exact protocol?
+
+## Answer
+
+**Yes.** `updatedInput` is the supported mechanism. Source:
+https://code.claude.com/docs/en/hooks (confirmed 2026-04 reference).
+
+## Hook stdin schema (PreToolUse + PostToolUse)
+
+```json
+{
+  "session_id": "abc123",
+  "transcript_path": "/path/to/transcript.jsonl",
+  "cwd": "/current/working/dir",
+  "permission_mode": "default",
+  "effort": { "level": "medium" },
+  "hook_event_name": "PreToolUse",
+  "tool_name": "AskUserQuestion",
+  "tool_input": { /* tool-specific */ },
+  "tool_use_id": "unique-id-12345"
+}
+```
+
+Optional in subagent context: `agent_id`, `agent_type`.
+
+## PreToolUse hook stdout schema for `allow + updatedInput`
+
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "allow",
+    "permissionDecisionReason": "auto-decided by plan-tune preference",
+    "updatedInput": { /* shallow-merged into original tool_input */ },
+    "additionalContext": "optional context for Claude"
+  }
+}
+```
+
+**permissionDecision values:**
+- `"allow"` — proceed, optionally with `updatedInput`
+- `"deny"` — block (feedback to Claude, NOT a synthetic answer per Codex
+  correction in D-prefixed decisions)
+- `"ask"` — escalate to user
+- `"defer"` — let permission flow continue
+
+**`updatedInput` semantics:** shallow merge of fields present in the returned
+object onto the original `tool_input`. Only valid with
+`permissionDecision: "allow"`. This is what lets us substitute an
+auto-decided answer for `never-ask` preferences.
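A minimal sketch of the shallow-merge behaviour described above, to make one consequence concrete: top-level keys are overwritten, so a returned `questions` array replaces the original array wholesale and must be complete. `applyUpdatedInput` and the `question` and `metadata` fields are hypothetical names used for illustration; this is not Claude Code's implementation.

```typescript
// Sketch only: models the shallow merge the spike describes.
type ToolInput = Record<string, unknown>;

// Hypothetical helper: top-level keys in `updated` overwrite keys in `original`.
function applyUpdatedInput(original: ToolInput, updated: ToolInput): ToolInput {
  return { ...original, ...updated };
}

// Illustrative input; `question` and `metadata` are assumed field names.
const original: ToolInput = {
  questions: [
    {
      question: "Fix the failing test now?",
      options: [{ label: "A) Fix now" }, { label: "B) Skip" }],
    },
  ],
  metadata: { source: "ship" },
};

// The hook returns only `questions`. `metadata` survives untouched, while the
// `questions` array is replaced as a whole rather than merged per element.
const merged = applyUpdatedInput(original, {
  questions: [
    {
      question: "Fix the failing test now?",
      options: [{ label: "A) Fix now" }],
    },
  ],
});
```

Because nested values are not merged, a hook that wants to change one question should copy the full `questions` array from `tool_input` and edit the copy.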
+
+## Matcher schema
+
+The `matcher` field in `~/.claude/settings.json` supports JS-regex syntax
+**when it contains regex metacharacters**. A matcher with only letters/
+underscores is an exact match.
+
+To cover both native + MCP `AskUserQuestion`:
+```json
+"matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)"
+```
+
+Conductor disables native `AskUserQuestion` via `--disallowedTools` and
+routes through `mcp__conductor__AskUserQuestion`, so the matcher's MCP
+alternative is required for our hook to fire there.
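The matcher can be sanity-checked against candidate tool names with a plain JavaScript regex. This is a sketch under one loud assumption: the pattern is anchored with `^...$` here to make the check strict, and the spike does not confirm how Claude Code anchors matchers. `hookWouldFire` is a hypothetical helper.

```typescript
// The matcher string from the spike.
const MATCHER = "(AskUserQuestion|mcp__.*__AskUserQuestion)";

// Assumption: full-string anchoring. Claude Code's own anchoring is unconfirmed.
const matcherRe = new RegExp(`^${MATCHER}$`);

// Hypothetical helper: would a hook registered with MATCHER fire for this tool?
function hookWouldFire(toolName: string): boolean {
  return matcherRe.test(toolName);
}
```

Under that assumption, both `AskUserQuestion` and `mcp__conductor__AskUserQuestion` match, while unrelated tools such as `Bash` or `mcp__conductor__Read` do not.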
+
+## Multiple-hook concurrency caveat
+
+> All matching hooks run in parallel, and identical handlers are
+> deduplicated automatically.
+
+**For our use case:**
+- gstack registers exactly one PreToolUse hook and one PostToolUse hook on
+  AUQ-shaped tool names.
+- If a user has their own hook that also returns `updatedInput` on
+  AskUserQuestion, the merge order is undefined.
+- Mitigation: document this constraint in the `bin/gstack-settings-hook`
+  install prompt. The user can detect the conflict from the diff preview
+  before accepting.
+
+**`permissionDecision` precedence (when multiple hooks decide):**
+`deny > ask > allow > defer` — most restrictive wins.
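The precedence rule can be written as a small resolver. This is an illustrative sketch, not the hook runner's code; `resolveDecisions` is a hypothetical name, and treating an empty list as `defer` is an assumption.

```typescript
type PermissionDecision = "deny" | "ask" | "allow" | "defer";

// Most restrictive first, per the stated precedence.
const PRECEDENCE: PermissionDecision[] = ["deny", "ask", "allow", "defer"];

// Returns the winning decision when several hooks each return one.
function resolveDecisions(decisions: PermissionDecision[]): PermissionDecision {
  for (const candidate of PRECEDENCE) {
    if (decisions.indexOf(candidate) !== -1) return candidate;
  }
  // Assumption: no decisions at all behaves like "defer".
  return "defer";
}
```

For example, if gstack's hook returns `allow` and a user's hook returns `ask`, the combined result is `ask`, so the auto-decided answer would not be applied silently.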
+
+## Implementation hookSpecificOutput examples
+
+**Auto-decide (PreToolUse, `never-ask` preference + non-one-way):**
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "allow",
+    "permissionDecisionReason": "plan-tune: never-ask preference on ship-test-failure-triage",
+    "updatedInput": {
+      "questions": [{ /* same as input, but with auto-selected answer */ }]
+    }
+  }
+}
+```
+
+**Pass-through (no preference, or one-way safety override):**
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "defer"
+  }
+}
+```
+
+**PostToolUse capture (always):**
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PostToolUse"
+  }
+}
+```
+(PostToolUse hooks can also set `additionalContext` to append to the tool
+result; we don't need this for v1 capture.)
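The branching between the auto-decide and pass-through outputs above could be sketched as follows. `buildPreToolUseOutput` and its parameters are hypothetical; how the preference and the one-way flag are looked up is outside this sketch, and `updatedInput` is passed in ready-made because the spike does not pin the shape of the auto-selected answer.

```typescript
interface PreToolUseOutput {
  hookSpecificOutput: {
    hookEventName: "PreToolUse";
    permissionDecision: "allow" | "deny" | "ask" | "defer";
    permissionDecisionReason?: string;
    updatedInput?: Record<string, unknown>;
  };
}

// Hypothetical builder: auto-decide only for a `never-ask` preference on a
// question that is not one-way; everything else passes through.
function buildPreToolUseOutput(
  preference: string | undefined,
  isOneWay: boolean,
  questionId: string,
  updatedInput: Record<string, unknown>,
): PreToolUseOutput {
  if (preference === "never-ask" && !isOneWay) {
    return {
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision: "allow",
        permissionDecisionReason: `plan-tune: never-ask preference on ${questionId}`,
        updatedInput,
      },
    };
  }
  // No preference, or the one-way safety override applies.
  return {
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "defer",
    },
  };
}
```

The hook script would serialize the returned object with `JSON.stringify` and write it to stdout.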
+
+## Settings.json snippet for T8 hook installer
+
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-preference-hook",
+            "timeout": 5
+          }
+        ]
+      }
+    ],
+    "PostToolUse": [
+      {
+        "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-log-hook",
+            "timeout": 5
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+Hook commands invoke `bun` under the hood: each command is a bash wrapper
+that runs the corresponding TypeScript hook file with bun. Claude Code's
+hook runner requires absolute paths (or `$CLAUDE_PROJECT_DIR`
+substitution).
+
+## Open questions deferred to implementation
+
+1. **Recommended-option parsing scope.** D2 says to parse the
+   `(recommended)` label first. The label is on the option's `label`
+   field per the AskUserQuestion Format. The implementation will need to
+   walk `tool_input.questions[*].options[*]` looking for the label
+   suffix. Worked example: ship/SKILL.md.tmpl emits options like
+   `"A) Fix now" (recommended)`.
+
+2. **Auto-decided event tagging.** When the PreToolUse hook returns
+   `updatedInput`, the PostToolUse hook sees the resolved input and logs
+   a normal event. We need an extra field on the PostToolUse payload
+   (e.g., `was_auto_decided: true`) that the hook can set via session
+   state: write a marker file at `~/.gstack/sessions/<id>/.auto-decided-<tool_use_id>`
+   from PreToolUse, read it from PostToolUse, and delete it on read.
+
+3. **Timeout behavior.** The default hook timeout is 60s, but the docs
+   are thin on what happens at timeout. Set an explicit `timeout: 5` so
+   the user never waits >5s on a misfire; a timeout falls back to pass-through.
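For open question 1, the walk over `tool_input.questions[*].options[*]` might look like the sketch below. It assumes the `(recommended)` text is a suffix inside each option's `label` string and matches it case-insensitively; both are assumptions to confirm against the real AskUserQuestion payloads. The interface names and `findRecommended` are hypothetical.

```typescript
// Assumed minimal shape of the AskUserQuestion tool_input.
interface AuqOption {
  label: string;
}
interface AuqQuestion {
  options: AuqOption[];
}
interface AuqToolInput {
  questions: AuqQuestion[];
}

// Assumption: the marker is a trailing "(recommended)" inside the label.
const RECOMMENDED_SUFFIX = /\(recommended\)\s*$/i;

// For each question, returns the index of the first recommended option,
// or -1 when no option carries the suffix.
function findRecommended(input: AuqToolInput): number[] {
  return input.questions.map((question) =>
    question.options.findIndex((option) => RECOMMENDED_SUFFIX.test(option.label)),
  );
}
```

A `-1` result is the natural signal to fall back to pass-through rather than guess an answer.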
+
+## References
+
+- https://code.claude.com/docs/en/hooks (canonical, latest as of 2026-04)
+- WebSearch results 2026-05-27
+- Existing `bin/gstack-settings-hook` (SessionStart-only impl, to be
+  superseded by T3 schema-aware rewrite)
--- a/docs/spikes/codex-session-format.md
+++ b/docs/spikes/codex-session-format.md
@@ -0,0 +1,171 @@
+# Spike: Codex session storage format for plan-tune cathedral
+
+**Status:** complete (2026-05-27)
+**Surfaces:** D5 (Codex import parses structured files, not regex)
+**Downstream consumers:** T9 (gstack-codex-session-import)
+
+## Question this spike answers
+
+What's the actual on-disk format of Codex sessions, and how do we recover
+AskUserQuestion-shaped events from it for `gstack-codex-session-import`?
+
+## Storage layout
+
+```
+~/.codex/
+├── auth.json                     # Codex auth (do not touch)
+├── config.toml                   # User config
+├── goals_1.sqlite                # ~24KB, internal goals DB (not relevant)
+├── logs_2.sqlite                 # ~16MB, structured logs (target=*, see schema)
+├── history.jsonl                 # ~9KB, command history
+└── sessions/
+    └── 2026/05/27/
+        └── rollout-<iso8601>-<uuid>.jsonl   # per-session transcript
+```
+
+Session files: one JSONL per `codex exec` or interactive session. The cwd
+path and the CLI version are recorded in the `session_meta` event.
+
+## Session JSONL event types (measured on Garry's machine, 2026-05-27)
+
+| type           | count | meaning |
+|----------------|------:|---------|
+| `response_item`|   382 | model's response stream (~76%) |
+| `event_msg`    |    97 | high-level session events (~19%) |
+| `turn_context` |     6 | per-turn context snapshot |
+| `session_meta` |     6 | session header (one per session) |
+
+### response_item subtypes
+
+| subtype                  | count | meaning |
+|--------------------------|------:|---------|
+| `function_call`          | 148   | model invoked a tool |
+| `function_call_output`   | 148   | tool result returned to model |
+| `reasoning`              |  44   | reasoning summary |
+| `message`                |  40   | text message (input_text or output_text) |
+| `web_search_call`        |   2   | web search tool call |
+
+### event_msg subtypes
+
+| subtype           | count | meaning |
+|-------------------|------:|---------|
+| `token_count`     | 55    | per-step token accounting |
+| `agent_message`   | 22    | agent's prose output |
+| `user_message`    |  6    | user's prose input |
+| `task_started`    |  6    | task start (one per top-level task) |
+| `task_complete`   |  6    | task complete |
+| `web_search_end`  |  2    | web search completion |
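Counts like the ones in these tables can be reproduced with a short tally over a rollout file's contents. The sketch assumes each line is a JSON object with a top-level `type` and, for subtyped events, a `payload.type`; `tallyEvents` and `RolloutEvent` are hypothetical names, and the caller is expected to read the file.

```typescript
// Assumed minimal event shape: top-level `type`, optional `payload.type`.
interface RolloutEvent {
  type?: unknown;
  payload?: { type?: unknown };
}

// Counts events by "type" or "type/subtype". Blank and malformed lines are skipped.
function tallyEvents(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // not valid JSON; ignore this line
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const event = parsed as RolloutEvent;
    const top = typeof event.type === "string" ? event.type : "unknown";
    const sub =
      event.payload && typeof event.payload.type === "string" ? event.payload.type : null;
    const key = sub !== null ? `${top}/${sub}` : top;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Keys such as `response_item/function_call` or `event_msg/agent_message` then line up with the subtype rows above.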
+
+## Critical finding: Codex has no `AskUserQuestion` tool
+
+Codex doesn't surface AskUserQuestion as a tool call in the
+`response_item` stream. Gstack skills running on Codex emit
+AskUserQuestion-shaped Decision Briefs as plain prose inside
+`agent_message` events (the `AskUserQuestion Format` from the preamble).
+The user's answer comes back in the next `user_message`.
+
+This means importing AUQ events from Codex sessions is structurally
+different from importing them from Claude Code (where they ARE
+tool calls):
+
+- **Claude Code:** hook captures structured `tool_input`/`tool_output`
+  for `AskUserQuestion`. Question + options + answer all separated.
+- **Codex:** parser must extract from `agent_message.text` body, detect
+  the D-numbered Decision Brief pattern, then match against the
+  subsequent `user_message` for the answer.
+
+## Recovery strategy for `gstack-codex-session-import`
+
+**Two-tier extraction:**
+
+1. **Marker-first (D18 mechanism).** Search `agent_message` text for the
+   `<gstack-qid:foo-bar>` marker. If present, we have an exact question_id
+   and can reliably recover. (Will work once T14 adds markers to the top
+   10 registry questions and Codex starts emitting them via the
+   host-aware preamble path.)
+
+2. **Pattern fallback.** When no marker, parse for:
+   - `D<N> — <title>` line (D-number from AskUserQuestion Format)
+   - `Recommendation: ...` line
+   - Option block `A) ...`, `B) ...`, etc.
+   - Next `user_message` event for the chosen option label
+
+   Use this only to populate a hash-based question_id (the same
+   `hook-<sha1(skill+text+sorted_options)[:10]>` shape Layer 1 uses on
+   Claude). Tag it `source: "codex-import-pattern"` and never use it as
+   a preference key (per D18 hash drift guidance).
+
+## Schema we'll write to question-log.jsonl from Codex import
+
+Per existing `bin/gstack-question-log` schema, augmented with:
+- `source: "codex-import-marker"` (when qid marker found)
+- `source: "codex-import-pattern"` (when fallback regex used)
+- `codex_session_id` (UUID from session_meta)
+- `codex_cwd` (working dir from session_meta — disambiguates project)
+- `codex_ts` (timestamp from event)
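The two-tier extraction, combined with the `source` values listed above, could be sketched as below. The regexes are assumptions derived from the patterns named in this spike (the marker id character set in particular is a guess based on the `foo-bar` example), `extractQuestion` is a hypothetical name, and matching the following `user_message` and computing the hash-based question_id are left out.

```typescript
type ImportSource = "codex-import-marker" | "codex-import-pattern";

interface ExtractedQuestion {
  source: ImportSource;
  questionId: string | null; // exact id from the marker; null in the fallback tier
  dNumber: number | null;
  title: string | null;
  recommendation: string | null;
  options: string[];
}

// Assumption: marker ids use letters, digits, "_" and "-".
const MARKER_RE = /<gstack-qid:([A-Za-z0-9_-]+)>/;
// "D<N> <dash> <title>"; \u2014 is the dash character the Decision Brief format uses.
const TITLE_RE = /^D(\d+) \u2014 (.+)$/m;
const RECOMMENDATION_RE = /^Recommendation:\s*(.+)$/m;

function extractQuestion(agentMessage: string): ExtractedQuestion | null {
  const title = TITLE_RE.exec(agentMessage);
  const recommendation = RECOMMENDATION_RE.exec(agentMessage);

  // Collect option lines such as "A) Fix now".
  const options: string[] = [];
  const optionRe = /^([A-Z])\) (.+)$/gm;
  let match: RegExpExecArray | null;
  while ((match = optionRe.exec(agentMessage)) !== null) {
    options.push(`${match[1]}) ${match[2]}`);
  }

  const parsed = {
    dNumber: title ? Number(title[1]) : null,
    title: title ? title[2] : null,
    recommendation: recommendation ? recommendation[1] : null,
    options,
  };

  // Tier 1: marker-first. The marker alone gives an exact question id.
  const marker = MARKER_RE.exec(agentMessage);
  if (marker) {
    return { source: "codex-import-marker", questionId: marker[1], ...parsed };
  }

  // Tier 2: pattern fallback. Requires at least a D-numbered title and one option.
  if (title && options.length > 0) {
    return { source: "codex-import-pattern", questionId: null, ...parsed };
  }
  return null; // not a Decision Brief
}
```

Returning `null` for ordinary prose keeps the importer from logging every `agent_message` as a question.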
+
+## SQLite logs_2.sqlite schema
+
+```sql
+CREATE TABLE logs (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  ts INTEGER NOT NULL,
+  ts_nanos INTEGER NOT NULL,
+  level TEXT NOT NULL,
+  target TEXT NOT NULL,
+  feedback_log_body TEXT,
+  module_path TEXT,
+  file TEXT,
+  line INTEGER,
+  thread_id TEXT,
+  process_uuid TEXT,
+  estimated_bytes INTEGER NOT NULL DEFAULT 0
+);
+```
+
+`logs_2.sqlite` is internal telemetry, not session content. **Don't use
+it for AUQ extraction.** The session JSONL is authoritative.
+
+## Project-slug derivation
+
+Derive the slug from `session_meta.payload.cwd` via the existing
+`bin/gstack-slug` logic. Conductor worktrees have their own slug naming
+convention encoded in cwd; the bin already handles this.
+
+## Versioning safety
+
+`session_meta.payload.cli_version` records the Codex CLI version (e.g.
+`0.130.0`). When the importer encounters an unknown version, log a
+warning to stderr but continue — schema additions are typically
+backwards-compatible in JSONL.
+
+If `type` or `payload.type` values change in a future version, we'll see
+them as `unknown` in the importer's audit log. Add a guarded
+`KNOWN_VERSIONS = ["0.130.x", "0.131.x", ...]` constant in the importer
+and bump explicitly when re-testing.
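One possible shape for the guarded version check, as a sketch: entries ending in `.x` are treated as prefix wildcards, which is an assumed reading of the `"0.130.x"` notation above. `isKnownVersion` and `versionWarning` are hypothetical names; the importer would write the returned warning to stderr and continue.

```typescript
// Versions the importer has been tested against; bump explicitly when re-testing.
const KNOWN_VERSIONS = ["0.130.x", "0.131.x"];

// Assumption: "0.130.x" means any version starting with "0.130.".
function isKnownVersion(cliVersion: string, known: string[] = KNOWN_VERSIONS): boolean {
  return known.some((pattern) => {
    if (pattern.endsWith(".x")) {
      const prefix = pattern.slice(0, -1); // "0.130.x" -> "0.130."
      return cliVersion.startsWith(prefix);
    }
    return cliVersion === pattern;
  });
}

// Returns a warning message for unknown versions, or null when the version is known.
function versionWarning(cliVersion: string): string | null {
  return isKnownVersion(cliVersion)
    ? null
    : `codex-session-import: untested Codex CLI version ${cliVersion}; continuing`;
}
```

Keeping the prefix's trailing dot prevents `0.1300.0` from being accepted as a `0.130.x` release.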
+
+## Open questions for implementation
+
+1. **Where does Codex store the "user's answer" exactly?** Need to test
+   with a real `codex exec` run that triggers a Decision Brief and inspect
+   the next event. Likely `event_msg` of subtype `user_message` or a
+   `response_item` of subtype `message` with `role: "user"`. Confirm
+   during T9 implementation.
+
+2. **Free-text extraction for "Other".** The Decision Brief prose
+   doesn't structurally separate "Other" responses from named options.
+   Pattern fallback will need to detect "Other: <text>" wording in the
+   answer. T10 (dream cycle distill) only fires on this when source is
+   `codex-import-marker` so we can trust the data.
+
+3. **Conductor cwd handling.** Conductor worktrees share project state
+   but have distinct cwds. The import should bucket events by the
+   project slug, not the cwd directly, so events from sibling worktrees
+   accumulate into the same project view.
+
+## References
+
+- Live inspection of `~/.codex/sessions/2026/05/*/`
+- `sqlite3 ~/.codex/logs_2.sqlite ".schema"` (2026-05-27)
+- Codex CLI 0.130.0 (current at spike time)
+- See also: D5 cross-model tension decision in plan file.