mirror of https://github.com/garrytan/gstack.git
docs(spikes): Claude hook mutation + Codex session format
Plan-tune cathedral T4 (per D5/D10). Two Phase 1 design spikes that
downstream tasks (T3, T5, T6, T8, T9) depend on.

claude-code-hook-mutation.md
- Confirms PreToolUse allow + updatedInput is supported and is the right
  mechanism for substituting an auto-decided answer.
- Pins stdin/stdout JSON schemas with field-by-field reference.
- Documents matcher regex syntax for
  "(AskUserQuestion|mcp__.*__AskUserQuestion)" so Conductor's MCP-routed
  AUQ is covered.
- Captures parallel-hook merge order caveat and our settings.json snippet.

codex-session-format.md
- Maps the on-disk ~/.codex/sessions/<date>/rollout-*.jsonl schema by
  event type (response_item 76%, event_msg 19%, turn_context, session_meta).
- Critical finding: Codex has NO AskUserQuestion tool. Gstack AUQ-shaped
  Decision Briefs surface as agent_message text; answer is the next
  user_message. Two-tier recovery: marker-first (D18), then pattern
  fallback for hash-only logging.
- Confirms logs_2.sqlite is internal telemetry, not session content.
- Lists open questions to answer during T9 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent
9e0e185fe2
commit
6dc113838a
@@ -0,0 +1,193 @@
# Spike: Claude Code hook mutation for plan-tune cathedral

**Status:** complete (2026-05-27)
**Surfaces:** D10 (does PreToolUse allow mutating AUQ, i.e. AskUserQuestion, input?), D19/Codex (matcher must cover MCP variants)
**Downstream consumers:** T3, T5, T6, T8

## Question this spike answers

Can a PreToolUse hook on `AskUserQuestion` actually substitute the user's
answer via `updatedInput`? If yes, what's the exact protocol?

## Answer

**Yes.** `updatedInput` is the supported mechanism. Source:
https://code.claude.com/docs/en/hooks (confirmed against the 2026-04 reference).

## Hook stdin schema (PreToolUse + PostToolUse)

```json
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/dir",
  "permission_mode": "default",
  "effort": { "level": "medium" },
  "hook_event_name": "PreToolUse",
  "tool_name": "AskUserQuestion",
  "tool_input": { /* tool-specific */ },
  "tool_use_id": "unique-id-12345"
}
```

Optional in subagent context: `agent_id`, `agent_type`.

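A hook implementation would likely want to validate this payload before acting on it. The sketch below mirrors the fields in the schema above as a TypeScript interface. The `parseHookInput` helper is hypothetical, and treating `effort` as optional is an assumption; the spike does not say which fields are guaranteed.

```typescript
// Shape of the hook stdin payload, following the schema above.
interface HookInput {
  session_id: string;
  transcript_path: string;
  cwd: string;
  permission_mode: string;
  hook_event_name: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
  tool_use_id: string;
  effort?: { level: string }; // treated as optional here (assumption)
  agent_id?: string;          // subagent context only
  agent_type?: string;        // subagent context only
}

// Hypothetical helper: parse raw stdin text and return null for anything
// that does not look like a tool-use hook payload.
function parseHookInput(raw: string): HookInput | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  if (typeof v.hook_event_name !== "string") return null;
  if (typeof v.tool_name !== "string") return null;
  if (typeof v.tool_input !== "object" || v.tool_input === null) return null;
  return v as unknown as HookInput;
}
```

Returning `null` rather than throwing lets the caller fall back to pass-through on malformed input.
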
## PreToolUse hook stdout schema for `allow + updatedInput`

```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",
    "permissionDecisionReason": "auto-decided by plan-tune preference",
    "updatedInput": { /* shallow-merged into original tool_input */ },
    "additionalContext": "optional context for Claude"
  }
}
```

**permissionDecision values:**
- `"allow"`: proceed, optionally with `updatedInput`
- `"deny"`: block the call; the reason goes back to Claude as feedback, NOT
  as a synthetic answer (per the Codex correction in the D-prefixed decisions)
- `"ask"`: escalate to user
- `"defer"`: let permission flow continue

**`updatedInput` semantics:** shallow merge of fields present in the returned
object onto the original `tool_input`. Only valid with
`permissionDecision: "allow"`. This is what lets us substitute an
auto-decided answer for `never-ask` preferences.

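To make the "shallow merge" wording concrete, here is a small illustration of what that semantic implies for nested fields. It follows the behaviour stated above and is not the hook runner's actual code; `applyUpdatedInput`, the `question` field, and the `metadata` field are hypothetical names used only for the example.

```typescript
type ToolInput = Record<string, unknown>;

// Shallow merge: top-level keys from `updated` overwrite those in `original`.
// Nested values are replaced wholesale, not merged field by field.
function applyUpdatedInput(original: ToolInput, updated: ToolInput): ToolInput {
  return { ...original, ...updated };
}

const original: ToolInput = {
  questions: [{ question: "Fix now?", options: [{ label: "A" }, { label: "B" }] }],
  metadata: { source: "ship" }, // hypothetical extra field
};

const merged = applyUpdatedInput(original, {
  questions: [{ question: "Fix now?" }],
});
// merged.questions is the one-element array from `updated` (its `options`
// are gone); merged.metadata is carried over untouched.
```

The practical consequence is that a hook returning `questions` should return complete question objects. Doing so also keeps the output correct if the runner were to replace the input rather than merge it.
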
## Matcher schema

The `matcher` field in `~/.claude/settings.json` supports JS-regex syntax
**when it contains regex metacharacters**. A matcher with only letters and
underscores is an exact match.

To cover both native + MCP `AskUserQuestion`:
```json
"matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)"
```

Conductor disables native `AskUserQuestion` via `--disallowedTools` and
routes through `mcp__conductor__AskUserQuestion`, so the `mcp__.*__`
alternative in the matcher is required for our hook to fire there.

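The pattern can be sanity-checked outside Claude Code with a plain JavaScript regex. This is only a local check of the expression itself: the `^...$` anchors are an assumption added here, and the spike does not establish whether the hook runner anchors matchers the same way.

```typescript
// Same alternation as the settings.json matcher. The ^...$ anchors are an
// assumption for this local check, not verified runner behaviour.
const AUQ_MATCHER = /^(AskUserQuestion|mcp__.*__AskUserQuestion)$/;

function isAuqTool(toolName: string): boolean {
  return AUQ_MATCHER.test(toolName);
}

const samples = [
  "AskUserQuestion",
  "mcp__conductor__AskUserQuestion",
  "Bash",
  "mcp__conductor__Other",
];
for (const name of samples) {
  console.log(name, isAuqTool(name));
}
```
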
## Multiple-hook concurrency caveat

> All matching hooks run in parallel, and identical handlers are
> deduplicated automatically.

**For our use case:**
- gstack registers exactly one PreToolUse hook and one PostToolUse hook on
  AUQ-shaped tool names.
- If a user has their own hook that also returns `updatedInput` on
  AskUserQuestion, the merge order is undefined.
- Mitigation: document this constraint in the `bin/gstack-settings-hook`
  install prompt. The user can detect the conflict from the diff preview
  before accepting.

**`permissionDecision` precedence (when multiple hooks decide):**
`deny > ask > allow > defer` (most restrictive wins).

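The precedence rule is easy to model when reasoning about how our hook interacts with a user's own hooks. The sketch below encodes the `deny > ask > allow > defer` order stated above; `resolveDecision` is a hypothetical name, and returning `"defer"` for an empty list is an assumption rather than documented behaviour.

```typescript
type PermissionDecision = "deny" | "ask" | "allow" | "defer";

// Lower index means more restrictive, per `deny > ask > allow > defer`.
const PRECEDENCE: PermissionDecision[] = ["deny", "ask", "allow", "defer"];

function resolveDecision(decisions: PermissionDecision[]): PermissionDecision {
  // Assumption: with no hook decisions, the normal permission flow continues.
  let winner: PermissionDecision = "defer";
  for (const d of decisions) {
    if (PRECEDENCE.indexOf(d) < PRECEDENCE.indexOf(winner)) {
      winner = d;
    }
  }
  return winner;
}
```

For example, if gstack returns `"allow"` and a user hook returns `"ask"`, the question still reaches the user, so an auto-decided answer never overrides a stricter hook.
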
## Implementation hookSpecificOutput examples

**Auto-decide (PreToolUse, `never-ask` preference + non-one-way):**
```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",
    "permissionDecisionReason": "plan-tune: never-ask preference on ship-test-failure-triage",
    "updatedInput": {
      "questions": [{ /* same as input, but with auto-selected answer */ }]
    }
  }
}
```

**Pass-through (no preference, or one-way safety override):**
```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "defer"
  }
}
```

**PostToolUse capture (always):**
```json
{
  "hookSpecificOutput": {
    "hookEventName": "PostToolUse"
  }
}
```
(PostToolUse hooks can also set `additionalContext` to append to the tool
result; we don't need this for v1 capture.)

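The two PreToolUse outputs above could be produced by a single function along these lines. This is a sketch under stated assumptions: the `Decision` shape is hypothetical, and the preference lookup, one-way check, and construction of the resolved input are assumed to happen elsewhere.

```typescript
interface PreToolUseOutput {
  hookSpecificOutput: {
    hookEventName: "PreToolUse";
    permissionDecision: "allow" | "deny" | "ask" | "defer";
    permissionDecisionReason?: string;
    updatedInput?: Record<string, unknown>;
  };
}

// Hypothetical input shape for this sketch.
interface Decision {
  preference: "never-ask" | null;         // null = no stored preference
  isOneWay: boolean;                      // one-way questions are never auto-decided
  questionId: string;                     // e.g. "ship-test-failure-triage"
  resolvedInput: Record<string, unknown>; // full tool_input with the auto-selected answer
}

function buildPreToolUseOutput(d: Decision): PreToolUseOutput {
  if (d.preference === "never-ask" && !d.isOneWay) {
    return {
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision: "allow",
        permissionDecisionReason: `plan-tune: never-ask preference on ${d.questionId}`,
        updatedInput: d.resolvedInput,
      },
    };
  }
  // No preference, or one-way safety override: pass through.
  return {
    hookSpecificOutput: { hookEventName: "PreToolUse", permissionDecision: "defer" },
  };
}
```

The hook would then print `JSON.stringify(buildPreToolUseOutput(decision))` to stdout.
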
## Settings.json snippet for T8 hook installer

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-preference-hook",
            "timeout": 5
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-log-hook",
            "timeout": 5
          }
        ]
      }
    ]
  }
}
```

Hook commands invoke `bun` under the hood. Claude Code's hook runner
requires absolute paths (or `$CLAUDE_PROJECT_DIR` substitution). The hooks
themselves are TypeScript files; the bash wrapper at each command path runs
them with bun.

## Open questions deferred to implementation

1. **Recommended-option parsing scope.** D2 says to parse the
   `(recommended)` label first. Per the AskUserQuestion Format, the label
   lives in the option's `label` field. Implementation will need to walk
   `tool_input.questions[*].options[*]` looking for the label suffix.
   Worked examples: ship/SKILL.md.tmpl emits options like
   `"A) Fix now" (recommended)`.

2. **Auto-decided event tagging.** When the hook returns `updatedInput`,
   the PostToolUse hook will see the resolved input and log a normal
   event. The logged event needs an extra field (e.g.,
   `was_auto_decided: true`), which the PostToolUse hook can set via
   session state tracking: write a marker file at
   `~/.gstack/sessions/<id>/.auto-decided-<tool_use_id>` from PreToolUse,
   read it from PostToolUse, and delete it on read.

3. **Timeout behavior.** Default hook timeout is 60s, but the docs are
   thin on what happens at timeout. Set an explicit `timeout: 5` so the
   user never waits >5s on a hook misfire. The hook then falls back to
   pass-through.

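For the first open question, the walk over `tool_input.questions[*].options[*]` might look like the sketch below. The exact marker text and its position are assumptions: the code treats `(recommended)` as a case-insensitive suffix of `label`, and `findRecommended` is a hypothetical helper name.

```typescript
interface AuqOption {
  label: string;
}
interface AuqQuestion {
  options: AuqOption[];
}

// Assumption: the marker is the literal suffix "(recommended)" at the end of
// the label, compared case-insensitively.
const RECOMMENDED_SUFFIX = /\(recommended\)\s*$/i;

// Returns, for each question, the index of the first recommended option,
// or -1 when no option carries the marker.
function findRecommended(questions: AuqQuestion[]): number[] {
  return questions.map((q) =>
    q.options.findIndex((o) => RECOMMENDED_SUFFIX.test(o.label)),
  );
}
```

A result of `-1` for any question is a natural signal to fall back to pass-through rather than guess.
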
## References

- https://code.claude.com/docs/en/hooks (canonical, latest as of 2026-04)
- WebSearch results 2026-05-27
- Existing `bin/gstack-settings-hook` (SessionStart-only impl, to be
  superseded by T3 schema-aware rewrite)

@@ -0,0 +1,171 @@
# Spike: Codex session storage format for plan-tune cathedral

**Status:** complete (2026-05-27)
**Surfaces:** D5 (Codex import parses structured files, not regex)
**Downstream consumers:** T9 (gstack-codex-session-import)

## Question this spike answers

What's the actual on-disk format of Codex sessions, and how do we recover
AskUserQuestion-shaped events from it for `gstack-codex-session-import`?

## Storage layout

```
~/.codex/
├── auth.json        # Codex auth (do not touch)
├── config.toml      # User config
├── goals_1.sqlite   # ~24KB, internal goals DB (not relevant)
├── logs_2.sqlite    # ~16MB, structured logs (target=*, see schema)
├── history.jsonl    # ~9KB, command history
└── sessions/
    └── 2026/05/27/
        └── rollout-<iso8601>-<uuid>.jsonl   # per-session transcript
```

Session files: one JSONL per `codex exec` or interactive session. The cwd
path is embedded in the `session_meta` event, which also records the CLI
version.

## Session JSONL event types (measured on Garry's machine, 2026-05-27)

| type            | count | meaning |
|-----------------|------:|---------|
| `response_item` |   382 | model's response stream (~76%) |
| `event_msg`     |    97 | high-level session events (~19%) |
| `turn_context`  |     6 | per-turn context snapshot |
| `session_meta`  |     6 | session header (one per session) |

### response_item subtypes

| subtype                  | count | meaning |
|--------------------------|------:|---------|
| `function_call`          |   148 | model invoked a tool |
| `function_call_output`   |   148 | tool result returned to model |
| `reasoning`              |    44 | reasoning summary |
| `message`                |    40 | text message (input_text or output_text) |
| `web_search_call`        |     2 | web search tool call |

### event_msg subtypes

| subtype           | count | meaning |
|-------------------|------:|---------|
| `token_count`     |    55 | per-step token accounting |
| `agent_message`   |    22 | agent's prose output |
| `user_message`    |     6 | user's prose input |
| `task_started`    |     6 | task start (one per top-level task) |
| `task_complete`   |     6 | task complete |
| `web_search_end`  |     2 | web search completion |

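Counts like the ones in these tables can be reproduced with a short tally over a rollout file. The sketch below assumes each JSONL line is an object with a top-level `type` and, where applicable, a `payload.type` subtype. That matches the field names this spike refers to, but the function itself (`tallyEventTypes`) is hypothetical and not the script used for the measurements.

```typescript
// Tally events by "type" or "type/subtype" from the text of a rollout file.
// Assumption: subtypes live at payload.type; lines that fail to parse are
// counted under "unparseable" instead of aborting the run.
function tallyEventTypes(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let key = "unparseable";
    try {
      const ev = JSON.parse(line) as {
        type?: unknown;
        payload?: { type?: unknown };
      };
      const type = typeof ev.type === "string" ? ev.type : "unknown";
      const rawSub = ev.payload?.type;
      const sub = typeof rawSub === "string" ? rawSub : null;
      key = sub !== null ? `${type}/${sub}` : type;
    } catch {
      // Leave key as "unparseable".
    }
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Reading the file (for example with `fs.readFileSync(path, "utf8")`) is left to the caller so the tally stays a pure function.
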
## Critical finding: Codex has no `AskUserQuestion` tool

Codex doesn't surface AskUserQuestion as a tool call in the `response_item`
stream. Gstack skills running on Codex emit AskUserQuestion-shaped
Decision Briefs as plain prose inside `agent_message` events (the
`AskUserQuestion Format` from the preamble). The user's answer comes back
in the next `user_message`.

This means importing AUQ events from Codex sessions is structurally
different from importing them from Claude Code (where they ARE
tool calls):

- **Claude Code:** hook captures structured `tool_input`/`tool_output`
  for `AskUserQuestion`. Question + options + answer all separated.
- **Codex:** parser must extract from `agent_message.text` body, detect
  the D-numbered Decision Brief pattern, then match against the
  subsequent `user_message` for the answer.

## Recovery strategy for `gstack-codex-session-import`

**Two-tier extraction:**

1. **Marker-first (D18 mechanism).** Search `agent_message` text for the
   `<gstack-qid:foo-bar>` marker. If present, we have an exact question_id
   and can reliably recover. (Will work once T14 adds markers to the top
   10 registry questions and Codex starts emitting them via the
   host-aware preamble path.)

2. **Pattern fallback.** When no marker is present, parse for:
   - `D<N> — <title>` line (D-number from AskUserQuestion Format)
   - `Recommendation: ...` line
   - Option block `A) ...`, `B) ...`, etc.
   - Next `user_message` event for the chosen option label

   Use this only to populate a hash-based question_id (the same
   `hook-<sha1(skill+text+sorted_options)[:10]>` shape Layer 1 uses on
   Claude). Tagged `source: "codex-import-pattern"`, never used as a
   preference key (per D18 hash drift guidance).

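The two tiers above can be sketched as a single parser over one `agent_message` text. Everything here is a hedged illustration: `parseDecisionBrief` and `ParsedBrief` are hypothetical names, the regexes are guesses at the Decision Brief layout described above (the dash in the `D<N>` line is accepted as an em dash, en dash, or hyphen), and matching the answer in the following `user_message` is left out.

```typescript
interface ParsedBrief {
  questionId: string | null; // from the <gstack-qid:...> marker, if any
  dNumber: number | null;
  title: string | null;
  recommendation: string | null;
  options: { letter: string; text: string }[];
  source: "codex-import-marker" | "codex-import-pattern";
}

function parseDecisionBrief(text: string): ParsedBrief | null {
  // Tier 1: explicit marker gives an exact question id.
  const marker = /<gstack-qid:([^>\s]+)>/.exec(text);
  // Tier 2: pattern fallback on the prose layout (assumed line shapes).
  const dLine = /^D(\d+)\s+[\u2014\u2013-]\s+(.+)$/m.exec(text);
  const recLine = /^Recommendation:\s*(.+)$/m.exec(text);

  const options: { letter: string; text: string }[] = [];
  const optionRe = /^([A-Z])\)\s+(.+)$/gm;
  let m: RegExpExecArray | null;
  while ((m = optionRe.exec(text)) !== null) {
    options.push({ letter: m[1], text: m[2].trim() });
  }

  // Without a marker we need at least a D-line and one option to call it a brief.
  if (marker === null && (dLine === null || options.length === 0)) return null;

  return {
    questionId: marker !== null ? marker[1] : null,
    dNumber: dLine !== null ? Number(dLine[1]) : null,
    title: dLine !== null ? dLine[2].trim() : null,
    recommendation: recLine !== null ? recLine[1].trim() : null,
    options,
    source: marker !== null ? "codex-import-marker" : "codex-import-pattern",
  };
}
```

Keeping both tiers in one function means the `source` tag is decided in exactly one place, which makes it harder for pattern-fallback records to be mistaken for marker-backed ones downstream.
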
## Schema we'll write to question-log.jsonl from Codex import

Per existing `bin/gstack-question-log` schema, augmented with:
- `source: "codex-import-marker"` (when qid marker found)
- `source: "codex-import-pattern"` (when fallback regex used)
- `codex_session_id` (UUID from session_meta)
- `codex_cwd` (working dir from session_meta; disambiguates project)
- `codex_ts` (timestamp from event)

## SQLite `logs_2.sqlite` schema

```sql
CREATE TABLE logs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ts INTEGER NOT NULL,
  ts_nanos INTEGER NOT NULL,
  level TEXT NOT NULL,
  target TEXT NOT NULL,
  feedback_log_body TEXT,
  module_path TEXT,
  file TEXT,
  line INTEGER,
  thread_id TEXT,
  process_uuid TEXT,
  estimated_bytes INTEGER NOT NULL DEFAULT 0
);
```

`logs_2.sqlite` is internal telemetry, not session content. **Don't use
it for AUQ extraction.** The session JSONL is authoritative.

## Project-slug derivation

From `session_meta.payload.cwd`: derive via the existing
`bin/gstack-slug` logic on the cwd path. Conductor worktrees have their
own slug naming convention encoded in cwd; the bin already handles this.

## Versioning safety

`session_meta.payload.cli_version` records the Codex CLI version (e.g.
`0.130.0`). When the importer encounters an unknown version, log a
warning to stderr but continue; schema additions are typically
backwards-compatible in JSONL.

If `type` or `payload.type` values change in a future version, we'll see
them as `unknown` in the importer's audit log. Add a guarded
`KNOWN_VERSIONS = ["0.130.x", "0.131.x", ...]` constant in the importer
and bump explicitly when re-testing.

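A minimal sketch of the version guard, assuming `x` in a `KNOWN_VERSIONS` entry acts as a wildcard for the patch component. The list below holds only the version this spike actually tested; `isKnownVersion`, `checkVersion`, and the warning text are hypothetical.

```typescript
// Only 0.130.0 was inspected during the spike; extend this list explicitly
// after re-testing against a newer Codex CLI.
const KNOWN_VERSIONS = ["0.130.x"];

// Assumption: a trailing ".x" matches any patch version of that minor release.
function isKnownVersion(cliVersion: string, known: string[] = KNOWN_VERSIONS): boolean {
  return known.some((pattern) => {
    if (!pattern.endsWith(".x")) return pattern === cliVersion;
    const prefix = pattern.slice(0, -1); // "0.130.x" -> "0.130."
    return cliVersion.startsWith(prefix);
  });
}

// Warn (to stderr by default) on an untested version, but let the import continue.
function checkVersion(
  cliVersion: string,
  warn: (msg: string) => void = console.error,
): boolean {
  const known = isKnownVersion(cliVersion);
  if (!known) {
    warn(`gstack-codex-session-import: untested Codex CLI version ${cliVersion}, continuing`);
  }
  return known;
}
```

Passing `warn` as a parameter keeps the check testable without capturing stderr.
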
## Open questions for implementation

1. **Where does Codex store the "user's answer" exactly?** Need to test
   with a real `codex exec` run that triggers a Decision Brief and inspect
   the next event. Likely an `event_msg` of subtype `user_message` or a
   `response_item` of subtype `message` with `role: "user"`. Confirm
   during T9 implementation.

2. **Free-text extraction for "Other".** The Decision Brief prose
   doesn't structurally separate "Other" responses from named options.
   Pattern fallback will need to detect "Other: <text>" wording in the
   answer. T10 (dream cycle distill) only fires on this when the source
   is `codex-import-marker`, so we can trust the data.

3. **Conductor cwd handling.** Conductor worktrees share project state
   but have distinct cwds. The import should bucket events by the
   project slug, not the cwd directly, so events from sibling worktrees
   accumulate into the same project view.

## References

- Live inspection of `~/.codex/sessions/2026/05/*/`
- `sqlite3 ~/.codex/logs_2.sqlite ".schema"` (2026-05-27)
- Codex CLI 0.130.0 (current at spike time)
- See also: D5 cross-model tension decision in plan file.