feat(pr-prep): step 4.4 — second-opinion via codex review on CLEAN commits

Before filing each CLEAN commit (i.e. not bucketed EXACT_DUP / OVERLAP / SIBLING), invoke `codex review` for independent second opinion. Codex CLI uses a different model family (OpenAI vs Claude) so signal is genuinely independent — catches structural bugs the author missed during write-up. Skill is optional: if `codex` CLI not on PATH, emit soft warning and continue. Don't block on tool availability. Severity escalation: P0/P1 findings bump commit from CLEAN to OVERLAP (don't file as new PR until addressed). P2 stays CLEAN with author discretion (fix-before-file or document in PR body). Real-world motivating case (2026-05-26): two CLEAN-bucketed PRs to garrytan/gbrain (#1427 synopsis doc truncate, #1428 models doctor args[0]) both had codex-caught P2 issues: #1427: env-overridable cap not folded into computeCorpusGeneration hash → different caps produced same corpus_generation → cache invalidation broken. #1428: `gbrain models doctor --help` regressed into network probe run instead of usage print (ternary ordering bug). Both fixed pre-merge as follow-up commits. Net cost avoided: 2 review ping-pong cycles + 2 fix PRs after upstream caught it.
2026-05-26 02:11:12 +10:00 · 2026-05-26 02:11:12 +10:00 · b617f7916d
parent c2cb8bc423
commit b617f7916d
2 changed files with 96 additions and 0 deletions
--- a/pr-prep/SKILL.md
+++ b/pr-prep/SKILL.md
@ -974,6 +974,54 @@ Final severity bucket per commit:
 | **SIBLING** | OPEN issues but no PR; or merged-recently with overlap |
 | **CLEAN** | No hits, or only old closed issues |

+## Step 4.4: Second-opinion review via codex (CLEAN commits only)
+
+For each commit bucketed CLEAN (i.e. not duplicating upstream work),
+run an independent second-opinion code review BEFORE the PR is opened.
+Catches bugs the author missed without spending reviewer attention
+upstream.
+
+The skill assumes `codex` CLI is on PATH (OpenAI's official CLI;
+`brew install codex` on macOS). If absent, emit a soft warning + skip
+this step — don't block. Different model family from Claude gives
+genuine independent signal.
+
+```bash
+if command -v codex >/dev/null 2>&1; then
+  for sha in $CLEAN_COMMIT_SHAS; do
+    diff=$(git show "$sha" --stat --pretty=format:"%s")
+    subject=$(git log -1 --format=%s "$sha")
+    codex review "Review commit ${sha:0:8} '${subject}' for correctness,
+      edge cases, and CONTRIBUTING.md compliance. Focus on: regression
+      risk on adjacent code paths, missing tests, hash/version-bump
+      invariants if touching cache keys, ordering bugs if touching
+      conditionals or dispatchers. Flag P0/P1/P2 issues with file:line."
+  done
+else
+  echo "[pr-prep] codex CLI not found — skipping second-opinion review.
+   Install via 'brew install codex' or pin a fork-specific reviewer in
+   your skill config."
+fi
+```
+
+Surface findings in the report under each commit as a `Codex P{N}`
+line. P0/P1 findings escalate the commit's severity to OVERLAP at
+minimum (don't file as CLEAN until addressed). P2 findings stay
+CLEAN — author decides whether to fix-before-file or note-in-PR-body.
+
+Real-world example (2026-05-26 motivating case):
+- PR #1427 (synopsis doc truncate) → codex P2: env-overridable cap
+  not folded into `computeCorpusGeneration` hash. Different caps
+  produce same `corpus_generation` → cache invalidation breaks.
+  Fixed pre-merge, pushed as follow-up commit, comment posted to PR.
+- PR #1428 (models doctor args[0]) → codex P2: `--help` regressed
+  into running network probes. Reorder ternary so `hasHelp` checked
+  first. Fixed pre-merge.
+
+Both findings were structural, not stylistic. Author missed them
+during own write-up. Net cost avoided: 2 review-cycle ping-pongs
+upstream + a follow-up fix PR per finding.
+
 ## Step 4.5: Surface CONTRIBUTING.md pre-push gates per commit

 For each commit that survives audit (CLEAN / OVERLAP / SIBLING — not
--- a/pr-prep/SKILL.md.tmpl
+++ b/pr-prep/SKILL.md.tmpl
@ -162,6 +162,54 @@ Final severity bucket per commit:
 | **SIBLING** | OPEN issues but no PR; or merged-recently with overlap |
 | **CLEAN** | No hits, or only old closed issues |

+## Step 4.4: Second-opinion review via codex (CLEAN commits only)
+
+For each commit bucketed CLEAN (i.e. not duplicating upstream work),
+run an independent second-opinion code review BEFORE the PR is opened.
+Catches bugs the author missed without spending reviewer attention
+upstream.
+
+The skill assumes `codex` CLI is on PATH (OpenAI's official CLI;
+`brew install codex` on macOS). If absent, emit a soft warning + skip
+this step — don't block. Different model family from Claude gives
+genuine independent signal.
+
+```bash
+if command -v codex >/dev/null 2>&1; then
+  for sha in $CLEAN_COMMIT_SHAS; do
+    diff=$(git show "$sha" --stat --pretty=format:"%s")
+    subject=$(git log -1 --format=%s "$sha")
+    codex review "Review commit ${sha:0:8} '${subject}' for correctness,
+      edge cases, and CONTRIBUTING.md compliance. Focus on: regression
+      risk on adjacent code paths, missing tests, hash/version-bump
+      invariants if touching cache keys, ordering bugs if touching
+      conditionals or dispatchers. Flag P0/P1/P2 issues with file:line."
+  done
+else
+  echo "[pr-prep] codex CLI not found — skipping second-opinion review.
+   Install via 'brew install codex' or pin a fork-specific reviewer in
+   your skill config."
+fi
+```
+
+Surface findings in the report under each commit as a `Codex P{N}`
+line. P0/P1 findings escalate the commit's severity to OVERLAP at
+minimum (don't file as CLEAN until addressed). P2 findings stay
+CLEAN — author decides whether to fix-before-file or note-in-PR-body.
+
+Real-world example (2026-05-26 motivating case):
+- PR #1427 (synopsis doc truncate) → codex P2: env-overridable cap
+  not folded into `computeCorpusGeneration` hash. Different caps
+  produce same `corpus_generation` → cache invalidation breaks.
+  Fixed pre-merge, pushed as follow-up commit, comment posted to PR.
+- PR #1428 (models doctor args[0]) → codex P2: `--help` regressed
+  into running network probes. Reorder ternary so `hasHelp` checked
+  first. Fixed pre-merge.
+
+Both findings were structural, not stylistic. Author missed them
+during own write-up. Net cost avoided: 2 review-cycle ping-pongs
+upstream + a follow-up fix PR per finding.
+
 ## Step 4.5: Surface CONTRIBUTING.md pre-push gates per commit

 For each commit that survives audit (CLEAN / OVERLAP / SIBLING — not