feat(pr-prep): step 4.4 — second-opinion via codex review on CLEAN commits

Before filing each CLEAN commit (i.e. not bucketed EXACT_DUP / OVERLAP /
SIBLING), invoke `codex review` for independent second opinion. Codex
CLI uses a different model family (OpenAI vs Claude) so signal is
genuinely independent — catches structural bugs the author missed
during write-up.

Skill is optional: if `codex` CLI not on PATH, emit soft warning
and continue. Don't block on tool availability.

Severity escalation: P0/P1 findings bump commit from CLEAN to OVERLAP
(don't file as new PR until addressed). P2 stays CLEAN with author
discretion (fix-before-file or document in PR body).

Real-world motivating case (2026-05-26): two CLEAN-bucketed PRs to
garrytan/gbrain (#1427 synopsis doc truncate, #1428 models doctor
args[0]) both had codex-caught P2 issues:

  #1427: env-overridable cap not folded into computeCorpusGeneration
    hash → different caps produced same corpus_generation → cache
    invalidation broken.
  #1428: `gbrain models doctor --help` regressed into network probe
    run instead of usage print (ternary ordering bug).

Both fixed pre-merge as follow-up commits. Net cost avoided: 2
review ping-pong cycles + 2 fix PRs after upstream caught it.
This commit is contained in:
Benjamin D. Smith 2026-05-26 02:11:12 +10:00
parent c2cb8bc423
commit b617f7916d
2 changed files with 96 additions and 0 deletions

View File

@ -974,6 +974,54 @@ Final severity bucket per commit:
| **SIBLING** | OPEN issues but no PR; or merged-recently with overlap |
| **CLEAN** | No hits, or only old closed issues |
## Step 4.4: Second-opinion review via codex (CLEAN commits only)
For each commit bucketed CLEAN (i.e. not duplicating upstream work),
run an independent second-opinion code review BEFORE the PR is opened.
Catches bugs the author missed without spending reviewer attention
upstream.
The skill assumes `codex` CLI is on PATH (OpenAI's official CLI;
`brew install codex` on macOS). If absent, emit a soft warning + skip
this step — don't block. Different model family from Claude gives
genuine independent signal.
```bash
if command -v codex >/dev/null 2>&1; then
for sha in $CLEAN_COMMIT_SHAS; do
diff=$(git show "$sha" --stat --pretty=format:"%s")
subject=$(git log -1 --format=%s "$sha")
codex review "Review commit ${sha:0:8} '${subject}' for correctness,
edge cases, and CONTRIBUTING.md compliance. Focus on: regression
risk on adjacent code paths, missing tests, hash/version-bump
invariants if touching cache keys, ordering bugs if touching
conditionals or dispatchers. Flag P0/P1/P2 issues with file:line."
done
else
echo "[pr-prep] codex CLI not found — skipping second-opinion review.
Install via 'brew install codex' or pin a fork-specific reviewer in
your skill config."
fi
```
Surface findings in the report under each commit as a `Codex P{N}`
line. P0/P1 findings escalate the commit's severity to OVERLAP at
minimum (don't file as CLEAN until addressed). P2 findings stay
CLEAN — author decides whether to fix-before-file or note-in-PR-body.
Real-world example (2026-05-26 motivating case):
- PR #1427 (synopsis doc truncate) → codex P2: env-overridable cap
not folded into `computeCorpusGeneration` hash. Different caps
produce same `corpus_generation` → cache invalidation breaks.
Fixed pre-merge, pushed as follow-up commit, comment posted to PR.
- PR #1428 (models doctor args[0]) → codex P2: `--help` regressed
into running network probes. Reorder ternary so `hasHelp` checked
first. Fixed pre-merge.
Both findings were structural, not stylistic. Author missed them
during own write-up. Net cost avoided: 2 review-cycle ping-pongs
upstream + a follow-up fix PR per finding.
## Step 4.5: Surface CONTRIBUTING.md pre-push gates per commit
For each commit that survives audit (CLEAN / OVERLAP / SIBLING — not

View File

@ -162,6 +162,54 @@ Final severity bucket per commit:
| **SIBLING** | OPEN issues but no PR; or merged-recently with overlap |
| **CLEAN** | No hits, or only old closed issues |
## Step 4.4: Second-opinion review via codex (CLEAN commits only)
For each commit bucketed CLEAN (i.e. not duplicating upstream work),
run an independent second-opinion code review BEFORE the PR is opened.
Catches bugs the author missed without spending reviewer attention
upstream.
The skill assumes `codex` CLI is on PATH (OpenAI's official CLI;
`brew install codex` on macOS). If absent, emit a soft warning + skip
this step — don't block. Different model family from Claude gives
genuine independent signal.
```bash
if command -v codex >/dev/null 2>&1; then
for sha in $CLEAN_COMMIT_SHAS; do
diff=$(git show "$sha" --stat --pretty=format:"%s")
subject=$(git log -1 --format=%s "$sha")
codex review "Review commit ${sha:0:8} '${subject}' for correctness,
edge cases, and CONTRIBUTING.md compliance. Focus on: regression
risk on adjacent code paths, missing tests, hash/version-bump
invariants if touching cache keys, ordering bugs if touching
conditionals or dispatchers. Flag P0/P1/P2 issues with file:line."
done
else
echo "[pr-prep] codex CLI not found — skipping second-opinion review.
Install via 'brew install codex' or pin a fork-specific reviewer in
your skill config."
fi
```
Surface findings in the report under each commit as a `Codex P{N}`
line. P0/P1 findings escalate the commit's severity to OVERLAP at
minimum (don't file as CLEAN until addressed). P2 findings stay
CLEAN — author decides whether to fix-before-file or note-in-PR-body.
Real-world example (2026-05-26 motivating case):
- PR #1427 (synopsis doc truncate) → codex P2: env-overridable cap
not folded into `computeCorpusGeneration` hash. Different caps
produce same `corpus_generation` → cache invalidation breaks.
Fixed pre-merge, pushed as follow-up commit, comment posted to PR.
- PR #1428 (models doctor args[0]) → codex P2: `--help` regressed
into running network probes. Reorder ternary so `hasHelp` checked
first. Fixed pre-merge.
Both findings were structural, not stylistic. Author missed them
during own write-up. Net cost avoided: 2 review-cycle ping-pongs
upstream + a follow-up fix PR per finding.
## Step 4.5: Surface CONTRIBUTING.md pre-push gates per commit
For each commit that survives audit (CLEAN / OVERLAP / SIBLING — not