gstack/scripts/resolvers
Garry Tan 4ab0269729
feat(codex+review): require synthesis Recommendation in cross-model skills
Extends the v1.25.1.0 AskUserQuestion recommendation-quality coverage to the
cross-model synthesis surfaces that were previously emitting prose without a
structured recommendation:

- /codex review (Step 2A) — after presenting Codex output + GATE verdict,
  must emit `Recommendation: <action> because <reason>` line. Reason must
  compare against alternatives (other findings, fix-vs-ship, fix-order).
- /codex challenge (Step 2B) — same requirement after adversarial output.
- /codex consult (Step 2C) — same requirement after consult presentation,
  with examples for plan-review consults that engage with specific Codex
  insights.
- Claude adversarial subagent (scripts/resolvers/review.ts:446, used by
  /ship Step 11 + standalone /review) — subagent prompt now ends with
  "After listing findings, end your output with ONE line in the canonical
  format Recommendation: <action> because <reason>". Codex adversarial
  command (line 461) gets the same final-line requirement.

The same `judgeRecommendation` helper grades both AskUserQuestion and
cross-model synthesis — one rubric, two surfaces. Substance-5 cross-model
recommendations explicitly compare against alternatives (a different
finding, fix-vs-ship, fix-order). Generic synthesis ("because adversarial
review found things") fails at threshold ≥ 4.

Tests:
- test/llm-judge-recommendation.test.ts gains 5 cross-model fixtures (3
  substance ≥ 4, 2 substance < 4). Existing rubric correctly grades them.
- test/skill-cross-model-recommendation-emit.test.ts (new, free-tier) —
  static guard greps codex/SKILL.md.tmpl + scripts/resolvers/review.ts for
  the canonical emit instruction. Trips before any paid eval if the
  templates drift.

Touchfile: extended `llm-judge-recommendation` entry with codex/SKILL.md.tmpl
and scripts/resolvers/review.ts so synthesis-template edits invalidate the
fixture re-run.

Verified: free `bun test` exits 0 (5/5 static emit-guard tests pass), paid
fixture passes 45/45 expect calls in 24s with the cross-model substance-5
fixtures correctly judged at >= 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:38:12 -07:00
..
preamble v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287) 2026-05-01 08:45:36 -07:00
browse.ts fix: avoid tilde-in-assignment to silence Claude Code permission prompts (#993) 2026-04-16 14:49:56 -07:00
codex-helpers.ts feat: Factory Droid compatibility — works across Claude Code, Codex, and Factory (v0.13.5.0) (#621) 2026-03-29 08:57:34 -07:00
composition.ts feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644) 2026-03-29 23:35:17 -06:00
confidence.ts feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622) 2026-03-29 17:02:01 -06:00
constants.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
design.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
dx.ts feat: /plan-devex-review + /devex-review — DX review skills (v0.15.3.0) (#784) 2026-04-03 16:22:57 -07:00
gbrain.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
index.ts feat(v1.4.0.0): /make-pdf — markdown to publication-quality PDFs (#1086) 2026-04-20 13:20:30 +08:00
learnings.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
make-pdf.ts feat(v1.4.0.0): /make-pdf — markdown to publication-quality PDFs (#1086) 2026-04-20 13:20:30 +08:00
model-overlay.ts feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166) 2026-04-23 18:42:58 -07:00
preamble.ts v1.12.1.0 fix: remove vestigial plan-mode handshake (#1185) 2026-04-24 02:11:24 -07:00
question-tuning.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
review-army.ts feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030) 2026-04-16 23:14:03 -07:00
review.ts feat(codex+review): require synthesis Recommendation in cross-model skills 2026-05-01 19:38:12 -07:00
testing.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
types.ts v1.11.1.0 fix: plan-mode handshake + canUseTool test harness (#1182) 2026-04-24 00:04:53 -07:00
utility.ts feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117) 2026-04-22 01:06:22 -07:00