gstack

History

Garry Tan 4ab0269729 feat(codex+review): require synthesis Recommendation in cross-model skills

Extends the v1.25.1.0 AskUserQuestion recommendation-quality coverage to the cross-model synthesis surfaces that were previously emitting prose without a structured recommendation:

- /codex review (Step 2A): after presenting Codex output + GATE verdict, must emit `Recommendation: <action> because <reason>` line. Reason must compare against alternatives (other findings, fix-vs-ship, fix-order).
- /codex challenge (Step 2B): same requirement after adversarial output.
- /codex consult (Step 2C): same requirement after consult presentation, with examples for plan-review consults that engage with specific Codex insights.
- Claude adversarial subagent (scripts/resolvers/review.ts:446, used by /ship Step 11 + standalone /review): subagent prompt now ends with "After listing findings, end your output with ONE line in the canonical format Recommendation: <action> because <reason>". Codex adversarial command (line 461) gets the same final-line requirement.

The same `judgeRecommendation` helper grades both AskUserQuestion and cross-model synthesis: one rubric, two surfaces. Substance-5 cross-model recommendations explicitly compare against alternatives (a different finding, fix-vs-ship, fix-order). Generic synthesis ("because adversarial review found things") fails at threshold ≥ 4.

Tests:

- test/llm-judge-recommendation.test.ts gains 5 cross-model fixtures (3 substance ≥ 4, 2 substance < 4). Existing rubric correctly grades them.
- test/skill-cross-model-recommendation-emit.test.ts (new, free-tier): static guard greps codex/SKILL.md.tmpl + scripts/resolvers/review.ts for the canonical emit instruction. Trips before any paid eval if the templates drift.

Touchfile: extended `llm-judge-recommendation` entry with codex/SKILL.md.tmpl and scripts/resolvers/review.ts so synthesis-template edits invalidate the fixture re-run.

Verified: free `bun test` exits 0 (5/5 static emit-guard tests pass), paid fixture passes 45/45 expect calls in 24s with the cross-model substance-5 fixtures correctly judged at >= 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 19:38:12 -07:00
..
specialists	feat: adaptive gating + cross-review dedup for review army (v0.15.2.0) (#760)	2026-04-04 22:46:21 -07:00
SKILL.md	feat(codex+review): require synthesis Recommendation in cross-model skills	2026-05-01 19:38:12 -07:00
SKILL.md.tmpl	v1.11.0.0 feat(ship): workspace-aware version allocation (#1168)	2026-04-23 23:03:27 -07:00
TODOS-format.md	feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61)	2026-03-14 20:15:11 -07:00
checklist.md	feat: Review Army, parallel specialist reviewers for /review (v0.14.3.0) (#692)	2026-03-30 22:07:50 -06:00
design-checklist.md	feat: adaptive gating + cross-review dedup for review army (v0.15.2.0) (#760)	2026-04-04 22:46:21 -07:00
greptile-triage.md	feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61)	2026-03-14 20:15:11 -07:00