gstack


Garry Tan 4ab0269729 feat(codex+review): require synthesis Recommendation in cross-model skills

Extends the v1.25.1.0 AskUserQuestion recommendation-quality coverage to the cross-model synthesis surfaces that were previously emitting prose without a structured recommendation:

- /codex review (Step 2A) — after presenting Codex output + GATE verdict, must emit `Recommendation: <action> because <reason>` line. Reason must compare against alternatives (other findings, fix-vs-ship, fix-order).
- /codex challenge (Step 2B) — same requirement after adversarial output.
- /codex consult (Step 2C) — same requirement after consult presentation, with examples for plan-review consults that engage with specific Codex insights.
- Claude adversarial subagent (scripts/resolvers/review.ts:446, used by /ship Step 11 + standalone /review) — subagent prompt now ends with "After listing findings, end your output with ONE line in the canonical format Recommendation: <action> because <reason>". Codex adversarial command (line 461) gets the same final-line requirement.

The same `judgeRecommendation` helper grades both AskUserQuestion and cross-model synthesis — one rubric, two surfaces. Substance-5 cross-model recommendations explicitly compare against alternatives (a different finding, fix-vs-ship, fix-order). Generic synthesis ("because adversarial review found things") fails at threshold ≥ 4.

Tests:

- test/llm-judge-recommendation.test.ts gains 5 cross-model fixtures (3 substance ≥ 4, 2 substance < 4). Existing rubric correctly grades them.
- test/skill-cross-model-recommendation-emit.test.ts (new, free-tier) — static guard greps codex/SKILL.md.tmpl + scripts/resolvers/review.ts for the canonical emit instruction. Trips before any paid eval if the templates drift.

Touchfile: extended `llm-judge-recommendation` entry with codex/SKILL.md.tmpl and scripts/resolvers/review.ts so synthesis-template edits invalidate the fixture re-run.

Verified: free `bun test` exits 0 (5/5 static emit-guard tests pass), paid fixture passes 45/45 expect calls in 24s with the cross-model substance-5 fixtures correctly judged at ≥ 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-01 19:38:12 -07:00
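The commit above standardizes on one machine-checkable final line, `Recommendation: <action> because <reason>`. A minimal sketch of what consuming that line could look like follows, assuming the rule is "the last non-empty line of the output must match the canonical format". The names `parseFinalRecommendation` and `templateHasEmitInstruction`, the regex, and the exact string the static check searches for are all hypothetical illustrations; they are not gstack's `judgeRecommendation` helper or the actual emit-guard test.

```typescript
// Hypothetical sketch: none of these names come from gstack itself.

/** Parsed form of a `Recommendation: <action> because <reason>` line. */
interface Recommendation {
  action: string;
  reason: string;
}

// Assumed shape of the canonical line. The lazy `(.+?)` makes the FIRST
// " because " split action from reason; both parts must be non-empty.
const CANONICAL = /^Recommendation:\s*(.+?)\s+because\s+(.+)$/;

/**
 * Returns the recommendation parsed from the LAST non-empty line of
 * `output` (mirroring "end your output with ONE line"), or null if that
 * line does not follow the canonical format.
 */
function parseFinalRecommendation(output: string): Recommendation | null {
  const lines = output
    .split("\n")
    .map((line) => line.trim()) // also strips a trailing \r from CRLF output
    .filter((line) => line.length > 0);
  if (lines.length === 0) return null;

  const match = CANONICAL.exec(lines[lines.length - 1]);
  if (!match) return null;
  return { action: match[1], reason: match[2] };
}

/**
 * Crude static check in the spirit of the free-tier emit guard: true if a
 * template's text contains the canonical emit instruction. The searched-for
 * string is an assumption, not the real test's pattern.
 */
function templateHasEmitInstruction(templateText: string): boolean {
  return templateText.includes("Recommendation: <action> because <reason>");
}
```

A format check like this only confirms that the line exists and splits cleanly. Whether the reason actually compares against alternatives (another finding, fix-vs-ship, fix-order) is a substance question, which the commit leaves to the LLM judge and its ≥ 4 threshold rather than to a regex.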
preamble	v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287)	2026-05-01 08:45:36 -07:00
browse.ts	fix: avoid tilde-in-assignment to silence Claude Code permission prompts (#993)	2026-04-16 14:49:56 -07:00
codex-helpers.ts	feat: Factory Droid compatibility — works across Claude Code, Codex, and Factory (v0.13.5.0) (#621)	2026-03-29 08:57:34 -07:00
composition.ts	feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)	2026-03-29 23:35:17 -06:00
confidence.ts	feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622)	2026-03-29 17:02:01 -06:00
constants.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
design.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
dx.ts	feat: /plan-devex-review + /devex-review — DX review skills (v0.15.3.0) (#784)	2026-04-03 16:22:57 -07:00
gbrain.ts	feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)	2026-04-16 10:41:38 -07:00
index.ts	feat(v1.4.0.0): /make-pdf — markdown to publication-quality PDFs (#1086)	2026-04-20 13:20:30 +08:00
learnings.ts	feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647)	2026-03-31 23:08:22 -06:00
make-pdf.ts	feat(v1.4.0.0): /make-pdf — markdown to publication-quality PDFs (#1086)	2026-04-20 13:20:30 +08:00
model-overlay.ts	feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166)	2026-04-23 18:42:58 -07:00
preamble.ts	v1.12.1.0 fix: remove vestigial plan-mode handshake (#1185)	2026-04-24 02:11:24 -07:00
question-tuning.ts	v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215)	2026-04-26 13:55:13 -07:00
review-army.ts	feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030)	2026-04-16 23:14:03 -07:00
review.ts	feat(codex+review): require synthesis Recommendation in cross-model skills	2026-05-01 19:38:12 -07:00
testing.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
types.ts	v1.11.1.0 fix: plan-mode handshake + canUseTool test harness (#1182)	2026-04-24 00:04:53 -07:00
utility.ts	feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117)	2026-04-22 01:06:22 -07:00