gstack

Garry Tan 30fe6bb11c v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313)

* fix(plan-eng-review): tighten STOP gates with anti-rationalization clause

Five sites in SKILL.md.tmpl uplift to the office-hours `b512be71` pattern: the four review-section gates (Architecture, Code Quality, Test, Performance) plus the Step 0 complexity-check trigger. Adds tool_use reminder ("call the tool directly"), names blocked next steps explicitly, anti-rationalization clause naming the precise failure mode (loading the schema via ToolSearch and writing the recommendation as chat prose).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(test/helpers): initialPlanContent + wrote_findings_before_asking + shared report-at-bottom assertion

Three additions to claude-pty-runner.ts:

1. runPlanSkillObservation gains initialPlanContent?: string. Pre-pumps a user message containing the seeded plan before invoking the skill, with a 3s gap so the message renders before the slash command. claude has no --plan-file flag (verified via claude --help), so message-pump is the route. Lets STOP-gate regression tests force complexity findings.

2. ClassifyResult gains wrote_findings_before_asking with companion strictPlanWrites?: boolean opt on classifyVisible. Fires when a Write/Edit to .claude/plans/* precedes any AskUserQuestion render in the session window. Default off — preserves zero-findings → write plan → plan_ready as legitimate for unseeded smokes. Six new unit tests cover before/after-AUQ ordering, permission-dialog edge case, strict-off path.

3. assertReportAtBottomIfPlanWritten(obs) shared helper. Wraps the existing assertReviewReportAtBottom(content) and gates on obs.planFile (artifact existing), so the assertion fires under both 'asked' and 'plan_ready' when a plan was actually written.

Also: runPlanSkillObservation now captures obs.planFile on every classifier outcome, not just 'plan_ready'. Catches the case where the skill wrote a plan partway through then paused on a question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: wire assertReportAtBottomIfPlanWritten into 4 plan-mode E2E tests + add seeded-plan STOP-gate case

Every test case in skill-e2e-plan-{eng,ceo,design,devex}-plan-mode.test.ts that produces a plan file now asserts ## GSTACK REVIEW REPORT is the last ## section. The {{PLAN_FILE_REVIEW_REPORT}} resolver mandated this contract; nothing tested it until now.

Plan-eng additionally gains a third test case: STOP gate fires when seeded plan forces Step 0 findings. Combines the new initialPlanContent runner option with --disallowedTools AskUserQuestion to force the Conductor MCP-variant path through mcp____AskUserQuestion. Asserts outcome NOT in {wrote_findings_before_asking, auto_decided, silent_write, exited, timeout} and that plan_ready outcomes carry a ## Decisions section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(touchfiles): delete duplicate plan-design-review-plan-mode keys

Verified duplicates in test/helpers/touchfiles.ts:
- E2E_TOUCHFILES had plan-design-review-plan-mode at line 94 (full deps) AND line 243 (smaller deps); JS object literals: later wins.
- E2E_TIERS had it at line 399 ('gate') AND line 524 ('periodic'); same later-wins rule.

Effective tier was 'periodic', not 'gate'. Three of four plan-mode siblings ran on every PR; design ran weekly only.

Delete the line-243 and line-524 duplicates. Keep line 94 (full deps) and line 399 ('gate'). Also extend the four plan-mode-test entries to include scripts/resolvers/review.ts so changes to {{PLAN_FILE_REVIEW_REPORT}} trigger all four siblings in bun run eval:select.
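The later-wins behavior described for the duplicate touchfiles keys can be illustrated with a small TypeScript sketch. It is not code from test/helpers/touchfiles.ts: `findDuplicateKeys`, `tierEntries`, and the `plan-eng-review-plan-mode` key are hypothetical names chosen for illustration. The entries are kept as a list of pairs because a type-checked object literal with a repeated key is typically rejected by the TypeScript compiler, whereas a runtime that only strips types may accept it silently.

```typescript
// Hypothetical sketch, not gstack code: shows how a repeated key silently
// changes the effective value, and how a guard could catch it.
type Tier = "gate" | "periodic";
type Entry<V> = readonly [string, V];

// Return every key that appears more than once in the entry list.
function findDuplicateKeys<V>(entries: readonly Entry<V>[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const [key] of entries) {
    if (seen.has(key)) dupes.add(key);
    seen.add(key);
  }
  return Array.from(dupes);
}

// Illustrative entries; the second key is deliberately listed twice.
const tierEntries: Entry<Tier>[] = [
  ["plan-eng-review-plan-mode", "gate"],
  ["plan-design-review-plan-mode", "gate"],
  ["plan-design-review-plan-mode", "periodic"],
];

// Collapsing entries into an object mirrors object-literal semantics:
// a later assignment to the same key overwrites the earlier one.
const tiers: Record<string, Tier> = {};
for (const [key, tier] of tierEntries) {
  tiers[key] = tier;
}

const duplicateTierKeys = findDuplicateKeys(tierEntries);
console.log(tiers["plan-design-review-plan-mode"], duplicateTierKeys);
```

A guard of this kind could run in a unit test so that a reintroduced duplicate fails loudly instead of quietly changing which tier a test runs in.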
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.26.2.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: tighten CHANGELOG voice for v1.26.2.0

Move contributor-flavored bullet (runPlanSkillObservation seeding) into For contributors. Drop branch-internal narrative (Codex review pass, plan iteration tracking) per CHANGELOG-for-users style.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 20:26:59 -07:00
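The report-at-bottom contract named in the commit message (## GSTACK REVIEW REPORT must be the last ## section of a written plan) can be sketched as a small check. This is a minimal illustration under stated assumptions, not the repository's assertReviewReportAtBottom: the function names below are hypothetical, and the sketch treats any line starting with `## ` as a section heading without accounting for headings inside fenced code blocks.

```typescript
// Hypothetical sketch of a "report is the last ## section" check.
const REPORT_HEADING = "## GSTACK REVIEW REPORT";

// Return the last level-two heading in the markdown, or undefined if none.
// "### " subheadings do not match because the third character must be a space.
function lastLevelTwoHeading(markdown: string): string | undefined {
  const headings = markdown
    .split(/\r?\n/)
    .filter((line) => /^## /.test(line))
    .map((line) => line.trimEnd());
  return headings.length > 0 ? headings[headings.length - 1] : undefined;
}

// True only when the final level-two section is the review report.
function reportIsLastSection(markdown: string): boolean {
  return lastLevelTwoHeading(markdown) === REPORT_HEADING;
}

// Illustrative plan contents (made up for this example).
const goodPlan = [
  "# Plan",
  "## Decisions",
  "- keep the cache",
  "## GSTACK REVIEW REPORT",
  "### Details",
  "no open issues",
].join("\n");

const badPlan = [
  "# Plan",
  "## GSTACK REVIEW REPORT",
  "no open issues",
  "## Decisions",
  "- keep the cache",
].join("\n");

console.log(reportIsLastSection(goodPlan), reportIsLastSection(badPlan));
```

Gating such a check on whether a plan file exists, rather than on the classifier outcome, matches the intent described for assertReportAtBottomIfPlanWritten: the contract applies whenever a plan was actually written.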
providers	v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)	2026-05-01 07:21:28 -07:00
agent-sdk-runner.ts	v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)	2026-05-01 07:21:28 -07:00
benchmark-judge.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
benchmark-runner.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
claude-pty-runner.ts	v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313)	2026-05-03 20:26:59 -07:00
claude-pty-runner.unit.test.ts	v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313)	2026-05-03 20:26:59 -07:00
codex-session-runner.ts	fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391)	2026-03-23 08:44:08 -07:00
e2e-helpers.ts	v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296)	2026-05-01 19:51:51 -07:00
eval-store.test.ts	feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)	2026-03-15 23:55:39 -05:00
eval-store.ts	v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215)	2026-04-26 13:55:13 -07:00
gemini-session-runner.test.ts	feat: Gemini CLI E2E tests (v0.9.2.0) (#252)	2026-03-20 08:30:09 -07:00
gemini-session-runner.ts	feat: Gemini CLI E2E tests (v0.9.2.0) (#252)	2026-03-20 08:30:09 -07:00
llm-judge.ts	v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296)	2026-05-01 19:51:51 -07:00
observability.test.ts	fix: never clean up observability artifacts — partial file persists after finalize	2026-03-14 12:37:38 -05:00
pricing.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
secret-sink-harness.ts	v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183)	2026-04-24 01:38:21 -07:00
session-runner.test.ts	feat: stream-json NDJSON parser for real-time E2E progress	2026-03-14 03:49:36 -05:00
session-runner.ts	fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064)	2026-04-19 08:38:19 +08:00
skill-parser.ts	feat: content security — 4-layer prompt injection defense for pair-agent (#815)	2026-04-06 14:41:06 -07:00
tool-map.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
touchfiles.ts	v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313)	2026-05-03 20:26:59 -07:00