gstack/test/helpers
Garry Tan 866982decd
feat(test/helpers): runPlanSkillFloorCheck — minimal AskUserQuestion-floor observer
Adds a focused PTY observer that exits at the first non-permission
numbered-option render. Catches the May 2026 transcript-bug class
(model wrote a plan and called ExitPlanMode without firing any AskUserQuestion, or AUQ) without
needing to fingerprint or navigate past the AUQ.

Why separate from runPlanSkillCounting: plan-mode AUQs render every
option on a single logical line via cursor-positioning escapes that
stripAnsi can't simulate, so parseNumberedOptions returns < 2 options
and never records a fingerprint. Counting tests work on 25-min budgets
because eventually one frame parses cleanly; gate-tier floor tests
need to exit early on the first observation. Trades fingerprint
precision for early-exit reliability.
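
The parsing failure described above can be illustrated with a minimal sketch. The names below (stripAnsiSketch, parseNumberedOptionsSketch, sawNumberedOptionSketch) are hypothetical stand-ins, not the actual helpers in claude-pty-runner.ts, and the sample frame is an assumed shape for a cursor-positioned render:

```typescript
// Hypothetical stand-ins for illustration only; the real stripAnsi and
// parseNumberedOptions in claude-pty-runner.ts may behave differently.
const ANSI_RE = /\x1b\[[0-9;?]*[A-Za-z]/g;

// Removes CSI escape sequences without simulating cursor movement.
function stripAnsiSketch(s: string): string {
  return s.replace(ANSI_RE, "");
}

// Line-anchored parser: expects one "N. Label" option per line.
function parseNumberedOptionsSketch(text: string): string[] {
  const out: string[] = [];
  for (const line of text.split("\n")) {
    const m = /^\s*\d+\.\s+(.+)$/.exec(line);
    const label = m?.[1];
    if (label) out.push(label.trim());
  }
  return out;
}

// Loose floor check: any "N." token followed by text, anywhere in the buffer.
function sawNumberedOptionSketch(text: string): boolean {
  return /(?:^|\D)\d+\.\s*\S/.test(text);
}

// Assumed frame: each option is placed with a cursor-position escape
// (ESC [ row ; col H) instead of a newline.
const frame = "\x1b[5;1H1. Yes\x1b[6;1H2. No\x1b[7;1H3. Other";
const plain = stripAnsiSketch(frame);

// The three options collapse onto one logical line, so the line-anchored
// parser sees a single option while the loose check still fires.
const parsed = parseNumberedOptionsSketch(plain);
const floorHit = sawNumberedOptionSketch(plain);
```

Under these assumptions the line-anchored parser finds fewer than two options and cannot produce a fingerprint, while the looser check still detects that a numbered-option render happened. That is the precision-for-reliability trade the message describes.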

Also drops the COMPLETION_SUMMARY_RE check from this helper: it matches
"GSTACK REVIEW REPORT" anywhere in the buffer including when the
agent does recon by reading existing plan files. plan_ready
(claude's actual "Ready to execute" confirmation) is the reliable
terminal signal for "agent finished without asking."
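
A rough sketch of the resulting exit logic, under stated assumptions: classifyFloorSketch, the outcome names, and every marker regex here are hypothetical (in particular the permission-prompt pattern), and the real helper may detect these states differently. The only parts taken from the message are that plan_ready corresponds to the "Ready to execute" confirmation, that permission prompts do not count, and that "GSTACK REVIEW REPORT" is not treated as terminal.

```typescript
type FloorOutcome = "asked" | "plan_ready" | "pending";

// Assumed marker patterns; illustrative only.
const PLAN_READY_RE = /Ready to execute/;
const PERMISSION_RE = /Do you want to (?:proceed|allow)/i; // hypothetical
const NUMBERED_RE = /(?:^|\D)1\.\s*\S/;

// Classifies the visible (ANSI-stripped) buffer for a floor check.
function classifyFloorSketch(visible: string): FloorOutcome {
  // The plan-ready confirmation may itself render numbered choices,
  // so it is checked before the generic numbered-option test.
  if (PLAN_READY_RE.test(visible)) return "plan_ready";
  // A numbered render that is not a permission prompt counts as an AUQ.
  if (NUMBERED_RE.test(visible) && !PERMISSION_RE.test(visible)) return "asked";
  // Anything else, including a "GSTACK REVIEW REPORT" seen during recon,
  // is not a terminal signal.
  return "pending";
}
```

In this sketch a test would pass on "asked", fail on "plan_ready" (the agent finished without asking), and keep polling on "pending" until its budget runs out.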

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:47:42 -07:00
providers v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252) 2026-05-01 07:21:28 -07:00
agent-sdk-runner.ts v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252) 2026-05-01 07:21:28 -07:00
benchmark-judge.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
benchmark-runner.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
claude-pty-runner.ts feat(test/helpers): runPlanSkillFloorCheck — minimal AskUserQuestion-floor observer 2026-05-06 19:47:42 -07:00
claude-pty-runner.unit.test.ts v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313) 2026-05-03 20:26:59 -07:00
codex-session-runner.ts fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391) 2026-03-23 08:44:08 -07:00
e2e-helpers.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
eval-store.test.ts feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83) 2026-03-15 23:55:39 -05:00
eval-store.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
gemini-session-runner.test.ts feat: Gemini CLI E2E tests (v0.9.2.0) (#252) 2026-03-20 08:30:09 -07:00
gemini-session-runner.ts feat: Gemini CLI E2E tests (v0.9.2.0) (#252) 2026-03-20 08:30:09 -07:00
llm-judge.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
observability.test.ts fix: never clean up observability artifacts — partial file persists after finalize 2026-03-14 12:37:38 -05:00
pricing.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
secret-sink-harness.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
session-runner.test.ts feat: stream-json NDJSON parser for real-time E2E progress 2026-03-14 03:49:36 -05:00
session-runner.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-parser.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
tool-map.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
touchfiles.ts v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313) 2026-05-03 20:26:59 -07:00