gstack/test
Garry Tan aaf2668103
test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills
Conductor launches Claude Code with --disallowedTools AskUserQuestion
--permission-mode default --permission-prompt-tool stdio (verified by
inspecting the live conductor claude process via ps -p ... -o args=).
Native AskUserQuestion is removed from the model's tool registry; without
fallback guidance the plan-mode skills (plan-ceo-review, plan-eng-review,
plan-design-review, plan-devex-review, autoplan, office-hours) silently
proceed and never surface decisions to the user.

Adds 6 gate-tier real-PTY regression cases:

  - 4 inline test cases inside the existing plan-X-review-plan-mode.test
    files, each exercising the same skill with extraArgs ['--disallowedTools',
    'AskUserQuestion'] and asserting outcome === 'asked'. plan-design-review
    keeps the ['asked', 'plan_ready'] envelope (legitimate short-circuit on
    no-UI-scope) but explicitly fails on 'auto_decided'.
  - 2 standalone test files for autoplan + office-hours (which had no prior
    plan-mode test). autoplan asserts the FIRST non-auto-decided gate fires
    (Phase 1 premise confirmation) — autoplan auto-decides intermediate
    questions BY DESIGN.

Touchfile entries:
  - autoplan-auto-mode + office-hours-auto-mode added to E2E_TOUCHFILES +
    E2E_TIERS (gate)
  - existing plan-X-review-plan-mode entries gain question-tuning.ts and
    generate-ask-user-format.ts touchfile deps so AUTO_DECIDE-related
    resolver changes correctly invalidate the regression tests
  - touchfiles.test.ts count updated 18 -> 19 to cover the autoplan
    touchfile dependency on plan-ceo-review/**

Filenames retain `auto-mode` for branch-history continuity. Auto-mode (the
AUTO_DECIDE preamble path when QUESTION_TUNING=true) is a related but
distinct silencing mechanism; both share the same fix surface in the
preamble.

These tests are expected to FAIL on this branch until the fix lands. The
failure is the receipt for the regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:13:48 -07:00
..
fixtures v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
helpers test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
agent-sdk-runner.test.ts v1.11.1.0 fix: plan-mode handshake + canUseTool test harness (#1182) 2026-04-24 00:04:53 -07:00
analytics.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
audit-compliance.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
benchmark-cli.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
benchmark-runner.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
brain-sync.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
builder-profile.test.ts feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937) 2026-04-08 22:21:28 -10:00
codex-e2e-plan-format.test.ts fix(plan-reviews): restore RECOMMENDATION + Completeness split + Codex ELI10 (v1.6.3.0) (#1149) 2026-04-23 07:25:20 -07:00
codex-e2e.test.ts feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425) 2026-03-23 23:05:22 -07:00
codex-hardening.test.ts codex + Apple Silicon hardening wave (v0.18.4.0) (#1056) 2026-04-18 12:30:54 +08:00
context-save-hardening.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
diff-scope.test.ts feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
e2e-harness-audit.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
explain-level-config.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
gbrain-detect-install.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
gbrain-lib-verify.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
gbrain-repo-policy.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
gbrain-supabase-provision.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
gemini-e2e.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
gen-skill-docs.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
global-discover.test.ts fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817) 2026-04-05 02:02:06 -07:00
gstack-brain-init-gh-mock.test.ts v1.12.2.0 fix: /setup-gbrain day-two fixes (MCP scope, version parse, gh repo create order, smoke test) (#1187) 2026-04-24 07:51:46 -07:00
gstack-developer-profile.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
gstack-gbrain-source-wireup.test.ts v1.17.0.0: setup-gbrain wireup ships the gbrain federation surface (#1234) 2026-04-28 01:17:54 -07:00
gstack-next-version.test.ts v1.11.0.0 feat(ship): workspace-aware version allocation (#1168) 2026-04-23 23:03:27 -07:00
gstack-question-log.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
gstack-question-preference.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
gstack-upgrade-migration-v1_17_0_0.test.ts v1.17.0.0: setup-gbrain wireup ships the gbrain federation surface (#1234) 2026-04-28 01:17:54 -07:00
helpers-unit.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
hook-scripts.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
host-config.test.ts community wave: 6 PRs + hardening (v0.18.1.0) (#1028) 2026-04-17 00:45:13 -07:00
jargon-list.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
learnings-injection.test.ts fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847) 2026-04-06 00:47:04 -07:00
learnings.test.ts feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622) 2026-03-29 17:02:01 -06:00
migration-checkpoint-ownership.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
model-overlay-opus-4-7.test.ts v1.13.0.0 feat: add Claude outside-voice skill (#1212) 2026-04-25 11:52:48 -07:00
openclaw-native-skills.test.ts community wave: 6 PRs + hardening (v0.18.1.0) (#1028) 2026-04-17 00:45:13 -07:00
plan-tune.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
preamble-compose.test.ts v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178) 2026-04-23 18:25:34 -07:00
readme-throughput.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
relink.test.ts fix: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0) (#1025) 2026-04-16 15:39:44 -07:00
resolver-ask-user-format.test.ts v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178) 2026-04-23 18:25:34 -07:00
review-log.test.ts fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552) 2026-03-26 23:21:27 -06:00
secret-sink-harness.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
setup-codesign.test.ts codex + Apple Silicon hardening wave (v0.18.4.0) (#1056) 2026-04-18 12:30:54 +08:00
ship-version-sync.test.ts fix(ship): detect + repair VERSION/package.json drift in Step 12 (v1.1.1.0) (#1063) 2026-04-18 23:58:59 +08:00
skill-budget-regression.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-collision-sentinel.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-ask-user-question-format-compliance.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-autoplan-auto-mode.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
skill-e2e-autoplan-chain.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-autoplan-dual-voice.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-benchmark-providers.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
skill-e2e-brain-privacy-gate.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
skill-e2e-bws.test.ts feat(v1.9.0.0): gbrain-sync — cross-machine gstack memory (#1151) 2026-04-23 17:54:54 -07:00
skill-e2e-context-skills.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-cso.test.ts feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384) 2026-03-23 06:57:22 -07:00
skill-e2e-deploy.test.ts feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518) 2026-03-26 11:08:31 -07:00
skill-e2e-design.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-learnings.test.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
skill-e2e-office-hours-auto-mode.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
skill-e2e-office-hours.test.ts feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065) 2026-04-19 05:44:39 +08:00
skill-e2e-opus-47.test.ts feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117) 2026-04-22 01:06:22 -07:00
skill-e2e-overlay-harness.test.ts feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166) 2026-04-23 18:42:58 -07:00
skill-e2e-plan-ceo-mode-routing.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-plan-ceo-plan-mode.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
skill-e2e-plan-design-plan-mode.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
skill-e2e-plan-design-with-ui.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-plan-devex-plan-mode.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
skill-e2e-plan-eng-plan-mode.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
skill-e2e-plan-format.test.ts v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178) 2026-04-23 18:25:34 -07:00
skill-e2e-plan-mode-no-op.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-plan-prosons.test.ts v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178) 2026-04-23 18:25:34 -07:00
skill-e2e-plan-tune.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
skill-e2e-plan.test.ts feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065) 2026-04-19 05:44:39 +08:00
skill-e2e-qa-bugs.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-qa-workflow.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-review-army.test.ts feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
skill-e2e-review.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
skill-e2e-session-intelligence.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-ship-idempotency.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-sidebar.test.ts feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793) 2026-04-04 15:32:20 -07:00
skill-e2e-skillify.test.ts v1.20.0.0 feat: browser-skills runtime + gbrain-support carryover (#1233) 2026-04-28 20:08:04 -07:00
skill-e2e-workflow.test.ts refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873) 2026-04-07 00:23:36 -07:00
skill-e2e.test.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
skill-llm-eval.test.ts feat: voice directive for all skills (v0.12.3.0) (#520) 2026-03-26 17:31:53 -06:00
skill-parser.test.ts feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
skill-routing-e2e.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
skill-validation.test.ts v1.20.0.0 feat: browser-skills runtime + gbrain-support carryover (#1233) 2026-04-28 20:08:04 -07:00
taste-engine.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
team-mode.test.ts feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117) 2026-04-22 01:06:22 -07:00
telemetry.test.ts feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641) 2026-03-29 21:43:36 -06:00
timeline.test.ts feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733) 2026-04-01 00:50:42 -06:00
touchfiles.test.ts test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills 2026-04-30 21:13:48 -07:00
uninstall.test.ts feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561) 2026-03-27 00:44:37 -06:00
upgrade-migration-v1.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
v0-dormancy.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
worktree.test.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
writing-style-resolver.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00