gstack/test/helpers
Garry Tan a5833c413f
v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966)
* feat(config): make codex_reviews the master switch for all Codex review

Broaden the codex_reviews doc to describe it governing /review, /ship,
/document-release, plan reviews, and /autoplan. Reject invalid values on
set (preserving the existing value) so a typo can never silently flip
paid Codex calls on or off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(review): Codex review default-on across review/ship/plan/docs

Add a shared codexPreflight() helper (constants.ts) that, in one bash
block, reads codex_reviews, sources gstack-codex-probe, checks install +
auth, and echoes a single canonical mode (ready/not_installed/not_authed/
disabled). All Codex resolvers route through it.

- generateCodexPlanReview: opt-in question removed; the outside voice now
  runs automatically (default-on), falling back to a Claude subagent when
  Codex is missing/unauthed. Cross-model tension still gates on user
  approval (sovereignty preserved).
- generateAdversarialStep: probe-based availability (install AND auth),
  distinct not-installed vs not-authed guidance; 200-line structured-review
  threshold unchanged.
- generateCodexDocReview (new, wired via CODEX_DOC_REVIEW): reviews the
  release's docs against the shipped diff range, informational + an explicit
  apply-fixes decision point, never auto-edits.
- autoplan Phase 0.5 now honors codex_reviews=disabled so the switch is
  truly global.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(docs): regenerate SKILL docs + refresh ship golden

Output of gen:skill-docs for the Codex-default-on resolver/template
changes. Refreshes the factory-ship golden fixture (codex-host output
unchanged — resolvers strip for the codex host).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(infra): widen size-budget guards for default-on Codex outside-voice

The codexPreflight() block + CODEX_MODE branch prose (replacing the
smaller opt-in question) grows plan-ceo/eng/devex-review and review by
5-7% over baseline. Each bump carries a comment justifying it as
intentional capability, not slop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: guard Codex default-on + config reject-on-set

skill-validation: assert plan reviews no longer carry the opt-in question
and render the default-on outside-voice, document-release carries the doc
review, and the codex host strips all of it.

gstack-config: codex_reviews defaults to enabled, accepts enabled/disabled,
and rejects an invalid value while preserving the existing one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): align gstack-config tests with defaults-fallback behavior

Three tests (last touched v0.13.7.0) asserted get/list print empty for
unset keys, but gstack-config falls back to the documented defaults table
(get returns the default, list shows the active-values block). Update the
assertions to the real behavior and split out an unknown-key case that does
still return empty. Pre-existing red, unrelated to codex review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs

Codex cross-model review now runs by default on /review, /ship, all four
plan reviews, /document-release, and /autoplan, governed by one master
switch (codex_reviews, default enabled). Plan-review outside voice is
default-on; /document-release gets a new Codex doc-vs-diff audit; every
call site detects install AND auth and falls back to a Claude subagent
with a clear reason. Disable everything with:
gstack-config set codex_reviews disabled

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 21:14:58 -07:00
..
providers v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
agent-sdk-runner.ts v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
auq-sdk-capture.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
benchmark-judge.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
benchmark-runner.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
budget-override.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
budget-override.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
capture-parity-baseline.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
capture-parity-baseline.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
carve-guard-checks.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
carve-guards.ts v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
claude-pty-runner.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
claude-pty-runner.unit.test.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
codex-session-runner.ts fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391) 2026-03-23 08:44:08 -07:00
e2e-helpers.ts v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534) 2026-05-16 12:32:33 -07:00
eval-store.test.ts feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83) 2026-03-15 23:55:39 -05:00
eval-store.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
gemini-session-runner.test.ts feat: Gemini CLI E2E tests (v0.9.2.0) (#252) 2026-03-20 08:30:09 -07:00
gemini-session-runner.ts feat: Gemini CLI E2E tests (v0.9.2.0) (#252) 2026-03-20 08:30:09 -07:00
llm-judge.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
observability.test.ts fix: never clean up observability artifacts — partial file persists after finalize 2026-03-14 12:37:38 -05:00
parity-harness.ts v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
pricing.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
required-reads.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
secret-sink-harness.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
session-runner.test.ts feat: stream-json NDJSON parser for real-time E2E progress 2026-03-14 03:49:36 -05:00
session-runner.ts v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
skill-parser.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
tool-map.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
touchfiles.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
transcript-section-logger.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00