gstack/test
Garry Tan 9fd03fae9e
v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077)
* fix(gbrain): stop forcing GBRAIN_PREPARE on transaction-mode poolers (#1965)

buildGbrainEnv auto-set GBRAIN_PREPARE=true whenever DATABASE_URL targeted
port 6543, and the /sync-gbrain capability check exported it for the rest
of the skill run. Both had the semantics inverted: gbrain auto-disables
prepared statements on transaction-mode poolers because they break every
write there ("prepared statement does not exist"); GBRAIN_PREPARE=true is
gbrain's documented override for SESSION-mode poolers on 6543, not a
requirement for transaction mode. The #1435 search symptom the auto-set
worked around was fixed gbrain-side.

Remove both force-sets. A caller-set GBRAIN_PREPARE (either value) still
passes through untouched, preserving the session-mode-on-6543 escape hatch.
isTransactionModePooler stays exported.
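
A minimal sketch of the pass-through behavior described above. The function name matches the commit message, but the signature and body are illustrative assumptions, not the real gstack source:

```typescript
type Env = Record<string, string | undefined>;

// Assumed shape: copy the caller's environment and add nothing pooler-specific.
// The removed behavior was `if (port === 6543) env.GBRAIN_PREPARE = "true"`.
function buildGbrainEnv(callerEnv: Env): Env {
  const env: Env = { ...callerEnv };
  // GBRAIN_PREPARE is intentionally left alone: unset stays unset (gbrain
  // decides for itself), and a caller-set "true" or "false" passes through.
  return env;
}
```

With this shape, a session-mode pooler on 6543 still works when the caller exports GBRAIN_PREPARE=true, and a transaction-mode pooler is never handed the override by accident.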

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(gbrain): classify probe timeout as its own status; sync proceeds instead of skipping (#1964)

The 5s engine probe misclassified healthy-but-slow engines (cold Supabase
pooler connections measured at 6.9-10.7s) as broken-config, so /sync-gbrain
silently skipped code+memory and told the user their config was malformed.

- New "timeout" status: probe killed at the deadline with no recognized
  stderr pattern. Default deadline is now 15s, overridable via
  GSTACK_GBRAIN_PROBE_TIMEOUT_MS (tests set 300ms against a fake that
  sleeps 2s).
- Sync stages PROCEED on timeout with a stderr warning naming the env knob;
  a genuinely-dead engine surfaces its real error at the first operation
  instead of a false config diagnosis.
- Consistency everywhere "ok" gated behavior: gstack-gbrain-detect --is-ok
  exits 0 on timeout, and gen-skill-docs' detection gate accepts it, so a
  slow engine no longer silently suppresses brain-aware features.
- Status cache: key now includes the effective probe timeout (raising it
  invalidates a cached timeout) and GBRAIN_HOME; config detection honors
  GBRAIN_HOME so relocated-home users stop being misclassified as
  missing-config.
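
The classification above can be sketched roughly as follows. Only the "ok" and "timeout" status names come from the commit message; the "broken-config" label, the stderr pattern, and every function name are hypothetical:

```typescript
type ProbeStatus = "ok" | "timeout" | "broken-config";

interface ProbeResult {
  exitCode: number | null; // null when the probe was killed at the deadline
  timedOut: boolean;
  stderr: string;
}

// Hypothetical stand-in for the recognized stderr patterns.
const CONFIG_ERROR_PATTERNS: RegExp[] = [/malformed/i];

function classifyProbe(r: ProbeResult): ProbeStatus {
  // A recognized error wins, even if the probe also hit the deadline.
  if (CONFIG_ERROR_PATTERNS.some((p) => p.test(r.stderr))) return "broken-config";
  if (r.timedOut) return "timeout";
  return r.exitCode === 0 ? "ok" : "broken-config";
}

// Every gate that used to test for "ok" now accepts "timeout" as well.
function shouldProceed(status: ProbeStatus): boolean {
  return status === "ok" || status === "timeout";
}

// Including the effective timeout in the key means raising it invalidates a
// cached "timeout"; including GBRAIN_HOME separates relocated homes.
function statusCacheKey(gbrainHome: string, timeoutMs: number): string {
  return `${gbrainHome}|${timeoutMs}`;
}
```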

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(bins): cygpath-normalize SCRIPT_DIR for bun imports; surface learnings-log errors (#1950)

Under Windows git-bash, pwd yields a POSIX path (/c/Users/...) that Bun on
Windows cannot resolve as an ES module specifier. gstack-learnings-log
interpolates SCRIPT_DIR into a bun -e import, so every invocation died with
"Cannot find module" — and 2>/dev/null swallowed the error, silently
dropping every AI-logged learning for Windows users.

- 3-line cygpath -m guard in gstack-learnings-log and gstack-question-log
  (which gains the same import shape in the next commit). Matches the
  duplicated IS_WINDOWS convention in setup; no shared shell lib exists.
- learnings-log adopts question-log's set +e / TMPERR capture pattern
  wholesale: validation errors now print to stderr. The old
  `if [ $? -ne 0 ]` check was dead code under set -euo pipefail — the
  script exited at the failing assignment before reaching it.
- New test/bin-windows-bun-import-paths.test.ts: static invariant (any
  bash bin interpolating $SCRIPT_DIR into a bun -e import must carry the
  guard) + behavioral end-to-end run invoked via `bash <bin>` — added to
  the windows-free-tests workflow list so the conversion is proven on the
  only platform where the bug exists.
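
For illustration only, here is roughly what the `cygpath -m` conversion does for drive-letter paths, written in TypeScript. The real bins shell out to cygpath itself; `toMixedPath` is a hypothetical name and handles only the `/c/...` shape:

```typescript
// Convert a git-bash POSIX path (/c/Users/...) to the mixed form
// (C:/Users/...) that Bun on Windows can resolve as a module specifier.
// Paths without a single-letter drive segment are returned unchanged.
function toMixedPath(posixPath: string, isWindows: boolean): string {
  if (!isWindows) return posixPath; // the guard only applies on Windows
  const m = /^\/([a-zA-Z])(\/.*)?$/.exec(posixPath);
  if (!m) return posixPath;
  return `${m[1].toUpperCase()}:${m[2] || "/"}`;
}
```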

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(question-log): dedupe INJECTION_PATTERNS via lib/jsonl-store (#1934)

bin/gstack-question-log carried a local copy of the injection-pattern list,
so pattern fixes to lib/jsonl-store.ts never propagated — including the
/override[:\s]/i false-positive fix arriving via community PR #1940.
Import the shared hasInjection instead (enabled by the previous commit's
cygpath guard). question-log also gets the lib's stricter superset
(human:, disregard, from-now-on, approve-all patterns).

Tests pin the contract in a #1940-order-independent way: an "Override:
ignore all previous instructions" header is rejected, "prose overrides the
deterministic table" is accepted, and a static invariant keeps local
INJECTION_PATTERNS duplicates out of the bin.
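
A sketch of the contract those tests pin, using only the override pattern quoted above. The full shared list in lib/jsonl-store.ts is not reproduced here, and whether the lib's current pattern is exactly this one is an assumption:

```typescript
// Single illustrative entry; the shared list carries more patterns.
const INJECTION_PATTERNS: RegExp[] = [/override[:\s]/i];

function hasInjection(text: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}
```

Because the pattern requires a colon or whitespace directly after "override", the header form matches while the verb "overrides" in ordinary prose does not.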

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(security): community-pulse + both dashboards never report fake zeros (#1947)

The security-signaling surface failed open at three layers — every failure
mode read as a reassuring "0 attacks" / "0 installs":

- community-pulse edge function: supabase-js returns {data,error} without
  throwing, and all five queries discarded `error` — a DB outage produced
  real-looking zeros via the SUCCESS path, and the catch (also returning
  zeros with HTTP 200) was unreachable for query failures. Every query now
  destructures and throws; the catch serves the stale cache (marked
  "stale": true) when one exists, else 503 {"error":"pulse_unavailable"}.
  Success responses carry "status":"ok" so clients can distinguish
  authoritative data from legacy backends. NOTE: the edge function deploys
  out-of-band (supabase functions deploy community-pulse).
- gstack-security-dashboard: captures the HTTP status; non-200 / network
  failure / error body / missing section → "unknown — backend error";
  jq missing → "unknown — install jq" (the lossy grep fallback broke on
  nested arrays and under-reported attacks as zero — removed); a 200
  without the new marker shows figures with an "unverified (legacy
  backend)" note. Also fixes a latent display bug: the TOTAL grep matched
  the digit 7 inside "attacks_last_7_days" and misreported every count.
- gstack-community-dashboard: same class — curl || echo "{}" plus
  grep || echo "0" printed "Weekly active installs: 0" on any failure.
  Now "unknown — backend error (HTTP N)".

test/security-dashboard-fallback.test.ts pins the matrix (200+marker,
200-legacy, 503, network failure) x (jq present, jq absent) for both bins:
"unknown" states never render as 0.
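
The edge-function fix can be sketched as below. This is a synchronous simplification with hypothetical names (`unwrap`, `servePulse`); the real function is async, and serving the stale cache with HTTP 200 is an assumption, since the message does not state the status code for that case:

```typescript
interface QueryResult<T> {
  data: T | null;
  error: { message: string } | null;
}

interface PulseResponse {
  status: number;
  body: Record<string, unknown>;
}

// supabase-js style results do not throw, so the error must be checked
// explicitly; otherwise a failed query falls through the success path.
function unwrap<T>(r: QueryResult<T>): T {
  if (r.error) throw new Error(r.error.message);
  if (r.data === null) throw new Error("query returned no data");
  return r.data;
}

function servePulse(
  runQueries: () => QueryResult<number>[],
  staleCache: Record<string, unknown> | null,
): PulseResponse {
  try {
    const counts = runQueries().map(unwrap);
    return { status: 200, body: { status: "ok", counts } };
  } catch {
    // Never fabricate zeros: serve marked-stale data or an explicit error.
    if (staleCache) return { status: 200, body: { ...staleCache, stale: true } };
    return { status: 503, body: { error: "pulse_unavailable" } };
  }
}
```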

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(telemetry): redact error_message spans before they leave the machine (#1947)

error_message was uploaded with only quote/newline escaping — stack traces
and failed-API errors can embed credentials, private paths, and hostnames,
and the sync path strips only _repo_slug/_branch.

New lib/redact-engine.ts export redactFindingSpans(): replaces EVERY
finding's span with <REDACTED-{id}> regardless of tier (applyRedactions is
the interactive PII-only path and exits nonzero on credential findings, so
it can't serve machine egress). Returns null when a span can't be located —
callers drop the whole payload rather than risk a leak.

gstack-telemetry-log pipes error_message through it at LOG time, so the
local JSONL at rest is clean too; surrounding text survives for crash
triage. FAIL CLOSED: bun missing, engine error, or non-JSON-string output
all null the field. Tests pin: embedded ghp_ token → <REDACTED-github.pat>
with context intact; redactor unavailable → null; raw bytes on disk never
contain the token.
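
A minimal sketch of the replace-or-null contract. The export name is from the message, but the `Finding` shape and the string-based span lookup are assumptions; the real engine's types are not shown in this commit:

```typescript
interface Finding {
  id: string;   // e.g. "github.pat"
  span: string; // the matched secret text
}

// Replace every finding's span with <REDACTED-{id}>. If any span cannot be
// located, return null so the caller drops the whole payload.
function redactFindingSpans(text: string, findings: Finding[]): string | null {
  let out = text;
  for (const f of findings) {
    if (f.span === "" || out.indexOf(f.span) === -1) return null;
    out = out.split(f.span).join(`<REDACTED-${f.id}>`);
  }
  return out;
}
```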

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(redact): prepush guard fails closed on git failure; /ship owns hook install (#1946)

Two gaps closed:

1. Fail closed. The git() helper returned "" on ANY non-zero exit or
   maxBuffer overflow (status null), addedLinesFor produced an empty
   string, and the push sailed through unscanned — fail-open on exactly
   the oversized-diff case where a large secret-bearing blob is most
   likely. The diff call now uses a strict variant that throws; main
   blocks with a clear message naming the GSTACK_REDACT_PREPUSH=skip
   escape valve. Probe calls (symbolic-ref, rev-parse, merge-base) keep
   the permissive helper — their failures are normal control flow.

2. Install path. The hook was installed by nothing ("opt-in, installed by
   nothing" was the issue's words). ./setup runs in the gstack checkout —
   the wrong repo for a per-project hook — so it gets a one-line hint
   only. /ship owns per-repo install: config redact_prepush_hook=true +
   hook missing → silent install (consent already given); config unset +
   no ~/.gstack/.redact-prepush-prompted marker → one-time machine-wide
   AskUserQuestion offer, answer persisted. ship/SKILL.md regenerated in
   this same commit (check-freshness bisect discipline).

Tests: unscannable diff (bogus SHAs) → exit 1 + valve named; empty-but-
successful diff → exit 0; static asserts pin setup as hint-only and the
ship template as the installer surface.
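
The fail-closed split between the two helpers might look like the sketch below. The process runner is injected so the example stays self-contained; `Runner`, `gitPermissive`, and `prepushGuard` are hypothetical names, and the messages are illustrative:

```typescript
interface RunResult {
  status: number | null; // null on signal kill or maxBuffer overflow
  stdout: string;
}
type Runner = (args: string[]) => RunResult;

// Probe calls: a failure is normal control flow, so "" is acceptable.
function gitPermissive(run: Runner, args: string[]): string {
  const r = run(args);
  return r.status === 0 ? r.stdout : "";
}

// Diff call: any non-zero or null status throws instead of returning "".
function gitStrict(run: Runner, args: string[]): string {
  const r = run(args);
  if (r.status !== 0) {
    throw new Error(`git ${args.join(" ")} failed (status ${r.status})`);
  }
  return r.stdout;
}

function prepushGuard(
  run: Runner,
  range: string,
  hasSecret: (diff: string) => boolean,
): { exitCode: number; message: string } {
  let diff: string;
  try {
    diff = gitStrict(run, ["diff", range]);
  } catch {
    return {
      exitCode: 1,
      message: "diff could not be scanned; set GSTACK_REDACT_PREPUSH=skip to bypass",
    };
  }
  return hasSecret(diff)
    ? { exitCode: 1, message: "credential found in pushed lines" }
    : { exitCode: 0, message: "clean" };
}
```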

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(redact): six new credential patterns — GitLab, HuggingFace, npm, DigitalOcean, Bearer, GCP SA (#1946)

Coverage gaps from the #1946 security review, including token types for
tooling gstack itself drives (glab):

HIGH (block): gitlab.token (glpat-/glptt-/gldt-), huggingface.token (hf_),
npm.token (npm_), digitalocean.token (dop_v1_), gcp.service_account (the
JSON-escaped "private_key" form that dodges pem.private_key's literal-block
match when minified, confirmed by "private_key_id" proximity).

MEDIUM (warn): auth.bearer — the most FP-prone shape in the set (docs are
full of "Authorization: Bearer <token>"), so it requires header-context
proximity and the same entropy>=3.0 + placeholder validator recipe as
env.kv. "Bearer YOUR_TOKEN_HERE" never fires; calibration over coverage,
per the cries-wolf principle.

All shapes are linear-time; test/redact-pattern-lint.test.ts covers them
automatically. Engine tests add positive + placeholder-negative cases per
pattern.
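
To illustrate the auth.bearer recipe (header context, entropy >= 3.0, placeholder rejection), here is a hedged sketch. The regexes and the function name `findBearerToken` are assumptions for illustration, not the shipped pattern definitions:

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let bits = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    bits -= p * (Math.log(p) / Math.LN2);
  }
  return bits;
}

// Hypothetical header-context pattern and placeholder list.
const BEARER_HEADER = /Authorization:\s*Bearer\s+([A-Za-z0-9._~+\/=-]+)/i;
const PLACEHOLDER = /^(your|example|dummy|changeme)|_here$|^x+$/i;

// Returns the token when the MEDIUM-tier finding should fire, else null.
function findBearerToken(line: string): string | null {
  const m = BEARER_HEADER.exec(line);
  if (!m) return null;                        // no header context
  const token = m[1];
  if (PLACEHOLDER.test(token)) return null;   // docs-style placeholder
  if (shannonEntropy(token) < 3.0) return null;
  return token;
}
```

Note that "YOUR_TOKEN_HERE" has enough distinct characters to clear an entropy floor on its own, which is why the placeholder check is a separate step.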

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: coverage-audit additions for the fix wave

Ship Step 7 gap-fill (all passing, 248 tests across the touched suites):
memory + dream stage probe-timeout proceeds, gbrain-detect override paths,
stale-flag passthrough, 200-body-missing-.security fail-closed case,
telemetry redaction edges, and credential-pattern edge cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: pre-landing review fixes

Review army findings (1 critical, auto-fixed with regression tests):

- CRITICAL (security specialist, verified live): redactFindingSpans spliced
  only the regex capture span, and pem.private_key / gcp.service_account
  capture just the BEGIN-header — the key body survived "redaction" and
  shipped via telemetry. Marker-only patterns now drop the whole payload
  (null, fail closed). Overlapping spans (Bearer+JWT on the same bytes) are
  coalesced before splicing so stale offsets can't leave partial secret
  bytes behind.
- gitStrict: drop the dead `|| r.status === null` disjunct (null !== 0
  already covers it); add the signal-kill/null-status regression test the
  docstring promised.
- security-dashboard human mode flags stale snapshots ("figures may be out
  of date") instead of presenting frozen counts as current.
- community-dashboard marker check uses jq when available — the grep-only
  variant misclassified whitespaced/reserialized bodies as legacy.
- telemetry fail-closed test now shadows bun with a failing stub
  (deterministic on any host layout); stale "five status cases" describe
  title renamed.
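
The span-coalescing step from the critical finding can be sketched as follows, assuming offset-based spans with an exclusive end. `Span`, `coalesce`, and `spliceRedactions` are hypothetical names:

```typescript
interface Span {
  start: number;
  end: number; // exclusive
}

// Merge overlapping or touching spans so each byte is replaced exactly once.
function coalesce(spans: Span[]): Span[] {
  const sorted = spans.slice().sort((a, b) => a.start - b.start);
  const out: Span[] = [];
  for (const s of sorted) {
    const last = out[out.length - 1];
    if (last && s.start <= last.end) {
      last.end = Math.max(last.end, s.end);
    } else {
      out.push({ start: s.start, end: s.end });
    }
  }
  return out;
}

// Splice right to left so earlier offsets stay valid after each replacement.
function spliceRedactions(text: string, spans: Span[], marker: string): string {
  const merged = coalesce(spans);
  let out = text;
  for (let i = merged.length - 1; i >= 0; i--) {
    out = out.slice(0, merged[i].start) + marker + out.slice(merged[i].end);
  }
  return out;
}
```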

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: adversarial review fixes (Claude + Codex cross-model passes)

Both adversarial passes ran against the wave; every FIXABLE finding landed
with a regression test:

- probeTimeoutMs clamps to >=1ms: a fractional override floored to 0, and
  execFileSync treats timeout:0 as NO timeout — the probe that exists to
  bound hangs could hang forever (found by both models independently).
- /ship silent hook install now requires the hooks dir to live inside
  .git: with core.hooksPath (husky's COMMITTED .husky/), the chaining
  installer would have renamed the team's committed pre-push and written a
  machine-local wrapper into the working tree (found by both models).
- gstack-config gbrain-refresh accepts the "timeout" status — the last
  consumer still gating on literal "ok" (Codex); gstack-gbrain-detect's
  config-derived fields honor GBRAIN_HOME so the detection JSON can't
  report status ok alongside config_exists false (Codex).
- prepush: a remote sha absent locally (shallow clone / stale fetch) falls
  back to the merge-base/empty-tree range, so the guard scans MORE rather
  than blocking a legitimate push and training users toward --no-verify.
- dashboards: curl's own 000 no longer doubles to "HTTP 000000"; the
  community dashboard flags stale snapshots like the security one; array
  sections parse via jq (the sed/grep loops truncated at the first ']');
  the no-jq marker grep tolerates whitespace.
- telemetry: multi-line redactor output nulls the field instead of
  corrupting the JSONL record; setup's hint fires only when the config key
  is genuinely unset (an explicit false is a recorded decline); the /ship
  prompt marker honors GSTACK_HOME.
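
The probeTimeoutMs clamp from the first bullet can be sketched like this. The env var name and the 15s default come from the earlier probe-timeout commit; taking the environment as a parameter and falling back to the default on non-numeric input are assumptions:

```typescript
const DEFAULT_PROBE_TIMEOUT_MS = 15000;

function probeTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.GSTACK_GBRAIN_PROBE_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_PROBE_TIMEOUT_MS;
  const parsed = Number(raw);
  if (!isFinite(parsed)) return DEFAULT_PROBE_TIMEOUT_MS; // assumed fallback
  // Flooring "0.5" yields 0, and a timeout of 0 means "no timeout" to
  // execFileSync, so clamp to at least 1ms.
  return Math.max(1, Math.floor(parsed));
}
```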

Kept as designed (cross-model tension noted): Bearer stays MEDIUM in the
prepush gate — a HIGH Bearer would block every docs example; the entropy
validator can't eliminate that FP class, and MEDIUM warns visibly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.11.0)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: P1 TODO — eval harness live progress + incremental persistence

Root-caused during this ship: a killed eval run was indistinguishable from a
healthy one for hours (per-file output buffering across mega test files, no
incremental eval-store writes, no honest liveness signal). Full context and
starting points in the entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: fix operational-learning E2E fixture — copy lib/jsonl-store.ts

Pre-existing breakage, proven on main: gstack-learnings-log has imported
lib/jsonl-store.ts (shared injection patterns) since v1.57.5.0 / #1910, but
the fixture copies only the bin scripts — the bin exits 1 before writing
anything, on main silently (stderr swallowed) and on this branch loudly
(the #1950 error-surfacing made the four-day-old failure visible). A real
install always ships bin/ and lib/ together; the fixture now does too.
Verified: the fixture-shaped invocation writes the learning (exit 0) with
lib present, exits 1 on both main and this branch without it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(ios-qa): isolate E2E tests under --concurrent (3 real races)

The ios-qa E2E file failed intermittently under `bun test --concurrent`
(the eval harness default). Three distinct shared-state races, all fixed:

1. Shared pidfile: a module-level `workDir` reassigned in beforeEach was
   clobbered by parallel tests, so concurrent daemons collided on the same
   pidfile and the loser returned `already_running`. Each test now gets its
   own dir via makeWorkDir().
2. process.env path globals: tests set GSTACK_IOS_AUDIT_PATH /
   _ATTEMPTS_PATH / _ALLOWLIST_PATH on the shared process env; concurrent
   tests stomped each other's audit/attempts destinations. Threaded
   auditPath/attemptsPath/allowlistPath through DaemonOptions (and
   mintForCaller) as explicit args — env is no longer load-bearing.
3. afterEach cleanup race: the per-test cleanup drained a shared dir array,
   so the first test to finish deleted still-running tests' workDirs
   mid-assertion. Moved to afterAll (cleans once, after all settle).

Verified: 5/5 clean full-suite runs at --max-concurrency 15 (was
intermittent); daemon unit suite 91/91; daemon source compiles. The paths
default to the env-derived locations when options are omitted, so the
production CLI path is unchanged.
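
Fix 2 (explicit paths with env-derived defaults) can be sketched as below. The option names are from the message; the fully expanded env var names for attempts and allowlist, the fallback file names, and `resolvePaths` itself are assumptions:

```typescript
type Env = Record<string, string | undefined>;

interface DaemonPathOptions {
  auditPath?: string;
  attemptsPath?: string;
  allowlistPath?: string;
}

// Explicit options win; the env is only a default, so concurrent tests that
// pass their own paths never touch shared process state.
function resolvePaths(opts: DaemonPathOptions, env: Env, fallbackDir: string) {
  return {
    auditPath: opts.auditPath ?? env.GSTACK_IOS_AUDIT_PATH ?? `${fallbackDir}/audit.jsonl`,
    attemptsPath: opts.attemptsPath ?? env.GSTACK_IOS_ATTEMPTS_PATH ?? `${fallbackDir}/attempts.jsonl`,
    allowlistPath: opts.allowlistPath ?? env.GSTACK_IOS_ALLOWLIST_PATH ?? `${fallbackDir}/allowlist.json`,
  };
}
```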

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(pty): pin spawned claude to EVALS model chain (default claude-sonnet-4-6)

launchClaudePty spawned the interactive `claude` TUI with no --model flag, so
the child inherited the operator's ~/.claude/settings.json model. On a
slow-thinking model that meant 5+ min of extended thinking on empty plan-mode
context, timing out the plan-mode smoke tests regardless of contention. Pin the
model via opts.model ?? EVALS_MODEL ?? 'claude-sonnet-4-6' — byte-identical to
session-runner.ts:144, so PTY and `claude -p` evals always agree.

Pushed before extraArgs (last flag wins, so a per-test --model still overrides).
Placement leaves the spawn region byte-stable for a clean merge with the
in-flight hermetic-env branch. Plumbed model through the three plan-skill
wrappers. Static-grep tripwires guard the pin, its fallback chain, the
before-extraArgs ordering, and all three wrapper forwards.
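
A small sketch of the fallback chain and flag ordering. The chain expression is quoted from the message; `buildClaudeArgs` and passing EVALS_MODEL in as a parameter are illustrative assumptions:

```typescript
interface LaunchOptions {
  model?: string;
  extraArgs?: string[];
}

function buildClaudeArgs(opts: LaunchOptions, evalsModel: string | undefined): string[] {
  const model = opts.model ?? evalsModel ?? "claude-sonnet-4-6";
  // --model goes first so a later --model inside extraArgs still wins.
  return ["--model", model, ...(opts.extraArgs ?? [])];
}
```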

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(pty): detect markdown bold-bullet prose AUQs (fixes office-hours smoke)

office-hours auto-mode renders its mode question as `- **Building a startup**`
markdown bullets (office-hours/SKILL.md.tmpl:102) with no letter/number marker.
isProseAUQVisible only matched `A)`-style lettered or `1.`-style numbered
options, so the question went undetected: the model surfaced it at ~2m19s
(well under the 300s budget) but the harness kept scoring the run "working"
off the spinner glyphs and timed out — a false timeout on a question that was
already on screen.

Add Pattern 3: when an interrogative line ('?') is present AND 3+ bold-bullet
markers (`- **`) appear in the 4KB tail, classify as a prose AUQ. Bold is the
discriminator vs incidental prose bullets; the line anchor is dropped (stripAnsi
can collapse option lines) and the existing `❯ 1.` cursor gate still defers to a
live native list. Wires through the existing classifyVisible 'asked' path and the
timeout high-water-mark, so office-hours now classifies 'asked' instead of
'timeout'. Five unit cases: the office-hours render passes; no-'?', <3-bullet,
plain-bullet, and native-cursor cases stay false.
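
Pattern 3 as described can be sketched as below. The function name is hypothetical, and slicing by characters is a simplification of the 4KB byte tail:

```typescript
const TAIL_CHARS = 4096;

function isBoldBulletProseAUQ(visible: string): boolean {
  const tail = visible.slice(-TAIL_CHARS);
  // A live native list (cursor on option 1) takes precedence.
  if (tail.indexOf("❯ 1.") !== -1) return false;
  // Signal 1: something interrogative is on screen.
  if (tail.indexOf("?") === -1) return false;
  // Signal 2: three or more bold-bullet markers, with no line anchor because
  // stripAnsi can collapse option lines together.
  const boldBullets = tail.split("- **").length - 1;
  return boldBullets >= 3;
}
```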

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(pty): detect stripAnsi-collapsed prose AUQs + judge spinner-precedence

The plan-eng/plan-design plan-mode + finding-floor smokes timed out even when
the skill HAD rendered a complete prose AskUserQuestion and was waiting: the PTY
strips cursor-positioning escapes, collapsing the option newlines/spaces so
"A) ..." arrives as "A(recommended)" / "-B:" and "Reply with A, B, or C" as
"ReplywithA,B,orC". Every line-anchored detector (Patterns 1-3) returns false on
those bytes, so proseAUQEverObserved never latched and the run timed out on a
question that was already on screen.

Add Pattern 4/5: a two-signal collapsed-form detector — a reply/recommendation
marker (space-insensitive "reply with [A-D]", "Recommendation:", or
"(recommended)") AND 2+ distinct A-D letters each punctuated by ) : or (. The
conjunction is what separates a real AUQ from incidental report prose; verified
true on the verbatim failing-run buffers where Patterns 1-3 return false.

Also fix the Haiku judge spinner bias: of 614 verdicts, 569 were 'working' and
95 of those noted a question was visible — Claude Code keeps the spinner
animating at an idle prose decision, so the judge coin-flipped. Add a precedence
override: when an option list AND a Recommendation/Reply instruction are both
visible, classify WAITING even with spinner glyphs. Kept the strict dual-signal
gate (never option-list-alone) so auto-decide-preserved doesn't flip.

5 unit tests pin the two-signal contract (2 true on real collapsed bytes, 3
false guards). 90 -> 95 pass.
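
The two-signal collapsed-form detector can be sketched as follows. The regexes are an illustrative reading of the description above and `isCollapsedProseAUQ` is a hypothetical name:

```typescript
// Signal 1: a reply/recommendation marker, tolerant of collapsed spaces.
const REPLY_MARKERS: RegExp[] = [
  /reply\s*with\s*[A-D]/i,
  /Recommendation:/i,
  /\(recommended\)/i,
];

function isCollapsedProseAUQ(visible: string): boolean {
  if (!REPLY_MARKERS.some((p) => p.test(visible))) return false;
  // Signal 2: two or more distinct uppercase option letters, each directly
  // followed by ')', ':' or '('.
  const seen: Record<string, true> = {};
  const optionLetter = /([A-D])[):(]/g;
  let m: RegExpExecArray | null;
  while ((m = optionLetter.exec(visible)) !== null) {
    seen[m[1]] = true;
  }
  return Object.keys(seen).length >= 2;
}
```

Requiring both signals is what keeps a lone "Recommendation:" in a report, or a lone lettered list, from being read as a pending question.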

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(plan-review): ask-first scope gate for plan-eng + plan-design review

On an empty/cold invocation, plan-eng-review and plan-design-review would dive
straight into repo exploration (plan-eng) or a 7-pass mockup+audit (plan-design)
and only ask the user much later, if at all. plan-ceo-review already asks first
via an unconditional Step-0 gate and behaves well; these two did not.

Add a hard-STOP scope gate as the FIRST operational instruction in each skill
(above the design-doc check / pre-review audit / mockup defaults it explicitly
overrides): the first tool call must be AskUserQuestion confirming the review
target, before any git/Read/Grep/Glob/Bash or mockup generation. Under
--disallowedTools the options render as plain column-0 lettered prose with a
Recommendation + "Reply with A, B, or C" line so the answer is detectable.

This is correct cold-start UX (confirm what to review before grinding a full
review on nothing) and it is the product half of the plan-mode smoke fix; the
harness collapsed-form detector is the deterministic half that catches the ask
however it renders. Templates + regenerated SKILL.md (default variant).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(tiers): reclassify stochastic plan-eng/plan-design ask-first smokes as periodic

plan-eng-review and plan-design-review run a long explore/audit before their
first AskUserQuestion, so whether the plan-mode + finding-floor smokes reach a
terminal outcome within the 300s/600s budget depends on stochastic ask-first
compliance (measured ~50-67%/run even with the hardened gate). Per the
"non-deterministic -> periodic" tiering rule, move the four affected smokes
(plan-eng/plan-design review-plan-mode + finding-floor) to periodic.

The deterministic harness fix (collapsed-form detector + judge precedence) and
the ask-first gate lift these from always-failing to mostly-passing and are the
real product+harness improvements; periodic monitoring tracks the rate weekly
without blocking PRs on an LLM coin-flip. plan-ceo/plan-devex ask-first reliably
and stay gate-tier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(evals): gate the deterministic PTY plan-mode smokes in CI

The real-PTY plan-mode smokes never ran in CI — the gate was local-only. Add an
e2e-pty-plan-smoke matrix suite running the two deterministically-reliable ones
(office-hours-auto-mode, plan-mode-no-op) so a regression there blocks PRs. The
stochastic plan-eng/plan-design ask-first smokes stay periodic (touchfiles
E2E_TIERS) and are not CI-gated.

A fresh CI container has no ~/.claude.json, so the spawned interactive `claude`
would wedge on the onboarding + API-key-approval dialog. Add a scoped seed step
(hasCompletedOnboarding + key approval, its own ANTHROPIC_API_KEY env) before the
run — mirrors what the hermetic E2E child env seeds. Per-suite timeout override
(35 min) via matrix.suite.timeout so the PTY suite has headroom for --retry 2
without bumping the other 12 suites. Report runner count 12 -> 13.

Validate via workflow_dispatch before relying on the gate (PTY-in-CI is new).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(evals): install gstack skill registry for the PTY smoke suite

The first dry-run of e2e-pty-plan-smoke failed: the spawned interactive `claude`
printed "Unknown command: /plan-ceo-review". .claude/skills is gitignored, so a
fresh CI checkout has no gstack skill registry and the TUI can't resolve
/office-hours or /plan-ceo-review.

Add a Register step (scoped to the suite, after Seed, before Run) that mirrors
setup's --no-prefix user-scoped registry minimally: $HOME/.claude/skills/gstack
-> repo (resolves the preambles' absolute ~/.claude/skills/gstack/bin/* and
<skill>/sections/* paths) + per-skill SKILL.md/sections symlinks for the two
skills these tests invoke. HOME is /github/home in this container and the runner
adds no HOME/CLAUDE_CONFIG_DIR override (no hermetic mode), so $HOME is the right
anchor — the Seed step already proved claude reads it. No ./setup (binary build
+ Chromium + fonts + /dev/tty prompt); SKILL.md + bin/ + sections/ are committed.

Self-validating: fails the step loudly on a dangling symlink or missing
`name:` frontmatter, so a moved target surfaces here instead of as a silent
35-min "Unknown command" timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.4.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-21 07:15:19 -07:00
fixtures v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
helpers v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
agent-sdk-runner.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
analytics.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
artifacts-init-migration.test.ts v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547) 2026-05-17 08:26:36 -07:00
audit-compliance.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
auq-error-fallback-hook.test.ts v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
auq-format-always-loaded.test.ts v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
benchmark-cli.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
benchmark-runner.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
bin-windows-bun-import-paths.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
brain-cache-roundtrip.test.ts v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
brain-cache-spec.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
brain-preflight.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
brain-sync-windows-paths.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
brain-sync.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
build-gbrain-env.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
build-script-shell-compat.test.ts v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594) 2026-05-20 07:35:01 -07:00
builder-profile.test.ts feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937) 2026-04-08 22:21:28 -10:00
cache-concurrent-refresh.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
carve-guard-completeness.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
carve-guards-negative.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
carve-section-loading.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
carve-section-ordering.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
catalog-mode-full.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
catalog-trim.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
codex-e2e-plan-format.test.ts fix(plan-reviews): restore RECOMMENDATION + Completeness split + Codex ELI10 (v1.6.3.0) (#1149) 2026-04-23 07:25:20 -07:00
codex-e2e-recommendation-substance.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
codex-e2e.test.ts feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425) 2026-03-23 23:05:22 -07:00
codex-hardening.test.ts v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs) (#1478) 2026-05-14 11:11:52 -04:00
codex-resume-flag-semantics.test.ts v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391) 2026-05-09 08:06:47 -07:00
conductor-env-shim.test.ts v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534) 2026-05-16 12:32:33 -07:00
context-save-hardening.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
cso-preserved.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
cso-spec-taxonomy-alignment.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
declared-annotation.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
dev-setup-render-isolation.test.ts v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) 2026-06-09 22:29:23 -07:00
diagram-render-drift.test.ts v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
diff-scope.test.ts v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
discover-section-templates.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
distill-apply.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
distill-free-text.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
docs-config-keys.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
document-skills-redaction.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
e2e-harness-audit.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
explain-level-config.test.ts v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678) 2026-05-24 01:43:51 -07:00
extension-pty-inject-invariant.test.ts v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594) 2026-05-20 07:35:01 -07:00
gbrain-cycle-completed.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
gbrain-detect-install.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
gbrain-detect-shape.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
gbrain-detection-override.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
gbrain-dream-stage.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
gbrain-exec-invariant.test.ts v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547) 2026-05-17 08:26:36 -07:00
gbrain-guards.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
gbrain-init-rollback.test.ts v1.37.0.0 feat: split-engine gbrain (remote MCP brain + local PGLite for code) (#1500) 2026-05-14 17:20:48 -07:00
gbrain-init-voyage-code-3.test.ts v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests (#1639) 2026-05-21 18:55:55 -07:00
gbrain-lib-validate-varname.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
gbrain-lib-verify.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
gbrain-local-status.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
gbrain-refresh-install-render.test.ts v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) 2026-06-09 22:29:23 -07:00
gbrain-repo-policy.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
gbrain-source-gitignore.test.ts v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547) 2026-05-17 08:26:36 -07:00
gbrain-sources-parse.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
gbrain-sources.test.ts v1.26.3.0 feat: /sync-gbrain skill + native code-surface orchestrator (#1314) 2026-05-04 09:29:48 -07:00
gbrain-spawn-windows-shell.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
gbrain-supabase-provision.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
gbrain-sync-skip.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
gbrain-sync-voyage-code-3-integration.test.ts v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests (#1639) 2026-05-21 18:55:55 -07:00
gemini-e2e.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
gen-skill-docs-idempotency.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
gen-skill-docs-out-dir.test.ts v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) 2026-06-09 22:29:23 -07:00
gen-skill-docs.test.ts v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916) 2026-06-08 21:17:18 -07:00
global-discover.test.ts v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592) 2026-05-20 06:56:41 -07:00
gstack-artifacts-init.test.ts v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs) (#1478) 2026-05-14 11:11:52 -04:00
gstack-artifacts-url.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
gstack-brain-context-load.test.ts v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594) 2026-05-20 07:35:01 -07:00
gstack-codex-session-import.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
gstack-config-redact-keys.test.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
gstack-decision-bins.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
gstack-decision-semantic.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
gstack-decision.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
gstack-detach.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
gstack-developer-profile.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
gstack-gbrain-detect-mcp-mode.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
gstack-gbrain-mcp-verify.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
gstack-gbrain-source-wireup.test.ts v1.17.0.0: setup-gbrain wireup ships the gbrain federation surface (#1234) 2026-04-28 01:17:54 -07:00
gstack-gbrain-sync.test.ts v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594) 2026-05-20 07:35:01 -07:00
gstack-learnings-search.test.ts v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
gstack-memory-helpers.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
gstack-memory-ingest.test.ts v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547) 2026-05-17 08:26:36 -07:00
gstack-next-version.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
gstack-paths.test.ts v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594) 2026-05-20 07:35:01 -07:00
gstack-question-log.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
gstack-question-preference.test.ts v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740) 2026-05-26 23:43:07 -07:00
gstack-redact-cli.test.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
gstack-schema-pack.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
gstack-session-kind.test.ts v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
gstack-settings-hook-schema-aware.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
gstack-slug-sanitize.test.ts v1.55.1.0 fix: telemetry consent accuracy + gstack-slug cache sanitization (#1848) 2026-06-02 22:36:34 -07:00
gstack-state-root-override.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
gstack-upgrade-migration-v1_17_0_0.test.ts v1.17.0.0: setup-gbrain wireup ships the gbrain federation surface (#1234) 2026-04-28 01:17:54 -07:00
gstack-upgrade-migration-v1_37_0_0.test.ts v1.37.0.0 feat: split-engine gbrain (remote MCP brain + local PGLite for code) (#1500) 2026-05-14 17:20:48 -07:00
gstack-upgrade-migration-v1_40_0_0.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
gstack-version-bump.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
helpers-unit.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
hermetic-wiring.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
hook-scripts.test.ts v1.31.1.0 fix wave: 3 community PRs (careful BSD sed, codex Step 0 rename, make-pdf setup ordering) (#1413) 2026-05-10 06:57:24 -07:00
host-config.test.ts community wave: 6 PRs + hardening (v0.18.1.0) (#1028) 2026-04-17 00:45:13 -07:00
investigate-freeze-path.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
is-conductor.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
jargon-list.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
jsonl-merge.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
jsonl-store.test.ts v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
land-and-deploy-postfail.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
learnings-injection.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
learnings.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
llm-judge-recommendation.test.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
llms-txt-shape.test.ts v1.28.0.0 feat: browse --headed/--proxy/--navigate + gstack/llms.txt + webdriver-only stealth (#1363) 2026-05-07 20:14:59 -07:00
memory-cache-injection.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
memory-ingest-no-put_page.test.ts v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594) 2026-05-20 07:35:01 -07:00
memory-ingest-timeout.test.ts v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
migration-checkpoint-ownership.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
migrations-v1.27.0.0.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
model-overlay-opus-4-7.test.ts v1.13.0.0 feat: add Claude outside-voice skill (#1212) 2026-04-25 11:52:48 -07:00
no-stale-gstack-brain-refs.test.ts v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505) 2026-05-14 21:19:58 -07:00
one-way-doors.test.ts v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
openclaw-native-skills.test.ts community wave: 6 PRs + hardening (v0.18.1.0) (#1028) 2026-04-17 00:45:13 -07:00
parity-baseline-integrity.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
parity-sectioned.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
parity-suite.test.ts v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916) 2026-06-08 21:17:18 -07:00
plan-tune-gates.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
plan-tune.test.ts v1.53.1.0 fix: non-interactive-safe plan-tune hook install (flags + smart defaults) (#1805) 2026-05-30 11:42:13 -07:00
post-rename-doc-regen.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
pr-title-rewrite.test.ts v1.23.0.0 feat: always prefix PR titles with v<VERSION> (#1284) 2026-05-01 07:06:37 -07:00
pr-title-sync-workflow-safety.test.ts v1.57.3.0 fix(ship): always-loaded PR-title-version rule + fork-PR title-sync backstop (#1909) 2026-06-07 22:04:18 -07:00
preamble-compose.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
question-log-hook.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
question-preference-hook.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
readme-throughput.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
redact-audit-log.test.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
redact-doc-resolver.test.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
redact-engine-autoredact.test.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
redact-engine.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
redact-pattern-lint.test.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
redact-prepush-hook.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
redact-semantic-pass.eval.ts v1.53.0.0 feat: smarter redaction — PII/secrets/legal guard across /spec, /ship, /cso, /document-* (#1797) 2026-05-30 08:54:46 -07:00
regression-1539-review-self-verify.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
regression-1611-gbrain-sync-resume.test.ts v1.56.1.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802) (#1856) 2026-06-07 06:51:10 -07:00
regression-1624-retro-stale-base.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
regression-pr1169-build-app-sed.test.ts v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592) 2026-05-20 06:56:41 -07:00
regression-pr1169-mktemp-fallbacks.test.ts v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592) 2026-05-20 06:56:41 -07:00
relink.test.ts v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) 2026-05-21 21:21:07 -07:00
required-reads.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
resolver-ask-user-format.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
resolver-entry.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
resolvers-gbrain-put-rewrite.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
resolvers-gbrain-save-results.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
review-log.test.ts fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552) 2026-03-26 23:21:27 -06:00
salience-allowlist.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
schema-version-migration.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
secret-sink-harness.test.ts v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183) 2026-04-24 01:38:21 -07:00
section-manifest-consistency.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
security-dashboard-fallback.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
setup-codesign.test.ts v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678) 2026-05-24 01:43:51 -07:00
setup-conductor-worktree.test.ts v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505) 2026-05-14 21:19:58 -07:00
setup-emoji-font.test.ts v1.52.2.0 fix(make-pdf): render emoji instead of tofu (▯) on Linux (#1787) 2026-05-29 18:06:19 -07:00
setup-gbrain-path4-structure.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
setup-plan-tune-hooks-noninteractive.test.ts v1.53.1.0 fix: non-interactive-safe plan-tune hook install (flags + smart defaults) (#1805) 2026-05-30 11:42:13 -07:00
setup-sections-linking.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
setup-windows-fallback.test.ts v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505) 2026-05-14 21:19:58 -07:00
ship-plan-completion-invariants.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
ship-template-redaction.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
ship-version-sync.test.ts fix(ship): detect + repair VERSION/package.json drift in Step 12 (v1.1.1.0) (#1063) 2026-04-18 23:58:59 +08:00
skill-budget-regression.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
skill-ceo-section-ordering.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-collision-sentinel.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-coverage-floor.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
skill-coverage-matrix.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
skill-coverage-matrix.ts v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
skill-cross-model-recommendation-emit.test.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
skill-e2e-ask-user-question-format-compliance.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-auq-consistency.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-auq-matrix.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-auq-verbose-vs-carved-ab.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-auto-decide-preserved.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
skill-e2e-autoplan-chain.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-autoplan-dual-voice.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-benchmark-providers.test.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
skill-e2e-brain-privacy-gate.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
skill-e2e-bws.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
skill-e2e-conductor-prose.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
skill-e2e-context-skills.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-cso.test.ts feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384) 2026-03-23 06:57:22 -07:00
skill-e2e-deploy.test.ts feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518) 2026-03-26 11:08:31 -07:00
skill-e2e-design.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-diagram.test.ts v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
skill-e2e-gbrain-roundtrip-local.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
skill-e2e-hermetic-canary.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
skill-e2e-ios-device.test.ts v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574) 2026-05-21 16:09:26 -07:00
skill-e2e-ios-swift-build.test.ts v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574) 2026-05-21 16:09:26 -07:00
skill-e2e-ios.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
skill-e2e-learnings.test.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
skill-e2e-memory-pipeline.test.ts v1.26.3.0 feat: /sync-gbrain skill + native code-surface orchestrator (#1314) 2026-05-04 09:29:48 -07:00
skill-e2e-office-hours-auto-mode.test.ts v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287) 2026-05-01 08:45:36 -07:00
skill-e2e-office-hours-brain-writeback.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-office-hours-phase4.test.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
skill-e2e-office-hours.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-e2e-opus-47.test.ts feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117) 2026-04-22 01:06:22 -07:00
skill-e2e-overlay-harness.test.ts feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166) 2026-04-23 18:42:58 -07:00
skill-e2e-plan-ceo-finding-count.test.ts v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255) 2026-04-30 02:50:09 -07:00
skill-e2e-plan-ceo-finding-floor.test.ts v1.27.1.0 fix: anti-shortcut clause + gate-tier AskUserQuestion floor tests for all plan-* skills (#1354) 2026-05-06 20:27:20 -07:00
skill-e2e-plan-ceo-mode-routing.test.ts v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255) 2026-04-30 02:50:09 -07:00
skill-e2e-plan-ceo-plan-mode.test.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
skill-e2e-plan-ceo-review-section-loading.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
skill-e2e-plan-ceo-split-overflow.test.ts v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740) 2026-05-26 23:43:07 -07:00
skill-e2e-plan-design-finding-count.test.ts v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255) 2026-04-30 02:50:09 -07:00
skill-e2e-plan-design-finding-floor.test.ts v1.27.1.0 fix: anti-shortcut clause + gate-tier AskUserQuestion floor tests for all plan-* skills (#1354) 2026-05-06 20:27:20 -07:00
skill-e2e-plan-design-plan-mode.test.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
skill-e2e-plan-design-with-ui.test.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
skill-e2e-plan-devex-finding-count.test.ts v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255) 2026-04-30 02:50:09 -07:00
skill-e2e-plan-devex-finding-floor.test.ts v1.27.1.0 fix: anti-shortcut clause + gate-tier AskUserQuestion floor tests for all plan-* skills (#1354) 2026-05-06 20:27:20 -07:00
skill-e2e-plan-devex-plan-mode.test.ts v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313) 2026-05-03 20:26:59 -07:00
skill-e2e-plan-eng-finding-count.test.ts v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255) 2026-04-30 02:50:09 -07:00
skill-e2e-plan-eng-finding-floor.test.ts v1.27.1.0 fix: anti-shortcut clause + gate-tier AskUserQuestion floor tests for all plan-* skills (#1354) 2026-05-06 20:27:20 -07:00
skill-e2e-plan-eng-multi-finding-batching.test.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
skill-e2e-plan-eng-plan-mode.test.ts v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390) 2026-05-09 17:01:13 -07:00
skill-e2e-plan-format.test.ts v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296) 2026-05-01 19:51:51 -07:00
skill-e2e-plan-mode-no-op.test.ts v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215) 2026-04-26 13:55:13 -07:00
skill-e2e-plan-prosons.test.ts v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178) 2026-04-23 18:25:34 -07:00
skill-e2e-plan-tune-cathedral.test.ts v1.52.0.0 feat(plan-tune): explicit consent + first-run setup wizard for contributors (#1741) 2026-05-28 18:21:09 -07:00
skill-e2e-plan-tune.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
skill-e2e-plan.test.ts v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916) 2026-06-08 21:17:18 -07:00
skill-e2e-qa-bugs.test.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
skill-e2e-qa-workflow.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-review-army.test.ts feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
skill-e2e-review.test.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
skill-e2e-session-intelligence.test.ts fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064) 2026-04-19 08:38:19 +08:00
skill-e2e-setup-gbrain-bad-token.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
skill-e2e-setup-gbrain-path4-local-pglite.test.ts v1.37.0.0 feat: split-engine gbrain (remote MCP brain + local PGLite for code) (#1500) 2026-05-14 17:20:48 -07:00
skill-e2e-setup-gbrain-remote.test.ts v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) 2026-05-06 19:37:53 -07:00
skill-e2e-ship-idempotency.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
skill-e2e-ship-section-loading.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
skill-e2e-sidebar.test.ts feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793) 2026-04-04 15:32:20 -07:00
skill-e2e-skillify.test.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
skill-e2e-spec-execute.test.ts v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733) 2026-05-26 21:36:53 -07:00
skill-e2e-workflow.test.ts v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431) 2026-05-11 12:16:26 -07:00
skill-e2e.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
skill-llm-eval-spec.test.ts v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733) 2026-05-26 21:36:53 -07:00
skill-llm-eval.test.ts v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004) 2026-06-14 11:40:57 -07:00
skill-parser.test.ts feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
skill-preflight-budget.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
skill-routing-e2e.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
skill-size-budget.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
skill-validation.test.ts v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
spec-template-invariants.test.ts v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907) 2026-06-07 19:13:24 -07:00
spec-template-sync.test.ts v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733) 2026-05-26 21:36:53 -07:00
static-no-legacy-writes.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
takes-fence-fallback.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
taste-engine.test.ts feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040) 2026-04-19 17:50:31 +08:00
team-mode.test.ts feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117) 2026-04-22 01:06:22 -07:00
telemetry-repo-strip.test.ts v1.55.1.0 fix: telemetry consent accuracy + gstack-slug cache sanitization (#1848) 2026-06-02 22:36:34 -07:00
telemetry.test.ts v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077) 2026-06-21 07:15:19 -07:00
template-context-parity.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
terse-build.test.ts v1.57.4.0 refactor(ethos): rename Boil the Lake principle to Boil the Ocean (#1912) 2026-06-08 05:41:07 -07:00
test-free-shards.test.ts v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252) 2026-05-01 07:21:28 -07:00
timeline.test.ts v1.44.1.0 fix wave: post-windhoek paper-cut — 9 community PRs in one bundle (#1682) 2026-05-25 10:57:15 -07:00
touchfiles.test.ts v1.56.0.0 Token-reduction Phase B + AUQ paranoid safety net (#1849) 2026-06-04 11:14:43 -07:00
transcript-section-logger.test.ts v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806) 2026-05-30 12:09:10 -07:00
uninstall.test.ts feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561) 2026-03-27 00:44:37 -06:00
upgrade-migration-v1.test.ts v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678) 2026-05-24 01:43:51 -07:00
user-slug-fallback.test.ts v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742) 2026-05-29 08:35:00 -07:00
v0-dormancy.test.ts feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039) 2026-04-18 15:05:42 +08:00
worktree.test.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
writing-style-resolver.test.ts v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00