gstack


Garry Tan	7860d6516e	2026-05-06 11:02:45 -07:00
test: demote setup-gbrain Path 4 E2E to periodic-tier

The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic: the model interprets "follow Path 4 only" prompts flexibly and can skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which makes the gate-tier assertions flaky.

The deterministic gate coverage for Path 4 is in test/setup-gbrain-path4-structure.test.ts: a fast structural lint that catches AUQ-pacing regressions and prose contract drift in <200ms with zero token spend. That test is the right tool for catching the failure mode the gate tier was meant to guard against.

The Agent SDK E2E tests stay available on demand for periodic-tier runs (EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts).

Also tightened the verify-error assertion to the literal field shape ("error_class": "AUTH") instead of a substring match that false-matches the parent claude session's "needs-auth" MCP discovery markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
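The commit message above contrasts a substring match with an assertion pinned to the literal `"error_class": "AUTH"` field. A minimal TypeScript sketch of why that matters, assuming the verify helper's failure output contains that field as JSON text; the function names and sample strings are hypothetical and not taken from the repo:

```typescript
// Loose check (hypothetical): case-insensitive substring search for "auth".
// It also fires on unrelated text such as an MCP server's "needs-auth" status.
function looseAuthMatch(output: string): boolean {
  return /auth/i.test(output);
}

// Strict check (hypothetical): matches only the literal "error_class": "AUTH"
// field, allowing any whitespace around the colon as JSON permits.
const AUTH_ERROR_FIELD = /"error_class"\s*:\s*"AUTH"/;

function strictAuthMatch(output: string): boolean {
  return AUTH_ERROR_FIELD.test(output);
}

// Two made-up transcripts for illustration.
const mcpNoise = "mcp: gbrain server status needs-auth";
const verifyFailure = '{"ok": false, "error_class": "AUTH"}';

console.log(looseAuthMatch(mcpNoise)); // true: a false positive
console.log(strictAuthMatch(mcpNoise)); // false
console.log(strictAuthMatch(verifyFailure)); // true
```

Pinning the assertion to the field's shape trades a little flexibility for immunity to incidental text from the surrounding session.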
providers	v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)	2026-05-01 07:21:28 -07:00
agent-sdk-runner.ts	v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)	2026-05-01 07:21:28 -07:00
benchmark-judge.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
benchmark-runner.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
claude-pty-runner.ts	v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313)	2026-05-03 20:26:59 -07:00
claude-pty-runner.unit.test.ts	v1.26.2.0 fix: plan-eng-review STOP gates always fire AskUserQuestion + report-at-bottom contract enforcement (#1313)	2026-05-03 20:26:59 -07:00
codex-session-runner.ts	fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391)	2026-03-23 08:44:08 -07:00
e2e-helpers.ts	v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296)	2026-05-01 19:51:51 -07:00
eval-store.test.ts	feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)	2026-03-15 23:55:39 -05:00
eval-store.ts	v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215)	2026-04-26 13:55:13 -07:00
gemini-session-runner.test.ts	feat: Gemini CLI E2E tests (v0.9.2.0) (#252)	2026-03-20 08:30:09 -07:00
gemini-session-runner.ts	feat: Gemini CLI E2E tests (v0.9.2.0) (#252)	2026-03-20 08:30:09 -07:00
llm-judge.ts	v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296)	2026-05-01 19:51:51 -07:00
observability.test.ts	fix: never clean up observability artifacts — partial file persists after finalize	2026-03-14 12:37:38 -05:00
pricing.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
secret-sink-harness.ts	v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183)	2026-04-24 01:38:21 -07:00
session-runner.test.ts	feat: stream-json NDJSON parser for real-time E2E progress	2026-03-14 03:49:36 -05:00
session-runner.ts	fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064)	2026-04-19 08:38:19 +08:00
skill-parser.ts	feat: content security — 4-layer prompt injection defense for pair-agent (#815)	2026-04-06 14:41:06 -07:00
tool-map.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)	2026-04-19 17:50:31 +08:00
touchfiles.ts	test: demote setup-gbrain Path 4 E2E to periodic-tier	2026-05-06 11:02:45 -07:00
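The commit that last touched touchfiles.ts moved the Path 4 E2E tests to the periodic tier, runnable on demand with `EVALS=1 EVALS_TIER=periodic`. How the repo actually selects tiers is not shown in this listing; the sketch below is a hypothetical illustration of env-based tier gating, with `shouldRunEval` and the default of `gate` when `EVALS_TIER` is unset both being assumptions:

```typescript
// Tier names: only "gate" and "periodic" are mentioned in the commit message.
type EvalTier = "gate" | "periodic";

// The two environment variables named in the commit message.
interface EvalEnv {
  EVALS?: string;
  EVALS_TIER?: string;
}

// Hypothetical gate: EVALS=1 turns evals on at all, and EVALS_TIER picks
// which tier runs. Assumption: an unset EVALS_TIER means "gate".
function shouldRunEval(tier: EvalTier, env: EvalEnv): boolean {
  if (env.EVALS !== "1") return false;
  const selected = env.EVALS_TIER ?? "gate";
  return selected === tier;
}

console.log(shouldRunEval("periodic", { EVALS: "1", EVALS_TIER: "periodic" })); // true
console.log(shouldRunEval("periodic", { EVALS: "1" })); // false
```

Keeping the decision in one pure function makes the tier rules testable without spawning any agent sessions.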