gstack

History

Garry Tan ebebc95a34 test(parity): T0b — cathedral parity-suite harness + invariant registry Adds the harness that the v2_PLAN.md cathedral parity-eval suite is built on. Compares CURRENT SKILL.md output to v1.44.1 baseline along three axes: STRUCTURE frontmatter shape (catalog trim landed, "## When to invoke" present) CONTENT must-preserve phrases per skill family (cso: OWASP/STRIDE; plan-ceo: SCOPE EXPANSION/HOLD SCOPE/REDUCTION; ship: VERSION/CHANGELOG/PR; etc.) SIZE per-skill byte budget (maxSizeRatio + minBytes guards) PARITY_INVARIANTS registry pins 10 load-bearing skills (cso, ship, plan-*- review, review, qa, investigate, office-hours, autoplan). Each entry declares what must NOT regress; future compression that strips these phrases or shrinks a skill past its minBytes cliff fails CI. Periodic-tier LLM-judge parity (paid, ~$0.20/skill) lands in v2.0.0.0 sections/ phase. Same registry, same harness, judge added on top. Test plan: - bun test test/parity-suite.test.ts: 10/10 invariants pass vs v1.44.1 - Per-skill failures get actionable per-line breakdown so a reviewer can see which phrase / heading / size limit went sideways Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-25 20:37:36 -07:00
..
providers	v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391 )	2026-05-09 08:06:47 -07:00
agent-sdk-runner.ts	v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252 )	2026-05-01 07:21:28 -07:00
benchmark-judge.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )	2026-04-19 17:50:31 +08:00
benchmark-runner.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )	2026-04-19 17:50:31 +08:00
budget-override.ts	test(budget): T5 — hard token budgets + override audit trail (Phase A.6)	2026-05-25 20:36:43 -07:00
capture-parity-baseline.test.ts	test(parity): T0a — capture v1.44.1 baseline + capture helper + diff utility	2026-05-25 20:29:47 -07:00
capture-parity-baseline.ts	test(parity): T0a — capture v1.44.1 baseline + capture helper + diff utility	2026-05-25 20:29:47 -07:00
claude-pty-runner.ts	v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390 )	2026-05-09 17:01:13 -07:00
claude-pty-runner.unit.test.ts	v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390 )	2026-05-09 17:01:13 -07:00
codex-session-runner.ts	fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391 )	2026-03-23 08:44:08 -07:00
e2e-helpers.ts	v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534 )	2026-05-16 12:32:33 -07:00
eval-store.test.ts	feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83 )	2026-03-15 23:55:39 -05:00
eval-store.ts	v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431 )	2026-05-11 12:16:26 -07:00
gemini-session-runner.test.ts	feat: Gemini CLI E2E tests (v0.9.2.0) (#252 )	2026-03-20 08:30:09 -07:00
gemini-session-runner.ts	feat: Gemini CLI E2E tests (v0.9.2.0) (#252 )	2026-03-20 08:30:09 -07:00
llm-judge.ts	v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296 )	2026-05-01 19:51:51 -07:00
observability.test.ts	fix: never clean up observability artifacts — partial file persists after finalize	2026-03-14 12:37:38 -05:00
parity-harness.ts	test(parity): T0b — cathedral parity-suite harness + invariant registry	2026-05-25 20:37:36 -07:00
pricing.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )	2026-04-19 17:50:31 +08:00
secret-sink-harness.ts	v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )	2026-04-24 01:38:21 -07:00
session-runner.test.ts	feat: stream-json NDJSON parser for real-time E2E progress	2026-03-14 03:49:36 -05:00
session-runner.ts	fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )	2026-04-19 08:38:19 +08:00
skill-parser.ts	feat: content security — 4-layer prompt injection defense for pair-agent (#815 )	2026-04-06 14:41:06 -07:00
tool-map.ts	feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )	2026-04-19 17:50:31 +08:00
touchfiles.ts	v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574 )	2026-05-21 16:09:26 -07:00