gstack/test
Garry Tan b805aa0113
feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)
* feat: add Confusion Protocol to preamble resolver

Injects a high-stakes ambiguity gate at preamble tier >= 2 so all
workflow skills get it. Fires when Claude encounters architectural
decisions, data model changes, destructive operations, or contradictory
requirements. Does NOT fire on routine coding.

Addresses Karpathy failure mode #1 (wrong assumptions) with an
inline STOP gate instead of relying on workflow skill invocation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Hermes and GBrain host configs

Hermes: tool rewrites for terminal/read_file/patch/delegate_task,
paths to ~/.hermes/skills/gstack, AGENTS.md config file.

GBrain: coding skills become brain-aware when GBrain mod is installed.
Same tool rewrites as OpenClaw (agents spawn Claude Code via ACP).
GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS NOT suppressed on gbrain
host, enabling brain-first lookup and save-to-brain behavior.

Both registered in hosts/index.ts with setup script redirect messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: GBrain resolver — brain-first lookup and save-to-brain

New scripts/resolvers/gbrain.ts with two resolver functions:
- GBRAIN_CONTEXT_LOAD: search brain for context before skill starts
- GBRAIN_SAVE_RESULTS: save skill output to brain after completion

Placeholders added to 4 thinking skill templates (office-hours,
investigate, plan-ceo-review, retro). Resolves to empty string on
all hosts except gbrain via suppressedResolvers.

GBRAIN suppression added to all 9 non-gbrain host configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: wire slop:diff into /review as advisory diagnostic

Adds Step 3.5 to the review template: runs bun run slop:diff against
the base branch to catch AI code quality issues (empty catches,
redundant return await, overcomplicated abstractions). Advisory only,
never blocking. Skips silently if slop-scan is not installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Karpathy compatibility note to README

Positions gstack as the workflow enforcement layer for Karpathy-style
CLAUDE.md rules (17K stars). Links to forrestchang/andrej-karpathy-skills.
Maps each Karpathy failure mode to the gstack skill that addresses it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve native OpenClaw thinking skills

office-hours: add design doc path visibility message after writing
ceo-review: add HARD GATE reminder at review section transitions
retro: add non-git context support (check memory for meeting notes)

Mirrors template improvements to hand-crafted native skills.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update tests and golden fixtures for new hosts

- Host count: 8 → 10 (hermes, gbrain)
- OpenClaw adapter test: expects undefined (dead code removed)
- Golden ship fixtures: updated with Confusion Protocol + vendoring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate all SKILL.md files

Regenerated from templates after Confusion Protocol, GBrain resolver
placeholders, slop:diff in review, HARD GATE reminders, investigation
learnings, design doc visibility, and retro non-git context changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.18.0.0

- CHANGELOG: add v0.18.0.0 entry (Confusion Protocol, Hermes, GBrain,
  slop in review, Karpathy note, skill improvements)
- CLAUDE.md: add hermes.ts and gbrain.ts to hosts listing
- README.md: update agent count 8→10, add Hermes + GBrain to table
- VERSION: bump to 0.18.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: sync package.json version to 0.18.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: extract Step 0 from review SKILL.md in E2E test

The review-base-branch E2E test was copying the full 1493-line
review/SKILL.md into the test fixture. The agent spent 8+ turns
reading it in chunks, leaving only 7 turns for actual work, causing
error_max_turns on every attempt.

Now extracts only Step 0 (base branch detection, ~50 lines) which is
all the test actually needs. Follows the CLAUDE.md rule: "NEVER copy
a full SKILL.md file into an E2E test fixture."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: update GBrain and Hermes host configs for v0.10.0 integration

GBrain: add 'triggers' to keepFields so generated skills pass
checkResolvable() validation. Add version compat comment.

Hermes: un-suppress GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS.
The resolvers handle GBrain-not-installed gracefully, so Hermes
agents with GBrain as a mod get brain features automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: GBrain resolver DX improvements and preamble health check

Resolver changes:
- gbrain query → gbrain search (fast keyword search, not expensive hybrid)
- Add keyword extraction guidance for agents
- Show explicit gbrain put_page syntax with --title, --tags, heredoc
- Add entity enrichment with false-positive filter
- Name throttle error patterns (exit code 1, stderr keywords)
- Add data-research routing for investigate skill
- Expand skillSaveMap from 4 to 8 entries
- Add brain operation telemetry summary

Preamble changes:
- Add gbrain doctor --fast --json health check for gbrain/hermes hosts
- Parse check failures/warnings count
- Show failing check details when score < 50

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: preserve keepFields in allowlist frontmatter mode

The allowlist mode hard-coded name + description reconstruction but
never iterated keepFields for additional fields. Adding 'triggers'
to keepFields was a no-op because the field was silently stripped.

Now iterates keepFields and preserves any field beyond name/description
from the source template frontmatter, including YAML arrays.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add triggers to all 38 skill templates

Multi-word, skill-specific trigger keywords for GBrain's RESOLVER.md
router. Each skill gets 3-6 triggers derived from its "Use when asked
to..." description text. Avoids single generic words that would collide
across skills (e.g., "debug this" not "debug").

These are distinct from voice-triggers (speech-to-text aliases) and
serve GBrain's checkResolvable() validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate all SKILL.md files and update golden fixtures

Regenerated from updated templates (triggers, brain placeholders,
resolver DX improvements, preamble health check). Golden fixtures
updated to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: settings-hook remove exits 1 when nothing to remove

gstack-settings-hook remove was exiting 0 when settings.json didn't
exist, causing gstack-uninstall to report "SessionStart hook" as
removed on clean systems where nothing was installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for GBrain v0.10.0 integration

ARCHITECTURE.md: added GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS
to resolver table.

CHANGELOG.md: expanded v0.18.0.0 entry with GBrain v0.10.0 integration
details (triggers, expanded brain-awareness, DX improvements, Hermes
brain support), updated date.

CLAUDE.md: added gbrain to resolvers/ directory comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: routing E2E stops writing to user's ~/.claude/skills/

installSkills() was copying SKILL.md files to both project-level
(.claude/skills/ in tmpDir) and user-level (~/.claude/skills/).
Writing to the user's real install fails when symlinks point to
different worktrees or dangling targets (ENOENT on copyFileSync).

Now installs to project-level only. The test already sets cwd to
the tmpDir, so project-level discovery works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: scale Gemini E2E back to smoke test

Gemini CLI gets lost in worktrees on complex tasks (review times out
at 600s, discover-skill hits exit 124). Nobody uses Gemini for gstack
skill execution. Replace the two failing tests (gemini-discover-skill
and gemini-review-findings) with a single smoke test that verifies
Gemini can start and read the README. 90s timeout, no skill invocation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:41:38 -07:00
..
fixtures feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
helpers feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
analytics.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
audit-compliance.test.ts fix: security audit round 2 (v0.13.4.0) (#640) 2026-03-29 22:46:33 -06:00
builder-profile.test.ts feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937) 2026-04-08 22:21:28 -10:00
codex-e2e.test.ts feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425) 2026-03-23 23:05:22 -07:00
diff-scope.test.ts feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
gemini-e2e.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
gen-skill-docs.test.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
global-discover.test.ts fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817) 2026-04-05 02:02:06 -07:00
hook-scripts.test.ts feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) 2026-03-18 23:57:59 -05:00
host-config.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
learnings-injection.test.ts fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847) 2026-04-06 00:47:04 -07:00
learnings.test.ts feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622) 2026-03-29 17:02:01 -06:00
relink.test.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00
review-log.test.ts fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552) 2026-03-26 23:21:27 -06:00
skill-e2e-bws.test.ts fix: cookie picker auth token leak (v0.15.17.0) (#904) 2026-04-08 10:10:13 -07:00
skill-e2e-cso.test.ts feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384) 2026-03-23 06:57:22 -07:00
skill-e2e-deploy.test.ts feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518) 2026-03-26 11:08:31 -07:00
skill-e2e-design.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-learnings.test.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
skill-e2e-plan.test.ts test: E2E tests for plan review report and Codex offering (v0.11.15.0) (#449) 2026-03-24 07:30:24 -07:00
skill-e2e-qa-bugs.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-qa-workflow.test.ts feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360) 2026-03-23 10:17:33 -07:00
skill-e2e-review-army.test.ts feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692) 2026-03-30 22:07:50 -06:00
skill-e2e-review.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
skill-e2e-session-intelligence.test.ts feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733) 2026-04-01 00:50:42 -06:00
skill-e2e-sidebar.test.ts feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793) 2026-04-04 15:32:20 -07:00
skill-e2e-workflow.test.ts refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873) 2026-04-07 00:23:36 -07:00
skill-e2e.test.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
skill-llm-eval.test.ts feat: voice directive for all skills (v0.12.3.0) (#520) 2026-03-26 17:31:53 -06:00
skill-parser.test.ts feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
skill-routing-e2e.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
skill-validation.test.ts feat: UX behavioral foundations + ux-audit command (v0.17.0.0) (#1000) 2026-04-14 07:47:11 -10:00
team-mode.test.ts feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
telemetry.test.ts feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641) 2026-03-29 21:43:36 -06:00
timeline.test.ts feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733) 2026-04-01 00:50:42 -06:00
touchfiles.test.ts feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647) 2026-03-31 23:08:22 -06:00
uninstall.test.ts feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561) 2026-03-27 00:44:37 -06:00
worktree.test.ts feat: content security — 4-layer prompt injection defense for pair-agent (#815) 2026-04-06 14:41:06 -07:00