Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA
Go to file
Garry Tan 14fc0866d9
v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990)
* docs(todos): P3 content-hash diagram render cache for make-pdf

Deferred from the diagram-engine eng review (Codex outside-voice D7):
repeat make-pdf runs re-render every fence; cache keyed on fence source +
bundle version once multi-diagram docs make it worth building.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(diagram-render): offline mermaid+excalidraw render bundle for browse

Single self-contained page (dist/diagram-render.html, 9.2MB, committed per
eng-review D2) exposing __renderMermaid / __mermaidToExcalidraw /
__excalidrawToSvg / __rasterize / __probeImage through browse load-html +
js --out. Render contract per D3: securityLevel strict, per-fence ids,
print-css font lock, htmlLabels off (canvas-taint-safe). Deterministic
build (same sha twice); drift test pins dist == BUILD_INFO == package.json
pins and rebuild-reproducibility when toolchain matches. Spike-proven
offline: flowchart + sequence SVG, editable .excalidraw scene, 300dpi PNG.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(diagram-render): __downscaleRaster for print-resolution image normalization

Data-URI rasters re-encode in their own format (JPEG stays JPEG at q0.9 —
PNG-encoding photos bloats them) at an explicit target pixel width. Used by
make-pdf's pre-pass for the 300dpi content-box ceiling (eng-review D4).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): diagram pre-pass — mermaid/excalidraw fences render as vector SVG; local images inline as data URIs

```mermaid / ```excalidraw fences extract to placeholder tokens, render in
one diagram-render bundle tab per run (reset contract: bundle page reloads
after any render error), and substitute back as accessible <figure> blocks
with the raw source preserved in a comment. Render failures produce a loud
red diagnostic block, never silent raw code. render=false keeps a fence as
code; title="..." becomes the aria-label and caption.

Local images now actually render: page.setContent loads at about:blank
(tab-session.ts:194), so relative paths silently 404'd before. The pre-pass
resolves them against the markdown's directory, inlines as data URIs, probes
intrinsic dimensions from the bytes (pure-TS PNG/JPEG/GIF/WebP/SVG sniffing),
and downscales rasters wider than 2x the content box at 300dpi. Remote URLs
warn (offline posture, --allow-network exempts); missing files get a visible
placeholder; --strict hard-fails both for CI pipelines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): diagram pre-pass unit suite + e2e render gates

34 unit tests (fence extraction incl. nested/tilde/unclosed/render=false,
info-string parsing, slot substitution, diagnostic/figure escaping + SVG
script strip, byte-level dimension probing across 5 formats, content-box
math, image inlining incl. strict/remote/missing/data-URI paths). E2E gate
proves through the compiled binary: both fences render as vector text
(id-collision check), raw mermaid ships only via render=false, broken fence
yields the diagnostic block, and the relative fixture image rasterizes to
colored pixels (CRITICAL regression for the about:blank image fix).
--strict exits non-zero on a missing image.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): width directives + conservative auto-landscape via CSS named pages

`![a](x.png){width=full|<pct>|<dim>}` and `{page=landscape|portrait}`
suffixes translate to data-gstack-* attrs in render() (before the sanitizer,
which keeps data- attributes; unrecognized brace groups stay visible text).
Default width rule needs no code: intrinsic CSS-px capped at the content box,
never upscaled — figure img max-width owns it.

Auto-landscape promotes a block to `@page wide { size: <pagesize> landscape }`
only when aspect >= 1.8 AND intrinsic width > 2.5x the content box (~1600px on
letter) AND diagram provenance (rendered fences) or a whole-word alt token
(diagram|architecture|flowchart|chart|graph) for plain images. {page=...}
forces or vetoes; fence info strings accept page=... too. preferCSSPageSize
is passed to Chromium only when a promotion exists, so every other document
prints exactly as before. False negatives are cheap; false positives feel
broken (eng-review P4, Codex challenge accepted).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): width-policy unit suite + landscape e2e gate with negative fixtures

24 unit tests weighted toward the false-positive guards: wide screenshot
without an alt hint stays portrait, sub-threshold and tall images stay
portrait, deterministic 1560/1561px boundary, whole-word alt matching
('photographic' must not match 'graph'), page=portrait veto beats every
heuristic, diagnostic blocks never promote. E2E gate asserts pdfinfo
per-page boxes through the compiled binary: exactly 3 of 5 fixture blocks
get landscape pages (alt-hinted image, directive-forced image, wide sequence
diagram) while the unhinted screenshot and the veto'd diagram stay portrait —
plus the --toc combo proving TOC and named-page landscape coexist.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): --to html|docx output formats

--to html writes the assembled self-contained document directly (no print
round-trip): inline vector diagrams, data-URI images, zero network
references, plus an @media screen layer for browser reading. --to docx is
the content-fidelity export (eng-review P8): html-to-docx@1.8.0 (exact pin;
pure JS, bun-compile-verified) maps headings/tables/code/lists; diagrams and
SVG images rasterize at 300dpi of the content-box width via the render tab;
diagnostic figures convert to plain p/pre so the converter can't silently
drop an error. --format keeps its page-size-alias meaning; --to is the
output format, and the CLI says so when confused.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): format gate — html no-network-refs + docx zip content checks

HTML: zero src/href network refs, no script/link tags, inline SVG diagrams,
data-URI images, screen layer, diagnostic survives. DOCX: valid OOXML zip
(document.xml + Content_Types), >=2 PNG media (diagram raster + fixture
image), headings + render=false source + diagnostic text in document.xml,
no leaked mermaid source from rendered fences. Plus --to validation UX.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(diagram): /diagram skill — English in, editable diagram triplet out

New skill: agent authors mermaid from the user's description and renders the
triplet through the offline diagram-render bundle in the browse daemon —
.mmd source (the single source of truth), editable .excalidraw (opens at
excalidraw.com, round-trips back through re-render), and SVG + PNG. Flowcharts
convert to fully editable scenes; other mermaid types render with an explicit
upstream-converter limitation note. Never ships an unrendered source file;
offline is the contract (no CDN fallback). Inventory rows in AGENTS.md +
docs/skills.md; generated SKILL.md + llms.txt via gen:skill-docs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(diagram): paid E2E pair — gate triplet contract + periodic authoring judge

diagram-triplet (gate, deterministic functional): a fresh claude -p agent
following the skill extract must emit a parseable triplet — graph LR/TD in
.mmd, excalidraw scene with >3 elements, SVG markup, PNG magic bytes.
Verified live: pass, $0.17, 58s. diagram-authoring-quality (periodic,
LLM-judged): faithfulness/labels/size rubric with a diagnostic-path cap,
floor 6/10. Verified live: pass at exactly 6 with substantive critique.
Touchfiles select both on diagram/** and lib/diagram-render/** changes;
tier split per E2E_TIERS rules (eng-review D5).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(diagram): register /diagram in the skill coverage matrix

Gate: triplet contract + structural floor; periodic: authoring-quality judge.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): typography scale-up, zero image truncation, landscape vertical centering

Dogfooding round on the repo README surfaced four output-quality bugs:

- Type was too small everywhere: body 11→12pt, h1 22→26pt, h2 15→18pt,
  cover title 32→56pt with poster spacing, cover meta 10→13pt, TOC 11→12pt
  with tighter leading, code 9.5→10.5pt, tables 10→11pt.
- Zero image truncation, ever: the max-width cap was figure-scoped, but
  markdown images render as <p><img> — a 1850px GitHub screenshot ran off
  the page edge. Global img { max-width: 100%; height: auto; } cap.
- hyphens: auto put real 'dif-\nferent' breaks into the PDF text layer the
  moment 12pt made lines wrap (combined-gate caught it). Clean copy-paste
  is the product contract; left-aligned rag doesn't need hyphenation →
  hyphens: manual.
- Promoted landscape blocks now vertically center. CSS flex/min-height
  centering fragments into phantom empty landscape pages in Chromium
  (bisected: min-height at ANY value; 3 promotions printed 5 pages), so
  image-policy computes an inline margin-top from each block's known
  aspect ratio against the landscape content box instead — fragmentation
  handles margins fine. .page-wide also drops its explicit break-before/
  after (the page-name change already breaks on both sides).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): pin zero-truncation invariant, typography floor, centering math

Global img cap pinned as a regex invariant (the figure-scoped-cap regression
class); typography floor (12pt body, 56pt cover, 12pt TOC); .page-wide must
NOT carry min-height/flex (the phantom-landscape-page regression class);
centering margin math verified both ways (2400×1000 image → 1.38in,
2050×600 viewBox diagram → 1.93in, page-filling directive block → no margin).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: diagram + multi-format documentation across README, make-pdf skill, and how-to guide

README gains /make-pdf (Publisher) and /diagram (Diagram Maker) rows in the
sprint table. make-pdf's skill doc — the agent-facing contract — gains Core
patterns for mermaid/excalidraw fences (title/render=false/page= options),
the image policy ({width=}/{page=} directives, zero-truncation, conservative
auto-landscape), --to html|docx, and --strict, plus the --to vs --format
disambiguation in Common flags. New docs/howto-diagrams-and-formats.md is
the user-facing walkthrough: fences, directives, formats, /diagram triplet,
the mermaid racetrack trick, troubleshooting.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf): fill ship-audit coverage gaps — downscale, reset contract, excalidraw fence, WebP

Ship coverage audit found 9 gaps (85%); this fills the 2 HIGH + 3 MEDIUM and
most LOW. diagram-gate fixture gains a 4200px incompressible photo (the only
live coverage of __downscaleRaster AND the 64KB chunked jsViaBuffer eval
transport — asserted via the downscale stderr warning), an ```excalidraw
scene fence rendered through exportToSvg (vector labels + caption in
pdftotext, no leaked scene JSON), and the broken fence MOVED BETWEEN the two
mermaid fences so the second diagram rendering proves the D6.2 reset
contract end-to-end. New coverage-gaps.test.ts (16 tests): mock-tab reset
contract (exactly one reload, post-failure fence renders), excalidraw
fail-fast diagnostic without a bundle call, rasterize error fallbacks
(figure/tag kept, never silent), WebP VP8/VP8L/VP8X byte parsers,
landscapeContentBox a4/asymmetric margins, bare-token slot fallback,
resolveBundlePath env override + error shape, screenCss media scoping.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(make-pdf): pre-landing review wave — fence fidelity, injection hardening, Windows paths, transport rework

Review army (6 specialists + red team) findings, all fixed:

- Indented fences replay byte-for-byte and indented diagram fences are NOT
  extracted (red-team conf-9: the pre-pass reconstructed fences at column 0,
  splitting any list containing fenced code — every ordinary document).
- String.replace $-pattern injection killed at every seam: substituteSlots,
  mergeStyle, img/src rewrites all use function replacements (a diagram label
  containing $' duplicated the document tail).
- Big-expression transport reworked: browse `eval <file>` (one spawn, any
  size, Windows-safe) replaces the 64KB chunked window-buffer eval — fixes
  the per-chunk spawn cost, the char-vs-byte argv units, AND the Windows
  32,767-char command-line ceiling in one move.
- Staged-bundle trust: content verified by hash even when the file exists,
  and the rename-failure path re-hashes the survivor (sticky-bit /tmp EPERM
  would otherwise ride a pre-planted file past the check).
- Windows drive-letter img srcs (C:/x.png) reach the local-path branch
  instead of being swallowed as unknown URL schemes.
- DOCX rasterize-failure now embeds the decoded source as visible text —
  returning the figure made diagrams vanish silently (converter drops svg).
- Fence source preserved as base64 data-gstack-source attribute (the comment
  encoding corrupted every '-->' arrow); decodeFigureSource() round-trips.
- inlineLocalImages memoizes per path; file:// uses fileURLToPath; preview
  prints a divergence note for fences/local images; --to docx strips the
  watermark div and warns about print-only flags; TOC links resolve in
  html/docx (heading ids assigned); waitForExpression sleeps instead of
  busy-spinning; escapeHtml/svg-dims deduped to single definitions;
  typography stragglers (blockquote 12pt, footnotes 10pt, 42em screen
  measure); bundle BUILD_INFO gains srcSha256 for no-node_modules drift
  detection; MAX_TARGET_PX shared guard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: make-pdf gate covers the diagram-render bundle; bundle pinned to LF

make-pdf-gate.yml paths gain lib/diagram-render/** and the drift test (a
bundle-only PR previously skipped every render gate AND no CI lane ran the
drift check at all). .gitattributes pins dist html/json to LF so Windows
autocrlf can't break the hash-pinned bundle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(make-pdf)+feat(diagram): review-wave test pins + skill transport hardening

Tests: indented-fence byte-for-byte replay + no-extraction-in-lists,
drive-letter local-path routing, $-pattern slot immunity, base64 source
round-trip ('A --> B' exact), existing-style merge preservation, DOCX
rasterize-failure surfaces source, srcSha256 + font-stack drift guards,
landscape veto asserted as some-portrait/no-landscape (layout-order-proof),
judge rubric cap lowered to 5 so it actually fails, vacuous error-shape test
removed honestly, tmpdir cleanup.

/diagram skill: base64 transport (template literals corrupted backticks/${
in sources), content-addressed staging with hash verification, and --tab-id
pinned on every browse call so a concurrent /qa session can't be clobbered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(make-pdf): out-of-tree image reads warn; --strict makes them fatal (D8.1)

Local CLI semantics stay (absolute paths and ../ still inline, like pandoc),
but never silently: an agent PDF-ing untrusted markdown can't quietly embed a
file from outside the input directory into a shareable document without a
visible warning, and --strict pipelines hard-fail. Two unit tests. Also:
TODOS.md gains the deferred e2e-harness dedup entry (D8.2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: pre-existing test failure in skill-e2e-bws operational-learning

Root cause was the fixture, not model behavior: gstack-learnings-log gained
an import of lib/jsonl-store.ts in the v1.57.5.0 injection-sanitization wave,
but the test copies only bin/ scripts into its sandbox — the inline bun
import failed and the script exited 1 before writing, on every run, on main
too (reproduced at a5833c41). Fixture now stages lib/jsonl-store.ts beside
bin/; verified deterministically (script exits 0, learning written) and via
the paid test (1 pass).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(make-pdf): adversarial-review wave — offline posture enforced, symlink-aware confinement, bounded reads

Codex adversarial + structured review findings:

- Remote images are now BLOCKED with a visible placeholder instead of
  warn-and-keep — leaving the tag meant Chromium fetched the URL at print
  time anyway, so the offline posture was a lie (tracking pixels and
  internal-URL probes ran without --allow-network).
- The out-of-tree read check compares REAL paths: a symlink inside the input
  dir pointing at ~/.ssh/... passed the string-prefix check, including under
  --strict. Ordered after the existence check (realpath of a missing file
  false-positives on macOS /var → /private/var).
- Image reads are bounded BEFORE reading: statSync first, non-regular files
  (fifo/device/dir) and >64MB files degrade to placeholders instead of
  hanging or exhausting memory; malformed percent-encoding (foo%zz.png)
  degrades to missing-image instead of crashing decodeURIComponent.
- browse shell-outs get a 120s timeout — a wedged daemon or hostile mermaid
  source fails the run instead of hanging it.
- TOC entries link to the heading's ACTUAL id (pre-id'd raw-HTML headings
  previously got dead #toc-N links); per-side margins compose into the CSS
  @page shorthand so a landscape promotion flipping preferCSSPageSize no
  longer silently reverts --margin-left/right to defaults (Codex P2).
- The image memo is a typed object — literal NUL-byte separators had made
  diagram-prepass.ts register as binary to text tooling.

Codex structured review GATE: PASS (no P1).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.0.0)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: sync make-pdf image-policy docs with final shipped behavior (v1.58.0.0)

The docs wave (87594420) predated the final review-wave commits, so two
docs drifted from shipped behavior:

- make-pdf/SKILL.md.tmpl + generated SKILL.md: remote images are BLOCKED
  with a visible placeholder (not warned-and-kept); out-of-tree reads
  (including via symlink) warn and --strict makes them fatal; --strict
  also covers oversized (>64MB) and non-regular files; troubleshooting
  entry now names the actual "[remote image blocked]" symptom.
- docs/howto-diagrams-and-formats.md: same corrections in the image
  section, CI section, and troubleshooting.
- README.md: docs/howto-diagrams-and-formats.md added to the Docs table
  (was unreachable from any entry-point doc).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: apply Codex doc-review findings for v1.58.0.0

Cross-model doc review (Codex, read-only) checked the v1.58.0.0 docs
against the shipped code. Fixes:

- howto + make-pdf SKILL: diagram source is preserved base64 in a
  data-gstack-source attribute, not an HTML comment (-- in mermaid
  arrows would corrupt a comment); fences must start at column 0;
  fence options example gains page=portrait; --to html "zero network
  refs" qualified (--allow-network deliberately keeps remote tags).
- /diagram description, README + docs/skills.md rows: the hand-drawn
  aesthetic belongs to the .excalidraw artifact; rendered SVG/PNG use
  mermaid's clean neutral theme (lib/diagram-render entry.ts pins
  theme: "neutral").
- CHANGELOG v1.58.0.0 wording: --strict coverage lists all five fatal
  classes (missing/remote/out-of-tree/oversized/non-regular); fences
  are vector SVG in pdf+html, 300dpi PNG in docx; hand-drawn claim
  scoped to the .excalidraw file.
- lib/diagram-render/README: Page API table gains __downscaleRaster.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 15:38:53 -07:00
.github v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
agents feat: user sovereignty — AI models recommend, users decide (v0.13.2.0) (#603) 2026-03-28 10:25:37 -06:00
autoplan v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
benchmark v1.57.4.0 refactor(ethos): rename Boil the Lake principle to Boil the Ocean (#1912) 2026-06-08 05:41:07 -07:00
benchmark-models v1.57.4.0 refactor(ethos): rename Boil the Lake principle to Boil the Ocean (#1912) 2026-06-08 05:41:07 -07:00
bin v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
browse v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
browser-skills/hackernews-frontpage v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391) 2026-05-09 08:06:47 -07:00
canary v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
careful v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
claude v1.13.0.0 feat: add Claude outside-voice skill (#1212) 2026-04-25 11:52:48 -07:00
codex v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916) 2026-06-08 21:17:18 -07:00
context-restore v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
context-save v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
contrib/add-host feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005) 2026-04-16 10:41:38 -07:00
cso v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
design v1.45.0.0 feat(design): persistent board daemon — 24h boards, one tab, board history (#1710) 2026-05-25 20:45:12 -07:00
design-consultation v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
design-html v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
design-review v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
design-shotgun v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
devex-review v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916) 2026-06-08 21:17:18 -07:00
diagram v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
docs v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
document-generate v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
document-release v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
extension v1.51.0.0 feat: $B memory diagnostic + 4 CDP-resource leak fixes (#1751) 2026-05-27 16:09:38 -07:00
freeze v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
gstack v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
gstack-upgrade v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
guard v1.57.6.0 fix wave: 8 community bugs (4 security guards failing open) (#1911) 2026-06-08 06:39:38 -07:00
health v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
hosts v1.57.2.0 feat: AskUserQuestion prose fallback when the tool fails at runtime (#1908) 2026-06-07 21:38:21 -07:00
investigate v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
ios-clean v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
ios-design-review v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
ios-fix v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
ios-qa v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
ios-sync v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
land-and-deploy v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
landing-report v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
learn v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
lib v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
make-pdf v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
model-overlays feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166) 2026-04-23 18:42:58 -07:00
office-hours v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
open-gstack-browser v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
openclaw community wave: 6 PRs + hardening (v0.18.1.0) (#1028) 2026-04-17 00:45:13 -07:00
pair-agent v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
plan-ceo-review v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
plan-design-review v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916) 2026-06-08 21:17:18 -07:00
plan-devex-review v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
plan-eng-review v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
plan-tune v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
qa v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
qa-only v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
retro v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
review v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
scrape v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
scripts v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
setup-browser-cookies v1.57.4.0 refactor(ethos): rename Boil the Lake principle to Boil the Ocean (#1912) 2026-06-08 05:41:07 -07:00
setup-deploy v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
setup-gbrain v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
ship v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966) 2026-06-10 21:14:58 -07:00
skillify v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
spec v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
supabase v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592) 2026-05-20 06:56:41 -07:00
sync-gbrain v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910) 2026-06-08 06:20:58 -07:00
test v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
unfreeze v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712) 2026-05-26 16:50:03 -07:00
.env.example feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
.gitattributes v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
.gitignore v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
.gitlab-ci.yml v1.11.0.0 feat(ship): workspace-aware version allocation (#1168) 2026-04-23 23:03:27 -07:00
AGENTS.md v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
ARCHITECTURE.md v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505) 2026-05-14 21:19:58 -07:00
BROWSER.md v1.57.8.0 feat: browse js/eval --out render-to-file (canonical Chromium for offline rendering) (#1929) 2026-06-09 21:02:30 -07:00
CHANGELOG.md v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
CLAUDE.md v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) 2026-06-09 22:29:23 -07:00
CONTRIBUTING.md v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) 2026-06-09 22:29:23 -07:00
DESIGN.md feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517) 2026-03-26 11:15:24 -06:00
ETHOS.md v1.57.4.0 refactor(ethos): rename Boil the Lake principle to Boil the Ocean (#1912) 2026-06-08 05:41:07 -07:00
LICENSE Initial release — gstack v0.0.1 2026-03-12 01:32:16 -07:00
README.md v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
SKILL.md v1.57.8.0 feat: browse js/eval --out render-to-file (canonical Chromium for offline rendering) (#1929) 2026-06-09 21:02:30 -07:00
SKILL.md.tmpl v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733) 2026-05-26 21:36:53 -07:00
TODOS.md v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
USING_GBRAIN_WITH_GSTACK.md v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808) 2026-05-30 14:57:07 -07:00
VERSION v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
bun.lock v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
bunfig.toml v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678) 2026-05-24 01:43:51 -07:00
conductor.json feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) 2026-03-13 21:08:12 -07:00
connect-chrome feat: GStack Browser — double-click AI browser with anti-bot stealth (#695) 2026-04-04 10:17:05 -07:00
package.json v1.58.0.0 feat: diagram + multi-format document engine (mermaid, excalidraw, single-file HTML, DOCX) (#1990) 2026-06-12 15:38:53 -07:00
setup v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) 2026-06-09 22:29:23 -07:00
slop-scan.config.json refactor: AI slop reduction with cross-model quality review (v0.16.3.0) (#941) 2026-04-10 17:13:15 -10:00
test-setup.ts v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678) 2026-05-24 01:43:51 -07:00

README.md

gstack

"I don't think I've typed like a line of code probably since December, basically, which is an extremely large change." — Andrej Karpathy, No Priors podcast, March 2026

When I heard Karpathy say this, I wanted to find out how. How does one person ship like a team of twenty? Peter Steinberger built OpenClaw — 247K GitHub stars — essentially solo with AI agents. The revolution is here. A single builder with the right tooling can move faster than a traditional team.

I'm Garry Tan, President & CEO of Y Combinator. I've worked with thousands of startups — Coinbase, Instacart, Rippling — when they were one or two people in a garage. Before YC, I was one of the first eng/PM/designers at Palantir, cofounded Posterous (sold to Twitter), and built Bookface, YC's internal social network.

gstack is my answer. I've been building products for twenty years, and right now I'm shipping more products than I ever have. In the last 60 days: 3 production services, 40+ shipped features, part-time, while running YC full-time. On logical code change — not raw LOC, which AI inflates — my 2026 run rate is ~810× my 2013 pace (11,417 vs 14 logical lines/day). Year-to-date (through April 18), 2026 has already produced 240× the entire 2013 year. Measured across 40 public + private garrytan/* repos including Bookface, after excluding one demo repo. AI wrote most of it. The point isn't who typed it, it's what shipped.

The LOC critics aren't wrong that raw line counts inflate with AI. They are wrong that normalized-for-inflation, I'm less productive. I'm more productive, by a lot. Full methodology, caveats, and reproduction script: On the LOC Controversy.

2026 — 1,237 contributions and counting:

GitHub contributions 2026 — 1,237 contributions, massive acceleration in Jan-Mar

2013 — when I built Bookface at YC (772 contributions):

GitHub contributions 2013 — 772 contributions building Bookface at YC

Same person. Different era. The difference is the tooling.

gstack is how I do it. It turns Claude Code into a virtual engineering team — a CEO who rethinks the product, an eng manager who locks architecture, a designer who catches AI slop, a reviewer who finds production bugs, a QA lead who opens a real browser, a security officer who runs OWASP + STRIDE audits, and a release engineer who ships the PR. Twenty-three specialists and eight power tools, all slash commands, all Markdown, all free, MIT license.

This is my open source software factory. I use it every day. I'm sharing it because these tools should be available to everyone.

Fork it. Improve it. Make it yours. And if you want to hate on free open source software — you're welcome to, but I'd rather you just try it first.

Who this is for:

  • Founders and CEOs — especially technical ones who still want to ship
  • First-time Claude Code users — structured roles instead of a blank prompt
  • Tech leads and staff engineers — rigorous review, QA, and release automation on every PR

Quick start

  1. Install gstack (30 seconds — see below)
  2. Run /office-hours — describe what you're building
  3. Run /plan-ceo-review on any feature idea
  4. Run /review on any branch with changes
  5. Run /qa on your staging URL
  6. Stop there. You'll know if this is for you.

Install — 30 seconds

Requirements: Claude Code, Git, Bun v1.0+, Node.js (Windows only)

Step 1: Install on your machine

Open Claude Code and paste this. Claude does the rest.

Install gstack: run git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp__claude-in-chrome__* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /connect-chrome, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /setup-gbrain, /retro, /investigate, /document-release, /document-generate, /codex, /cso, /autoplan, /plan-devex-review, /devex-review, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, /learn. Then ask the user if they also want to add gstack to the current project so teammates get it.

From inside your repo, paste this. Switches you to team mode, bootstraps the repo so teammates get gstack automatically, and commits the change:

(cd ~/.claude/skills/gstack && ./setup --team) && ~/.claude/skills/gstack/bin/gstack-team-init required && git add .claude/ CLAUDE.md && git commit -m "require gstack for AI-assisted work"

No vendored files in your repo, no version drift, no manual upgrades. Every Claude Code session starts with a fast auto-update check (throttled to once/hour, network-failure-safe, completely silent).

Swap required for optional if you'd rather nudge teammates than block them.

OpenClaw

OpenClaw spawns Claude Code sessions via ACP, so every gstack skill just works when Claude Code has gstack installed. Paste this to your OpenClaw agent:

Install gstack: run git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup to install gstack for Claude Code. Then add a "Coding Tasks" section to AGENTS.md that says: when spawning Claude Code sessions for coding work, tell the session to use gstack skills. Include these examples — security audit: "Load gstack. Run /cso", code review: "Load gstack. Run /review", QA test a URL: "Load gstack. Run /qa https://...", build a feature end-to-end: "Load gstack. Run /autoplan, implement the plan, then run /ship", plan before building: "Load gstack. Run /office-hours then /autoplan. Save the plan, don't implement."

After setup, just talk to your OpenClaw agent naturally:

You say What happens
"Fix the typo in README" Simple — Claude Code session, no gstack needed
"Run a security audit on this repo" Spawns Claude Code with Run /cso
"Build me a notifications feature" Spawns Claude Code with /autoplan → implement → /ship
"Help me plan the v2 API redesign" Spawns Claude Code with /office-hours → /autoplan, saves plan

See docs/OPENCLAW.md for advanced dispatch routing and the gstack-lite/gstack-full prompt templates.

Native OpenClaw Skills (via ClawHub)

Four methodology skills that work directly in your OpenClaw agent, no Claude Code session needed. Install from ClawHub:

clawhub install gstack-openclaw-office-hours gstack-openclaw-ceo-review gstack-openclaw-investigate gstack-openclaw-retro
Skill What it does
gstack-openclaw-office-hours Product interrogation with 6 forcing questions
gstack-openclaw-ceo-review Strategic challenge with 4 scope modes
gstack-openclaw-investigate Root cause debugging methodology
gstack-openclaw-retro Weekly engineering retrospective

These are conversational skills. Your OpenClaw agent runs them directly via chat.

Other AI Agents

gstack works on 10 AI coding agents, not just Claude. Setup auto-detects which agents you have installed:

git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup

Or target a specific agent with ./setup --host <name>:

Agent Flag Skills install to
OpenAI Codex CLI --host codex ~/.codex/skills/gstack-*/
OpenCode --host opencode ~/.config/opencode/skills/gstack-*/
Cursor --host cursor ~/.cursor/skills/gstack-*/
Factory Droid --host factory ~/.factory/skills/gstack-*/
Slate --host slate ~/.slate/skills/gstack-*/
Kiro --host kiro ~/.kiro/skills/gstack-*/
Hermes --host hermes ~/.hermes/skills/gstack-*/
GBrain (mod) --host gbrain ~/.gbrain/skills/gstack-*/

Want to add support for another agent? See docs/ADDING_A_HOST.md. It's one TypeScript config file, zero code changes.

See it work

You:    I want to build a daily briefing app for my calendar.
You:    /office-hours
Claude: [asks about the pain — specific examples, not hypotheticals]

You:    Multiple Google calendars, events with stale info, wrong locations.
        Prep takes forever and the results aren't good enough...

Claude: I'm going to push back on the framing. You said "daily briefing
        app." But what you actually described is a personal chief of
        staff AI.
        [extracts 5 capabilities you didn't realize you were describing]
        [challenges 4 premises — you agree, disagree, or adjust]
        [generates 3 implementation approaches with effort estimates]
        RECOMMENDATION: Ship the narrowest wedge tomorrow, learn from
        real usage. The full vision is a 3-month project — start with
        the daily briefing that actually works.
        [writes design doc → feeds into downstream skills automatically]

You:    /plan-ceo-review
        [reads the design doc, challenges scope, runs 10-section review]

You:    /plan-eng-review
        [ASCII diagrams for data flow, state machines, error paths]
        [test matrix, failure modes, security concerns]

You:    Approve plan. Exit plan mode.
        [writes 2,400 lines across 11 files. ~8 minutes.]

You:    /review
        [AUTO-FIXED] 2 issues. [ASK] Race condition → you approve fix.

You:    /qa https://staging.myapp.com
        [opens real browser, clicks through flows, finds and fixes a bug]

You:    /ship
        Tests: 42 → 51 (+9 new). PR: github.com/you/app/pull/42

You said "daily briefing app." The agent said "you're building a chief of staff AI" — because it listened to your pain, not your feature request. Eight commands, end to end. That is not a copilot. That is a team.

The sprint

gstack is a process, not a collection of tools. The skills run in the order a sprint runs:

Think → Plan → Build → Review → Test → Ship → Reflect

Each skill feeds into the next. /office-hours writes a design doc that /plan-ceo-review reads. /plan-eng-review writes a test plan that /qa picks up. /review catches bugs that /ship verifies are fixed. Nothing falls through the cracks because every step knows what came before it.

Skill Your specialist What they do
/office-hours YC Office Hours Start here. Six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives. Design doc feeds into every downstream skill.
/plan-ceo-review CEO / Founder Rethink the problem. Find the 10-star product hiding inside the request. Four modes: Expansion, Selective Expansion, Hold Scope, Reduction.
/plan-eng-review Eng Manager Lock in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open.
/plan-design-review Senior Designer Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. AI Slop detection. Interactive — one AskUserQuestion per design choice.
/plan-devex-review Developer Experience Lead Interactive DX review: explores developer personas, benchmarks against competitors' TTHW, designs your magical moment, traces friction points step by step. Three modes: DX EXPANSION, DX POLISH, DX TRIAGE. 20-45 forcing questions.
/design-consultation Design Partner Build a complete design system from scratch. Researches the landscape, proposes creative risks, generates realistic product mockups.
/review Staff Engineer Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps.
/investigate Debugger Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes.
/design-review Designer Who Codes Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots.
/devex-review DX Tester Live developer experience audit. Actually tests your onboarding: navigates docs, tries the getting started flow, times TTHW, screenshots errors. Compares against /plan-devex-review scores — the boomerang that shows if your plan matched reality.
/design-shotgun Design Explorer "Show me options." Generates 4-6 AI mockup variants, opens a comparison board in your browser, collects your feedback, and iterates. Taste memory learns what you like. Repeat until you love something, then hand it to /design-html.
/design-html Design Engineer Turn a mockup into production HTML that actually works. Pretext computed layout: text reflows, heights adjust, layouts are dynamic. 30KB, zero deps. Detects React/Svelte/Vue. Smart API routing per design type (landing page vs dashboard vs form). The output is shippable, not a demo.
/qa QA Lead Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix.
/qa-only QA Reporter Same methodology as /qa but report only. Pure bug report without code changes.
/pair-agent Multi-Agent Coordinator Share your browser with any AI agent. One command, one paste, connected. Works with OpenClaw, Hermes, Codex, Cursor, or anything that can curl. Each agent gets its own tab. Auto-launches headed mode so you watch everything. Auto-starts ngrok tunnel for remote agents. Scoped tokens, tab isolation, rate limiting, activity attribution.
/cso Chief Security Officer OWASP Top 10 + STRIDE threat model. Zero-noise: 17 false positive exclusions, 8/10+ confidence gate, independent finding verification. Each finding includes a concrete exploit scenario.
/ship Release Engineer Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one.
/land-and-deploy Release Engineer Merge the PR, wait for CI and deploy, verify production health. One command from "approved" to "verified in production."
/canary SRE Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures.
/benchmark Performance Engineer Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR.
/document-release Technical Writer Update all project docs to match what you just shipped. Catches stale READMEs automatically. Builds a Diataxis coverage map (reference / how-to / tutorial / explanation) so gaps are visible in the PR body.
/document-generate Documentation Author Generate missing docs from scratch using the Diataxis framework. Researches the codebase first, then writes reference / how-to / tutorial / explanation docs that actually match the code. Invokable standalone or chained from /document-release when the coverage map finds gaps. Learn more: tutorialhow-towhy Diataxis.
/retro Eng Manager Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. /retro global runs across all your projects and AI tools (Claude Code, Codex, Gemini).
/browse QA Engineer Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. /open-gstack-browser launches GStack Browser with sidebar, anti-bot stealth, and auto model routing.
/setup-browser-cookies Session Manager Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages.
/autoplan Review Pipeline One command, fully reviewed plan. Runs CEO → design → eng review automatically with encoded decision principles. Surfaces only taste decisions for your approval.
/spec Spec Author Turn vague intent into a precise, executable spec in five phases (why, scope, technical with mandatory code-reading, draft, file). Codex quality gate before file (blocks below 7/10), fail-closed secret redaction, dedupe against existing issues, archive to $GSTACK_STATE_ROOT/projects/$SLUG/specs/ for team-corpus recall. --execute spawns claude -p in a fresh worktree; /ship auto-closes the source issue on merge. Plan-mode aware.
/learn Memory Manage what gstack learned across sessions. Review, search, prune, and export project-specific patterns, pitfalls, and preferences. Learnings compound across sessions so gstack gets smarter on your codebase over time.
/make-pdf Publisher Markdown in, publication-quality document out. Mermaid and excalidraw fences render as vector diagrams, fully offline. Images scale to the page and never truncate; wide diagrams get their own landscape page. --to html emits one self-contained file, --to docx a Word doc.
/diagram Diagram Maker English in, editable diagram out. Emits a triplet: mermaid source, .excalidraw you can open and edit on excalidraw.com (hand-drawn style), and rendered SVG/PNG. Zero network. Embed the source in markdown and /make-pdf renders it.

Which review should I use?

Building for... Plan stage (before code) Live audit (after shipping)
End users (UI, web app, mobile) /plan-design-review /design-review
Developers (API, CLI, SDK, docs) /plan-devex-review /devex-review
Architecture (data flow, perf, tests) /plan-eng-review /review
All of the above /autoplan (runs CEO → design → eng → DX, auto-detects which apply)

Power tools

Skill What it does
/codex Second Opinion — independent code review from OpenAI Codex CLI. Three modes: review (pass/fail gate), adversarial challenge, and open consultation. Cross-model analysis when both /review and /codex have run.
/careful Safety Guardrails — warns before destructive commands (rm -rf, DROP TABLE, force-push). Say "be careful" to activate. Override any warning.
/freeze Edit Lock — restrict file edits to one directory. Prevents accidental changes outside scope while debugging.
/guard Full Safety/careful + /freeze in one command. Maximum safety for prod work.
/unfreeze Unlock — remove the /freeze boundary.
/open-gstack-browser GStack Browser — launch GStack Browser with sidebar, anti-bot stealth, auto model routing (Sonnet for actions, Opus for analysis), one-click cookie import, and Claude Code integration. Clean up pages, take smart screenshots, edit CSS, and pass info back to your terminal.
/setup-deploy Deploy Configurator — one-time setup for /land-and-deploy. Detects your platform, production URL, and deploy commands.
/setup-gbrain GBrain Onboarding — from zero to running gbrain in under 5 minutes. PGLite local, Supabase existing URL, or auto-provision a new Supabase project via Management API. MCP registration for Claude Code + per-repo trust triad (read-write/read-only/deny). Full guide.
/sync-gbrain Keep Brain Current — re-index this repo's code into gbrain via gbrain sources add + gbrain sync --strategy code, refresh the ## GBrain Search Guidance block in CLAUDE.md, and auto-remove guidance when the capability check fails. --incremental (default), --full, --dry-run. Idempotent; safe to re-run.
/gstack-upgrade Self-Updater — upgrade gstack to latest. Detects global vs vendored install, syncs both, shows what changed.
/ios-qa iOS Live-Device QA (v1.43.0.0+) — drive a real iPhone over USB CoreDevice via an embedded StateServer in the app. Read Swift source, codegen typed @Observable accessors, run the agent loop. Optional --tailnet flag exposes the device to OpenClaw or any HTTP-capable agent on your Tailscale tailnet so remote agents can run iOS QA without ever touching the hardware. Capability-tier allowlist (observe/interact/mutate/restore), per-device session lock, audit log.
/ios-fix, /ios-design-review, /ios-clean, /ios-sync iOS bug-fix loop, designer's-eye HIG audit, debug-bridge cleanup, and accessor resync. See docs/skills.md. End-to-end walkthrough: docs/howto-ios-testing-with-gstack.md.

New binaries (v0.19)

Beyond the slash-command skills, gstack ships standalone CLIs for workflows that don't belong inside a session:

Command What it does
gstack-model-benchmark Cross-model benchmark — run the same prompt through Claude, GPT (via Codex CLI), and Gemini; compare latency, tokens, cost, and (optionally) LLM-judge quality score. Auth detected per provider, unavailable providers skip cleanly. Output as table, JSON, or markdown. --dry-run validates flags + auth without spending API calls.
gstack-taste-update Design taste learning — writes approvals and rejections from /design-shotgun into a persistent per-project taste profile. Decays 5%/week. Feeds back into future variant generation so the system learns what you actually pick.
gstack-ios-qa-daemon iOS QA daemon — Mac-side broker between an agent and a connected iPhone over USB CoreDevice. Loopback by default; --tailnet opens a Tailscale-facing listener with identity-gated capability tiers. Single-instance via flock on ~/.gstack/ios-qa-daemon.pid. See docs/howto-ios-testing-with-gstack.md.
gstack-ios-qa-mint iOS allowlist manager — owner-grant CLI for the tailnet allowlist. grant/revoke/list against ~/.gstack/ios-qa-allowlist.json (mode 0600). Remote agents never auto-allowlist; this is the explicit-intent path.

Continuous checkpoint mode (opt-in, local by default)

Set gstack-config set checkpoint_mode continuous and skills auto-commit your work as you go with a WIP: prefix plus a structured [gstack-context] body (decisions, remaining work, failed approaches). Survives crashes and context switches. /context-restore reads those commits to reconstruct session state. /ship filter-squashes WIP commits before the PR (preserving non-WIP commits) so bisect stays clean. Push is opt-in via checkpoint_push=true — default is local-only so you don't trigger CI on every WIP commit.

Domain skills + raw CDP escape hatch

Two new browser primitives compound the gstack agent over time:

  • $B domain-skill save — agent saves a per-site note (e.g., "LinkedIn's Apply button lives in an iframe") that fires automatically next time it visits that hostname. Quarantined → active after 3 successful uses → optional cross-project promotion via $B domain-skill promote-to-global. Storage lives alongside /learn's per-project learnings file. Full reference: docs/domain-skills.md.
  • $B cdp <Domain.method> — raw Chrome DevTools Protocol escape hatch for the rare case curated commands miss. Deny-default: methods must be explicitly added to browse/src/cdp-allowlist.ts with a one-line justification. Two-tier mutex serializes browser-scoped CDP calls against per-tab work. Output for data-exfil methods is wrapped in the UNTRUSTED envelope.

Want raw CDP with no rails, no allowlist, no daemon — just thin transport from agent to Chrome? browser-use/browser-harness-js is a different philosophy (agent-authored helpers vs gstack's curated commands) and a good fit if you don't want gstack's security stack. The two can coexist: gstack's $B cdp and harness can both attach to the same Chrome via Playwright's newCDPSession.

Deep dives with examples and philosophy for every skill →

Karpathy's four failure modes? Already covered.

Andrej Karpathy's AI coding rules (17K stars) nail four failure modes: wrong assumptions, overcomplexity, orthogonal edits, imperative over declarative. gstack's workflow skills enforce all four. /office-hours forces assumptions into the open before code is written. The Confusion Protocol stops Claude from guessing on architectural decisions. /review catches unnecessary complexity and drive-by edits. /ship transforms tasks into verifiable goals with test-first execution. If you already use Karpathy-style CLAUDE.md rules, gstack is the workflow enforcement layer that makes them stick across entire sprints, not just single prompts.

Parallel sprints

gstack works well with one sprint. It gets interesting with ten running at once.

Design is at the heart. /design-consultation builds your design system from scratch, researches what's out there, proposes creative risks, and writes DESIGN.md. But the real magic is the shotgun-to-HTML pipeline.

/design-shotgun is how you explore. You describe what you want. It generates 4-6 AI mockup variants using GPT Image. Then it opens a comparison board in your browser with all variants side by side. You pick favorites, leave feedback ("more whitespace", "bolder headline", "lose the gradient"), and it generates a new round. Repeat until you love something. Taste memory kicks in after a few rounds so it starts biasing toward what you actually like. No more describing your vision in words and hoping the AI gets it. You see options, pick the good ones, and iterate visually.

/design-html makes it real. Take that approved mockup (from /design-shotgun, a CEO plan, a design review, or just a description) and turn it into production-quality HTML/CSS. Not the kind of AI HTML that looks fine at one viewport width and breaks everywhere else. This uses Pretext for computed text layout: text actually reflows on resize, heights adjust to content, layouts are dynamic. 30KB overhead, zero dependencies. It detects your framework (React, Svelte, Vue) and outputs the right format. Smart API routing picks different Pretext patterns depending on whether it's a landing page, dashboard, form, or card layout. The output is something you'd actually ship, not a demo.

/qa was a massive unlock. It let me go from 6 to 12 parallel workers. Claude Code saying "I SEE THE ISSUE" and then actually fixing it, generating a regression test, and verifying the fix — that changed how I work. The agent has eyes now.

Smart review routing. Just like at a well-run startup: CEO doesn't have to look at infra bug fixes, design review isn't needed for backend changes. gstack tracks what reviews are run, figures out what's appropriate, and just does the smart thing. The Review Readiness Dashboard tells you where you stand before you ship.

Test everything. /ship bootstraps test frameworks from scratch if your project doesn't have one. Every /ship run produces a coverage audit. Every /qa bug fix generates a regression test. 100% test coverage is the goal — tests make vibe coding safe instead of yolo coding.

/document-release is the engineer you never had. It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now /ship auto-invokes it — docs stay current without an extra command.

Real browser mode. /open-gstack-browser launches GStack Browser, an AI-controlled Chromium with anti-bot stealth, custom branding, and the sidebar extension baked in. Sites like Google and NYTimes work without captchas. The menu bar says "GStack Browser" instead of "Chrome for Testing." Your regular Chrome stays untouched. All existing browse commands work unchanged. $B disconnect returns to headless. The browser stays alive as long as the window is open... no idle timeout killing it while you're working.

Sidebar agent — your AI browser assistant. Type natural language in the Chrome side panel and a child Claude instance executes it. "Navigate to the settings page and screenshot it." "Fill out this form with test data." "Go through every item in this list and extract the prices." The sidebar auto-routes to the right model: Sonnet for fast actions (click, navigate, screenshot) and Opus for reading and analysis. Each task gets up to 5 minutes. The sidebar agent runs in an isolated session, so it won't interfere with your main Claude Code window. One-click cookie import right from the sidebar footer.

Personal automation. The sidebar agent isn't just for dev workflows. Example: "Browse my kid's school parent portal and add all the other parents' names, phone numbers, and photos to my Google Contacts." Two ways to get authenticated: (1) log in once in the headed browser, your session persists, or (2) click the "cookies" button in the sidebar footer to import cookies from your real Chrome. Once authenticated, Claude navigates the directory, extracts the data, and creates the contacts.

Prompt injection defense. Hostile web pages try to hijack your sidebar agent. gstack ships a layered defense: a 22MB ML classifier bundled with the browser scans every page and tool output locally, a Claude Haiku transcript check votes on the full conversation shape, a random canary token in the system prompt catches session exfil attempts across text, tool args, URLs, and file writes, and a verdict combiner requires two classifiers to agree before blocking (prevents single-model false positives on Stack Overflow-style instruction pages). A shield icon in the sidebar header shows status (green/amber/red). Opt in to a 721MB DeBERTa-v3 ensemble via GSTACK_SECURITY_ENSEMBLE=deberta for 2-of-3 agreement. Emergency kill switch: GSTACK_SECURITY_OFF=1. See ARCHITECTURE.md for the full stack.

Browser handoff when the AI gets stuck. Hit a CAPTCHA, auth wall, or MFA prompt? $B handoff opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, $B resume picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures.

/pair-agent is cross-agent coordination. You're in Claude Code. You also have OpenClaw running. Or Hermes. Or Codex. You want them both looking at the same website. Type /pair-agent, pick your agent, and a GStack Browser window opens so you can watch. The skill prints a block of instructions. Paste that block into the other agent's chat. It exchanges a one-time setup key for a session token, creates its own tab, and starts browsing. You see both agents working in the same browser, each in their own tab, neither able to interfere with the other. If ngrok is installed, the tunnel starts automatically so the other agent can be on a completely different machine. Same-machine agents get a zero-friction shortcut that writes credentials directly. This is the first time AI agents from different vendors can coordinate through a shared browser with real security: scoped tokens, tab isolation, rate limiting, domain restrictions, and activity attribution.

Multi-AI second opinion. /codex gets an independent review from OpenAI's Codex CLI — a completely different AI looking at the same diff. Three modes: code review with a pass/fail gate, adversarial challenge that actively tries to break your code, and open consultation with session continuity. When both /review (Claude) and /codex (OpenAI) have reviewed the same branch, you get a cross-model analysis showing which findings overlap and which are unique to each.

Safety guardrails on demand. Say "be careful" and /careful warns before any destructive command — rm -rf, DROP TABLE, force-push, git reset --hard. /freeze locks edits to one directory while debugging so Claude can't accidentally "fix" unrelated code. /guard activates both. /investigate auto-freezes to the module being investigated.

Proactive skill suggestions. gstack notices what stage you're in — brainstorming, reviewing, debugging, testing — and suggests the right skill. Don't like it? Say "stop suggesting" and it remembers across sessions.

10-15 parallel sprints

gstack is powerful with one sprint. It is transformative with ten running at once.

Conductor runs multiple Claude Code sessions in parallel — each in its own isolated workspace. One session running /office-hours on a new idea, another doing /review on a PR, a third implementing a feature, a fourth running /qa on staging, and six more on other branches. All at the same time. I regularly run 10-15 parallel sprints — that's the practical max right now.

The sprint structure is what makes parallelism work. Without a process, ten agents is ten sources of chaos. With a process — think, plan, build, review, test, ship — each agent knows exactly what to do and when to stop. You manage them the way a CEO manages a team: check in on the decisions that matter, let the rest run.

Voice input (AquaVoice, Whisper, etc.)

gstack skills have voice-friendly trigger phrases. Say what you want naturally — "run a security check", "test the website", "do an engineering review" — and the right skill activates. You don't need to remember slash command names or acronyms.

Uninstall

Option 1: Run the uninstall script

If gstack is installed on your machine:

~/.claude/skills/gstack/bin/gstack-uninstall

This handles skills, symlinks, global state (~/.gstack/), project-local state, browse daemons, and temp files. Use --keep-state to preserve config and analytics. Use --force to skip confirmation.

Option 2: Manual removal (no local repo)

If you don't have the repo cloned (e.g. you installed via a Claude Code paste and later deleted the clone):

# 1. Stop browse daemons
pkill -f "gstack.*browse" 2>/dev/null || true

# 2. Remove per-skill directories whose SKILL.md points into gstack/
find ~/.claude/skills -mindepth 1 -maxdepth 1 -type d ! -name gstack 2>/dev/null |
while IFS= read -r dir; do
  link="$dir/SKILL.md"
  [ -L "$link" ] || continue
  target=$(readlink "$link" 2>/dev/null) || continue
  case "$target" in
    gstack/*|*/gstack/*)
      rm -f "$link"
      rmdir "$dir" 2>/dev/null || true
      ;;
  esac
done

# 3. Remove gstack
rm -rf ~/.claude/skills/gstack

# 4. Remove global state
rm -rf ~/.gstack

# 5. Remove integrations (skip any you never installed)
rm -rf ~/.codex/skills/gstack* 2>/dev/null
rm -rf ~/.factory/skills/gstack* 2>/dev/null
rm -rf ~/.kiro/skills/gstack* 2>/dev/null
rm -rf ~/.openclaw/skills/gstack* 2>/dev/null

# 6. Remove temp files
rm -f /tmp/gstack-* 2>/dev/null

# 7. Per-project cleanup (run from each project root)
rm -rf .gstack .gstack-worktrees .claude/skills/gstack 2>/dev/null
rm -rf .agents/skills/gstack* .factory/skills/gstack* 2>/dev/null

Clean up CLAUDE.md

The uninstall script does not edit CLAUDE.md. In each project where gstack was added, remove the ## gstack and ## Skill routing sections.

Playwright

~/Library/Caches/ms-playwright/ (macOS) is left in place because other tools may share it. Remove it if nothing else needs it.


Free, MIT licensed, open source. No premium tier, no waitlist.

I open sourced how I build software. You can fork it and make it your own.

We're hiring. Want to ship real products at AI-coding speed and help harden gstack? Come work at YC — ycombinator.com/software Extremely competitive salary and equity. San Francisco, Dogpatch District.

GBrain — persistent knowledge for your coding agent

GBrain is a persistent knowledge base for AI agents — think of it as the memory your agent actually keeps between sessions. GStack gives you a one-command path from zero to "it's running, my agent can call it."

/setup-gbrain

Four paths, pick one:

  • Supabase, existing URL — your cloud agent already provisioned a brain; paste the Session Pooler URL, now this laptop uses the same data.
  • Supabase, auto-provision — paste a Supabase Personal Access Token; the skill creates a new project, polls to healthy, fetches the pooler URL, hands it to gbrain init. ~90 seconds end-to-end.
  • PGLite local — zero accounts, zero network, ~30 seconds. Isolated brain on this Mac only. Great for try-first; migrate to Supabase later with /setup-gbrain --switch.
  • Remote gbrain MCP — your brain runs on another machine (Tailscale, ngrok, internal LAN) or a teammate's server; paste an MCP URL and bearer token. Optionally pair with a local PGLite for symbol-aware code search in split-engine mode. Best for cross-machine memory without standing up a local DB.

After init, the skill offers to register gbrain as an MCP server for Claude Code (claude mcp add gbrain -- gbrain serve) so gbrain search, gbrain put, etc. show up as first-class typed tools — not bash shell-outs.

Keeping the brain current. Run /sync-gbrain from any repo to re-index its code into gbrain (incremental by default, --full for a full reindex, --dry-run to preview). The skill registers the cwd as a federated source via gbrain sources add, runs gbrain sync --strategy code, and writes a ## GBrain Search Guidance block to your project's CLAUDE.md so the agent prefers gbrain search/code-def/code-refs over Grep. The block is removed automatically if the capability check fails — no stale guidance pointing at tools that aren't installed.

Per-remote trust policy. Each repo on your machine gets one of three tiers:

  • read-write — agent can search the brain AND write new pages back from this repo
  • read-only — agent can search but never writes (best for multi-client consultants: search the shared brain, don't contaminate it with Client A's work while in Client B's repo)
  • deny — no gbrain interaction at all

The skill asks once per repo. The decision is sticky across worktrees and branches of the same remote.

GStack memory sync (different feature, same private-repo infra). Optionally pushes your gstack state (learnings, CEO plans, design docs, retros, developer profile) to a private git repo so your memory follows you across machines, with a one-time privacy prompt (everything allowlisted / artifacts only / off) and a defense-in-depth secret scanner that blocks AWS keys, tokens, PEM blocks, and JWTs before they leave your machine.

gstack-brain-init

Running gstack in Conductor? Conductor explicitly strips ANTHROPIC_API_KEY and OPENAI_API_KEY from every workspace's process env, so paid evals and gbrain embeddings won't work out of the box. Set GSTACK_ANTHROPIC_API_KEY and GSTACK_OPENAI_API_KEY in Conductor's workspace env config instead — gstack's TS entry points promote them to canonical names at runtime. Full details and the contributor checklist for adding the import to new entry points: Conductor + GSTACK_* env vars.

Full monty — every scenario, every flag, every bin helper, every troubleshooting step: USING_GBRAIN_WITH_GSTACK.md

Other references: docs/gbrain-sync.md (sync-specific guide) • docs/gbrain-sync-errors.md (error index)

Docs

Doc What it covers
Skill Deep Dives Philosophy, examples, and workflow for every skill (includes Greptile integration)
Diagrams & Document Formats Mermaid/excalidraw fences in PDFs, image sizing and safety defaults, --to html|docx, /diagram triplets
Builder Ethos Builder philosophy: Boil the Ocean, Search Before Building, three layers of knowledge
Using GBrain with GStack Every path, flag, bin helper, and troubleshooting step for /setup-gbrain
GBrain Sync Cross-machine memory setup, privacy modes, troubleshooting
Architecture Design decisions and system internals
Browser Reference Full command reference for /browse
Contributing Dev setup, testing, contributor mode, and dev mode
Changelog What's new in every version

Privacy & Telemetry

gstack includes opt-in usage telemetry to help improve the project. Here's exactly what happens:

  • Default is off. Nothing is sent anywhere unless you explicitly say yes.
  • On first run, gstack asks if you want to share anonymous usage data. You can say no.
  • What's sent (if you opt in): skill name, duration, success/fail, gstack version, OS. That's it.
  • What's never sent: code, file paths, repo names, branch names, prompts, or any user-generated content.
  • Change anytime: gstack-config set telemetry off disables everything instantly.

Data is stored in Supabase (open source Firebase alternative). The schema is in supabase/migrations/ — you can verify exactly what's collected. The Supabase publishable key in the repo is a public key (like a Firebase API key) — row-level security policies deny all direct access. Telemetry flows through validated edge functions that enforce schema checks, event type allowlists, and field length limits.

Local analytics are always available. Run gstack-analytics to see your personal usage dashboard from the local JSONL file — no remote data needed.

Troubleshooting

Skill not showing up? cd ~/.claude/skills/gstack && ./setup

/browse fails? cd ~/.claude/skills/gstack && bun install && bun run build

Stale install? Run /gstack-upgrade — or set auto_upgrade: true in ~/.gstack/config.yaml

Want shorter commands? cd ~/.claude/skills/gstack && ./setup --no-prefix — switches from /gstack-qa to /qa. Your choice is remembered for future upgrades.

Want namespaced commands? cd ~/.claude/skills/gstack && ./setup --prefix — switches from /qa to /gstack-qa. Useful if you run other skill packs alongside gstack.

Codex says "Skipped loading skill(s) due to invalid SKILL.md"? Your Codex skill descriptions are stale. Fix: cd ~/.codex/skills/gstack && git pull && ./setup --host codex — or for repo-local installs: cd "$(readlink -f .agents/skills/gstack)" && git pull && ./setup --host codex

Windows users: gstack works on Windows 11 via Git Bash or WSL. Node.js is required in addition to Bun — Bun has a known bug with Playwright's pipe transport on Windows (bun#4253). The browse server automatically falls back to Node.js. Make sure both bun and node are on your PATH.

On Windows without Developer Mode (MSYS2 / Git Bash), setup falls back to file copies instead of symlinks because ln -snf produces frozen copies that don't refresh on git pull. Re-run cd ~/.claude/skills/gstack && ./setup after every git pull so your skill files match the repo. setup prints a one-line note reminding you. Unix and WSL keep symlinks and don't need the re-run.

Claude says it can't see the skills? Make sure your project's CLAUDE.md has a gstack section. Add this:

## gstack
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
/design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy,
/canary, /benchmark, /browse, /open-gstack-browser, /qa, /qa-only, /design-review,
/setup-browser-cookies, /setup-deploy, /setup-gbrain, /sync-gbrain, /retro, /investigate,
/document-release, /document-generate, /codex, /cso, /autoplan, /pair-agent, /careful, /freeze,
/guard, /unfreeze, /gstack-upgrade, /learn.

License

MIT. Free forever. Go build something.