refactor(plan-devex-review): carve review body into on-demand section

Sixth Phase B carve. Moves the 8 DX passes, required outputs, and review report
— everything after the Step 0 DX investigation — into sections/review-sections.md
behind a STOP-Read. All of Step 0 (persona, empathy, benchmark, journey trace,
roleplay) + the rating method + EXIT_PLAN_MODE_GATE stay always-loaded.

Measured: skeleton 110,621 -> 69,658 B (-37%). Union preserved. Atomic with
tests + all-host regen: added to SECTIONS_EXTRACTED, gen-skill-docs reads the
union. Layer 0 green. (No parity invariant entry for plan-devex-review.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan 2026-06-01 23:07:42 -07:00
parent d4938b4288
commit 4cbcc4f64f
No known key found for this signature in database
GPG Key ID: C1F69E85C74EFE1D
6 changed files with 1261 additions and 1220 deletions

View File

@ -1029,6 +1029,18 @@ rm -f /tmp/.gstack-brain-context-$$.md 2>/dev/null || true
`gstack/`, `concepts/` only). Personal/family/therapy content never leaks here.
---
## Section index — Read each section when its situation applies
This skill is a decision-tree skeleton. The steps below point to on-demand
sections. Read a section in full before doing its step; do not work from memory.
| When | Read this section |
|------|-------------------|
| running the 8 DX passes, required outputs, and review report (only after Step 0 investigation is complete) | `sections/review-sections.md` |
---
## Step 0: DX Investigation (before scoring)
The core principle: **gather evidence and force decisions BEFORE scoring, not during
@ -1338,839 +1350,12 @@ Pattern:
- **DX TRIAGE:** Only flag gaps that would block adoption (score below 5). Skip gaps
that are nice-to-have (score 5-7).
## Review Sections (8 passes, after Step 0 is complete)
> **STOP.** Before running the 8 DX passes, required outputs, and review report (only after Step 0 investigation is complete), Read `~/.claude/skills/gstack/plan-devex-review/sections/review-sections.md` and execute it
> in full. Do not work from memory — that section is the source of truth for this step.
**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
## Section self-check (before you finish)
**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.
## Prior Learnings
Search for relevant learnings from previous sessions:
```bash
_CROSS_PROJ=$(~/.claude/skills/gstack/bin/gstack-config get cross_project_learnings 2>/dev/null || echo "unset")
echo "CROSS_PROJECT: $_CROSS_PROJ"
if [ "$_CROSS_PROJ" = "true" ]; then
~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --cross-project 2>/dev/null || true
else
~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 2>/dev/null || true
fi
```
If `CROSS_PROJECT` is `unset` (first time): Use AskUserQuestion:
> gstack can search learnings from your other projects on this machine to find
> patterns that might apply here. This stays local (no data leaves your machine).
> Recommended for solo developers. Skip if you work on multiple client codebases
> where cross-contamination would be a concern.
Options:
- A) Enable cross-project learnings (recommended)
- B) Keep learnings project-scoped only
If A: run `~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings true`
If B: run `~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings false`
Then re-run the search with the appropriate flag.
If learnings are found, incorporate them into your analysis. When a review finding
matches a past learning, display:
**"Prior learning applied: [key] (confidence N/10, from [date])"**
This makes the compounding visible. The user should see that gstack is getting
smarter on their codebase over time.
### DX Trend Check
Before starting review passes, check for prior DX reviews on this project:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null | grep plan-devex-review || echo "NO_PRIOR_DX_REVIEWS"
```
If prior reviews exist, display the trend:
```
DX TREND (prior reviews):
Dimension | Prior Score | Notes
Getting Started | 4/10 | from 2026-03-15
...
```
### Pass 1: Getting Started Experience (Zero Friction)
Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
**Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
magical moment from 0D (delivery vehicle), and any Install/Hello World friction
points from 0F.
Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Installation**: One command? One click? No prerequisites?
- **First run**: Does the first command produce visible, meaningful output?
- **Sandbox/Playground**: Can developers try before installing?
- **Free tier**: No credit card, no sales call, no company email?
- **Quick start guide**: Copy-paste complete? Shows real output?
- **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
- **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
expected output, and time budget per step. Target: 3 steps or fewer, under the
time chosen in 0C.
Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
in one terminal session without leaving the terminal?
**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
### Pass 2: API/CLI/SDK Design (Usable + Useful)
Rate 0-10: Is the interface intuitive, consistent, and complete?
**Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
A YC founder expects `tool.do(thing)`. A platform engineer expects
`tool.configure(options).execute(thing)`.
Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Naming**: Guessable without docs? Consistent grammar?
- **Defaults**: Every parameter has a sensible default? Simplest call gives useful result?
- **Consistency**: Same patterns across the entire API surface?
- **Completeness**: 100% coverage or do devs drop to raw HTTP for edge cases?
- **Discoverability**: Can devs explore from CLI/playground without docs?
- **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
- **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
- **Persona fit**: Does the interface match how [persona] thinks about the problem?
Good API design test: Can a [persona] use this API correctly after seeing one example?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 3: Error Messages & Debugging (Fight Uncertainty)
Rate 0-10: When something goes wrong, does the developer know what happened, why,
and how to fix it?
**Evidence recall:** Reference any error-related friction points from 0F and confusion
points from 0G.
Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
the three-tier system from the Hall of Fame:
- **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
- **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
- **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
For each error path, show what the developer currently sees vs. what they should see.
Also evaluate:
- **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
- **Debug mode**: Verbose output available?
- **Stack traces**: Useful or internal framework noise?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 4: Documentation & Learning (Findable + Learn by Doing)
Rate 0-10: Can a developer find what they need and learn by doing?
**Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
style? A YC founder needs copy-paste examples front and center. A platform engineer
needs architecture docs and API reference.
Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Information architecture**: Find what they need in under 2 minutes?
- **Progressive disclosure**: Beginners see simple, experts find advanced?
- **Code examples**: Copy-paste complete? Work as-is? Real context?
- **Interactive elements**: Playgrounds, sandboxes, "try it" buttons?
- **Versioning**: Docs match the version dev is using?
- **Tutorials vs references**: Both exist?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 5: Upgrade & Migration Path (Credible)
Rate 0-10: Can developers upgrade without fear?
Load reference: Read the "## Pass 5" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Backward compatibility**: What breaks? Blast radius limited?
- **Deprecation warnings**: Advance notice? Actionable? ("use newMethod() instead")
- **Migration guides**: Step-by-step for every breaking change?
- **Codemods**: Automated migration scripts?
- **Versioning strategy**: Semantic versioning? Clear policy?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 6: Developer Environment & Tooling (Valuable + Accessible)
Rate 0-10: Does this integrate into developers' existing workflows?
**Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
environment?
Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Editor integration**: Language server? Autocomplete? Inline docs?
- **CI/CD**: Works in GitHub Actions, GitLab CI? Non-interactive mode?
- **TypeScript support**: Types included? Good IntelliSense?
- **Testing support**: Easy to mock? Test utilities?
- **Local development**: Hot reload? Watch mode? Fast feedback?
- **Cross-platform**: Mac, Linux, Windows? Docker? ARM/x86?
- **Local env reproducibility**: Works across OS, package managers, containers, proxies?
- **Observability/testability**: Dry-run mode? Verbose output? Sample apps? Fixtures?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 7: Community & Ecosystem (Findable + Desirable)
Rate 0-10: Is there a community, and does the plan invest in ecosystem health?
Load reference: Read the "## Pass 7" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Open source**: Code open? Permissive license?
- **Community channels**: Where do devs ask questions? Someone answering?
- **Examples**: Real-world, runnable? Not just hello world?
- **Plugin/extension ecosystem**: Can devs extend it?
- **Contributing guide**: Process clear?
- **Pricing transparency**: No surprise bills?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 8: DX Measurement & Feedback Loops (Implement + Refine)
Rate 0-10: Does the plan include ways to measure and improve DX over time?
Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
- **Journey analytics**: Where do devs drop off?
- **Feedback mechanisms**: Bug reports? NPS? Feedback button?
- **Friction audits**: Periodic reviews planned?
- **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Appendix: Claude Code Skill DX Checklist
**Conditional: only run when product type includes "Claude Code skill".**
This is NOT a scored pass. It's a checklist of proven patterns from gstack's own DX.
Load reference: Read the "## Claude Code Skill DX Checklist" section from
`~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Check each item. For any unchecked item, explain what's missing and suggest the fix.
**STOP.** AskUserQuestion for any item that requires a design decision.
## Outside Voice — Independent Plan Challenge (optional, recommended)
After all review sections are complete, offer an independent second opinion from a
different AI system. Two models agreeing on a plan is stronger signal than one model's
thorough review.
**Check tool availability:**
```bash
command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
```
Use AskUserQuestion:
> "All review sections are complete. Want an outside voice? A different AI system can
> give a brutally honest, independent challenge of this plan — logical gaps, feasibility
> risks, and blind spots that are hard to catch from inside the review. Takes about 2
> minutes."
>
> RECOMMENDATION: Choose A — an independent second opinion catches structural blind
> spots. Two different AI models agreeing on a plan is stronger signal than one model's
> thorough review. Completeness: A=9/10, B=7/10.
Options:
- A) Get the outside voice (recommended)
- B) Skip — proceed to outputs
**If B:** Print "Skipping outside voice." and continue to the next section.
**If A:** Construct the plan review prompt. Read the plan file being reviewed (the file
the user pointed this review at, or the branch diff scope). If a CEO plan document
was written in Step 0D-POST, read that too — it contains the scope decisions and vision.
Construct this prompt (substitute the actual plan content — if plan content exceeds 30KB,
truncate to the first 30KB and note "Plan truncated for size"). **Always start with the
filesystem boundary instruction:**
"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nYou are a brutally honest technical reviewer examining a development plan that has
already been through a multi-section review. Your job is NOT to repeat that review.
Instead, find what it missed. Look for: logical gaps and unstated assumptions that
survived the review scrutiny, overcomplexity (is there a fundamentally simpler
approach the review was too deep in the weeds to see?), feasibility risks the review
took for granted, missing dependencies or sequencing issues, and strategic
miscalibration (is this the right thing to build at all?). Be direct. Be terse. No
compliments. Just the problems.
THE PLAN:
<plan content>"
**If CODEX_AVAILABLE:**
```bash
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_PV"
```
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash
cat "$TMPERR_PV"
```
Present the full output verbatim:
```
CODEX SAYS (plan review — outside voice):
════════════════════════════════════════════════════════════
<full codex output, verbatim do not truncate or summarize>
════════════════════════════════════════════════════════════
```
**Error handling:** All errors are non-blocking — the outside voice is informational.
- Auth failure (stderr contains "auth", "login", "unauthorized"): "Codex auth failed. Run \`codex login\` to authenticate."
- Timeout: "Codex timed out after 5 minutes."
- Empty response: "Codex returned no response."
On any Codex error, fall back to the Claude adversarial subagent.
**If CODEX_NOT_AVAILABLE (or Codex errored):**
Dispatch via the Agent tool. The subagent has fresh context — genuine independence.
Subagent prompt: same plan review prompt as above.
Present findings under an `OUTSIDE VOICE (Claude subagent):` header.
If the subagent fails or times out: "Outside voice unavailable. Continuing to outputs."
**Cross-model tension:**
After presenting the outside voice findings, note any points where the outside voice
disagrees with the review findings from earlier sections. Flag these as:
```
CROSS-MODEL TENSION:
[Topic]: Review said X. Outside voice says Y. [Present both perspectives neutrally.
State what context you might be missing that would change the answer.]
```
**User Sovereignty:** Do NOT auto-incorporate outside voice recommendations into the plan.
Present each tension point to the user. The user decides. Cross-model agreement is a
strong signal — present it as such — but it is NOT permission to act. You may state
which argument you find more compelling, but you MUST NOT apply the change without
explicit user approval.
For each substantive tension point, use AskUserQuestion:
> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
> argues [Y]. [One sentence on what context you might be missing.]"
>
> RECOMMENDATION: Choose [A or B] because [one-line reason explaining which argument
> is more compelling and why]. Completeness: A=X/10, B=Y/10.
Options:
- A) Accept the outside voice's recommendation (I'll apply this change)
- B) Keep the current approach (reject the outside voice)
- C) Investigate further before deciding
- D) Add to TODOS.md for later
Wait for the user's response. Do NOT default to accepting because you agree with the
outside voice. If the user chooses B, the current approach stands — do not re-argue.
If no tension points exist, note: "No cross-model tension — both reviewers agree."
**Persist the result:**
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-plan-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}'
```
Substitute: STATUS = "clean" if no findings, "issues_found" if findings exist.
SOURCE = "codex" if Codex ran, "claude" if subagent ran.
**Cleanup:** Run `rm -f "$TMPERR_PV"` after processing (if Codex was used).
---
When constructing the outside voice prompt, include the Developer Persona from Step 0A
and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
in the context of who is using it and what they're competing against.
## CRITICAL RULE — How to ask questions
Follow the AskUserQuestion format from the Preamble above. Additional rules for
DX reviews:
* **One issue = one AskUserQuestion call.** Never combine multiple issues.
* **Ground every question in evidence.** Reference the persona, competitive benchmark,
empathy narrative, or friction trace. Never ask a question in the abstract.
* **Frame pain from the persona's perspective.** Not "developers would be frustrated"
but "[persona from 0A] would hit this at minute [N] of their getting-started flow
and [specific consequence: abandon, file an issue, hack a workaround]."
* Present 2-3 options. For each: effort to fix, impact on developer adoption.
* **Map to DX First Principles above.** One sentence connecting your recommendation
to a specific principle (e.g., "This violates 'zero friction at T0' because
[persona] needs 3 extra config steps before their first API call").
* **Zero findings:** if a section has zero findings, state "No issues, moving on"
and proceed. Otherwise, use AskUserQuestion for each gap — a gap with an
"obvious fix" is still a gap and still needs user approval before any change
lands in the plan.
* Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.
## Required Outputs
### Developer Persona Card
The persona card from Step 0A. This goes at the top of the plan's DX section.
### Developer Empathy Narrative
The first-person narrative from Step 0B, updated with user corrections.
### Competitive DX Benchmark
The benchmark table from Step 0C, updated with the product's post-review scores.
### Magical Moment Specification
The chosen delivery vehicle from Step 0D with implementation requirements.
### Developer Journey Map
The journey map from Step 0F, updated with all friction point resolutions.
### First-Time Developer Confusion Report
The roleplay report from Step 0G, annotated with which items were addressed.
### "NOT in scope" section
DX improvements considered and explicitly deferred, with one-line rationale each.
### "What already exists" section
Existing docs, examples, error handling, and DX patterns that the plan should reuse.
### TODOS.md updates
After all review passes are complete, present each potential TODO as its own individual
AskUserQuestion. Never batch. For DX debt: missing error messages, unspecified upgrade
paths, documentation gaps, missing SDK languages. Each TODO gets:
* **What:** One-line description
* **Why:** The concrete developer pain it causes
* **Pros:** What you gain (adoption, retention, satisfaction)
* **Cons:** Cost, complexity, or risks
* **Context:** Enough detail for someone to pick this up in 3 months
* **Depends on / blocked by:** Prerequisites
Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now
### DX Scorecard
```
+====================================================================+
| DX PLAN REVIEW — SCORECARD |
+====================================================================+
| Dimension | Score | Prior | Trend |
|----------------------|--------|--------|--------|
| Getting Started | __/10 | __/10 | __ ↑↓ |
| API/CLI/SDK | __/10 | __/10 | __ ↑↓ |
| Error Messages | __/10 | __/10 | __ ↑↓ |
| Documentation | __/10 | __/10 | __ ↑↓ |
| Upgrade Path | __/10 | __/10 | __ ↑↓ |
| Dev Environment | __/10 | __/10 | __ ↑↓ |
| Community | __/10 | __/10 | __ ↑↓ |
| DX Measurement | __/10 | __/10 | __ ↑↓ |
+--------------------------------------------------------------------+
| TTHW | __ min | __ min | __ ↑↓ |
| Competitive Rank | [Champion/Competitive/Needs Work/Red Flag] |
| Magical Moment | [designed/missing] via [delivery vehicle] |
| Product Type | [type] |
| Mode | [EXPANSION/POLISH/TRIAGE] |
| Overall DX | __/10 | __/10 | __ ↑↓ |
+====================================================================+
| DX PRINCIPLE COVERAGE |
| Zero Friction | [covered/gap] |
| Learn by Doing | [covered/gap] |
| Fight Uncertainty | [covered/gap] |
| Opinionated + Escape Hatches | [covered/gap] |
| Code in Context | [covered/gap] |
| Magical Moments | [covered/gap] |
+====================================================================+
```
If all passes 8+: "DX plan is solid. Developers will have a good experience."
If any below 6: Flag as critical DX debt with specific impact on adoption.
If TTHW > 10 min: Flag as blocking issue.
### DX Implementation Checklist
```
DX IMPLEMENTATION CHECKLIST
============================
[ ] Time to hello world < [target from 0C]
[ ] Installation is one command
[ ] First run produces meaningful output
[ ] Magical moment delivered via [vehicle from 0D]
[ ] Every error message has: problem + cause + fix + docs link
[ ] API/CLI naming is guessable without docs
[ ] Every parameter has a sensible default
[ ] Docs have copy-paste examples that actually work
[ ] Examples show real use cases, not just hello world
[ ] Upgrade path documented with migration guide
[ ] Breaking changes have deprecation warnings + codemods
[ ] TypeScript types included (if applicable)
[ ] Works in CI/CD without special configuration
[ ] Free tier available, no credit card required
[ ] Changelog exists and is maintained
[ ] Search works in documentation
[ ] Community channel exists and is monitored
```
## Implementation Tasks
Before closing this review, synthesize the findings above into a flat list of
build-actionable tasks. Each task derives from a specific finding — no padding.
Emit the markdown section AND write a JSONL artifact that `/autoplan` can
aggregate across phases.
### Markdown section (always emit)
```markdown
## Implementation Tasks
Synthesized from this review's findings. Each task derives from a specific
finding above. Run with Claude Code or Codex; checkbox as you ship.
- [ ] **T1 (P1, human: ~2h / CC: ~15min)**<component><imperative title>
- Surfaced by: <section name><specific finding text or line reference>
- Files: <paths to touch>
- Verify: <test command or manual check>
- [ ] **T2 (P2, human: ~30min / CC: ~5min)** — ...
```
Rules:
- P1 blocks ship; P2 should land same branch; P3 is a follow-up TODO.
- If a finding produced no actionable task, do not invent one.
- If a section had zero findings, emit `_No new tasks from <section>._`
- Effort uses the AI-compression table from CLAUDE.md.
### JSONL artifact (always write, even if zero tasks)
`/autoplan` reads this file to aggregate across phases. Build each line with
`jq -nc` so titles and source findings containing quotes, newlines, or
backslashes serialize cleanly — never use hand-rolled `echo` / `printf`.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
TASKS_DIR="${HOME}/.gstack/projects/${SLUG:-unknown}"
mkdir -p "$TASKS_DIR"
TASKS_FILE="$TASKS_DIR/tasks-devex-review-$(date +%Y%m%d-%H%M%S).jsonl"
COMMIT=$(git rev-parse HEAD 2>/dev/null || echo unknown)
BRANCH=$(git branch --show-current 2>/dev/null || echo unknown)
RUN_ID="$(date -u +%Y%m%dT%H%M%SZ)-$$"
# Repeat ONE jq invocation per task identified during this review.
# Substitute the placeholders inline with shell variables you set per task:
# TASK_ID (T1, T2, ...), PRIORITY (P1/P2/P3), COMPONENT, TITLE,
# SOURCE_FINDING, EFFORT_HUMAN, EFFORT_CC, FILES_JSON (a JSON array literal
# like '["browse/src/sanitize.ts","browse/src/server.ts"]').
jq -nc \
--arg phase 'devex-review' \
--arg run_id "$RUN_ID" \
--arg branch "$BRANCH" \
--arg commit "$COMMIT" \
--arg id "$TASK_ID" \
--arg priority "$PRIORITY" \
--arg component "$COMPONENT" \
--arg effort_human "$EFFORT_HUMAN" \
--arg effort_cc "$EFFORT_CC" \
--arg title "$TITLE" \
--arg source_finding "$SOURCE_FINDING" \
--argjson files "$FILES_JSON" \
'{phase:$phase, run_id:$run_id, branch:$branch, commit:$commit, id:$id, priority:$priority, component:$component, files:$files, effort_human:$effort_human, effort_cc:$effort_cc, title:$title, source_finding:$source_finding}' \
>> "$TASKS_FILE"
```
If `jq` is not installed, fall back to skipping the JSONL write and warn
the user to install jq for autoplan aggregation. Never hand-roll JSONL.
If zero tasks were identified in this review, still touch the JSONL file
(`: > "$TASKS_FILE"`) so the aggregator sees that the phase produced output
this run (an empty file means "ran, no findings" — distinct from "didn't run").
### Unresolved Decisions
If any AskUserQuestion goes unanswered, note here. Never silently default.
## Review Readiness Dashboard
After completing the review, read the review log and config to display the dashboard.
```bash
~/.claude/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
```
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Outside Voice | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
```
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days from either \`review\` or \`plan-eng-review\` with status "clean" (or \`skip_eng_review\` is \`true\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO, Design, and Codex reviews are shown for context but never block shipping
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
**Staleness detection:** After displaying the dashboard, check if any existing reviews may be stale:
- Parse the \`---HEAD---\` section from the bash output to get the current HEAD commit hash
- For each review entry that has a \`commit\` field: compare it against the current HEAD. If different, count elapsed commits: \`git rev-list --count STORED_COMMIT..HEAD\`. Display: "Note: {skill} review from {date} may be stale — {N} commits since review"
- For entries without a \`commit\` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
- If all reviews match the current HEAD, do not display any staleness notes
## Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the
**plan file** itself so review status is visible to anyone reading the plan.
### Detect the plan file
1. Check if there is an active plan file in this conversation (the host provides plan file
paths in system messages — look for plan file references in the conversation context).
2. If not found, skip this section silently — not every review runs in plan mode.
### Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above.
Parse each JSONL entry. Each skill logs different fields:
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **plan-devex-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`product_type\`, \`tthw_current\`, \`tthw_target\`, \`mode\`, \`persona\`, \`competitive_tier\`, \`unresolved\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, TTHW: {tthw_current} → {tthw_target}"
- **devex-review**: \`status\`, \`overall_score\`, \`product_type\`, \`tthw_measured\`, \`dimensions_tested\`, \`dimensions_inferred\`, \`boomerang\`, \`commit\`
→ Findings: "score: {overall_score}/10, TTHW: {tthw_measured}, {dimensions_tested} tested/{dimensions_inferred} inferred"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries.
For the review you just completed, you may use richer details from your own Completion
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
\`\`\`markdown
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
| DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} |
\`\`\`
Below the table, add these lines (omit any that are empty/not applicable):
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
### Write to the plan file
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
The report must always be the LAST section of the plan file — never mid-file.
Use a single delete-then-append flow:
1. Read the plan file (Read tool) to see its full current content. Search the read
output for a \`## GSTACK REVIEW REPORT\` heading anywhere in the file.
2. If found, use the Edit tool to DELETE the entire existing section. Match from
\`## GSTACK REVIEW REPORT\` through either the next \`## \` heading or end of
file, whichever comes first. Replace with the empty string. This applies
regardless of where the section currently lives — mid-file deletion is
intentional, not a special case. If the Edit fails (e.g., concurrent edit
changed the content), re-read the plan file and retry once.
3. After the delete (or skipped, if no section existed), append the new
\`## GSTACK REVIEW REPORT\` section at the END of the file. Use the Edit
tool to match the file's current last paragraph and add the section after it,
or use Write to re-emit the whole file with the section at the end.
4. Verify with the Read tool that \`## GSTACK REVIEW REPORT\` is the last
\`## \` heading in the file before continuing. If it isn't, repeat steps
2-3 once.
Do NOT replace the section in place. The "replace mid-file" path is what allowed
prior versions to leave the report mid-file when an older report already lived
there — the user then sees a plan whose review report is not at the bottom and
(correctly) rejects it.
## Capture Learnings
If you discovered a non-obvious pattern, pitfall, or architectural insight during
this session, log it for future sessions:
```bash
~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"plan-devex-review","type":"TYPE","key":"SHORT_KEY","insight":"DESCRIPTION","confidence":N,"source":"SOURCE","files":["path/to/relevant/file"]}'
```
**Types:** `pattern` (reusable approach), `pitfall` (what NOT to do), `preference`
(user stated), `architecture` (structural decision), `tool` (library/framework insight),
`operational` (project environment/CLI/workflow knowledge).
**Sources:** `observed` (you found this in the code), `user-stated` (user told you),
`inferred` (AI deduction), `cross-model` (both Claude and Codex agree).
**Confidence:** 1-10. Be honest. An observed pattern you verified in the code is 8-9.
An inference you're not sure about is 4-5. A user preference they explicitly stated is 10.
**files:** Include the specific file paths this learning references. This enables
staleness detection: if those files are later deleted, the learning can be flagged.
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
## Brain Calibration Write-Back (Phase 2 / gated)
When the skill makes a typed prediction worth tracking (scope decision,
TTHW target, architectural bet, wedge commitment), it MAY write a
`kind=bet` take to the brain so a calibration profile builds over time.
**Gated on two things:**
1. Brain trust policy for the active endpoint is `personal` (check via
`~/.claude/skills/gstack/bin/gstack-config get brain_trust_policy@<endpoint-hash>`).
Shared brains skip write-back to avoid polluting team calibration.
2. Feature flag `BRAIN_CALIBRATION_WRITEBACK` is set (today: false; flips
to true when upstream gbrain v0.42+ ships `takes_add` MCP op).
When both gates pass, the write-back path uses `mcp__gbrain__takes_add`
to record a take with weight 0.6 (per SKILL_CALIBRATION_WEIGHTS).
If the MCP op is unavailable, fall back to `mcp__gbrain__put_page` with
a gstack:takes fence block (documented but uglier path).
Mandatory take frontmatter shape:
```yaml
kind: bet
holder: <user identity from whoami>
claim: <one-line prediction the skill is making>
weight: 0.6
since_date: <today's date>
expected_resolution: <date in 1-3 months depending on skill>
source_skill: plan-devex-review
```
After write, invalidate the affected digests so the next preflight reflects
the new state:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" 2>/dev/null || true
~/.claude/skills/gstack/bin/gstack-brain-cache invalidate developer-persona --project "$SLUG" 2>/dev/null || true
```
## Brain Cache Background Refresh
After the skill's work completes (and telemetry has logged), kick a
background refresh of any cache digest that's getting close to its TTL.
This is non-blocking — the user doesn't wait. Next invocation benefits
from the warm cache.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" 2>/dev/null || true
(~/.claude/skills/gstack/bin/gstack-brain-cache refresh --project "$SLUG" 2>/dev/null &) || true
```
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, recommend next reviews:
**Recommend /plan-eng-review if eng review is not skipped globally** — DX issues often
have architectural implications. If this DX review found API design problems, error
handling gaps, or CLI ergonomics issues, eng review should validate the fixes.
**Suggest /plan-design-review if user-facing UI exists** — DX review focuses on
developer-facing surfaces; design review covers end-user-facing UI.
**Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
be [target from 0C]. Did reality match? Run /devex-review on the live product to find
out. This is where the competitive benchmark pays off: you have a concrete target to
measure against.
Use AskUserQuestion with applicable options:
- **A)** Run /plan-eng-review next (required gate)
- **B)** Run /plan-design-review (only if UI scope detected)
- **C)** Ready to implement, run /devex-review after shipping
- **D)** Skip, I'll handle next steps manually
## Mode Quick Reference
```
| DX EXPANSION | DX POLISH | DX TRIAGE
Scope | Push UP (opt-in) | Maintain | Critical only
Posture | Enthusiastic | Rigorous | Surgical
Competitive | Full benchmark | Full benchmark | Skip
Magical | Full design | Verify exists | Skip
Journey | All stages + | All stages | Install + Hello
| best-in-class | | World only
Passes | All 8, expanded | All 8, standard | Pass 1 + 3 only
Outside voice| Recommended | Recommended | Skip
```
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
* One sentence max per option.
* After each pass, pause and wait for feedback before moving on.
* Rate before and after each pass for scannability.
Confirm you Read the review section the Section index named, and executed all 8 DX passes, the required outputs, and the review report in full. If you produced findings or the review report from memory without Reading `sections/review-sections.md`, stop and Read it now.
## EXIT PLAN MODE GATE (BLOCKING)

View File

@ -138,6 +138,11 @@ Note the product type; it influences which persona options are offered in Step 0
{{BRAIN_PREFLIGHT}}
---
{{SECTION_INDEX:plan-devex-review}}
---
## Step 0: DX Investigation (before scoring)
The core principle: **gather evidence and force decisions BEFORE scoring, not during
@ -447,395 +452,10 @@ Pattern:
- **DX TRIAGE:** Only flag gaps that would block adoption (score below 5). Skip gaps
that are nice-to-have (score 5-7).
## Review Sections (8 passes, after Step 0 is complete)
{{SECTION:review-sections}}
**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
## Section self-check (before you finish)
{{ANTI_SHORTCUT_CLAUSE}}
{{LEARNINGS_SEARCH}}
### DX Trend Check
Before starting review passes, check for prior DX reviews on this project:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null | grep plan-devex-review || echo "NO_PRIOR_DX_REVIEWS"
```
If prior reviews exist, display the trend:
```
DX TREND (prior reviews):
Dimension | Prior Score | Notes
Getting Started | 4/10 | from 2026-03-15
...
```
### Pass 1: Getting Started Experience (Zero Friction)
Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
**Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
magical moment from 0D (delivery vehicle), and any Install/Hello World friction
points from 0F.
Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Installation**: One command? One click? No prerequisites?
- **First run**: Does the first command produce visible, meaningful output?
- **Sandbox/Playground**: Can developers try before installing?
- **Free tier**: No credit card, no sales call, no company email?
- **Quick start guide**: Copy-paste complete? Shows real output?
- **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
- **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
expected output, and time budget per step. Target: 3 steps or fewer, under the
time chosen in 0C.
Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
in one terminal session without leaving the terminal?
**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
### Pass 2: API/CLI/SDK Design (Usable + Useful)
Rate 0-10: Is the interface intuitive, consistent, and complete?
**Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
A YC founder expects `tool.do(thing)`. A platform engineer expects
`tool.configure(options).execute(thing)`.
Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Naming**: Guessable without docs? Consistent grammar?
- **Defaults**: Every parameter has a sensible default? Simplest call gives useful result?
- **Consistency**: Same patterns across the entire API surface?
- **Completeness**: 100% coverage or do devs drop to raw HTTP for edge cases?
- **Discoverability**: Can devs explore from CLI/playground without docs?
- **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
- **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
- **Persona fit**: Does the interface match how [persona] thinks about the problem?
Good API design test: Can a [persona] use this API correctly after seeing one example?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 3: Error Messages & Debugging (Fight Uncertainty)
Rate 0-10: When something goes wrong, does the developer know what happened, why,
and how to fix it?
**Evidence recall:** Reference any error-related friction points from 0F and confusion
points from 0G.
Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
the three-tier system from the Hall of Fame:
- **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
- **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
- **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
For each error path, show what the developer currently sees vs. what they should see.
Also evaluate:
- **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
- **Debug mode**: Verbose output available?
- **Stack traces**: Useful or internal framework noise?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 4: Documentation & Learning (Findable + Learn by Doing)
Rate 0-10: Can a developer find what they need and learn by doing?
**Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
style? A YC founder needs copy-paste examples front and center. A platform engineer
needs architecture docs and API reference.
Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Information architecture**: Find what they need in under 2 minutes?
- **Progressive disclosure**: Beginners see simple, experts find advanced?
- **Code examples**: Copy-paste complete? Work as-is? Real context?
- **Interactive elements**: Playgrounds, sandboxes, "try it" buttons?
- **Versioning**: Docs match the version dev is using?
- **Tutorials vs references**: Both exist?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 5: Upgrade & Migration Path (Credible)
Rate 0-10: Can developers upgrade without fear?
Load reference: Read the "## Pass 5" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Backward compatibility**: What breaks? Blast radius limited?
- **Deprecation warnings**: Advance notice? Actionable? ("use newMethod() instead")
- **Migration guides**: Step-by-step for every breaking change?
- **Codemods**: Automated migration scripts?
- **Versioning strategy**: Semantic versioning? Clear policy?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 6: Developer Environment & Tooling (Valuable + Accessible)
Rate 0-10: Does this integrate into developers' existing workflows?
**Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
environment?
Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Editor integration**: Language server? Autocomplete? Inline docs?
- **CI/CD**: Works in GitHub Actions, GitLab CI? Non-interactive mode?
- **TypeScript support**: Types included? Good IntelliSense?
- **Testing support**: Easy to mock? Test utilities?
- **Local development**: Hot reload? Watch mode? Fast feedback?
- **Cross-platform**: Mac, Linux, Windows? Docker? ARM/x86?
- **Local env reproducibility**: Works across OS, package managers, containers, proxies?
- **Observability/testability**: Dry-run mode? Verbose output? Sample apps? Fixtures?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 7: Community & Ecosystem (Findable + Desirable)
Rate 0-10: Is there a community, and does the plan invest in ecosystem health?
Load reference: Read the "## Pass 7" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Open source**: Code open? Permissive license?
- **Community channels**: Where do devs ask questions? Someone answering?
- **Examples**: Real-world, runnable? Not just hello world?
- **Plugin/extension ecosystem**: Can devs extend it?
- **Contributing guide**: Process clear?
- **Pricing transparency**: No surprise bills?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 8: DX Measurement & Feedback Loops (Implement + Refine)
Rate 0-10: Does the plan include ways to measure and improve DX over time?
Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
- **Journey analytics**: Where do devs drop off?
- **Feedback mechanisms**: Bug reports? NPS? Feedback button?
- **Friction audits**: Periodic reviews planned?
- **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Appendix: Claude Code Skill DX Checklist
**Conditional: only run when product type includes "Claude Code skill".**
This is NOT a scored pass. It's a checklist of proven patterns from gstack's own DX.
Load reference: Read the "## Claude Code Skill DX Checklist" section from
`~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Check each item. For any unchecked item, explain what's missing and suggest the fix.
**STOP.** AskUserQuestion for any item that requires a design decision.
{{CODEX_PLAN_REVIEW}}
When constructing the outside voice prompt, include the Developer Persona from Step 0A
and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
in the context of who is using it and what they're competing against.
## CRITICAL RULE — How to ask questions
Follow the AskUserQuestion format from the Preamble above. Additional rules for
DX reviews:
* **One issue = one AskUserQuestion call.** Never combine multiple issues.
* **Ground every question in evidence.** Reference the persona, competitive benchmark,
empathy narrative, or friction trace. Never ask a question in the abstract.
* **Frame pain from the persona's perspective.** Not "developers would be frustrated"
but "[persona from 0A] would hit this at minute [N] of their getting-started flow
and [specific consequence: abandon, file an issue, hack a workaround]."
* Present 2-3 options. For each: effort to fix, impact on developer adoption.
* **Map to DX First Principles above.** One sentence connecting your recommendation
to a specific principle (e.g., "This violates 'zero friction at T0' because
[persona] needs 3 extra config steps before their first API call").
* **Zero findings:** if a section has zero findings, state "No issues, moving on"
and proceed. Otherwise, use AskUserQuestion for each gap — a gap with an
"obvious fix" is still a gap and still needs user approval before any change
lands in the plan.
* Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.
## Required Outputs
### Developer Persona Card
The persona card from Step 0A. This goes at the top of the plan's DX section.
### Developer Empathy Narrative
The first-person narrative from Step 0B, updated with user corrections.
### Competitive DX Benchmark
The benchmark table from Step 0C, updated with the product's post-review scores.
### Magical Moment Specification
The chosen delivery vehicle from Step 0D with implementation requirements.
### Developer Journey Map
The journey map from Step 0F, updated with all friction point resolutions.
### First-Time Developer Confusion Report
The roleplay report from Step 0G, annotated with which items were addressed.
### "NOT in scope" section
DX improvements considered and explicitly deferred, with one-line rationale each.
### "What already exists" section
Existing docs, examples, error handling, and DX patterns that the plan should reuse.
### TODOS.md updates
After all review passes are complete, present each potential TODO as its own individual
AskUserQuestion. Never batch. For DX debt: missing error messages, unspecified upgrade
paths, documentation gaps, missing SDK languages. Each TODO gets:
* **What:** One-line description
* **Why:** The concrete developer pain it causes
* **Pros:** What you gain (adoption, retention, satisfaction)
* **Cons:** Cost, complexity, or risks
* **Context:** Enough detail for someone to pick this up in 3 months
* **Depends on / blocked by:** Prerequisites
Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now
### DX Scorecard
```
+====================================================================+
| DX PLAN REVIEW — SCORECARD |
+====================================================================+
| Dimension | Score | Prior | Trend |
|----------------------|--------|--------|--------|
| Getting Started | __/10 | __/10 | __ ↑↓ |
| API/CLI/SDK | __/10 | __/10 | __ ↑↓ |
| Error Messages | __/10 | __/10 | __ ↑↓ |
| Documentation | __/10 | __/10 | __ ↑↓ |
| Upgrade Path | __/10 | __/10 | __ ↑↓ |
| Dev Environment | __/10 | __/10 | __ ↑↓ |
| Community | __/10 | __/10 | __ ↑↓ |
| DX Measurement | __/10 | __/10 | __ ↑↓ |
+--------------------------------------------------------------------+
| TTHW | __ min | __ min | __ ↑↓ |
| Competitive Rank | [Champion/Competitive/Needs Work/Red Flag] |
| Magical Moment | [designed/missing] via [delivery vehicle] |
| Product Type | [type] |
| Mode | [EXPANSION/POLISH/TRIAGE] |
| Overall DX | __/10 | __/10 | __ ↑↓ |
+====================================================================+
| DX PRINCIPLE COVERAGE |
| Zero Friction | [covered/gap] |
| Learn by Doing | [covered/gap] |
| Fight Uncertainty | [covered/gap] |
| Opinionated + Escape Hatches | [covered/gap] |
| Code in Context | [covered/gap] |
| Magical Moments | [covered/gap] |
+====================================================================+
```
If all passes 8+: "DX plan is solid. Developers will have a good experience."
If any below 6: Flag as critical DX debt with specific impact on adoption.
If TTHW > 10 min: Flag as blocking issue.
### DX Implementation Checklist
```
DX IMPLEMENTATION CHECKLIST
============================
[ ] Time to hello world < [target from 0C]
[ ] Installation is one command
[ ] First run produces meaningful output
[ ] Magical moment delivered via [vehicle from 0D]
[ ] Every error message has: problem + cause + fix + docs link
[ ] API/CLI naming is guessable without docs
[ ] Every parameter has a sensible default
[ ] Docs have copy-paste examples that actually work
[ ] Examples show real use cases, not just hello world
[ ] Upgrade path documented with migration guide
[ ] Breaking changes have deprecation warnings + codemods
[ ] TypeScript types included (if applicable)
[ ] Works in CI/CD without special configuration
[ ] Free tier available, no credit card required
[ ] Changelog exists and is maintained
[ ] Search works in documentation
[ ] Community channel exists and is monitored
```
{{TASKS_SECTION_EMIT:devex-review}}
### Unresolved Decisions
If any AskUserQuestion goes unanswered, note here. Never silently default.
{{REVIEW_DASHBOARD}}
{{PLAN_FILE_REVIEW_REPORT}}
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
{{BRAIN_WRITE_BACK}}
{{BRAIN_CACHE_REFRESH}}
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, recommend next reviews:
**Recommend /plan-eng-review if eng review is not skipped globally** — DX issues often
have architectural implications. If this DX review found API design problems, error
handling gaps, or CLI ergonomics issues, eng review should validate the fixes.
**Suggest /plan-design-review if user-facing UI exists** — DX review focuses on
developer-facing surfaces; design review covers end-user-facing UI.
**Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
be [target from 0C]. Did reality match? Run /devex-review on the live product to find
out. This is where the competitive benchmark pays off: you have a concrete target to
measure against.
Use AskUserQuestion with applicable options:
- **A)** Run /plan-eng-review next (required gate)
- **B)** Run /plan-design-review (only if UI scope detected)
- **C)** Ready to implement, run /devex-review after shipping
- **D)** Skip, I'll handle next steps manually
## Mode Quick Reference
```
| DX EXPANSION | DX POLISH | DX TRIAGE
Scope | Push UP (opt-in) | Maintain | Critical only
Posture | Enthusiastic | Rigorous | Surgical
Competitive | Full benchmark | Full benchmark | Skip
Magical | Full design | Verify exists | Skip
Journey | All stages + | All stages | Install + Hello
| best-in-class | | World only
Passes | All 8, expanded | All 8, standard | Pass 1 + 3 only
Outside voice| Recommended | Recommended | Skip
```
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
* One sentence max per option.
* After each pass, pause and wait for feedback before moving on.
* Rate before and after each pass for scannability.
Confirm you Read the review section the Section index named, and executed all 8 DX passes, the required outputs, and the review report in full. If you produced findings or the review report from memory without Reading `sections/review-sections.md`, stop and Read it now.
{{EXIT_PLAN_MODE_GATE}}

View File

@ -0,0 +1,9 @@
{
"$schema": "https://gstack.dev/schemas/section-manifest.json",
"skill": "plan-devex-review",
"version": 1,
"note": "PASSIVE registry (v2 plan T9 / CM2). id/file/title/trigger text ONLY. The skeleton's decision-tree prose decides WHEN to read. See docs/designs/v2_PLAN.md.",
"sections": [
{ "id": "review-sections", "file": "review-sections.md", "title": "8 DX passes, required outputs + review report", "trigger": "running the 8 DX passes, required outputs, and review report (only after Step 0 investigation is complete)" }
]
}

View File

@ -0,0 +1,836 @@
<!-- AUTO-GENERATED from review-sections.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Review Sections (8 passes, after Step 0 is complete)
**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
**Anti-shortcut clause:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Writing every finding into one plan write and calling ExitPlanMode without firing AskUserQuestion is the precise failure mode of the May 2026 transcript bug — the model explored, found issues, and dumped them into a deliverable rather than walking the user through them. If you have ANY non-trivial finding in any review section, the path from finding to ExitPlanMode goes THROUGH AskUserQuestion. Zero findings in every section is the only path to ExitPlanMode that bypasses AskUserQuestion. If you find yourself wanting to write a plan with findings before asking, stop and call AskUserQuestion now — that's the bug, recognize it.
## Prior Learnings
Search for relevant learnings from previous sessions:
```bash
_CROSS_PROJ=$(~/.claude/skills/gstack/bin/gstack-config get cross_project_learnings 2>/dev/null || echo "unset")
echo "CROSS_PROJECT: $_CROSS_PROJ"
if [ "$_CROSS_PROJ" = "true" ]; then
~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --cross-project 2>/dev/null || true
else
~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 2>/dev/null || true
fi
```
If `CROSS_PROJECT` is `unset` (first time): Use AskUserQuestion:
> gstack can search learnings from your other projects on this machine to find
> patterns that might apply here. This stays local (no data leaves your machine).
> Recommended for solo developers. Skip if you work on multiple client codebases
> where cross-contamination would be a concern.
Options:
- A) Enable cross-project learnings (recommended)
- B) Keep learnings project-scoped only
If A: run `~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings true`
If B: run `~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings false`
Then re-run the search with the appropriate flag.
If learnings are found, incorporate them into your analysis. When a review finding
matches a past learning, display:
**"Prior learning applied: [key] (confidence N/10, from [date])"**
This makes the compounding visible. The user should see that gstack is getting
smarter on their codebase over time.
### DX Trend Check
Before starting review passes, check for prior DX reviews on this project:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null | grep plan-devex-review || echo "NO_PRIOR_DX_REVIEWS"
```
If prior reviews exist, display the trend:
```
DX TREND (prior reviews):
Dimension | Prior Score | Notes
Getting Started | 4/10 | from 2026-03-15
...
```
### Pass 1: Getting Started Experience (Zero Friction)
Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
**Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
magical moment from 0D (delivery vehicle), and any Install/Hello World friction
points from 0F.
Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Installation**: One command? One click? No prerequisites?
- **First run**: Does the first command produce visible, meaningful output?
- **Sandbox/Playground**: Can developers try before installing?
- **Free tier**: No credit card, no sales call, no company email?
- **Quick start guide**: Copy-paste complete? Shows real output?
- **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
- **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
expected output, and time budget per step. Target: 3 steps or fewer, under the
time chosen in 0C.
Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
in one terminal session without leaving the terminal?
**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
### Pass 2: API/CLI/SDK Design (Usable + Useful)
Rate 0-10: Is the interface intuitive, consistent, and complete?
**Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
A YC founder expects `tool.do(thing)`. A platform engineer expects
`tool.configure(options).execute(thing)`.
Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Naming**: Guessable without docs? Consistent grammar?
- **Defaults**: Every parameter has a sensible default? Simplest call gives useful result?
- **Consistency**: Same patterns across the entire API surface?
- **Completeness**: 100% coverage or do devs drop to raw HTTP for edge cases?
- **Discoverability**: Can devs explore from CLI/playground without docs?
- **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
- **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
- **Persona fit**: Does the interface match how [persona] thinks about the problem?
Good API design test: Can a [persona] use this API correctly after seeing one example?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 3: Error Messages & Debugging (Fight Uncertainty)
Rate 0-10: When something goes wrong, does the developer know what happened, why,
and how to fix it?
**Evidence recall:** Reference any error-related friction points from 0F and confusion
points from 0G.
Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
the three-tier system from the Hall of Fame:
- **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
- **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
- **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
For each error path, show what the developer currently sees vs. what they should see.
Also evaluate:
- **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
- **Debug mode**: Verbose output available?
- **Stack traces**: Useful or internal framework noise?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 4: Documentation & Learning (Findable + Learn by Doing)
Rate 0-10: Can a developer find what they need and learn by doing?
**Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
style? A YC founder needs copy-paste examples front and center. A platform engineer
needs architecture docs and API reference.
Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Information architecture**: Find what they need in under 2 minutes?
- **Progressive disclosure**: Beginners see simple, experts find advanced?
- **Code examples**: Copy-paste complete? Work as-is? Real context?
- **Interactive elements**: Playgrounds, sandboxes, "try it" buttons?
- **Versioning**: Docs match the version dev is using?
- **Tutorials vs references**: Both exist?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 5: Upgrade & Migration Path (Credible)
Rate 0-10: Can developers upgrade without fear?
Load reference: Read the "## Pass 5" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Backward compatibility**: What breaks? Blast radius limited?
- **Deprecation warnings**: Advance notice? Actionable? ("use newMethod() instead")
- **Migration guides**: Step-by-step for every breaking change?
- **Codemods**: Automated migration scripts?
- **Versioning strategy**: Semantic versioning? Clear policy?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 6: Developer Environment & Tooling (Valuable + Accessible)
Rate 0-10: Does this integrate into developers' existing workflows?
**Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
environment?
Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Editor integration**: Language server? Autocomplete? Inline docs?
- **CI/CD**: Works in GitHub Actions, GitLab CI? Non-interactive mode?
- **TypeScript support**: Types included? Good IntelliSense?
- **Testing support**: Easy to mock? Test utilities?
- **Local development**: Hot reload? Watch mode? Fast feedback?
- **Cross-platform**: Mac, Linux, Windows? Docker? ARM/x86?
- **Local env reproducibility**: Works across OS, package managers, containers, proxies?
- **Observability/testability**: Dry-run mode? Verbose output? Sample apps? Fixtures?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 7: Community & Ecosystem (Findable + Desirable)
Rate 0-10: Is there a community, and does the plan invest in ecosystem health?
Load reference: Read the "## Pass 7" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Open source**: Code open? Permissive license?
- **Community channels**: Where do devs ask questions? Someone answering?
- **Examples**: Real-world, runnable? Not just hello world?
- **Plugin/extension ecosystem**: Can devs extend it?
- **Contributing guide**: Process clear?
- **Pricing transparency**: No surprise bills?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 8: DX Measurement & Feedback Loops (Implement + Refine)
Rate 0-10: Does the plan include ways to measure and improve DX over time?
Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
- **Journey analytics**: Where do devs drop off?
- **Feedback mechanisms**: Bug reports? NPS? Feedback button?
- **Friction audits**: Periodic reviews planned?
- **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Appendix: Claude Code Skill DX Checklist
**Conditional: only run when product type includes "Claude Code skill".**
This is NOT a scored pass. It's a checklist of proven patterns from gstack's own DX.
Load reference: Read the "## Claude Code Skill DX Checklist" section from
`~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Check each item. For any unchecked item, explain what's missing and suggest the fix.
**STOP.** AskUserQuestion for any item that requires a design decision.
## Outside Voice — Independent Plan Challenge (optional, recommended)
After all review sections are complete, offer an independent second opinion from a
different AI system. Two models agreeing on a plan is stronger signal than one model's
thorough review.
**Check tool availability:**
```bash
command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
```
Use AskUserQuestion:
> "All review sections are complete. Want an outside voice? A different AI system can
> give a brutally honest, independent challenge of this plan — logical gaps, feasibility
> risks, and blind spots that are hard to catch from inside the review. Takes about 2
> minutes."
>
> RECOMMENDATION: Choose A — an independent second opinion catches structural blind
> spots. Two different AI models agreeing on a plan is stronger signal than one model's
> thorough review. Completeness: A=9/10, B=7/10.
Options:
- A) Get the outside voice (recommended)
- B) Skip — proceed to outputs
**If B:** Print "Skipping outside voice." and continue to the next section.
**If A:** Construct the plan review prompt. Read the plan file being reviewed (the file
the user pointed this review at, or the branch diff scope). If a CEO plan document
was written in Step 0D-POST, read that too — it contains the scope decisions and vision.
Construct this prompt (substitute the actual plan content — if plan content exceeds 30KB,
truncate to the first 30KB and note "Plan truncated for size"). **Always start with the
filesystem boundary instruction:**
"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nYou are a brutally honest technical reviewer examining a development plan that has
already been through a multi-section review. Your job is NOT to repeat that review.
Instead, find what it missed. Look for: logical gaps and unstated assumptions that
survived the review scrutiny, overcomplexity (is there a fundamentally simpler
approach the review was too deep in the weeds to see?), feasibility risks the review
took for granted, missing dependencies or sequencing issues, and strategic
miscalibration (is this the right thing to build at all?). Be direct. Be terse. No
compliments. Just the problems.
THE PLAN:
<plan content>"
**If CODEX_AVAILABLE:**
```bash
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached < /dev/null 2>"$TMPERR_PV"
```
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash
cat "$TMPERR_PV"
```
Present the full output verbatim:
```
CODEX SAYS (plan review — outside voice):
════════════════════════════════════════════════════════════
<full codex output, verbatim do not truncate or summarize>
════════════════════════════════════════════════════════════
```
**Error handling:** All errors are non-blocking — the outside voice is informational.
- Auth failure (stderr contains "auth", "login", "unauthorized"): "Codex auth failed. Run \`codex login\` to authenticate."
- Timeout: "Codex timed out after 5 minutes."
- Empty response: "Codex returned no response."
On any Codex error, fall back to the Claude adversarial subagent.
**If CODEX_NOT_AVAILABLE (or Codex errored):**
Dispatch via the Agent tool. The subagent has fresh context — genuine independence.
Subagent prompt: same plan review prompt as above.
Present findings under an `OUTSIDE VOICE (Claude subagent):` header.
If the subagent fails or times out: "Outside voice unavailable. Continuing to outputs."
**Cross-model tension:**
After presenting the outside voice findings, note any points where the outside voice
disagrees with the review findings from earlier sections. Flag these as:
```
CROSS-MODEL TENSION:
[Topic]: Review said X. Outside voice says Y. [Present both perspectives neutrally.
State what context you might be missing that would change the answer.]
```
**User Sovereignty:** Do NOT auto-incorporate outside voice recommendations into the plan.
Present each tension point to the user. The user decides. Cross-model agreement is a
strong signal — present it as such — but it is NOT permission to act. You may state
which argument you find more compelling, but you MUST NOT apply the change without
explicit user approval.
For each substantive tension point, use AskUserQuestion:
> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
> argues [Y]. [One sentence on what context you might be missing.]"
>
> RECOMMENDATION: Choose [A or B] because [one-line reason explaining which argument
> is more compelling and why]. Completeness: A=X/10, B=Y/10.
Options:
- A) Accept the outside voice's recommendation (I'll apply this change)
- B) Keep the current approach (reject the outside voice)
- C) Investigate further before deciding
- D) Add to TODOS.md for later
Wait for the user's response. Do NOT default to accepting because you agree with the
outside voice. If the user chooses B, the current approach stands — do not re-argue.
If no tension points exist, note: "No cross-model tension — both reviewers agree."
**Persist the result:**
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-plan-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}'
```
Substitute: STATUS = "clean" if no findings, "issues_found" if findings exist.
SOURCE = "codex" if Codex ran, "claude" if subagent ran.
**Cleanup:** Run `rm -f "$TMPERR_PV"` after processing (if Codex was used).
---
When constructing the outside voice prompt, include the Developer Persona from Step 0A
and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
in the context of who is using it and what they're competing against.
## CRITICAL RULE — How to ask questions
Follow the AskUserQuestion format from the Preamble above. Additional rules for
DX reviews:
* **One issue = one AskUserQuestion call.** Never combine multiple issues.
* **Ground every question in evidence.** Reference the persona, competitive benchmark,
empathy narrative, or friction trace. Never ask a question in the abstract.
* **Frame pain from the persona's perspective.** Not "developers would be frustrated"
but "[persona from 0A] would hit this at minute [N] of their getting-started flow
and [specific consequence: abandon, file an issue, hack a workaround]."
* Present 2-3 options. For each: effort to fix, impact on developer adoption.
* **Map to DX First Principles above.** One sentence connecting your recommendation
to a specific principle (e.g., "This violates 'zero friction at T0' because
[persona] needs 3 extra config steps before their first API call").
* **Zero findings:** if a section has zero findings, state "No issues, moving on"
and proceed. Otherwise, use AskUserQuestion for each gap — a gap with an
"obvious fix" is still a gap and still needs user approval before any change
lands in the plan.
* Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.
## Required Outputs
### Developer Persona Card
The persona card from Step 0A. This goes at the top of the plan's DX section.
### Developer Empathy Narrative
The first-person narrative from Step 0B, updated with user corrections.
### Competitive DX Benchmark
The benchmark table from Step 0C, updated with the product's post-review scores.
### Magical Moment Specification
The chosen delivery vehicle from Step 0D with implementation requirements.
### Developer Journey Map
The journey map from Step 0F, updated with all friction point resolutions.
### First-Time Developer Confusion Report
The roleplay report from Step 0G, annotated with which items were addressed.
### "NOT in scope" section
DX improvements considered and explicitly deferred, with one-line rationale each.
### "What already exists" section
Existing docs, examples, error handling, and DX patterns that the plan should reuse.
### TODOS.md updates
After all review passes are complete, present each potential TODO as its own individual
AskUserQuestion. Never batch. For DX debt: missing error messages, unspecified upgrade
paths, documentation gaps, missing SDK languages. Each TODO gets:
* **What:** One-line description
* **Why:** The concrete developer pain it causes
* **Pros:** What you gain (adoption, retention, satisfaction)
* **Cons:** Cost, complexity, or risks
* **Context:** Enough detail for someone to pick this up in 3 months
* **Depends on / blocked by:** Prerequisites
Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now
### DX Scorecard
```
+====================================================================+
| DX PLAN REVIEW — SCORECARD |
+====================================================================+
| Dimension | Score | Prior | Trend |
|----------------------|--------|--------|--------|
| Getting Started | __/10 | __/10 | __ ↑↓ |
| API/CLI/SDK | __/10 | __/10 | __ ↑↓ |
| Error Messages | __/10 | __/10 | __ ↑↓ |
| Documentation | __/10 | __/10 | __ ↑↓ |
| Upgrade Path | __/10 | __/10 | __ ↑↓ |
| Dev Environment | __/10 | __/10 | __ ↑↓ |
| Community | __/10 | __/10 | __ ↑↓ |
| DX Measurement | __/10 | __/10 | __ ↑↓ |
+--------------------------------------------------------------------+
| TTHW | __ min | __ min | __ ↑↓ |
| Competitive Rank | [Champion/Competitive/Needs Work/Red Flag] |
| Magical Moment | [designed/missing] via [delivery vehicle] |
| Product Type | [type] |
| Mode | [EXPANSION/POLISH/TRIAGE] |
| Overall DX | __/10 | __/10 | __ ↑↓ |
+====================================================================+
| DX PRINCIPLE COVERAGE |
| Zero Friction | [covered/gap] |
| Learn by Doing | [covered/gap] |
| Fight Uncertainty | [covered/gap] |
| Opinionated + Escape Hatches | [covered/gap] |
| Code in Context | [covered/gap] |
| Magical Moments | [covered/gap] |
+====================================================================+
```
If all passes 8+: "DX plan is solid. Developers will have a good experience."
If any below 6: Flag as critical DX debt with specific impact on adoption.
If TTHW > 10 min: Flag as blocking issue.
### DX Implementation Checklist
```
DX IMPLEMENTATION CHECKLIST
============================
[ ] Time to hello world < [target from 0C]
[ ] Installation is one command
[ ] First run produces meaningful output
[ ] Magical moment delivered via [vehicle from 0D]
[ ] Every error message has: problem + cause + fix + docs link
[ ] API/CLI naming is guessable without docs
[ ] Every parameter has a sensible default
[ ] Docs have copy-paste examples that actually work
[ ] Examples show real use cases, not just hello world
[ ] Upgrade path documented with migration guide
[ ] Breaking changes have deprecation warnings + codemods
[ ] TypeScript types included (if applicable)
[ ] Works in CI/CD without special configuration
[ ] Free tier available, no credit card required
[ ] Changelog exists and is maintained
[ ] Search works in documentation
[ ] Community channel exists and is monitored
```
## Implementation Tasks
Before closing this review, synthesize the findings above into a flat list of
build-actionable tasks. Each task derives from a specific finding — no padding.
Emit the markdown section AND write a JSONL artifact that `/autoplan` can
aggregate across phases.
### Markdown section (always emit)
```markdown
## Implementation Tasks
Synthesized from this review's findings. Each task derives from a specific
finding above. Run with Claude Code or Codex; checkbox as you ship.
- [ ] **T1 (P1, human: ~2h / CC: ~15min)**<component><imperative title>
- Surfaced by: <section name><specific finding text or line reference>
- Files: <paths to touch>
- Verify: <test command or manual check>
- [ ] **T2 (P2, human: ~30min / CC: ~5min)** — ...
```
Rules:
- P1 blocks ship; P2 should land same branch; P3 is a follow-up TODO.
- If a finding produced no actionable task, do not invent one.
- If a section had zero findings, emit `_No new tasks from <section>._`
- Effort uses the AI-compression table from CLAUDE.md.
### JSONL artifact (always write, even if zero tasks)
`/autoplan` reads this file to aggregate across phases. Build each line with
`jq -nc` so titles and source findings containing quotes, newlines, or
backslashes serialize cleanly — never use hand-rolled `echo` / `printf`.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
TASKS_DIR="${HOME}/.gstack/projects/${SLUG:-unknown}"
mkdir -p "$TASKS_DIR"
TASKS_FILE="$TASKS_DIR/tasks-devex-review-$(date +%Y%m%d-%H%M%S).jsonl"
COMMIT=$(git rev-parse HEAD 2>/dev/null || echo unknown)
BRANCH=$(git branch --show-current 2>/dev/null || echo unknown)
RUN_ID="$(date -u +%Y%m%dT%H%M%SZ)-$$"
# Repeat ONE jq invocation per task identified during this review.
# Substitute the placeholders inline with shell variables you set per task:
# TASK_ID (T1, T2, ...), PRIORITY (P1/P2/P3), COMPONENT, TITLE,
# SOURCE_FINDING, EFFORT_HUMAN, EFFORT_CC, FILES_JSON (a JSON array literal
# like '["browse/src/sanitize.ts","browse/src/server.ts"]').
jq -nc \
--arg phase 'devex-review' \
--arg run_id "$RUN_ID" \
--arg branch "$BRANCH" \
--arg commit "$COMMIT" \
--arg id "$TASK_ID" \
--arg priority "$PRIORITY" \
--arg component "$COMPONENT" \
--arg effort_human "$EFFORT_HUMAN" \
--arg effort_cc "$EFFORT_CC" \
--arg title "$TITLE" \
--arg source_finding "$SOURCE_FINDING" \
--argjson files "$FILES_JSON" \
'{phase:$phase, run_id:$run_id, branch:$branch, commit:$commit, id:$id, priority:$priority, component:$component, files:$files, effort_human:$effort_human, effort_cc:$effort_cc, title:$title, source_finding:$source_finding}' \
>> "$TASKS_FILE"
```
If `jq` is not installed, fall back to skipping the JSONL write and warn
the user to install jq for autoplan aggregation. Never hand-roll JSONL.
If zero tasks were identified in this review, still touch the JSONL file
(`: > "$TASKS_FILE"`) so the aggregator sees that the phase produced output
this run (an empty file means "ran, no findings" — distinct from "didn't run").
### Unresolved Decisions
If any AskUserQuestion goes unanswered, note here. Never silently default.
## Review Readiness Dashboard
After completing the review, read the review log and config to display the dashboard.
```bash
~/.claude/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
```
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | — | — | no |
| Design Review | 0 | — | — | no |
| Adversarial | 0 | — | — | no |
| Outside Voice | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed |
+====================================================================+
```
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days from either \`review\` or \`plan-eng-review\` with status "clean" (or \`skip_eng_review\` is \`true\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO, Design, and Codex reviews are shown for context but never block shipping
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
**Staleness detection:** After displaying the dashboard, check if any existing reviews may be stale:
- Parse the \`---HEAD---\` section from the bash output to get the current HEAD commit hash
- For each review entry that has a \`commit\` field: compare it against the current HEAD. If different, count elapsed commits: \`git rev-list --count STORED_COMMIT..HEAD\`. Display: "Note: {skill} review from {date} may be stale — {N} commits since review"
- For entries without a \`commit\` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
- If all reviews match the current HEAD, do not display any staleness notes
## Plan File Review Report
After displaying the Review Readiness Dashboard in conversation output, also update the
**plan file** itself so review status is visible to anyone reading the plan.
### Detect the plan file
1. Check if there is an active plan file in this conversation (the host provides plan file
paths in system messages — look for plan file references in the conversation context).
2. If not found, skip this section silently — not every review runs in plan mode.
### Generate the report
Read the review log output you already have from the Review Readiness Dashboard step above.
Parse each JSONL entry. Each skill logs different fields:
- **plan-ceo-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`mode\`, \`scope_proposed\`, \`scope_accepted\`, \`scope_deferred\`, \`commit\`
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
- **plan-eng-review**: \`status\`, \`unresolved\`, \`critical_gaps\`, \`issues_found\`, \`mode\`, \`commit\`
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **plan-devex-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`product_type\`, \`tthw_current\`, \`tthw_target\`, \`mode\`, \`persona\`, \`competitive_tier\`, \`unresolved\`, \`commit\`
→ Findings: "score: {initial_score}/10 → {overall_score}/10, TTHW: {tthw_current} → {tthw_target}"
- **devex-review**: \`status\`, \`overall_score\`, \`product_type\`, \`tthw_measured\`, \`dimensions_tested\`, \`dimensions_inferred\`, \`boomerang\`, \`commit\`
→ Findings: "score: {overall_score}/10, TTHW: {tthw_measured}, {dimensions_tested} tested/{dimensions_inferred} inferred"
- **codex-review**: \`status\`, \`gate\`, \`findings\`, \`findings_fixed\`
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries.
For the review you just completed, you may use richer details from your own Completion
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
\`\`\`markdown
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | \`/plan-ceo-review\` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | \`/codex review\` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | \`/plan-eng-review\` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | \`/plan-design-review\` | UI/UX gaps | {runs} | {status} | {findings} |
| DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} |
\`\`\`
Below the table, add these lines (omit any that are empty/not applicable):
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
- **UNRESOLVED:** total unresolved decisions across all reviews
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
### Write to the plan file
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
The report must always be the LAST section of the plan file — never mid-file.
Use a single delete-then-append flow:
1. Read the plan file (Read tool) to see its full current content. Search the read
output for a \`## GSTACK REVIEW REPORT\` heading anywhere in the file.
2. If found, use the Edit tool to DELETE the entire existing section. Match from
\`## GSTACK REVIEW REPORT\` through either the next \`## \` heading or end of
file, whichever comes first. Replace with the empty string. This applies
regardless of where the section currently lives — mid-file deletion is
intentional, not a special case. If the Edit fails (e.g., concurrent edit
changed the content), re-read the plan file and retry once.
3. After the delete (or skipped, if no section existed), append the new
\`## GSTACK REVIEW REPORT\` section at the END of the file. Use the Edit
tool to match the file's current last paragraph and add the section after it,
or use Write to re-emit the whole file with the section at the end.
4. Verify with the Read tool that \`## GSTACK REVIEW REPORT\` is the last
\`## \` heading in the file before continuing. If it isn't, repeat steps
2-3 once.
Do NOT replace the section in place. The "replace mid-file" path is what allowed
prior versions to leave the report mid-file when an older report already lived
there — the user then sees a plan whose review report is not at the bottom and
(correctly) rejects it.
## Capture Learnings
If you discovered a non-obvious pattern, pitfall, or architectural insight during
this session, log it for future sessions:
```bash
~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"plan-devex-review","type":"TYPE","key":"SHORT_KEY","insight":"DESCRIPTION","confidence":N,"source":"SOURCE","files":["path/to/relevant/file"]}'
```
**Types:** `pattern` (reusable approach), `pitfall` (what NOT to do), `preference`
(user stated), `architecture` (structural decision), `tool` (library/framework insight),
`operational` (project environment/CLI/workflow knowledge).
**Sources:** `observed` (you found this in the code), `user-stated` (user told you),
`inferred` (AI deduction), `cross-model` (both Claude and Codex agree).
**Confidence:** 1-10. Be honest. An observed pattern you verified in the code is 8-9.
An inference you're not sure about is 4-5. A user preference they explicitly stated is 10.
**files:** Include the specific file paths this learning references. This enables
staleness detection: if those files are later deleted, the learning can be flagged.
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
## Brain Calibration Write-Back (Phase 2 / gated)
When the skill makes a typed prediction worth tracking (scope decision,
TTHW target, architectural bet, wedge commitment), it MAY write a
`kind=bet` take to the brain so a calibration profile builds over time.
**Gated on two things:**
1. Brain trust policy for the active endpoint is `personal` (check via
`~/.claude/skills/gstack/bin/gstack-config get brain_trust_policy@<endpoint-hash>`).
Shared brains skip write-back to avoid polluting team calibration.
2. Feature flag `BRAIN_CALIBRATION_WRITEBACK` is set (today: false; flips
to true when upstream gbrain v0.42+ ships `takes_add` MCP op).
When both gates pass, the write-back path uses `mcp__gbrain__takes_add`
to record a take with weight 0.6 (per SKILL_CALIBRATION_WEIGHTS).
If the MCP op is unavailable, fall back to `mcp__gbrain__put_page` with
a gstack:takes fence block (documented but uglier path).
Mandatory take frontmatter shape:
```yaml
kind: bet
holder: <user identity from whoami>
claim: <one-line prediction the skill is making>
weight: 0.6
since_date: <today's date>
expected_resolution: <date in 1-3 months depending on skill>
source_skill: plan-devex-review
```
After write, invalidate the affected digests so the next preflight reflects
the new state:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" 2>/dev/null || true
~/.claude/skills/gstack/bin/gstack-brain-cache invalidate developer-persona --project "$SLUG" 2>/dev/null || true
```
## Brain Cache Background Refresh
After the skill's work completes (and telemetry has logged), kick a
background refresh of any cache digest that's getting close to its TTL.
This is non-blocking — the user doesn't wait. Next invocation benefits
from the warm cache.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" 2>/dev/null || true
(~/.claude/skills/gstack/bin/gstack-brain-cache refresh --project "$SLUG" 2>/dev/null &) || true
```
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, recommend next reviews:
**Recommend /plan-eng-review if eng review is not skipped globally** — DX issues often
have architectural implications. If this DX review found API design problems, error
handling gaps, or CLI ergonomics issues, eng review should validate the fixes.
**Suggest /plan-design-review if user-facing UI exists** — DX review focuses on
developer-facing surfaces; design review covers end-user-facing UI.
**Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
be [target from 0C]. Did reality match? Run /devex-review on the live product to find
out. This is where the competitive benchmark pays off: you have a concrete target to
measure against.
Use AskUserQuestion with applicable options:
- **A)** Run /plan-eng-review next (required gate)
- **B)** Run /plan-design-review (only if UI scope detected)
- **C)** Ready to implement, run /devex-review after shipping
- **D)** Skip, I'll handle next steps manually
## Mode Quick Reference
```
| DX EXPANSION | DX POLISH | DX TRIAGE
Scope | Push UP (opt-in) | Maintain | Critical only
Posture | Enthusiastic | Rigorous | Surgical
Competitive | Full benchmark | Full benchmark | Skip
Magical | Full design | Verify exists | Skip
Journey | All stages + | All stages | Install + Hello
| best-in-class | | World only
Passes | All 8, expanded | All 8, standard | Pass 1 + 3 only
Outside voice| Recommended | Recommended | Skip
```
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
* One sentence max per option.
* After each pass, pause and wait for feedback before moving on.
* Rate before and after each pass for scannability.

View File

@ -0,0 +1,391 @@
## Review Sections (8 passes, after Step 0 is complete)
**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-8) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so DX passes don't apply" is always wrong — DX gaps are where adoption breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
{{ANTI_SHORTCUT_CLAUSE}}
{{LEARNINGS_SEARCH}}
### DX Trend Check
Before starting review passes, check for prior DX reviews on this project:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null | grep plan-devex-review || echo "NO_PRIOR_DX_REVIEWS"
```
If prior reviews exist, display the trend:
```
DX TREND (prior reviews):
Dimension | Prior Score | Notes
Getting Started | 4/10 | from 2026-03-15
...
```
### Pass 1: Getting Started Experience (Zero Friction)
Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
**Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
magical moment from 0D (delivery vehicle), and any Install/Hello World friction
points from 0F.
Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Installation**: One command? One click? No prerequisites?
- **First run**: Does the first command produce visible, meaningful output?
- **Sandbox/Playground**: Can developers try before installing?
- **Free tier**: No credit card, no sales call, no company email?
- **Quick start guide**: Copy-paste complete? Shows real output?
- **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
- **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
expected output, and time budget per step. Target: 3 steps or fewer, under the
time chosen in 0C.
Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
in one terminal session without leaving the terminal?
**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
### Pass 2: API/CLI/SDK Design (Usable + Useful)
Rate 0-10: Is the interface intuitive, consistent, and complete?
**Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
A YC founder expects `tool.do(thing)`. A platform engineer expects
`tool.configure(options).execute(thing)`.
Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Naming**: Guessable without docs? Consistent grammar?
- **Defaults**: Every parameter has a sensible default? Simplest call gives useful result?
- **Consistency**: Same patterns across the entire API surface?
- **Completeness**: 100% coverage or do devs drop to raw HTTP for edge cases?
- **Discoverability**: Can devs explore from CLI/playground without docs?
- **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
- **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
- **Persona fit**: Does the interface match how [persona] thinks about the problem?
Good API design test: Can a [persona] use this API correctly after seeing one example?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 3: Error Messages & Debugging (Fight Uncertainty)
Rate 0-10: When something goes wrong, does the developer know what happened, why,
and how to fix it?
**Evidence recall:** Reference any error-related friction points from 0F and confusion
points from 0G.
Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
the three-tier system from the Hall of Fame:
- **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
- **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
- **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
For each error path, show what the developer currently sees vs. what they should see.
Also evaluate:
- **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
- **Debug mode**: Verbose output available?
- **Stack traces**: Useful or internal framework noise?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 4: Documentation & Learning (Findable + Learn by Doing)
Rate 0-10: Can a developer find what they need and learn by doing?
**Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
style? A YC founder needs copy-paste examples front and center. A platform engineer
needs architecture docs and API reference.
Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Information architecture**: Find what they need in under 2 minutes?
- **Progressive disclosure**: Beginners see simple, experts find advanced?
- **Code examples**: Copy-paste complete? Work as-is? Real context?
- **Interactive elements**: Playgrounds, sandboxes, "try it" buttons?
- **Versioning**: Docs match the version dev is using?
- **Tutorials vs references**: Both exist?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 5: Upgrade & Migration Path (Credible)
Rate 0-10: Can developers upgrade without fear?
Load reference: Read the "## Pass 5" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Backward compatibility**: What breaks? Blast radius limited?
- **Deprecation warnings**: Advance notice? Actionable? ("use newMethod() instead")
- **Migration guides**: Step-by-step for every breaking change?
- **Codemods**: Automated migration scripts?
- **Versioning strategy**: Semantic versioning? Clear policy?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 6: Developer Environment & Tooling (Valuable + Accessible)
Rate 0-10: Does this integrate into developers' existing workflows?
**Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
environment?
Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Editor integration**: Language server? Autocomplete? Inline docs?
- **CI/CD**: Works in GitHub Actions, GitLab CI? Non-interactive mode?
- **TypeScript support**: Types included? Good IntelliSense?
- **Testing support**: Easy to mock? Test utilities?
- **Local development**: Hot reload? Watch mode? Fast feedback?
- **Cross-platform**: Mac, Linux, Windows? Docker? ARM/x86?
- **Local env reproducibility**: Works across OS, package managers, containers, proxies?
- **Observability/testability**: Dry-run mode? Verbose output? Sample apps? Fixtures?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 7: Community & Ecosystem (Findable + Desirable)
Rate 0-10: Is there a community, and does the plan invest in ecosystem health?
Load reference: Read the "## Pass 7" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **Open source**: Code open? Permissive license?
- **Community channels**: Where do devs ask questions? Someone answering?
- **Examples**: Real-world, runnable? Not just hello world?
- **Plugin/extension ecosystem**: Can devs extend it?
- **Contributing guide**: Process clear?
- **Pricing transparency**: No surprise bills?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Pass 8: DX Measurement & Feedback Loops (Implement + Refine)
Rate 0-10: Does the plan include ways to measure and improve DX over time?
Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Evaluate:
- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
- **Journey analytics**: Where do devs drop off?
- **Feedback mechanisms**: Bug reports? NPS? Feedback button?
- **Friction audits**: Periodic reviews planned?
- **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
**STOP.** AskUserQuestion once per issue. Recommend + WHY.
### Appendix: Claude Code Skill DX Checklist
**Conditional: only run when product type includes "Claude Code skill".**
This is NOT a scored pass. It's a checklist of proven patterns from gstack's own DX.
Load reference: Read the "## Claude Code Skill DX Checklist" section from
`~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
Check each item. For any unchecked item, explain what's missing and suggest the fix.
**STOP.** AskUserQuestion for any item that requires a design decision.
{{CODEX_PLAN_REVIEW}}
When constructing the outside voice prompt, include the Developer Persona from Step 0A
and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
in the context of who is using it and what they're competing against.
## CRITICAL RULE — How to ask questions
Follow the AskUserQuestion format from the Preamble above. Additional rules for
DX reviews:
* **One issue = one AskUserQuestion call.** Never combine multiple issues.
* **Ground every question in evidence.** Reference the persona, competitive benchmark,
empathy narrative, or friction trace. Never ask a question in the abstract.
* **Frame pain from the persona's perspective.** Not "developers would be frustrated"
but "[persona from 0A] would hit this at minute [N] of their getting-started flow
and [specific consequence: abandon, file an issue, hack a workaround]."
* Present 2-3 options. For each: effort to fix, impact on developer adoption.
* **Map to DX First Principles above.** One sentence connecting your recommendation
to a specific principle (e.g., "This violates 'zero friction at T0' because
[persona] needs 3 extra config steps before their first API call").
* **Zero findings:** if a section has zero findings, state "No issues, moving on"
and proceed. Otherwise, use AskUserQuestion for each gap — a gap with an
"obvious fix" is still a gap and still needs user approval before any change
lands in the plan.
* Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.
## Required Outputs
### Developer Persona Card
The persona card from Step 0A. This goes at the top of the plan's DX section.
### Developer Empathy Narrative
The first-person narrative from Step 0B, updated with user corrections.
### Competitive DX Benchmark
The benchmark table from Step 0C, updated with the product's post-review scores.
### Magical Moment Specification
The chosen delivery vehicle from Step 0D with implementation requirements.
### Developer Journey Map
The journey map from Step 0F, updated with all friction point resolutions.
### First-Time Developer Confusion Report
The roleplay report from Step 0G, annotated with which items were addressed.
### "NOT in scope" section
DX improvements considered and explicitly deferred, with one-line rationale each.
### "What already exists" section
Existing docs, examples, error handling, and DX patterns that the plan should reuse.
### TODOS.md updates
After all review passes are complete, present each potential TODO as its own individual
AskUserQuestion. Never batch. For DX debt: missing error messages, unspecified upgrade
paths, documentation gaps, missing SDK languages. Each TODO gets:
* **What:** One-line description
* **Why:** The concrete developer pain it causes
* **Pros:** What you gain (adoption, retention, satisfaction)
* **Cons:** Cost, complexity, or risks
* **Context:** Enough detail for someone to pick this up in 3 months
* **Depends on / blocked by:** Prerequisites
Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now
### DX Scorecard
```
+====================================================================+
| DX PLAN REVIEW — SCORECARD |
+====================================================================+
| Dimension | Score | Prior | Trend |
|----------------------|--------|--------|--------|
| Getting Started | __/10 | __/10 | __ ↑↓ |
| API/CLI/SDK | __/10 | __/10 | __ ↑↓ |
| Error Messages | __/10 | __/10 | __ ↑↓ |
| Documentation | __/10 | __/10 | __ ↑↓ |
| Upgrade Path | __/10 | __/10 | __ ↑↓ |
| Dev Environment | __/10 | __/10 | __ ↑↓ |
| Community | __/10 | __/10 | __ ↑↓ |
| DX Measurement | __/10 | __/10 | __ ↑↓ |
+--------------------------------------------------------------------+
| TTHW | __ min | __ min | __ ↑↓ |
| Competitive Rank | [Champion/Competitive/Needs Work/Red Flag] |
| Magical Moment | [designed/missing] via [delivery vehicle] |
| Product Type | [type] |
| Mode | [EXPANSION/POLISH/TRIAGE] |
| Overall DX | __/10 | __/10 | __ ↑↓ |
+====================================================================+
| DX PRINCIPLE COVERAGE |
| Zero Friction | [covered/gap] |
| Learn by Doing | [covered/gap] |
| Fight Uncertainty | [covered/gap] |
| Opinionated + Escape Hatches | [covered/gap] |
| Code in Context | [covered/gap] |
| Magical Moments | [covered/gap] |
+====================================================================+
```
If all passes 8+: "DX plan is solid. Developers will have a good experience."
If any below 6: Flag as critical DX debt with specific impact on adoption.
If TTHW > 10 min: Flag as blocking issue.
### DX Implementation Checklist
```
DX IMPLEMENTATION CHECKLIST
============================
[ ] Time to hello world < [target from 0C]
[ ] Installation is one command
[ ] First run produces meaningful output
[ ] Magical moment delivered via [vehicle from 0D]
[ ] Every error message has: problem + cause + fix + docs link
[ ] API/CLI naming is guessable without docs
[ ] Every parameter has a sensible default
[ ] Docs have copy-paste examples that actually work
[ ] Examples show real use cases, not just hello world
[ ] Upgrade path documented with migration guide
[ ] Breaking changes have deprecation warnings + codemods
[ ] TypeScript types included (if applicable)
[ ] Works in CI/CD without special configuration
[ ] Free tier available, no credit card required
[ ] Changelog exists and is maintained
[ ] Search works in documentation
[ ] Community channel exists and is monitored
```
{{TASKS_SECTION_EMIT:devex-review}}
### Unresolved Decisions
If any AskUserQuestion goes unanswered, note here. Never silently default.
{{REVIEW_DASHBOARD}}
{{PLAN_FILE_REVIEW_REPORT}}
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
{{BRAIN_WRITE_BACK}}
{{BRAIN_CACHE_REFRESH}}
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, recommend next reviews:
**Recommend /plan-eng-review if eng review is not skipped globally** — DX issues often
have architectural implications. If this DX review found API design problems, error
handling gaps, or CLI ergonomics issues, eng review should validate the fixes.
**Suggest /plan-design-review if user-facing UI exists** — DX review focuses on
developer-facing surfaces; design review covers end-user-facing UI.
**Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
be [target from 0C]. Did reality match? Run /devex-review on the live product to find
out. This is where the competitive benchmark pays off: you have a concrete target to
measure against.
Use AskUserQuestion with applicable options:
- **A)** Run /plan-eng-review next (required gate)
- **B)** Run /plan-design-review (only if UI scope detected)
- **C)** Ready to implement, run /devex-review after shipping
- **D)** Skip, I'll handle next steps manually
## Mode Quick Reference
```
| DX EXPANSION | DX POLISH | DX TRIAGE
Scope | Push UP (opt-in) | Maintain | Critical only
Posture | Enthusiastic | Rigorous | Surgical
Competitive | Full benchmark | Full benchmark | Skip
Magical | Full design | Verify exists | Skip
Journey | All stages + | All stages | Install + Hello
| best-in-class | | World only
Passes | All 8, expanded | All 8, standard | Pass 1 + 3 only
Outside voice| Recommended | Recommended | Skip
```
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
* One sentence max per option.
* After each pass, pause and wait for feedback before moving on.
* Rate before and after each pass for scannability.

View File

@ -163,7 +163,7 @@ describe('SKILL.md size budget regression (gate, free)', () => {
// because prose moved into sections/*.md. The union size is guarded instead
// by the sectioned ship invariant in parity-harness.ts (minBytes on the
// skeleton+sections union), so exempt the skeleton from the body-strip floor.
const SECTIONS_EXTRACTED = new Set<string>(['ship', 'plan-ceo-review', 'office-hours', 'plan-eng-review', 'plan-design-review']);
const SECTIONS_EXTRACTED = new Set<string>(['ship', 'plan-ceo-review', 'office-hours', 'plan-eng-review', 'plan-design-review', 'plan-devex-review']);
const undershoots: Array<{
skill: string; beforeBytes: number; afterBytes: number; ratio: number;