diff --git a/docs/skills.md b/docs/skills.md index 3749fd89c..1ef0f6ae9 100644 --- a/docs/skills.md +++ b/docs/skills.md @@ -5,6 +5,7 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples. | Skill | Your specialist | What they do | |-------|----------------|--------------| | [`/office-hours`](#office-hours) | **YC Office Hours** | Start here. Six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives. Design doc feeds into every downstream skill. | +| [`/spec`](#spec) | **Spec Author** | Turn vague intent into a precise, executable spec in five phases. Backlog-ready output that downstream skills can pick up. Optional agent spawn at the end. | | [`/plan-ceo-review`](#plan-ceo-review) | **CEO / Founder** | Rethink the problem. Find the 10-star product hiding inside the request. Four modes: Expansion, Selective Expansion, Hold Scope, Reduction. | | [`/plan-eng-review`](#plan-eng-review) | **Eng Manager** | Lock in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open. | | [`/plan-design-review`](#plan-design-review) | **Senior Designer** | Interactive plan-mode design review. Rates each dimension 0-10, explains what a 10 looks like, fixes the plan. Works in plan mode. | diff --git a/spec/SKILL.md b/spec/SKILL.md index a125a8792..3e7187d18 100644 --- a/spec/SKILL.md +++ b/spec/SKILL.md @@ -335,7 +335,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC Net line closes the tradeoff. Per-skill instructions may add stricter rules. -12. **Non-ASCII characters — write directly, never \u-escape.** When any +### Handling 5+ options — split, never drop + +AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER +drop, merge, or silently defer one to fit. Pick a compliant shape: + +- **Batch into ≤4-groups** — for coherent alternatives (e.g. 
version bumps, + layout variants). One call, 5th surfaced only if first 4 don't fit. +- **Split per-option** — for independent scope items (e.g. "ship E1..E6?"). + Fire N sequential calls, one per option. Default to this when unsure. + +Per-option call shape: `D.k` header (e.g. D3.1..D3.5), ELI10 per option, +Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are +decision actions), and 4 buckets: +**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss). + +After the chain, fire `D.final` to validate the assembled set (reprompt +dependency conflicts) and confirm shipping it. Use `D.revise-` to +revise one option without re-running the chain. + +For N>6, fire a `D.0` meta-AskUserQuestion first (proceed / narrow / batch). + +question_ids for split chains: `-split-` (kebab-case ASCII, +≤64 chars, `-2`/`-3` suffix on collision). The runtime checker +(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id, +so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred. + +**Full rule + worked examples + Hold/dependency semantics:** see +`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4. + +**Non-ASCII characters — write directly, never \u-escape.** When any string field (question, option label, option description) contains Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit the literal UTF-8 characters in the JSON string. 
**Never escape them @@ -368,6 +397,9 @@ Before calling AskUserQuestion, verify: - [ ] Net line closes the decision - [ ] You are calling the tool, not writing prose - [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped +- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any +- [ ] If you split, you checked dependencies between options before firing the chain +- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue) ## Artifacts Sync (skill start) @@ -1242,7 +1274,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC Net line closes the tradeoff. Per-skill instructions may add stricter rules. -12. **Non-ASCII characters — write directly, never \u-escape.** When any +### Handling 5+ options — split, never drop + +AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER +drop, merge, or silently defer one to fit. Pick a compliant shape: + +- **Batch into ≤4-groups** — for coherent alternatives (e.g. version bumps, + layout variants). One call, 5th surfaced only if first 4 don't fit. +- **Split per-option** — for independent scope items (e.g. "ship E1..E6?"). + Fire N sequential calls, one per option. Default to this when unsure. + +Per-option call shape: `D.k` header (e.g. D3.1..D3.5), ELI10 per option, +Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are +decision actions), and 4 buckets: +**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss). + +After the chain, fire `D.final` to validate the assembled set (reprompt +dependency conflicts) and confirm shipping it. Use `D.revise-` to +revise one option without re-running the chain. + +For N>6, fire a `D.0` meta-AskUserQuestion first (proceed / narrow / batch). + +question_ids for split chains: `-split-` (kebab-case ASCII, +≤64 chars, `-2`/`-3` suffix on collision). 
The runtime checker +(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id, +so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred. + +**Full rule + worked examples + Hold/dependency semantics:** see +`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4. + +**Non-ASCII characters — write directly, never \u-escape.** When any string field (question, option label, option description) contains Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit the literal UTF-8 characters in the JSON string. **Never escape them @@ -1275,6 +1336,9 @@ Before calling AskUserQuestion, verify: - [ ] Net line closes the decision - [ ] You are calling the tool, not writing prose - [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped +- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any +- [ ] If you split, you checked dependencies between options before firing the chain +- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue) ## Artifacts Sync (skill start) diff --git a/test/fixtures/golden/claude-ship-SKILL.md b/test/fixtures/golden/claude-ship-SKILL.md index 21dbcb694..9611072f7 100644 --- a/test/fixtures/golden/claude-ship-SKILL.md +++ b/test/fixtures/golden/claude-ship-SKILL.md @@ -107,6 +107,19 @@ _CHECKPOINT_MODE=$(~/.claude/skills/gstack/bin/gstack-config get checkpoint_mode _CHECKPOINT_PUSH=$(~/.claude/skills/gstack/bin/gstack-config get checkpoint_push 2>/dev/null || echo "false") echo "CHECKPOINT_MODE: $_CHECKPOINT_MODE" echo "CHECKPOINT_PUSH: $_CHECKPOINT_PUSH" +# Plan-mode hint for skills like /spec that branch behavior on plan-mode state. +# Claude Code exposes plan mode via system reminders; we detect best-effort +# from CLAUDE_PLAN_FILE (set by the harness when plan mode is active) and +# fall back to "inactive". 
Codex hosts and Claude execution mode both end up +# inactive, which is the safe default (defaults to file+execute pipeline). +if [ -n "${CLAUDE_PLAN_FILE:-}${GSTACK_PLAN_MODE_FORCE:-}" ]; then + export GSTACK_PLAN_MODE="active" +elif [ "${GSTACK_PLAN_MODE:-}" = "active" ]; then + export GSTACK_PLAN_MODE="active" +else + export GSTACK_PLAN_MODE="inactive" +fi +echo "GSTACK_PLAN_MODE: $GSTACK_PLAN_MODE" [ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true ``` @@ -238,6 +251,7 @@ Key routing rules: - Ship/deploy/PR → invoke /ship or /land-and-deploy - Save progress → invoke /context-save - Resume context → invoke /context-restore +- Author a backlog-ready spec/issue → invoke /spec ``` Then commit the change: `git add CLAUDE.md && git commit -m "chore: add gstack skill routing rules to CLAUDE.md"` @@ -2958,6 +2972,39 @@ you missed it.> +## Linked Spec +-$$, the spawned worktree IS where /ship runs). + SPEC_FILE=$(grep -l "^spec_branch: $CURRENT_BRANCH$" "$SPEC_ARCHIVES"/*.md 2>/dev/null | head -1) + [ -z "$SPEC_FILE" ] && exit # no spec; omit this section entirely + SPEC_ISSUE=$(grep "^spec_issue_number:" "$SPEC_FILE" | cut -d' ' -f2) + [ -z "$SPEC_ISSUE" ] && exit # spec archive exists but no issue number; omit + + # CONDITIONAL Closes #N (codex F4): only add when Plan Completion above is "complete". + # If the plan completion gate from Step 8 reports any deferred or failed items, emit: + # "Linked to #$SPEC_ISSUE (partial delivery — NOT auto-closing; close manually after follow-up)" + # If Plan Completion is fully complete, emit: + # "Closes #$SPEC_ISSUE" + # and include the Closes #N line in the PR body so GitHub auto-closes on merge.> + + + + This PR delivers the spec at . + Spec filed: > + + (partial delivery — not auto-closing). + Deferred items: . 
+ Close # manually after follow-up lands.> + + + ## Verification Results diff --git a/test/fixtures/golden/codex-ship-SKILL.md b/test/fixtures/golden/codex-ship-SKILL.md index 8b736a3bc..8eaaee369 100644 --- a/test/fixtures/golden/codex-ship-SKILL.md +++ b/test/fixtures/golden/codex-ship-SKILL.md @@ -93,6 +93,19 @@ _CHECKPOINT_MODE=$($GSTACK_BIN/gstack-config get checkpoint_mode 2>/dev/null || _CHECKPOINT_PUSH=$($GSTACK_BIN/gstack-config get checkpoint_push 2>/dev/null || echo "false") echo "CHECKPOINT_MODE: $_CHECKPOINT_MODE" echo "CHECKPOINT_PUSH: $_CHECKPOINT_PUSH" +# Plan-mode hint for skills like /spec that branch behavior on plan-mode state. +# Claude Code exposes plan mode via system reminders; we detect best-effort +# from CLAUDE_PLAN_FILE (set by the harness when plan mode is active) and +# fall back to "inactive". Codex hosts and Claude execution mode both end up +# inactive, which is the safe default (defaults to file+execute pipeline). +if [ -n "${CLAUDE_PLAN_FILE:-}${GSTACK_PLAN_MODE_FORCE:-}" ]; then + export GSTACK_PLAN_MODE="active" +elif [ "${GSTACK_PLAN_MODE:-}" = "active" ]; then + export GSTACK_PLAN_MODE="active" +else + export GSTACK_PLAN_MODE="inactive" +fi +echo "GSTACK_PLAN_MODE: $GSTACK_PLAN_MODE" [ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true ``` @@ -224,6 +237,7 @@ Key routing rules: - Ship/deploy/PR → invoke /ship or /land-and-deploy - Save progress → invoke /context-save - Resume context → invoke /context-restore +- Author a backlog-ready spec/issue → invoke /spec ``` Then commit the change: `git add CLAUDE.md && git commit -m "chore: add gstack skill routing rules to CLAUDE.md"` @@ -2568,6 +2582,39 @@ you missed it.> +## Linked Spec +-$$, the spawned worktree IS where /ship runs). 
+ SPEC_FILE=$(grep -l "^spec_branch: $CURRENT_BRANCH$" "$SPEC_ARCHIVES"/*.md 2>/dev/null | head -1) + [ -z "$SPEC_FILE" ] && exit # no spec; omit this section entirely + SPEC_ISSUE=$(grep "^spec_issue_number:" "$SPEC_FILE" | cut -d' ' -f2) + [ -z "$SPEC_ISSUE" ] && exit # spec archive exists but no issue number; omit + + # CONDITIONAL Closes #N (codex F4): only add when Plan Completion above is "complete". + # If the plan completion gate from Step 8 reports any deferred or failed items, emit: + # "Linked to #$SPEC_ISSUE (partial delivery — NOT auto-closing; close manually after follow-up)" + # If Plan Completion is fully complete, emit: + # "Closes #$SPEC_ISSUE" + # and include the Closes #N line in the PR body so GitHub auto-closes on merge.> + + + + This PR delivers the spec at . + Spec filed: > + + (partial delivery — not auto-closing). + Deferred items: . + Close # manually after follow-up lands.> + + + ## Verification Results diff --git a/test/fixtures/golden/factory-ship-SKILL.md b/test/fixtures/golden/factory-ship-SKILL.md index 41ad93177..343768d89 100644 --- a/test/fixtures/golden/factory-ship-SKILL.md +++ b/test/fixtures/golden/factory-ship-SKILL.md @@ -95,6 +95,19 @@ _CHECKPOINT_MODE=$($GSTACK_BIN/gstack-config get checkpoint_mode 2>/dev/null || _CHECKPOINT_PUSH=$($GSTACK_BIN/gstack-config get checkpoint_push 2>/dev/null || echo "false") echo "CHECKPOINT_MODE: $_CHECKPOINT_MODE" echo "CHECKPOINT_PUSH: $_CHECKPOINT_PUSH" +# Plan-mode hint for skills like /spec that branch behavior on plan-mode state. +# Claude Code exposes plan mode via system reminders; we detect best-effort +# from CLAUDE_PLAN_FILE (set by the harness when plan mode is active) and +# fall back to "inactive". Codex hosts and Claude execution mode both end up +# inactive, which is the safe default (defaults to file+execute pipeline). 
+if [ -n "${CLAUDE_PLAN_FILE:-}${GSTACK_PLAN_MODE_FORCE:-}" ]; then + export GSTACK_PLAN_MODE="active" +elif [ "${GSTACK_PLAN_MODE:-}" = "active" ]; then + export GSTACK_PLAN_MODE="active" +else + export GSTACK_PLAN_MODE="inactive" +fi +echo "GSTACK_PLAN_MODE: $GSTACK_PLAN_MODE" [ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true ``` @@ -226,6 +239,7 @@ Key routing rules: - Ship/deploy/PR → invoke /ship or /land-and-deploy - Save progress → invoke /context-save - Resume context → invoke /context-restore +- Author a backlog-ready spec/issue → invoke /spec ``` Then commit the change: `git add CLAUDE.md && git commit -m "chore: add gstack skill routing rules to CLAUDE.md"` @@ -2946,6 +2960,39 @@ you missed it.> +## Linked Spec +-$$, the spawned worktree IS where /ship runs). + SPEC_FILE=$(grep -l "^spec_branch: $CURRENT_BRANCH$" "$SPEC_ARCHIVES"/*.md 2>/dev/null | head -1) + [ -z "$SPEC_FILE" ] && exit # no spec; omit this section entirely + SPEC_ISSUE=$(grep "^spec_issue_number:" "$SPEC_FILE" | cut -d' ' -f2) + [ -z "$SPEC_ISSUE" ] && exit # spec archive exists but no issue number; omit + + # CONDITIONAL Closes #N (codex F4): only add when Plan Completion above is "complete". + # If the plan completion gate from Step 8 reports any deferred or failed items, emit: + # "Linked to #$SPEC_ISSUE (partial delivery — NOT auto-closing; close manually after follow-up)" + # If Plan Completion is fully complete, emit: + # "Closes #$SPEC_ISSUE" + # and include the Closes #N line in the PR body so GitHub auto-closes on merge.> + + + + This PR delivers the spec at . + Spec filed: > + + (partial delivery — not auto-closing). + Deferred items: . 
+ Close # manually after follow-up lands.> + + + ## Verification Results diff --git a/test/fixtures/parity-baseline-v1.47.0.0.json b/test/fixtures/parity-baseline-v1.47.0.0.json new file mode 100644 index 000000000..aad9c538e --- /dev/null +++ b/test/fixtures/parity-baseline-v1.47.0.0.json @@ -0,0 +1,633 @@ +{ + "tag": "v1.47.0.0", + "capturedAt": "2026-05-27T05:50:57.656Z", + "capturedFromCommit": "e08e5fa8", + "capturedFromBranch": "garrytan/askuserquestion-split-on-overflow", + "totalSkills": 52, + "totalCorpusBytes": 3090887, + "estTotalCatalogTokens": 4116, + "topHeaviest": [ + { + "skill": "ship", + "skillMdBytes": 166782, + "skillMdLines": 3099, + "estTokens": 41696, + "tmplBytes": 50495, + "descriptionLen": 291, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "plan-ceo-review", + "skillMdBytes": 132488, + "skillMdLines": 2197, + "estTokens": 33122, + "tmplBytes": 63393, + "descriptionLen": 794, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "office-hours", + "skillMdBytes": 112842, + "skillMdLines": 2066, + "estTokens": 28211, + "tmplBytes": 55466, + "descriptionLen": 860, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "plan-design-review", + "skillMdBytes": 107855, + "skillMdLines": 1928, + "estTokens": 26964, + "tmplBytes": 28624, + "descriptionLen": 218, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "plan-devex-review", + "skillMdBytes": 106167, + "skillMdLines": 2119, + "estTokens": 26542, + "tmplBytes": 35680, + "descriptionLen": 250, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "plan-eng-review", + "skillMdBytes": 103009, + "skillMdLines": 1762, + "estTokens": 25752, + "tmplBytes": 26234, + "descriptionLen": 231, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "spec", + "skillMdBytes": 102629, + "skillMdLines": 2141, + "estTokens": 25657, + "tmplBytes": 28429, + "descriptionLen": 282, + "hasGateEval": true, + 
"hasPeriodicEval": false + }, + { + "skill": "design-review", + "skillMdBytes": 95654, + "skillMdLines": 1932, + "estTokens": 23914, + "tmplBytes": 11674, + "descriptionLen": 304, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "review", + "skillMdBytes": 94048, + "skillMdLines": 1762, + "estTokens": 23512, + "tmplBytes": 14099, + "descriptionLen": 205, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "land-and-deploy", + "skillMdBytes": 91886, + "skillMdLines": 1856, + "estTokens": 22972, + "tmplBytes": 48624, + "descriptionLen": 160, + "hasGateEval": true, + "hasPeriodicEval": false + } + ], + "skills": { + "autoplan": { + "skill": "autoplan", + "skillMdBytes": 90870, + "skillMdLines": 1784, + "estTokens": 22718, + "tmplBytes": 45271, + "descriptionLen": 366, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "benchmark": { + "skill": "benchmark", + "skillMdBytes": 33266, + "skillMdLines": 747, + "estTokens": 8317, + "tmplBytes": 9378, + "descriptionLen": 213, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "benchmark-models": { + "skill": "benchmark-models", + "skillMdBytes": 29333, + "skillMdLines": 622, + "estTokens": 7333, + "tmplBytes": 6631, + "descriptionLen": 217, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "browse": { + "skill": "browse", + "skillMdBytes": 48018, + "skillMdLines": 929, + "estTokens": 12005, + "tmplBytes": 10805, + "descriptionLen": 181, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "canary": { + "skill": "canary", + "skillMdBytes": 47105, + "skillMdLines": 990, + "estTokens": 11776, + "tmplBytes": 8033, + "descriptionLen": 180, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "careful": { + "skill": "careful", + "skillMdBytes": 2551, + "skillMdLines": 68, + "estTokens": 638, + "tmplBytes": 2435, + "descriptionLen": 315, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "codex": { + "skill": "codex", + "skillMdBytes": 79620, + "skillMdLines": 
1519, + "estTokens": 19905, + "tmplBytes": 34143, + "descriptionLen": 187, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "context-restore": { + "skill": "context-restore", + "skillMdBytes": 41493, + "skillMdLines": 848, + "estTokens": 10373, + "tmplBytes": 5255, + "descriptionLen": 238, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "context-save": { + "skill": "context-save", + "skillMdBytes": 45690, + "skillMdLines": 966, + "estTokens": 11423, + "tmplBytes": 9293, + "descriptionLen": 168, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "cso": { + "skill": "cso", + "skillMdBytes": 77397, + "skillMdLines": 1451, + "estTokens": 19349, + "tmplBytes": 35158, + "descriptionLen": 196, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "design-consultation": { + "skill": "design-consultation", + "skillMdBytes": 79222, + "skillMdLines": 1561, + "estTokens": 19806, + "tmplBytes": 25899, + "descriptionLen": 888, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "design-html": { + "skill": "design-html", + "skillMdBytes": 66547, + "skillMdLines": 1449, + "estTokens": 16637, + "tmplBytes": 22567, + "descriptionLen": 233, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "design-review": { + "skill": "design-review", + "skillMdBytes": 95654, + "skillMdLines": 1932, + "estTokens": 23914, + "tmplBytes": 11674, + "descriptionLen": 304, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "design-shotgun": { + "skill": "design-shotgun", + "skillMdBytes": 62836, + "skillMdLines": 1311, + "estTokens": 15709, + "tmplBytes": 13331, + "descriptionLen": 786, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "devex-review": { + "skill": "devex-review", + "skillMdBytes": 64413, + "skillMdLines": 1233, + "estTokens": 16103, + "tmplBytes": 7984, + "descriptionLen": 201, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "document-generate": { + "skill": "document-generate", + "skillMdBytes": 52987, + "skillMdLines": 1176, 
+ "estTokens": 13247, + "tmplBytes": 15093, + "descriptionLen": 334, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "document-release": { + "skill": "document-release", + "skillMdBytes": 58251, + "skillMdLines": 1235, + "estTokens": 14563, + "tmplBytes": 20362, + "descriptionLen": 192, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "freeze": { + "skill": "freeze", + "skillMdBytes": 3154, + "skillMdLines": 92, + "estTokens": 789, + "tmplBytes": 3038, + "descriptionLen": 503, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "gstack-upgrade": { + "skill": "gstack-upgrade", + "skillMdBytes": 10817, + "skillMdLines": 285, + "estTokens": 2704, + "tmplBytes": 10667, + "descriptionLen": 163, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "guard": { + "skill": "guard", + "skillMdBytes": 3297, + "skillMdLines": 91, + "estTokens": 824, + "tmplBytes": 3181, + "descriptionLen": 686, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "health": { + "skill": "health", + "skillMdBytes": 47916, + "skillMdLines": 1014, + "estTokens": 11979, + "tmplBytes": 11617, + "descriptionLen": 184, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "investigate": { + "skill": "investigate", + "skillMdBytes": 50409, + "skillMdLines": 1012, + "estTokens": 12602, + "tmplBytes": 11561, + "descriptionLen": 1379, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "ios-clean": { + "skill": "ios-clean", + "skillMdBytes": 41045, + "skillMdLines": 813, + "estTokens": 10261, + "tmplBytes": 3851, + "descriptionLen": 252, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "ios-design-review": { + "skill": "ios-design-review", + "skillMdBytes": 41631, + "skillMdLines": 815, + "estTokens": 10408, + "tmplBytes": 4417, + "descriptionLen": 209, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "ios-fix": { + "skill": "ios-fix", + "skillMdBytes": 40760, + "skillMdLines": 811, + "estTokens": 10190, + "tmplBytes": 3574, + "descriptionLen": 
187, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "ios-qa": { + "skill": "ios-qa", + "skillMdBytes": 47271, + "skillMdLines": 931, + "estTokens": 11818, + "tmplBytes": 10090, + "descriptionLen": 223, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "ios-sync": { + "skill": "ios-sync", + "skillMdBytes": 40737, + "skillMdLines": 804, + "estTokens": 10184, + "tmplBytes": 3544, + "descriptionLen": 269, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "land-and-deploy": { + "skill": "land-and-deploy", + "skillMdBytes": 91886, + "skillMdLines": 1856, + "estTokens": 22972, + "tmplBytes": 48624, + "descriptionLen": 160, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "landing-report": { + "skill": "landing-report", + "skillMdBytes": 43985, + "skillMdLines": 874, + "estTokens": 10996, + "tmplBytes": 6806, + "descriptionLen": 195, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "learn": { + "skill": "learn", + "skillMdBytes": 41722, + "skillMdLines": 891, + "estTokens": 10431, + "tmplBytes": 5594, + "descriptionLen": 178, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "make-pdf": { + "skill": "make-pdf", + "skillMdBytes": 29450, + "skillMdLines": 663, + "estTokens": 7363, + "tmplBytes": 5106, + "descriptionLen": 177, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "office-hours": { + "skill": "office-hours", + "skillMdBytes": 112842, + "skillMdLines": 2066, + "estTokens": 28211, + "tmplBytes": 55466, + "descriptionLen": 860, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "open-gstack-browser": { + "skill": "open-gstack-browser", + "skillMdBytes": 46131, + "skillMdLines": 954, + "estTokens": 11533, + "tmplBytes": 7702, + "descriptionLen": 204, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "pair-agent": { + "skill": "pair-agent", + "skillMdBytes": 46939, + "skillMdLines": 1010, + "estTokens": 11735, + "tmplBytes": 8548, + "descriptionLen": 167, + "hasGateEval": false, + 
"hasPeriodicEval": false + }, + "plan-ceo-review": { + "skill": "plan-ceo-review", + "skillMdBytes": 132488, + "skillMdLines": 2197, + "estTokens": 33122, + "tmplBytes": 63393, + "descriptionLen": 794, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-design-review": { + "skill": "plan-design-review", + "skillMdBytes": 107855, + "skillMdLines": 1928, + "estTokens": 26964, + "tmplBytes": 28624, + "descriptionLen": 218, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-devex-review": { + "skill": "plan-devex-review", + "skillMdBytes": 106167, + "skillMdLines": 2119, + "estTokens": 26542, + "tmplBytes": 35680, + "descriptionLen": 250, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-eng-review": { + "skill": "plan-eng-review", + "skillMdBytes": 103009, + "skillMdLines": 1762, + "estTokens": 25752, + "tmplBytes": 26234, + "descriptionLen": 231, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-tune": { + "skill": "plan-tune", + "skillMdBytes": 51717, + "skillMdLines": 1077, + "estTokens": 12929, + "tmplBytes": 15586, + "descriptionLen": 325, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "qa": { + "skill": "qa", + "skillMdBytes": 73863, + "skillMdLines": 1622, + "estTokens": 18466, + "tmplBytes": 12701, + "descriptionLen": 218, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "qa-only": { + "skill": "qa-only", + "skillMdBytes": 56421, + "skillMdLines": 1194, + "estTokens": 14105, + "tmplBytes": 3851, + "descriptionLen": 165, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "retro": { + "skill": "retro", + "skillMdBytes": 82889, + "skillMdLines": 1750, + "estTokens": 20722, + "tmplBytes": 42427, + "descriptionLen": 648, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "review": { + "skill": "review", + "skillMdBytes": 94048, + "skillMdLines": 1762, + "estTokens": 23512, + "tmplBytes": 14099, + "descriptionLen": 205, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "scrape": { + 
"skill": "scrape", + "skillMdBytes": 43641, + "skillMdLines": 887, + "estTokens": 10910, + "tmplBytes": 5220, + "descriptionLen": 167, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "setup-browser-cookies": { + "skill": "setup-browser-cookies", + "skillMdBytes": 26618, + "skillMdLines": 594, + "estTokens": 6655, + "tmplBytes": 2724, + "descriptionLen": 222, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "setup-deploy": { + "skill": "setup-deploy", + "skillMdBytes": 43927, + "skillMdLines": 919, + "estTokens": 10982, + "tmplBytes": 7780, + "descriptionLen": 197, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "setup-gbrain": { + "skill": "setup-gbrain", + "skillMdBytes": 78394, + "skillMdLines": 1704, + "estTokens": 19599, + "tmplBytes": 42245, + "descriptionLen": 323, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "ship": { + "skill": "ship", + "skillMdBytes": 166782, + "skillMdLines": 3099, + "estTokens": 41696, + "tmplBytes": 50495, + "descriptionLen": 291, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "skillify": { + "skill": "skillify", + "skillMdBytes": 53534, + "skillMdLines": 1168, + "estTokens": 13384, + "tmplBytes": 15107, + "descriptionLen": 233, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "spec": { + "skill": "spec", + "skillMdBytes": 102629, + "skillMdLines": 2141, + "estTokens": 25657, + "tmplBytes": 28429, + "descriptionLen": 282, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "sync-gbrain": { + "skill": "sync-gbrain", + "skillMdBytes": 50156, + "skillMdLines": 1028, + "estTokens": 12539, + "tmplBytes": 13996, + "descriptionLen": 299, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "unfreeze": { + "skill": "unfreeze", + "skillMdBytes": 1504, + "skillMdLines": 49, + "estTokens": 376, + "tmplBytes": 1386, + "descriptionLen": 199, + "hasGateEval": false, + "hasPeriodicEval": false + } + } +} diff --git a/test/skill-size-budget.test.ts b/test/skill-size-budget.test.ts index 
a22550d3f..f86f8c5f4 100644
--- a/test/skill-size-budget.test.ts
+++ b/test/skill-size-budget.test.ts
@@ -1,15 +1,20 @@
 /**
  * Per-skill SKILL.md size budget regression (v1.46.0.0 T5).
  *
- * Asserts that no skill's generated SKILL.md grew beyond the v1.44.1
+ * Asserts that no skill's generated SKILL.md grew beyond the v1.47.0.0
  * baseline. Catches preamble/resolver changes that bloat skills back to
  * the pre-compression size. Free — pure file IO + JSON diff.
  *
+ * Baseline rebased v1.44.1 → v1.47.0.0 in the AskUserQuestion split-rule
+ * PR after main merged GSTACK_PLAN_MODE + /spec, pushing the v1.44.1
+ * anchor past the 5% ratchet. Historical v1.44.1.json and v1.46.0.0.json
+ * are retained in test/fixtures/ for reference.
+ *
  * Why a separate test from skill-budget-regression.test.ts: that one
  * compares LIVE eval runs (tool calls, turns, cost); this one compares
  * static SKILL.md sizes. Both gate-tier.
  *
- * The baseline lives at test/fixtures/parity-baseline-v1.44.1.json,
+ * The baseline lives at test/fixtures/parity-baseline-v1.47.0.0.json,
  * captured by scripts/capture-baseline.ts before any Phase A work landed.
  *
  * Override:
@@ -30,7 +35,7 @@ import { captureBaseline, type ParityBaseline } from './helpers/capture-parity-b
 import { logBudgetOverride } from './helpers/budget-override';

 const REPO_ROOT = path.resolve(import.meta.dir, '..');
-const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.44.1.json');
+const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.47.0.0.json');

 // Default per-skill ratio is 1.05 (5% growth tolerance). T4 catalog trim
 // MOVES text from frontmatter (always-loaded catalog) to a body section
@@ -49,11 +54,11 @@ interface Regression {
 }

 describe('SKILL.md size budget regression (gate, free)', () => {
-  test('parity-baseline-v1.44.1.json exists', () => {
+  test('parity-baseline-v1.47.0.0.json exists', () => {
     expect(fs.existsSync(BASELINE_PATH)).toBe(true);
   });

-  test('no skill exceeds v1.44.1 baseline size × ratio', () => {
+  test('no skill exceeds v1.47.0.0 baseline size × ratio', () => {
     const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
     const current = captureBaseline({ repoRoot: REPO_ROOT });

@@ -94,7 +99,7 @@ describe('SKILL.md size budget regression (gate, free)', () => {
       ` ${r.skill}: ${r.beforeBytes} → ${r.afterBytes} bytes (×${r.growth.toFixed(2)})`,
     ).join('\n');
     throw new Error(
-      `${regressions.length} skill(s) regressed past v1.44.1 baseline × ${RATIO}:\n${msg}\n` +
+      `${regressions.length} skill(s) regressed past v1.47.0.0 baseline × ${RATIO}:\n${msg}\n` +
       `Override: set GSTACK_SIZE_BUDGET_OVERRIDE_REASON="why this is OK" to allow and audit-log.`,
     );
   });
@@ -120,7 +125,7 @@ describe('SKILL.md size budget regression (gate, free)', () => {
       return;
     }
     throw new Error(
-      `Total corpus regressed past v1.44.1 baseline × ${RATIO}: ` +
+      `Total corpus regressed past v1.47.0.0 baseline × ${RATIO}: ` +
       `${baseline.totalCorpusBytes} → ${current.totalCorpusBytes} bytes (×${ratio.toFixed(3)}). ` +
       `Override: set GSTACK_SIZE_BUDGET_OVERRIDE_REASON to allow.`,
     );
@@ -130,13 +135,13 @@ describe('SKILL.md size budget regression (gate, free)', () => {
    * Gap E (v1.46.0.0): per-skill min-size floor.
    *
    * The existing skill-coverage-floor enforces body ≥ 200 bytes, which is
-   * a tiny noise floor. A skill that was 100 KB at v1.44.1 and shrinks to
+   * a tiny noise floor. A skill that was 100 KB at v1.47.0.0 and shrinks to
    * 250 bytes passes that check despite losing 99.75% of content. The
    * parity-suite content invariants cover this for 10 hand-picked skills
    * (cso, ship, plan-ceo, etc.); the remaining 41 skills had no per-skill
    * shrinkage floor.
    *
-   * Floor: 80% of the v1.44.1 baseline. v1.46 actual shrinkage is <1% per
+   * Floor: 80% of the v1.47.0.0 baseline. v1.46 actual shrinkage is <1% per
    * skill, so this is a comfortable ceiling that still catches accidental
    * mass deletion (e.g., a refactor that strips the body of a skill).
    *
@@ -146,7 +151,7 @@ describe('SKILL.md size budget regression (gate, free)', () => {
    * skeletons. When that lands, add them to SECTIONS_EXTRACTED so the floor
    * relaxes for them.
    */
-  test('no skill shrinks past 80% of v1.44.1 baseline (catches accidental body strip)', () => {
+  test('no skill shrinks past 80% of v1.47.0.0 baseline (catches accidental body strip)', () => {
     const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
     const current = captureBaseline({ repoRoot: REPO_ROOT });
     const MIN_RATIO = 0.80; // a skill at <80% of its v1.44 size signals mass-deletion
@@ -187,7 +192,7 @@ describe('SKILL.md size budget regression (gate, free)', () => {
       ` ${u.skill}: ${u.beforeBytes} → ${u.afterBytes} bytes (×${u.ratio.toFixed(2)} — below ${MIN_RATIO} floor)`,
     ).join('\n');
     throw new Error(
-      `${undershoots.length} skill(s) shrunk past v1.44.1 × ${MIN_RATIO} floor:\n${msg}\n` +
+      `${undershoots.length} skill(s) shrunk past v1.47.0.0 × ${MIN_RATIO} floor:\n${msg}\n` +
       `This usually signals accidental body strip (e.g., a resolver returning empty, a ` +
       `template losing a section). If the shrinkage is intentional (e.g., the skill moved ` +
       `to the sections/ pattern), add it to SECTIONS_EXTRACTED in this test. Override: ` +