mirror of https://github.com/garrytan/gstack.git
fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions
Three journey E2E tests (ideation, ship, debug) were failing because Claude answered directly instead of invoking the Skill tool. Root cause: skill descriptions in system-reminder are too weak to override Claude's default behavior for tasks it can handle natively.

Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate, ship descriptions — reinforces the routing signal

10/10 journey tests now pass (was 7/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 0784264aa0
commit a2ee09519c
SKILL.md (51 lines changed)
@@ -267,28 +267,37 @@ Then write a `## GSTACK REVIEW REPORT` section to the end of the plan file:
 file you are allowed to edit in plan mode. The plan file review report is part of the
 plan's living status.

-If `PROACTIVE` is `false`: do NOT proactively suggest other gstack skills during this session.
-Only run skills the user explicitly invokes. This preference persists across sessions via
-`gstack-config`.
+If `PROACTIVE` is `false`: do NOT proactively invoke or suggest other gstack skills during
+this session. Only run skills the user explicitly invokes. This preference persists across
+sessions via `gstack-config`.

-If `PROACTIVE` is `true` (default): suggest adjacent gstack skills when relevant to the
-user's workflow stage:
-- Brainstorming → /office-hours
-- Strategy → /plan-ceo-review
-- Architecture → /plan-eng-review
-- Design → /plan-design-review or /design-consultation
-- Auto-review → /autoplan
-- Debugging → /investigate
-- QA → /qa
-- Code review → /review
-- Visual audit → /design-review
-- Shipping → /ship
-- Docs → /document-release
-- Retro → /retro
-- Second opinion → /codex
-- Prod safety → /careful or /guard
-- Scoped edits → /freeze or /unfreeze
-- Upgrades → /gstack-upgrade
+If `PROACTIVE` is `true` (default): **invoke the Skill tool** when the user's request
+matches a skill's purpose. Do NOT answer directly when a skill exists for the task.
+Use the Skill tool to invoke it. The skill has specialized workflows, checklists, and
+quality gates that produce better results than answering inline.
+
+**Routing rules — when you see these patterns, INVOKE the skill via the Skill tool:**
+- User describes a new idea, asks "is this worth building", wants to brainstorm → invoke `/office-hours`
+- User asks about strategy, scope, ambition, "think bigger" → invoke `/plan-ceo-review`
+- User asks to review architecture, lock in the plan → invoke `/plan-eng-review`
+- User asks about design system, brand, visual identity → invoke `/design-consultation`
+- User asks to review design of a plan → invoke `/plan-design-review`
+- User wants all reviews done automatically → invoke `/autoplan`
+- User reports a bug, error, broken behavior, asks "why is this broken" → invoke `/investigate`
+- User asks to test the site, find bugs, QA → invoke `/qa`
+- User asks to review code, check the diff, pre-landing review → invoke `/review`
+- User asks about visual polish, design audit of a live site → invoke `/design-review`
+- User asks to ship, deploy, push, create a PR → invoke `/ship`
+- User asks to update docs after shipping → invoke `/document-release`
+- User asks for a weekly retro, what did we ship → invoke `/retro`
+- User asks for a second opinion, codex review → invoke `/codex`
+- User asks for safety mode, careful mode → invoke `/careful` or `/guard`
+- User asks to restrict edits to a directory → invoke `/freeze` or `/unfreeze`
+- User asks to upgrade gstack → invoke `/gstack-upgrade`

+**Do NOT answer the user's question directly when a matching skill exists.** The skill
+provides a structured, multi-step workflow that is always better than an ad-hoc answer.
+Invoke the skill first. If no skill matches, answer directly as usual.
+
 If the user opts out of suggestions, run `gstack-config set proactive false`.
 If they opt back in, run `gstack-config set proactive true`.
@@ -16,28 +16,37 @@ allowed-tools:

 {{PREAMBLE}}

-If `PROACTIVE` is `false`: do NOT proactively suggest other gstack skills during this session.
-Only run skills the user explicitly invokes. This preference persists across sessions via
-`gstack-config`.
+If `PROACTIVE` is `false`: do NOT proactively invoke or suggest other gstack skills during
+this session. Only run skills the user explicitly invokes. This preference persists across
+sessions via `gstack-config`.

-If `PROACTIVE` is `true` (default): suggest adjacent gstack skills when relevant to the
-user's workflow stage:
-- Brainstorming → /office-hours
-- Strategy → /plan-ceo-review
-- Architecture → /plan-eng-review
-- Design → /plan-design-review or /design-consultation
-- Auto-review → /autoplan
-- Debugging → /investigate
-- QA → /qa
-- Code review → /review
-- Visual audit → /design-review
-- Shipping → /ship
-- Docs → /document-release
-- Retro → /retro
-- Second opinion → /codex
-- Prod safety → /careful or /guard
-- Scoped edits → /freeze or /unfreeze
-- Upgrades → /gstack-upgrade
+If `PROACTIVE` is `true` (default): **invoke the Skill tool** when the user's request
+matches a skill's purpose. Do NOT answer directly when a skill exists for the task.
+Use the Skill tool to invoke it. The skill has specialized workflows, checklists, and
+quality gates that produce better results than answering inline.
+
+**Routing rules — when you see these patterns, INVOKE the skill via the Skill tool:**
+- User describes a new idea, asks "is this worth building", wants to brainstorm → invoke `/office-hours`
+- User asks about strategy, scope, ambition, "think bigger" → invoke `/plan-ceo-review`
+- User asks to review architecture, lock in the plan → invoke `/plan-eng-review`
+- User asks about design system, brand, visual identity → invoke `/design-consultation`
+- User asks to review design of a plan → invoke `/plan-design-review`
+- User wants all reviews done automatically → invoke `/autoplan`
+- User reports a bug, error, broken behavior, asks "why is this broken" → invoke `/investigate`
+- User asks to test the site, find bugs, QA → invoke `/qa`
+- User asks to review code, check the diff, pre-landing review → invoke `/review`
+- User asks about visual polish, design audit of a live site → invoke `/design-review`
+- User asks to ship, deploy, push, create a PR → invoke `/ship`
+- User asks to update docs after shipping → invoke `/document-release`
+- User asks for a weekly retro, what did we ship → invoke `/retro`
+- User asks for a second opinion, codex review → invoke `/codex`
+- User asks for safety mode, careful mode → invoke `/careful` or `/guard`
+- User asks to restrict edits to a directory → invoke `/freeze` or `/unfreeze`
+- User asks to upgrade gstack → invoke `/gstack-upgrade`

+**Do NOT answer the user's question directly when a matching skill exists.** The skill
+provides a structured, multi-step workflow that is always better than an ad-hoc answer.
+Invoke the skill first. If no skill matches, answer directly as usual.
+
 If the user opts out of suggestions, run `gstack-config set proactive false`.
 If they opt back in, run `gstack-config set proactive true`.
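Both preamble hunks above turn the arrow-style suggestion list into routing rules in which each bullet ends with `→ invoke` and one or more backticked skill names. If a test wanted to check which skills those rules actually route to, a minimal sketch could look like the following. It assumes the bullet format stays exactly as shown; `extractRoutedSkills` is a hypothetical helper, not part of gstack.

```typescript
// Hypothetical helper (not part of gstack). Assumes each routing rule is a
// bullet of the form: - <trigger description> → invoke `/skill` [or `/other`]
function extractRoutedSkills(markdown: string): string[] {
  const skills = new Set<string>();
  for (const line of markdown.split("\n")) {
    const arrow = line.indexOf("→ invoke");
    // Skip non-bullets and old-style "- Stage → /skill" suggestion lines.
    if (!line.startsWith("- ") || arrow === -1) continue;
    // Every backticked `/name` after the arrow counts as a routed skill.
    for (const match of line.slice(arrow).matchAll(/`\/([a-z0-9-]+)`/g)) {
      skills.add(match[1]);
    }
  }
  return [...skills].sort();
}

const preamble = [
  '- User reports a bug, error, broken behavior, asks "why is this broken" → invoke `/investigate`',
  "- User asks for safety mode, careful mode → invoke `/careful` or `/guard`",
  "- Debugging → /investigate", // old-style line: no "→ invoke", so it is ignored
].join("\n");

console.log(extractRoutedSkills(preamble).join(", ")); // careful, guard, investigate
```

Because a rule like the `/careful` or `/guard` line contributes two names, the result is a de-duplicated set of skills rather than one name per bullet.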
@@ -7,8 +7,9 @@ description: |
 analyze, hypothesize, implement. Iron Law: no fixes without root cause.
 Use when asked to "debug this", "fix this bug", "why is this broken",
 "investigate this error", or "root cause analysis".
-Proactively suggest when the user reports errors, unexpected behavior, or
-is troubleshooting why something stopped working.
+Proactively invoke this skill (do NOT debug directly) when the user reports
+errors, 500 errors, stack traces, unexpected behavior, "it was working
+yesterday", or is troubleshooting why something stopped working.
 allowed-tools:
 - Bash
 - Read
@@ -7,8 +7,9 @@ description: |
 analyze, hypothesize, implement. Iron Law: no fixes without root cause.
 Use when asked to "debug this", "fix this bug", "why is this broken",
 "investigate this error", or "root cause analysis".
-Proactively suggest when the user reports errors, unexpected behavior, or
-is troubleshooting why something stopped working.
+Proactively invoke this skill (do NOT debug directly) when the user reports
+errors, 500 errors, stack traces, unexpected behavior, "it was working
+yesterday", or is troubleshooting why something stopped working.
 allowed-tools:
 - Bash
 - Read
@@ -9,8 +9,10 @@ description: |
 hackathons, learning, and open source. Saves a design doc.
 Use when asked to "brainstorm this", "I have an idea", "help me think through
 this", "office hours", or "is this worth building".
-Proactively suggest when the user describes a new product idea or is exploring
-whether something is worth building — before any code is written.
+Proactively invoke this skill (do NOT answer directly) when the user describes
+a new product idea, asks whether something is worth building, wants to think
+through design decisions for something that doesn't exist yet, or is exploring
+a concept before any code is written.
 Use before /plan-ceo-review or /plan-eng-review.
 allowed-tools:
 - Bash
@@ -9,8 +9,10 @@ description: |
 hackathons, learning, and open source. Saves a design doc.
 Use when asked to "brainstorm this", "I have an idea", "help me think through
 this", "office hours", or "is this worth building".
-Proactively suggest when the user describes a new product idea or is exploring
-whether something is worth building — before any code is written.
+Proactively invoke this skill (do NOT answer directly) when the user describes
+a new product idea, asks whether something is worth building, wants to think
+through design decisions for something that doesn't exist yet, or is exploring
+a concept before any code is written.
 Use before /plan-ceo-review or /plan-eng-review.
 allowed-tools:
 - Bash
@@ -3,8 +3,11 @@ name: ship
 preamble-tier: 4
 version: 1.0.0
 description: |
-Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy", "push to main", "create a PR", or "merge and push".
-Proactively suggest when the user says code is ready or asks about deploying.
+Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION,
+update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy",
+"push to main", "create a PR", "merge and push", or "get it deployed".
+Proactively invoke this skill (do NOT push/PR directly) when the user says code
+is ready, asks about deploying, wants to push code up, or asks to create a PR.
 allowed-tools:
 - Bash
 - Read
@@ -3,8 +3,11 @@ name: ship
 preamble-tier: 4
 version: 1.0.0
 description: |
-Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy", "push to main", "create a PR", or "merge and push".
-Proactively suggest when the user says code is ready or asks about deploying.
+Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION,
+update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy",
+"push to main", "create a PR", "merge and push", or "get it deployed".
+Proactively invoke this skill (do NOT push/PR directly) when the user says code
+is ready, asks about deploying, wants to push code up, or asks to create a PR.
 allowed-tools:
 - Bash
 - Read
@@ -93,11 +93,30 @@ function installSkills(tmpDir: string) {
 }
 }

-// Copy CLAUDE.md so Claude has project context for skill routing.
-const claudeMdSrc = path.join(ROOT, 'CLAUDE.md');
-if (fs.existsSync(claudeMdSrc)) {
-fs.copyFileSync(claudeMdSrc, path.join(tmpDir, 'CLAUDE.md'));
-}
+// Write a CLAUDE.md with explicit routing instructions.
+// The skill descriptions in system-reminder aren't strong enough to override
+// Claude's default behavior of answering directly. A CLAUDE.md instruction
+// puts routing rules in project context which Claude weighs more heavily.
+fs.writeFileSync(path.join(tmpDir, 'CLAUDE.md'), `# Project Instructions
+
+## Skill routing
+
+When the user's request matches an available skill, ALWAYS invoke it using the Skill
+tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
+The skill has specialized workflows that produce better results than ad-hoc answers.
+
+Key routing rules:
+- Product ideas, "is this worth building", brainstorming → invoke office-hours
+- Bugs, errors, "why is this broken", 500 errors → invoke investigate
+- Ship, deploy, push, create PR → invoke ship
+- QA, test the site, find bugs → invoke qa
+- Code review, check my diff → invoke review
+- Update docs after shipping → invoke document-release
+- Weekly retro → invoke retro
+- Design system, brand → invoke design-consultation
+- Visual audit, design polish → invoke design-review
+- Architecture review → invoke plan-eng-review
+`);
 }

 /** Init a git repo with config */
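The fixture now writes its own CLAUDE.md rather than copying the repository's, so nothing else keeps its key routing rules in step with the three journey tests (ideation, debug, ship) that motivated the change. A minimal guard, sketched under the assumption that each rule ends in `→ invoke <skill>` as in the template literal above; `JOURNEY_SKILLS` and `missingJourneyRoutes` are hypothetical names, and the temporary directory only stands in for the `tmpDir` passed to `installSkills`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Skills exercised by the ideation, debug, and ship journey tests.
const JOURNEY_SKILLS = ["office-hours", "investigate", "ship"];

// Hypothetical guard (not in the gstack suite): which journey skills lack a
// rule ending in "→ invoke <skill>" in the given CLAUDE.md text?
function missingJourneyRoutes(claudeMd: string): string[] {
  return JOURNEY_SKILLS.filter((skill) => {
    // Require the name right after "→ invoke" and at a line end ("m" flag),
    // so a rule that merely mentions "shipping" cannot satisfy the ship check.
    const rule = new RegExp(`→ invoke ${skill}$`, "m");
    return !rule.test(claudeMd);
  });
}

// Stand-in for the tmpDir that installSkills() writes into.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "routing-"));
const claudeMdPath = path.join(tmpDir, "CLAUDE.md");
fs.writeFileSync(
  claudeMdPath,
  [
    "## Skill routing",
    '- Product ideas, "is this worth building", brainstorming → invoke office-hours',
    '- Bugs, errors, "why is this broken", 500 errors → invoke investigate',
    "- Update docs after shipping → invoke document-release",
  ].join("\n") + "\n",
);
const written = fs.readFileSync(claudeMdPath, "utf-8");
fs.rmSync(tmpDir, { recursive: true, force: true });

console.log(missingJourneyRoutes(written).join(", ")); // ship
```

Anchoring the pattern to the end of the line is the main design choice: a bare search for `ship` would also match the `Update docs after shipping` rule.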
@@ -1409,13 +1409,13 @@ describe('Skill trigger phrases', () => {
 ];

 for (const skill of SKILLS_REQUIRING_PROACTIVE) {
-test(`${skill}/SKILL.md has "Proactively suggest" phrase`, () => {
+test(`${skill}/SKILL.md has proactive routing phrase`, () => {
 const skillPath = path.join(ROOT, skill, 'SKILL.md');
 if (!fs.existsSync(skillPath)) return;
 const content = fs.readFileSync(skillPath, 'utf-8');
 const frontmatterEnd = content.indexOf('---', 4);
 const frontmatter = content.slice(0, frontmatterEnd);
-expect(frontmatter).toMatch(/Proactively suggest/i);
+expect(frontmatter).toMatch(/Proactively (suggest|invoke)/i);
 });
 }
 });
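The updated test accepts either phrase, so SKILL.md files that still say "Proactively suggest" keep passing while the three rewritten descriptions say "Proactively invoke". Below is a sketch of the same frontmatter check as a pure function that runs on in-memory strings; the function name is hypothetical. One deliberate difference: it returns `false` when no closing `---` exists, whereas in the test `content.slice(0, frontmatterEnd)` with `frontmatterEnd === -1` would scan everything except the file's last character.

```typescript
// Hypothetical pure-function version of the frontmatter check in the test.
function hasProactiveRoutingPhrase(content: string): boolean {
  // Assumes the file starts with "---\n"; searching from offset 4 skips that
  // opening delimiter and finds the closing "---" of the frontmatter.
  const frontmatterEnd = content.indexOf("---", 4);
  if (frontmatterEnd === -1) return false; // no closing delimiter found
  const frontmatter = content.slice(0, frontmatterEnd);
  // Accept both the legacy and the new wording, case-insensitively.
  return /Proactively (suggest|invoke)/i.test(frontmatter);
}

const legacy = "---\nname: ship\ndescription: |\n  Proactively suggest when the user says code is ready.\n---\nBody\n";
const updated = "---\nname: ship\ndescription: |\n  Proactively invoke this skill (do NOT push/PR directly).\n---\nBody\n";
const bodyOnly = "---\nname: ship\n---\nProactively invoke appears only in the body.\n";

console.log(
  hasProactiveRoutingPhrase(legacy),
  hasProactiveRoutingPhrase(updated),
  hasProactiveRoutingPhrase(bodyOnly),
); // true true false
```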