feat: await support in browse js/eval + contributor mode v2 (#104)

* feat: support await in $B js and eval commands

Auto-wrap await expressions in async IIFE context so
$B js "await fetch(...)" works without SyntaxError.

- hasAwait() strips comments before detection
- js: expression wrapping (async()=>(expr))()
- eval: smart wrapping — single-line=expression, multi-line=block
- 6 new unit tests covering async, false-positive, and return semantics

* feat: redesign contributor mode — periodic reflection with 0-10 rating

Replace passive "report when things break" with active reflection:
- Rate gstack experience 0-10 at workflow step boundaries
- Historical calibration example (await bug) anchors the reporting bar
- "What would make this a 10" field focuses on actionable improvements
- Removed category lists in favor of judgment-based assessment

* test: add deterministic contributor mode preamble validation

40 new skill-validation tests (4 checks × 10 skills) verify:
- 0-10 rating scale present
- Calibration example present
- "What would make this a 10" field present
- Periodic reflection (not per-command)

Update existing E2E contributor eval for new report format.

* chore: bump version and changelog (v0.4.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve contributor mode + qa-quick E2E reliability

Contributor mode:
- Add "do not truncate" directive to template — agent was stopping
  after "My rating" without completing Steps/Raw output/What would
  make this a 10 sections
- Restore assertions for Steps to reproduce and Date footer

QA quick:
- Make test server URL prominent: top of prompt, explicit "already
  running" and "do NOT discover ports" instructions
- Bump session timeout 180s→240s and test timeout 240s→300s
- Set B= at top of prompt (was buried in prose)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use flexible assertions for contributor mode E2E

Agent writes thorough reports with creative section names
("Repro Steps" vs "Steps to reproduce"). Match intent not formatting:
- /repro|steps to reproduce/ for reproduction steps
- /date.*2026/ for date footer presence
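
A minimal sketch of the intent-based matching described above. The two patterns come from this commit message; the helper name `checkReport` and the lowercasing step are assumptions for illustration (the message does not say how case is handled), and this is not the actual E2E test code:

```typescript
// Hypothetical helper: patterns are from the commit message, everything else is assumed.
function checkReport(report: string): { hasRepro: boolean; hasDateFooter: boolean } {
  // Assumption: normalize case so "Repro Steps" and "Steps to reproduce" both match.
  const text = report.toLowerCase();
  return {
    hasRepro: /repro|steps to reproduce/.test(text),
    hasDateFooter: /date.*2026/.test(text),
  };
}

const sample = '## Repro Steps\n1. run the command\n\n**Date:** 2026-03-16 | **Version:** 0.4.2';
const outcome = checkReport(sample);
// outcome.hasRepro and outcome.hasDateFooter are both true for this sample
```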

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add E2E eval failure blame protocol

"Not related to our changes" is an extraordinary claim that requires
extraordinary proof. When evals fail during /ship:

1. Run the same eval on main — prove it fails there too
2. If it passes on main, it IS your change — trace the blame
3. If you can't verify, say "unverified" not "pre-existing"

Added to CLAUDE.md and as a comment in skill-e2e.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update CONTRIBUTING.md and BROWSER.md for v0.4.2

CONTRIBUTING.md: update contributor mode description — now describes
periodic 0-10 reflection loop instead of passive friction detection.

BROWSER.md: add js/eval async documentation — await expressions are
auto-wrapped in async context, single-line eval returns values directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: restore v0.4.2 changelog entries lost during cherry-pick conflict

The base branch detection entries from main were dropped when resolving
the CHANGELOG conflict — should have merged both sets, not replaced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Garry Tan 2026-03-16 11:28:58 -05:00 committed by GitHub
parent 1e06b6a5c6
commit 78e519e3b7
19 changed files with 329 additions and 107 deletions

View File

@ -127,6 +127,18 @@ The `console`, `network`, and `dialog` commands read from the in-memory buffers,
Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.
### JavaScript execution (`js` and `eval`)
`js` runs a single expression, `eval` runs a JS file. Both support `await` — expressions containing `await` are automatically wrapped in an async context:
```bash
$B js "await fetch('/api/data').then(r => r.json())" # works
$B js "document.title" # also works (no wrapping needed)
$B eval my-script.js # file with await works too
```
For `eval` files, single-line files return the expression value directly. Multi-line files need explicit `return` when using `await`. Comments containing "await" don't trigger wrapping.
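
The wrapping rules above can be sketched as a pure string transformation. This is a minimal sketch based on the diff in this commit; `wrapForEval` is a hypothetical name introduced here for illustration, and the real handler passes the resulting string to `page.evaluate` rather than returning it:

```typescript
/** True when `await` appears outside of line comments and block comments. */
function hasAwait(code: string): boolean {
  const stripped = code
    .replace(/\/\/.*$/gm, '')            // drop line comments
    .replace(/\/\*[\s\S]*?\*\//g, '');   // drop block comments
  return /\bawait\b/.test(stripped);
}

/** Hypothetical helper: returns the source that would be evaluated in the page. */
function wrapForEval(code: string): string {
  if (!hasAwait(code)) return code;      // no await: evaluate as-is
  const trimmed = code.trim();
  const isSingleLine = trimmed.split('\n').length === 1;
  return isSingleLine
    ? `(async()=>(${trimmed}))()`        // expression form: value is returned
    : `(async()=>{\n${code}\n})()`;      // block form: needs an explicit return
}

const single = wrapForEval('await Promise.resolve(42)');
// single === '(async()=>(await Promise.resolve(42)))()'
const commented = wrapForEval('// no need to await this\ndocument.title');
// commented is unchanged, because the only "await" sits in a comment
```

Because the multi-line form is a function body, a trailing bare expression such as `data;` is discarded, which is why multi-line files need an explicit `return`.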
### Multi-workspace support
Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in `.gstack/` inside the project root (detected via `git rev-parse --show-toplevel`).

View File

@ -2,6 +2,8 @@
## 0.4.2 — 2026-03-16
- **`$B js "await fetch(...)"` now just works.** Any `await` expression in `$B js` or `$B eval` is automatically wrapped in an async context. No more `SyntaxError: await is only valid in async functions`. Single-line eval files return values directly; multi-line files use explicit `return`.
- **Contributor mode now reflects, not just reacts.** Instead of only filing reports when something breaks, contributor mode now prompts periodic reflection: "Rate your gstack experience 0-10. Not a 10? Think about why." Catches quality-of-life issues and friction that passive detection misses. Reports now include a 0-10 rating and "What would make this a 10" to focus on actionable improvements.
- **Skills now respect your branch target.** `/ship`, `/review`, `/qa`, and `/plan-ceo-review` detect which branch your PR actually targets instead of assuming `main`. Stacked branches, Conductor workspaces targeting feature branches, and repos using `master` all just work now.
- **`/retro` works on any default branch.** Repos using `master`, `develop`, or other default branch names are detected automatically — no more empty retros because the branch name was wrong.
- **New `{{BASE_BRANCH_DETECT}}` placeholder** for skill authors — drop it into any template and get 3-step branch detection (PR base → repo default → fallback) for free.
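
The 3-step priority order in the last bullet can be sketched as a small function. This is an illustration only: the placeholder expands into skill-template instructions, not TypeScript, and both the name `pickBaseBranch` and the `main` fallback value are assumptions (the changelog does not name the fallback):

```typescript
// Sketch only: `pickBaseBranch` and the 'main' default are assumptions for illustration.
function pickBaseBranch(
  prBase: string | undefined,
  repoDefault: string | undefined,
  fallback: string = 'main',
): string {
  // First non-empty candidate wins: PR base, then repo default, then fallback.
  for (const candidate of [prBase, repoDefault]) {
    const name = candidate?.trim();
    if (name) return name;
  }
  return fallback;
}

const stacked = pickBaseBranch('feature/parent', 'master'); // 'feature/parent'
const noPr = pickBaseBranch(undefined, 'master');           // 'master'
```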
@ -9,6 +11,10 @@
### For contributors
- Added `hasAwait()` helper with comment-stripping to avoid false positives on `// await` in eval files.
- Smart eval wrapping: single-line → expression `(...)`, multi-line → block `{...}` with explicit `return`.
- 6 new async wrapping unit tests, 40 new contributor mode preamble validation tests.
- Calibration example framed as historical ("used to fail") to avoid implying a live bug post-fix.
- Added "Writing SKILL templates" section to CLAUDE.md — rules for natural language over bash-isms, dynamic branch detection, self-contained code blocks.
- Hardcoded-main regression test scans all `.tmpl` files for git commands with hardcoded `main`.
- QA template cleaned up: removed `REPORT_DIR` shell variable, simplified port detection to prose.

View File

@ -118,6 +118,21 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."
## E2E eval failure blame protocol
When an E2E eval fails during `/ship` or any other workflow, **never claim "not
related to our changes" without proving it.** These systems have invisible couplings —
a preamble text change affects agent behavior, a new helper changes timing, a
regenerated SKILL.md shifts prompt context.
**Required before attributing a failure to "pre-existing":**
1. Run the same eval on main (or base branch) and show it fails there too
2. If it passes on main but fails on the branch — it IS your change. Trace the blame.
3. If you can't run on main, say "unverified — may or may not be related" and flag it
as a risk in the PR body
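
The three rules above amount to a small decision table. A minimal sketch, with hypothetical type and function names that do not appear in the repository:

```typescript
// Hypothetical names: this only encodes the three rules listed above.
type MainRun = 'fails' | 'passes' | 'not-run';
type Verdict = 'pre-existing' | 'caused-by-branch' | 'unverified';

/** Maps the result of running the same eval on main (or the base branch) to a claim. */
function attributeFailure(mainRun: MainRun): Verdict {
  switch (mainRun) {
    case 'fails':
      return 'pre-existing';      // rule 1: it fails on main too
    case 'passes':
      return 'caused-by-branch';  // rule 2: it IS your change, trace the blame
    case 'not-run':
      return 'unverified';        // rule 3: flag it as a risk in the PR body
  }
}
```

The point of the mapping is that `pre-existing` is only reachable after the eval has actually been run on the base branch.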
"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.
## Deploying to the active skill
The active skill lives at `~/.claude/skills/gstack/`. After making changes:

View File

@ -22,9 +22,11 @@ bin/dev-teardown # deactivate — back to your global install
 ## Contributor mode
-Contributor mode is for people who want to fix gstack when it annoys them. Enable it
-and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you
-work — what you were doing, what went wrong, repro steps, raw output.
+Contributor mode turns gstack into a self-improving tool. Enable it and Claude Code
+will periodically reflect on its gstack experience — rating it 0-10 at the end of
+each major workflow step. When something isn't a 10, it thinks about why and files
+a report to `~/.gstack/contributor-logs/` with what happened, repro steps, and what
+would make it better.
```bash ```bash
~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true ~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true
@ -36,7 +38,7 @@ the issue, fix it, and open a PR.
 ### The contributor workflow
-1. **Hit friction while using gstack** — contributor mode logs it automatically
+1. **Use gstack normally** — contributor mode reflects and logs issues automatically
 2. **Check your logs:** `ls ~/.gstack/contributor-logs/`
 3. **Fork and clone gstack** (if you haven't already)
 4. **Symlink your fork into the project where you hit the bug:**

View File

@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
 ## Contributor Mode
-If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
-**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
-**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
-**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
+**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
 ```
 # {Title}
@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}:
 **What I was trying to do:** {what the user/agent was attempting}
 **What happened instead:** {what actually happened}
-**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+**My rating:** {0-10} — {one sentence on why it wasn't a 10}
 ## Steps to reproduce
 1. {step}
 ## Raw output
-(wrap any error messages or unexpected output in a markdown code block)
+```
+{paste the actual error or unexpected output here}
+```
+## What would make this a 10
+{one sentence: what gstack should have done differently}
 **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
 ```
-Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
-Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
 # gstack browse: QA Testing & Dogfooding

View File

@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
 ## Contributor Mode
-If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
-**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
-**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
-**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
+**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
 ```
 # {Title}
@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}:
 **What I was trying to do:** {what the user/agent was attempting}
 **What happened instead:** {what actually happened}
-**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+**My rating:** {0-10} — {one sentence on why it wasn't a 10}
 ## Steps to reproduce
 1. {step}
 ## Raw output
-(wrap any error messages or unexpected output in a markdown code block)
+```
+{paste the actual error or unexpected output here}
+```
+## What would make this a 10
+{one sentence: what gstack should have done differently}
 **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
 ```
-Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
-Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
 # browse: QA Testing & Dogfooding

View File

@ -11,6 +11,12 @@ import type { Page } from 'playwright';
 import * as fs from 'fs';
 import * as path from 'path';
+/** Detect await keyword, ignoring comments. Accepted risk: await in string literals triggers wrapping (harmless). */
+function hasAwait(code: string): boolean {
+  const stripped = code.replace(/\/\/.*$/gm, '').replace(/\/\*[\s\S]*?\*\//g, '');
+  return /\bawait\b/.test(stripped);
+}
 // Security: Path validation to prevent path traversal attacks
 const SAFE_DIRECTORIES = ['/tmp', process.cwd()];
@ -118,7 +124,8 @@ export async function handleReadCommand(
 case 'js': {
   const expr = args[0];
   if (!expr) throw new Error('Usage: browse js <expression>');
-  const result = await page.evaluate(expr);
+  const wrapped = hasAwait(expr) ? `(async()=>(${expr}))()` : expr;
+  const result = await page.evaluate(wrapped);
   return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
 }
@ -128,6 +135,13 @@ export async function handleReadCommand(
   validateReadPath(filePath);
   if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
   const code = fs.readFileSync(filePath, 'utf-8');
+  if (hasAwait(code)) {
+    const trimmed = code.trim();
+    const isSingleExpr = trimmed.split('\n').length === 1;
+    const wrapped = isSingleExpr ? `(async()=>(${trimmed}))()` : `(async()=>{\n${code}\n})()`;
+    const result = await page.evaluate(wrapped);
+    return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
+  }
   const result = await page.evaluate(code);
   return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
 }

View File

@ -144,6 +144,60 @@ describe('Inspection', () => {
expect(obj.b).toBe(2);
});
test('js supports await expressions', async () => {
const result = await handleReadCommand('js', ['await Promise.resolve(42)'], bm);
expect(result).toBe('42');
});
test('js does not false-positive on await substring', async () => {
const result = await handleReadCommand('js', ['(() => { const awaitable = 5; return awaitable })()'], bm);
expect(result).toBe('5');
});
test('eval supports await in single-line file', async () => {
const tmp = '/tmp/eval-await-test.js';
fs.writeFileSync(tmp, 'await Promise.resolve("hello from eval")');
try {
const result = await handleReadCommand('eval', [tmp], bm);
expect(result).toBe('hello from eval');
} finally {
fs.unlinkSync(tmp);
}
});
test('eval does not wrap when await is only in a comment', async () => {
const tmp = '/tmp/eval-comment-test.js';
fs.writeFileSync(tmp, '// no need to await this\ndocument.title');
try {
const result = await handleReadCommand('eval', [tmp], bm);
expect(result).toBe('Test Page - Basic');
} finally {
fs.unlinkSync(tmp);
}
});
test('eval multi-line with await and explicit return', async () => {
const tmp = '/tmp/eval-multiline-await.js';
fs.writeFileSync(tmp, 'const data = await Promise.resolve("multi");\nreturn data;');
try {
const result = await handleReadCommand('eval', [tmp], bm);
expect(result).toBe('multi');
} finally {
fs.unlinkSync(tmp);
}
});
test('eval multi-line with await but no return gives empty string', async () => {
const tmp = '/tmp/eval-multiline-no-return.js';
fs.writeFileSync(tmp, 'const data = await Promise.resolve("lost");\ndata;');
try {
const result = await handleReadCommand('eval', [tmp], bm);
expect(result).toBe('');
} finally {
fs.unlinkSync(tmp);
}
});
test('css returns computed property', async () => {
const result = await handleReadCommand('css', ['h1', 'color'], bm);
// Navy color

View File

@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
 ## Contributor Mode
-If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
-**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
-**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
-**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
+**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
 ```
 # {Title}
@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}:
 **What I was trying to do:** {what the user/agent was attempting}
 **What happened instead:** {what actually happened}
-**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+**My rating:** {0-10} — {one sentence on why it wasn't a 10}
 ## Steps to reproduce
 1. {step}
 ## Raw output
-(wrap any error messages or unexpected output in a markdown code block)
+```
+{paste the actual error or unexpected output here}
+```
+## What would make this a 10
+{one sentence: what gstack should have done differently}
 **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
 ```
-Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
-Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
 ## Step 0: Detect base branch

View File

@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
 ## Contributor Mode
-If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
-**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
-**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
-**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
+**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
 ```
 # {Title}
@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}:
 **What I was trying to do:** {what the user/agent was attempting}
 **What happened instead:** {what actually happened}
-**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+**My rating:** {0-10} — {one sentence on why it wasn't a 10}
 ## Steps to reproduce
 1. {step}
 ## Raw output
-(wrap any error messages or unexpected output in a markdown code block)
+```
+{paste the actual error or unexpected output here}
+```
+## What would make this a 10
+{one sentence: what gstack should have done differently}
 **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
 ```
-Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
-Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
 # Plan Review Mode

View File

@ -43,12 +43,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: **Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
``` ```
# {Title} # {Title}
@ -57,20 +60,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) ```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
``` ```
Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
# /qa-only: Report-Only QA Testing # /qa-only: Report-Only QA Testing
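
The calibration example in this preamble refers to `$B js "await fetch(...)"` failing because the expression was not wrapped in an async context. A minimal sketch of that kind of wrapping is shown below. The helper names `stripComments` and `wrapForJs` and the exact regexes are assumptions for illustration, not gstack's actual source; `hasAwait` borrows the name used in the commit message. The detection is deliberately naive: an `await` inside a string literal would still trigger wrapping.

```typescript
// Hypothetical helpers: a sketch of await detection and async IIFE wrapping.
// Names and regexes are assumptions, not copied from gstack.

/** Remove block and line comments so a commented-out `await` is not detected. */
function stripComments(code: string): string {
  return code
    .replace(/\/\*[\s\S]*?\*\//g, "") // block comments
    .replace(/\/\/.*$/gm, "");        // line comments
}

/** True when `await` appears as a standalone word outside comments. */
function hasAwait(code: string): boolean {
  return /\bawait\b/.test(stripComments(code));
}

/** Wrap an expression in an async IIFE only when it contains `await`. */
function wrapForJs(expr: string): string {
  return hasAwait(expr) ? `(async () => (${expr}))()` : expr;
}
```

Wrapping only when `await` is detected leaves plain expressions such as `1 + 1` untouched, so their evaluation path does not change.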

@ -48,12 +48,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: **Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
``` ```
# {Title} # {Title}
@ -62,20 +65,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) ```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
``` ```
Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Step 0: Detect base branch ## Step 0: Detect base branch
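
The filing rules in this preamble (slug is lowercase with hyphens and at most 60 characters, skip if the file already exists, at most 3 reports per session) are stated in prose only. One way they could be expressed in code is sketched below. `toSlug`, `shouldFile`, and `MAX_REPORTS_PER_SESSION` are hypothetical names, and the set of existing slugs stands in for a filesystem check under `~/.gstack/contributor-logs/`.

```typescript
// Hypothetical helpers: the preamble describes these rules in prose,
// so every name here is an assumption made for illustration.

const MAX_SLUG_LENGTH = 60;
const MAX_REPORTS_PER_SESSION = 3;

/** Turn a report title into a lowercase, hyphen-separated slug of at most 60 chars. */
function toSlug(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "")     // trim leading and trailing hyphens
    .slice(0, MAX_SLUG_LENGTH)
    .replace(/-+$/, "");         // truncation may leave a trailing hyphen
}

/** Decide whether a report should be written, given what already exists. */
function shouldFile(
  slug: string,
  existingSlugs: ReadonlySet<string>,
  filedThisSession: number,
): boolean {
  return filedThisSession < MAX_REPORTS_PER_SESSION && !existingSlugs.has(slug);
}
```

For example, `toSlug("Browse JS: no await!")` yields `browse-js-no-await`, the same slug the new preamble text uses as its example.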

@ -43,12 +43,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: **Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
``` ```
# {Title} # {Title}
@ -57,20 +60,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) ```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
``` ```
Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Detect default branch ## Detect default branch

@ -44,12 +44,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: **Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
``` ```
# {Title} # {Title}
@ -58,20 +61,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) ```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
``` ```
Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Step 0: Detect base branch ## Step 0: Detect base branch

@ -123,12 +123,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If \`_CONTRIB\` is \`true\`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If \`_CONTRIB\` is \`true\`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write \`~/.gstack/contributor-logs/{slug}.md\` with this structure: **Calibration — this is the bar:** For example, \`$B js "await fetch(...)"\` used to fail with \`SyntaxError: await is only valid in async functions\` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write \`~/.gstack/contributor-logs/{slug}.md\` with **all sections below** (do not truncate — include every section through the Date/Version footer):
\`\`\` \`\`\`
# {Title} # {Title}
@ -137,20 +140,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) \`\`\`
{paste the actual error or unexpected output here}
\`\`\`
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
\`\`\` \`\`\`
Then run: \`mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md\` Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-js-no-await\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"`;
Slug: lowercase, hyphens, max 60 chars (e.g. \`browse-snapshot-ref-gap\`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"`;
} }
function generateBrowseSetup(): string { function generateBrowseSetup(): string {

@ -41,12 +41,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: **Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
``` ```
# {Title} # {Title}
@ -55,20 +58,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) ```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
``` ```
Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
# Setup Browser Cookies # Setup Browser Cookies

@ -43,12 +43,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli
## Contributor Mode ## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened." If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. **At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure: **Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
``` ```
# {Title} # {Title}
@ -57,20 +60,23 @@ Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting} **What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened} **What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker} **My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce ## Steps to reproduce
1. {step} 1. {step}
## Raw output ## Raw output
(wrap any error messages or unexpected output in a markdown code block) ```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} **Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
``` ```
Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md` Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Step 0: Detect base branch ## Step 0: Detect base branch

@ -13,6 +13,11 @@ import * as os from 'os';
const ROOT = path.resolve(import.meta.dir, '..'); const ROOT = path.resolve(import.meta.dir, '..');
// Skip unless EVALS=1. Session runner strips CLAUDE* env vars to avoid nested session issues. // Skip unless EVALS=1. Session runner strips CLAUDE* env vars to avoid nested session issues.
//
// BLAME PROTOCOL: When an eval fails, do NOT claim "pre-existing" or "not related
// to our changes" without proof. Run the same eval on main to verify. These tests
// have invisible couplings — preamble text, SKILL.md content, and timing all affect
// agent behavior. See CLAUDE.md "E2E eval failure blame protocol" for details.
const evalsEnabled = !!process.env.EVALS; const evalsEnabled = !!process.env.EVALS;
const describeE2E = evalsEnabled ? describe : describe.skip; const describeE2E = evalsEnabled ? describe : describe.skip;
@ -322,10 +327,16 @@ File a contributor report about this issue. Then tell me what you filed.`,
const logFiles = fs.readdirSync(logsDir).filter(f => f.endsWith('.md')); const logFiles = fs.readdirSync(logsDir).filter(f => f.endsWith('.md'));
expect(logFiles.length).toBeGreaterThan(0); expect(logFiles.length).toBeGreaterThan(0);
// Verify new reflection-based format
const logContent = fs.readFileSync(path.join(logsDir, logFiles[0]), 'utf-8'); const logContent = fs.readFileSync(path.join(logsDir, logFiles[0]), 'utf-8');
expect(logContent).toContain('Hey gstack team'); expect(logContent).toContain('Hey gstack team');
expect(logContent).toContain('What I was trying to do'); expect(logContent).toContain('What I was trying to do');
expect(logContent).toContain('What happened instead'); expect(logContent).toContain('What happened instead');
expect(logContent).toMatch(/rating/i);
// Verify report has repro steps (agent may use "Steps to reproduce", "Repro Steps", etc.)
expect(logContent).toMatch(/repro|steps to reproduce|how to reproduce/i);
// Verify report has date/version footer (agent may format differently)
expect(logContent).toMatch(/date.*2026|2026.*date/i);
// Clean up // Clean up
try { fs.rmSync(contribDir, { recursive: true, force: true }); } catch {} try { fs.rmSync(contribDir, { recursive: true, force: true }); } catch {}
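
To illustrate what the flexible assertions in this hunk accept, the sketch below runs the same two regexes against a few sample strings. The regexes are copied from the diff; the sample headings and footers are invented for illustration and are not real agent output. Two things stand out under these assumptions: every alternative in the repro pattern already contains `repro`, and the date pattern only matches reports that mention the year 2026.

```typescript
// Regexes copied from the assertions in this hunk; sample inputs are invented.
const reproPattern = /repro|steps to reproduce|how to reproduce/i;
const datePattern = /date.*2026|2026.*date/i;

// Heading variants an agent might plausibly write (hypothetical examples).
const sampleHeadings = [
  "## Steps to reproduce",
  "## Repro Steps",
  "### How to Reproduce",
];
const reproResults = sampleHeadings.map((heading) => reproPattern.test(heading));

// A footer in the template's shape, and one dated in a later year.
const footerMatches = datePattern.test("**Date:** 2026-03-14 | **Version:** 0.4.2");
const laterYearMatches = datePattern.test("**Date:** 2027-01-02 | **Version:** 0.5.0");

console.log({ reproResults, footerMatches, laterYearMatches });
```

If the eval is expected to keep running after 2026, a year-agnostic pattern would likely be needed; that would be a change to the test, not something this diff does.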
@ -424,16 +435,20 @@ describeE2E('QA skill E2E', () => {
test('/qa quick completes without browse errors', async () => { test('/qa quick completes without browse errors', async () => {
const result = await runSkillTest({ const result = await runSkillTest({
prompt: `You have a browse binary at ${browseBin}. Assign it to B variable like: B="${browseBin}" prompt: `B="${browseBin}"
The test server is already running at: ${testServer.url}
Target page: ${testServer.url}/basic.html
Read the file qa/SKILL.md for the QA workflow instructions. Read the file qa/SKILL.md for the QA workflow instructions.
Run a Quick-depth QA test on ${testServer.url}/basic.html Run a Quick-depth QA test on ${testServer.url}/basic.html
Do NOT use AskUserQuestion — run Quick tier directly. Do NOT use AskUserQuestion — run Quick tier directly.
Do NOT try to start a server or discover ports — the URL above is ready.
Write your report to ${qaDir}/qa-reports/qa-report.md`, Write your report to ${qaDir}/qa-reports/qa-report.md`,
workingDirectory: qaDir, workingDirectory: qaDir,
maxTurns: 35, maxTurns: 35,
timeout: 180_000, timeout: 240_000,
testName: 'qa-quick', testName: 'qa-quick',
runId, runId,
}); });
@ -448,7 +463,7 @@ Write your report to ${qaDir}/qa-reports/qa-report.md`,
} }
// Accept error_max_turns — the agent doing thorough QA work is not a failure // Accept error_max_turns — the agent doing thorough QA work is not a failure
expect(['success', 'error_max_turns']).toContain(result.exitReason); expect(['success', 'error_max_turns']).toContain(result.exitReason);
}, 240_000); }, 300_000);
}); });
// --- B5: Review skill E2E --- // --- B5: Review skill E2E ---

@ -496,6 +496,44 @@ describe('v0.4.1 preamble features', () => {
} }
}); });
// --- Contributor mode preamble structure validation ---
describe('Contributor mode preamble structure', () => {
const skillsWithPreamble = [
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
'qa-only/SKILL.md',
'setup-browser-cookies/SKILL.md',
'ship/SKILL.md', 'review/SKILL.md',
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
'retro/SKILL.md',
];
for (const skill of skillsWithPreamble) {
test(`${skill} has 0-10 rating in contributor mode`, () => {
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
expect(content).toContain('0 to 10');
expect(content).toContain('My rating');
});
test(`${skill} has calibration example`, () => {
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
expect(content).toContain('Calibration');
expect(content).toContain('the bar');
});
test(`${skill} has "what would make this a 10" field`, () => {
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
expect(content).toContain('What would make this a 10');
});
test(`${skill} uses periodic reflection (not per-command)`, () => {
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
expect(content).toContain('workflow step');
expect(content).not.toContain('After you use gstack-provided CLIs');
});
}
});
describe('Enum & Value Completeness in review checklist', () => { describe('Enum & Value Completeness in review checklist', () => {
const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8'); const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8');