<!-- AUTO-GENERATED from tests.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Step 4: Test Framework Bootstrap

## Test Framework Bootstrap

**Detect existing test framework and project runtime:**

```bash
setopt +o nomatch 2>/dev/null || true  # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```

**If test framework detected** (config files or test directories found):
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
Store conventions as prose context for use in Phase 8e.5 or Step 7. **Skip the rest of bootstrap.**
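
The `{N}` in the detection message has to be estimated somehow. A rough sketch of one way to do it, assuming tests follow common file-naming patterns. The helper name `count_test_files` and the pattern list are illustrative assumptions, not part of gstack, and the result counts test files rather than individual test cases:

```bash
# Hypothetical helper: count files that look like tests under a directory.
# The name patterns are assumptions; adjust them to the project's conventions.
count_test_files() {
  find "${1:-.}" -type f \
    \( -name '*_test.*' -o -name '*.test.*' \
       -o -name '*_spec.*' -o -name '*.spec.*' -o -name 'test_*.py' \) \
    ! -path '*/node_modules/*' ! -path '*/.git/*' \
    | wc -l | tr -d ' '
}
```

Calling `count_test_files .` from the project root prints a single number that can be substituted for `{N}`.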

**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**

**If NO runtime detected** (no config files found): Use AskUserQuestion:
"I couldn't detect your project's language. What runtime are you using?"
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.

**If runtime detected but no test framework — bootstrap:**

### B2. Research best practices

Use WebSearch to find current best practices for the detected runtime:
- `"[runtime] best test framework 2025 2026"`
- `"[framework A] vs [framework B] comparison"`

If WebSearch is unavailable, use this built-in knowledge table:

| Runtime | Primary recommendation | Alternative |
|---------|----------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |

### B3. Framework selection

Use AskUserQuestion:
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
B) [Alternative] — [rationale]. Includes: [packages]
C) Skip — don't set up testing right now
RECOMMENDATION: Choose A because [reason based on project context]"

If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.
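
Writing the opt-out marker can be sketched as below. The helper names `decline_bootstrap` and `bootstrap_declined` are hypothetical; the only thing taken from the text is the marker path `.gstack/no-test-bootstrap`, and the sketch assumes the marker is an empty file whose presence alone matters:

```bash
# Hypothetical helpers around the opt-out marker described above.
decline_bootstrap() {
  mkdir -p .gstack                 # the directory may not exist yet
  touch .gstack/no-test-bootstrap  # only the file's presence matters
}

# Succeeds (exit 0) when the user has previously declined bootstrap.
bootstrap_declined() {
  [ -f .gstack/no-test-bootstrap ]
}
```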

If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.

### B4. Install and configure

1. Install the chosen packages (npm/bun/gem/pip/etc.)
2. Create minimal config file
3. Create directory structure (test/, spec/, etc.)
4. Create one example test matching the project's code to verify setup works

If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.

### B4.5. First real tests

Generate 3-5 real tests for existing code:

1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
5. Generate at least 1 test, cap at 5.

Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.

### B5. Verify

```bash
# Run the full test suite to confirm everything works
{detected test command}
```

If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.

### B5.5. CI/CD pipeline

```bash
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
```

If `.github/` exists (or no CI detected — default to GitHub Actions):
Create `.github/workflows/test.yml` with:
- `runs-on: ubuntu-latest`
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
- The same test command verified in B5
- Trigger: push + pull_request

If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
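
As an illustration of the workflow described above, here is a minimal sketch that writes `.github/workflows/test.yml` for a Node.js project. Everything beyond the four listed requirements is an assumption: the helper name `write_test_workflow`, the action versions (`actions/checkout@v4`, `actions/setup-node@v4`), Node 20, and the `npm ci` install step. Other runtimes would swap in their own setup action and install command.

```bash
# Hypothetical helper: write a minimal GitHub Actions workflow for Node.js.
# Assumed (not from the original): action versions, Node 20, `npm ci`.
write_test_workflow() {
  local test_cmd="${1:-npm test}"   # pass the command verified in B5
  mkdir -p .github/workflows
  # Unquoted heredoc delimiter so ${test_cmd} is expanded into the file.
  cat > .github/workflows/test.yml <<YAML
name: test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: ${test_cmd}
YAML
}
```

For example, `write_test_workflow "npm run test"` produces a workflow whose last step runs `npm run test`.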

### B6. Create TESTING.md

First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.

Write TESTING.md with:
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
- Framework name and version
- How to run tests (the verified command from B5)
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
- Conventions: file naming, assertion style, setup/teardown patterns

### B7. Update CLAUDE.md

First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.

Append a `## Testing` section:
- Run command and test directory
- Reference to TESTING.md
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing new functions, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail
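
The "skip if already present" check for CLAUDE.md can be sketched as follows. The helper name `append_testing_section` and the exact section body are illustrative assumptions; a real section would carry the full list of expectations above:

```bash
# Hypothetical helper: append a "## Testing" section only if none exists.
# Usage: append_testing_section [file] [test command]
append_testing_section() {
  local file="${1:-CLAUDE.md}" test_cmd="${2:-npm test}"
  # Match a heading that is exactly "## Testing" (not "## Testing notes").
  if [ -f "$file" ] && grep -q '^## Testing[[:space:]]*$' "$file"; then
    echo "SKIP: $file already has a Testing section"
    return 0
  fi
  # >> appends, so existing content is never overwritten.
  printf '\n## Testing\n\n- Run tests: %s\n- See TESTING.md for conventions\n' \
    "$test_cmd" >> "$file"
}
```

Running it twice leaves a single `## Testing` heading in the file.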

### B8. Commit

```bash
git status --porcelain
```

Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
`git commit -m "chore: bootstrap test framework ({framework name})"`

---

## Step 5: Run tests (on merged code)

**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
`db:test:prepare` internally, which loads the schema into the correct lane database.
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.

Run both test suites in parallel:

```bash
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
```

After both complete, read the output files and check pass/fail.
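
Reading the output files is needed because the exit codes are not available from the snippet above: without `pipefail` a pipeline reports the status of `tee`, and a bare `wait` returns 0. If exit codes are wanted as well, one possible variation is sketched below. The helper names `run_logged` and `run_both` are hypothetical, and unlike the `tee` version this sketch only writes to the log files and does not echo output live:

```bash
# Hypothetical helper: run a command, sending stdout and stderr to a log file.
# The function's exit status is the command's own exit status.
run_logged() {
  local log="$1"; shift
  "$@" > "$log" 2>&1
}

# Run two shell command strings in parallel and report both exit codes.
run_both() {
  local pid_a pid_b rc_a=0 rc_b=0
  run_logged /tmp/ship_tests.txt sh -c "$1" &
  pid_a=$!
  run_logged /tmp/ship_vitest.txt sh -c "$2" &
  pid_b=$!
  wait "$pid_a" || rc_a=$?   # wait <pid> returns that job's exit status
  wait "$pid_b" || rc_b=$?
  echo "test-lane=$rc_a vitest=$rc_b"
  [ "$rc_a" -eq 0 ] && [ "$rc_b" -eq 0 ]
}
```

A call such as `run_both "bin/test-lane" "npm run test"` succeeds only when both suites exit with status 0.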

**If any test fails:** Do NOT immediately stop. Apply the Test Failure Ownership Triage:

## Test Failure Ownership Triage

When tests fail, do NOT immediately stop. First, determine ownership:

### Step T1: Classify each failure

For each failing test:

1. **Get the files changed on this branch:**
   ```bash
   git diff origin/<base>...HEAD --name-only
   ```

2. **Classify the failure:**
   - **In-branch** if: the failing test file itself was modified on this branch, OR the test output references code that was changed on this branch, OR you can trace the failure to a change in the branch diff.
   - **Likely pre-existing** if: neither the test file nor the code it tests was modified on this branch, AND the failure is unrelated to any branch change you can identify.
   - **When ambiguous, default to in-branch.** It is safer to stop the developer than to let a broken test ship. Only classify as pre-existing when you are confident.

This classification is heuristic — use your judgment reading the diff and the test output. You do not have a programmatic dependency graph.
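
Only the first criterion (was the failing test file itself modified?) is mechanical. A small sketch of that check is shown here, with the hypothetical helper name `classify_failure`. It deliberately answers `needs-review` rather than `pre-existing` when the test file is unchanged, since the remaining criteria still require reading the diff and the test output:

```bash
# Hypothetical helper: first-pass classification of one failing test.
# Usage: classify_failure <failing-test-file> <newline-separated changed files>
classify_failure() {
  local test_file="$1" changed="$2"
  # -F fixed string, -x whole-line match, -q quiet
  if printf '%s\n' "$changed" | grep -Fxq -- "$test_file"; then
    echo "in-branch"
  else
    echo "needs-review"   # not proof that the failure is pre-existing
  fi
}
```

The second argument would typically be the output of the `git diff origin/<base>...HEAD --name-only` command from step 1.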

### Step T2: Handle in-branch failures

**STOP.** These are your failures. Show them and do not proceed. The developer must fix their own broken tests before shipping.

### Step T3: Handle pre-existing failures

Check `REPO_MODE` from the preamble output.

**If REPO_MODE is `solo`:**

Use AskUserQuestion:

> These test failures appear pre-existing (not caused by your branch changes):
>
> [list each failure with file:line and brief error description]
>
> Since this is a solo repo, you're the only one who will fix these.
>
> RECOMMENDATION: Choose A — fix now while the context is fresh. Completeness: 9/10.
> A) Investigate and fix now (human: ~2-4h / CC: ~15min) — Completeness: 10/10
> B) Add as P0 TODO — fix after this branch lands — Completeness: 7/10
> C) Skip — I know about this, ship anyway — Completeness: 3/10

**If REPO_MODE is `collaborative` or `unknown`:**

Use AskUserQuestion:

> These test failures appear pre-existing (not caused by your branch changes):
>
> [list each failure with file:line and brief error description]
>
> This is a collaborative repo — these may be someone else's responsibility.
>
> RECOMMENDATION: Choose B — assign it to whoever broke it so the right person fixes it. Completeness: 9/10.
> A) Investigate and fix now anyway — Completeness: 10/10
> B) Blame + assign GitHub issue to the author — Completeness: 9/10
> C) Add as P0 TODO — Completeness: 7/10
> D) Skip — ship anyway — Completeness: 3/10

### Step T4: Execute the chosen action

**If "Investigate and fix now":**
- Switch to /investigate mindset: root cause first, then minimal fix.
- Fix the pre-existing failure.
- Commit the fix separately from the branch's changes: `git commit -m "fix: pre-existing test failure in <test-file>"`
- Continue with the workflow.

**If "Add as P0 TODO":**
- If `TODOS.md` exists, add the entry following the format in `review/TODOS-format.md` (or `.claude/skills/review/TODOS-format.md`).
- If `TODOS.md` does not exist, create it with the standard header and add the entry.
- Entry should include: title, the error output, which branch it was noticed on, and priority P0.
- Continue with the workflow — treat the pre-existing failure as non-blocking.

**If "Blame + assign GitHub issue" (collaborative only):**
- Find who likely broke it. Check BOTH the test file AND the production code it tests:
  ```bash
  # Who last touched the failing test?
  git log --format="%an (%ae)" -1 -- <failing-test-file>
  # Who last touched the production code the test covers? (often the actual breaker)
  git log --format="%an (%ae)" -1 -- <source-file-under-test>
  ```
  If these are different people, prefer the production code author — they likely introduced the regression.
- Create an issue assigned to that person (use the platform detected in Step 0):
  - **If GitHub:**
    ```bash
    # $'...' turns \n into real newlines and keeps the backticks literal;
    # inside double quotes the shell would run them as a command substitution.
    gh issue create \
      --title "Pre-existing test failure: <test-name>" \
      --body $'Found failing on branch <current-branch>. Failure is pre-existing.\n\n**Error:**\n```\n<first 10 lines>\n```\n\n**Last modified by:** <author>\n**Noticed by:** gstack /ship on <date>' \
      --assignee "<github-username>"
    ```
  - **If GitLab:**
    ```bash
    # Same quoting as above: $'...' for real newlines and literal backticks.
    glab issue create \
      -t "Pre-existing test failure: <test-name>" \
      -d $'Found failing on branch <current-branch>. Failure is pre-existing.\n\n**Error:**\n```\n<first 10 lines>\n```\n\n**Last modified by:** <author>\n**Noticed by:** gstack /ship on <date>' \
      -a "<gitlab-username>"
    ```
- If neither CLI is available or `--assignee`/`-a` fails (user not in org, etc.), create the issue without assignee and note who should look at it in the body.
- Continue with the workflow.

**If "Skip":**
- Continue with the workflow.
- Note in output: "Pre-existing test failure skipped: <test-name>"

**After triage:** If any in-branch failures remain unfixed, **STOP**. Do not proceed. If all failures were pre-existing and handled (fixed, TODOed, assigned, or skipped), continue to Step 6.

**If all pass:** Continue silently — just note the counts briefly.

---

## Step 6: Eval Suites (conditional)

Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.

**1. Check if the diff touches prompt-related files:**

```bash
git diff origin/<base> --name-only
```

Match against these patterns (from CLAUDE.md):
- `app/services/*_prompt_builder.rb`
- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
- `config/system_prompts/*.txt`
- `test/evals/**/*` (eval infrastructure changes affect all suites)

**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 9.
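
The pattern match above can be sketched with a shell `case` statement. This is an illustration under stated assumptions: the helper names `is_prompt_file` and `prompt_files_changed` are hypothetical, and the abbreviated patterns in each bullet (such as `*_writer_service.rb`) are assumed to live in the same directory as the first pattern on that line. Note that `*` in a `case` pattern also matches `/`, so this is slightly more permissive than a filename glob:

```bash
# Hypothetical helper: succeed if a path matches one of the prompt patterns.
is_prompt_file() {
  case "$1" in
    app/services/*_prompt_builder.rb) return 0 ;;
    app/services/*_generation_service.rb | app/services/*_writer_service.rb | \
    app/services/*_designer_service.rb) return 0 ;;
    app/services/*_evaluator.rb | app/services/*_scorer.rb | \
    app/services/*_classifier_service.rb | app/services/*_analyzer.rb) return 0 ;;
    app/services/concerns/*voice*.rb | app/services/concerns/*writing*.rb | \
    app/services/concerns/*prompt*.rb | app/services/concerns/*token*.rb) return 0 ;;
    app/services/chat_tools/*.rb | app/services/x_thread_tools/*.rb) return 0 ;;
    config/system_prompts/*.txt) return 0 ;;
    test/evals/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read file names on stdin, print the prompt-related ones,
# and exit non-zero when none matched.
prompt_files_changed() {
  local f found=1
  while IFS= read -r f; do
    if is_prompt_file "$f"; then
      echo "$f"
      found=0
    fi
  done
  return "$found"
}
```

Piping the `git diff ... --name-only` output from step 1 into `prompt_files_changed` lists the files that trigger evals.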

**2. Identify affected eval suites:**

Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:

```bash
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
```

Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
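
The runner-to-test mapping is a plain suffix swap, which shell parameter expansion can express directly. The helper name `runner_to_test` is hypothetical:

```bash
# Hypothetical helper: map an eval runner path to its eval test path
# by replacing the "_runner.rb" suffix with "_test.rb".
runner_to_test() {
  local runner="$1"
  echo "${runner%_runner.rb}_test.rb"
}
```

Each file name printed by the `grep -l` command above can be passed through `runner_to_test` to get the suite to run.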

**Special cases:**
- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.

**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**

`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).

```bash
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
```

If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
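
The "sequential, stop on first failure" rule can be sketched as a small loop. The helper name `run_suites` is hypothetical, and the runner is passed in as a function name so that the loop itself makes no assumption about how a single suite is invoked:

```bash
# Hypothetical helper: run suites one at a time and stop at the first failure.
# Usage: run_suites <runner-function> <suite> [<suite> ...]
run_suites() {
  local runner="$1"; shift
  local suite
  for suite in "$@"; do
    echo "Running $suite"
    if ! "$runner" "$suite"; then
      echo "FAILED: $suite (remaining suites skipped)"
      return 1
    fi
  done
}
```

In practice the runner would be a function wrapping the `bin/test-lane --eval` command shown above for one suite; remember that when that command is piped into `tee`, `set -o pipefail` is needed for the failure to reach the loop.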

**4. Check results:**

- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
- **If all pass:** Note pass counts and cost. Continue to Step 9.

**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 19).

**Tier reference (for context — /ship always uses `full`):**

| Tier | When | Speed (cached) | Cost |
|------|------|----------------|------|
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |

---