gstack

Garry Tan 8ee16b867b feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065)		2026-04-19 05:44:39 +08:00

* feat: restore mode-posture energy to expansion + forcing + builder output

Rewrites Writing Style rule 2-4 examples in scripts/resolvers/preamble.ts to cover three framing families (pain reduction, upside/delight, forcing pressure) instead of diagnostic-pain only. Adds inline exemplars to plan-ceo-review (0D-prelude shared between SCOPE + SELECTIVE EXPANSION) and office-hours (Q3 forcing exemplar with career/day/weekend domain gating, builder operating principles wild exemplar).

V1 shipped rule 2-4 examples that all pointed to diagnostic-pain framing ("3-second spinner", "double-click button"). Models follow concrete examples over abstract taxonomies, so any skill with a non-diagnostic mode posture (expansion, forcing, delight) got flattened at runtime even when the template itself said "dream big" or "direct to the point of discomfort." This change targets the actual lever: swap the single diagnostic example for three paired framings, one per posture family.

Preserves V1 clarity gains: rules 2, 3, 4 principles unchanged, only examples expanded. Terse mode (EXPLAIN_LEVEL: terse) still skips the block entirely.

* chore: regenerate SKILL.md after preamble + template changes

Mechanical cascade from `bun run gen:skill-docs --host all` after the Writing Style rule 2-4 example rewrite and the plan-ceo-review / office-hours template exemplar additions. No hand edits: every change flows from the prior commit's templates.

* test: add gate-tier mode-posture regression tests

Three gate-tier E2E tests detect when preamble / template changes flatten the distinctive posture of /plan-ceo-review SCOPE EXPANSION or /office-hours (startup Q3, builder mode). The V1 regression that this PR fixes shipped without anyone catching it at ship time; this is the ongoing signal so the same thing doesn't happen again.

Pieces:
- `judgePosture(mode, text)` in `test/helpers/llm-judge.ts`. Sonnet judge with mode-specific dual-axis rubric (expansion: surface_framing + decision_preservation; forcing: stacking_preserved + domain_matched_consequence; builder: unexpected_combinations + excitement_over_optimization). Pass threshold 4/5 on both axes.
- Three fixtures in `test/fixtures/mode-posture/`: deterministic input for expansion proposal generation, Q3 forcing question, and builder adjacent-unlock riffing.
- `plan-ceo-review-expansion-energy` case appended to `test/skill-e2e-plan.test.ts`. Generator: Opus (skill default). Judge: Sonnet.
- New `test/skill-e2e-office-hours.test.ts` with `office-hours-forcing-energy` + `office-hours-builder-wildness` cases. Generator: Sonnet. Judge: Sonnet.
- Touchfile registration in `test/helpers/touchfiles.ts`: all three as `gate` tier in `E2E_TIERS`, triggered by changes to `scripts/resolvers/preamble.ts`, the relevant skill template, the judge helper, or any mode-posture fixture.

Cost: ~$0.50-$1.50 per triggered PR. Sonnet judge is cheap; Opus generator for the plan-ceo-review case dominates.

Known V1.1 tradeoff: judges test prose markers more than deep behavior. V1.2 candidate is a cross-provider (Codex) adversarial judge on the same output to decouple house-style bias.

* test: update golden ship baselines + touchfile count for mode-posture entries

Mechanical test updates after the mode-posture work:
- Golden ship SKILL.md baselines (claude + codex + factory hosts) regenerate with the rewritten Writing Style rule 2-4 examples from preamble.ts.
- Touchfile selection test expects 6 matches for a plan-ceo-review/ change (was 5) because E2E_TOUCHFILES now includes plan-ceo-review-expansion-energy.

* chore: bump version and changelog (v1.1.2.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
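
The pass rule the commit message gives for `judgePosture` (a mode-specific dual-axis rubric, threshold 4/5 on both axes) can be sketched roughly as below. This is an illustration only, not the code in `test/helpers/llm-judge.ts`: the axis names are taken from the commit message, but `AXES`, `PostureScores`, `PASS_THRESHOLD`, and `passesPosture` are hypothetical names, and the Sonnet call that would produce the scores is left out.

```typescript
// Hypothetical sketch of the "4/5 on both axes" gate described in the commit message.
// Axis names come from the message; every identifier here is an assumption.
type Mode = "expansion" | "forcing" | "builder";

const AXES: Record<Mode, [string, string]> = {
  expansion: ["surface_framing", "decision_preservation"],
  forcing: ["stacking_preserved", "domain_matched_consequence"],
  builder: ["unexpected_combinations", "excitement_over_optimization"],
};

const PASS_THRESHOLD = 4; // scores are assumed to be on a 1-5 scale

type PostureScores = Record<string, number>;

// Passes only if BOTH axes for the mode are present and meet the threshold.
function passesPosture(mode: Mode, scores: PostureScores): boolean {
  return AXES[mode].every((axis) => {
    const score = scores[axis];
    return typeof score === "number" && score >= PASS_THRESHOLD;
  });
}

// One strong axis does not compensate for a weak one.
console.log(
  passesPosture("forcing", { stacking_preserved: 5, domain_matched_consequence: 3 }),
); // prints false
```

Requiring both axes, rather than averaging them, means output that keeps one trait of the posture but loses the other still fails the gate.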
fixtures	feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )	2026-04-19 05:44:39 +08:00
helpers	feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )	2026-04-19 05:44:39 +08:00
analytics.test.ts	feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )	2026-03-18 23:57:59 -05:00
audit-compliance.test.ts	fix: security audit round 2 (v0.13.4.0) (#640 )	2026-03-29 22:46:33 -06:00
builder-profile.test.ts	feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937 )	2026-04-08 22:21:28 -10:00
codex-e2e.test.ts	feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425 )	2026-03-23 23:05:22 -07:00
codex-hardening.test.ts	codex + Apple Silicon hardening wave (v0.18.4.0) (#1056 )	2026-04-18 12:30:54 +08:00
diff-scope.test.ts	feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692 )	2026-03-30 22:07:50 -06:00
explain-level-config.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
gemini-e2e.test.ts	feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )	2026-04-16 10:41:38 -07:00
gen-skill-docs.test.ts	codex + Apple Silicon hardening wave (v0.18.4.0) (#1056 )	2026-04-18 12:30:54 +08:00
global-discover.test.ts	fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817 )	2026-04-05 02:02:06 -07:00
gstack-developer-profile.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
gstack-question-log.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
gstack-question-preference.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
hook-scripts.test.ts	feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )	2026-03-18 23:57:59 -05:00
host-config.test.ts	community wave: 6 PRs + hardening (v0.18.1.0) (#1028 )	2026-04-17 00:45:13 -07:00
jargon-list.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
learnings-injection.test.ts	fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 )	2026-04-06 00:47:04 -07:00
learnings.test.ts	feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622 )	2026-03-29 17:02:01 -06:00
openclaw-native-skills.test.ts	community wave: 6 PRs + hardening (v0.18.1.0) (#1028 )	2026-04-17 00:45:13 -07:00
plan-tune.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
readme-throughput.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
relink.test.ts	fix: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0) (#1025 )	2026-04-16 15:39:44 -07:00
review-log.test.ts	fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552 )	2026-03-26 23:21:27 -06:00
setup-codesign.test.ts	codex + Apple Silicon hardening wave (v0.18.4.0) (#1056 )	2026-04-18 12:30:54 +08:00
ship-version-sync.test.ts	fix(ship): detect + repair VERSION/package.json drift in Step 12 (v1.1.1.0) (#1063 )	2026-04-18 23:58:59 +08:00
skill-e2e-autoplan-dual-voice.test.ts	codex + Apple Silicon hardening wave (v0.18.4.0) (#1056 )	2026-04-18 12:30:54 +08:00
skill-e2e-bws.test.ts	fix: cookie picker auth token leak (v0.15.17.0) (#904 )	2026-04-08 10:10:13 -07:00
skill-e2e-cso.test.ts	feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384 )	2026-03-23 06:57:22 -07:00
skill-e2e-deploy.test.ts	feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518 )	2026-03-26 11:08:31 -07:00
skill-e2e-design.test.ts	feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360 )	2026-03-23 10:17:33 -07:00
skill-e2e-learnings.test.ts	feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647 )	2026-03-31 23:08:22 -06:00
skill-e2e-office-hours.test.ts	feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )	2026-04-19 05:44:39 +08:00
skill-e2e-plan-tune.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
skill-e2e-plan.test.ts	feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )	2026-04-19 05:44:39 +08:00
skill-e2e-qa-bugs.test.ts	feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360 )	2026-03-23 10:17:33 -07:00
skill-e2e-qa-workflow.test.ts	feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360 )	2026-03-23 10:17:33 -07:00
skill-e2e-review-army.test.ts	feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692 )	2026-03-30 22:07:50 -06:00
skill-e2e-review.test.ts	feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )	2026-04-16 10:41:38 -07:00
skill-e2e-session-intelligence.test.ts	feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733 )	2026-04-01 00:50:42 -06:00
skill-e2e-sidebar.test.ts	feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793 )	2026-04-04 15:32:20 -07:00
skill-e2e-workflow.test.ts	refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 )	2026-04-07 00:23:36 -07:00
skill-e2e.test.ts	feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647 )	2026-03-31 23:08:22 -06:00
skill-llm-eval.test.ts	feat: voice directive for all skills (v0.12.3.0) (#520 )	2026-03-26 17:31:53 -06:00
skill-parser.test.ts	feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41 )	2026-03-13 21:08:12 -07:00
skill-routing-e2e.test.ts	feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )	2026-04-16 10:41:38 -07:00
skill-validation.test.ts	feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030 )	2026-04-16 23:14:03 -07:00
team-mode.test.ts	feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )	2026-04-16 10:41:38 -07:00
telemetry.test.ts	feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641 )	2026-03-29 21:43:36 -06:00
timeline.test.ts	feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733 )	2026-04-01 00:50:42 -06:00
touchfiles.test.ts	feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )	2026-04-19 05:44:39 +08:00
uninstall.test.ts	feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561 )	2026-03-27 00:44:37 -06:00
upgrade-migration-v1.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
v0-dormancy.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00
worktree.test.ts	feat: content security — 4-layer prompt injection defense for pair-agent (#815 )	2026-04-06 14:41:06 -07:00
writing-style-resolver.test.ts	feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )	2026-04-18 15:05:42 +08:00