feat(plan-tune): explicit-consent surface + setup gate for question_tuning

Step 0 grows two implicit gates that run before user-intent routing:
- Consent gate: question_tuning=false + no marker → offer opt-in (contributor-specific copy variant)
- Setup gate: question_tuning=true + declared empty + no marker → run 5-Q wizard

Markers (~/.gstack/.question-tuning-prompted, ~/.gstack/.declared-setup-prompted)
ensure each user is asked at most once. The Enable+setup section is split into
"Consent + opt-in" (with contributor framing) and standalone "5-Q setup"
reachable from both the consent flow and the setup gate.
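
A minimal sketch of the gate order described above, assuming the two marker
filenames from this message. The function name `which_gate`, its arguments,
and the "empty"/"set" flag are hypothetical and not part of gstack; the real
skill is prose that the agent follows, not a script.

```bash
# Hypothetical helper: reports which implicit gate (if any) should fire.
#   $1 = question_tuning value ("true" or "false")
#   $2 = "empty" or "set", describing the declared profile object
#   $3 = directory holding the marker files (defaults to ~/.gstack)
which_gate() {
  local qt="$1" declared="$2" dir="${3:-$HOME/.gstack}"
  if [ "$qt" = "false" ] && [ ! -e "$dir/.question-tuning-prompted" ]; then
    echo "consent"   # never asked: offer the opt-in prompt
  elif [ "$qt" = "true" ] && [ "$declared" = "empty" ] \
       && [ ! -e "$dir/.declared-setup-prompted" ]; then
    echo "setup"     # opted in without a declared profile: run the 5-Q wizard
  else
    echo "none"      # fall through to user-intent routing
  fi
}
```

Passing the marker directory as an argument keeps the sketch testable against
a temporary directory instead of the real `~/.gstack`.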

Also aligns the calibration gate across three docs (V0 said 90+ days, TODOS
said 2+ weeks, binary uses 7 days). The fix distinguishes:
- Display gate (sample_size>=20, skills>=3, question_ids>=8, days_span>=7):
  for rendering inferred values in /plan-tune output
- Promotion gate (90+ days stable across 3+ skills): for shipping E1
  behavior-adapting defaults
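
For illustration only, the display gate's four thresholds can be written as a
single predicate. The function name `passes_display_gate` and its positional
arguments are hypothetical; how the binary actually evaluates the gate is not
shown in this commit.

```bash
# Hypothetical helper: exits 0 when all four display-gate thresholds are met.
#   $1 = sample_size, $2 = skills covered, $3 = question_ids covered, $4 = days_span
passes_display_gate() {
  local sample_size="$1" skills="$2" question_ids="$3" days_span="$4"
  [ "$sample_size" -ge 20 ] && [ "$skills" -ge 3 ] &&
    [ "$question_ids" -ge 8 ] && [ "$days_span" -ge 7 ]
}
```

The promotion gate is deliberately not expressed here: "90+ days stable" needs
a definition of stability that this message does not give.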

TODOS.md E1 card updated to reference 90+ days, plus Codex's substrate risk
note: generated skill prose is agent-compliance-based, so E1 ships as
advisory annotations on AskUserQuestion recommendations, not silent
AUTO_DECIDE. Tests can verify templates contain the right reads but can't
prove agents obey them.

Per /plan-eng-review + Codex outside-voice 2026-05-26.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Garry Tan 2026-05-26 22:58:05 -07:00
parent 22f8c7f4e1
commit 9cc211f66a
3 changed files with 216 additions and 49 deletions


@@ -582,7 +582,24 @@ reads it yet.
**Effort:** L (human: ~1 week / CC: ~4h)
**Priority:** P0
**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
Distinct from the lighter-weight diversity-display gate
(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
AND days_span >= 7`) used in /plan-tune to render the inferred column —
display is a UI affordance, promotion to E1 needs a much higher bar
because behavioral adaptation is consequential and hard to revert. Prior
versions of this card cited "2+ weeks" which conflicted with V0 — V0 wins.
**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
skill prose is agent-compliance-based. Tests can verify templates contain the
right reads of `~/.gstack/developer-profile.json` and the right decision
points, but tests cannot prove agents obey them at runtime. E1 ships
adaptations as **advisory annotations on AskUserQuestion recommendations**
("Recommended via your profile: <choice>") until there's a hard runtime
execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
of E1; explicit per-question preferences remain the only AUTO_DECIDE
source.
### E3 — `/plan-tune narrative` + `/plan-tune vibe`


@@ -698,50 +698,77 @@ Canonical reference: `docs/designs/PLAN_TUNING_V0.md`.
## Step 0: Detect what the user wants
Read the user's message. Route based on plain-English intent, not keywords:
Read the user's message. Route based on plain-English intent, not keywords.
1. **First-time use** (config says `question_tuning` is not yet set to `true`) →
run `Enable + setup` below.
2. **"Show my profile" / "what do you know about me" / "show my vibe"** →
**Implicit gates run first** (before user-intent routing). These exist so first-time
users see the consent prompt and so explicit opt-ins eventually run the 5-Q setup.
Each gate is guarded by a marker file so the user is asked at most once.
1. **Consent gate.** If `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing → run `Consent + opt-in`
below. Honor the answer with a marker write either way; do not re-prompt.
2. **Setup gate.** If `question_tuning` is `true` AND
`~/.gstack/developer-profile.json`'s `declared` object is empty AND
`~/.gstack/.declared-setup-prompted` is missing → run `5-Q setup` below.
Touch the marker after setup completes OR is declined.
When neither implicit gate fires, route by user intent:
3. **"Show my profile" / "what do you know about me" / "show my vibe"** →
run `Inspect profile`.
3. **"Review questions" / "what have I been asked" / "show recent"** →
4. **"Review questions" / "what have I been asked" / "show recent"** →
run `Review question log`.
4. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
5. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
run `Set a preference`.
5. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
6. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
my mind"** → run `Edit declared profile` (confirm before writing).
6. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
7. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
8. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true`
9. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, or (e) turn it off?"
7. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
8. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
9. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true && touch ~/.gstack/.question-tuning-prompted`
10. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, or (e) turn it off?"
Power-user shortcuts (one-word invocations) — handle these too:
`profile`, `vibe`, `gap`, `stats`, `review`, `enable`, `disable`, `setup`.
---
## Enable + setup (first-time flow)
## Consent + opt-in
**When this fires.** The user invokes `/plan-tune` and the preamble shows
`QUESTION_TUNING: false` (the default).
**When this fires.** Step 0's consent gate: `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing. The user has never been
asked.
**Privacy note.** gstack defaults `question_tuning` to `false` for every user.
There is no auto-flip for any cohort. The consent prompt is the only path to
enabling, and the answer is honored with a marker file so the user is never
re-asked. Contributors are not auto-enrolled (see
`docs/designs/PLAN_TUNING_V1.md` §"Decisions log" for the privacy posture
rationale). If the user is a contributor (`gstack_contributor: true`), the
prompt can mention it as additional context, but the decision is still
explicit.
**Flow:**
1. Read the current state:
1. Detect contributor state (for prompt framing only, not for auto-action):
```bash
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || echo "false")
echo "QUESTION_TUNING: $_QT"
echo "CONTRIBUTOR: $_CONTRIB"
```
2. If `false`, use AskUserQuestion:
2. AskUserQuestion (use the contributor-specific framing only if `_CONTRIB=true`,
otherwise use the general framing):
**General framing:**
> Question tuning is off. gstack can learn which of its prompts you find
> valuable vs noisy — so over time, gstack stops asking questions you've
> already answered the same way. It takes about 2 minutes to set up your
> initial profile. v1 is observational: gstack tracks your preferences
> and shows you a profile, but doesn't silently change skill behavior yet.
> Logs stay local (`~/.gstack/projects/<slug>/question-log.jsonl`).
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
@@ -749,13 +776,47 @@ Power-user shortcuts (one-word invocations) — handle these too:
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. If A or B: enable:
**Contributor framing (only if `_CONTRIB=true`):**
> You're a gstack contributor. Question tuning isn't on by default for
> anyone, but contributors are the cohort whose data most helps v2 work
> (skills adapting to your steering style). Enabling logs every
> AskUserQuestion outcome locally to
> `~/.gstack/projects/<slug>/question-log.jsonl` — nothing leaves your
> machine. v1 is observational only.
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
> A) Enable + set up (recommended for contributors, ~2 min)
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. ALWAYS touch the marker, regardless of choice:
```bash
touch ~/.gstack/.question-tuning-prompted
```
4. If A or B: enable:
```bash
~/.claude/skills/gstack/bin/gstack-config set question_tuning true
```
4. If A (full setup), ask FIVE one-per-dimension declaration questions via
individual AskUserQuestion calls (one at a time). Use plain English, no jargon:
5. If C: do nothing else. Tell the user: "Question tuning stays off. Re-enable
any time with `/plan-tune enable` or `gstack-config set question_tuning true`."
## 5-Q setup (post-consent, or via Setup gate)
**When this fires.** Two paths:
- Right after the consent prompt above accepts option A.
- Standalone via Step 0's setup gate: `question_tuning` is already `true`
(user opted in via gstack-config or earlier `/plan-tune enable`) AND
`declared` is empty AND `~/.gstack/.declared-setup-prompted` is missing.
This catches users who set `question_tuning: true` directly without
running the wizard.
**Flow:**
1. Ask FIVE one-per-dimension declaration questions via individual
AskUserQuestion calls (one at a time). Use plain English, no jargon:
**Q1 — scope_appetite:** "When you're planning a feature, do you lean toward
shipping the smallest useful version fast, or building the complete, edge-
@@ -808,10 +869,18 @@ Power-user shortcuts (one-word invocations) — handle these too:
"
```
5. Tell the user: "Profile set. Question tuning is now on. Use `/plan-tune`
2. Touch the marker so the Setup gate doesn't re-fire:
```bash
touch ~/.gstack/.declared-setup-prompted
```
Touch it even if the user bails out partway — they were asked; they chose
not to complete. The Setup gate respects that. They can rerun the 5-Q
anytime with `/plan-tune setup` (Step 0 power-user shortcut).
3. Tell the user: "Profile set. Question tuning is on. Use `/plan-tune`
again any time to inspect, adjust, or turn it off."
6. Show the profile inline as a confirmation (see `Inspect profile` below).
4. Show the profile inline as a confirmation (see `Inspect profile` below).
---
@@ -832,12 +901,18 @@ Parse the JSON. Present in **plain English**, not raw floats:
Format: "**scope_appetite:** 0.8 (boil the ocean — you prefer the complete
version with edge cases covered)"
- If `inferred.diversity` passes the calibration gate (`sample_size >= 20 AND
- If `inferred.diversity` passes the **display gate** (`sample_size >= 20 AND
skills_covered >= 3 AND question_ids_covered >= 8 AND days_span >= 7`), show
the inferred column next to declared:
"**scope_appetite:** declared 0.8 (boil the ocean) ↔ observed 0.72 (close)"
Use words for the gap: 0.0-0.1 "close", 0.1-0.3 "drift", 0.3+ "mismatch".
This display gate is intentionally lower than the E1 **promotion gate**
(90+ days stable across 3+ skills, per `docs/designs/PLAN_TUNING_V0.md`).
Displaying inferred values is a UI affordance; shipping behavior-adapting
defaults based on the profile is consequential and needs a much higher
bar. Do NOT use the display gate as a green light for v2 E1 work.
- If the calibration gate isn't met, say: "Not enough observed data yet —
need N more events across M more skills before we can show your observed
profile."


@@ -52,50 +52,77 @@ Canonical reference: `docs/designs/PLAN_TUNING_V0.md`.
## Step 0: Detect what the user wants
Read the user's message. Route based on plain-English intent, not keywords:
Read the user's message. Route based on plain-English intent, not keywords.
1. **First-time use** (config says `question_tuning` is not yet set to `true`) →
run `Enable + setup` below.
2. **"Show my profile" / "what do you know about me" / "show my vibe"** →
**Implicit gates run first** (before user-intent routing). These exist so first-time
users see the consent prompt and so explicit opt-ins eventually run the 5-Q setup.
Each gate is guarded by a marker file so the user is asked at most once.
1. **Consent gate.** If `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing → run `Consent + opt-in`
below. Honor the answer with a marker write either way; do not re-prompt.
2. **Setup gate.** If `question_tuning` is `true` AND
`~/.gstack/developer-profile.json`'s `declared` object is empty AND
`~/.gstack/.declared-setup-prompted` is missing → run `5-Q setup` below.
Touch the marker after setup completes OR is declined.
When neither implicit gate fires, route by user intent:
3. **"Show my profile" / "what do you know about me" / "show my vibe"** →
run `Inspect profile`.
3. **"Review questions" / "what have I been asked" / "show recent"** →
4. **"Review questions" / "what have I been asked" / "show recent"** →
run `Review question log`.
4. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
5. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
run `Set a preference`.
5. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
6. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
my mind"** → run `Edit declared profile` (confirm before writing).
6. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
7. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
8. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true`
9. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, or (e) turn it off?"
7. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
8. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
9. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true && touch ~/.gstack/.question-tuning-prompted`
10. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, or (e) turn it off?"
Power-user shortcuts (one-word invocations) — handle these too:
`profile`, `vibe`, `gap`, `stats`, `review`, `enable`, `disable`, `setup`.
---
## Enable + setup (first-time flow)
## Consent + opt-in
**When this fires.** The user invokes `/plan-tune` and the preamble shows
`QUESTION_TUNING: false` (the default).
**When this fires.** Step 0's consent gate: `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing. The user has never been
asked.
**Privacy note.** gstack defaults `question_tuning` to `false` for every user.
There is no auto-flip for any cohort. The consent prompt is the only path to
enabling, and the answer is honored with a marker file so the user is never
re-asked. Contributors are not auto-enrolled (see
`docs/designs/PLAN_TUNING_V1.md` §"Decisions log" for the privacy posture
rationale). If the user is a contributor (`gstack_contributor: true`), the
prompt can mention it as additional context, but the decision is still
explicit.
**Flow:**
1. Read the current state:
1. Detect contributor state (for prompt framing only, not for auto-action):
```bash
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || echo "false")
echo "QUESTION_TUNING: $_QT"
echo "CONTRIBUTOR: $_CONTRIB"
```
2. If `false`, use AskUserQuestion:
2. AskUserQuestion (use the contributor-specific framing only if `_CONTRIB=true`,
otherwise use the general framing):
**General framing:**
> Question tuning is off. gstack can learn which of its prompts you find
> valuable vs noisy — so over time, gstack stops asking questions you've
> already answered the same way. It takes about 2 minutes to set up your
> initial profile. v1 is observational: gstack tracks your preferences
> and shows you a profile, but doesn't silently change skill behavior yet.
> Logs stay local (`~/.gstack/projects/<slug>/question-log.jsonl`).
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
@@ -103,13 +130,47 @@ Power-user shortcuts (one-word invocations) — handle these too:
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. If A or B: enable:
**Contributor framing (only if `_CONTRIB=true`):**
> You're a gstack contributor. Question tuning isn't on by default for
> anyone, but contributors are the cohort whose data most helps v2 work
> (skills adapting to your steering style). Enabling logs every
> AskUserQuestion outcome locally to
> `~/.gstack/projects/<slug>/question-log.jsonl` — nothing leaves your
> machine. v1 is observational only.
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
> A) Enable + set up (recommended for contributors, ~2 min)
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. ALWAYS touch the marker, regardless of choice:
```bash
touch ~/.gstack/.question-tuning-prompted
```
4. If A or B: enable:
```bash
~/.claude/skills/gstack/bin/gstack-config set question_tuning true
```
4. If A (full setup), ask FIVE one-per-dimension declaration questions via
individual AskUserQuestion calls (one at a time). Use plain English, no jargon:
5. If C: do nothing else. Tell the user: "Question tuning stays off. Re-enable
any time with `/plan-tune enable` or `gstack-config set question_tuning true`."
## 5-Q setup (post-consent, or via Setup gate)
**When this fires.** Two paths:
- Right after the consent prompt above accepts option A.
- Standalone via Step 0's setup gate: `question_tuning` is already `true`
(user opted in via gstack-config or earlier `/plan-tune enable`) AND
`declared` is empty AND `~/.gstack/.declared-setup-prompted` is missing.
This catches users who set `question_tuning: true` directly without
running the wizard.
**Flow:**
1. Ask FIVE one-per-dimension declaration questions via individual
AskUserQuestion calls (one at a time). Use plain English, no jargon:
**Q1 — scope_appetite:** "When you're planning a feature, do you lean toward
shipping the smallest useful version fast, or building the complete, edge-
@@ -162,10 +223,18 @@ Power-user shortcuts (one-word invocations) — handle these too:
"
```
5. Tell the user: "Profile set. Question tuning is now on. Use `/plan-tune`
2. Touch the marker so the Setup gate doesn't re-fire:
```bash
touch ~/.gstack/.declared-setup-prompted
```
Touch it even if the user bails out partway — they were asked; they chose
not to complete. The Setup gate respects that. They can rerun the 5-Q
anytime with `/plan-tune setup` (Step 0 power-user shortcut).
3. Tell the user: "Profile set. Question tuning is on. Use `/plan-tune`
again any time to inspect, adjust, or turn it off."
6. Show the profile inline as a confirmation (see `Inspect profile` below).
4. Show the profile inline as a confirmation (see `Inspect profile` below).
---
@@ -186,12 +255,18 @@ Parse the JSON. Present in **plain English**, not raw floats:
Format: "**scope_appetite:** 0.8 (boil the ocean — you prefer the complete
version with edge cases covered)"
- If `inferred.diversity` passes the calibration gate (`sample_size >= 20 AND
- If `inferred.diversity` passes the **display gate** (`sample_size >= 20 AND
skills_covered >= 3 AND question_ids_covered >= 8 AND days_span >= 7`), show
the inferred column next to declared:
"**scope_appetite:** declared 0.8 (boil the ocean) ↔ observed 0.72 (close)"
Use words for the gap: 0.0-0.1 "close", 0.1-0.3 "drift", 0.3+ "mismatch".
This display gate is intentionally lower than the E1 **promotion gate**
(90+ days stable across 3+ skills, per `docs/designs/PLAN_TUNING_V0.md`).
Displaying inferred values is a UI affordance; shipping behavior-adapting
defaults based on the profile is consequential and needs a much higher
bar. Do NOT use the display gate as a green light for v2 E1 work.
- If the calibration gate isn't met, say: "Not enough observed data yet —
need N more events across M more skills before we can show your observed
profile."
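
The gap wording in the `Inspect profile` hunk ("close", "drift", "mismatch")
can be sketched as a small mapping. This is an illustration, not code from the
repository: the function name `gap_label` is hypothetical, and because the
stated ranges share their endpoints (0.1 and 0.3), the sketch assumes a gap
below 0.1 is "close" and a gap below 0.3 is "drift".

```bash
# Hypothetical helper: maps a declared/observed pair to the gap wording.
#   $1 = declared value, $2 = observed value (both floats in 0..1)
# Assumption: gap < 0.1 -> "close", gap < 0.3 -> "drift", otherwise "mismatch".
gap_label() {
  awk -v d="$1" -v o="$2" 'BEGIN {
    gap = d - o
    if (gap < 0) gap = -gap          # absolute difference
    if (gap < 0.1)      print "close"
    else if (gap < 0.3) print "drift"
    else                print "mismatch"
  }'
}

gap_label 0.8 0.72   # prints "close", matching the scope_appetite example
```

`awk` is used because POSIX shell arithmetic is integer-only.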