gstack/scripts/resolvers/confidence.ts

/**
 * Confidence calibration resolver
 *
 * Adds confidence scoring rubric to review-producing skills.
 * Every finding includes a 1-10 score that gates display:
 *   7+: show normally
 *   5-6: show with caveat
 *   <5: suppress from main report
 *
 * Pre-emit verification gate (#1539): findings without a quoted code snippet
 * are capped at confidence 4 so the existing suppression rule (<5) fires
 * automatically. This kills the "field doesn't exist on the model"
 * false-positive (FP) class on mature frameworks like Django/Rails: reading
 * the model code resolves it in under 5 minutes, and the gate forces the
 * reviewer to do that lookup before promoting the finding to the report.
 */
import type { TemplateContext } from './types';

export function generateConfidenceCalibration(_ctx: TemplateContext): string {
  return `## Confidence Calibration

Every finding MUST include a confidence score (1-10):

| Score | Meaning | Display rule |
|-------|---------|-------------|
| 9-10 | Verified by reading specific code. Concrete bug or exploit demonstrated. | Show normally |
| 7-8 | High confidence pattern match. Very likely correct. | Show normally |
| 5-6 | Moderate. Could be a false positive. | Show with caveat: "Medium confidence, verify this is actually an issue" |
| 3-4 | Low confidence. Pattern is suspicious but may be fine. | Suppress from main report. Include in appendix only. |
| 1-2 | Speculation. | Only report if severity would be P0. |

**Finding format:**

\\\`[SEVERITY] (confidence: N/10) file:line — description\\\`

Examples:
\\\`[P1] (confidence: 9/10) app/models/user.rb:42 — SQL injection via string interpolation in where clause\\\`
\\\`[P2] (confidence: 5/10) app/controllers/api/v1/users_controller.rb:18 — Possible N+1 query, verify with production logs\\\`

### Pre-emit verification gate (#1539 — kills the "field doesn't exist" FP class)

Before any finding is promoted to the report, the gate requires:

1. **Quote the specific code line that motivates the finding** — file:line plus
   the verbatim text of the line(s) that triggered it. If the finding is "field
   X doesn't exist on model Y", quote the lines of class Y where the field
   would live. If "dict.get() might return None", quote the dict initialization.
   If "race condition between A and B", quote both A and B.

2. **If you cannot quote the motivating line(s), the finding is unverified.**
   Cap its confidence at 4 so it is suppressed from the main report. It still
   goes into the appendix so reviewers can audit calibration, but the user does
   NOT see it in the critical-pass output. Do not work around this by assigning
   a speculative confidence of 7+; that defeats the gate.

**Framework-meta nudge:** When the symbol is generated by a framework
metaclass, descriptor, ORM Meta inner-class, or migration history (Django
\`Meta\`, Rails \`has_many\`/\`scope\`, SQLAlchemy \`relationship\`/\`Column\`,
TypeORM decorators, Sequelize \`init\`/\`belongsTo\`, Prisma generated client),
quote the meta-construct (the \`Meta\` block, the migration, the decorator,
the schema file) instead of expecting the literal name in the class body.
The verification is "I read the source that creates this symbol", not "I
grep'd for the name and didn't find it." Deeper framework-aware verification
(model introspection, migration-history-aware checks, ORM dialect detection)
is deliberately out of scope for the lighter gate — see the deferred
\`~/.gstack-dev/plans/1539-framework-aware-review.md\` design doc.

The FP classes the gate kills (measured against Django Sprint 2.5 #1539):

| FP class | Why the gate catches it |
|---|---|
| "field doesn't exist on model" | Requires quoting the model class body or Meta; the field's presence or absence becomes obvious |
| "dict.get() might be None" | Requires quoting the dict initialization (e.g. Django form's \`cleaned_data\` is \`{}\`-initialized) |
| "save() might lose fields" | Requires quoting the ORM signature or model definition |
| "update_fields might miss X" | Requires quoting the field set; if X doesn't exist, the FP is self-evident |

**Calibration learning:** If you report a finding with confidence < 7 and the user
confirms it IS a real issue, that is a calibration event. Your initial confidence was
too low. Log the corrected pattern as a learning so future reviews catch it with
higher confidence.`;
}
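
The rubric and gate described in the template can be sketched as two plain functions. This is a minimal illustration and not part of gstack: `Finding`, `applyVerificationGate`, `displayRule`, and `UNVERIFIED_CAP` are hypothetical names. The cap of 4 is an assumption, chosen so that unverified findings land in the suppressed 3-4 band. Treating a P0 finding at confidence 1-2 as "show with caveat" is only one possible reading of the last table row.

```typescript
// Hypothetical types and helpers for illustration only; none of these names
// are defined in the resolver above.
type Severity = 'P0' | 'P1' | 'P2' | 'P3';
type Display = 'show' | 'caveat' | 'appendix' | 'drop';

interface Finding {
  severity: Severity;
  confidence: number;   // 1-10, as assigned by the reviewer
  location: string;     // file:line
  description: string;
  quotedCode?: string;  // verbatim motivating line(s), if any were quoted
}

// Assumption: unverified findings are capped at 4 (top of the suppressed band).
const UNVERIFIED_CAP = 4;

// Pre-emit gate: a finding with no quoted code cannot keep a confidence
// above the cap. Verified findings pass through unchanged.
function applyVerificationGate(f: Finding): Finding {
  const hasQuote = (f.quotedCode ?? '').trim().length > 0;
  if (hasQuote) return f;
  return { ...f, confidence: Math.min(f.confidence, UNVERIFIED_CAP) };
}

// Maps a gated finding to the display rule from the rubric table.
function displayRule(f: Finding): Display {
  if (f.confidence >= 7) return 'show';
  if (f.confidence >= 5) return 'caveat';
  if (f.confidence >= 3) return 'appendix';
  // Assumption: speculation (1-2) is surfaced only for P0, and with a caveat.
  return f.severity === 'P0' ? 'caveat' : 'drop';
}
```

Running the gate before the display rule is what makes suppression automatic: a reviewer who claims confidence 8 without quoting code ends up in the appendix rather than the main report.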