mirror of https://github.com/garrytan/gstack.git
82 lines
4.2 KiB
TypeScript
/**
 * Confidence calibration resolver
 *
 * Adds confidence scoring rubric to review-producing skills.
 * Every finding includes a 1-10 score that gates display:
 *   7+: show normally
 *   5-6: show with caveat
 *   <5: suppress from main report
 *
 * Pre-emit verification gate (#1539): findings without a quoted code snippet
 * are forced to confidence 4-5 so the existing suppression rule fires
 * automatically. Kills the "field doesn't exist on the model" FP class on
 * mature frameworks like Django/Rails, where the model code resolves it in
 * <5min, and the gate forces the reviewer to do that lookup before promoting
 * the finding to the report.
 */
import type { TemplateContext } from './types';

export function generateConfidenceCalibration(_ctx: TemplateContext): string {
  return `## Confidence Calibration

Every finding MUST include a confidence score (1-10):

| Score | Meaning | Display rule |
|-------|---------|-------------|
| 9-10 | Verified by reading specific code. Concrete bug or exploit demonstrated. | Show normally |
| 7-8 | High confidence pattern match. Very likely correct. | Show normally |
| 5-6 | Moderate. Could be a false positive. | Show with caveat: "Medium confidence, verify this is actually an issue" |
| 3-4 | Low confidence. Pattern is suspicious but may be fine. | Suppress from main report. Include in appendix only. |
| 1-2 | Speculation. | Only report if severity would be P0. |

**Finding format:**

\\\`[SEVERITY] (confidence: N/10) file:line — description\\\`

Example:
\\\`[P1] (confidence: 9/10) app/models/user.rb:42 — SQL injection via string interpolation in where clause\\\`
\\\`[P2] (confidence: 5/10) app/controllers/api/v1/users_controller.rb:18 — Possible N+1 query, verify with production logs\\\`

### Pre-emit verification gate (#1539 — kills the "field doesn't exist" FP class)

Before any finding is promoted to the report, the gate requires:

1. **Quote the specific code line that motivates the finding** — file:line plus
   the verbatim text of the line(s) that triggered it. If the finding is "field
   X doesn't exist on model Y", quote the lines of class Y where the field
   would live. If "dict.get() might return None", quote the dict initialization.
   If "race condition between A and B", quote both A and B.

2. **If you cannot quote the motivating line(s), the finding is unverified.**
   Force its confidence to 4-5 (suppressed from the main report). It still goes
   into the appendix so reviewers can audit calibration, but the user does NOT
   see it in the critical-pass output. Do not work around this by inventing
   speculative confidence 7+ — that defeats the gate.

**Framework-meta nudge:** When the symbol is generated by a framework
metaclass, descriptor, ORM Meta inner-class, or migration history (Django
\`Meta\`, Rails \`has_many\`/\`scope\`, SQLAlchemy \`relationship\`/\`Column\`,
TypeORM decorators, Sequelize \`init\`/\`belongsTo\`, Prisma generated client),
quote the meta-construct (the \`Meta\` block, the migration, the decorator,
the schema file) instead of expecting the literal name in the class body.
The verification is "I read the source that creates this symbol", not "I
grep'd for the name and didn't find it." Deeper framework-aware verification
(model introspection, migration-history-aware checks, ORM dialect detection)
is deliberately out of scope for the lighter gate — see the deferred
\`~/.gstack-dev/plans/1539-framework-aware-review.md\` design doc.

The FP classes the gate kills (measured against Django Sprint 2.5 #1539):

| FP class | Why the gate catches it |
|---|---|
| "field doesn't exist on model" | Requires quoting the model class body or Meta; the field's absence becomes obvious |
| "dict.get() might be None" | Requires quoting the dict initialization (e.g. Django form's \`cleaned_data\` is \`{}\`-initialized) |
| "save() might lose fields" | Requires quoting the ORM signature or model definition |
| "update_fields might miss X" | Requires quoting the field set; if X doesn't exist, the FP is self-evident |

**Calibration learning:** If you report a finding with confidence < 7 and the user
confirms it IS a real issue, that is a calibration event. Your initial confidence was
too low. Log the corrected pattern as a learning so future reviews catch it with
higher confidence.`;
}
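
The display rules and the pre-emit gate described in the rubric above could be applied to a parsed finding roughly as sketched here. This is a minimal illustration, not part of the upstream resolver: `Finding`, `DisplayRule`, `gateConfidence`, and `displayRuleFor` are hypothetical names. It also assumes that unverified findings are capped at 4, because the rubric says "4-5" while the table suppresses only scores below 5.

```typescript
// Hypothetical shape of a parsed finding; not defined anywhere in the source.
interface Finding {
  severity: string;        // e.g. "P0", "P1", "P2"
  confidence: number;      // 1-10, as reported by the reviewer
  quotedSnippet?: string;  // verbatim code line(s) that motivated the finding
}

type DisplayRule = 'show' | 'show-with-caveat' | 'appendix-only' | 'p0-only';

// Pre-emit gate: a finding with no quoted snippet is treated as unverified.
// Assumption: cap at 4 so that the "suppress below 5" rule always applies.
function gateConfidence(f: Finding): number {
  const hasQuote = (f.quotedSnippet ?? '').trim().length > 0;
  return hasQuote ? f.confidence : Math.min(f.confidence, 4);
}

// Map the gated score onto the display rules from the rubric table.
function displayRuleFor(f: Finding): DisplayRule {
  const score = gateConfidence(f);
  if (score >= 7) return 'show';
  if (score >= 5) return 'show-with-caveat';
  if (score >= 3) return 'appendix-only';
  return 'p0-only';
}
```

Under these assumptions, a finding reported at confidence 9 without a quoted snippet is demoted to `appendix-only`, while the same finding with a quoted line stays at `show`.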