mirror of https://github.com/garrytan/gstack.git
test(coverage): close 5 remaining v1.46.0.0 test gaps (A-E)
Five behaviors that v1.46 ships but had no test coverage. All are now pinned.

A) --host all idempotency (test/gen-skill-docs-idempotency.test.ts)
The default test ran the Claude host only. Non-Claude hosts (Codex, Factory, Cursor, OpenClaw, GBrain, Slate, OpenCode, Hermes, Kiro) each have their own output paths and could carry their own non-deterministic fields. We hit "--host all needed for freshness check" mid-/ship. Now two consecutive `bun run gen:skill-docs --host all` runs must produce byte-identical outputs across a per-host sample (.agents/, .cursor/, .factory/, .gbrain/). This catches per-host adapter regressions before CI.

B) --catalog-mode=full opt-out (test/catalog-mode-full.test.ts)
The legacy escape hatch had zero tests. 6 new tests across two layers: static (CATALOG_MODE_ARG is parsed; the conditional gate is present; the default is "trim"; an invalid value throws) and smoke (an actual --catalog-mode=full run produces a multi-line `description: |` block and omits the "## When to invoke" body section; the run mutates the working tree, then restores it in a finally block).

C) parity-baseline-v1.44.1.json integrity (test/parity-baseline-integrity.test.ts)
The baseline is the source of every v1→v2 number cited in the CHANGELOG v1.46.0.0 entry. Until now, anyone could edit it without a test failing. 9 new tests pin: existence, tag, capturedFromCommit allowlist, expected v1.44 numbers (51 skills; ~2,915,000 bytes, cited in the CHANGELOG as ~2,847 KB; ~9,319 catalog tokens), CHANGELOG references this file by path, per-skill shape, and a SHA256 byte-stability hash. Any edit fails with a clear "if intentional, update EXPECTED_HASH AND the CHANGELOG numbers" signal.

D) Live appliesTo gate end-to-end (test/resolver-entry.test.ts extended)
The unwrapResolver unit tests covered the function; the gen-skill-docs.ts substitution loop that USES the gate had no integration coverage.
6 new tests simulate the exact 4-line shape from gen-skill-docs.ts:457-467 against synthetic registries: a plain-function resolver fires unconditionally, a gated resolver fires when appliesTo is true and yields an empty string when it is false, mixed registries compose, parameterized resolvers respect gates, and unknown resolvers throw.

E) Per-skill min-size floor (test/skill-size-budget.test.ts extended)
The existing 200-byte body coverage floor is only a noise floor: a skill that lost 99.75% of its content still passes. 1 new test asserts that every skill stays ≥80% of its v1.44.1 baseline size (the parity-suite content invariants covered only 10 of 51 skills; the remaining 41 were uncovered). A SECTIONS_EXTRACTED hook is in place for v2.0.0.0, when the sections/ pattern legitimately shrinks ship/plan-ceo/etc. past the floor.

Test plan:
- bun test focused 17-file suite: 1202 pass, 0 fail (+23 new tests vs. the pre-fill baseline of 1179)
- catalog-mode=full mutates the working tree, then restores it cleanly
- --host all idempotency runs two full gen passes in <1s on this machine

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 8b94e6d993
commit b5f75c18c7
@@ -0,0 +1,118 @@
+/**
+ * Gap B (v1.46.0.0): --catalog-mode=full opt-out behavior.
+ *
+ * The catalog trim is the default. The opt-out (`--catalog-mode=full`)
+ * preserves v1.44 multi-line frontmatter descriptions for users / hosts
+ * that depend on the legacy fat catalog. Without this test, someone could
+ * break the conditional `if (host === 'claude' && CATALOG_MODE === 'trim')`
+ * and silently turn the opt-out path into a no-op — users with the flag
+ * still get trim'd output, the v1.44 behavior is gone.
+ *
+ * Two layers:
+ *   1. Static: the CATALOG_MODE flag is wired into gen-skill-docs.ts and
+ *      the conditional gate is in the pipeline.
+ *   2. Smoke: running with --catalog-mode=full produces a frontmatter
+ *      `description: |` block (multi-line) instead of the trim'd one-line
+ *      `description: ...(gstack)` form.
+ *
+ * The smoke test mutates the working tree mid-run. It restores the default
+ * trim'd state in a finally block so a crash mid-test still leaves a clean
+ * working tree.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import { spawnSync } from 'child_process';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const REPO_ROOT = path.resolve(import.meta.dir, '..');
+const GEN_SKILL_DOCS = path.join(REPO_ROOT, 'scripts', 'gen-skill-docs.ts');
+const SHIP_SKILL = path.join(REPO_ROOT, 'ship', 'SKILL.md');
+
+describe('--catalog-mode=full opt-out wiring (static)', () => {
+  test('CATALOG_MODE_ARG parsing is wired into gen-skill-docs.ts', () => {
+    const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
+    expect(src).toContain('CATALOG_MODE_ARG');
+    expect(src).toContain("a.startsWith('--catalog-mode')");
+  });
+
+  test('CATALOG_MODE accepts only "trim" or "full" — anything else throws', () => {
+    const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
+    expect(src).toMatch(/val !== 'trim' && val !== 'full'/);
+    expect(src).toContain('Unknown catalog mode');
+  });
+
+  test('catalog trim only fires when CATALOG_MODE === "trim"', () => {
+    const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
+    // The applyCatalogTrim call is gated by both host and CATALOG_MODE checks.
+    expect(src).toMatch(/CATALOG_MODE === 'trim'/);
+    expect(src).toContain('applyCatalogTrim(content, skillName)');
+  });
+
+  test('default CATALOG_MODE is "trim" (opt-out, not opt-in)', () => {
+    const src = fs.readFileSync(GEN_SKILL_DOCS, 'utf-8');
+    // The const initializer falls back to 'trim' when --catalog-mode is unset.
+    expect(src).toMatch(/if \(!CATALOG_MODE_ARG\) return 'trim'/);
+  });
+});
+
+describe('--catalog-mode=full opt-out behavior (smoke)', () => {
+  test('--catalog-mode=full produces multi-line description in frontmatter', () => {
+    // Save the trim'd state so we can restore it.
+    const trimmedShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
+    expect(trimmedShip).toMatch(/^description: Ship workflow:[^\n]*\(gstack\)\n/m);
+
+    try {
+      // Run with --catalog-mode=full. Mutates working tree.
+      const result = spawnSync('bun', ['run', 'gen:skill-docs', '--catalog-mode=full'], {
+        cwd: REPO_ROOT,
+        stdio: ['ignore', 'pipe', 'pipe'],
+        timeout: 60_000,
+      });
+      expect(result.status).toBe(0);
+
+      // After --catalog-mode=full, frontmatter description is the legacy
+      // multi-line block, not the trim'd one-line form.
+      const fullShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
+      expect(fullShip).toMatch(/^description: \|\s*$/m); // YAML block scalar
+      // Legacy multi-line content includes "Use when asked to..." in the
+      // frontmatter (in trim mode this lives in the body section).
+      const fmEnd = fullShip.indexOf('\n---', 4);
+      const fm = fullShip.slice(0, fmEnd);
+      expect(fm).toMatch(/Use when asked to/i);
+
+      // "When to invoke" body section should NOT be present in full mode
+      // (because the routing prose stayed in frontmatter).
+      const body = fullShip.slice(fmEnd);
+      expect(body).not.toContain('## When to invoke this skill');
+    } finally {
+      // Restore default trim state regardless of test outcome.
+      const restore = spawnSync('bun', ['run', 'gen:skill-docs'], {
+        cwd: REPO_ROOT,
+        stdio: ['ignore', 'pipe', 'pipe'],
+        timeout: 60_000,
+      });
+      if (restore.status !== 0) {
+        // eslint-disable-next-line no-console
+        console.error(
+          'CRITICAL: failed to restore default trim state. Run `bun run gen:skill-docs` to clean up.',
+        );
+      }
+      // Sanity-check the restored state matches what we saw at the start.
+      const restoredShip = fs.readFileSync(SHIP_SKILL, 'utf-8');
+      expect(restoredShip).toMatch(/^description: Ship workflow:[^\n]*\(gstack\)\n/m);
+    }
+  }, 180_000);
+
+  test('--catalog-mode=invalid throws a clear error', () => {
+    const result = spawnSync('bun', ['run', 'gen:skill-docs', '--catalog-mode=invalid'], {
+      cwd: REPO_ROOT,
+      stdio: ['ignore', 'pipe', 'pipe'],
+      timeout: 30_000,
+    });
+    expect(result.status).not.toBe(0);
+    const stderr = result.stderr?.toString() ?? '';
+    expect(stderr).toMatch(/Unknown catalog mode/);
+    expect(stderr).toMatch(/invalid/);
+  });
+});
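The static tests above pin only string fragments of the flag handling in scripts/gen-skill-docs.ts: `CATALOG_MODE_ARG`, `a.startsWith('--catalog-mode')`, the `'trim'`/`'full'` check, the `'trim'` default, and the `host === 'claude' && CATALOG_MODE === 'trim'` gate. A minimal sketch of parsing logic consistent with those fragments is shown below. `parseCatalogMode` and `shouldTrimCatalog` are hypothetical names, and the real script may differ, for example in how it treats a bare `--catalog-mode` with no value.

```typescript
// Hypothetical sketch: names and details are assumptions, not the code in
// scripts/gen-skill-docs.ts. It mirrors only the fragments the static tests pin.
type CatalogMode = 'trim' | 'full';

function parseCatalogMode(argv: string[]): CatalogMode {
  // Only the `--catalog-mode=<value>` form is handled in this sketch.
  const CATALOG_MODE_ARG = argv.find(a => a.startsWith('--catalog-mode'));
  if (!CATALOG_MODE_ARG) return 'trim'; // default: trim is opt-out, not opt-in

  const val = CATALOG_MODE_ARG.split('=')[1] ?? '';
  if (val !== 'trim' && val !== 'full') {
    throw new Error(`Unknown catalog mode: "${val}" (expected "trim" or "full")`);
  }
  return val === 'full' ? 'full' : 'trim';
}

// The trim gate: only the Claude host in trim mode gets the one-line catalog.
function shouldTrimCatalog(host: string, mode: CatalogMode): boolean {
  return host === 'claude' && mode === 'trim';
}

console.log(parseCatalogMode(['--host', 'all', '--catalog-mode=full'])); // prints: full
```

Keeping the parser a pure function of `argv` would let the "invalid value throws" case run in-process; the tests in this commit spawn `bun run gen:skill-docs` instead.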
|
@@ -33,11 +33,27 @@ const STABLE_OUTPUTS = [
   'gstack/llms.txt',
 ];
 
-function runGen(): { exitCode: number; stderr: string } {
-  const result = spawnSync('bun', ['run', 'gen:skill-docs'], {
+/**
+ * Sampled outputs from EVERY non-Claude host. The full host-all run touches
+ * .agents/, .cursor/, .factory/, .gbrain/, .hermes/, .kiro/, .openclaw/,
+ * .opencode/, .slate/ — picking one canonical file per host catches per-host
+ * non-determinism without paying the cost of snapshotting hundreds of files.
+ */
+const STABLE_HOST_ALL_OUTPUTS = [
+  'scripts/proactive-suggestions.json',
+  'SKILL.md',
+  'ship/SKILL.md',
+  '.agents/skills/gstack-ship/SKILL.md',
+  '.cursor/skills/gstack-ship/SKILL.md',
+  '.factory/skills/gstack-ship/SKILL.md',
+  '.gbrain/skills/gstack-ship/SKILL.md',
+];
+
+function runGen(extraArgs: string[] = []): { exitCode: number; stderr: string } {
+  const result = spawnSync('bun', ['run', 'gen:skill-docs', ...extraArgs], {
     cwd: REPO_ROOT,
     stdio: ['ignore', 'pipe', 'pipe'],
-    timeout: 60_000,
+    timeout: 120_000,
   });
   return {
     exitCode: result.status ?? -1,
@@ -45,9 +61,9 @@ function runGen(): { exitCode: number; stderr: string } {
   };
 }
 
-function snapshot(): Map<string, string> {
+function snapshot(files: string[] = STABLE_OUTPUTS): Map<string, string> {
   const m = new Map<string, string>();
-  for (const rel of STABLE_OUTPUTS) {
+  for (const rel of files) {
     const full = path.join(REPO_ROOT, rel);
     if (fs.existsSync(full)) {
       m.set(rel, fs.readFileSync(full, 'utf-8'));
@@ -107,4 +123,37 @@ describe('gen-skill-docs idempotency', () => {
       );
     }
   }, 90_000);
+
+  test('--host all idempotency: every host output is byte-stable across two runs', () => {
+    // Gap A: the default test above runs Claude host only. Non-Claude hosts
+    // (Codex, Factory, Cursor, OpenClaw, GBrain, Slate, OpenCode, Hermes,
+    // Kiro) have their own output paths and could carry their own
+    // non-deterministic fields. We hit a "--host all needed for freshness
+    // check" mid-/ship; this test pins the contract across every host.
+    const firstRun = runGen(['--host', 'all']);
+    expect(firstRun.exitCode).toBe(0);
+
+    const after1 = snapshot(STABLE_HOST_ALL_OUTPUTS);
+    expect(after1.size).toBeGreaterThan(0);
+
+    const secondRun = runGen(['--host', 'all']);
+    expect(secondRun.exitCode).toBe(0);
+
+    const after2 = snapshot(STABLE_HOST_ALL_OUTPUTS);
+
+    const flapping: string[] = [];
+    for (const [file, before] of after1.entries()) {
+      const now = after2.get(file);
+      if (now !== before) flapping.push(file);
+    }
+
+    if (flapping.length > 0) {
+      throw new Error(
+        `${flapping.length} file(s) changed between two consecutive --host all gen runs:\n` +
+          flapping.map(f => ` - ${f}`).join('\n') +
+          `\nLikely cause: a non-deterministic field leaked into a non-Claude host adapter ` +
+          `(scripts/host-adapters/*.ts). CI freshness checks for that host will flap.`,
+      );
+    }
+  }, 300_000); // ~5 min budget for two host-all runs
 });
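The flapping check in the new test compares two path-to-content maps produced by `snapshot()`. A minimal standalone version of that comparison, written as a pure function, is sketched below; `findFlapping` and the `new-file.md` path are hypothetical. One deliberate difference: it walks the union of keys from both snapshots, so a sampled file that exists only after the second run is also reported. The loop in the test iterates `after1` only and would miss that case.

```typescript
// Hypothetical helper: returns the sampled paths whose contents differ
// between two snapshots (path -> file contents), in sorted order.
function findFlapping(
  first: ReadonlyMap<string, string>,
  second: ReadonlyMap<string, string>,
): string[] {
  // Union of keys: a file missing from either snapshot counts as a change.
  const paths = new Set<string>([...first.keys(), ...second.keys()]);
  const flapping: string[] = [];
  for (const p of paths) {
    if (first.get(p) !== second.get(p)) flapping.push(p);
  }
  return flapping.sort();
}

const run1 = new Map([['SKILL.md', 'a'], ['.cursor/skills/gstack-ship/SKILL.md', 'b']]);
const run2 = new Map([
  ['SKILL.md', 'a'],
  ['.cursor/skills/gstack-ship/SKILL.md', 'b2'],
  ['new-file.md', 'c'],
]);
console.log(findFlapping(run1, run2).join(', '));
// prints: .cursor/skills/gstack-ship/SKILL.md, new-file.md
```

Sorting the result keeps the error message stable across runs, which matters for a test whose whole purpose is detecting non-determinism.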
@@ -0,0 +1,145 @@
+/**
+ * Gap C (v1.46.0.0): parity-baseline-v1.44.1.json integrity check.
+ *
+ * The v1.44.1 baseline file is the source of every "v1 was X bytes" claim
+ * in CHANGELOG.md (v1.46.0.0 entry) and the reference for the per-skill
+ * size-budget gate, the parity-suite content invariants, and the published
+ * compression numbers. If a contributor (or a sloppy rebase) edits the
+ * file, every downstream claim silently becomes unverifiable.
+ *
+ * This test pins:
+ *   1. The file exists.
+ *   2. Its top-level `tag` is "v1.44.1" (rejects a rename-by-edit).
+ *   3. Its `capturedFromCommit` is on an allowlist of immutable historic
+ *      SHAs (currently only the slim-skill-tokens branch commit where the
+ *      baseline was captured). Any other commit means someone regenerated
+ *      the baseline against fresh current state, which defeats the v1→v2
+ *      reference contract.
+ *   4. The headline numbers (skill count, corpus bytes, catalog tokens)
+ *      fall inside hard-coded ranges that mirror the CHANGELOG.md figures,
+ *      and CHANGELOG.md references this file by path. If someone "fixes"
+ *      the JSON numbers without updating CHANGELOG, this surfaces the
+ *      mismatch.
+ *   5. Every per-skill entry has the expected shape.
+ *   6. The file's SHA256 content hash is unchanged (catches any byte-level
+ *      edit).
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as crypto from 'crypto';
+
+const REPO_ROOT = path.resolve(import.meta.dir, '..');
+const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.44.1.json');
+const CHANGELOG_PATH = path.join(REPO_ROOT, 'CHANGELOG.md');
+
+/**
+ * The baseline was captured at this commit on the slim-skill-tokens branch
+ * (commit 74bc8054, just after v2_PLAN.md landed and before any compression
+ * work). If the baseline is ever regenerated, this allowlist must change AND
+ * the v1.46.0.0 CHANGELOG numbers table must be updated to reflect the new
+ * v1.x baseline.
+ */
+const ALLOWED_BASELINE_COMMITS = new Set([
+  '74bc8054',
+]);
+
+/**
+ * Headline numbers from the v1.46.0.0 CHANGELOG entry. If the baseline JSON
+ * is edited, these no longer match and the user's published claims become
+ * unverifiable. We assert the baseline still contains these values.
+ */
+const EXPECTED_v144_NUMBERS = {
+  totalSkills: 51,
+  // CHANGELOG says ~2,847 KB (uses Math.round(/1024)), i.e. ~2,915,000 bytes;
+  // the range allows roughly 15K bytes of slack on either side.
+  totalCorpusBytesMin: 2_900_000,
+  totalCorpusBytesMax: 2_930_000,
+  estTotalCatalogTokensMin: 9_300,
+  estTotalCatalogTokensMax: 9_340, // CHANGELOG cites ~9,319
+};
+
+describe('parity-baseline-v1.44.1.json integrity (v1→v2 reference)', () => {
+  test('file exists at the canonical path', () => {
+    expect(fs.existsSync(BASELINE_PATH)).toBe(true);
+  });
+
+  test('tag is "v1.44.1" — file was not renamed by edit', () => {
+    const baseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    expect(baseline.tag).toBe('v1.44.1');
+  });
+
+  test('capturedFromCommit is on the allowlist (rejects ad-hoc regeneration)', () => {
+    const baseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    if (!ALLOWED_BASELINE_COMMITS.has(baseline.capturedFromCommit)) {
+      throw new Error(
+        `parity-baseline-v1.44.1.json was captured at commit ${baseline.capturedFromCommit}, ` +
+          `not on the allowlist (${[...ALLOWED_BASELINE_COMMITS].join(', ')}).\n` +
+          `If you intentionally regenerated the baseline, add the new commit to ` +
+          `ALLOWED_BASELINE_COMMITS in test/parity-baseline-integrity.test.ts AND ` +
+          `update the v1.46.0.0 CHANGELOG numbers table to match the new baseline.\n` +
+          `If you didn't intend to regenerate it, restore the file from git history.`,
+      );
+    }
+  });
+
+  test('totalSkills matches expected (51)', () => {
+    const baseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    expect(baseline.totalSkills).toBe(EXPECTED_v144_NUMBERS.totalSkills);
+  });
+
+  test('totalCorpusBytes is within the CHANGELOG-cited range (~2,847 KB)', () => {
+    const baseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    expect(baseline.totalCorpusBytes).toBeGreaterThanOrEqual(EXPECTED_v144_NUMBERS.totalCorpusBytesMin);
+    expect(baseline.totalCorpusBytes).toBeLessThanOrEqual(EXPECTED_v144_NUMBERS.totalCorpusBytesMax);
+  });
+
+  test('estTotalCatalogTokens matches the CHANGELOG-cited ~9,319', () => {
+    const baseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    expect(baseline.estTotalCatalogTokens).toBeGreaterThanOrEqual(EXPECTED_v144_NUMBERS.estTotalCatalogTokensMin);
+    expect(baseline.estTotalCatalogTokens).toBeLessThanOrEqual(EXPECTED_v144_NUMBERS.estTotalCatalogTokensMax);
+  });
+
+  test('CHANGELOG v1.46.0.0 entry references this baseline file by path', () => {
+    const changelog = fs.readFileSync(CHANGELOG_PATH, 'utf-8');
+    // The CHANGELOG entry must mention the baseline file so reviewers know
+    // where the numbers come from. If someone edits one without the other,
+    // this test surfaces the drift.
+    expect(changelog).toContain('parity-baseline-v1.44.1.json');
+  });
+
+  test('every per-skill entry has the required shape', () => {
+    const baseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    for (const [skill, entry] of Object.entries(baseline.skills)) {
+      const e = entry as Record<string, unknown>;
+      expect(typeof e.skill).toBe('string');
+      expect(e.skill).toBe(skill);
+      expect(typeof e.skillMdBytes).toBe('number');
+      expect(typeof e.skillMdLines).toBe('number');
+      expect(typeof e.estTokens).toBe('number');
+      expect(typeof e.descriptionLen).toBe('number');
+      expect(e.skillMdBytes as number).toBeGreaterThan(0);
+    }
+  });
+
+  test('content hash is stable (catches any byte-level edit)', () => {
+    // Pinning the SHA256 of the file content is the strongest possible
+    // integrity check. When the baseline file LEGITIMATELY needs to change
+    // (rare — e.g. adding new skills since v1.44.1), this test fails with
+    // a clear "the hash changed from X to Y; update the constant if
+    // intentional" signal. The commit that updates the hash MUST also
+    // explain why and update the v1.46.0.0 CHANGELOG numbers if any
+    // headline changes.
+    //
+    // To re-capture: `shasum -a 256 test/fixtures/parity-baseline-v1.44.1.json`
+    const buf = fs.readFileSync(BASELINE_PATH);
+    const hash = crypto.createHash('sha256').update(buf).digest('hex');
+    const EXPECTED_HASH = '29da01be6493bb2c7308b072f3066c09bdeb0397cb79ae1c708b5a38850efe46';
+    if (hash !== EXPECTED_HASH) {
+      throw new Error(
+        `parity-baseline-v1.44.1.json content hash changed.\n` +
+          ` expected: ${EXPECTED_HASH}\n` +
+          ` current: ${hash}\n` +
+          `If you intentionally regenerated the baseline, update EXPECTED_HASH in ` +
+          `test/parity-baseline-integrity.test.ts AND justify the change in the ` +
+          `commit message AND update the v1.46.0.0 CHANGELOG numbers table.\n` +
+          `If you didn't intend to regenerate it, restore the file from git history.`,
+      );
+    }
+  });
+});
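The two integrity mechanisms in this file, a SHA256 pin over the raw file bytes and min/max ranges around the CHANGELOG figures, can be exercised in isolation. The sketch below assumes Node's built-in `node:crypto` module (the test uses the same `createHash` API through `import * as crypto from 'crypto'`); `sha256Hex`, `toKB`, and `withinRange` are hypothetical helper names. `toKB` follows the `Math.round(bytes / 1024)` convention that the `EXPECTED_v144_NUMBERS` comment attributes to the CHANGELOG.

```typescript
import { createHash } from 'node:crypto';

// Hex SHA256 of raw bytes, the same digest the EXPECTED_HASH pin compares.
function sha256Hex(data: string | Uint8Array): string {
  return createHash('sha256').update(data).digest('hex');
}

// KB as the CHANGELOG rounds it: Math.round(bytes / 1024).
function toKB(bytes: number): number {
  return Math.round(bytes / 1024);
}

// Inclusive range check, like the toBeGreaterThanOrEqual / toBeLessThanOrEqual pair.
function withinRange(value: number, min: number, max: number): boolean {
  return value >= min && value <= max;
}

// A one-character edit yields an unrelated hash, so any byte-level edit fails the pin.
console.log(sha256Hex('{"tag":"v1.44.1"}') === sha256Hex('{"tag":"v1.44.2"}')); // false
console.log(toKB(2_915_328)); // 2847
```

With these helpers, the 2,900,000 to 2,930,000 byte window admits any corpus that rounds to 2,832 through 2,861 KB, and a corpus of 2,915,328 bytes rounds to exactly 2,847 KB.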
|
@@ -91,3 +91,96 @@ describe('RESOLVERS registry still loads with mixed shapes', () => {
     }
   });
 });
+
+/**
+ * Gap D (v1.46.0.0): live appliesTo gate end-to-end integration.
+ *
+ * The ResolverEntry / unwrapResolver machinery has unit coverage above. The
+ * remaining gap: does the gen-skill-docs.ts:444 substitution loop actually
+ * USE the gate? A refactor that drops the `if (appliesTo && !appliesTo(ctx))`
+ * check would silently break every future gated resolver.
+ *
+ * This test simulates the exact 4-line shape the live pipeline uses against
+ * a synthetic registry. It exercises a copy of that loop rather than the
+ * real one, so a gate check dropped from gen-skill-docs.ts is only caught
+ * if the same change is also mirrored into the copy below.
+ */
+describe('gen-skill-docs substitution loop respects the appliesTo gate', () => {
+  function simulateGenSubstitution(
+    template: string,
+    registry: Record<string, import('../scripts/resolvers/types').ResolverValue>,
+    ctx: TemplateContext,
+  ): string {
+    // Mirrors scripts/gen-skill-docs.ts:457-467 (the {{NAME}} substitution
+    // loop). Keep this in sync with the real loop. Drift here is what the
+    // test is designed to catch.
+    return template.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (_match, fullKey) => {
+      const parts = fullKey.split(':');
+      const resolverName = parts[0];
+      const args = parts.slice(1);
+      const entry = registry[resolverName];
+      if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}}`);
+      const { resolve, appliesTo } = unwrapResolver(entry);
+      if (appliesTo && !appliesTo(ctx)) return '';
+      return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
+    });
+  }
+
+  test('plain-function resolver fires unconditionally', () => {
+    const tpl = '{{ALWAYS}}';
+    const out = simulateGenSubstitution(tpl, {
+      ALWAYS: () => 'fired',
+    }, makeCtx({ skillName: 'whatever' }));
+    expect(out).toBe('fired');
+  });
+
+  test('gated resolver fires only when appliesTo returns true', () => {
+    const tpl = 'before-{{GATED}}-after';
+    const out = simulateGenSubstitution(tpl, {
+      GATED: {
+        resolve: () => 'CONTENT',
+        appliesTo: (ctx) => ctx.skillName === 'allowed',
+      },
+    }, makeCtx({ skillName: 'allowed' }));
+    expect(out).toBe('before-CONTENT-after');
+  });
+
+  test('gated resolver is substituted with empty string when appliesTo returns false', () => {
+    const tpl = 'before-{{GATED}}-after';
+    const out = simulateGenSubstitution(tpl, {
+      GATED: {
+        resolve: () => 'CONTENT',
+        appliesTo: (ctx) => ctx.skillName === 'allowed',
+      },
+    }, makeCtx({ skillName: 'something-else' }));
+    expect(out).toBe('before--after');
+  });
+
+  test('mixed registry: gated + plain resolvers in the same template', () => {
+    const tpl = '{{PLAIN}} / {{GATED_ON}} / {{GATED_OFF}}';
+    const ctx = makeCtx({ skillName: 'ship' });
+    const out = simulateGenSubstitution(tpl, {
+      PLAIN: () => 'plain',
+      GATED_ON: { resolve: () => 'on', appliesTo: () => true },
+      GATED_OFF: { resolve: () => 'off', appliesTo: () => false },
+    }, ctx);
+    expect(out).toBe('plain / on / ');
+  });
+
+  test('parameterized resolver still respects gate', () => {
+    const tpl = '{{GATED:arg1:arg2}}';
+    const ctx = makeCtx({ skillName: 'no' });
+    const out = simulateGenSubstitution(tpl, {
+      GATED: {
+        resolve: (_c, args) => `fired-with-${(args ?? []).join('-')}`,
+        appliesTo: (c) => c.skillName === 'yes',
+      },
+    }, ctx);
+    expect(out).toBe(''); // gated off, args ignored
+  });
+
+  test('unknown resolver throws (matches real gen-skill-docs error contract)', () => {
+    expect(() =>
+      simulateGenSubstitution('{{NEVER_DEFINED}}', {}, makeCtx()),
+    ).toThrow(/Unknown placeholder/);
+  });
+});
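The Gap D tests depend on `unwrapResolver`, `TemplateContext`, and `makeCtx`, which live elsewhere in test/resolver-entry.test.ts and scripts/resolvers and are not part of this diff. The self-contained sketch below runs the same gated substitution with simplified stand-in types. `Ctx`, `ResolveFn`, `GatedResolver`, `Resolver`, `unwrap`, and `substitute` are assumptions inferred from how the tests call the real helpers (a resolver is either a plain function or a `{ resolve, appliesTo }` object), not the repo's actual definitions.

```typescript
// Simplified stand-ins; the repo's real types live in scripts/resolvers/types.
interface Ctx {
  skillName: string;
}
type ResolveFn = (ctx: Ctx, args?: string[]) => string;
interface GatedResolver {
  resolve: ResolveFn;
  appliesTo?: (ctx: Ctx) => boolean;
}
type Resolver = ResolveFn | GatedResolver;

// Normalize both shapes: a bare function is an ungated resolver.
function unwrap(entry: Resolver): GatedResolver {
  return typeof entry === 'function' ? { resolve: entry } : entry;
}

// Replace {{NAME}} and {{NAME:arg1:arg2}} placeholders, honoring appliesTo.
function substitute(template: string, registry: Record<string, Resolver>, ctx: Ctx): string {
  return template.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (_match: string, fullKey: string) => {
    const [name, ...args] = fullKey.split(':');
    const entry = registry[name];
    if (!entry) throw new Error(`Unknown placeholder {{${name}}}`);
    const { resolve, appliesTo } = unwrap(entry);
    if (appliesTo && !appliesTo(ctx)) return ''; // gate closed: placeholder becomes ''
    return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
  });
}

const registry: Record<string, Resolver> = {
  PLAIN: () => 'plain',
  SHIP_ONLY: { resolve: () => 'ship-only', appliesTo: (c: Ctx) => c.skillName === 'ship' },
};
console.log(substitute('[{{PLAIN}}|{{SHIP_ONLY}}]', registry, { skillName: 'cso' })); // [plain|]
```

Normalizing both shapes up front keeps the loop body to one gate check, which is the line a refactor could accidentally drop.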
|
|
@@ -126,6 +126,75 @@ describe('SKILL.md size budget regression (gate, free)', () => {
     );
   });
+
+  /**
+   * Gap E (v1.46.0.0): per-skill min-size floor.
+   *
+   * The existing skill-coverage-floor enforces body ≥ 200 bytes, which is
+   * a tiny noise floor. A skill that was 100 KB at v1.44.1 and shrinks to
+   * 250 bytes passes that check despite losing 99.75% of content. The
+   * parity-suite content invariants cover this for 10 hand-picked skills
+   * (cso, ship, plan-ceo, etc.); the remaining 41 skills had no per-skill
+   * shrinkage floor.
+   *
+   * Floor: 80% of the v1.44.1 baseline. v1.46 actual shrinkage is <1% per
+   * skill, so the 20% allowance leaves comfortable headroom while still
+   * catching accidental mass deletion (e.g., a refactor that strips the
+   * body of a skill).
+   *
+   * v2.0.0.0 will introduce the sections/ pattern for 5 heavyweights
+   * (ship, plan-ceo-review, office-hours, plan-eng-review,
+   * plan-design-review). Those skills will legitimately shrink to ~15 KB
+   * skeletons. When that lands, add them to SECTIONS_EXTRACTED so the floor
+   * relaxes for them.
+   */
+  test('no skill shrinks past 80% of v1.44.1 baseline (catches accidental body strip)', () => {
+    const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
+    const current = captureBaseline({ repoRoot: REPO_ROOT });
+    const MIN_RATIO = 0.80; // a skill at <80% of its v1.44 size signals mass-deletion
+    const SECTIONS_EXTRACTED = new Set<string>(); // populate in v2.0.0.0 when sections/ lands
+
+    const undershoots: Array<{
+      skill: string; beforeBytes: number; afterBytes: number; ratio: number;
+    }> = [];
+    for (const [skill, before] of Object.entries(baseline.skills)) {
+      if (SECTIONS_EXTRACTED.has(skill)) continue;
+      const after = current.skills[skill];
+      if (!after) continue; // skill removed since baseline — separate concern
+      const ratio = after.skillMdBytes / before.skillMdBytes;
+      if (ratio < MIN_RATIO) {
+        undershoots.push({
+          skill, beforeBytes: before.skillMdBytes, afterBytes: after.skillMdBytes, ratio,
+        });
+      }
+    }
+
+    if (undershoots.length === 0) return;
+
+    const overrideReason = process.env.GSTACK_SIZE_BUDGET_OVERRIDE_REASON?.trim();
+    if (overrideReason) {
+      logBudgetOverride({
+        scope: 'skill-size-budget-floor',
+        reason: overrideReason,
+        details: { min_ratio: MIN_RATIO, undershoots },
+      });
+      // eslint-disable-next-line no-console
+      console.warn(
+        `[skill-size-budget-floor] OVERRIDE APPLIED (${overrideReason}) — ${undershoots.length} undershoot(s) allowed`,
+      );
+      return;
+    }
+
+    const msg = undershoots.map(u =>
+      ` ${u.skill}: ${u.beforeBytes} → ${u.afterBytes} bytes (×${u.ratio.toFixed(2)} — below ${MIN_RATIO} floor)`,
+    ).join('\n');
+    throw new Error(
+      `${undershoots.length} skill(s) shrunk past v1.44.1 × ${MIN_RATIO} floor:\n${msg}\n` +
+        `This usually signals accidental body strip (e.g., a resolver returning empty, a ` +
+        `template losing a section). If the shrinkage is intentional (e.g., the skill moved ` +
+        `to the sections/ pattern), add it to SECTIONS_EXTRACTED in this test. Override: ` +
+        `GSTACK_SIZE_BUDGET_OVERRIDE_REASON="why" allows + audit-logs.`,
+    );
+  });
 
   test('catalog token estimate stays compressed (v1.45 target ≤ 7000)', () => {
     const current = captureBaseline({ repoRoot: REPO_ROOT });
     const v145Target = 7000;
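The Gap E floor reduces to one ratio test per skill. A minimal pure-function sketch of the same loop follows, assuming a simplified input of skill name to SKILL.md byte count rather than the repo's `ParityBaseline` / `captureBaseline()` shapes; `findUndershoots`, `SizeMap`, and the sample skills are hypothetical. The comparison is strict (`ratio < minRatio`), so a skill at exactly 80% of its baseline passes, and the 100,000 to 250 byte case from the comment above is flagged.

```typescript
// Simplified input: skill name -> SKILL.md size in bytes (the real test reads
// skillMdBytes from ParityBaseline and captureBaseline() instead).
type SizeMap = Record<string, number>;

interface Undershoot {
  skill: string;
  beforeBytes: number;
  afterBytes: number;
  ratio: number;
}

function findUndershoots(
  baseline: SizeMap,
  current: SizeMap,
  minRatio = 0.8,
  exempt: ReadonlySet<string> = new Set<string>(), // e.g. SECTIONS_EXTRACTED
): Undershoot[] {
  const undershoots: Undershoot[] = [];
  for (const [skill, beforeBytes] of Object.entries(baseline)) {
    if (exempt.has(skill)) continue;
    const afterBytes: number | undefined = current[skill];
    if (afterBytes === undefined) continue; // removed since baseline: separate concern
    const ratio = afterBytes / beforeBytes;
    if (ratio < minRatio) undershoots.push({ skill, beforeBytes, afterBytes, ratio });
  }
  return undershoots;
}

const before: SizeMap = { ship: 100_000, cso: 40_000, retired: 5_000 };
const after: SizeMap = { ship: 250, cso: 39_800 };
console.log(findUndershoots(before, after).map(u => u.skill).join(',')); // ship
```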