Commit Graph

2 Commits

Author SHA1 Message Date
Garry Tan a164597847
build(skills): T7 — atomic regenerate + capture v1.45.0.0 baseline
Final regen pass across all hosts after T1-T6 work landed. Captures the
v1.45.0.0 parity baseline at test/fixtures/parity-baseline-v1.45.0.0.json
for diffing against the v1.44.1 reference.

Measured deltas (real numbers from test/helpers/capture-parity-baseline.ts):

  Total SKILL.md corpus       2,847 KB → 2,813 KB        (-1.2%)
  Catalog tokens (always-loaded) ~9,319 → ~4,045 tokens   (-56.6%)
  Top 10 heaviest skills      0.5-1.0% drop each

The catalog token cut is the headline. It's the always-loaded surface,
i.e. tokens charged on every session start. Per-skill SKILL.md sizes
barely moved because T4 catalog trim MOVES routing prose from frontmatter
to a body "## When to invoke" section rather than deleting it — the
catalog wins without amputating discoverability.

The bigger per-skill compression lands in v2.0.0.0 (Phase B sections/
pattern on the 5 heavyweights). v1.45 is the foundation: eval-first
infrastructure + cheap wins.

scripts/proactive-suggestions.json regenerated with the latest 52 skills
listed (one-time write per gen-skill-docs run; aggregated catalog parts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:38:52 -07:00
Garry Tan 0d68ef1a39
feat(catalog): T4 — catalog trim + proactive-suggestions.json (Phase A.4)
Shortens frontmatter `description:` in every Claude SKILL.md to a single
lead sentence + (gstack) tag. The routing prose ("Use when asked to...",
"Proactively suggest...") and voice triggers move to a "## When to invoke"
body section so they remain discoverable inside the skill. A per-run
registry at scripts/proactive-suggestions.json aggregates the routing/
voice text for all 52 skills so agents can pull guidance on demand
without paying for it in the always-loaded catalog.

Build flag --catalog-mode=full restores v1.44 legacy behavior (full
multi-line descriptions in frontmatter). Default is trim.

splitCatalogDescription() extracts: lead sentence, routing paragraphs,
voice-triggers line, (gstack) tag presence. Short descriptions (<120
chars, already trimmed) are skipped via a guard so re-runs are idempotent.

Measured impact (vs v1.44.1 baseline):
- Catalog tokens (sum of description bytes / 4): 9,319 → 4,045  (-56.6%)
- Total SKILL.md corpus bytes:                   2,915 KB → 2,880 KB (-1.2%)
- Routing prose preserved as in-skill "## When to invoke" sections
- 52 skill entries in scripts/proactive-suggestions.json (on-demand registry)

The corpus drop is small because catalog trim MOVES text from frontmatter
to body, it doesn't delete it. The headline win is the catalog: the
always-loaded system prompt surface drops by more than half.

Test plan:
- bun test test/gen-skill-docs.test.ts: 389 pass, 0 fail
- Manual: ship/SKILL.md frontmatter description is now ONE line ending
  with `(gstack)`; allowed-tools field on next line (YAML well-formed)
- Manual: scripts/proactive-suggestions.json contains 52 entries
- bun run gen:skill-docs --catalog-mode=full restores legacy behavior

53 files changed (52 SKILL.md across hosts + the new proactive-suggestions.json).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:35:21 -07:00