# Design — i18n-e2e-english-verification

## Overview

**Purpose**: This spec produces a deterministic, re-runnable verification pass that proves (or disproves) that the MiroFish 5-step pipeline runs cleanly in English. It posts a structured report on issue #10 with a `pass` / `gap` / `manual-pending` status per checklist item.

**Users**: i18n maintainers reviewing the epic (#11), and any future verifier re-running the audit after subsequent merges. Humans read the deliverable on GitHub (issue comment) and re-run it (or CI does, in a future iteration) to confirm parity.

**Impact**: No production code is modified. The repository gains one new directory tree (`.kiro/specs/i18n-e2e-english-verification/`) containing the spec, the audit scripts, and the captured outputs. One GitHub comment is posted on #10. Up to four follow-up issues are filed.

### Goals

- Static-audit `backend/app`, `frontend/src`, and `locales/en.json` for CJK characters; classify every match.
- Verify EN / ZH locale catalogue parity and flag suspect untranslated entries.
- Verify that LLM-prompt assets respect the requested locale.
- Document locale-propagation gaps across Flask → `Task` → OASIS subprocess → ReACT agent.
- Post a single canonical comment on issue #10 with per-checklist statuses.
- File follow-up issues for every gap (no inline fixes).
- Make the audit re-runnable by capturing artefacts under `.kiro/specs/.../audit//`.

### Non-Goals

- Patching any `gap` discovered (R7.3 — strictly verification).
- Performance / load testing.
- Adding new locales beyond EN / ZH.
- Building a permanent CI guard (filed as a follow-up issue, not implemented here).
- Live UI / Docker walkthrough — captured as `manual-pending` in this run's report.

## Boundary Commitments

### This Spec Owns

- The audit scripts and the captured audit outputs under `.kiro/specs/i18n-e2e-english-verification/audit/`.
- The `gap-report.md` artefact and the comment body posted on issue #10.
- The grouping rule for follow-up issues (one per category — UI strings, backend log strings, backend LLM-prompt labels, suggested CI guard).
- The `pass` / `gap` / `manual-pending` / `review-needed` classification scheme.

### Out of Boundary

- Any modification of files under `backend/app/`, `frontend/src/`, or `locales/`.
- Fixing the gaps the audit discovers — those land in their own follow-up issues.
- Live UI walkthrough, Docker run, or LLM execution.
- A permanent CI check — filed as a separate follow-up issue.

### Allowed Dependencies

- `git` (for `git grep` and for capturing the HEAD sha).
- `gh` CLI (for the comment and follow-up issues, with a documented fallback when unavailable).
- `python3` (for the catalogue parity diff, classification, and report rendering).
- The repo working tree at HEAD of the working branch.

### Revalidation Triggers

- Any merge to `main` that touches `locales/`, `backend/app/`, or `frontend/src/` invalidates the captured audit; a re-run should produce a new `audit//` directory.
- A change to issue #10's checklist body (e.g. a new sub-item) requires re-mapping in `gap-report.md`.
- A change to the four follow-up categories (e.g. the project decides to file one issue per file) requires re-running the issue-filing script with the new grouping.

## Architecture

### Existing Architecture Analysis

- The MiroFish backend is Flask + Python `Task` workers + an OASIS subprocess (per CLAUDE.md). The i18n surfaces are `vue-i18n` for the SPA, `locales/*.json` shared by both ends, a backend logger that resolves keys per locale, and inline LLM prompts in `backend/app/services/*.py`.
- The verification pass does **not** hook into any of these — it reads files only. No Flask blueprint, no `Task` model, no Neo4j query.
### Architecture Pattern & Boundary Map

```mermaid
graph TB
    Verifier[Verifier shell entrypoint]
    Audit[audit_cjk.sh]
    Parity[check_parity.py]
    Classify[classify.py]
    Report[render_report.py]
    Comment[post_comment.sh]
    FollowUp[file_followups.sh]
    Repo[Working tree]
    Captures[audit slash sha slash]
    GH[GitHub via gh CLI]
    Verifier --> Audit
    Verifier --> Parity
    Audit --> Classify
    Parity --> Classify
    Classify --> Report
    Report --> Captures
    Report --> Comment
    Report --> FollowUp
    Audit --> Repo
    Parity --> Repo
    Comment --> GH
    FollowUp --> GH
```

**Architecture Integration**:

- **Selected pattern**: A linear pipeline of read-only scripts, each emitting a single artefact, composed by a thin shell entrypoint. There is no mutable state outside `audit//`.
- **Domain boundaries**: `audit_cjk.sh` owns the raw grep; `check_parity.py` owns the catalogue diff; `classify.py` owns the four-class labels; `render_report.py` owns the comment body; `post_comment.sh` and `file_followups.sh` own the GitHub side effects.
- **Existing patterns preserved**: Shell + Python script pair (matches the project's existing `setup`/`run` style); no new test runner, no new linter.
- **New components rationale**: Each script is single-purpose, so failures (e.g. `gh` permission issues) are isolated and the pipeline can resume from the failed step.
- **Steering compliance**: No production-code changes (R7.3); 4-space indent in any committed Python; double quotes; `snake_case`; Bash scripts exit with a non-zero status on any uncaught error.

### Technology Stack

| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| CLI / Audit runner | Bash 5+, `git grep -P` (PCRE) | Run the canonical CJK audit | `\x{...}` ranges require PCRE — `git grep -E` will fail on this regex (verified). |
| Static checks | Python 3.11 (project minimum per CLAUDE.md) | Catalogue parity + classification + report rendering | Standard library only — no new deps. |
| GitHub integration | `gh` CLI | Post the comment, file follow-ups | Falls back to `audit//PENDING-*` files when missing. |
| Output formats | Plain text + Markdown | Captures + comment body | No HTML, no JSON beyond `gh`'s own. |

## File Structure Plan

### Directory Structure

```
.kiro/specs/i18n-e2e-english-verification/
├── spec.json
├── requirements.md
├── gap-analysis.md
├── research.md
├── design.md
├── tasks.md
├── HANDOFF.md                           # only if implementation hits the 3-cycle remediation cap
└── audit/
    ├── scripts/
    │   ├── run_audit.sh                 # entrypoint - chains the steps below
    │   ├── audit_cjk.sh                 # git grep PCRE + bucket counts
    │   ├── check_parity.py              # locales/en.json vs zh.json key + identical-value diff
    │   ├── classify.py                  # apply 4-class labels to grep matches
    │   ├── render_report.py             # produce gap-report.md + comment-body.md
    │   ├── post_comment.sh              # gh issue comment 10 with comment-body.md (or PENDING-*)
    │   └── file_followups.sh            # gh issue create per category (or PENDING-*)
    └── /                                # captured outputs of one verification run
        ├── cjk-grep.txt                 # raw `git grep -nP ...` output
        ├── cjk-grep-bucketed.txt        # the same, partitioned by top-level path
        ├── parity.txt                   # en/zh diff summary
        ├── classified.csv               # match-by-match label
        ├── gap-report.md                # the canonical structured report
        ├── comment-body.md              # the markdown posted to issue #10
        ├── PENDING-issue-10-comment.md  # only if gh comment failed
        └── PENDING-followups/           # only if gh issue create failed
            ├── 01-frontend-ui-strings.md
            ├── 02-backend-log-strings.md
            ├── 03-backend-prompt-labels.md
            └── 04-permanent-ci-guard.md
```

### Modified Files

- *(None.)* The spec explicitly forbids touching production source.
## System Flows

```mermaid
sequenceDiagram
    participant V as Verifier
    participant Run as run_audit.sh
    participant FS as Working tree
    participant GH as GitHub
    V->>Run: bash run_audit.sh
    Run->>FS: git grep -nP, git rev-parse HEAD
    FS-->>Run: cjk-grep.txt + sha
    Run->>FS: read locales json
    FS-->>Run: en/zh dicts
    Run->>Run: classify
    Run->>FS: write audit slash sha slash artefacts
    Run->>GH: gh issue comment 10
    alt gh succeeds
        GH-->>Run: comment URL
        Run->>GH: gh issue create x N follow-ups
        GH-->>Run: issue URLs
    else gh fails
        Run->>FS: write PENDING markdown to audit slash sha slash
    end
    Run-->>V: exit 0 success or exit 2 PENDING
```

**Key flow decisions**:

- The audit always writes the captured artefacts to disk first (idempotent, re-runnable). The GitHub side effects are the *last* steps, so a failure there still leaves a complete capture for inspection.
- A non-zero `gh` exit switches the pipeline to PENDING mode rather than failing the whole run; the script exits `2` to signal "audit ran but GitHub side effects didn't apply".
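The PENDING routing can be illustrated with a small Python model of the orchestrator's control flow. This is a sketch under stated assumptions, not the Bash entrypoint itself. Each step is modelled as a hypothetical zero-argument callable that returns an exit code, and the `run_pipeline` name and `EXIT_*` constants are illustrative only.

```python
from typing import Callable, Sequence

# Hypothetical model: each step is a zero-argument callable returning an exit code.
Step = Callable[[], int]

EXIT_OK = 0
EXIT_FAILED = 1
EXIT_PENDING = 2


def run_pipeline(read_only_steps: Sequence[Step], side_effect_steps: Sequence[Step]) -> int:
    """Model run_audit.sh: abort on read-only failure, downgrade to PENDING on gh failure."""
    # Steps 1-4 (grep, parity, classify, render): the first failure aborts the run,
    # mirroring `set -euo pipefail` in the Bash orchestrator.
    for step in read_only_steps:
        if step() != 0:
            return EXIT_FAILED
    # Steps 5-6 (comment, follow-ups): every step is attempted; a failure only
    # marks the run as PENDING, because the captures already exist on disk.
    pending = False
    for step in side_effect_steps:
        if step() != 0:
            pending = True
    return EXIT_PENDING if pending else EXIT_OK
```

In Bash, the same behaviour requires guarding the side-effect calls, for example `post_comment.sh "$dir" || pending=1`, where `dir` stands in for the capture directory. Under `set -e`, a bare non-zero exit from `post_comment.sh` would abort the orchestrator before `file_followups.sh` runs. A command on the left of `||` does not trigger errexit.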
## Requirements Traceability

| Requirement | Summary | Components | Interfaces / Artefacts | Flows |
|-------------|---------|------------|------------------------|-------|
| 1.1 | Run canonical `git grep` | audit_cjk.sh | `cjk-grep.txt` | Audit step |
| 1.2 | Classify each match | classify.py | `classified.csv` | Audit step |
| 1.3 | Record file:line + step tag for `gap` | classify.py | `classified.csv` (`pipeline_step` column) | Audit step |
| 1.4 | No file modifications during audit | run_audit.sh | scripts are read-only | — |
| 1.5 | `en.json` CJK = always `gap` | classify.py | hard rule in classifier | Audit step |
| 2.1 | Enumerate keys recursively | check_parity.py | `parity.txt` | Audit step |
| 2.2 | Missing-key gaps recorded | check_parity.py | `parity.txt` (missing-key block) | Audit step |
| 2.3 | EN catalogue CJK = `gap` | check_parity.py | `parity.txt` (cjk-in-en block) | Audit step |
| 2.4 | EN/ZH identical = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | Audit step |
| 2.5 | No catalogue edits | check_parity.py | read-only stdlib JSON load | — |
| 3.1 | Enumerate prompt files | classify.py (heuristic — known files list) | `gap-report.md` Section 3 | — |
| 3.2 | Confirm locale-aware or EN-only | classify.py | `gap-report.md` Section 3 | — |
| 3.3 | Hard-coded ZH directive = `gap` | classify.py | `classified.csv` (`category=backend-prompt-label`) | — |
| 3.4 | #3, #4, #5 prompts post-merge check | classify.py | `gap-report.md` Section 3 | — |
| 4.1 | Identify handoff boundaries | render_report.py | `gap-report.md` Section 4 | — |
| 4.2 | Confirm explicit or re-derived locale | render_report.py | `gap-report.md` Section 4 | — |
| 4.3 | Silent default = `gap` | classify.py | `classified.csv` (`category=propagation`) | — |
| 4.4 | Backend logger EN under EN | classify.py | `classified.csv` (`category=backend-log`) | — |
| 5.1 | Comment lists every checklist item | render_report.py | `comment-body.md` | Comment-post |
| 5.2 | Each `gap` includes file:line + follow-up link | render_report.py | `comment-body.md` | Comment-post |
| 5.3 | `manual-pending` items state repro steps | render_report.py | `comment-body.md` | Comment-post |
| 5.4 | Comment includes raw audit (or path) | render_report.py | `comment-body.md` (path reference) | Comment-post |
| 5.5 | Post via `gh issue comment 10` | post_comment.sh | `comment-body.md` | Comment-post |
| 6.1 | ZH covers every EN key | check_parity.py | (already passes per gap-analysis) | — |
| 6.2 | Locale-aware prompts symmetric | render_report.py | `gap-report.md` Section 6 | — |
| 6.3 | EN-only ZH value = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | — |
| 6.4 | ZH regression filed as gap | classify.py | `classified.csv` | — |
| 7.1 | File issue per gap | file_followups.sh | `gh issue create` | Follow-up |
| 7.2 | Group by category | file_followups.sh | one body per category in `PENDING-followups/` | Follow-up |
| 7.3 | No production-code edits | run_audit.sh | only writes under `.kiro/specs/.../` | — |
| 7.4 | Label follow-ups `i18n` | file_followups.sh | `gh issue create --label i18n` | Follow-up |
| 7.5 | Fallback inline list when no `gh` | file_followups.sh | `PENDING-followups/*.md` | Follow-up |
| 8.1 | Capture raw output | run_audit.sh | `audit//` directory | Audit step |
| 8.2 | Preserve previous run | run_audit.sh | `` subdirectory naming | Audit step |
| 8.3 | Record HEAD sha | run_audit.sh | `git rev-parse HEAD` | Audit step |
| 8.4 | Idempotent re-run | run_audit.sh | re-running on same sha overwrites that sha's dir | Audit step |

## Components and Interfaces

| Component | Domain | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------|--------|--------------|--------------------------|-----------|
| run_audit.sh | Verification pipeline | Compose the audit and route artefacts | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 | git (P0), python3 (P0), gh (P1) | Batch |
| audit_cjk.sh | Static audit | Run `git grep -nP` and bucket | 1.1, 1.5 | git (P0) | Batch |
| check_parity.py | Catalogue diff | Diff en/zh + identical-value heuristic | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 | python3 stdlib (P0) | Batch |
| classify.py | Classification | Apply the 4-class label per match | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 | cjk-grep.txt (P0), parity.txt (P0) | Batch |
| render_report.py | Report assembly | Produce gap-report.md + comment-body.md | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 | classified.csv (P0) | Batch |
| post_comment.sh | GitHub side-effect | Post the comment on #10 | 5.5 | gh (P0), comment-body.md (P0) | Service |
| file_followups.sh | GitHub side-effect | Open follow-up issues | 7.1, 7.2, 7.4, 7.5 | gh (P0), PENDING-followups/* (P0) | Service |

### Verification pipeline

#### `run_audit.sh`

| Field | Detail |
|-------|--------|
| Intent | Single shell entrypoint that runs every step in order and persists artefacts under `audit//` |
| Requirements | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 |

**Responsibilities & Constraints**

- Must NOT modify any file outside `.kiro/specs/i18n-e2e-english-verification/`.
- Must capture the HEAD sha before any other step (so the artefact path is set).
- Must exit `0` on full success (audit + GitHub side effects) and `2` on PENDING (audit succeeded, side effects didn't).
- Must be safely re-runnable on the same sha (overwriting that sha's directory is acceptable).

**Dependencies**

- Inbound: invoked manually by the verifier (`bash run_audit.sh`) — Criticality: P0.
- Outbound: `audit_cjk.sh`, `check_parity.py`, `classify.py`, `render_report.py`, `post_comment.sh`, `file_followups.sh` — Criticality: P0 each.
- External: `git`, `python3`, `gh` (P1 — fallback supported).

**Contracts**: Service [ ] / API [ ] / Event [ ] / Batch [x] / State [ ]

##### Batch / Job Contract

- **Trigger**: manual `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.
- **Input / validation**: working tree at any commit; the tree does not need to be clean. `git grep` searches tracked files only, so untracked files are ignored. Without a tree argument, however, it reads the working-tree contents of tracked files, so unstaged edits *are* included unless `HEAD` is passed as the tree argument.
- **Output / destination**: `.kiro/specs/i18n-e2e-english-verification/audit//`.
- **Idempotency & recovery**: Re-running on the same sha overwrites that sha's directory. PENDING outputs survive across runs until a `gh`-enabled run replaces them.

**Implementation Notes**

- Integration: invoked by humans only — no CI hookup in this spec.
- Validation: check `gh auth status` before attempting comment/issue posts; on failure, branch to PENDING.
- Risks: shell quoting around the PCRE pattern (`[\x{4e00}-\x{9fff}]`) — pass it to `git grep -P` as a single-quoted argument.

#### `audit_cjk.sh`

| Field | Detail |
|-------|--------|
| Intent | Run the canonical PCRE grep + per-bucket counts |
| Requirements | 1.1, 1.5 |

**Responsibilities & Constraints**

- Output: `cjk-grep.txt` (raw `git grep -nP` lines) and `cjk-grep-bucketed.txt` (one section per top-level path: `backend/app`, `frontend/src`, `locales/en.json`).
- Excludes binary-file matches (e.g. `.jpeg` false positives), for example with `git grep -I`.

**Dependencies**

- Inbound: `run_audit.sh` (P0).
- External: `git` 2.x (P0 — must be built with PCRE support for `-P`).

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: invoked by `run_audit.sh`.
- **Input / validation**: receives the target output directory as argv[1]; aborts if it is missing.
- **Output / destination**: `cjk-grep.txt`, `cjk-grep-bucketed.txt` in `/`.
- **Idempotency & recovery**: deterministic — same tree → same output.

**Implementation Notes**

- Integration: pure read-only against `git`.
- Validation: `git --version` alone does not reveal PCRE support, so probe with a trivial `git grep -P` call and abort with a clear error if it fails.
- Risks: ripgrep is NOT used (avoids a hard `rg` dependency); `git grep -P` relies on the local git having been built with PCRE support.
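The bucketing step can be sketched in Python as a pure function over the lines of `cjk-grep.txt`. The sketch makes some assumptions:

- `git grep -n` output has the form `path:line:text`.
- File paths contain no colons.
- `bucket_grep_lines` and `render_buckets` are hypothetical names.
- The `== bucket (n matches) ==` section header is also hypothetical, since the design does not fix the layout of `cjk-grep-bucketed.txt`.

The `CJK` pattern is the Python `re` spelling of the same U+4E00 to U+9FFF range that the PCRE pattern uses.

```python
import re
from collections import OrderedDict

# Python `re` equivalent of the PCRE class [\x{4e00}-\x{9fff}] (CJK Unified Ideographs).
CJK = re.compile(r"[\u4e00-\u9fff]")

# Top-level buckets reported in cjk-grep-bucketed.txt, in output order.
BUCKETS = ("backend/app", "frontend/src", "locales/en.json")


def bucket_grep_lines(lines):
    """Partition `git grep -n` lines ("path:line:text") by top-level path.

    Anything outside the three buckets (including "Binary file ... matches"
    notices, which carry no colon-separated path) goes to "other", so no
    line is silently dropped.
    """
    buckets = OrderedDict((name, []) for name in BUCKETS + ("other",))
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw:
            continue
        path = raw.split(":", 1)[0]
        for name in BUCKETS:
            if path == name or path.startswith(name + "/"):
                buckets[name].append(raw)
                break
        else:
            buckets["other"].append(raw)
    return buckets


def render_buckets(buckets):
    """Render one section per bucket with a match count (hypothetical layout)."""
    out = []
    for name, rows in buckets.items():
        out.append(f"== {name} ({len(rows)} matches) ==")
        out.extend(rows)
        out.append("")
    return "\n".join(out)
```

The "other" bucket exists so that classification can still see every grep line. This keeps the count-equality check in `classify.py` meaningful.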
#### `check_parity.py`

| Field | Detail |
|-------|--------|
| Intent | Compare `locales/en.json` and `locales/zh.json`: key parity, CJK in EN, identical-value heuristic |
| Requirements | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 |

**Responsibilities & Constraints**

- Recursively flattens nested-dict keys into dotted paths.
- Reports three blocks: `missing-keys`, `cjk-in-en`, `identical-values`.
- Treats a value as `review-needed` only if (a) the EN value equals the ZH value, (b) the value is non-empty, and (c) the value is more than two ASCII words.

**Dependencies**

- Inbound: `run_audit.sh` (P0).
- External: `json` from the Python stdlib (P0).

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: invoked by `run_audit.sh` with the `` directory as argv[1].
- **Input / validation**: reads `locales/en.json` and `locales/zh.json` from cwd (must be invoked from the repo root); fails fast on a JSON parse error.
- **Output / destination**: `parity.txt` in `/`.
- **Idempotency & recovery**: pure function of the catalogue contents.

**Implementation Notes**

- Integration: invoked from the repo root so relative paths resolve.
- Validation: parse on load; both files must be JSON objects.
- Risks: the "more than two ASCII words" heuristic may produce noise — `review-needed` is intentionally a soft label, not a `gap`.

#### `classify.py`

| Field | Detail |
|-------|--------|
| Intent | Apply the 4-class label (`deliberate` / `gap` / `non-applicable` / `review-needed`) and a category tag per match |
| Requirements | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 |

**Responsibilities & Constraints**

- Reads `cjk-grep.txt` and `parity.txt`; emits `classified.csv` with columns `file`, `line`, `match`, `class`, `category`, `pipeline_step`.
- Categories (closed set): `frontend-ui-string`, `frontend-regex-parser`, `backend-docstring`, `backend-comment`, `backend-log`, `backend-prompt-label`, `propagation`, `catalogue-parity`, `binary-false-positive`.
- Pipeline-step tags (closed set): `Graph Build`, `Env Setup`, `Simulation`, `Report`, `Interaction`, `Logs`, `UI`, `n/a`.
- Classification rules (class / category / step):
  - `locales/en.json` CJK → always `gap` / `catalogue-parity` / `n/a` (R1.5).
  - File path under `frontend/src/views/` or `frontend/src/components/` AND the match is inside a string literal (heuristic: enclosed in `'…'`/`"…"`/`` `…` ``) → `gap` / `frontend-ui-string`.
  - Match inside a `text.match(/.../)` call in a `.vue` file → `gap` / `frontend-regex-parser` (cause: the backend emits CJK).
  - Backend `.py` file, line starts with `#` or sits inside a triple-quoted docstring → `deliberate` (blocked by #7) / `backend-comment` (for `#` lines) or `backend-docstring` (for docstrings). These are counted but not filed as a fresh follow-up, since #7 already covers them.
  - Backend `.py` file, line contains `logger.`, `log.`, or `print(` and CJK in a string literal → `gap` / `backend-log` / appropriate step tag.
  - Backend `.py` file in `services/{ontology,oasis_profile,simulation_config,report_agent}_generator.py` where CJK appears inside an LLM-prompt context label (heuristic: a string literal not preceded by `#`) → `gap` / `backend-prompt-label`.
  - Binary-file match notices (e.g. `.jpeg` assets), should any reach the classifier → `non-applicable` / `binary-false-positive`.
  - Anything else → `review-needed` (forces a human look).

**Dependencies**

- Inbound: `audit_cjk.sh`, `check_parity.py` (P0).
- External: `csv` from the Python stdlib.

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: invoked by `run_audit.sh` after the two preceding steps.
- **Input / validation**: `cjk-grep.txt` and `parity.txt` must exist in `/`.
- **Output / destination**: `classified.csv`.
- **Idempotency & recovery**: deterministic — same inputs → same CSV.

**Implementation Notes**

- Integration: the classification rules are heuristics, not a parser; correctness is bounded by careful regexes and an explicit "fallthrough = `review-needed`" rule.
- Validation: every input row produces an output row (no silent drops); a count-equality assertion runs at the end.
- Risks: false negatives (e.g. a Chinese log string that doesn't contain `logger.` on the same line) — the `review-needed` fallthrough catches these.

#### `render_report.py`

| Field | Detail |
|-------|--------|
| Intent | Produce `gap-report.md` and `comment-body.md` |
| Requirements | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 |

**Responsibilities & Constraints**

- `gap-report.md` sections: Overview, Section 1 (static audit), Section 2 (parity), Section 3 (prompt verification), Section 4 (propagation), Section 5 (issue-#10 checklist mapping), Section 6 (ZH regression), Section 7 (follow-up plan).
- `comment-body.md`: the Markdown comment for issue #10. It mirrors the issue's checklist with `pass` / `gap` / `manual-pending` for each line, plus a "How to re-run" footer.
- Reads `classified.csv` and the issue body (snapshot at `.ticket/10.md`).

**Dependencies**

- Inbound: `classify.py` (P0), `.ticket/10.md` (P0).
- External: Python stdlib only.

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: `run_audit.sh` after `classify.py`.
- **Input / validation**: `classified.csv` and `.ticket/10.md` must exist.
- **Output / destination**: `gap-report.md`, `comment-body.md` in `/`.
- **Idempotency & recovery**: deterministic.

**Implementation Notes**

- Integration: the comment body must include a `Run on commit ` header so the comment is traceable.
- Validation: confirm every issue-body checkbox has been mapped (count check).
- Risks: rendering CJK characters in Markdown. Python's default text encoding is locale-dependent, so every output file is opened with an explicit `encoding="utf-8"`; the comment body is verified to round-trip via `gh`.

#### `post_comment.sh`

| Field | Detail |
|-------|--------|
| Intent | Post `comment-body.md` as a comment on issue #10 |
| Requirements | 5.5 |

**Responsibilities & Constraints**

- `gh issue comment 10 --repo salestech-group/MiroFish --body-file /comment-body.md`.
- On a non-zero exit, copies the body to `/PENDING-issue-10-comment.md` and exits `2`.

**Dependencies**

- External: `gh` (P0; degrades to PENDING when missing).

**Contracts**: Service [x]

##### Service Interface

```text
post_comment.sh
  precondition:            /comment-body.md exists
  postcondition (success): comment posted; URL printed to stdout
  postcondition (failure): /PENDING-issue-10-comment.md present; exit code 2
```

**Implementation Notes**

- Integration: must be the second-to-last step (so its failures don't block the issue-filing fallback).
- Validation: parses the URL that `gh` prints and writes it to `/comment-url.txt` on success.
- Risks: GitHub API rate limits — unlikely to matter for a single comment.

#### `file_followups.sh`

| Field | Detail |
|-------|--------|
| Intent | Open one follow-up issue per gap category |
| Requirements | 7.1, 7.2, 7.4, 7.5 |

**Responsibilities & Constraints**

- Iterates over `/PENDING-followups/*.md`. `render_report.py` always writes these files; files for categories with zero gaps stay as empty placeholders.
- For each non-empty body, runs `gh issue create --repo salestech-group/MiroFish --title --body-file <body> --label i18n`.
- On a `gh` failure for any single category, leaves the corresponding `PENDING-followups/<n>-*.md` in place, keeps going, and exits `2` after attempting all categories.

**Dependencies**

- External: `gh` (P0; degrades to PENDING).

**Contracts**: Service [x]

##### Service Interface

```text
file_followups.sh <sha-dir>
  precondition:            <sha-dir>/PENDING-followups/*.md exist (possibly empty placeholders)
  postcondition (success): all non-empty bodies posted; URLs appended to <sha-dir>/followup-urls.txt;
                           bodies removed from PENDING-followups/
  postcondition (partial): URLs in followup-urls.txt for the ones that posted;
                           the rest stay in PENDING-followups/; exit code 2
```

**Implementation Notes**

- Integration: must be the last step.
- Validation: post-hoc count check (`gh` URLs + remaining PENDING bodies = total categories).
- Risks: a category that the spec already considers covered (e.g. backend docstrings → blocked by #7) is not re-filed; the spec's category list is closed and excludes that case.

## Data Models

### Domain Model

The audit operates on three logical concepts:

- **Match** — a single line of `git grep` output: `(file, line, raw_text)`.
- **Classification** — `(match, class ∈ {deliberate, gap, non-applicable, review-needed}, category ∈ closed-set, pipeline_step ∈ closed-set)`.
- **Follow-up** — `(category, title, body, status ∈ {posted, pending}, url?)`.

Invariant: every `Match` produces exactly one `Classification`; every `Classification` with `class == gap` belongs to exactly one `Follow-up` category (which may aggregate multiple gaps).

### Logical Data Model

**`classified.csv` schema** (CSV, UTF-8, header row):

| Column | Type | Notes |
|--------|------|-------|
| `file` | string | repo-relative path |
| `line` | int | 1-indexed |
| `match` | string | trimmed grep line |
| `class` | enum | `deliberate` / `gap` / `non-applicable` / `review-needed` |
| `category` | enum | closed set listed in the classify.py rules |
| `pipeline_step` | enum | closed set listed in the classify.py rules |

Natural key: `(file, line)`.

**`parity.txt` structure** (text, three labelled blocks):

```
[missing-keys]
en-only: <key.path>
zh-only: <key.path>

[cjk-in-en]
<key.path>: <value snippet>

[identical-values]
<key.path>: <value>   # review-needed if non-trivial English prose
```

### Data Contracts & Integration

- **`comment-body.md`** must be valid GitHub-flavoured Markdown; checkbox lines preserve the issue's original ordering.
- **Follow-up issue body** must be valid GitHub-flavoured Markdown. The first line is a one-sentence summary. The subsequent sections are `## Evidence` (file:line list), `## Linked from` (#10 + comment URL), and `## Acceptance` (a small checklist).
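The three `parity.txt` blocks could be produced along the following lines. This is a minimal sketch under these assumptions:

- The catalogues are nested JSON objects. Non-string leaves are stringified for the CJK check.
- "More than two ASCII words" is read as more than two runs of ASCII letters.
- `flatten`, `load_catalogue`, and `parity_report` are hypothetical names, not the actual `check_parity.py` interface.

```python
import json
import re

CJK = re.compile(r"[\u4e00-\u9fff]")
ASCII_WORD = re.compile(r"[A-Za-z]+")


def load_catalogue(path):
    """Load a locale file; fail fast on parse errors or a non-object root."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError(f"{path}: top-level JSON value must be an object")
    return data


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"dotted.key.path": leaf_value}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def parity_report(en, zh):
    """Render the [missing-keys], [cjk-in-en] and [identical-values] blocks."""
    en_flat, zh_flat = flatten(en), flatten(zh)
    lines = ["[missing-keys]"]
    lines += [f"en-only: {k}" for k in sorted(en_flat.keys() - zh_flat.keys())]
    lines += [f"zh-only: {k}" for k in sorted(zh_flat.keys() - en_flat.keys())]
    lines += ["", "[cjk-in-en]"]
    for k in sorted(en_flat):
        if CJK.search(str(en_flat[k])):
            lines.append(f"{k}: {str(en_flat[k])[:60]}")  # truncated value snippet
    lines += ["", "[identical-values]"]
    for k in sorted(en_flat.keys() & zh_flat.keys()):
        value = en_flat[k]
        # review-needed only when identical, non-empty, and more than two ASCII words
        if (value == zh_flat[k] and isinstance(value, str) and value.strip()
                and len(ASCII_WORD.findall(value)) > 2):
            lines.append(f"{k}: {value}")
    return "\n".join(lines) + "\n"
```

Usage: `parity_report(load_catalogue("locales/en.json"), load_catalogue("locales/zh.json"))`, run from the repo root so the relative paths resolve. Sorting every block keeps the output a pure function of the catalogue contents, which the idempotency contract relies on.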
## Error Handling

### Error Strategy

- **Read-only operations** (steps 1–4): on any uncaught error (missing file, JSON parse error), the script aborts with a non-zero exit before any artefact is half-written. The orchestrator uses `set -euo pipefail`.
- **GitHub side effects** (steps 5–6): wrapped, so a failure routes to the PENDING outputs and the orchestrator exits `2`.

### Error Categories and Responses

- **User errors**: invoked from the wrong directory → fail fast with "must be run from repo root".
- **System errors**: `git` or `python3` missing → fail fast with "install <tool>"; `gh` missing or `gh auth status` not OK → branch to PENDING.
- **Business errors**: classification produces 0 rows while `cjk-grep.txt` is non-empty → assertion failure (count-equality bug).

### Monitoring

- The orchestrator prints a one-line status per step.
- A final summary block goes to stdout: total matches, gaps, `manual-pending` items, and follow-ups posted vs PENDING.

## Testing Strategy

- **Unit tests**: not introduced. The scripts are simple enough that a one-shot dry run on the live tree is the canonical validation.
- **Integration test**: a single `bash run_audit.sh` against the working tree, judged by the success criteria below.
- **Validation checklist** (run during implementation):
  - The audit produces a non-empty `cjk-grep.txt`.
  - `parity.txt` reports 0 missing keys (matching the live state at HEAD).
  - The `classified.csv` data-row count (excluding the header) equals the `cjk-grep.txt` line count.
  - `gap-report.md` and `comment-body.md` render as valid Markdown (checked by eye; no toolchain required).
  - The classifier marks every `locales/en.json` CJK match as `gap` (there are currently zero such matches, so this asserts the negative).
  - With `gh` available: a comment is posted on #10 and follow-up issues are created.
  - With `gh` simulated as absent (e.g. a `PATH` that still contains `git` and `python3` but not `gh`): PENDING outputs appear under `<sha>/`.

### Out of scope for testing

- The live UI walkthrough is `manual-pending` (R5.3) and not part of the test plan.
- Performance, scalability, security: nothing to test — read-only single-shot scripts.
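The count-equality check and the `locales/en.json` rule from the validation checklist can be exercised against a deliberately reduced classifier. The sketch below is an illustration built on assumptions, not `classify.py` itself:

- It implements only three of the rules (EN catalogue, backend `#` comment, backend log call) plus the `review-needed` fallthrough.
- It leaves the category empty for fallthrough rows, because the design does not name one.
- It uses `Logs` as a placeholder step tag.
- `classify_line` and `write_classified` are hypothetical names.

```python
import csv
import io
import re

# Simplified log-call detector: logger., log. or print( anywhere on the line.
LOG_CALL = re.compile(r"\b(?:logger|log)\.|\bprint\(")
HEADER = ["file", "line", "match", "class", "category", "pipeline_step"]


def classify_line(raw):
    """Map one `git grep -n` line ("path:line:text") to a classified.csv row."""
    path, line_no, text = raw.split(":", 2)
    match = text.strip()
    if path == "locales/en.json":
        return [path, line_no, match, "gap", "catalogue-parity", "n/a"]  # R1.5
    if path.endswith(".py") and match.startswith("#"):
        # Deliberate: already covered by #7, so not filed as a new follow-up.
        return [path, line_no, match, "deliberate", "backend-comment", "n/a"]
    if path.endswith(".py") and LOG_CALL.search(match):
        return [path, line_no, match, "gap", "backend-log", "Logs"]  # placeholder step
    return [path, line_no, match, "review-needed", "", "n/a"]  # fallthrough


def write_classified(grep_lines):
    """Return classified.csv text; assert one data row per non-empty grep line."""
    inputs = [raw for raw in grep_lines if raw.strip()]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(classify_line(raw) for raw in inputs)
    # Re-parse the CSV so that quoting bugs would also surface as a count mismatch.
    parsed = list(csv.reader(io.StringIO(buf.getvalue())))
    assert len(parsed) - 1 == len(inputs), "classified.csv rows != cjk-grep.txt lines"
    return buf.getvalue()
```

The assertion re-reads the CSV it has just written rather than counting rows in memory. A grep line containing quotes or commas that the writer mis-escaped would therefore still break the equality.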