MicroFish/.kiro/specs/i18n-e2e-english-verification/design.md

# Design — i18n-e2e-english-verification

## Overview

**Purpose**: This spec produces a deterministic, re-runnable verification pass that proves (or disproves) the MiroFish 5-step pipeline runs cleanly in English, and posts a structured report on issue #10 with a `pass` / `gap` / `manual-pending` status per checklist item.

**Users**: i18n maintainers reviewing the epic (#11), and any future verifier re-running the audit after subsequent merges. The deliverable is read by humans on GitHub (issue comment) and re-run by humans (or CI in a future iteration) to confirm parity.

**Impact**: No production code is modified. The repository gains one new directory tree (`.kiro/specs/i18n-e2e-english-verification/`) containing the spec, the audit scripts, and the captured outputs. One GitHub comment is posted on #10. Up to four follow-up issues are filed.

### Goals

- Static-audit `backend/app`, `frontend/src`, `locales/en.json` for CJK characters; classify every match.
- Verify EN / ZH locale catalogue parity and flag suspect untranslated entries.
- Verify LLM-prompt assets respect the requested locale.
- Document locale-propagation gaps across Flask → `Task` → OASIS subprocess → ReACT agent.
- Post a single canonical comment on issue #10 with per-checklist statuses.
- File follow-up issues for every gap (no inline fixes).
- Make the audit re-runnable by capturing artefacts under `.kiro/specs/.../audit/<commit-sha>/`.

### Non-Goals

- Patching any `gap` discovered (R7.3 — strictly verification).
- Performance / load testing.
- Adding new locales beyond EN / ZH.
- Building a permanent CI guard (filed as a follow-up issue, not implemented here).
- Live UI / Docker walkthrough — captured as `manual-pending` in this run's report.

## Boundary Commitments

### This Spec Owns

- The audit scripts and the captured audit outputs under `.kiro/specs/i18n-e2e-english-verification/audit/`.
- The `gap-report.md` artefact and the comment body posted on issue #10.
- The grouping rule for follow-up issues (one per category — UI strings, backend log strings, backend LLM-prompt labels, suggested CI guard).
- The `pass` / `gap` / `manual-pending` / `review-needed` classification scheme.

### Out of Boundary

- Any modification of files under `backend/app/`, `frontend/src/`, or `locales/`.
- Fixing the gaps the audit discovers — those land in their own follow-up issues.
- Live UI walkthrough, Docker run, or LLM execution.
- A permanent CI check — filed as a separate follow-up issue.

### Allowed Dependencies

- `git` (for `git grep`, capturing HEAD sha).
- `gh` CLI (for the comment + follow-up issues; with documented fallback when unavailable).
- `python3` (for the catalogue parity diff).
- The repo working tree at HEAD of the working branch.

### Revalidation Triggers

- Any merge to `main` that touches `locales/`, `backend/app/`, or `frontend/src/` invalidates the captured audit; a re-run should produce a new `audit/<commit-sha>/` directory.
- A change to issue #10's checklist body (e.g. a new sub-item) requires re-mapping in `gap-report.md`.
- A change to the four follow-up categories (e.g. project decides to file one issue per file) requires re-running the issue-filing script with new grouping.

## Architecture

### Existing Architecture Analysis

- The MiroFish backend is Flask + Python `Task` workers + an OASIS subprocess (per CLAUDE.md). i18n surfaces are: `vue-i18n` for the SPA, `locales/*.json` shared by both ends, a backend logger that resolves keys per locale, and inline LLM prompts in `backend/app/services/*.py`.
- The verification pass does **not** hook into any of these — it reads files only. No Flask blueprint, no `Task` model, no Neo4j query.

### Architecture Pattern & Boundary Map

```mermaid
graph TB
    Verifier[Verifier shell entrypoint]
    Audit[audit_cjk.sh]
    Parity[check_parity.py]
    Classify[classify.py]
    Report[render_report.py]
    Comment[post_comment.sh]
    FollowUp[file_followups.sh]

    Repo[Working tree]
    Captures[audit slash sha slash]
    GH[GitHub via gh CLI]

    Verifier --> Audit
    Verifier --> Parity
    Audit --> Classify
    Parity --> Classify
    Classify --> Report
    Report --> Captures
    Report --> Comment
    Report --> FollowUp
    Audit --> Repo
    Parity --> Repo
    Comment --> GH
    FollowUp --> GH
```

**Architecture Integration**:

- **Selected pattern**: Linear pipeline of read-only scripts that each emit a single artefact, composed by a thin shell entrypoint. No mutable state outside `audit/<sha>/`.
- **Domain boundaries**: `audit_cjk.sh` owns the raw grep; `check_parity.py` owns the catalogue diff; `classify.py` owns the four-class labels; `render_report.py` owns the comment body; `post_comment.sh` and `file_followups.sh` own GitHub side effects.
- **Existing patterns preserved**: Shell + Python script pair (matches the project's existing `setup`/`run` style); no new test runner, no new linter.
- **New components rationale**: Each script is single-purpose so failures (e.g. `gh` permission issues) are isolated and the pipeline can resume from the failed step.
- **Steering compliance**: No production-code touch (R7.3); 4-space indent in any committed Python; double quotes; `snake_case`; reserved Bash exits with a non-zero status on any uncaught error.

### Technology Stack

| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| CLI / Audit runner | Bash 5+, `git grep -P` (PCRE) | Run the canonical CJK audit | `\x{...}` ranges require PCRE — `git grep -E` will fail on this regex (verified). |
| Static checks | Python 3.11 (project minimum per CLAUDE.md) | Catalogue parity + classification + report rendering | Standard library only — no new deps. |
| GitHub integration | `gh` CLI | Post the comment, file follow-ups | Falls back to `audit/<sha>/PENDING-*` files when missing. |
| Output formats | Plain text + Markdown | Captures + comment body | No HTML, no JSON beyond `gh`'s own. |

## File Structure Plan

### Directory Structure

```
.kiro/specs/i18n-e2e-english-verification/
├── spec.json
├── requirements.md
├── gap-analysis.md
├── research.md
├── design.md
├── tasks.md
├── HANDOFF.md          # only if implementation hits the 3-cycle remediation cap
└── audit/
    ├── scripts/
    │   ├── run_audit.sh          # entrypoint - chains the steps below
    │   ├── audit_cjk.sh          # git grep PCRE + bucket counts
    │   ├── check_parity.py       # locales/en.json vs zh.json key + identical-value diff
    │   ├── classify.py           # apply 4-class labels to grep matches
    │   ├── render_report.py      # produce gap-report.md + comment-body.md
    │   ├── post_comment.sh       # gh issue comment 10 with comment-body.md (or PENDING-*)
    │   └── file_followups.sh     # gh issue create per category (or PENDING-*)
    └── <commit-sha>/             # captured outputs of one verification run
        ├── cjk-grep.txt          # raw `git grep -nP ...` output
        ├── cjk-grep-bucketed.txt # the same, partitioned by top-level path
        ├── parity.txt            # en/zh diff summary
        ├── classified.csv        # match-by-match label
        ├── gap-report.md         # the canonical structured report
        ├── comment-body.md       # the markdown posted to issue #10
        ├── PENDING-issue-10-comment.md          # only if gh comment failed
        └── PENDING-followups/                   # only if gh issue create failed
            ├── 01-frontend-ui-strings.md
            ├── 02-backend-log-strings.md
            ├── 03-backend-prompt-labels.md
            └── 04-permanent-ci-guard.md
```

### Modified Files

- *(None.)* The spec explicitly forbids touching production source.

## System Flows

```mermaid
sequenceDiagram
    participant V as Verifier
    participant Run as run_audit.sh
    participant FS as Working tree
    participant GH as GitHub

    V->>Run: bash run_audit.sh
    Run->>FS: git grep -nP, git rev-parse HEAD
    FS-->>Run: cjk-grep.txt + sha
    Run->>FS: read locales json
    FS-->>Run: en/zh dicts
    Run->>Run: classify
    Run->>FS: write audit slash sha slash artefacts
    Run->>GH: gh issue comment 10
    alt gh succeeds
        GH-->>Run: comment URL
        Run->>GH: gh issue create x N follow-ups
        GH-->>Run: issue URLs
    else gh fails
        Run->>FS: write PENDING markdown to audit slash sha slash
    end
    Run-->>V: exit 0 success or exit 2 PENDING
```

**Key flow decisions**:

- The audit always writes the captured artefacts to disk first (idempotent, re-runnable). The GitHub side effects are the *last* steps so any earlier failure leaves a complete capture for inspection.
- A non-zero `gh` exit shifts the pipeline to PENDING mode rather than failing the whole run; the script exits `2` to flag "audit ran but GitHub side-effects didn't apply".

## Requirements Traceability

| Requirement | Summary | Components | Interfaces / Artefacts | Flows |
|-------------|---------|------------|------------------------|-------|
| 1.1 | Run canonical `git grep` | audit_cjk.sh | `cjk-grep.txt` | Audit step |
| 1.2 | Classify each match | classify.py | `classified.csv` | Audit step |
| 1.3 | Record file:line + step tag for `gap` | classify.py | `classified.csv` (`step` column) | Audit step |
| 1.4 | No file modifications during audit | run_audit.sh | scripts are read-only | — |
| 1.5 | `en.json` CJK = always `gap` | classify.py | hard rule in classifier | Audit step |
| 2.1 | Enumerate keys recursively | check_parity.py | `parity.txt` | Audit step |
| 2.2 | Missing-key gaps recorded | check_parity.py | `parity.txt` (missing-key block) | Audit step |
| 2.3 | EN catalogue CJK = `gap` | check_parity.py | `parity.txt` (cjk-in-en block) | Audit step |
| 2.4 | EN/ZH identical = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | Audit step |
| 2.5 | No catalogue edits | check_parity.py | read-only stdlib JSON load | — |
| 3.1 | Enumerate prompt files | classify.py (heuristic — known files list) | `gap-report.md` Section 3 | — |
| 3.2 | Confirm locale-aware or EN-only | classify.py | `gap-report.md` Section 3 | — |
| 3.3 | Hard-coded ZH directive = `gap` | classify.py | `classified.csv` (`category=prompt-label`) | — |
| 3.4 | #3, #4, #5 prompts post-merge check | classify.py | `gap-report.md` Section 3 | — |
| 4.1 | Identify handoff boundaries | render_report.py | `gap-report.md` Section 4 | — |
| 4.2 | Confirm explicit or re-derived locale | render_report.py | `gap-report.md` Section 4 | — |
| 4.3 | Silent default = `gap` | classify.py | `classified.csv` (`category=propagation`) | — |
| 4.4 | Backend logger EN under EN | classify.py | `classified.csv` (`category=backend-log`) | — |
| 5.1 | Comment lists every checklist item | render_report.py | `comment-body.md` | Comment-post |
| 5.2 | Each `gap` includes file:line + follow-up link | render_report.py | `comment-body.md` | Comment-post |
| 5.3 | `manual-pending` items state repro steps | render_report.py | `comment-body.md` | Comment-post |
| 5.4 | Comment includes raw audit (or path) | render_report.py | `comment-body.md` (path reference) | Comment-post |
| 5.5 | Post via `gh issue comment 10` | post_comment.sh | `comment-body.md` | Comment-post |
| 6.1 | ZH covers every EN key | check_parity.py | (already passes per gap-analysis) | — |
| 6.2 | Locale-aware prompts symmetric | render_report.py | `gap-report.md` Section 6 | — |
| 6.3 | EN-only ZH value = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | — |
| 6.4 | ZH regression filed as gap | classify.py | `classified.csv` | — |
| 7.1 | File issue per gap | file_followups.sh | `gh issue create` | Follow-up |
| 7.2 | Group by category | file_followups.sh | one body per category in `PENDING-followups/` | Follow-up |
| 7.3 | No production-code edits | run_audit.sh | only writes under `.kiro/specs/.../` | — |
| 7.4 | Label follow-ups `i18n` | file_followups.sh | `gh issue create --label i18n` | Follow-up |
| 7.5 | Fallback inline list when no `gh` | file_followups.sh | `PENDING-followups/*.md` | Follow-up |
| 8.1 | Capture raw output | run_audit.sh | `audit/<sha>/` directory | Audit step |
| 8.2 | Preserve previous run | run_audit.sh | `<sha>` subdirectory naming | Audit step |
| 8.3 | Record HEAD sha | run_audit.sh | `git rev-parse HEAD` | Audit step |
| 8.4 | Idempotent re-run | run_audit.sh | re-running on same sha overwrites that sha's dir | Audit step |

## Components and Interfaces

| Component | Domain | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------|--------|--------------|--------------------------|-----------|
| run_audit.sh | Verification pipeline | Compose the audit and route artefacts | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 | git (P0), python3 (P0), gh (P1) | Batch |
| audit_cjk.sh | Static audit | Run `git grep -nP` and bucket | 1.1, 1.5 | git (P0) | Batch |
| check_parity.py | Catalogue diff | Diff en/zh + identical-value heuristic | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 | python3 stdlib (P0) | Batch |
| classify.py | Classification | Apply the 4-class label per match | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 | cjk-grep.txt (P0), parity.txt (P0) | Batch |
| render_report.py | Report assembly | Produce gap-report.md + comment-body.md | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 | classified.csv (P0) | Batch |
| post_comment.sh | GitHub side-effect | Post the comment on #10 | 5.5 | gh (P0), comment-body.md (P0) | Service |
| file_followups.sh | GitHub side-effect | Open follow-up issues | 7.1, 7.2, 7.4, 7.5 | gh (P0), PENDING-followups/* (P0) | Service |

### Verification pipeline

#### `run_audit.sh`

| Field | Detail |
|-------|--------|
| Intent | Single shell entrypoint that runs every step in order and persists artefacts under `audit/<commit-sha>/` |
| Requirements | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 |

**Responsibilities & Constraints**

- Must NOT modify any file outside `.kiro/specs/i18n-e2e-english-verification/`.
- Must capture HEAD sha before any other step (so the artefact path is set).
- Must exit `0` on full success (audit + GitHub side effects) and `2` on PENDING (audit succeeded, side effects didn't).
- Must be safely re-runnable on the same sha (overwriting that sha's directory is acceptable).

**Dependencies**

- Inbound: invoked manually by the verifier (`bash run_audit.sh`) — Criticality: P0.
- Outbound: `audit_cjk.sh`, `check_parity.py`, `classify.py`, `render_report.py`, `post_comment.sh`, `file_followups.sh` — Criticality: P0 each.
- External: `git`, `python3`, `gh` (P1 — fallback supported).

**Contracts**: Service [ ] / API [ ] / Event [ ] / Batch [x] / State [ ]

##### Batch / Job Contract

- **Trigger**: manual `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.
- **Input / validation**: working tree at any commit; rejects detached non-clean trees? — no, the audit reads tracked files only via `git grep`, so unstaged edits are ignored deliberately.
- **Output / destination**: `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.
- **Idempotency & recovery**: Re-running on the same sha overwrites that sha's directory. PENDING outputs survive across runs until a `gh`-enabled run replaces them.

**Implementation Notes**

- Integration: invoked by humans only — no CI hookup in this spec.
- Validation: confirm `gh auth status` before attempting comment/issue posts; on failure, branch to PENDING.
- Risks: shell quoting around the PCRE pattern (`[\x{4e00}-\x{9fff}]`) — use single-quoted argument to `git grep -P`.

#### `audit_cjk.sh`

| Field | Detail |
|-------|--------|
| Intent | Run the canonical PCRE grep + per-bucket counts |
| Requirements | 1.1, 1.5 |

**Responsibilities & Constraints**

- Output: `cjk-grep.txt` (raw `git grep -nP` lines) and `cjk-grep-bucketed.txt` (one section per top-level path: `backend/app`, `frontend/src`, `locales/en.json`).
- Excludes binary file matches (e.g. `.jpeg` false positives).

**Dependencies**

- Inbound: `run_audit.sh` (P0).
- External: `git` 2.x (P0 — must support `-P` for PCRE).

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: invoked by `run_audit.sh`.
- **Input / validation**: receives the target output directory as argv[1]; aborts if missing.
- **Output / destination**: `cjk-grep.txt`, `cjk-grep-bucketed.txt` in `<sha>/`.
- **Idempotency & recovery**: deterministic — same tree → same output.

**Implementation Notes**

- Integration: pure read-only against `git`.
- Validation: `git --version` precondition; abort with a clear error if PCRE unsupported.
- Risks: ripgrep is NOT used (avoids a hard `rg` dependency); `git grep -P` is built-in to git's PCRE2 binding.

#### `check_parity.py`

| Field | Detail |
|-------|--------|
| Intent | Compare `locales/en.json` and `locales/zh.json`: key parity, CJK in EN, identical-value heuristic |
| Requirements | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 |

**Responsibilities & Constraints**

- Recursively flattens nested-dict keys with dotted paths.
- Reports three blocks: `missing-keys`, `cjk-in-en`, `identical-values`.
- Treats values as `review-needed` only if (a) en value == zh value, (b) value is non-empty, (c) value is more than two ASCII words.

**Dependencies**

- Inbound: `run_audit.sh` (P0).
- External: `json` from Python stdlib (P0).

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: invoked by `run_audit.sh` with the `<sha>` directory as argv[1].
- **Input / validation**: reads `locales/en.json` and `locales/zh.json` from cwd (must be invoked from repo root); fails fast on JSON parse error.
- **Output / destination**: `parity.txt` in `<sha>/`.
- **Idempotency & recovery**: pure function of catalogue contents.

**Implementation Notes**

- Integration: invoked from repo root so relative paths resolve.
- Validation: parse-on-load, both files must be objects.
- Risks: the "more than two ASCII words" heuristic may produce noise — `review-needed` is intentionally a soft label not a `gap`.

#### `classify.py`

| Field | Detail |
|-------|--------|
| Intent | Apply the 4-class label (`deliberate` / `gap` / `non-applicable` / `review-needed`) and a category tag per match |
| Requirements | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 |

**Responsibilities & Constraints**

- Reads `cjk-grep.txt` and `parity.txt`; emits `classified.csv` with columns: `file`, `line`, `match`, `class`, `category`, `pipeline_step`.
- Categories (closed set): `frontend-ui-string`, `frontend-regex-parser`, `backend-docstring`, `backend-comment`, `backend-log`, `backend-prompt-label`, `propagation`, `catalogue-parity`, `binary-false-positive`.
- Pipeline-step tags (closed set): `Graph Build`, `Env Setup`, `Simulation`, `Report`, `Interaction`, `Logs`, `UI`, `n/a`.
- Classification rules:
  - `locales/en.json` CJK → always `gap` / `catalogue-parity` / `n/a` (R1.5).
  - File path under `frontend/src/views/` or `frontend/src/components/` AND match is inside a string literal (heuristic: enclosed in `'…'`/`"…"`/`` `…` ``) → `gap` / `frontend-ui-string`.
  - Match inside a `text.match(/.../)` call in a `.vue` file → `frontend-regex-parser` / `gap` (cause: backend emits CJK).
  - Backend `.py` file, line starts with `#` or appears inside a triple-quoted docstring → `deliberate-blocked-by-#7` / `backend-docstring` (or `backend-comment`) — counted but not filed as a fresh follow-up since #7 already covers it.
  - Backend `.py` file, line contains `logger.`, `log.`, `print(` and CJK in a string literal → `gap` / `backend-log` / appropriate step tag.
  - Backend `.py` file in `services/{ontology,oasis_profile,simulation_config,report_agent}_generator.py` and CJK appears inside an LLM-prompt context label (heuristic: a string literal not preceded by `#`) → `gap` / `backend-prompt-label`.
  - Binary files (e.g. `.jpeg` ripgrep matches): `non-applicable` / `binary-false-positive`.
  - Anything else: `review-needed` (forces a human look).

**Dependencies**

- Inbound: `audit_cjk.sh`, `check_parity.py` (P0).
- External: `csv` from Python stdlib.

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: invoked by `run_audit.sh` after the two preceding steps.
- **Input / validation**: `cjk-grep.txt` and `parity.txt` must exist in `<sha>/`.
- **Output / destination**: `classified.csv`.
- **Idempotency & recovery**: deterministic — same inputs → same csv.

**Implementation Notes**

- Integration: classification rules are heuristics, not a parser; correctness is bounded by careful regexes and an explicit "fallthrough = `review-needed`" rule.
- Validation: every input row produces an output row (no silent drops); a count-equality assertion runs at the end.
- Risks: false negatives (e.g. a Chinese log string that doesn't contain `logger.` on the same line) — `review-needed` fallthrough catches these.

#### `render_report.py`

| Field | Detail |
|-------|--------|
| Intent | Produce `gap-report.md` and `comment-body.md` |
| Requirements | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 |

**Responsibilities & Constraints**

- `gap-report.md`: Sections: Overview, Section 1 (static audit), Section 2 (parity), Section 3 (prompt verification), Section 4 (propagation), Section 5 (issue-#10 checklist mapping), Section 6 (ZH regression), Section 7 (follow-up plan).
- `comment-body.md`: Markdown comment for issue #10 — mirrors the issue's checklist with `pass` / `gap` / `manual-pending` for each line, plus a "How to re-run" footer.
- Reads `classified.csv` and the issue body (snapshot at `.ticket/10.md`).

**Dependencies**

- Inbound: `classify.py` (P0), `.ticket/10.md` (P0).
- External: Python stdlib only.

**Contracts**: Batch [x]

##### Batch / Job Contract

- **Trigger**: `run_audit.sh` after `classify.py`.
- **Input / validation**: `classified.csv` and `.ticket/10.md` must exist.
- **Output / destination**: `gap-report.md`, `comment-body.md` in `<sha>/`.
- **Idempotency & recovery**: deterministic.

**Implementation Notes**

- Integration: the comment body must include a `Run on commit <sha>` header so the comment is traceable.
- Validation: confirm every issue-body checkbox has been mapped (count check).
- Risks: rendering CJK characters in markdown — Python writes UTF-8 by default; comment body is verified to round-trip via `gh`.

#### `post_comment.sh`

| Field | Detail |
|-------|--------|
| Intent | Post `comment-body.md` as a comment on issue #10 |
| Requirements | 5.5 |

**Responsibilities & Constraints**

- `gh issue comment 10 --repo salestech-group/MiroFish --body-file <sha>/comment-body.md`.
- On non-zero exit, copies the body to `<sha>/PENDING-issue-10-comment.md` and exits non-zero.

**Dependencies**

- External: `gh` (P0; degrades to PENDING when missing).

**Contracts**: Service [x]

##### Service Interface

```text
post_comment.sh <sha-dir>
  precondition: <sha-dir>/comment-body.md exists
  postcondition (success): comment posted; URL printed to stdout
  postcondition (failure): <sha-dir>/PENDING-issue-10-comment.md present; exit code 2
```

**Implementation Notes**

- Integration: must be the second-to-last step (so failures don't block the issue-filing fallback).
- Validation: parses `gh`'s URL output and writes it to `<sha>/comment-url.txt` on success.
- Risks: PR-time rate limits — unlikely for a single comment.

#### `file_followups.sh`

| Field | Detail |
|-------|--------|
| Intent | Open one follow-up issue per gap category |
| Requirements | 7.1, 7.2, 7.4, 7.5 |

**Responsibilities & Constraints**

- Iterates `<sha>/PENDING-followups/*.md` (which `render_report.py` always writes; the ones whose category had zero gaps stay empty placeholders).
- For each non-empty body, runs `gh issue create --repo salestech-group/MiroFish --title <title> --body-file <body> --label i18n`.
- On `gh` failure for any single category, leaves the corresponding `PENDING-followups/<n>-*.md` in place and exits non-zero at the end (after attempting all categories).

**Dependencies**

- External: `gh` (P0; degrades to PENDING).

**Contracts**: Service [x]

##### Service Interface

```text
file_followups.sh <sha-dir>
  precondition: <sha-dir>/PENDING-followups/*.md exist (possibly empty placeholders)
  postcondition (success): all non-empty bodies posted; URLs appended to <sha-dir>/followup-urls.txt; bodies removed from PENDING-followups/
  postcondition (partial): URLs in followup-urls.txt for the ones that posted; the rest stay in PENDING-followups/; exit code 2
```

**Implementation Notes**

- Integration: must be the last step.
- Validation: post-hoc count check (`gh` URLs + remaining PENDING bodies = total categories).
- Risks: a category that the spec already considers covered (e.g. backend docstrings → blocked by #7) is not re-filed; the spec's category list is closed and excludes that case.

## Data Models

### Domain Model

The audit operates on three logical concepts:

- **Match** — a single line of `git grep` output. `(file, line, raw_text)`.
- **Classification** — `(match, class ∈ {deliberate, gap, non-applicable, review-needed}, category ∈ closed-set, pipeline_step ∈ closed-set)`.
- **Follow-up** — `(category, title, body, status ∈ {posted, pending}, url?)`.

Invariant: every `Match` produces exactly one `Classification`; every `Classification` with `class == gap` belongs to exactly one `Follow-up` category (which may aggregate multiple gaps).

### Logical Data Model

**`classified.csv` schema** (CSV, UTF-8, header row):

| Column | Type | Notes |
|--------|------|-------|
| `file` | string | repo-relative path |
| `line` | int | 1-indexed |
| `match` | string | trimmed grep line |
| `class` | enum | `deliberate` / `gap` / `non-applicable` / `review-needed` |
| `category` | enum | closed set listed in classify.py rules |
| `pipeline_step` | enum | closed set listed in classify.py rules |

Natural key: `(file, line)`.

**`parity.txt` structure** (text, three labelled blocks):

```
[missing-keys]
en-only:  <key.path>
zh-only:  <key.path>
[cjk-in-en]
<key.path>: <value snippet>
[identical-values]
<key.path>: <value>   # review-needed if non-trivial English prose
```

### Data Contracts & Integration

- **`comment-body.md`** must be valid GitHub-flavoured Markdown; checkbox lines preserve the issue's original ordering.
- **Follow-up issue body** must be valid GitHub-flavoured Markdown; first line is a one-sentence summary; subsequent sections are: `## Evidence` (file:line list), `## Linked from` (#10 + comment URL), `## Acceptance` (a small checklist).

## Error Handling

### Error Strategy

- **Read-only operations** (steps 1–4): on any uncaught error (missing file, JSON parse error), the script aborts with a non-zero exit before any artefact is half-written. The orchestrator uses `set -euo pipefail`.
- **GitHub side effects** (steps 5–6): wrapped — failure routes to PENDING outputs and the orchestrator exits `2`.

### Error Categories and Responses

- **User errors**: invoked from wrong directory → fail fast with "must be run from repo root".
- **System errors**: `git`/`python3`/`gh` missing → fail fast with "install <tool>"; `gh auth status` not OK → branch to PENDING.
- **Business errors**: classification produces 0 matches but `cjk-grep.txt` non-empty → assertion failure (count-equality bug).

### Monitoring

- The orchestrator prints a one-line status per step.
- Final summary block to stdout: total matches, gaps, `manual-pending`, follow-ups posted vs PENDING.

## Testing Strategy

- **Unit tests**: not introduced — the scripts are simple enough that a one-shot dry run on the live tree is the canonical validation.
- **Integration test**: a single `bash run_audit.sh` against the working tree; success criteria below.
- **Validation checklist** (run during implementation):
  - The audit produces a non-empty `cjk-grep.txt`.
  - `parity.txt` reports 0 missing keys (matches the live state at HEAD).
  - `classified.csv` row count == `cjk-grep.txt` line count.
  - `gap-report.md` and `comment-body.md` parse as valid markdown (manual eyeball — no toolchain required).
  - The classifier marks every `locales/en.json` CJK as `gap` (currently zero such matches, so this asserts the negative).
  - With `gh` available: a comment is posted on #10 and follow-up issues are created.
  - With `gh` simulated as absent (e.g. `PATH=/dev/null`): PENDING outputs appear under `<sha>/`.

### Out of scope for testing

- The live UI walkthrough is `manual-pending` (R5.3) and not part of the test plan.
- Performance, scalability, security: nothing to test — read-only single-shot scripts.