Design — i18n-e2e-english-verification

Overview

Purpose: This spec produces a deterministic, re-runnable verification pass that proves (or disproves) the MiroFish 5-step pipeline runs cleanly in English, and posts a structured report on issue #10 with a pass / gap / manual-pending status per checklist item.

Users: i18n maintainers reviewing the epic (#11), and any future verifier re-running the audit after subsequent merges. The deliverable is read by humans on GitHub (issue comment) and re-run by humans (or CI in a future iteration) to confirm parity.

Impact: No production code is modified. The repository gains one new directory tree (.kiro/specs/i18n-e2e-english-verification/) containing the spec, the audit scripts, and the captured outputs. One GitHub comment is posted on #10. Up to four follow-up issues are filed.

Goals

Static-audit backend/app, frontend/src, locales/en.json for CJK characters; classify every match.
Verify EN / ZH locale catalogue parity and flag suspect untranslated entries.
Verify LLM-prompt assets respect the requested locale.
Document locale-propagation gaps across Flask → Task → OASIS subprocess → ReACT agent.
Post a single canonical comment on issue #10 with per-checklist statuses.
File follow-up issues for every gap (no inline fixes).
Make the audit re-runnable by capturing artefacts under .kiro/specs/.../audit/<commit-sha>/.

Non-Goals

Patching any gap discovered (R7.3 — strictly verification).
Performance / load testing.
Adding new locales beyond EN / ZH.
Building a permanent CI guard (filed as a follow-up issue, not implemented here).
Live UI / Docker walkthrough — captured as manual-pending in this run's report.

Boundary Commitments

This Spec Owns

The audit scripts and the captured audit outputs under .kiro/specs/i18n-e2e-english-verification/audit/.
The gap-report.md artefact and the comment body posted on issue #10.
The grouping rule for follow-up issues (one per category — UI strings, backend log strings, backend LLM-prompt labels, suggested CI guard).
The pass / gap / manual-pending / review-needed classification scheme.

Out of Boundary

Any modification of files under backend/app/, frontend/src/, or locales/.
Fixing the gaps the audit discovers — those land in their own follow-up issues.
Live UI walkthrough, Docker run, or LLM execution.
A permanent CI check — filed as a separate follow-up issue.

Allowed Dependencies

git (for git grep, capturing HEAD sha).
gh CLI (for the comment + follow-up issues; with documented fallback when unavailable).
python3 (for the catalogue parity diff).
The repo working tree at HEAD of the working branch.

Revalidation Triggers

Any merge to main that touches locales/, backend/app/, or frontend/src/ invalidates the captured audit; a re-run should produce a new audit/<commit-sha>/ directory.
A change to issue #10's checklist body (e.g. a new sub-item) requires re-mapping in gap-report.md.
A change to the four follow-up categories (e.g. project decides to file one issue per file) requires re-running the issue-filing script with new grouping.

Architecture

Existing Architecture Analysis

The MiroFish backend is Flask + Python Task workers + an OASIS subprocess (per CLAUDE.md). i18n surfaces are: vue-i18n for the SPA, locales/*.json shared by both ends, a backend logger that resolves keys per locale, and inline LLM prompts in backend/app/services/*.py.
The verification pass does not hook into any of these — it reads files only. No Flask blueprint, no Task model, no Neo4j query.

Architecture Pattern & Boundary Map

graph TB
    Verifier[Verifier shell entrypoint]
    Audit[audit_cjk.sh]
    Parity[check_parity.py]
    Classify[classify.py]
    Report[render_report.py]
    Comment[post_comment.sh]
    FollowUp[file_followups.sh]

    Repo[Working tree]
    Captures[audit slash sha slash]
    GH[GitHub via gh CLI]

    Verifier --> Audit
    Verifier --> Parity
    Audit --> Classify
    Parity --> Classify
    Classify --> Report
    Report --> Captures
    Report --> Comment
    Report --> FollowUp
    Audit --> Repo
    Parity --> Repo
    Comment --> GH
    FollowUp --> GH

Architecture Integration:

Selected pattern: Linear pipeline of read-only scripts that each emit a single artefact, composed by a thin shell entrypoint. No mutable state outside audit/<sha>/.
Domain boundaries: audit_cjk.sh owns the raw grep; check_parity.py owns the catalogue diff; classify.py owns the four-class labels; render_report.py owns the comment body; post_comment.sh and file_followups.sh own GitHub side effects.
Existing patterns preserved: Shell + Python script pair (matches the project's existing setup/run style); no new test runner, no new linter.
New components rationale: Each script is single-purpose so failures (e.g. gh permission issues) are isolated and the pipeline can resume from the failed step.
Steering compliance: No production-code touch (R7.3); 4-space indent in any committed Python; double quotes; snake_case; Bash scripts exit with a non-zero status on any uncaught error.

Technology Stack

Layer	Choice / Version	Role in Feature	Notes
CLI / Audit runner	Bash 5+, `git grep -P` (PCRE)	Run the canonical CJK audit	`\x{...}` ranges require PCRE — `git grep -E` will fail on this regex (verified).
Static checks	Python 3.11 (project minimum per CLAUDE.md)	Catalogue parity + classification + report rendering	Standard library only — no new deps.
GitHub integration	`gh` CLI	Post the comment, file follow-ups	Falls back to `audit/<sha>/PENDING-*` files when missing.
Output formats	Plain text + Markdown	Captures + comment body	No HTML, no JSON beyond `gh`'s own.

File Structure Plan

Directory Structure

.kiro/specs/i18n-e2e-english-verification/
├── spec.json
├── requirements.md
├── gap-analysis.md
├── research.md
├── design.md
├── tasks.md
├── HANDOFF.md          # only if implementation hits the 3-cycle remediation cap
└── audit/
    ├── scripts/
    │   ├── run_audit.sh          # entrypoint - chains the steps below
    │   ├── audit_cjk.sh          # git grep PCRE + bucket counts
    │   ├── check_parity.py       # locales/en.json vs zh.json key + identical-value diff
    │   ├── classify.py           # apply 4-class labels to grep matches
    │   ├── render_report.py      # produce gap-report.md + comment-body.md
    │   ├── post_comment.sh       # gh issue comment 10 with comment-body.md (or PENDING-*)
    │   └── file_followups.sh     # gh issue create per category (or PENDING-*)
    └── <commit-sha>/             # captured outputs of one verification run
        ├── cjk-grep.txt          # raw `git grep -nP ...` output
        ├── cjk-grep-bucketed.txt # the same, partitioned by top-level path
        ├── parity.txt            # en/zh diff summary
        ├── classified.csv        # match-by-match label
        ├── gap-report.md         # the canonical structured report
        ├── comment-body.md       # the markdown posted to issue #10
        ├── PENDING-issue-10-comment.md          # only if gh comment failed
        └── PENDING-followups/                   # bodies written by render_report.py; files left here were not posted
            ├── 01-frontend-ui-strings.md
            ├── 02-backend-log-strings.md
            ├── 03-backend-prompt-labels.md
            └── 04-permanent-ci-guard.md

Modified Files

(None.) The spec explicitly forbids touching production source.

System Flows

sequenceDiagram
    participant V as Verifier
    participant Run as run_audit.sh
    participant FS as Working tree
    participant GH as GitHub

    V->>Run: bash run_audit.sh
    Run->>FS: git grep -nP, git rev-parse HEAD
    FS-->>Run: cjk-grep.txt + sha
    Run->>FS: read locales json
    FS-->>Run: en/zh dicts
    Run->>Run: classify
    Run->>FS: write audit slash sha slash artefacts
    Run->>GH: gh issue comment 10
    alt gh succeeds
        GH-->>Run: comment URL
        Run->>GH: gh issue create x N follow-ups
        GH-->>Run: issue URLs
    else gh fails
        Run->>FS: write PENDING markdown to audit slash sha slash
    end
    Run-->>V: exit 0 success or exit 2 PENDING

Key flow decisions:

The audit always writes the captured artefacts to disk first (idempotent, re-runnable). The GitHub side effects are the last steps so any earlier failure leaves a complete capture for inspection.
A non-zero gh exit shifts the pipeline to PENDING mode rather than failing the whole run; the script exits 2 to flag "audit ran but GitHub side-effects didn't apply".
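The two flow decisions above can be sketched in Python as follows. This is a minimal illustration, not the orchestrator itself (which the spec defines as a shell script): `run_pipeline` and the step callables are hypothetical names, and the choice to keep attempting later side-effect steps after one fails is an assumption.

```python
EXIT_OK = 0
EXIT_PENDING = 2


def run_pipeline(read_only_steps, side_effect_steps):
    """Run capture steps first, then GitHub steps; return the exit code."""
    # Read-only steps: any exception propagates and aborts the run,
    # mirroring `set -euo pipefail` in the shell orchestrator.
    for step in read_only_steps:
        step()

    # Side-effect steps: a non-zero result flips the run to PENDING.
    # Assumption: the remaining side-effect steps are still attempted.
    pending = False
    for step in side_effect_steps:
        if step() != 0:
            pending = True
    return EXIT_PENDING if pending else EXIT_OK
```

Because every capture step finishes before the first side-effect step starts, a PENDING exit always leaves a complete `audit/<sha>/` directory behind.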

Requirements Traceability

Requirement	Summary	Components	Interfaces / Artefacts	Flows
1.1	Run canonical `git grep`	audit_cjk.sh	`cjk-grep.txt`	Audit step
1.2	Classify each match	classify.py	`classified.csv`	Audit step
1.3	Record file:line + step tag for `gap`	classify.py	`classified.csv` (`step` column)	Audit step
1.4	No file modifications during audit	run_audit.sh	scripts are read-only	—
1.5	`en.json` CJK = always `gap`	classify.py	hard rule in classifier	Audit step
2.1	Enumerate keys recursively	check_parity.py	`parity.txt`	Audit step
2.2	Missing-key gaps recorded	check_parity.py	`parity.txt` (missing-key block)	Audit step
2.3	EN catalogue CJK = `gap`	check_parity.py	`parity.txt` (cjk-in-en block)	Audit step
2.4	EN/ZH identical = `review-needed`	check_parity.py	`parity.txt` (identical-value block)	Audit step
2.5	No catalogue edits	check_parity.py	read-only stdlib JSON load	—
3.1	Enumerate prompt files	classify.py (heuristic — known files list)	`gap-report.md` Section 3	—
3.2	Confirm locale-aware or EN-only	classify.py	`gap-report.md` Section 3	—
3.3	Hard-coded ZH directive = `gap`	classify.py	`classified.csv` (`category=prompt-label`)	—
3.4	#3, #4, #5 prompts post-merge check	classify.py	`gap-report.md` Section 3	—
4.1	Identify handoff boundaries	render_report.py	`gap-report.md` Section 4	—
4.2	Confirm explicit or re-derived locale	render_report.py	`gap-report.md` Section 4	—
4.3	Silent default = `gap`	classify.py	`classified.csv` (`category=propagation`)	—
4.4	Backend logger EN under EN	classify.py	`classified.csv` (`category=backend-log`)	—
5.1	Comment lists every checklist item	render_report.py	`comment-body.md`	Comment-post
5.2	Each `gap` includes file:line + follow-up link	render_report.py	`comment-body.md`	Comment-post
5.3	`manual-pending` items state repro steps	render_report.py	`comment-body.md`	Comment-post
5.4	Comment includes raw audit (or path)	render_report.py	`comment-body.md` (path reference)	Comment-post
5.5	Post via `gh issue comment 10`	post_comment.sh	`comment-body.md`	Comment-post
6.1	ZH covers every EN key	check_parity.py	(already passes per gap-analysis)	—
6.2	Locale-aware prompts symmetric	render_report.py	`gap-report.md` Section 6	—
6.3	EN-only ZH value = `review-needed`	check_parity.py	`parity.txt` (identical-value block)	—
6.4	ZH regression filed as gap	classify.py	`classified.csv`	—
7.1	File issue per gap	file_followups.sh	`gh issue create`	Follow-up
7.2	Group by category	file_followups.sh	one body per category in `PENDING-followups/`	Follow-up
7.3	No production-code edits	run_audit.sh	only writes under `.kiro/specs/.../`	—
7.4	Label follow-ups `i18n`	file_followups.sh	`gh issue create --label i18n`	Follow-up
7.5	Fallback inline list when no `gh`	file_followups.sh	`PENDING-followups/*.md`	Follow-up
8.1	Capture raw output	run_audit.sh	`audit/<sha>/` directory	Audit step
8.2	Preserve previous run	run_audit.sh	`<sha>` subdirectory naming	Audit step
8.3	Record HEAD sha	run_audit.sh	`git rev-parse HEAD`	Audit step
8.4	Idempotent re-run	run_audit.sh	re-running on same sha overwrites that sha's dir	Audit step

Components and Interfaces

Component	Domain	Intent	Req Coverage	Key Dependencies (P0/P1)	Contracts
run_audit.sh	Verification pipeline	Compose the audit and route artefacts	1.4, 7.3, 8.1, 8.2, 8.3, 8.4	git (P0), python3 (P0), gh (P1)	Batch
audit_cjk.sh	Static audit	Run `git grep -nP` and bucket	1.1, 1.5	git (P0)	Batch
check_parity.py	Catalogue diff	Diff en/zh + identical-value heuristic	2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3	python3 stdlib (P0)	Batch
classify.py	Classification	Apply the 4-class label per match	1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4	cjk-grep.txt (P0), parity.txt (P0)	Batch
render_report.py	Report assembly	Produce gap-report.md + comment-body.md	4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2	classified.csv (P0)	Batch
post_comment.sh	GitHub side-effect	Post the comment on #10	5.5	gh (P0), comment-body.md (P0)	Service
file_followups.sh	GitHub side-effect	Open follow-up issues	7.1, 7.2, 7.4, 7.5	gh (P0), PENDING-followups/* (P0)	Service

Verification pipeline

`run_audit.sh`

Field	Detail
Intent	Single shell entrypoint that runs every step in order and persists artefacts under `audit/<commit-sha>/`
Requirements	1.4, 7.3, 8.1, 8.2, 8.3, 8.4

Responsibilities & Constraints

Must NOT modify any file outside .kiro/specs/i18n-e2e-english-verification/.
Must capture HEAD sha before any other step (so the artefact path is set).
Must exit 0 on full success (audit + GitHub side effects) and 2 on PENDING (audit succeeded, side effects didn't).
Must be safely re-runnable on the same sha (overwriting that sha's directory is acceptable).

Dependencies

Inbound: invoked manually by the verifier (bash run_audit.sh) — Criticality: P0.
Outbound: audit_cjk.sh, check_parity.py, classify.py, render_report.py, post_comment.sh, file_followups.sh — Criticality: P0 each.
External: git, python3, gh (P1 — fallback supported).

Contracts: Service [ ] / API [ ] / Event [ ] / Batch [x] / State [ ]

Batch / Job Contract

Trigger: manual bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh.
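Input / validation: working tree at any commit; a detached HEAD or a dirty tree is not rejected. The audit reads tracked files only via git grep, so untracked files are ignored deliberately. Note that plain git grep searches working-tree contents, so uncommitted edits to tracked files are still scanned unless the grep targets HEAD explicitly.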
Output / destination: .kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/.
Idempotency & recovery: Re-running on the same sha overwrites that sha's directory. PENDING outputs survive across runs until a gh-enabled run replaces them.

Implementation Notes

Integration: invoked by humans only — no CI hookup in this spec.
Validation: confirm gh auth status before attempting comment/issue posts; on failure, branch to PENDING.
Risks: shell quoting around the PCRE pattern ([\x{4e00}-\x{9fff}]) — use single-quoted argument to git grep -P.

`audit_cjk.sh`

Field	Detail
Intent	Run the canonical PCRE grep + per-bucket counts
Requirements	1.1, 1.5

Responsibilities & Constraints

Output: cjk-grep.txt (raw git grep -nP lines) and cjk-grep-bucketed.txt (one section per top-level path: backend/app, frontend/src, locales/en.json).
Excludes binary file matches (e.g. .jpeg false positives).

Dependencies

Inbound: run_audit.sh (P0).
External: git 2.x (P0 — must support -P for PCRE).

Contracts: Batch [x]

Batch / Job Contract

Trigger: invoked by run_audit.sh.
Input / validation: receives the target output directory as argv[1]; aborts if missing.
Output / destination: cjk-grep.txt, cjk-grep-bucketed.txt in <sha>/.
Idempotency & recovery: deterministic — same tree → same output.

Implementation Notes

Integration: pure read-only against git.
Validation: git --version precondition; abort with a clear error if PCRE unsupported.
Risks: ripgrep is NOT used (avoids a hard rg dependency); git grep -P works only when git was built with PCRE support, which the precondition check above enforces.
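The bucketing step could look roughly like the Python sketch below. It assumes grep lines of the form `path:line:text`; the `other` catch-all bucket and the function name are hypothetical and not part of the spec. The regex covers the same code-point range as the PCRE class `[\x{4e00}-\x{9fff}]`.

```python
import re
from collections import defaultdict

# Same range as the spec's PCRE class (CJK Unified Ideographs).
CJK_RE = re.compile(r"[\u4e00-\u9fff]")

# Top-level paths named in the spec.
BUCKETS = ("backend/app", "frontend/src", "locales/en.json")


def bucket_grep_lines(lines):
    """Group `path:line:text` grep lines by audited top-level path."""
    buckets = defaultdict(list)
    for raw in lines:
        path = raw.split(":", 1)[0]
        for prefix in BUCKETS:
            # Match the path itself or anything below it, but not
            # siblings that merely share the prefix (backend/application).
            if path == prefix or path.startswith(prefix + "/"):
                buckets[prefix].append(raw)
                break
        else:
            buckets["other"].append(raw)  # hypothetical catch-all bucket
    return dict(buckets)
```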

`check_parity.py`

Field	Detail
Intent	Compare `locales/en.json` and `locales/zh.json`: key parity, CJK in EN, identical-value heuristic
Requirements	2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3

Responsibilities & Constraints

Recursively flattens nested-dict keys with dotted paths.
Reports three blocks: missing-keys, cjk-in-en, identical-values.
Treats values as review-needed only if (a) en value == zh value, (b) value is non-empty, (c) value is more than two ASCII words.

Dependencies

Inbound: run_audit.sh (P0).
External: json from Python stdlib (P0).

Contracts: Batch [x]

Batch / Job Contract

Trigger: invoked by run_audit.sh with the <sha> directory as argv[1].
Input / validation: reads locales/en.json and locales/zh.json from cwd (must be invoked from repo root); fails fast on JSON parse error.
Output / destination: parity.txt in <sha>/.
Idempotency & recovery: pure function of catalogue contents.

Implementation Notes

Integration: invoked from repo root so relative paths resolve.
Validation: parse-on-load, both files must be objects.
Risks: the "more than two ASCII words" heuristic may produce noise — review-needed is intentionally a soft label not a gap.
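A minimal sketch of the three parity blocks, assuming both catalogues are already loaded as dicts. The function names are illustrative, and "ASCII words" is interpreted here as runs of ASCII letters, which is an assumption about the heuristic rather than something the spec pins down.

```python
import re

CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def flatten(node, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} using dotted paths."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def is_suspect_identical(en_value, zh_value):
    """Conditions (a)-(c): equal, non-empty, more than two ASCII words."""
    if not isinstance(en_value, str) or en_value != zh_value:
        return False
    if not en_value.strip():
        return False
    return len(re.findall(r"[A-Za-z]+", en_value)) > 2


def parity(en, zh):
    """Return the key lists behind the three parity.txt blocks."""
    en_flat, zh_flat = flatten(en), flatten(zh)
    shared = en_flat.keys() & zh_flat.keys()
    return {
        "en-only": sorted(en_flat.keys() - zh_flat.keys()),
        "zh-only": sorted(zh_flat.keys() - en_flat.keys()),
        "cjk-in-en": sorted(
            k for k, v in en_flat.items()
            if isinstance(v, str) and CJK_RE.search(v)
        ),
        "identical-values": sorted(
            k for k in shared if is_suspect_identical(en_flat[k], zh_flat[k])
        ),
    }
```

Short shared values such as "OK" fall below the word threshold and are not flagged, which keeps the review-needed list focused on untranslated prose.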

`classify.py`

Field	Detail
Intent	Apply the 4-class label (`deliberate` / `gap` / `non-applicable` / `review-needed`) and a category tag per match
Requirements	1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4

Responsibilities & Constraints

Reads cjk-grep.txt and parity.txt; emits classified.csv with columns: file, line, match, class, category, pipeline_step.
Categories (closed set): frontend-ui-string, frontend-regex-parser, backend-docstring, backend-comment, backend-log, backend-prompt-label, propagation, catalogue-parity, binary-false-positive.
Pipeline-step tags (closed set): Graph Build, Env Setup, Simulation, Report, Interaction, Logs, UI, n/a.
Classification rules:
- locales/en.json CJK → always gap / catalogue-parity / n/a (R1.5).
- File path under frontend/src/views/ or frontend/src/components/ AND match is inside a string literal (heuristic: enclosed in '…'/"…"/`…`) → gap / frontend-ui-string.
- Match inside a text.match(/.../) call in a .vue file → gap / frontend-regex-parser (cause: backend emits CJK).
- Backend .py file, line starts with # or appears inside a triple-quoted docstring → deliberate / backend-comment (or backend-docstring), noted as blocked by #7; counted but not filed as a fresh follow-up since #7 already covers it.
- Backend .py file, line contains logger., log., or print( with CJK in a string literal → gap / backend-log / appropriate step tag.
- Backend .py file in services/{ontology,oasis_profile,simulation_config,report_agent}_generator.py and CJK appears inside an LLM-prompt context label (heuristic: a string literal not preceded by #) → gap / backend-prompt-label.
- Binary files (e.g. .jpeg matches reported by git grep): non-applicable / binary-false-positive.
- Anything else: review-needed (forces a human look).

Dependencies

Inbound: audit_cjk.sh, check_parity.py (P0).
External: csv from Python stdlib.

Contracts: Batch [x]

Batch / Job Contract

Trigger: invoked by run_audit.sh after the two preceding steps.
Input / validation: cjk-grep.txt and parity.txt must exist in <sha>/.
Output / destination: classified.csv.
Idempotency & recovery: deterministic — same inputs → same csv.

Implementation Notes

Integration: classification rules are heuristics, not a parser; correctness is bounded by careful regexes and an explicit "fallthrough = review-needed" rule.
Validation: every input row produces an output row (no silent drops); a count-equality assertion runs at the end.
Risks: false negatives (e.g. a Chinese log string that doesn't contain logger. on the same line) — review-needed fallthrough catches these.
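The rule ordering and the fallthrough can be illustrated with the simplified sketch below. It is not the full classifier: docstring detection (which needs multi-line context), binary matches, and pipeline-step tags are omitted, the regexes are illustrative, and returning `None` as the category for the fallthrough is an assumption since the spec names no category for that case.

```python
import re

PROMPT_FILES = {
    f"backend/app/services/{name}_generator.py"
    for name in ("ontology", "oasis_profile", "simulation_config", "report_agent")
}
# A quoted literal ('…', "…" or `…`) that contains at least one CJK character.
STRING_LITERAL_WITH_CJK = re.compile(r"""(['"`])[^'"`]*[\u4e00-\u9fff][^'"`]*\1""")
LOG_CALL = re.compile(r"\b(logger\.|log\.|print\()")


def classify(path, text):
    """Return (class, category) for one match; the first matching rule wins."""
    in_literal = bool(STRING_LITERAL_WITH_CJK.search(text))
    if path == "locales/en.json":
        return "gap", "catalogue-parity"
    if path.endswith(".vue") and ".match(/" in text:
        return "gap", "frontend-regex-parser"
    if path.startswith(("frontend/src/views/", "frontend/src/components/")) and in_literal:
        return "gap", "frontend-ui-string"
    if path.startswith("backend/") and path.endswith(".py"):
        if text.lstrip().startswith("#"):
            return "deliberate", "backend-comment"
        if LOG_CALL.search(text) and in_literal:
            return "gap", "backend-log"
        if path in PROMPT_FILES and in_literal:
            return "gap", "backend-prompt-label"
    return "review-needed", None  # fallthrough forces a human look


def classify_all(grep_lines):
    """Classify every `path:line:text` row and check that none was dropped."""
    rows = []
    for raw in grep_lines:
        path, line_no, text = raw.split(":", 2)
        rows.append((path, int(line_no), text.strip(), *classify(path, text)))
    assert len(rows) == len(grep_lines), "every match must yield one row"
    return rows
```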

`render_report.py`

Field	Detail
Intent	Produce `gap-report.md` and `comment-body.md`
Requirements	4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2

Responsibilities & Constraints

gap-report.md: Sections: Overview, Section 1 (static audit), Section 2 (parity), Section 3 (prompt verification), Section 4 (propagation), Section 5 (issue-#10 checklist mapping), Section 6 (ZH regression), Section 7 (follow-up plan).
comment-body.md: Markdown comment for issue #10 — mirrors the issue's checklist with pass / gap / manual-pending for each line, plus a "How to re-run" footer.
Reads classified.csv and the issue body (snapshot at .ticket/10.md).

Dependencies

Inbound: classify.py (P0), .ticket/10.md (P0).
External: Python stdlib only.

Contracts: Batch [x]

Batch / Job Contract

Trigger: run_audit.sh after classify.py.
Input / validation: classified.csv and .ticket/10.md must exist.
Output / destination: gap-report.md, comment-body.md, and the PENDING-followups/*.md bodies in <sha>/.
Idempotency & recovery: deterministic.

Implementation Notes

Integration: the comment body must include a Run on commit <sha> header so the comment is traceable.
Validation: confirm every issue-body checkbox has been mapped (count check).
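Risks: rendering CJK characters in markdown. Python's open() uses a locale-dependent default encoding on 3.11, so output files should be written with encoding="utf-8" explicitly; comment body is verified to round-trip via gh.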
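The checkbox count check from the validation note could be sketched like this. The output line format (`- [x] item: **status**`) and the `status_by_item` mapping keyed by checkbox text are assumptions made for illustration; the spec only requires that every checkbox is mapped and that the original ordering is preserved.

```python
import re

# GitHub task-list item: "- [ ] text" or "- [x] text".
CHECKBOX_RE = re.compile(r"^\s*[-*] \[[ xX]\] (.+)$")
STATUSES = {"pass", "gap", "manual-pending"}


def render_checklist(issue_body, status_by_item):
    """Mirror the issue's checkboxes in order, one status per item."""
    items = [
        m.group(1).strip()
        for line in issue_body.splitlines()
        if (m := CHECKBOX_RE.match(line))
    ]
    unmapped = [item for item in items if item not in status_by_item]
    if unmapped:
        raise ValueError(f"unmapped checklist items: {unmapped}")

    lines = []
    for item in items:
        status = status_by_item[item]
        if status not in STATUSES:
            raise ValueError(f"unknown status {status!r} for {item!r}")
        mark = "x" if status == "pass" else " "
        lines.append(f"- [{mark}] {item}: **{status}**")
    return "\n".join(lines)
```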

`post_comment.sh`

Field	Detail
Intent	Post `comment-body.md` as a comment on issue #10
Requirements	5.5

Responsibilities & Constraints

gh issue comment 10 --repo salestech-group/MiroFish --body-file <sha>/comment-body.md.
On non-zero exit, copies the body to <sha>/PENDING-issue-10-comment.md and exits non-zero.

Dependencies

External: gh (P0; degrades to PENDING when missing).

Contracts: Service [x]

Service Interface

post_comment.sh <sha-dir>
  precondition: <sha-dir>/comment-body.md exists
  postcondition (success): comment posted; URL printed to stdout
  postcondition (failure): <sha-dir>/PENDING-issue-10-comment.md present; exit code 2

Implementation Notes

Integration: must be the second-to-last step (so failures don't block the issue-filing fallback).
Validation: parses gh's URL output and writes it to <sha>/comment-url.txt on success.
Risks: GitHub API rate limits; unlikely to matter for a single comment.
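The service contract above can be expressed in Python as a rough sketch; the spec itself implements it as a shell script. The injectable `run` parameter is a hypothetical seam added so the fallback can be exercised without a real `gh`, and treating a missing `gh` binary the same as a non-zero exit is an assumption consistent with the PENDING fallback.

```python
import shutil
import subprocess
from pathlib import Path

EXIT_OK, EXIT_PENDING = 0, 2


def post_comment(sha_dir, run=subprocess.run):
    """Post comment-body.md on issue #10, or park it as a PENDING file."""
    sha_dir = Path(sha_dir)
    body = sha_dir / "comment-body.md"
    cmd = [
        "gh", "issue", "comment", "10",
        "--repo", "salestech-group/MiroFish",
        "--body-file", str(body),
    ]
    try:
        result = run(cmd, capture_output=True, text=True)
        ok = result.returncode == 0
    except FileNotFoundError:  # gh is not installed
        ok = False

    if ok:
        # gh prints the new comment's URL on stdout.
        url = result.stdout.strip()
        (sha_dir / "comment-url.txt").write_text(url + "\n", encoding="utf-8")
        return EXIT_OK

    shutil.copyfile(body, sha_dir / "PENDING-issue-10-comment.md")
    return EXIT_PENDING
```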

`file_followups.sh`

Field	Detail
Intent	Open one follow-up issue per gap category
Requirements	7.1, 7.2, 7.4, 7.5

Responsibilities & Constraints

Iterates <sha>/PENDING-followups/*.md (which render_report.py always writes; the ones whose category had zero gaps stay empty placeholders).
For each non-empty body, runs gh issue create --repo salestech-group/MiroFish --title <title> --body-file <body> --label i18n.
On gh failure for any single category, leaves the corresponding PENDING-followups/<n>-*.md in place and exits non-zero at the end (after attempting all categories).

Dependencies

External: gh (P0; degrades to PENDING).

Contracts: Service [x]

Service Interface

file_followups.sh <sha-dir>
  precondition: <sha-dir>/PENDING-followups/*.md exist (possibly empty placeholders)
  postcondition (success): all non-empty bodies posted; URLs appended to <sha-dir>/followup-urls.txt; bodies removed from PENDING-followups/
  postcondition (partial): URLs in followup-urls.txt for the ones that posted; the rest stay in PENDING-followups/; exit code 2

Implementation Notes

Integration: must be the last step.
Validation: post-hoc count check (gh URLs + remaining PENDING bodies = total categories).
Risks: a category that the spec already considers covered (e.g. backend docstrings → blocked by #7) is not re-filed; the spec's category list is closed and excludes that case.
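A Python sketch of the iteration and the post-hoc count check follows. `create_issue` is a hypothetical callback standing in for the `gh issue create` call (it takes a body path and returns the issue URL, or `None` on failure); how titles are derived is left out because the spec does not define it here.

```python
from pathlib import Path


def file_followups(sha_dir, create_issue):
    """Post each non-empty follow-up body; return 0 if all posted, else 2."""
    sha_dir = Path(sha_dir)
    bodies = sorted(
        p for p in (sha_dir / "PENDING-followups").glob("*.md")
        if p.stat().st_size > 0  # empty placeholder: category had no gaps
    )

    urls = []
    for body in bodies:  # attempt every category, even after a failure
        url = create_issue(body)
        if url:
            urls.append(url)
            body.unlink()  # posted bodies leave PENDING-followups/

    remaining = [p for p in bodies if p.exists()]
    # Count check: every non-empty body is either posted or still pending.
    assert len(urls) + len(remaining) == len(bodies)

    if urls:
        with open(sha_dir / "followup-urls.txt", "a", encoding="utf-8") as fh:
            fh.writelines(url + "\n" for url in urls)
    return 0 if not remaining else 2
```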

Data Models

Domain Model

The audit operates on three logical concepts:

Match — a single line of git grep output. (file, line, raw_text).
Classification — (match, class ∈ {deliberate, gap, non-applicable, review-needed}, category ∈ closed-set, pipeline_step ∈ closed-set).
Follow-up — (category, title, body, status ∈ {posted, pending}, url?).

Invariant: every Match produces exactly one Classification; every Classification with class == gap belongs to exactly one Follow-up category (which may aggregate multiple gaps).

Logical Data Model

classified.csv schema (CSV, UTF-8, header row):

Column	Type	Notes
`file`	string	repo-relative path
`line`	int	1-indexed
`match`	string	trimmed grep line
`class`	enum	`deliberate` / `gap` / `non-applicable` / `review-needed`
`category`	enum	closed set listed in classify.py rules
`pipeline_step`	enum	closed set listed in classify.py rules

Natural key: (file, line).
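As an illustration of the schema and its natural key, the sketch below writes rows with the standard `csv` module and rejects duplicates and unknown classes. The function name and the choice to raise `ValueError` are assumptions, not part of the spec.

```python
import csv

COLUMNS = ["file", "line", "match", "class", "category", "pipeline_step"]
CLASSES = {"deliberate", "gap", "non-applicable", "review-needed"}


def write_classified(rows, stream):
    """Write rows (dicts keyed by COLUMNS) as CSV; return the row count."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    seen = set()
    for row in rows:
        key = (row["file"], row["line"])  # natural key from the schema
        if key in seen:
            raise ValueError(f"duplicate natural key: {key}")
        if row["class"] not in CLASSES:
            raise ValueError(f"unknown class: {row['class']!r}")
        seen.add(key)
        writer.writerow(row)
    return len(seen)
```

When writing to a real file, opening it with `newline=""` and `encoding="utf-8"` keeps line endings and CJK text intact.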

parity.txt structure (text, three labelled blocks):

[missing-keys]
en-only:  <key.path>
zh-only:  <key.path>
[cjk-in-en]
<key.path>: <value snippet>
[identical-values]
<key.path>: <value>   # review-needed if non-trivial English prose

Data Contracts & Integration

comment-body.md must be valid GitHub-flavoured Markdown; checkbox lines preserve the issue's original ordering.
Follow-up issue body must be valid GitHub-flavoured Markdown; first line is a one-sentence summary; subsequent sections are: ## Evidence (file:line list), ## Linked from (#10 + comment URL), ## Acceptance (a small checklist).
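The follow-up body layout described above could be rendered along these lines. This is a sketch under assumptions: the function name, the argument shapes, and the bullet formatting inside each section are illustrative; only the summary line and the three section headings come from the contract.

```python
def render_followup_body(summary, evidence, comment_url, acceptance):
    """Build a follow-up issue body: summary line plus three sections.

    evidence: iterable of (path, line) pairs
    acceptance: iterable of checklist strings
    """
    lines = [summary, "", "## Evidence"]
    lines += [f"- `{path}:{line}`" for path, line in evidence]
    lines += ["", "## Linked from", "- #10", f"- {comment_url}"]
    lines += ["", "## Acceptance"]
    lines += [f"- [ ] {item}" for item in acceptance]
    return "\n".join(lines) + "\n"
```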

Error Handling

Error Strategy

Read-only operations (steps 1–4): on any uncaught error (missing file, JSON parse error), the script aborts with a non-zero exit before any artefact is half-written. The orchestrator uses set -euo pipefail.
GitHub side effects (steps 5–6): wrapped — failure routes to PENDING outputs and the orchestrator exits 2.

Error Categories and Responses

User errors: invoked from wrong directory → fail fast with "must be run from repo root".
System errors: git or python3 missing → fail fast with "install <tool>"; gh missing or gh auth status not OK → branch to PENDING.
Business errors: classification produces 0 matches but cjk-grep.txt non-empty → assertion failure (count-equality bug).

Monitoring

The orchestrator prints a one-line status per step.
Final summary block to stdout: total matches, gaps, manual-pending, follow-ups posted vs PENDING.

Testing Strategy

Unit tests: not introduced — the scripts are simple enough that a one-shot dry run on the live tree is the canonical validation.
Integration test: a single bash run_audit.sh against the working tree; success criteria below.
Validation checklist (run during implementation):
- The audit produces a non-empty cjk-grep.txt.
- parity.txt reports 0 missing keys (matches the live state at HEAD).
- classified.csv row count == cjk-grep.txt line count.
- gap-report.md and comment-body.md parse as valid markdown (manual eyeball — no toolchain required).
- The classifier marks every locales/en.json CJK as gap (currently zero such matches, so this asserts the negative).
- With gh available: a comment is posted on #10 and follow-up issues are created.
- With gh simulated as absent (e.g. a PATH that still resolves git and python3 but not gh): PENDING outputs appear under <sha>/.

Out of scope for testing

The live UI walkthrough is manual-pending (R5.3) and not part of the test plan.
Performance, scalability, security: nothing to test — read-only single-shot scripts.
