Merge pull request #27 from salestech-group/chore/i18n-10-e2e-english-verification

chore(i18n): add e2e english verification spec, audit, and report
2026-05-08 11:06:46 +02:00 · 2026-05-08 11:06:46 +02:00 · d53f3110dd
parent 777302bc61 348140859d
commit d53f3110dd
21 changed files with 11002 additions and 0 deletions
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep-bucketed.txt
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep-bucketed.txt
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep.txt
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep.txt
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/classified.csv
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/classified.csv
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/comment-body.md
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/comment-body.md
@@ -0,0 +1,60 @@
+### Verification report - run on commit `9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd`
+
+This run was produced by `.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.
+Captured artefacts live under `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.
+
+
+**Audit summary:** 2916 CJK matches across the auditable paths.
+- 237 `gap` (actionable, see follow-ups)
+- 380 `review-needed` (soft signal; needs human review)
+- 2299 `deliberate` (mostly backend docstrings/comments - covered by issue #7)
+- 0 `non-applicable` (binary file false positives - excluded)
+
+**Gap-category breakdown:** backend-prompt-label=143, frontend-ui-string=49, frontend-regex-parser=36, backend-log=9
+
+---
+
+#### Issue checklist mapping
+
+## Section 5 - Issue #10 checklist mapping
+
+Each line below is taken from the ticket body, with an explicit status.
+
+- [ ] **GAP** - **Frontend UI** — every label, button, modal, error toast, and tooltip in EN. No Chinese strings on screen. - 29 hard-coded CJK literal(s) in `frontend/src/views|components/`
+- [ ] **GAP** - **Step 1 — Graph Build** - 5 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: Status messages in EN - not verifiable statically; awaiting live run
+  - GAP: Ontology JSON descriptions in EN (depends on #2) - 14 gap(s) classified, see Section 1/3
+  - GAP: Backend logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Step 2 — Env Setup** - 61 gap(s) classified, see Section 1/3
+  - GAP: Generated agent profiles (`bio`, `persona`, `profession`, `interested_topics`) in EN (depends on #3) - 61 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: `gender` still the English enum (`male` / `female` / `other`) - not verifiable statically; awaiting live run
+- [ ] **GAP** - **Step 3 — Simulation** - 14 gap(s) classified, see Section 1/3
+  - GAP: Sim config `content`, `narrative_direction`, `hot_topics`, `reasoning` in EN (depends on #4) - 14 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: `poster_type` still PascalCase English - not verifiable statically; awaiting live run
+  - MANUAL-PENDING: `stance` still one of `supportive` / `opposing` / `neutral` / `observer` - not verifiable statically; awaiting live run
+  - GAP: Generated tweets / Reddit posts in EN (depends on #3 personas + #4 sim config) - 14 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Step 4 — Report** - 70 gap(s) classified, see Section 1/3
+  - GAP: Report sections, headings, prose in EN (depends on #5) - 70 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: ReACT thinking trace in EN - requires live walkthrough
+  - MANUAL-PENDING: Tool-call results render correctly - requires live walkthrough
+- [ ] **GAP** - **Step 5 — Interaction** - 2 gap(s) classified, see Section 1/3
+  - GAP: Interview chat replies in EN (depends on #3) - 2 gap(s) classified, see Section 1/3
+  - GAP: Report Agent chat replies in EN (depends on #5) - 72 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Backend logs** — full pipeline-run logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Locale propagation** — confirm `Accept-Language: en` (or thread-local locale set via `set_locale`) reaches background tasks and survives the OASIS subprocess boundary. - 9 CJK log strings on EN code path
+- [ ] **MANUAL-PENDING** - Every touchpoint above renders in Chinese; no English regressions. - requires live walkthrough
+- [ ] **MANUAL-PENDING** - zh.json backfill (#8) covered: Step 3, Step 4, Step 5, and graph panel labels are all Chinese. - not verifiable statically; awaiting live run
+
+---
+
+#### How to re-run
+
+```bash
+# from the repository root, on any commit:
+bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh
+# artefacts at .kiro/specs/i18n-e2e-english-verification/audit/<HEAD-sha>/
+```
+
+If `gh` is not authenticated when re-running, the comment body and follow-up bodies are written to `PENDING-issue-10-comment.md` / `PENDING-followups/` for a human to post.
+
+Out of scope for this run (per R5.3 / R7.3): live UI walkthrough, full Docker-Compose pipeline run, and any inline gap fixes.
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/comment-url.txt
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/comment-url.txt
@@ -0,0 +1 @@
+https://github.com/salestech-group/MiroFish/issues/10#issuecomment-4400060417
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/followup-urls.txt
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/followup-urls.txt
@@ -0,0 +1,4 @@
+https://github.com/salestech-group/MiroFish/issues/23
+https://github.com/salestech-group/MiroFish/issues/24
+https://github.com/salestech-group/MiroFish/issues/25
+https://github.com/salestech-group/MiroFish/issues/26
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/gap-report.md
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/gap-report.md
@@ -0,0 +1,143 @@
+# Verification gap report - i18n-e2e-english-verification
+
+**Commit:** `9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd`
+
+
+## Overview
+
+- Total CJK matches audited: **2916**
+- Class distribution: deliberate=2299, review-needed=380, gap=237
+- Gap categories: backend-prompt-label=143, frontend-ui-string=49, frontend-regex-parser=36, backend-log=9
+- Gap pipeline steps: Report=70, Env Setup=61, n/a=47, UI=29, Simulation=14, Logs=9, Graph Build=5, Interaction=2
+
+## Section 1 - Static CJK audit
+
+Canonical command (PCRE):
+
+```
+git grep -nIP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json
+```
+
+Raw output captured at `audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep.txt` and bucketed at `audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep-bucketed.txt`.
+
+`locales/en.json` CJK matches: **0** (acceptance: zero).
+
+Top files by gap count:
+
+| File | Gap count |
+|------|-----------|
+| `backend/app/services/oasis_profile_generator.py` | 60 |
+| `frontend/src/components/Step4Report.vue` | 50 |
+| `backend/app/services/zep_graph_memory_updater.py` | 47 |
+| `frontend/src/views/Process.vue` | 29 |
+| `backend/app/services/report_agent.py` | 20 |
+| `backend/app/services/simulation_config_generator.py` | 13 |
+| `backend/app/services/ontology_generator.py` | 5 |
+| `backend/app/utils/retry.py` | 4 |
+| `backend/app/api/graph.py` | 3 |
+| `frontend/src/components/Step2EnvSetup.vue` | 3 |
+| `frontend/src/components/Step5Interaction.vue` | 2 |
+| `frontend/src/components/Step3Simulation.vue` | 1 |
+
+## Section 2 - Locale catalogue parity
+
+```
+# Locale parity for HEAD
+# en keys: 953
+# zh keys: 953
+
+[missing-keys]
+# (none)
+
+[cjk-in-en]
+# (none)
+
+[identical-values]
+# (none)
+```
+
+## Section 3 - LLM-prompt locale verification
+
+Backend prompt-label gaps (CJK string literals inside services that compose LLM prompts): **143**
+
+First 10 examples (file:line - match):
+
+- `backend/app/services/oasis_profile_generator.py:65` - "username": self.user_name,  # OASIS 库要求字段名为 username（无下划线）
+- `backend/app/services/oasis_profile_generator.py:93` - "username": self.user_name,  # OASIS 库要求字段名为 username（无下划线）
+- `backend/app/services/oasis_profile_generator.py:194` - raise ValueError("LLM_API_KEY 未配置")
+- `backend/app/services/oasis_profile_generator.py:384` - all_summaries.add(f"相关实体: {node.name}")
+- `backend/app/services/oasis_profile_generator.py:390` - context_parts.append("事实信息:\n" + "\n".join(f"- {f}" for f in results["facts"][:20]))
+- `backend/app/services/oasis_profile_generator.py:392` - context_parts.append("相关实体:\n" + "\n".join(f"- {s}" for s in results["node_summaries"][:10]))
+- `backend/app/services/oasis_profile_generator.py:422` - context_parts.append("### 实体属性\n" + "\n".join(attrs))
+- `backend/app/services/oasis_profile_generator.py:438` - relationships.append(f"- {entity.name} --[{edge_name}]--> (相关实体)")
+- `backend/app/services/oasis_profile_generator.py:440` - relationships.append(f"- (相关实体) --[{edge_name}]--> {entity.name}")
+- `backend/app/services/oasis_profile_generator.py:443` - context_parts.append("### 相关事实和关系\n" + "\n".join(relationships))
+- ... and 133 more (see `classified.csv`)
+
+These prompts feed the LLM verbatim; CJK labels can bias the model toward Chinese output even when the requested locale is English.
+
+## Section 4 - Locale propagation surface
+
+| Boundary | Status | Evidence |
+|----------|--------|----------|
+| HTTP -> Flask handler | manual-pending | runtime not exercised in sandbox; static review showed no per-request locale carrier |
+| Flask handler -> Task worker | manual-pending | thread-local `set_locale` referenced in CLAUDE.md but not statically verified end-to-end |
+| Task worker -> OASIS subprocess | manual-pending | subprocess boundary requires live run |
+| Backend logger | gap | 9 hard-coded CJK log line(s) on EN code path |
+
+First 10 backend-log gap examples:
+
+- `backend/app/api/graph.py:385` - build_logger.info(f"[{task_id}] 开始构建图谱...")
+- `backend/app/api/graph.py:494` - build_logger.info(f"[{task_id}] 图谱构建完成: graph_id={graph_id}, 节点={node_count}, 边={edge_count}")
+- `backend/app/api/graph.py:513` - build_logger.error(f"[{task_id}] 图谱构建失败: {str(e)}")
+- `backend/app/services/oasis_profile_generator.py:945` - print(f"开始生成Agent人设 - 共 {total} 个实体，并行数: {parallel_count}")
+- `backend/app/services/oasis_profile_generator.py:1001` - print(f"人设生成完成！共生成 {len([p for p in profiles if p])} 个Agent")
+- `backend/app/utils/retry.py:55` - logger.error(f"函数 {func.__name__} 在 {max_retries} 次重试后仍失败: {str(e)}")
+- `backend/app/utils/retry.py:108` - logger.error(f"异步函数 {func.__name__} 在 {max_retries} 次重试后仍失败: {str(e)}")
+- `backend/app/utils/retry.py:179` - logger.error(f"API调用在 {self.max_retries} 次重试后仍失败: {str(e)}")
+- `backend/app/utils/retry.py:227` - logger.error(f"处理第 {idx + 1} 项失败: {str(e)}")
+
+## Section 5 - Issue #10 checklist mapping
+
+Each line below is taken from the ticket body, with an explicit status.
+
+- [ ] **GAP** - **Frontend UI** — every label, button, modal, error toast, and tooltip in EN. No Chinese strings on screen. - 29 hard-coded CJK literal(s) in `frontend/src/views|components/`
+- [ ] **GAP** - **Step 1 — Graph Build** - 5 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: Status messages in EN - not verifiable statically; awaiting live run
+  - GAP: Ontology JSON descriptions in EN (depends on #2) - 14 gap(s) classified, see Section 1/3
+  - GAP: Backend logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Step 2 — Env Setup** - 61 gap(s) classified, see Section 1/3
+  - GAP: Generated agent profiles (`bio`, `persona`, `profession`, `interested_topics`) in EN (depends on #3) - 61 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: `gender` still the English enum (`male` / `female` / `other`) - not verifiable statically; awaiting live run
+- [ ] **GAP** - **Step 3 — Simulation** - 14 gap(s) classified, see Section 1/3
+  - GAP: Sim config `content`, `narrative_direction`, `hot_topics`, `reasoning` in EN (depends on #4) - 14 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: `poster_type` still PascalCase English - not verifiable statically; awaiting live run
+  - MANUAL-PENDING: `stance` still one of `supportive` / `opposing` / `neutral` / `observer` - not verifiable statically; awaiting live run
+  - GAP: Generated tweets / Reddit posts in EN (depends on #3 personas + #4 sim config) - 14 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Step 4 — Report** - 70 gap(s) classified, see Section 1/3
+  - GAP: Report sections, headings, prose in EN (depends on #5) - 70 gap(s) classified, see Section 1/3
+  - MANUAL-PENDING: ReACT thinking trace in EN - requires live walkthrough
+  - MANUAL-PENDING: Tool-call results render correctly - requires live walkthrough
+- [ ] **GAP** - **Step 5 — Interaction** - 2 gap(s) classified, see Section 1/3
+  - GAP: Interview chat replies in EN (depends on #3) - 2 gap(s) classified, see Section 1/3
+  - GAP: Report Agent chat replies in EN (depends on #5) - 72 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Backend logs** — full pipeline-run logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
+- [ ] **GAP** - **Locale propagation** — confirm `Accept-Language: en` (or thread-local locale set via `set_locale`) reaches background tasks and survives the OASIS subprocess boundary. - 9 CJK log strings on EN code path
+- [ ] **MANUAL-PENDING** - Every touchpoint above renders in Chinese; no English regressions. - requires live walkthrough
+- [ ] **MANUAL-PENDING** - zh.json backfill (#8) covered: Step 3, Step 4, Step 5, and graph panel labels are all Chinese. - not verifiable statically; awaiting live run
+
+## Section 6 - ZH regression check
+
+- Locale catalogues at full key parity (953 EN keys / 953 ZH keys, symmetric difference 0 - see Section 2).
+- No ZH-specific regression detected in static review. Live ZH walkthrough is `manual-pending`.
+
+## Section 7 - Follow-up plan
+
+Per R7.2, gaps are grouped into the following follow-up issues (placeholder bodies in `PENDING-followups/`):
+
+1. **Frontend hard-coded UI strings** (49 matches + 36 regex parsers depending on CJK backend output).
+2. **Backend log strings** (9 matches).
+3. **Backend LLM-prompt context labels** (143 matches).
+4. **Permanent CI guard** (preventative - re-run this audit on every PR).
+
+Backend docstring/comment matches (the bulk of `deliberate` rows) are covered by the existing issue #7 and are not re-filed here.
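The overview totals in the gap report can be cross-checked against `classified.csv`. The following is a minimal sketch, assuming the column layout that `classify.py` writes (`file, line, match, class, category, pipeline_step`). The helper name `summarise` and the two sample rows are illustrative and are not part of the repository.

```python
import csv
import io
from collections import Counter


def summarise(csv_text: str) -> tuple[Counter, Counter]:
    """Return (class counts, gap-category counts) for classified.csv content."""
    classes: Counter = Counter()
    gap_categories: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        classes[row["class"]] += 1
        # Only rows classified as "gap" contribute to the category breakdown.
        if row["class"] == "gap":
            gap_categories[row["category"]] += 1
    return classes, gap_categories


# Hypothetical sample rows; the real file for this run has 2916 data rows.
sample = (
    "file,line,match,class,category,pipeline_step\n"
    "backend/app/utils/retry.py,55,logger.error(...),gap,backend-log,Logs\n"
    "backend/app/api/graph.py,10,# comment,deliberate,backend-comment,Graph Build\n"
)
classes, gap_categories = summarise(sample)
print(dict(classes), dict(gap_categories))
```

The figures reported for this run are internally consistent: 2299 + 380 + 237 = 2916 for the class distribution, and 143 + 49 + 36 + 9 = 237 for the gap categories.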
--- a/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/parity.txt
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/parity.txt
@@ -0,0 +1,13 @@
+# Locale parity for HEAD
+# en keys: 953
+# zh keys: 953
+
+[missing-keys]
+# (none)
+
+[cjk-in-en]
+# (none)
+
+[identical-values]
+# (none)
+
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/audit_cjk.sh
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/audit_cjk.sh
@@ -0,0 +1,62 @@
+#!/usr/bin/env bash
+# Run the canonical CJK grep with PCRE, then write the raw output and a
+# bucketed summary partitioned by top-level path. Binary files (e.g. .jpeg)
+# are excluded, since git grep would otherwise report matches inside them.
+set -euo pipefail
+
+if [ "$#" -ne 1 ]; then
+    printf 'usage: %s <sha-dir>\n' "$0" >&2
+    exit 64
+fi
+
+sha_dir="$1"
+mkdir -p "${sha_dir}"
+
+raw="${sha_dir}/cjk-grep.txt"
+bucketed="${sha_dir}/cjk-grep-bucketed.txt"
+
+# Canonical PCRE grep against the three top-level paths owned by this audit.
+# git grep -P uses PCRE2 - ranges like \x{4e00}-\x{9fff} are valid here.
+# `-I` tells git grep not to match in binary files, so the audit reports
+# only text content.
+git grep -nIP '[\x{4e00}-\x{9fff}]' \
+    -- backend/app frontend/src locales/en.json \
+    > "${raw}" \
+    || true
+
+awk_script='
+function bucket(path) {
+    if (path ~ /^backend\/app\//)    return "backend/app"
+    if (path ~ /^frontend\/src\//)   return "frontend/src"
+    if (path ~ /^locales\/en\.json/) return "locales/en.json"
+    return "other"
+}
+{
+    split($0, parts, ":")
+    path = parts[1]
+    b = bucket(path)
+    counts[b]++
+    lines[b] = (b in lines ? lines[b] "\n" : "") $0
+}
+END {
+    order[1] = "backend/app"
+    order[2] = "frontend/src"
+    order[3] = "locales/en.json"
+    order[4] = "other"
+    for (i = 1; i <= 4; i++) {
+        b = order[i]
+        c = (b in counts ? counts[b] : 0)
+        printf("[%s] (%d lines)\n", b, c)
+        if (c > 0) {
+            print lines[b]
+        }
+        print ""
+    }
+}
+'
+
+awk "${awk_script}" "${raw}" > "${bucketed}"
+
+raw_lines=$(wc -l < "${raw}" | tr -d ' ')
+printf '  cjk-grep.txt:          %s lines\n' "${raw_lines}"
+printf '  cjk-grep-bucketed.txt: written\n'
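A note on what the canonical pattern covers: U+4E00 to U+9FFF is the CJK Unified Ideographs block, so a line whose only non-ASCII content is full-width punctuation or kana is not reported by this audit. The sketch below expresses the same character class in Python. The helper name `has_cjk` is illustrative and does not exist in the audit scripts.

```python
import re

# Same range as the git grep pattern: CJK Unified Ideographs only.
CJK_UNIFIED = re.compile(r"[\u4e00-\u9fff]")


def has_cjk(text: str) -> bool:
    """True when text contains at least one character in U+4E00..U+9FFF."""
    return CJK_UNIFIED.search(text) is not None


# Ideographs are inside the range.
assert has_cjk("图谱构建完成")
# Full-width punctuation (U+FF00 block) is outside the range.
assert not has_cjk("（）：，")
# Hiragana (U+3040 block) is outside the range as well.
assert not has_cjk("こんにちは")
```

If punctuation-only or kana-only strings matter for the English acceptance criteria, the range would need to be widened. That is a scope decision for the audit rather than a defect in the script.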
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py
@@ -0,0 +1,128 @@
+#!/usr/bin/env python3
+"""Diff locales/en.json against locales/zh.json and emit parity.txt.
+
+Three labelled blocks are written:
+
+* `[missing-keys]`  - keys present on one side but not the other.
+* `[cjk-in-en]`     - EN catalogue values that contain CJK characters.
+* `[identical-values]` - keys whose EN and ZH values are identical AND the
+                        value is non-empty, CJK-free, and has more than two
+                        whitespace-separated tokens. Review-needed signals, not gaps.
+
+Run from the repository root.
+"""
+from __future__ import annotations
+
+import json
+import re
+import sys
+from pathlib import Path
+from typing import Dict, Iterator, Tuple
+
+CJK_RANGE = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block
+
+
+def flatten(d: Dict[str, object], prefix: str = "") -> Iterator[Tuple[str, object]]:
+    """Recursively yield (dotted-key, value) pairs from a nested dict."""
+    for key, value in d.items():
+        path = f"{prefix}.{key}" if prefix else key
+        if isinstance(value, dict):
+            yield from flatten(value, path)
+        else:
+            yield path, value
+
+
+def is_non_trivial_english_prose(value: object) -> bool:
+    """Heuristic for the identical-value 'review-needed' signal.
+
+    True when:
+    * value is a string,
+    * value is non-empty after strip,
+    * value contains more than two whitespace-separated tokens,
+    * value contains no CJK characters (otherwise it's just an untranslated
+      ZH original which is not a review-needed signal here).
+    """
+    if not isinstance(value, str):
+        return False
+    text = value.strip()
+    if not text:
+        return False
+    if CJK_RANGE.search(text):
+        return False
+    return len(text.split()) > 2
+
+
+def main(argv: list[str]) -> int:
+    if len(argv) != 2:
+        print(f"usage: {argv[0]} <sha-dir>", file=sys.stderr)
+        return 64
+
+    sha_dir = Path(argv[1])
+    sha_dir.mkdir(parents=True, exist_ok=True)
+    out_path = sha_dir / "parity.txt"
+
+    en_path = Path("locales/en.json")
+    zh_path = Path("locales/zh.json")
+    if not en_path.exists() or not zh_path.exists():
+        print(f"missing locale files: {en_path}, {zh_path}", file=sys.stderr)
+        return 1
+
+    en = json.loads(en_path.read_text(encoding="utf-8"))
+    zh = json.loads(zh_path.read_text(encoding="utf-8"))
+
+    en_flat = dict(flatten(en))
+    zh_flat = dict(flatten(zh))
+
+    en_only = sorted(set(en_flat) - set(zh_flat))
+    zh_only = sorted(set(zh_flat) - set(en_flat))
+
+    cjk_in_en = []
+    for key, value in sorted(en_flat.items()):
+        if isinstance(value, str) and CJK_RANGE.search(value):
+            cjk_in_en.append((key, value))
+
+    identical = []
+    for key in sorted(set(en_flat) & set(zh_flat)):
+        en_val = en_flat[key]
+        zh_val = zh_flat[key]
+        if en_val == zh_val and is_non_trivial_english_prose(en_val):
+            identical.append((key, en_val))
+
+    lines: list[str] = []
+    lines.append("# Locale parity for HEAD")
+    lines.append(f"# en keys: {len(en_flat)}")
+    lines.append(f"# zh keys: {len(zh_flat)}")
+    lines.append("")
+    lines.append("[missing-keys]")
+    if not en_only and not zh_only:
+        lines.append("# (none)")
+    for key in en_only:
+        lines.append(f"en-only: {key}")
+    for key in zh_only:
+        lines.append(f"zh-only: {key}")
+    lines.append("")
+    lines.append("[cjk-in-en]")
+    if not cjk_in_en:
+        lines.append("# (none)")
+    for key, value in cjk_in_en:
+        snippet = value if len(value) <= 80 else value[:77] + "..."
+        lines.append(f"{key}: {snippet}")
+    lines.append("")
+    lines.append("[identical-values]")
+    if not identical:
+        lines.append("# (none)")
+    for key, value in identical:
+        snippet = value if len(value) <= 80 else value[:77] + "..."
+        lines.append(f"{key}: {snippet}")
+    lines.append("")
+
+    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
+    print(
+        f"  parity.txt written: missing={len(en_only) + len(zh_only)}, "
+        f"cjk-in-en={len(cjk_in_en)}, identical-values={len(identical)}"
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
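To illustrate how the `[missing-keys]` block is derived, here is a small sketch that runs the same flatten-then-set-difference logic on two toy catalogues. `flatten` is copied from `check_parity.py`; the catalogue contents are invented for the example.

```python
from typing import Dict, Iterator, Tuple


def flatten(d: Dict[str, object], prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Recursively yield (dotted-key, value) pairs from a nested dict."""
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical catalogues: zh lacks step1.hint.
en = {"step1": {"title": "Graph Build", "hint": "Upload a file"}, "common": {"ok": "OK"}}
zh = {"step1": {"title": "图谱构建"}, "common": {"ok": "确定"}}

en_flat = dict(flatten(en))
zh_flat = dict(flatten(zh))

en_only = sorted(set(en_flat) - set(zh_flat))
zh_only = sorted(set(zh_flat) - set(en_flat))
print("en-only:", en_only)
print("zh-only:", zh_only)
```

Two properties of this approach are worth keeping in mind: lists are treated as leaf values rather than being descended into, and a key that itself contains a dot is indistinguishable from a nested path after flattening.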
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/classify.py
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/classify.py
@@ -0,0 +1,182 @@
+#!/usr/bin/env python3
+"""Classify each CJK match into a 4-class label and a category tag.
+
+Inputs (read from <sha-dir>):
+  cjk-grep.txt   - raw `git grep -nIP` output, one match per line.
+  parity.txt     - output of check_parity.py (used to harvest cjk-in-en gaps).
+
+Output (written to <sha-dir>/classified.csv):
+  CSV columns: file, line, match, class, category, pipeline_step
+
+Classes are a closed set: deliberate / gap / non-applicable / review-needed.
+Categories and pipeline-step tags are likewise closed sets - see classify_match.
+
+Run from the repository root.
+"""
+from __future__ import annotations
+
+import csv
+import re
+import sys
+from pathlib import Path
+from typing import Iterable, Tuple
+
+CJK_RANGE = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block
+PROMPT_FILES = (
+    "backend/app/services/ontology_generator.py",
+    "backend/app/services/oasis_profile_generator.py",
+    "backend/app/services/simulation_config_generator.py",
+    "backend/app/services/report_agent.py",
+    "backend/app/services/zep_graph_memory_updater.py",
+)
+LOG_HINTS = ("logger.", "log.", "print(", "build_logger.", "logging.")
+BINARY_EXTS = (
+    ".jpg", ".jpeg", ".png", ".gif", ".pdf",
+    ".woff", ".woff2", ".ttf", ".eot", ".ico",
+)
+
+
+def classify_match(file: str, raw_line: str) -> Tuple[str, str, str]:
+    """Return (class, category, pipeline_step) for one grep match line."""
+    if any(file.lower().endswith(ext) for ext in BINARY_EXTS):
+        return ("non-applicable", "binary-false-positive", "n/a")
+
+    if file == "locales/en.json":
+        return ("gap", "catalogue-parity", "UI")
+
+    stripped = raw_line.lstrip()
+    pipeline_step = pipeline_step_for(file)
+
+    if file.endswith(".vue"):
+        if re.search(r"\.match\s*\(\s*/", raw_line):
+            return ("gap", "frontend-regex-parser", pipeline_step)
+        if re.search(r"['\"`].*[一-鿿].*['\"`]", raw_line):
+            return ("gap", "frontend-ui-string", pipeline_step)
+        if stripped.startswith("//") or stripped.startswith("/*") or stripped.startswith("*"):
+            return ("deliberate", "frontend-comment", pipeline_step)
+        return ("review-needed", "frontend-other", pipeline_step)
+
+    if file.endswith(".py"):
+        if stripped.startswith("#"):
+            return ("deliberate", "backend-comment", pipeline_step)
+        if stripped.startswith('"""') or stripped.startswith("'''"):
+            return ("deliberate", "backend-docstring", pipeline_step)
+        if not re.search(r"['\"]", raw_line):
+            # bare CJK on a line with no quotes: most likely the body of a
+            # multi-line docstring. Treat as a docstring continuation.
+            return ("deliberate", "backend-docstring", pipeline_step)
+        if any(hint in raw_line for hint in LOG_HINTS):
+            return ("gap", "backend-log", "Logs")
+        if file in PROMPT_FILES:
+            return ("gap", "backend-prompt-label", pipeline_step)
+        return ("review-needed", "backend-string", pipeline_step)
+
+    if file.endswith(".js") or file.endswith(".ts"):
+        if stripped.startswith("//") or stripped.startswith("*"):
+            return ("deliberate", "frontend-comment", pipeline_step)
+        return ("review-needed", "frontend-other", pipeline_step)
+
+    return ("review-needed", "uncategorised", pipeline_step)
+
+
+def pipeline_step_for(file: str) -> str:
+    """Map a path to one of the closed-set pipeline-step tags."""
+    if "ontology_generator" in file or "graph_builder" in file or "graph.py" in file:
+        return "Graph Build"
+    if "oasis_profile_generator" in file or "Step2" in file:
+        return "Env Setup"
+    if "simulation_config_generator" in file or "simulation" in file or "Step3" in file:
+        return "Simulation"
+    if "report_agent" in file or "Step4" in file:
+        return "Report"
+    if "Step5" in file or "interaction" in file.lower() or "interview" in file.lower():
+        return "Interaction"
+    if "logger" in file or "retry" in file:
+        return "Logs"
+    if file.startswith("frontend/src/views/") or file.startswith("frontend/src/components/"):
+        return "UI"
+    return "n/a"
+
+
+def parse_grep_line(line: str) -> Tuple[str, str, str]:
+    """Split a `git grep -n` line into (file, line-number, match-text)."""
+    parts = line.split(":", 2)
+    if len(parts) < 3:
+        return ("", "", line)
+    return (parts[0], parts[1], parts[2])
+
+
+def parity_to_rows(parity_path: Path) -> Iterable[Tuple[str, str, str, str, str, str]]:
+    """Promote `[cjk-in-en]` block entries from parity.txt into classified rows."""
+    if not parity_path.exists():
+        return
+    in_block = False
+    for raw in parity_path.read_text(encoding="utf-8").splitlines():
+        if raw.startswith("["):
+            in_block = raw.strip() == "[cjk-in-en]"
+            continue
+        if not in_block:
+            continue
+        if not raw or raw.startswith("#"):
+            continue
+        yield (
+            "locales/en.json",
+            "0",
+            raw,
+            "gap",
+            "catalogue-parity",
+            "UI",
+        )
+
+
+def main(argv: list[str]) -> int:
+    if len(argv) != 2:
+        print(f"usage: {argv[0]} <sha-dir>", file=sys.stderr)
+        return 64
+
+    sha_dir = Path(argv[1])
+    grep_path = sha_dir / "cjk-grep.txt"
+    parity_path = sha_dir / "parity.txt"
+    out_path = sha_dir / "classified.csv"
+
+    if not grep_path.exists():
+        print(f"missing input: {grep_path}", file=sys.stderr)
+        return 1
+
+    rows: list[Tuple[str, str, str, str, str, str]] = []
+    grep_lines = grep_path.read_text(encoding="utf-8").splitlines()
+    for raw_line in grep_lines:
+        if not raw_line:
+            continue
+        file, lineno, match = parse_grep_line(raw_line)
+        if not file:
+            continue
+        cls, category, step = classify_match(file, match)
+        rows.append((file, lineno, match.strip(), cls, category, step))
+
+    rows.extend(parity_to_rows(parity_path))
+
+    raw_count = sum(1 for line in grep_lines if line.strip())
+    grep_rows = [r for r in rows if r[0] != "locales/en.json" or r[1] != "0"]
+    if len(grep_rows) != raw_count:
+        print(
+            f"row-count drift: input={raw_count}, classified={len(grep_rows)}",
+            file=sys.stderr,
+        )
+        return 1
+
+    with out_path.open("w", encoding="utf-8", newline="") as fh:
+        writer = csv.writer(fh)
+        writer.writerow(["file", "line", "match", "class", "category", "pipeline_step"])
+        writer.writerows(rows)
+
+    summary: dict[str, int] = {}
+    for row in rows:
+        summary[row[3]] = summary.get(row[3], 0) + 1
+    summary_str = ", ".join(f"{cls}={n}" for cls, n in sorted(summary.items()))
+    print(f"  classified.csv: {len(rows)} rows ({summary_str})")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
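Because matched source lines often contain colons themselves, `parse_grep_line` limits the split to the first two separators. The sketch below copies that function and runs it on one of the matches listed in the gap report, so the behaviour can be checked in isolation.

```python
from typing import Tuple


def parse_grep_line(line: str) -> Tuple[str, str, str]:
    """Split a `git grep -n` line into (file, line-number, match-text)."""
    parts = line.split(":", 2)  # maxsplit=2 keeps colons inside the match text
    if len(parts) < 3:
        return ("", "", line)
    return (parts[0], parts[1], parts[2])


raw = (
    "backend/app/services/oasis_profile_generator.py:384:"
    '        all_summaries.add(f"相关实体: {node.name}")'
)
file, lineno, text = parse_grep_line(raw)
print(file, lineno, text.strip())

# A line without two separators falls back to an empty file name,
# which main() then skips.
print(parse_grep_line("no separators here"))
```

The split assumes that the file path contains no colon. That holds for the paths audited here, but a path with a colon in it would be split in the wrong place.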
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/file_followups.sh
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/file_followups.sh
@@ -0,0 +1,79 @@
+#!/usr/bin/env bash
+# Iterate <sha-dir>/PENDING-followups/*.md and file each non-empty body
+# as a GitHub issue. The first line, when it is a markdown heading
+# (`# title`), becomes the issue title; the first `<!-- labels: a,b,c -->`
+# comment found in the body supplies the --label arguments.
+#
+# On per-category failure the body is left in place and the script exits
+# non-zero at the end (after attempting all categories).
+set -uo pipefail
+
+if [ "$#" -ne 1 ]; then
+    printf 'usage: %s <sha-dir>\n' "$0" >&2
+    exit 64
+fi
+
+sha_dir="$1"
+pending_dir="${sha_dir}/PENDING-followups"
+urls_path="${sha_dir}/followup-urls.txt"
+
+if [ ! -d "${pending_dir}" ]; then
+    printf 'missing PENDING-followups dir: %s\n' "${pending_dir}" >&2
+    exit 1
+fi
+
+# Append-only URL log so retries on the same sha-dir preserve previous filings.
+touch "${urls_path}"
+
+if ! command -v gh >/dev/null 2>&1; then
+    printf '  gh not available; leaving all bodies in PENDING-followups/\n'
+    exit 2
+fi
+
+if ! gh auth status >/dev/null 2>&1; then
+    printf '  gh not authenticated; leaving all bodies in PENDING-followups/\n'
+    exit 2
+fi
+
+partial=0
+
+for body in "${pending_dir}"/[0-9]*-*.md; do
+    [ -f "${body}" ] || continue
+    if [ ! -s "${body}" ]; then
+        # Empty placeholder - the corresponding category had zero gaps in this run.
+        continue
+    fi
+
+    title="$(awk 'NR==1 && /^# /{sub(/^# /, ""); print; exit}' "${body}")"
+    if [ -z "${title}" ]; then
+        title="i18n: follow-up from issue #10 verification ($(basename "${body}" .md))"
+    fi
+
+    label_line="$(grep -oE '<!-- labels: [^>]+-->' "${body}" | head -1 || true)"
+    labels="$(printf '%s' "${label_line}" | sed -E 's/<!-- labels: //; s/ *-->//' || true)"
+    label_args=()
+    if [ -n "${labels}" ]; then
+        IFS=',' read -ra parts <<< "${labels}"
+        for label in "${parts[@]}"; do
+            label_args+=( --label "$(printf '%s' "${label}" | sed -E 's/^ +//; s/ +$//')" )
+        done
+    fi
+
+    printf '  filing: %s\n' "${title}"
+    if url="$(gh issue create --repo salestech-group/MiroFish \
+        --title "${title}" \
+        --body-file "${body}" \
+        ${label_args[@]+"${label_args[@]}"} 2>&1)"; then
+        printf '%s\n' "${url}" >> "${urls_path}"
+        printf '    -> %s\n' "${url}"
+        rm -f "${body}"
+    else
+        printf '    !! gh issue create failed: %s\n' "${url}" >&2
+        partial=1
+    fi
+done
+
+if [ "${partial}" -eq 1 ]; then
+    exit 2
+fi
+exit 0
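The title and label extraction in `file_followups.sh` can be prototyped outside the shell. This is a rough Python sketch of the same rules: the title comes from the first line only when that line is a `# ` heading, and the labels come from the first `<!-- labels: ... -->` comment. The function name `parse_followup` and the sample body are hypothetical, and the sketch trims surrounding whitespace from each label.

```python
import re

LABELS_RE = re.compile(r"<!-- labels: ([^>]+)-->")


def parse_followup(body: str, fallback_title: str) -> tuple[str, list[str]]:
    """Return (title, labels) for one follow-up body."""
    lines = body.splitlines()
    first = lines[0] if lines else ""
    # Only a first-line "# heading" counts as the title.
    title = first[2:] if first.startswith("# ") else ""
    match = LABELS_RE.search(body)
    labels = []
    if match:
        labels = [part.strip() for part in match.group(1).split(",") if part.strip()]
    return (title or fallback_title, labels)


# Hypothetical follow-up body for illustration.
sample_body = "# Backend log strings\n\n9 matches.\n\n<!-- labels: i18n, follow-up -->\n"
print(parse_followup(sample_body, "fallback title"))
print(parse_followup("no heading here", "fallback title"))
```

Keeping the parsing rules this small means a malformed body still gets filed, just under the fallback title and without labels.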
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/post_comment.sh
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/post_comment.sh
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+# Post comment-body.md as a comment on issue #10.
+#
+# Falls back to writing PENDING-issue-10-comment.md when gh is unavailable
+# or the post fails - exits non-zero in that case so the orchestrator can
+# downgrade its overall status.
+set -euo pipefail
+
+if [ "$#" -ne 1 ]; then
+    printf 'usage: %s <sha-dir>\n' "$0" >&2
+    exit 64
+fi
+
+sha_dir="$1"
+body="${sha_dir}/comment-body.md"
+if [ ! -f "${body}" ]; then
+    printf 'missing comment body: %s\n' "${body}" >&2
+    exit 1
+fi
+
+if ! command -v gh >/dev/null 2>&1; then
+    printf '  gh not available; writing PENDING-issue-10-comment.md\n'
+    cp "${body}" "${sha_dir}/PENDING-issue-10-comment.md"
+    exit 2
+fi
+
+if ! gh auth status >/dev/null 2>&1; then
+    printf '  gh not authenticated; writing PENDING-issue-10-comment.md\n'
+    cp "${body}" "${sha_dir}/PENDING-issue-10-comment.md"
+    exit 2
+fi
+
+if url="$(gh issue comment 10 --repo salestech-group/MiroFish --body-file "${body}" 2>&1)"; then
+    printf '%s\n' "${url}" > "${sha_dir}/comment-url.txt"
+    printf '  posted: %s\n' "${url}"
+    rm -f "${sha_dir}/PENDING-issue-10-comment.md"
+    exit 0
+fi
+
+printf '  gh post failed; writing PENDING-issue-10-comment.md\n'
+cp "${body}" "${sha_dir}/PENDING-issue-10-comment.md"
+exit 2
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/render_report.py
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/render_report.py
@ -0,0 +1,419 @@
+#!/usr/bin/env python3
+"""Render the gap report and the issue-#10 comment body.
+
+Inputs (from <sha-dir>):
+  classified.csv          - per-match classification rows.
+  parity.txt              - en/zh catalogue parity output.
+  cjk-grep-bucketed.txt   - human-readable bucketed grep output.
+
+Inputs (from repo):
+  .ticket/10.md           - snapshot of issue #10's body (used to mirror its checklist).
+
+Outputs (to <sha-dir>):
+  gap-report.md           - full structured report (seven sections).
+  comment-body.md         - markdown comment to be posted on issue #10.
+  PENDING-followups/01..04-*.md - one body per gap category (placeholders allowed).
+
+Usage:
+    python3 render_report.py <sha-dir> <commit-sha>
+"""
+from __future__ import annotations
+
+import csv
+import re
+import sys
+from collections import Counter, defaultdict
+from pathlib import Path
+from typing import Dict
+
+ISSUE_NUMBER = 10
+REPO_SLUG = "salestech-group/MiroFish"
+
+
+def load_rows(csv_path: Path) -> list[dict]:
+    with csv_path.open(encoding="utf-8", newline="") as fh:
+        return list(csv.DictReader(fh))
+
+
+def load_ticket_body(ticket_path: Path) -> str:
+    """Strip the YAML frontmatter and return the markdown body."""
+    text = ticket_path.read_text(encoding="utf-8")
+    if text.startswith("---\n"):
+        end = text.find("\n---\n", 4)
+        if end != -1:
+            return text[end + 5 :]
+    return text
+
+
+CHECKBOX_RE = re.compile(r"^(\s*)- \[ \] (.+)$")
+SUBBULLET_RE = re.compile(r"^(\s+)- (.+)$")
+
+
+def evidence_for_step(rows: list[dict], step: str) -> list[dict]:
+    """Return gap rows whose pipeline_step equals the given step tag (e.g. "UI", "Logs")."""
+    return [r for r in rows if r["class"] == "gap" and r["pipeline_step"] == step]
+
+
+def render_section_5(ticket_body: str, rows: list[dict]) -> str:
+    """Map every checklist item from the ticket body to a status."""
+    gaps_by_step = defaultdict(list)
+    for row in rows:
+        if row["class"] == "gap":
+            gaps_by_step[row["pipeline_step"]].append(row)
+
+    out: list[str] = []
+    out.append("## Section 5 - Issue #10 checklist mapping\n")
+    out.append("Each line below is taken from the ticket body, with an explicit status.\n")
+
+    in_checklist = False
+    for line in ticket_body.splitlines():
+        match = CHECKBOX_RE.match(line)
+        if match:
+            in_checklist = True
+            indent, text = match.group(1), match.group(2)
+            status, note = status_for_checklist_item(text, gaps_by_step)
+            out.append(f"{indent}- [{('x' if status == 'pass' else ' ')}] **{status.upper()}** - {text}{note}")
+            continue
+
+        sub = SUBBULLET_RE.match(line)
+        if in_checklist and sub:
+            indent, text = sub.group(1), sub.group(2)
+            status, note = status_for_checklist_item(text, gaps_by_step)
+            out.append(f"{indent}- {status.upper()}: {text}{note}")
+            continue
+
+        if line.startswith("##") or line.startswith("---"):
+            in_checklist = False
+
+    return "\n".join(out) + "\n"
+
+
+def status_for_checklist_item(text: str, gaps_by_step: Dict[str, list]) -> tuple[str, str]:
+    """Return (status, suffix-note) for one checklist line.
+
+    Pure-UI items default to manual-pending in this run; items with a
+    backing pipeline-step that has gaps are reported as gap with a count.
+    """
+    lower = text.lower()
+    candidates: list[str] = []
+    if "graph build" in lower or "ontology" in lower:
+        candidates.append("Graph Build")
+    if "env setup" in lower or "agent profile" in lower or "profession" in lower:
+        candidates.append("Env Setup")
+    if "simulation" in lower or "tweet" in lower or "reddit" in lower or "sim config" in lower:
+        candidates.append("Simulation")
+    if "report" in lower:
+        candidates.append("Report")
+    if "interaction" in lower or "interview" in lower or "chat repl" in lower:
+        candidates.append("Interaction")
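+    # Match "log" only at a word start so "catalogue" or "dialog" do not count as log items.
+    if re.search(r"\blog", lower):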
+        candidates.append("Logs")
+
+    relevant_gaps = []
+    for step in candidates:
+        relevant_gaps.extend(gaps_by_step.get(step, []))
+
+    if "frontend ui" in lower or "no chinese strings on screen" in lower or "every label" in lower:
+        ui_gaps = gaps_by_step.get("UI", [])
+        if ui_gaps:
+            return ("gap", f" - {len(ui_gaps)} hard-coded CJK literal(s) in `frontend/src/views|components/`")
+        return ("manual-pending", " - live UI walkthrough not run in this sandbox")
+
+    if "locale propagation" in lower or "set_locale" in lower:
+        prop = gaps_by_step.get("Logs", [])
+        if prop:
+            return ("gap", f" - {len(prop)} CJK log strings on EN code path")
+        return ("manual-pending", " - locale-propagation runtime check not run in this sandbox")
+
+    if relevant_gaps:
+        return ("gap", f" - {len(relevant_gaps)} gap(s) classified, see Section 1/3")
+
+    if any(c in lower for c in ("ui", "screenshot", "chat", "modal", "tooltip", "render", "trace", "thinking")):
+        return ("manual-pending", " - requires live walkthrough")
+
+    return ("manual-pending", " - not verifiable statically; awaiting live run")
+
+
+def render_gap_report(rows: list[dict], ticket_body: str, parity_text: str, sha: str) -> str:
+    classes = Counter(r["class"] for r in rows)
+    gap_rows = [r for r in rows if r["class"] == "gap"]
+    gap_categories = Counter(r["category"] for r in gap_rows)
+    gap_steps = Counter(r["pipeline_step"] for r in gap_rows)
+
+    out: list[str] = []
+    out.append("# Verification gap report - i18n-e2e-english-verification\n")
+    out.append(f"**Commit:** `{sha}`\n")
+    out.append("")
+    out.append("## Overview\n")
+    out.append(f"- Total CJK matches audited: **{len(rows)}**")
+    out.append(f"- Class distribution: {format_counter(classes)}")
+    out.append(f"- Gap categories: {format_counter(gap_categories)}")
+    out.append(f"- Gap pipeline steps: {format_counter(gap_steps)}")
+    out.append("")
+
+    out.append("## Section 1 - Static CJK audit\n")
+    out.append("Canonical command (PCRE):\n")
+    out.append("```")
+    out.append('git grep -nIP "[\\x{4e00}-\\x{9fff}]" -- backend/app frontend/src locales/en.json')
+    out.append("```")
+    out.append("")
+    out.append(f"Raw output captured at `audit/{sha}/cjk-grep.txt` and bucketed at `audit/{sha}/cjk-grep-bucketed.txt`.")
+    out.append("")
+    out.append(f"`locales/en.json` CJK matches: **{sum(1 for r in rows if r['file'] == 'locales/en.json')}** (acceptance: zero).")
+    out.append("")
+    out.append("Top files by gap count:")
+    out.append("")
+    out.append("| File | Gap count |")
+    out.append("|------|-----------|")
+    by_file = Counter(r["file"] for r in gap_rows)
+    for file, count in by_file.most_common(15):
+        out.append(f"| `{file}` | {count} |")
+    out.append("")
+
+    out.append("## Section 2 - Locale catalogue parity\n")
+    out.append("```")
+    out.append(parity_text.strip())
+    out.append("```")
+    out.append("")
+
+    out.append("## Section 3 - LLM-prompt locale verification\n")
+    prompt_gaps = [r for r in gap_rows if r["category"] == "backend-prompt-label"]
+    out.append(f"Backend prompt-label gaps (CJK string literals inside services that compose LLM prompts): **{len(prompt_gaps)}**")
+    out.append("")
+    if prompt_gaps:
+        out.append("First 10 examples (file:line - match):")
+        out.append("")
+        for row in prompt_gaps[:10]:
+            out.append(f"- `{row['file']}:{row['line']}` - {row['match']}")
+        if len(prompt_gaps) > 10:
+            out.append(f"- ... and {len(prompt_gaps) - 10} more (see `classified.csv`)")
+        out.append("")
+    out.append(
+        "These prompts feed the LLM verbatim; CJK labels bias the model toward Chinese output even when "
+        "the requested locale is English."
+    )
+    out.append("")
+
+    out.append("## Section 4 - Locale propagation surface\n")
+    log_gaps = [r for r in gap_rows if r["category"] == "backend-log"]
+    out.append("| Boundary | Status | Evidence |")
+    out.append("|----------|--------|----------|")
+    out.append(
+        "| HTTP -> Flask handler | manual-pending | runtime not exercised in sandbox; static review showed no per-request locale carrier |"
+    )
+    out.append(
+        "| Flask handler -> Task worker | manual-pending | thread-local `set_locale` referenced in CLAUDE.md but not statically verified end-to-end |"
+    )
+    out.append(
+        "| Task worker -> OASIS subprocess | manual-pending | subprocess boundary requires live run |"
+    )
+    out.append(
+        f"| Backend logger | {'gap' if log_gaps else 'pass'} | {len(log_gaps)} hard-coded CJK log line(s) on EN code path |"
+    )
+    out.append("")
+    if log_gaps:
+        out.append("First 10 backend-log gap examples:")
+        out.append("")
+        for row in log_gaps[:10]:
+            out.append(f"- `{row['file']}:{row['line']}` - {row['match']}")
+        out.append("")
+
+    out.append(render_section_5(ticket_body, rows))
+
+    out.append("## Section 6 - ZH regression check\n")
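+    out.append(
+        # Key counts come from parity.txt (rendered in Section 2) rather than being hard-coded here,
+        # so the statement cannot go stale on a re-run.
+        "- Locale catalogue key parity (EN vs ZH key counts and symmetric difference) is reported in "
+        "Section 2.\n"
+        "- No ZH-specific regression detected in static review. Live ZH walkthrough is `manual-pending`.\n"
+    )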
+
+    out.append("## Section 7 - Follow-up plan\n")
+    out.append("Per R7.2, gaps are grouped into the following follow-up issues (placeholder bodies in `PENDING-followups/`):")
+    out.append("")
+    out.append(
+        f"1. **Frontend hard-coded UI strings** ({len(by_category(rows, 'frontend-ui-string'))} matches + "
+        f"{len(by_category(rows, 'frontend-regex-parser'))} regex parsers depending on CJK backend output)."
+    )
+    out.append(f"2. **Backend log strings** ({len(by_category(rows, 'backend-log'))} matches).")
+    out.append(f"3. **Backend LLM-prompt context labels** ({len(by_category(rows, 'backend-prompt-label'))} matches).")
+    out.append("4. **Permanent CI guard** (preventative - re-run this audit on every PR).")
+    out.append("")
+    out.append(
+        "Backend docstring/comment matches (the bulk of `deliberate` rows) are covered by the existing issue #7 and are not re-filed here."
+    )
+
+    return "\n".join(out) + "\n"
+
+
+def by_category(rows: list[dict], category: str) -> list[dict]:
+    return [r for r in rows if r["category"] == category and r["class"] == "gap"]
+
+
+def format_counter(c: Counter) -> str:
+    return ", ".join(f"{k}={v}" for k, v in c.most_common())
+
+
+def render_comment_body(rows: list[dict], ticket_body: str, sha: str) -> str:
+    classes = Counter(r["class"] for r in rows)
+    gap_rows = [r for r in rows if r["class"] == "gap"]
+    gap_categories = Counter(r["category"] for r in gap_rows)
+
+    out: list[str] = []
+    out.append(f"### Verification report - run on commit `{sha}`\n")
+    out.append("This run was produced by `.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.")
+    out.append("Captured artefacts live under `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.")
+    out.append("")
+    out.append(f"**Audit summary:** {sum(classes.values())} CJK matches across the auditable paths.")
+    out.append(f"- {classes.get('gap', 0)} `gap` (actionable, see follow-ups)")
+    out.append(f"- {classes.get('review-needed', 0)} `review-needed` (soft signal; needs human eyeball)")
+    out.append(f"- {classes.get('deliberate', 0)} `deliberate` (mostly backend docstrings/comments - covered by issue #7)")
+    out.append(
+        f"- {classes.get('non-applicable', 0)} `non-applicable` (binary file false positives - excluded)"
+    )
+    out.append("")
+    out.append(f"**Gap-category breakdown:** {format_counter(gap_categories)}")
+    out.append("")
+    out.append("---")
+    out.append("")
+    out.append("#### Issue checklist mapping")
+    out.append("")
+    out.append(render_section_5(ticket_body, rows))
+    out.append("---")
+    out.append("")
+    out.append("#### How to re-run")
+    out.append("")
+    out.append("```bash")
+    out.append("# from the repository root, on any commit:")
+    out.append("bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh")
+    out.append("# artefacts at .kiro/specs/i18n-e2e-english-verification/audit/<HEAD-sha>/")
+    out.append("```")
+    out.append("")
+    out.append(
+        "If `gh` is not authenticated when re-running, the comment body and follow-up bodies are written to "
+        "`PENDING-issue-10-comment.md` / `PENDING-followups/` for a human to post."
+    )
+    out.append("")
+    out.append("Out of scope for this run (per R5.3 / R7.3): live UI walkthrough, full Docker-Compose pipeline run, and any inline gap fixes.")
+    return "\n".join(out) + "\n"
+
+
+def render_followup_bodies(rows: list[dict], sha_dir: Path, sha: str) -> None:
+    pending_dir = sha_dir / "PENDING-followups"
+    pending_dir.mkdir(parents=True, exist_ok=True)
+
+    ui_gaps = by_category(rows, "frontend-ui-string") + by_category(rows, "frontend-regex-parser")
+    log_gaps = by_category(rows, "backend-log")
+    prompt_gaps = by_category(rows, "backend-prompt-label")
+
+    files = [
+        (
+            "01-frontend-ui-strings.md",
+            "i18n: replace hard-coded chinese ui strings in process and step components with i18n keys",
+            ui_gaps,
+            (
+                "Several `.vue` templates in `frontend/src/views/` and `frontend/src/components/` still emit "
+                "Chinese strings directly instead of routing them through `vue-i18n` keys. Some `Step4Report.vue` "
+                "regex parsers also rely on Chinese tokens emitted by the backend (so they will silently break "
+                "once the backend prompts are translated)."
+            ),
+            ["i18n", "bug"],
+        ),
+        (
+            "02-backend-log-strings.md",
+            "i18n: externalise remaining chinese log strings in flask api and utils",
+            log_gaps,
+            (
+                "After issue #6 externalised most backend log messages, a handful of `logger.info` / "
+                "`logger.error` call sites in `backend/app/api/graph.py` and `backend/app/utils/retry.py` "
+                "still hard-code Chinese strings, so backend logs leak Chinese under EN locale."
+            ),
+            ["i18n"],
+        ),
+        (
+            "03-backend-prompt-labels.md",
+            "i18n: translate chinese context labels inside llm-prompt assembly in backend services",
+            prompt_gaps,
+            (
+                "Several `services/*_generator.py` files compose LLM prompts that still embed Chinese "
+                "context labels (e.g. `\"事实信息:\"`, `\"相关实体:\"`) into the prompt string verbatim. These "
+                "labels bias the LLM toward Chinese output even when the requested locale is English."
+            ),
+            ["i18n"],
+        ),
+        (
+            "04-permanent-ci-guard.md",
+            "i18n: add a permanent ci guard that runs the e2e cjk audit on every pr",
+            [],
+            (
+                "Promote the audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` to "
+                "a permanent CI check. The guard should fail when `locales/en.json` contains any CJK character "
+                "and when the gap count regresses against a committed baseline."
+            ),
+            ["i18n", "enhancement"],
+        ),
+    ]
+
+    for name, title, gaps, summary, labels in files:
+        if not gaps and not name.startswith("04-"):
+            (pending_dir / name).write_text("", encoding="utf-8")
+            continue
+
+        body = [
+            f"# {title}",
+            "",
+            "## Summary",
+            "",
+            summary,
+            "",
+            "## Linked from",
+            "",
+            f"- Issue #{ISSUE_NUMBER} (verification report comment).",
+            f"- Spec: `.kiro/specs/i18n-e2e-english-verification/` at commit `{sha}`.",
+            "",
+            "## Evidence",
+            "",
+        ]
+        if gaps:
+            for row in gaps[:50]:
+                body.append(f"- `{row['file']}:{row['line']}` - {row['match']}")
+            if len(gaps) > 50:
+                body.append(f"- ... and {len(gaps) - 50} more (see `classified.csv` in the spec dir)")
+        else:
+            body.append("- (No gaps in this run; this is a preventative follow-up only.)")
+        body.append("")
+        body.append("## Acceptance")
+        body.append("")
+        body.append("- [ ] Each `file:line` above is fixed (or explicitly classified as `deliberate`).")
+        body.append("- [ ] Re-running `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh` shows zero gaps in this category.")
+        body.append("")
+        body.append(f"<!-- labels: {','.join(labels)} -->")
+        body.append("")
+        (pending_dir / name).write_text("\n".join(body), encoding="utf-8")
+
+
+def main(argv: list[str]) -> int:
+    if len(argv) != 3:
+        print(f"usage: {argv[0]} <sha-dir> <commit-sha>", file=sys.stderr)
+        return 64
+
+    sha_dir = Path(argv[1])
+    sha = argv[2]
+
+    rows = load_rows(sha_dir / "classified.csv")
+    parity_text = (sha_dir / "parity.txt").read_text(encoding="utf-8")
+    ticket_body = load_ticket_body(Path(f".ticket/{ISSUE_NUMBER}.md"))
+
+    gap_report = render_gap_report(rows, ticket_body, parity_text, sha)
+    (sha_dir / "gap-report.md").write_text(gap_report, encoding="utf-8")
+
+    comment_body = render_comment_body(rows, ticket_body, sha)
+    (sha_dir / "comment-body.md").write_text(comment_body, encoding="utf-8")
+
+    render_followup_bodies(rows, sha_dir, sha)
+
+    print(f"  gap-report.md, comment-body.md, PENDING-followups/ written under {sha_dir}")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
--- a/.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh
+++ b/.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh
@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+# Orchestrate the i18n end-to-end verification audit.
+#
+# Reads working-tree state via git (no production-source modifications),
+# captures classified output under audit/<commit-sha>/, and posts the
+# verification report comment + follow-up issues via gh when available.
+#
+# Exit codes:
+#   0 - audit succeeded and all GitHub side effects applied
+#   1 - audit step failed (read-only producer aborted)
+#   2 - audit succeeded but at least one GitHub side effect was deferred to PENDING
+set -euo pipefail
+
+repo_root="$(git rev-parse --show-toplevel)"
+cd "$repo_root"
+
+spec_root=".kiro/specs/i18n-e2e-english-verification"
+scripts_dir="${spec_root}/audit/scripts"
+
+sha="$(git rev-parse HEAD)"
+sha_dir="${spec_root}/audit/${sha}"
+mkdir -p "${sha_dir}"
+
+printf 'Verification audit\n  repo: %s\n  sha:  %s\n  out:  %s\n\n' \
+    "${repo_root}" "${sha}" "${sha_dir}"
+
+ghs_exit=0
+
+step() {
+    local label="$1"
+    shift
+    printf '== %s ==\n' "${label}"
+    "$@"
+}
+
+step "audit_cjk.sh"      bash       "${scripts_dir}/audit_cjk.sh"      "${sha_dir}"
+step "check_parity.py"   python3    "${scripts_dir}/check_parity.py"   "${sha_dir}"
+step "classify.py"       python3    "${scripts_dir}/classify.py"       "${sha_dir}"
+step "render_report.py"  python3    "${scripts_dir}/render_report.py"  "${sha_dir}" "${sha}"
+
+# GitHub side effects: failures here downgrade the run to exit 2 but
+# do not abort the rest of the side effects.
+set +e
+step "post_comment.sh" bash "${scripts_dir}/post_comment.sh" "${sha_dir}"
+[ $? -ne 0 ] && ghs_exit=2
+
+step "file_followups.sh" bash "${scripts_dir}/file_followups.sh" "${sha_dir}"
+[ $? -ne 0 ] && ghs_exit=2
+set -e
+
+printf '\n== summary ==\n'
+printf 'sha-dir: %s\n' "${sha_dir}"
+if [ -f "${sha_dir}/comment-url.txt" ]; then
+    printf 'comment: %s\n' "$(cat "${sha_dir}/comment-url.txt")"
+else
+    printf 'comment: PENDING (see %s/PENDING-issue-10-comment.md)\n' "${sha_dir}"
+fi
+if [ -f "${sha_dir}/followup-urls.txt" ]; then
+    printf 'follow-ups posted:\n'
+    sed 's/^/  /' "${sha_dir}/followup-urls.txt"
+fi
+if compgen -G "${sha_dir}/PENDING-followups/[0-9]*-*.md" > /dev/null; then
+    printf 'follow-ups PENDING:\n'
+    for body in "${sha_dir}"/PENDING-followups/[0-9]*-*.md; do
+        if [ -s "${body}" ]; then
+            printf '  %s\n' "${body}"
+        fi
+    done
+fi
+
+exit "${ghs_exit}"
--- a/.kiro/specs/i18n-e2e-english-verification/design.md
+++ b/.kiro/specs/i18n-e2e-english-verification/design.md
@ -0,0 +1,560 @@
+# Design — i18n-e2e-english-verification
+
+## Overview
+
+**Purpose**: This spec produces a deterministic, re-runnable verification pass that proves (or disproves) the MiroFish 5-step pipeline runs cleanly in English, and posts a structured report on issue #10 with a `pass` / `gap` / `manual-pending` status per checklist item.
+
+**Users**: i18n maintainers reviewing the epic (#11), and any future verifier re-running the audit after subsequent merges. The deliverable is read by humans on GitHub (issue comment) and re-run by humans (or CI in a future iteration) to confirm parity.
+
+**Impact**: No production code is modified. The repository gains one new directory tree (`.kiro/specs/i18n-e2e-english-verification/`) containing the spec, the audit scripts, and the captured outputs. One GitHub comment is posted on #10. Up to four follow-up issues are filed.
+
+### Goals
+
+- Static-audit `backend/app`, `frontend/src`, `locales/en.json` for CJK characters; classify every match.
+- Verify EN / ZH locale catalogue parity and flag suspect untranslated entries.
+- Verify LLM-prompt assets respect the requested locale.
+- Document locale-propagation gaps across Flask → `Task` → OASIS subprocess → ReACT agent.
+- Post a single canonical comment on issue #10 with per-checklist statuses.
+- File follow-up issues for every gap (no inline fixes).
+- Make the audit re-runnable by capturing artefacts under `.kiro/specs/.../audit/<commit-sha>/`.
+
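The catalogue-parity goal above can be illustrated with a minimal sketch. `check_parity.py` itself is not shown in this diff excerpt, so the helper names (`flatten_keys`, `parity_report`) and the dotted-key convention are assumptions made for illustration, not the script's actual implementation.

```python
from __future__ import annotations


def flatten_keys(catalogue: dict, prefix: str = "") -> dict[str, str]:
    """Flatten a nested catalogue into {"a.b.c": value} pairs (hypothetical helper)."""
    flat: dict[str, str] = {}
    for key, value in catalogue.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_keys(value, path))
        else:
            flat[path] = str(value)
    return flat


def parity_report(en: dict, zh: dict) -> dict[str, list[str]]:
    """Compare two catalogues: missing keys on either side, plus identical values."""
    en_flat, zh_flat = flatten_keys(en), flatten_keys(zh)
    shared = en_flat.keys() & zh_flat.keys()
    return {
        "missing_in_zh": sorted(en_flat.keys() - zh_flat.keys()),
        "missing_in_en": sorted(zh_flat.keys() - en_flat.keys()),
        # Identical EN/ZH values are only a soft signal (`review-needed`), not a gap.
        "identical_values": sorted(k for k in shared if en_flat[k] == zh_flat[k]),
    }
```

Keeping identical values in a separate bucket reflects the design's distinction between hard gaps (missing keys) and entries that merely look untranslated and need a human to decide.
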
+### Non-Goals
+
+- Patching any `gap` discovered (R7.3 — strictly verification).
+- Performance / load testing.
+- Adding new locales beyond EN / ZH.
+- Building a permanent CI guard (filed as a follow-up issue, not implemented here).
+- Live UI / Docker walkthrough — captured as `manual-pending` in this run's report.
+
+## Boundary Commitments
+
+### This Spec Owns
+
+- The audit scripts and the captured audit outputs under `.kiro/specs/i18n-e2e-english-verification/audit/`.
+- The `gap-report.md` artefact and the comment body posted on issue #10.
+- The grouping rule for follow-up issues (one per category — UI strings, backend log strings, backend LLM-prompt labels, suggested CI guard).
+- The `pass` / `gap` / `manual-pending` / `review-needed` classification scheme.
+
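The grouping rule for follow-up issues can be sketched as a lookup from gap category to follow-up body. The category names and file names are the ones used by `render_report.py` in this PR; the mapping table and the `group_gaps` helper are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict

# Both frontend categories share one follow-up; the CI guard (04) has no gap rows.
FOLLOWUP_FOR_CATEGORY = {
    "frontend-ui-string": "01-frontend-ui-strings.md",
    "frontend-regex-parser": "01-frontend-ui-strings.md",
    "backend-log": "02-backend-log-strings.md",
    "backend-prompt-label": "03-backend-prompt-labels.md",
}


def group_gaps(rows):
    """Bucket `gap` rows by follow-up body; rows of any other class are ignored."""
    grouped = defaultdict(list)
    for row in rows:
        if row["class"] != "gap":
            continue
        target = FOLLOWUP_FOR_CATEGORY.get(row["category"])
        if target is not None:
            grouped[target].append(row)
    return dict(grouped)
```

A category with no gap rows simply does not appear in the result, which corresponds to the empty placeholder body that `file_followups.sh` skips.
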
+### Out of Boundary
+
+- Any modification of files under `backend/app/`, `frontend/src/`, or `locales/`.
+- Fixing the gaps the audit discovers — those land in their own follow-up issues.
+- Live UI walkthrough, Docker run, or LLM execution.
+- A permanent CI check — filed as a separate follow-up issue.
+
+### Allowed Dependencies
+
+- `git` (for `git grep`, capturing HEAD sha).
+- `gh` CLI (for the comment + follow-up issues; with documented fallback when unavailable).
+- `python3` (for the catalogue parity diff, match classification, and report rendering).
+- The repo working tree at HEAD of the working branch.
+
+### Revalidation Triggers
+
+- Any merge to `main` that touches `locales/`, `backend/app/`, or `frontend/src/` invalidates the captured audit; a re-run should produce a new `audit/<commit-sha>/` directory.
+- A change to issue #10's checklist body (e.g. a new sub-item) requires re-mapping in `gap-report.md`.
+- A change to the four follow-up categories (e.g. project decides to file one issue per file) requires re-running the issue-filing script with new grouping.
+
+## Architecture
+
+### Existing Architecture Analysis
+
+- The MiroFish backend is Flask + Python `Task` workers + an OASIS subprocess (per CLAUDE.md). i18n surfaces are: `vue-i18n` for the SPA, `locales/*.json` shared by both ends, a backend logger that resolves keys per locale, and inline LLM prompts in `backend/app/services/*.py`.
+- The verification pass does **not** hook into any of these — it reads files only. No Flask blueprint, no `Task` model, no Neo4j query.
+
+### Architecture Pattern & Boundary Map
+
+```mermaid
+graph TB
+    Verifier[Verifier shell entrypoint]
+    Audit[audit_cjk.sh]
+    Parity[check_parity.py]
+    Classify[classify.py]
+    Report[render_report.py]
+    Comment[post_comment.sh]
+    FollowUp[file_followups.sh]
+
+    Repo[Working tree]
+    Captures[audit slash sha slash]
+    GH[GitHub via gh CLI]
+
+    Verifier --> Audit
+    Verifier --> Parity
+    Audit --> Classify
+    Parity --> Classify
+    Classify --> Report
+    Report --> Captures
+    Report --> Comment
+    Report --> FollowUp
+    Audit --> Repo
+    Parity --> Repo
+    Comment --> GH
+    FollowUp --> GH
+```
+
+**Architecture Integration**:
+
+- **Selected pattern**: Linear pipeline of read-only scripts that each emit a single artefact, composed by a thin shell entrypoint. No mutable state outside `audit/<sha>/`.
+- **Domain boundaries**: `audit_cjk.sh` owns the raw grep; `check_parity.py` owns the catalogue diff; `classify.py` owns the four-class labels; `render_report.py` owns the comment body; `post_comment.sh` and `file_followups.sh` own GitHub side effects.
+- **Existing patterns preserved**: Shell + Python script pair (matches the project's existing `setup`/`run` style); no new test runner, no new linter.
+- **New components rationale**: Each script is single-purpose so failures (e.g. `gh` permission issues) are isolated and the pipeline can resume from the failed step.
+- **Steering compliance**: No production-code touch (R7.3); 4-space indent in any committed Python; double quotes; `snake_case`; Bash scripts run under `set -euo pipefail` so they exit with a non-zero status on any uncaught error.
+
+### Technology Stack
+
+| Layer | Choice / Version | Role in Feature | Notes |
+|-------|------------------|-----------------|-------|
+| CLI / Audit runner | Bash 5+, `git grep -P` (PCRE) | Run the canonical CJK audit | `\x{...}` ranges require PCRE — `git grep -E` will fail on this regex (verified). |
+| Static checks | Python 3.11 (project minimum per CLAUDE.md) | Catalogue parity + classification + report rendering | Standard library only — no new deps. |
+| GitHub integration | `gh` CLI | Post the comment, file follow-ups | Falls back to `audit/<sha>/PENDING-*` files when missing. |
+| Output formats | Plain text + Markdown | Captures + comment body | No HTML, no JSON beyond `gh`'s own. |
+
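For the Python side of the pipeline, the PCRE range `[\x{4e00}-\x{9fff}]` has a direct equivalent in the standard `re` module. The sketch below is illustrative only; `CJK_RE` and `find_cjk` are hypothetical names, and the actual classifier may detect matches differently.

```python
from __future__ import annotations

import re

# U+4E00..U+9FFF is the CJK Unified Ideographs block, the same range as the PCRE above.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def find_cjk(text: str) -> list[str]:
    """Return every character in `text` that falls inside the audited range."""
    return CJK_RE.findall(text)
```

Note that this range covers ideographs only. Fullwidth punctuation such as `：` (U+FF1A) lies outside it, so a line containing only such punctuation would not be reported by either form of the pattern.
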
+## File Structure Plan
+
+### Directory Structure
+
+```
+.kiro/specs/i18n-e2e-english-verification/
+├── spec.json
+├── requirements.md
+├── gap-analysis.md
+├── research.md
+├── design.md
+├── tasks.md
+├── HANDOFF.md          # only if implementation hits the 3-cycle remediation cap
+└── audit/
+    ├── scripts/
+    │   ├── run_audit.sh          # entrypoint - chains the steps below
+    │   ├── audit_cjk.sh          # git grep PCRE + bucket counts
+    │   ├── check_parity.py       # locales/en.json vs zh.json key + identical-value diff
+    │   ├── classify.py           # apply 4-class labels to grep matches
+    │   ├── render_report.py      # produce gap-report.md + comment-body.md
+    │   ├── post_comment.sh       # gh issue comment 10 with comment-body.md (or PENDING-*)
+    │   └── file_followups.sh     # gh issue create per category (or PENDING-*)
+    └── <commit-sha>/             # captured outputs of one verification run
+        ├── cjk-grep.txt          # raw `git grep -nP ...` output
+        ├── cjk-grep-bucketed.txt # the same, partitioned by top-level path
+        ├── parity.txt            # en/zh diff summary
+        ├── classified.csv        # match-by-match label
+        ├── gap-report.md         # the canonical structured report
+        ├── comment-body.md       # the markdown posted to issue #10
+        ├── PENDING-issue-10-comment.md          # only if gh comment failed
+        └── PENDING-followups/                   # only if gh issue create failed
+            ├── 01-frontend-ui-strings.md
+            ├── 02-backend-log-strings.md
+            ├── 03-backend-prompt-labels.md
+            └── 04-permanent-ci-guard.md
+```
+
+### Modified Files
+
+- *(None.)* The spec explicitly forbids touching production source.
+
+## System Flows
+
+```mermaid
+sequenceDiagram
+    participant V as Verifier
+    participant Run as run_audit.sh
+    participant FS as Working tree
+    participant GH as GitHub
+
+    V->>Run: bash run_audit.sh
+    Run->>FS: git grep -nP, git rev-parse HEAD
+    FS-->>Run: cjk-grep.txt + sha
+    Run->>FS: read locales json
+    FS-->>Run: en/zh dicts
+    Run->>Run: classify
+    Run->>FS: write audit slash sha slash artefacts
+    Run->>GH: gh issue comment 10
+    alt gh succeeds
+        GH-->>Run: comment URL
+        Run->>GH: gh issue create x N follow-ups
+        GH-->>Run: issue URLs
+    else gh fails
+        Run->>FS: write PENDING markdown to audit slash sha slash
+    end
+    Run-->>V: exit 0 success or exit 2 PENDING
+```
+
+**Key flow decisions**:
+
+- The audit always writes the captured artefacts to disk first (idempotent, re-runnable). The GitHub side effects are the *last* steps so any earlier failure leaves a complete capture for inspection.
+- A non-zero `gh` exit shifts the pipeline to PENDING mode rather than failing the whole run; the script exits `2` to flag "audit ran but GitHub side-effects didn't apply".
+
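The exit-code contract behind these decisions can be summarised in a few lines. This is a sketch of the documented contract (0 / 1 / 2), written in Python for brevity; `overall_exit_code` is a hypothetical helper, and `run_audit.sh` realises the same idea in Bash by running the producers under `set -e` and the GitHub steps under `set +e`.

```python
def overall_exit_code(producer_codes, side_effect_codes):
    """Combine per-step exit statuses into the run's overall status.

    producer_codes: statuses of the read-only audit steps.
    side_effect_codes: statuses of the GitHub steps (comment, follow-ups).
    """
    if any(code != 0 for code in producer_codes):
        return 1  # an audit step failed; the capture is incomplete
    if any(code != 0 for code in side_effect_codes):
        return 2  # audit is complete, but something was deferred to PENDING
    return 0
```

Checking the producers first means a broken audit is never reported as a mere PENDING run, which matches the ordering of the pipeline.
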
+## Requirements Traceability
+
+| Requirement | Summary | Components | Interfaces / Artefacts | Flows |
+|-------------|---------|------------|------------------------|-------|
+| 1.1 | Run canonical `git grep` | audit_cjk.sh | `cjk-grep.txt` | Audit step |
+| 1.2 | Classify each match | classify.py | `classified.csv` | Audit step |
+| 1.3 | Record file:line + step tag for `gap` | classify.py | `classified.csv` (`pipeline_step` column) | Audit step |
+| 1.4 | No file modifications during audit | run_audit.sh | scripts are read-only | — |
+| 1.5 | `en.json` CJK = always `gap` | classify.py | hard rule in classifier | Audit step |
+| 2.1 | Enumerate keys recursively | check_parity.py | `parity.txt` | Audit step |
+| 2.2 | Missing-key gaps recorded | check_parity.py | `parity.txt` (missing-key block) | Audit step |
+| 2.3 | EN catalogue CJK = `gap` | check_parity.py | `parity.txt` (cjk-in-en block) | Audit step |
+| 2.4 | EN/ZH identical = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | Audit step |
+| 2.5 | No catalogue edits | check_parity.py | read-only stdlib JSON load | — |
+| 3.1 | Enumerate prompt files | classify.py (heuristic — known files list) | `gap-report.md` Section 3 | — |
+| 3.2 | Confirm locale-aware or EN-only | classify.py | `gap-report.md` Section 3 | — |
+| 3.3 | Hard-coded ZH directive = `gap` | classify.py | `classified.csv` (`category=backend-prompt-label`) | — |
+| 3.4 | #3, #4, #5 prompts post-merge check | classify.py | `gap-report.md` Section 3 | — |
+| 4.1 | Identify handoff boundaries | render_report.py | `gap-report.md` Section 4 | — |
+| 4.2 | Confirm explicit or re-derived locale | render_report.py | `gap-report.md` Section 4 | — |
+| 4.3 | Silent default = `gap` | classify.py | `classified.csv` (`category=propagation`) | — |
+| 4.4 | Backend logger EN under EN | classify.py | `classified.csv` (`category=backend-log`) | — |
+| 5.1 | Comment lists every checklist item | render_report.py | `comment-body.md` | Comment-post |
+| 5.2 | Each `gap` includes file:line + follow-up link | render_report.py | `comment-body.md` | Comment-post |
+| 5.3 | `manual-pending` items state repro steps | render_report.py | `comment-body.md` | Comment-post |
+| 5.4 | Comment includes raw audit (or path) | render_report.py | `comment-body.md` (path reference) | Comment-post |
+| 5.5 | Post via `gh issue comment 10` | post_comment.sh | `comment-body.md` | Comment-post |
+| 6.1 | ZH covers every EN key | check_parity.py | (already passes per gap-analysis) | — |
+| 6.2 | Locale-aware prompts symmetric | render_report.py | `gap-report.md` Section 6 | — |
+| 6.3 | EN-only ZH value = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | — |
+| 6.4 | ZH regression filed as gap | classify.py | `classified.csv` | — |
+| 7.1 | File issue per gap | file_followups.sh | `gh issue create` | Follow-up |
+| 7.2 | Group by category | file_followups.sh | one body per category in `PENDING-followups/` | Follow-up |
+| 7.3 | No production-code edits | run_audit.sh | only writes under `.kiro/specs/.../` | — |
+| 7.4 | Label follow-ups `i18n` | file_followups.sh | `gh issue create --label i18n` | Follow-up |
+| 7.5 | Fallback inline list when no `gh` | file_followups.sh | `PENDING-followups/*.md` | Follow-up |
+| 8.1 | Capture raw output | run_audit.sh | `audit/<sha>/` directory | Audit step |
+| 8.2 | Preserve previous run | run_audit.sh | `<sha>` subdirectory naming | Audit step |
+| 8.3 | Record HEAD sha | run_audit.sh | `git rev-parse HEAD` | Audit step |
+| 8.4 | Idempotent re-run | run_audit.sh | re-running on same sha overwrites that sha's dir | Audit step |
+
+## Components and Interfaces
+
+| Component | Domain | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
+|-----------|--------|--------|--------------|--------------------------|-----------|
+| run_audit.sh | Verification pipeline | Compose the audit and route artefacts | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 | git (P0), python3 (P0), gh (P1) | Batch |
+| audit_cjk.sh | Static audit | Run `git grep -nP` and bucket | 1.1, 1.5 | git (P0) | Batch |
+| check_parity.py | Catalogue diff | Diff en/zh + identical-value heuristic | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 | python3 stdlib (P0) | Batch |
+| classify.py | Classification | Apply the 4-class label per match | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 | cjk-grep.txt (P0), parity.txt (P0) | Batch |
+| render_report.py | Report assembly | Produce gap-report.md + comment-body.md | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 | classified.csv (P0) | Batch |
+| post_comment.sh | GitHub side-effect | Post the comment on #10 | 5.5 | gh (P0), comment-body.md (P0) | Service |
+| file_followups.sh | GitHub side-effect | Open follow-up issues | 7.1, 7.2, 7.4, 7.5 | gh (P0), PENDING-followups/* (P0) | Service |
+
+### Verification pipeline
+
+#### `run_audit.sh`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Single shell entrypoint that runs every step in order and persists artefacts under `audit/<commit-sha>/` |
+| Requirements | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 |
+
+**Responsibilities & Constraints**
+
+- Must NOT modify any file outside `.kiro/specs/i18n-e2e-english-verification/`.
+- Must capture HEAD sha before any other step (so the artefact path is set).
+- Must exit `0` on full success (audit + GitHub side effects) and `2` on PENDING (audit succeeded, side effects didn't).
+- Must be safely re-runnable on the same sha (overwriting that sha's directory is acceptable).
+
+**Dependencies**
+
+- Inbound: invoked manually by the verifier (`bash run_audit.sh`) — Criticality: P0.
+- Outbound: `audit_cjk.sh`, `check_parity.py`, `classify.py`, `render_report.py`, `post_comment.sh`, `file_followups.sh` — Criticality: P0 each.
+- External: `git`, `python3`, `gh` (P1 — fallback supported).
+
+**Contracts**: Service [ ] / API [ ] / Event [ ] / Batch [x] / State [ ]
+
+##### Batch / Job Contract
+
+- **Trigger**: manual `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.
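+- **Input / validation**: working tree at any commit; detached or dirty trees are not rejected. `git grep` searches tracked files as they exist in the working tree, so untracked files are ignored but uncommitted edits to tracked files are included; run on a clean tree if the capture must match the recorded sha exactly.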
+- **Output / destination**: `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.
+- **Idempotency & recovery**: Re-running on the same sha overwrites that sha's directory. PENDING outputs survive across runs until a `gh`-enabled run replaces them.
+
+**Implementation Notes**
+
+- Integration: invoked by humans only — no CI hookup in this spec.
+- Validation: confirm `gh auth status` before attempting comment/issue posts; on failure, branch to PENDING.
+- Risks: shell quoting around the PCRE pattern (`[\x{4e00}-\x{9fff}]`) — use single-quoted argument to `git grep -P`.
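One way to sidestep the quoting risk entirely is to pass the pattern as a list element rather than through a shell. The sketch below assumes a Python wrapper (the spec itself uses a shell script); `git_grep_argv` and `run_git_grep` are hypothetical names.

```python
import subprocess

# Pattern and paths as given in the spec; the raw string keeps the backslashes.
CJK_PATTERN = r"[\x{4e00}-\x{9fff}]"
AUDIT_PATHS = ("backend/app", "frontend/src", "locales/en.json")


def git_grep_argv(pattern=CJK_PATTERN, paths=AUDIT_PATHS):
    """Build the argv list; no shell is involved, so no quoting is needed."""
    return ["git", "grep", "-nP", pattern, "--", *paths]


def run_git_grep(cwd="."):
    """Run the grep from the repo root and return the matching lines."""
    result = subprocess.run(
        git_grep_argv(), cwd=cwd, capture_output=True, encoding="utf-8"
    )
    # git grep exits 1 when nothing matches; only higher codes are errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```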
+
+#### `audit_cjk.sh`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Run the canonical PCRE grep + per-bucket counts |
+| Requirements | 1.1, 1.5 |
+
+**Responsibilities & Constraints**
+
+- Output: `cjk-grep.txt` (raw `git grep -nP` lines) and `cjk-grep-bucketed.txt` (one section per top-level path: `backend/app`, `frontend/src`, `locales/en.json`).
+- Excludes binary file matches (e.g. `.jpeg` false positives).
+
+**Dependencies**
+
+- Inbound: `run_audit.sh` (P0).
+- External: `git` 2.x (P0 — must support `-P` for PCRE).
+
+**Contracts**: Batch [x]
+
+##### Batch / Job Contract
+
+- **Trigger**: invoked by `run_audit.sh`.
+- **Input / validation**: receives the target output directory as argv[1]; aborts if missing.
+- **Output / destination**: `cjk-grep.txt`, `cjk-grep-bucketed.txt` in `<sha>/`.
+- **Idempotency & recovery**: deterministic — same tree → same output.
+
+**Implementation Notes**
+
+- Integration: pure read-only against `git`.
+- Validation: `git --version` precondition; abort with a clear error if PCRE unsupported.
+- Risks: ripgrep is NOT used (avoids a hard `rg` dependency); `git grep -P` relies on git's own PCRE support, so no extra tool is needed.
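The bucketing step can be illustrated with a short sketch. It assumes each input line has the `path:line:text` shape that `git grep -n` prints; `bucket_matches` and the `other` bucket are hypothetical additions for lines outside the three audited paths.

```python
from collections import defaultdict

# Top-level buckets named in the spec.
BUCKETS = ("backend/app", "frontend/src", "locales/en.json")


def bucket_matches(grep_lines):
    """Group raw `path:line:text` grep lines by top-level audited path."""
    buckets = defaultdict(list)
    for line in grep_lines:
        path = line.split(":", 1)[0]
        for prefix in BUCKETS:
            if path == prefix or path.startswith(prefix + "/"):
                buckets[prefix].append(line)
                break
        else:
            # Hypothetical catch-all; should stay empty for the canonical grep.
            buckets["other"].append(line)
    return dict(buckets)
```

Per-bucket counts then follow from `len()` of each list.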
+
+#### `check_parity.py`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Compare `locales/en.json` and `locales/zh.json`: key parity, CJK in EN, identical-value heuristic |
+| Requirements | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 |
+
+**Responsibilities & Constraints**
+
+- Recursively flattens nested-dict keys with dotted paths.
+- Reports three blocks: `missing-keys`, `cjk-in-en`, `identical-values`.
+- Treats values as `review-needed` only if (a) en value == zh value, (b) value is non-empty, (c) value is more than two ASCII words.
+
+**Dependencies**
+
+- Inbound: `run_audit.sh` (P0).
+- External: `json` from Python stdlib (P0).
+
+**Contracts**: Batch [x]
+
+##### Batch / Job Contract
+
+- **Trigger**: invoked by `run_audit.sh` with the `<sha>` directory as argv[1].
+- **Input / validation**: reads `locales/en.json` and `locales/zh.json` from cwd (must be invoked from repo root); fails fast on JSON parse error.
+- **Output / destination**: `parity.txt` in `<sha>/`.
+- **Idempotency & recovery**: pure function of catalogue contents.
+
+**Implementation Notes**
+
+- Integration: invoked from repo root so relative paths resolve.
+- Validation: parse-on-load, both files must be objects.
+- Risks: the "more than two ASCII words" heuristic may produce noise — `review-needed` is intentionally a soft label not a `gap`.
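As a rough illustration of the three checks, the sketch below works on already-parsed dictionaries (for example from `json.load`). The function names are hypothetical, and "more than two ASCII words" is interpreted here as more than two runs of ASCII letters, which is an assumption.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")


def flatten(obj, prefix=""):
    """Flatten nested dicts into {'a.b.c': value} with dotted key paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def is_nontrivial_prose(value):
    """Assumed reading of the heuristic: more than two ASCII-letter words."""
    return isinstance(value, str) and len(re.findall(r"[A-Za-z]+", value)) > 2


def check_parity(en, zh):
    """Return the content of the three report blocks as sorted key lists."""
    en_flat, zh_flat = flatten(en), flatten(zh)
    shared = en_flat.keys() & zh_flat.keys()
    return {
        "en-only": sorted(en_flat.keys() - zh_flat.keys()),
        "zh-only": sorted(zh_flat.keys() - en_flat.keys()),
        "cjk-in-en": sorted(
            k for k, v in en_flat.items() if isinstance(v, str) and CJK.search(v)
        ),
        "identical-values": sorted(
            k for k in shared
            if en_flat[k] == zh_flat[k] and is_nontrivial_prose(en_flat[k])
        ),
    }
```

Short identical values such as `OK` or a single identifier fall below the word threshold and are not reported, which matches the intent of keeping `review-needed` a soft label.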
+
+#### `classify.py`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Apply the 4-class label (`deliberate` / `gap` / `non-applicable` / `review-needed`) and a category tag per match |
+| Requirements | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 |
+
+**Responsibilities & Constraints**
+
+- Reads `cjk-grep.txt` and `parity.txt`; emits `classified.csv` with columns: `file`, `line`, `match`, `class`, `category`, `pipeline_step`.
+- Categories (closed set): `frontend-ui-string`, `frontend-regex-parser`, `backend-docstring`, `backend-comment`, `backend-log`, `backend-prompt-label`, `propagation`, `catalogue-parity`, `binary-false-positive`.
+- Pipeline-step tags (closed set): `Graph Build`, `Env Setup`, `Simulation`, `Report`, `Interaction`, `Logs`, `UI`, `n/a`.
+- Classification rules:
+  - `locales/en.json` CJK → always `gap` / `catalogue-parity` / `n/a` (R1.5).
+  - File path under `frontend/src/views/` or `frontend/src/components/` AND match is inside a string literal (heuristic: enclosed in `'…'`/`"…"`/`` `…` ``) → `gap` / `frontend-ui-string`.
+  - Match inside a `text.match(/.../)` call in a `.vue` file → `gap` / `frontend-regex-parser` (cause: backend emits CJK).
+  - Backend `.py` file, line starts with `#` or appears inside a triple-quoted docstring → `deliberate` / `backend-docstring` (or `backend-comment`); blocked by #7, so counted but not filed as a fresh follow-up.
+  - Backend `.py` file, line contains `logger.`, `log.`, `print(` and CJK in a string literal → `gap` / `backend-log` / appropriate step tag.
+  - Backend `.py` file in `services/{ontology,oasis_profile,simulation_config}_generator.py` or `services/report_agent.py` and CJK appears inside an LLM-prompt context label (heuristic: a string literal not preceded by `#`) → `gap` / `backend-prompt-label`.
+  - Binary files (e.g. `.jpeg` matches reported by `git grep`): `non-applicable` / `binary-false-positive`.
+  - Anything else: `review-needed` (forces a human look).
+
+**Dependencies**
+
+- Inbound: `audit_cjk.sh`, `check_parity.py` (P0).
+- External: `csv` from Python stdlib.
+
+**Contracts**: Batch [x]
+
+##### Batch / Job Contract
+
+- **Trigger**: invoked by `run_audit.sh` after the two preceding steps.
+- **Input / validation**: `cjk-grep.txt` and `parity.txt` must exist in `<sha>/`.
+- **Output / destination**: `classified.csv`.
+- **Idempotency & recovery**: deterministic — same inputs → same csv.
+
+**Implementation Notes**
+
+- Integration: classification rules are heuristics, not a parser; correctness is bounded by careful regexes and an explicit "fallthrough = `review-needed`" rule.
+- Validation: every input row produces an output row (no silent drops); a count-equality assertion runs at the end.
+- Risks: false negatives (e.g. a Chinese log string that doesn't contain `logger.` on the same line) — `review-needed` fallthrough catches these.
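A simplified sketch of the rule ordering is shown below. It is not the spec's classifier: it looks at one grep line at a time, so the docstring rule (which needs multi-line state) and the pipeline-step tag are omitted, and the empty category on the fallthrough is an assumption. `classify` is a hypothetical name.

```python
import re

PROMPT_FILES = (
    "ontology_generator.py",
    "oasis_profile_generator.py",
    "simulation_config_generator.py",
    "report_agent.py",
)
# A quoted run ('...', "..." or `...`) that contains at least one CJK ideograph.
STRING_WITH_CJK = re.compile(r"""(['"`])[^'"`]*[\u4e00-\u9fff][^'"`]*\1""")
LOG_TOKENS = ("logger.", "log.", "print(")


def classify(path, text):
    """Return (class, category) for one grep match; the first rule wins."""
    stripped = text.strip()
    if path == "locales/en.json":
        return ("gap", "catalogue-parity")
    if path.startswith(("frontend/src/views/", "frontend/src/components/")):
        if ".match(/" in stripped:
            return ("gap", "frontend-regex-parser")
        if STRING_WITH_CJK.search(stripped):
            return ("gap", "frontend-ui-string")
    if path.startswith("backend/") and path.endswith(".py"):
        if stripped.startswith("#"):
            return ("deliberate", "backend-comment")
        has_literal = STRING_WITH_CJK.search(stripped) is not None
        if has_literal and any(tok in stripped for tok in LOG_TOKENS):
            return ("gap", "backend-log")
        if has_literal and path.endswith(PROMPT_FILES):
            return ("gap", "backend-prompt-label")
    # Fallthrough forces a human look; empty category is an assumption.
    return ("review-needed", "")
```

Substring tests such as `"log."` are deliberately loose and will also hit names like `dialog.`; that is the kind of noise the heuristics accept in exchange for simplicity.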
+
+#### `render_report.py`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Produce `gap-report.md` and `comment-body.md` |
+| Requirements | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 |
+
+**Responsibilities & Constraints**
+
+- `gap-report.md`: Sections: Overview, Section 1 (static audit), Section 2 (parity), Section 3 (prompt verification), Section 4 (propagation), Section 5 (issue-#10 checklist mapping), Section 6 (ZH regression), Section 7 (follow-up plan).
+- `comment-body.md`: Markdown comment for issue #10 — mirrors the issue's checklist with `pass` / `gap` / `manual-pending` for each line, plus a "How to re-run" footer.
+- Reads `classified.csv` and the issue body (snapshot at `.ticket/10.md`).
+
+**Dependencies**
+
+- Inbound: `classify.py` (P0), `.ticket/10.md` (P0).
+- External: Python stdlib only.
+
+**Contracts**: Batch [x]
+
+##### Batch / Job Contract
+
+- **Trigger**: `run_audit.sh` after `classify.py`.
+- **Input / validation**: `classified.csv` and `.ticket/10.md` must exist.
+- **Output / destination**: `gap-report.md`, `comment-body.md` in `<sha>/`.
+- **Idempotency & recovery**: deterministic.
+
+**Implementation Notes**
+
+- Integration: the comment body must include a `Run on commit <sha>` header so the comment is traceable.
+- Validation: confirm every issue-body checkbox has been mapped (count check).
+- Risks: rendering CJK characters in markdown. Python's default text encoding is platform-dependent, so output files should be opened with `encoding="utf-8"` explicitly; comment body is verified to round-trip via `gh`.
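The checkbox count check can be sketched as below. The helper names and the `- <status>: <item>` output shape are hypothetical; the only assumption about the issue snapshot is that it uses GitHub task-list syntax (`- [ ]` / `- [x]`).

```python
import re

# One GitHub task-list line, e.g. "- [ ] Step 1 UI renders in English".
CHECKBOX = re.compile(r"^\s*[-*] \[[ xX]\] (.+)$", re.MULTILINE)


def checklist_items(issue_body):
    """Return checkbox labels in the order they appear in the issue body."""
    return [m.group(1).strip() for m in CHECKBOX.finditer(issue_body)]


def render_checklist(issue_body, statuses):
    """Map every checkbox to pass / gap / manual-pending, or fail loudly."""
    items = checklist_items(issue_body)
    missing = [item for item in items if item not in statuses]
    if missing:
        raise ValueError(f"unmapped checklist items: {missing}")
    return "\n".join(f"- {statuses[item]}: {item}" for item in items) + "\n"
```

Writing the result with `pathlib.Path.write_text(body, encoding="utf-8")` keeps the CJK evidence lines intact regardless of platform defaults.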
+
+#### `post_comment.sh`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Post `comment-body.md` as a comment on issue #10 |
+| Requirements | 5.5 |
+
+**Responsibilities & Constraints**
+
+- `gh issue comment 10 --repo salestech-group/MiroFish --body-file <sha>/comment-body.md`.
+- On non-zero exit, copies the body to `<sha>/PENDING-issue-10-comment.md` and exits non-zero.
+
+**Dependencies**
+
+- External: `gh` (P0; degrades to PENDING when missing).
+
+**Contracts**: Service [x]
+
+##### Service Interface
+
+```text
+post_comment.sh <sha-dir>
+  precondition: <sha-dir>/comment-body.md exists
+  postcondition (success): comment posted; URL printed to stdout
+  postcondition (failure): <sha-dir>/PENDING-issue-10-comment.md present; exit code 2
+```
+
+**Implementation Notes**
+
+- Integration: must be the second-to-last step (so failures don't block the issue-filing fallback).
+- Validation: parses `gh`'s URL output and writes it to `<sha>/comment-url.txt` on success.
+- Risks: GitHub API rate limits, unlikely to matter for a single comment.
+
+#### `file_followups.sh`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Open one follow-up issue per gap category |
+| Requirements | 7.1, 7.2, 7.4, 7.5 |
+
+**Responsibilities & Constraints**
+
+- Iterates `<sha>/PENDING-followups/*.md` (which `render_report.py` always writes; the ones whose category had zero gaps stay empty placeholders).
+- For each non-empty body, runs `gh issue create --repo salestech-group/MiroFish --title <title> --body-file <body> --label i18n`.
+- On `gh` failure for any single category, leaves the corresponding `PENDING-followups/<n>-*.md` in place and exits non-zero at the end (after attempting all categories).
+
+**Dependencies**
+
+- External: `gh` (P0; degrades to PENDING).
+
+**Contracts**: Service [x]
+
+##### Service Interface
+
+```text
+file_followups.sh <sha-dir>
+  precondition: <sha-dir>/PENDING-followups/*.md exist (possibly empty placeholders)
+  postcondition (success): all non-empty bodies posted; URLs appended to <sha-dir>/followup-urls.txt; bodies removed from PENDING-followups/
+  postcondition (partial): URLs in followup-urls.txt for the ones that posted; the rest stay in PENDING-followups/; exit code 2
+```
+
+**Implementation Notes**
+
+- Integration: must be the last step.
+- Validation: post-hoc count check (`gh` URLs + remaining PENDING bodies = total categories).
+- Risks: a category that the spec already considers covered (e.g. backend docstrings → blocked by #7) is not re-filed; the spec's category list is closed and excludes that case.
+
+## Data Models
+
+### Domain Model
+
+The audit operates on three logical concepts:
+
+- **Match** — a single line of `git grep` output. `(file, line, raw_text)`.
+- **Classification** — `(match, class ∈ {deliberate, gap, non-applicable, review-needed}, category ∈ closed-set, pipeline_step ∈ closed-set)`.
+- **Follow-up** — `(category, title, body, status ∈ {posted, pending}, url?)`.
+
+Invariant: every `Match` produces exactly one `Classification`; every `Classification` with `class == gap` belongs to exactly one `Follow-up` category (which may aggregate multiple gaps).
+
+### Logical Data Model
+
+**`classified.csv` schema** (CSV, UTF-8, header row):
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `file` | string | repo-relative path |
+| `line` | int | 1-indexed |
+| `match` | string | trimmed grep line |
+| `class` | enum | `deliberate` / `gap` / `non-applicable` / `review-needed` |
+| `category` | enum | closed set listed in classify.py rules |
+| `pipeline_step` | enum | closed set listed in classify.py rules |
+
+Natural key: `(file, line)`.
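A minimal writer for this schema might look like the following. `write_classified` is a hypothetical name, and enforcing the natural key at write time is an assumption rather than something the spec requires.

```python
import csv

COLUMNS = ["file", "line", "match", "class", "category", "pipeline_step"]


def write_classified(rows, stream):
    """Write rows with a header, rejecting duplicate (file, line) keys."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    seen = set()
    for row in rows:
        key = (row["file"], int(row["line"]))
        if key in seen:
            raise ValueError(f"duplicate natural key: {key}")
        seen.add(key)
        writer.writerow(row)
```

When writing to a real file, open it with `encoding="utf-8"` and `newline=""`, as the `csv` module documentation recommends.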
+
+**`parity.txt` structure** (text, three labelled blocks):
+
+```
+[missing-keys]
+en-only:  <key.path>
+zh-only:  <key.path>
+[cjk-in-en]
+<key.path>: <value snippet>
+[identical-values]
+<key.path>: <value>   # review-needed if non-trivial English prose
+```
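Reading this layout back is straightforward. The sketch below is one possible parser, with `parse_parity` a hypothetical name; it assumes block headers are the only lines that both start with `[` and end with `]`.

```python
def parse_parity(text):
    """Split parity.txt into {block-name: [entry lines]}."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # New labelled block, e.g. [missing-keys].
            current = line[1:-1]
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return blocks
```

An empty list for `missing-keys` and `cjk-in-en` corresponds to the passing state described for the current catalogues.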
+
+### Data Contracts & Integration
+
+- **`comment-body.md`** must be valid GitHub-flavoured Markdown; checkbox lines preserve the issue's original ordering.
+- **Follow-up issue body** must be valid GitHub-flavoured Markdown; first line is a one-sentence summary; subsequent sections are: `## Evidence` (file:line list), `## Linked from` (#10 + comment URL), `## Acceptance` (a small checklist).
+
+## Error Handling
+
+### Error Strategy
+
+- **Read-only operations** (steps 1–4): on any uncaught error (missing file, JSON parse error), the script aborts with a non-zero exit before any artefact is half-written. The orchestrator uses `set -euo pipefail`.
+- **GitHub side effects** (steps 5–6): wrapped — failure routes to PENDING outputs and the orchestrator exits `2`.
+
+### Error Categories and Responses
+
+- **User errors**: invoked from wrong directory → fail fast with "must be run from repo root".
+- **System errors**: `git`/`python3` missing → fail fast with "install <tool>"; `gh` missing or `gh auth status` not OK → branch to PENDING.
+- **Business errors**: classification produces 0 matches but `cjk-grep.txt` non-empty → assertion failure (count-equality bug).
+
+### Monitoring
+
+- The orchestrator prints a one-line status per step.
+- Final summary block to stdout: total matches, gaps, `manual-pending`, follow-ups posted vs PENDING.
+
+## Testing Strategy
+
+- **Unit tests**: not introduced — the scripts are simple enough that a one-shot dry run on the live tree is the canonical validation.
+- **Integration test**: a single `bash run_audit.sh` against the working tree; success criteria below.
+- **Validation checklist** (run during implementation):
+  - The audit produces a non-empty `cjk-grep.txt`.
+  - `parity.txt` reports 0 missing keys (matches the live state at HEAD).
+  - `classified.csv` row count == `cjk-grep.txt` line count.
+  - `gap-report.md` and `comment-body.md` parse as valid markdown (manual eyeball — no toolchain required).
+  - The classifier marks every `locales/en.json` CJK as `gap` (currently zero such matches, so this asserts the negative).
+  - With `gh` available: a comment is posted on #10 and follow-up issues are created.
+  - With `gh` simulated as absent (e.g. a `PATH` that still resolves `git` and `python3` but not `gh`): PENDING outputs appear under `<sha>/`.
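The row-count item in the checklist above lends itself to a small automated check. This is a sketch under the assumption that `cjk-grep.txt` holds one match per non-blank line; `assert_row_parity` is a hypothetical helper.

```python
import csv


def assert_row_parity(grep_path, csv_path):
    """Raise if classified.csv does not have one data row per grep line."""
    with open(grep_path, encoding="utf-8") as fh:
        grep_count = sum(1 for line in fh if line.strip())
    with open(csv_path, encoding="utf-8", newline="") as fh:
        # DictReader consumes the header row, so only data rows are counted.
        csv_count = sum(1 for _ in csv.DictReader(fh))
    if grep_count != csv_count:
        raise AssertionError(
            f"row mismatch: {grep_count} grep lines vs {csv_count} csv rows"
        )
    return grep_count
```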
+
+### Out of scope for testing
+
+- The live UI walkthrough is `manual-pending` (R5.3) and not part of the test plan.
+- Performance, scalability, security: nothing to test — read-only single-shot scripts.
--- a/.kiro/specs/i18n-e2e-english-verification/gap-analysis.md
+++ b/.kiro/specs/i18n-e2e-english-verification/gap-analysis.md
@ -0,0 +1,136 @@
+# Gap Analysis — i18n-e2e-english-verification
+
+## 1. Current state investigation
+
+### Domain-relevant assets in the repo
+
+| Concern | Location | Notes |
+|---|---|---|
+| Locale catalogues | `locales/en.json`, `locales/zh.json`, `locales/languages.json` | Flat-namespaced JSON, loaded by `vue-i18n` and the backend logger. |
+| Frontend i18n loader | `frontend/src/i18n/` | Provides `useI18n()` to components. |
+| Frontend UI surface | `frontend/src/views/`, `frontend/src/components/` | Step1–5 components + `Process.vue` orchestrator. |
+| Backend logger | `backend/app/utils/logger.py` (per CLAUDE.md) | Externalised log messages (#6 work). |
+| Locale helpers | `backend/app/utils/` | Per CLAUDE.md, locale propagation lives here. |
+| Prompt assets that emit user-visible text | `backend/app/services/ontology_generator.py` (#2, #3?), `oasis_profile_generator.py` (#3), `simulation_config_generator.py` (#4), `report_agent.py` (#5) | Prompts are inline Python strings, not separate files. |
+| Pipeline boundaries | `backend/app/api/*.py` (Flask), `services/simulation_runner.py` + `simulation_ipc.py` (subprocess), `services/report_agent.py` (ReACT) | Locale must propagate across all of these. |
+
+### Project conventions surfaced
+
+- `Task` model used for any long-running operation (CLAUDE.md). Verification doesn't introduce one — it is a one-shot batch.
+- Reasoning-model output stripping convention exists, irrelevant here.
+- Per-project `group_id` isolation in Neo4j — verification queries should NOT touch Neo4j; we run a static audit only.
+- "Match the surrounding file's style" (no enforced formatter).
+
+### Live audit baseline (commit `9dcaecd`)
+
+```
+git grep -nP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json | wc -l
+→ 2918 lines across 36 files
+```
+
+Bucketed:
+
+| Bucket | Files | Lines | Notes |
+|---|---|---|---|
+| `locales/en.json` | 0 | 0 | ✅ clean |
+| `frontend/src/views/Process.vue` | 1 | 65 | hard-coded UI strings (template + JS literals), not i18n keys |
+| `frontend/src/components/Step{2,3,4,5}*.vue` | 4 | ~50 (mostly Step4Report.vue regex parsers) | depends-on-backend regex parsers + a few literals |
+| `backend/app/services/*.py` | 13 | majority | docstrings + comments + a few prompt assembly fragments + agent context labels (e.g. `"事实信息:"` in `oasis_profile_generator.py`) |
+| `backend/app/api/*.py` | 4 | many | docstrings + comments + log-message Chinese (`build_logger.info(f"[{task_id}] 开始构建图谱...")` etc) |
+| `backend/app/utils/*.py` | 7 | many | docstrings + comments + log strings (e.g. `retry.py` "函数 {func} 在 N 次重试后仍失败") |
+| `backend/app/models/*.py` | 3 | docstrings | docstrings only (probably) |
+
+### Locale catalogue parity (Python check)
+
+```
+en keys: 953
+zh keys: 953
+symmetric diff: 0
+```
+
+→ R2 (parity) passes. ZH backfill (#8) closed the gap and en/zh are now lock-step.
+
+### Boundary review surface (R4)
+
+- `backend/app/api/graph.py` `build_logger.info(f"[{task_id}] 开始构建图谱...")` shows the backend logger is still emitting Chinese on the build path — this is exactly the kind of leak #6 was supposed to externalise.
+- `backend/app/utils/retry.py` `logger.error(f"函数 {func.__name__} 在 {max_retries} 次重试后仍失败...")` — same: log strings remain hard-coded Chinese.
+- ReACT/agent context labels in `oasis_profile_generator.py` (`"事实信息:"`, `"相关实体:"`) feed directly into the LLM prompt — these will bias the model toward Chinese output.
+
+## 2. Requirements feasibility
+
+### Mapping requirements → existing assets
+
+| Req | Need | Existing asset | Gap tag |
+|---|---|---|---|
+| R1 (static audit) | run `git grep` and capture output | git, ripgrep | None — straightforward |
+| R1.5 (`en.json` CJK check) | inspect catalogue | already at 0 hits | None — passes |
+| R2 (parity) | enumerate keys recursively, diff | small Python script | None — already passes |
+| R3 (prompt verification) | read prompt strings in `services/*.py` | inline Python strings | **Constraint** — prompts are inline, not standalone files; verification must read source not assets |
+| R4 (propagation) | trace locale across Flask → Task → OASIS → ReACT | source code review | **Research needed** in design phase: where exactly is locale stored today? CLAUDE.md hints `set_locale` thread-local exists but path not yet read |
+| R5 (post comment) | `gh issue comment 10` | `gh` CLI | None |
+| R6 (ZH regression) | confirm zh values are non-English | small Python script | None |
+| R7 (file follow-ups) | `gh issue create` | `gh` CLI | None |
+| R8 (capture & idempotence) | write under `.kiro/specs/.../audit/` | filesystem | None |
+
+### Complexity signals
+
+- Algorithmic: trivial — grep + count + diff.
+- Workflow: post a comment + open follow-up issues — one-shot.
+- External integrations: GitHub via `gh`. No DB, no Neo4j, no LLM calls.
+
+### Constraints from existing architecture
+
+- **No code edits to `backend/app/`, `frontend/src/`, `locales/`** — the spec is verification-only. The change-set is confined to `.kiro/specs/i18n-e2e-english-verification/` (audit captures, gap report, follow-up issue list) and any commit message / PR description.
+- Manual UI walkthrough is not feasible in a sandboxed CLI — must be marked `manual-pending` per R5.3.
+- Live `docker-compose up` likewise unavailable — same handling.
+
+## 3. Implementation approach options
+
+### Option A — Pure shell + Python script kept under `.kiro/specs/.../audit/`
+
+- A single Bash + Python pipeline that emits `audit/cjk-grep.txt`, `audit/parity.txt`, `audit/gap-report.md`.
+- Posts the comment via `gh` and opens follow-ups via `gh issue create`.
+- Scripts are read-only against production source.
+
+✅ Simplest, no production-code touch.
+✅ Easy to re-run.
+❌ Scripts only relevant to this ticket — scoped to `.kiro/specs/.../audit/scripts/`, not promoted to a reusable `tools/`.
+
+### Option B — Build a reusable `tools/i18n-audit/` checker
+
+- Create a permanent CLI under `tools/` so future verifiers can re-run.
+- Integrates with CI (could become a check that fails when `en.json` contains CJK).
+
+❌ Adds a tool & directory the project doesn't have. Scope creep — the spec is for one verification pass, not a CI check.
+❌ A reusable tool wants its own ticket; ramming it in here violates the "no inline fixes" rule.
+
+### Option C — Hybrid: ad-hoc script for this run, plus open a follow-up issue requesting the reusable CI check
+
+- Run the verification with disposable scripts (Option A) AND file a follow-up issue asking for the reusable CI check (Option B as a future ticket).
+
+✅ Keeps current ticket scoped.
+✅ Captures the value of B without bloating this PR.
+
+## 4. Out-of-scope items deferred
+
+- Any **production code edits** that would close gaps. R7 makes this explicit.
+- Live UI walkthrough / dynamic verification — captured as `manual-pending` in the report.
+
+## 5. Effort & risk
+
+- **Effort**: S (1 day) — auditing scripts + report writing + issue filings.
+- **Risk**: Low — read-only operations, no architectural change, the failure mode (`gh` lacking permissions) is handled by R7.5 (fallback inline list).
+
+## 6. Recommendations for design phase
+
+- **Preferred approach**: Option C (hybrid).
+- **Key decisions to make in design**:
+  - Concrete script layout under `.kiro/specs/i18n-e2e-english-verification/audit/`.
+  - Format of `audit/gap-report.md` (the artefact echoed into the issue comment).
+  - Exact follow-up issue grouping rule (R7.2): one issue per pipeline step? per file? per category (UI / logs / prompts / docstrings)?
+  - Reproducibility (R8.2): do we keep `audit/<commit-sha>/` per run, or `audit/latest/` + `audit/previous/`?
+  - Whether the scripts are committed to the repo (they live under `.kiro/specs/...` — yes by default) or only the captured outputs.
+- **Research items to carry forward**:
+  - Read `backend/app/utils/` to confirm whether a locale helper / `set_locale` exists today (R4 detail).
+  - Read `backend/app/utils/logger.py` to confirm where externalised log keys live and how the locale is selected at log time (R4 + Step-1 logs checklist item).
+  - Confirm whether any `services/*.py` Chinese match is part of an LLM **prompt** vs a comment — only prompt matches block R3.
--- a/.kiro/specs/i18n-e2e-english-verification/requirements.md
+++ b/.kiro/specs/i18n-e2e-english-verification/requirements.md
@ -0,0 +1,122 @@
+# Requirements Document
+
+## Project Description (Input)
+Issue #10: i18n end-to-end verification of full pipeline. Run a verification pass to prove the entire 5-step pipeline (Graph Build, Env Setup, Simulation, Report, Interaction) works cleanly in English, with locale propagating across Flask routes, background tasks, OASIS subprocess, Graphiti/Neo4j, and the ReACT report agent. Produce a verification report (posted as a comment on issue #10) summarising pass/fail per checklist item and listing any leftover Chinese strings as `file:line` refs. Run the static audit `git grep -nP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json` and confirm only deliberately-kept Chinese remains. File any newly discovered gaps as follow-up issues (do NOT patch silently in this ticket). Acceptance: all checklist items pass for both EN and ZH; report posted; no surprise Chinese in EN paths. Out of scope: fixing newly discovered gaps inline; perf/load testing; new locales beyond EN/ZH.
+
+## Introduction
+
+This spec covers the final verification pass for the i18n epic (#11). After issues #2–#9, #12 land, the entire 5-step MiroFish pipeline must demonstrably run in English — UI, background work, LLM-generated artifacts (ontologies, agent profiles, sim configs, reports, chat replies), and backend logs — without any unintended Chinese leaking into English-locale paths. The pass also regression-checks that switching locale back to Chinese still produces fully Chinese output. Because the pipeline crosses a Flask app, background `Task` workers, an OASIS subprocess, Graphiti/Neo4j, and a ReACT report agent, the verification has both a static (grep + locale-file) component and a dynamic (live walkthrough of Step 1 → 5) component.
+
+The deliverables are: (a) a static audit + categorization of any remaining Chinese strings under English paths, (b) a verification report posted as a comment on issue #10 summarising pass/fail per checklist item with `file:line` evidence, and (c) follow-up GitHub issues for every gap found — fixes are explicitly **out of scope** here.
+
+## Boundary Context
+
+- **In scope**:
+  - Static audit (`git grep` for CJK Unified Ideographs) of `backend/app/`, `frontend/src/`, and `locales/en.json`.
+  - Inspection of locale catalogues (`locales/en.json`, `locales/zh.json`) for parity, key coverage, and accidental Chinese in the EN catalogue.
+  - Inspection of LLM-prompt assets that drive Step 1–5 outputs (ontology, profile, sim-config, report-agent prompts) to confirm they emit English under EN locale.
+  - Inspection of locale propagation paths: HTTP request → Flask handler → `Task` background worker → OASIS subprocess → ReACT agent.
+  - Verification report posted as a comment on issue #10.
+  - Follow-up issues filed for every gap found.
+- **Out of scope**:
+  - Fixing any newly discovered gaps inline in this ticket — they are filed as separate issues.
+  - Performance or load testing.
+  - Adding new locales beyond EN/ZH.
+  - The live UI walkthrough with screenshots, when no human or browser is available — the static audit results plus prompt/locale-catalogue evidence stand in. The verification report explicitly marks UI-only checklist items as "manual-pending" if not run live.
+- **Adjacent expectations**:
+  - Closes the i18n epic #11 once #12 also lands.
+  - Depends on (and re-verifies) the work in #2, #3, #4, #5, #6, #8, #9, #12.
+
+## Requirements
+
+### Requirement 1: Static CJK audit of English code paths
+
+**Objective:** As an i18n verifier, I want a deterministic grep-based audit of files that should be English-only, so that any Chinese leaking into the EN-locale code path is detected and recorded.
+
+#### Acceptance Criteria
+
+1. The Verification System shall execute `git grep -nP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json` and capture every match with `file:line` precision.
+2. The Verification System shall classify each match as one of: (a) `deliberate` (e.g. test fixture demonstrating ZH input, doc example, comment explicitly retained per project convention), (b) `gap` (unintended Chinese in EN-facing code), or (c) `non-applicable` (false positive such as a regex character class).
+3. When a match is classified as `gap`, the Verification System shall record `file:line`, the Chinese substring, and the affected pipeline step (Graph Build / Env Setup / Simulation / Report / Interaction / Logs / UI).
+4. The Verification System shall not modify any matched file as part of this audit; remediation is filed as a follow-up issue per Requirement 7.
+5. While the audit is running, the Verification System shall additionally inspect `locales/en.json` for entries whose value contains CJK characters and report those separately (an EN catalogue value containing Chinese is always a `gap`).
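A minimal sketch of the scan described in criteria 1 and 5, written in Python rather than `git grep` so it can be run in isolation. The code-point range is the one from the grep pattern; the function names `find_cjk` and `cjk_values` are hypothetical and are not taken from the audit scripts.

```python
import re

# Same code-point range as the grep pattern in criterion 1 (CJK Unified Ideographs).
CJK = re.compile(r"[\u4e00-\u9fff]")


def find_cjk(path, text):
    """Yield ("file:line", cjk_chars) for every line of text that contains CJK."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        chars = CJK.findall(line)
        if chars:
            # All CJK characters found on the line, joined in order of appearance.
            yield f"{path}:{lineno}", "".join(chars)


def cjk_values(catalogue, prefix=""):
    """Return dotted key paths whose string value contains CJK (criterion 5)."""
    found = []
    for key, value in catalogue.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            found.extend(cjk_values(value, path))
        elif isinstance(value, str) and CJK.search(value):
            found.append(path)
    return found
```

Passing the parsed contents of `locales/en.json` to `cjk_values` would give the separate report that criterion 5 asks for; an empty list corresponds to a clean catalogue.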
+
+### Requirement 2: Locale catalogue parity check
+
+**Objective:** As an i18n verifier, I want to confirm that the EN and ZH catalogues stay in lockstep, so that switching locale never falls back to a missing key or leaks the other locale.
+
+#### Acceptance Criteria
+
+1. The Verification System shall enumerate the key set of `locales/en.json` and `locales/zh.json` (recursively across nested objects) and compute the symmetric difference.
+2. If a key is present in `en.json` but missing from `zh.json` (or vice versa), the Verification System shall record the missing key path and treat it as a `gap`.
+3. If any value in `en.json` contains a CJK character, the Verification System shall record it as a `gap` (as in Requirement 1.5).
+4. If any value in `zh.json` is identical to its `en.json` counterpart and the EN value is non-trivial English prose (more than two ASCII words), the Verification System shall flag it as a candidate untranslated entry — these are reported as `review-needed`, not auto-classified `gap`, since some technical terms (URLs, identifiers, single tokens) legitimately stay identical.
+5. The Verification System shall not edit either catalogue file as part of this check.
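The parity rules above can be sketched as follows. This is an illustration under stated assumptions, not the project's `check_parity.py`: the helper names are hypothetical, and "more than two ASCII words" is interpreted here as an all-ASCII value with more than two whitespace-separated tokens, so URLs and single identifiers are not flagged.

```python
def flatten(catalogue, prefix=""):
    """Flatten nested dicts into {dotted.key.path: value}."""
    out = {}
    for key, value in catalogue.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def looks_like_prose(value):
    # Assumed reading of "non-trivial English prose": ASCII only, more than two tokens.
    return isinstance(value, str) and value.isascii() and len(value.split()) > 2


def parity(en, zh):
    """Return (missing_keys, identical_values) for two nested catalogues."""
    en_flat, zh_flat = flatten(en), flatten(zh)
    # Symmetric difference: keys present in exactly one catalogue (criteria 1 and 2).
    missing = sorted(set(en_flat) ^ set(zh_flat))
    # Candidate untranslated entries (criterion 4): same value on both sides.
    identical = sorted(
        key for key in set(en_flat) & set(zh_flat)
        if en_flat[key] == zh_flat[key] and looks_like_prose(en_flat[key])
    )
    return missing, identical
```

Entries in `missing` map to `gap`; entries in `identical` map to `review-needed`, matching the split in criteria 2 and 4.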
+
+### Requirement 3: LLM-prompt locale verification
+
+**Objective:** As an i18n verifier, I want to confirm that every LLM prompt that drives a Step 1–5 output respects the requested locale, so that ontology entries, agent profiles, simulation configs, report prose, and chat replies render in the user's selected language.
+
+#### Acceptance Criteria
+
+1. The Verification System shall enumerate the prompt files that produce user-visible output for Steps 1–5 (e.g. ontology generator, OASIS profile generator, simulation-config generator, report agent prompts, interview chat).
+2. For each prompt file, the Verification System shall confirm that it either (a) is fully English with an explicit "respond in ${locale}" directive, or (b) is rendered through a locale-aware template that injects the active locale.
+3. If a prompt file hard-codes a Chinese-only directive (e.g. "请用中文回答") on the EN code path, the Verification System shall record it as a `gap`.
+4. The Verification System shall confirm that the prompt files referenced by issues #3, #4, #5 are no longer Chinese-only post-merge; if any still are, they are recorded as `gap` blocking #10.
+
+### Requirement 4: Locale propagation surface review
+
+**Objective:** As an i18n verifier, I want to confirm that the active locale survives every process boundary, so that an EN request still produces EN output after it crosses into a `Task` worker, the OASIS subprocess, or the ReACT agent.
+
+#### Acceptance Criteria
+
+1. The Verification System shall identify each handoff boundary: HTTP → Flask handler, Flask handler → `Task` worker, `Task` worker → OASIS subprocess, ReACT agent → tool calls.
+2. For each handoff, the Verification System shall confirm that the locale is either (a) carried explicitly in the call payload / kwargs, or (b) re-derived deterministically (e.g. from per-project config, `Accept-Language` header, or `set_locale` thread-local equivalent) on the receiving side.
+3. If a boundary discards the locale and the receiving side defaults silently to Chinese (or any non-EN locale) under an EN request, the Verification System shall record the boundary as a `gap`.
+4. The Verification System shall examine the backend logger to confirm that log messages on the EN code path resolve to English templates (depends on #6).
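To illustrate why criterion 2 insists on an explicit carry or a deterministic re-derivation, here is a small sketch of a thread-local locale being lost at a thread boundary. It is not the project's code: `set_locale` is named in the spec, but this implementation, `get_locale`, `run_task`, and the `"zh"` default are assumptions chosen to mirror the failure mode in criterion 3.

```python
import threading

_state = threading.local()


def set_locale(locale):
    _state.locale = locale


def get_locale(default="zh"):
    # Assumed default; a thread that never called set_locale() falls back to it.
    return getattr(_state, "locale", default)


def run_task(work, locale=None):
    """Run work() in a worker thread, re-establishing the locale if it was passed."""
    result = {}

    def target():
        if locale is not None:
            set_locale(locale)  # explicit carry across the boundary
        result["value"] = work()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["value"]
```

Calling `run_task(get_locale)` after `set_locale("en")` in the request thread returns the default, because thread-local state does not follow the work into the new thread; passing `locale="en"` restores it. The same reasoning applies, with different mechanics, to a subprocess boundary.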
+
+### Requirement 5: Verification report comment on issue #10
+
+**Objective:** As the issue owner, I want a single canonical verification report posted as a comment on issue #10, so that reviewers can see pass/fail per checklist item and trace every `gap` to a `file:line` and a follow-up issue.
+
+#### Acceptance Criteria
+
+1. When the static audit, parity check, prompt verification, and propagation review are complete, the Verification System shall compose a markdown comment on issue #10 that lists every checklist item from the ticket body with one of the statuses `pass` / `gap` / `manual-pending`.
+2. For each `gap` status, the comment shall include `file:line` references and a link to the follow-up issue filed per Requirement 7.
+3. For each `manual-pending` status, the comment shall state explicitly that the item requires a live UI walkthrough (or full-stack run) which was not performed in this verification environment, and shall list the exact reproduction steps the next reviewer needs to run.
+4. The comment shall include the raw output (or a path to the captured output) of the `git grep` audit so future verifiers can diff against the baseline.
+5. The Verification System shall post the comment using `gh issue comment 10 --repo salestech-group/MiroFish` and shall record the resulting comment URL in the spec / commit message.
+
+### Requirement 6: ZH regression check
+
+**Objective:** As an i18n verifier, I want to confirm that the ZH locale still renders fully Chinese, so that the EN work has not regressed the original-language experience.
+
+#### Acceptance Criteria
+
+1. The Verification System shall confirm that `locales/zh.json` covers every key present in `locales/en.json` (Requirement 2) so that no UI string falls back to English under ZH.
+2. The Verification System shall confirm that prompts rendered through locale-aware templates produce a Chinese variant when locale=zh (i.e. the templating mechanism is symmetric between EN and ZH).
+3. If a UI string is English-only under ZH (i.e. `zh.json` value is identical to the EN value and the value is non-trivial English prose), the Verification System shall flag it per Requirement 2.4 as `review-needed`.
+4. The Verification System shall record any ZH-specific regression as a separate `gap` and file a follow-up issue per Requirement 7.
+
+### Requirement 7: Follow-up issues for every discovered gap
+
+**Objective:** As the project owner, I want every gap discovered in this verification pass tracked as its own GitHub issue, so that fixes are sequenced separately and #10 stays scoped to verification only.
+
+#### Acceptance Criteria
+
+1. When a `gap` is recorded by Requirements 1–6, the Verification System shall file a GitHub issue against `salestech-group/MiroFish` containing: a one-sentence summary, the affected pipeline step, the `file:line` evidence, and a link back to issue #10 and to the verification report comment.
+2. If grouping is sensible (e.g. five `gap`s in a single locale-catalogue file), the Verification System shall consolidate them into a single follow-up issue with a checklist body, instead of filing five micro-issues.
+3. The Verification System shall not patch any gap inline in this ticket; the spec change-set must be limited to the verification artefacts (spec docs + report capture under `.kiro/specs/i18n-e2e-english-verification/`) and must not modify production source files under `backend/app/`, `frontend/src/`, or `locales/`.
+4. The Verification System shall label every follow-up issue with the `i18n` label (and `bug` if the gap is regressing existing behaviour) so they aggregate under the i18n epic.
+5. If the verification environment cannot file issues (e.g. no `gh` permissions), the Verification System shall list the would-be issues inline in the verification report as a fallback so a human can file them, and shall mark the corresponding checklist item `gap-pending-issue` instead of `gap`.
+
+### Requirement 8: Reproducibility and idempotence
+
+**Objective:** As a future verifier, I want this verification pass to be re-runnable, so that we can re-baseline after each subsequent merge to the i18n epic.
+
+#### Acceptance Criteria
+
+1. The Verification System shall capture the raw audit output to `.kiro/specs/i18n-e2e-english-verification/audit/` so the next verifier can diff against the previous run.
+2. While a previous capture exists, the Verification System shall preserve it (timestamped or under a `previous/` subdirectory) rather than overwriting it silently.
+3. The Verification System shall record the commit SHA at the time of the audit so the report comment can be tied to a specific tree state.
+4. If the audit is re-run and the gap set is unchanged, the Verification System shall produce a no-op report comment that confirms parity rather than spamming a new gap list.
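One way to decide whether a re-run is a no-op (criterion 4) is to compare the `gap` rows of two captures. The sketch below assumes the `classified.csv` column names listed in task 2.3 of `tasks.md` (`file,line,match,class,category,pipeline_step`); the function names are hypothetical.

```python
import csv
import io


def gap_set(classified_csv_text):
    """Return {(file, line, match)} for rows whose class is 'gap'."""
    rows = csv.DictReader(io.StringIO(classified_csv_text))
    return {(r["file"], r["line"], r["match"]) for r in rows if r["class"] == "gap"}


def compare_runs(previous_csv, current_csv):
    """Summarise how the gap set changed between two captures."""
    prev, cur = gap_set(previous_csv), gap_set(current_csv)
    return {
        "unchanged": prev == cur,
        "new": sorted(cur - prev),
        "resolved": sorted(prev - cur),
    }
```

Keying on the line number means an unrelated edit that shifts lines shows up as one resolved and one new gap; keying on `(file, match)` alone would be less noisy but could hide duplicates. Which trade-off fits is a design choice this sketch does not settle.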
--- a/.kiro/specs/i18n-e2e-english-verification/research.md
+++ b/.kiro/specs/i18n-e2e-english-verification/research.md
@ -0,0 +1,112 @@
+# Research & Design Decisions — i18n-e2e-english-verification
+
+## Summary
+
+- **Feature**: `i18n-e2e-english-verification`
+- **Discovery Scope**: Extension (verification-only against existing i18n surface)
+- **Key Findings**:
+  - `locales/en.json` is already CJK-clean (0 hits) and `locales/zh.json` is at perfect parity (953/953 keys).
+  - Bulk of remaining CJK is in backend Python source (~26 files across `services/`, `api/`, `utils/`, `models/`) — overwhelmingly docstrings, comments, and a non-trivial number of log strings + LLM-prompt context labels. This is blocked by issue #7 (translate Chinese docstrings/comments).
+  - Frontend `Process.vue` still has ~65 hard-coded Chinese strings in template/JS literals (not routed through `t()` keys); 4 step components have a smaller surface (mainly Step4Report's regex parsers that match Chinese backend output).
+  - Live UI/full-stack walkthrough is not feasible in this sandboxed CLI environment — that portion of the verification will be reported as `manual-pending` with reproduction steps.
+
+## Research Log
+
+### Audit baseline
+
+- **Context**: R1 requires running the canonical `git grep` audit and bucketing the matches.
+- **Sources consulted**: ripgrep / `git grep -P` against the working tree at `9dcaecd` (HEAD of `docs/i18n-9-translate-frontend-comments`).
+- **Findings**:
+  - Total CJK lines: **2918** across **36** files (counting 2 binary `.jpeg` false positives that ripgrep matches when scanning the assets folder).
+  - Bucket distribution: `locales/en.json` 0 / `frontend/src` 7 files (5 source + 2 binary) / `backend/app` 29 files.
+  - The shell-style regex `[\x{4e00}-\x{9fff}]` in the issue body must be passed to `git grep` with `-P` (PCRE) — POSIX ERE rejects `\x{...}` ranges. The verification scripts must use `-P` or document the deviation.
+- **Implications**: The audit script must use PCRE; binary files should be excluded explicitly so the `.jpeg` false positives do not pollute the gap report.
+
+### Locale-catalogue parity
+
+- **Context**: R2 demands key-set parity between `en.json` and `zh.json`.
+- **Sources consulted**: small Python diff over the catalogues (recursive nested-dict key flattening).
+- **Findings**: 953 keys each, symmetric difference 0. Already passing.
+- **Implications**: R2.1 and R2.2 will trivially pass; the R2.4 check (untranslated-but-identical entries) still needs to be run.
+
+### Locale propagation surface
+
+- **Context**: R4 requires confirming that locale survives Flask handler → `Task` → OASIS subprocess → ReACT agent.
+- **Sources consulted**: `backend/app/api/graph.py`, `backend/app/services/` skim, CLAUDE.md (mentions `set_locale` thread-local).
+- **Findings**:
+  - `backend/app/api/graph.py` (line 385 and elsewhere) still emits Chinese log strings inline (`build_logger.info(f"[{task_id}] 开始构建图谱...")`); the log externalisation work (#6) did not reach these call sites.
+  - `backend/app/utils/retry.py` log strings are still hard-coded Chinese (`logger.error(f"函数 {func.__name__} ...")`).
+  - `oasis_profile_generator.py` LLM-prompt context labels (`"事实信息:"`, `"相关实体:"`) feed into the agent prompt verbatim — these will bias the LLM toward Chinese output even under EN locale.
+- **Implications**: R4.3 (locale discarded silently → defaults non-EN) has live evidence; multiple `gap` items will be filed.
+
+## Architecture Pattern Evaluation
+
+| Option | Description | Strengths | Risks / Limitations | Notes |
+|--------|-------------|-----------|---------------------|-------|
+| Pure shell + Python script (Option A) | One-shot scripts in `.kiro/specs/.../audit/scripts/` produce `audit/<sha>/*.txt` and `audit/<sha>/gap-report.md` | Simplest; no production-code touch; easy to re-run; fits R8 capture format | Scoped to this ticket — not a permanent CI guard | Selected |
+| Reusable `tools/i18n-audit/` CLI (Option B) | Promote the audit to a permanent project tool wired into CI | Long-term safety net; future PRs would fail on regressions | Out of scope per R7.3 (verification-only); adds new top-level directory | Filed as a follow-up issue, not implemented here |
+| Hybrid (Option C) | Run Option A now; file an issue requesting Option B as future work | Captures B's value without bloating this PR | None material | Adopted |
+
+## Design Decisions
+
+### Decision: Audit lives entirely under `.kiro/specs/i18n-e2e-english-verification/`
+
+- **Context**: R7.3 forbids modifying production source in this ticket; the verification artefacts (scripts and captures) need a home.
+- **Alternatives considered**:
+  1. Top-level `tools/i18n-audit/` — rejected (creates a long-lived asset out of a one-shot ticket).
+  2. `scripts/` next to existing project scripts — rejected (project has no convention for verification scripts; `.kiro/specs/` is the canonical home for spec-scoped work).
+  3. `.kiro/specs/.../audit/` — selected.
+- **Selected approach**: Scripts at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` and outputs at `.kiro/specs/.../audit/<commit-sha>/`.
+- **Rationale**: Co-locates spec, requirements, design, and the artefacts a future verifier needs to re-run the pass. Honours the steering rule that the spec dir is the source of truth for spec-scoped state.
+- **Trade-offs**: Scripts aren't reused beyond this ticket. Re-runs require checking out the spec dir (which is committed).
+- **Follow-up**: File a follow-up issue suggesting Option B (a permanent CI guard) for the next iteration of the i18n epic.
+
+### Decision: Manual UI walkthrough → `manual-pending`, not `gap`
+
+- **Context**: R5.3 already permits `manual-pending` when a checklist item requires running the live stack. This run executes in a sandboxed CLI environment with no browser and no Docker.
+- **Alternatives considered**:
+  1. Mark UI items `gap` because they weren't proven — rejected (a `gap` is a *known* failure; UI items are simply untested in this run).
+  2. Skip them silently — rejected (R5.1 requires every checklist item to have a status).
+  3. Mark `manual-pending` with reproduction steps — selected.
+- **Rationale**: Honest about the verification environment's limits. Future verifiers can flip `manual-pending` to `pass` or `gap` after running the live walkthrough.
+- **Trade-offs**: Issue #10 cannot be fully closed by this run alone; the verification-pass comment will say so explicitly.
+
+### Decision: Gap classification = (deliberate / gap / non-applicable / review-needed)
+
+- **Context**: R1.2 lists three classes; R2.4 introduces a fourth (`review-needed`).
+- **Alternatives considered**:
+  1. Three-class only — rejected (forces premature decisions on identical en/zh values).
+  2. Four-class with explicit semantics — selected.
+- **Rationale**: A four-class scheme keeps the `gap` count truthful (it counts only known-bad lines), and `review-needed` is a soft signal that a human should re-check.
+- **Trade-offs**: Slightly more complex schema; mitigated by documenting the four labels at the top of `gap-report.md`.
+
+### Decision: Follow-up grouping by category, not by file
+
+- **Context**: R7.2 allows consolidation. There are too many CJK-bearing files (29 under `backend/app` alone) to file one issue each.
+- **Alternatives considered**:
+  1. One issue per file — rejected (29 micro-issues).
+  2. One issue per pipeline step (R1.3 step tag) — feasible but cross-cuts existing per-component issues like #7.
+  3. One issue per **gap category** — selected: (a) frontend hard-coded UI strings, (b) backend log strings, (c) backend LLM-prompt context labels, (d) recommend a permanent CI check.
+- **Rationale**: Categories already align with how the i18n epic broke down work (#3, #4, #5 = LLM prompts; #6 = log externalisation; #7 = docstrings/comments; #9 = frontend comments). Categories also map cleanly to single PRs, which is how subsequent fixes will land.
+- **Trade-offs**: Some files appear in multiple categories. Mitigated by listing `file:line` evidence inside each category issue.
+
+### Decision: Issue-comment fallback when `gh` is unavailable
+
+- **Context**: R7.5 mandates a fallback if `gh` permissions are missing.
+- **Selected approach**: If `gh` posts fail, the script writes the comment body to `audit/<sha>/PENDING-issue-10-comment.md` and the would-be follow-up issue bodies to `audit/<sha>/PENDING-followups/*.md` so a human can paste them.
+- **Rationale**: Keeps the audit re-runnable offline; keeps the artefact set faithful to what *would* have been posted.
+- **Trade-offs**: Verification doesn't truly close until a human posts. Surfaced loudly in the run-summary.
+
+## Risks & Mitigations
+
+- **Risk**: A `gap` is mis-classified as `non-applicable` (e.g. a regex character class versus a real Chinese label) → Mitigation: classification tracked in a small CSV alongside the raw grep, so re-classification is auditable.
+- **Risk**: `gh` rate limits hit when filing follow-ups → Mitigation: file at most 4 follow-ups (one per category) — far below any rate limit.
+- **Risk**: Re-running the audit on a divergent branch produces a noisy diff → Mitigation: `audit/<commit-sha>/` directories preserve history; comparison is opt-in via `diff -ru`.
+- **Risk**: Live walkthrough never happens, leaving #10 in `manual-pending` indefinitely → Mitigation: the verification report comment names a concrete "next reviewer" reproduction script; `manual-pending` items have explicit acceptance criteria.
+
+## References
+
+- Issue #10 — https://github.com/salestech-group/MiroFish/issues/10
+- Epic #11 — https://github.com/salestech-group/MiroFish/issues/11
+- `gap-analysis.md` — bucketed audit baseline
+- `requirements.md` — EARS acceptance criteria for this spec
--- a/.kiro/specs/i18n-e2e-english-verification/spec.json
+++ b/.kiro/specs/i18n-e2e-english-verification/spec.json
@ -0,0 +1,24 @@
+{
+  "feature_name": "i18n-e2e-english-verification",
+  "created_at": "2026-05-07T18:25:18Z",
+  "updated_at": "2026-05-07T18:25:18Z",
+  "language": "en",
+  "phase": "tasks-generated",
+  "ticket": 10,
+  "ticket_url": "https://github.com/salestech-group/MiroFish/issues/10",
+  "approvals": {
+    "requirements": {
+      "generated": true,
+      "approved": true
+    },
+    "design": {
+      "generated": true,
+      "approved": true
+    },
+    "tasks": {
+      "generated": true,
+      "approved": true
+    }
+  },
+  "ready_for_implementation": true
+}
--- a/.kiro/specs/i18n-e2e-english-verification/tasks.md
+++ b/.kiro/specs/i18n-e2e-english-verification/tasks.md
@ -0,0 +1,87 @@
+# Tasks — i18n-e2e-english-verification
+
+## 1. Foundation — audit workspace and entrypoint
+
+- [x] 1.1 Create the audit script directory and the read-only orchestrator skeleton
+  - Establish `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` with a `run_audit.sh` skeleton that uses `set -euo pipefail`.
+  - The orchestrator captures HEAD sha (`git rev-parse HEAD`) and creates `.kiro/specs/i18n-e2e-english-verification/audit/<sha>/` as the artefact root.
+  - Observable completion: running `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh` from repo root creates an empty `audit/<sha>/` directory and exits `0`.
+  - _Requirements: 1.4, 7.3, 8.1, 8.2, 8.3, 8.4_
+  - _Boundary: run_audit.sh_
+
+## 2. Core — read-only audit producers
+
+- [x] 2.1 (P) Implement the canonical CJK grep with PCRE
+  - `audit_cjk.sh` runs `git grep -nP '[\x{4e00}-\x{9fff}]' -- backend/app frontend/src locales/en.json` and writes the raw output to `<sha>/cjk-grep.txt`.
+  - Produces a partitioned `<sha>/cjk-grep-bucketed.txt` with one section per top-level path (`backend/app`, `frontend/src`, `locales/en.json`).
+  - Excludes binary file matches (e.g. `.jpeg`) by skipping paths whose `git check-attr` reports `binary` (or by file-extension allowlist if check-attr is unset).
+  - Observable completion: `<sha>/cjk-grep.txt` contains exactly the same lines as a manual `git grep -nP …` run, and `<sha>/cjk-grep-bucketed.txt` has the three labelled sections with line counts.
+  - _Requirements: 1.1, 1.5_
+  - _Boundary: audit_cjk.sh_
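The bucketing step in task 2.1 can be sketched as a pure function over the raw grep lines. This is an illustration only: `audit_cjk.sh` is a shell script, and the section header format below is an assumption, not the format of the committed `cjk-grep-bucketed.txt`.

```python
# The three top-level paths named in task 2.1, in output order.
BUCKETS = ("backend/app", "frontend/src", "locales/en.json")


def bucket(grep_lines):
    """Group 'path:line:content' grep lines under the top-level path they belong to."""
    out = {name: [] for name in BUCKETS}
    for line in grep_lines:
        for name in BUCKETS:
            # Match a directory prefix ("backend/app/") or the file itself ("locales/en.json:").
            if line.startswith((name + "/", name + ":")):
                out[name].append(line)
                break
    return out


def render(buckets):
    """Render one labelled section per bucket, with its line count."""
    parts = []
    for name, lines in buckets.items():
        parts.append(f"## {name} ({len(lines)} lines)")
        parts.extend(lines)
    return "\n".join(parts)
```

Because every bucket is created up front, an empty section such as `locales/en.json` still appears with a count of zero, which keeps the three labelled sections stable across runs.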
+
+- [x] 2.2 (P) Implement the locale-catalogue parity diff
+  - `check_parity.py` loads `locales/en.json` and `locales/zh.json`, recursively flattens nested-dict keys with dotted paths, and writes `<sha>/parity.txt` with three labelled blocks: `[missing-keys]`, `[cjk-in-en]`, `[identical-values]`.
+  - The `[identical-values]` block flags entries only when EN value equals ZH value AND the value is non-empty AND has more than two ASCII words.
+  - Observable completion: `<sha>/parity.txt` exists; on the current tree `[missing-keys]` is empty and `[cjk-in-en]` is empty (matching the gap-analysis baseline).
+  - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3_
+  - _Boundary: check_parity.py_
+
+- [x] 2.3 Implement the four-class classifier
+  - `classify.py` consumes `<sha>/cjk-grep.txt` and `<sha>/parity.txt` and writes `<sha>/classified.csv` with columns `file,line,match,class,category,pipeline_step`.
+  - Implements the closed-set rules from design.md "classify.py": `locales/en.json` CJK → `gap`/`catalogue-parity`; `frontend/src/{views,components}/*.vue` string literal → `gap`/`frontend-ui-string`; `text.match(/.../)` regex pattern with CJK → `gap`/`frontend-regex-parser`; `.py` line starting with `#` or inside a triple-quoted block → `deliberate`/`backend-{comment,docstring}`; `.py` `logger.|log.|print(` line with CJK in a string literal → `gap`/`backend-log` with appropriate step tag; `.py` LLM-prompt label in `services/{ontology,oasis_profile,simulation_config,report_agent}_generator.py` → `gap`/`backend-prompt-label`; binary file → `non-applicable`/`binary-false-positive`; everything else → `review-needed`.
+  - Asserts row-count equality with the input grep (no silent drops).
+  - Observable completion: `<sha>/classified.csv` row count == `cjk-grep.txt` line count, and at least one row of each non-empty class is present (verified by counting per-class rows in stdout summary).
+  - _Requirements: 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4_
+  - _Boundary: classify.py_
+  - _Depends: 2.1, 2.2_
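A reduced sketch of the closed-set rules in task 2.3. It covers only a subset, omits the prompt-label rule and the pipeline-step tag, and treats every `.vue` match as a UI string unless it sits in a `.match(/.../)` call. The `in_docstring` flag stands in for real triple-quote tracking. None of this is the project's `classify.py`.

```python
import re

# Lines that call logger.<level>(, log.<level>( or print(.
LOG_CALL = re.compile(r"\b(logger|log)\.\w+\(|\bprint\(")


def classify(path, content, in_docstring=False):
    """Return (class, category) for one CJK-bearing line."""
    stripped = content.lstrip()
    if path == "locales/en.json":
        return "gap", "catalogue-parity"
    if path.endswith((".jpeg", ".jpg", ".png")):
        return "non-applicable", "binary-false-positive"
    if path.endswith(".vue"):
        # Checked first so regex parsers are not counted as plain UI strings.
        if ".match(/" in content:
            return "gap", "frontend-regex-parser"
        return "gap", "frontend-ui-string"
    if path.endswith(".py"):
        if stripped.startswith("#"):
            return "deliberate", "backend-comment"
        if in_docstring:
            return "deliberate", "backend-docstring"
        if LOG_CALL.search(content):
            return "gap", "backend-log"
    # Anything the rules above do not recognise needs a human look.
    return "review-needed", ""
```

Returning `review-needed` as the fall-through keeps the function total, which is what makes the row-count equality assertion in the task achievable: every input line gets exactly one output row.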
+
+## 3. Core — report assembly
+
+- [x] 3.1 Render the gap report and the issue-#10 comment body
+  - `render_report.py` reads `<sha>/classified.csv` and `.ticket/10.md`; writes `<sha>/gap-report.md` (with the seven sections from design.md) and `<sha>/comment-body.md` (mirroring the issue's checklist with `pass`/`gap`/`manual-pending` per line + a "How to re-run" footer + a `Run on commit <sha>` header).
+  - Section 4 of `gap-report.md` enumerates the four propagation boundaries and reports each as `pass`/`gap`/`unknown`, with file:line evidence drawn from `classified.csv`.
+  - Section 5 maps every checklist item from `.ticket/10.md` to a `pass` / `gap` / `manual-pending` status. UI-checklist items default to `manual-pending` (live walkthrough not feasible in sandbox) and include a concrete reproduction script.
+  - Always writes the four follow-up issue body templates to `<sha>/PENDING-followups/`: `01-frontend-ui-strings.md`, `02-backend-log-strings.md`, `03-backend-prompt-labels.md`, `04-permanent-ci-guard.md` — empty placeholder if the corresponding category had zero `gap` rows.
+  - Observable completion: `<sha>/gap-report.md`, `<sha>/comment-body.md`, and `<sha>/PENDING-followups/01..04-*.md` all exist; opening `<sha>/comment-body.md` shows every checkbox from `.ticket/10.md` mapped to a status.
+  - _Requirements: 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2_
+  - _Boundary: render_report.py_
+
+## 4. Integration — orchestrator and GitHub side effects
+
+- [x] 4.1 Wire run_audit.sh to the four producer steps and add the GitHub posting hooks
+  - `run_audit.sh` invokes (in order) `audit_cjk.sh`, `check_parity.py`, `classify.py`, `render_report.py`, then `post_comment.sh` and `file_followups.sh`.
+  - On any error in steps 1-4 the orchestrator aborts (`set -euo pipefail`) before any subsequent step runs.
+  - On `gh` failure in steps 5 or 6, the orchestrator continues to the next step but exits `2` at the end (audit succeeded, side effects didn't fully apply).
+  - Observable completion: a clean run on the current tree creates a complete `<sha>/` directory; if `gh` is forced absent (e.g. `PATH=$(pwd)/empty bash run_audit.sh`), the orchestrator still produces all four producer artefacts and the `PENDING-followups/` and exits with `2`.
+  - _Requirements: 1.4, 7.3, 8.1, 8.2, 8.3, 8.4_
+  - _Boundary: run_audit.sh_
+  - _Depends: 2.3, 3.1_
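The exit-code contract in task 4.1 (fail fast on producers, best effort on GitHub side effects) can be sketched independently of the shell. Steps are modelled here as callables that return an exit status. The value `1` for a producer failure is an assumption, since under `set -euo pipefail` the real script would exit with the failing command's own status.

```python
def run_pipeline(producers, side_effects):
    """Run producers fail-fast, then side effects best-effort; return an exit code."""
    for step in producers:
        if step() != 0:
            return 1  # abort before any later step runs
    failed = False
    for step in side_effects:
        if step() != 0:
            failed = True  # keep going so the remaining side effects still run
    # 2 signals "audit succeeded, side effects did not fully apply".
    return 2 if failed else 0
```

Keeping the two phases separate is what lets a run without `gh` still leave a complete artefact directory behind while reporting the partial state through the exit code.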
+
+- [x] 4.2 Implement post_comment.sh and file_followups.sh with PENDING fallback
+  - `post_comment.sh` calls `gh issue comment 10 --repo salestech-group/MiroFish --body-file <sha>/comment-body.md`; on failure it copies the body to `<sha>/PENDING-issue-10-comment.md` and exits non-zero. On success it writes the resulting URL to `<sha>/comment-url.txt`.
+  - `file_followups.sh` iterates `<sha>/PENDING-followups/*.md`; for each non-empty body it calls `gh issue create --repo salestech-group/MiroFish --title <title-from-body-first-line> --body-file <body> --label i18n` (and `--label bug` when the body's frontmatter declares regression). On per-category failure it leaves that body in place; on success it removes the body and appends the issue URL to `<sha>/followup-urls.txt`.
+  - Observable completion: with `gh` available, the comment URL appears in `<sha>/comment-url.txt` and any non-empty follow-up body produces an issue URL in `<sha>/followup-urls.txt`; with `gh` absent, both bodies stay under `<sha>/PENDING-*` and exit codes are non-zero.
+  - _Requirements: 5.5, 7.1, 7.2, 7.4, 7.5_
+  - _Boundary: post_comment.sh, file_followups.sh_
+  - _Depends: 3.1_
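A sketch of the PENDING fallback for the comment step, in Python for consistency with the other examples (the real `post_comment.sh` is a shell script). The `gh issue comment` arguments and the file names are the ones listed in task 4.2. The injectable `run` parameter is a hypothetical addition for testing, and the sketch assumes `gh` prints the new comment's URL on stdout.

```python
import shutil
import subprocess
from pathlib import Path


def post_comment(sha_dir, run=subprocess.run):
    """Post comment-body.md to issue #10; on failure keep a PENDING copy.

    Returns True when the comment was posted, False when the fallback was used.
    """
    sha_dir = Path(sha_dir)
    body = sha_dir / "comment-body.md"
    cmd = ["gh", "issue", "comment", "10", "--repo", "salestech-group/MiroFish",
           "--body-file", str(body)]
    try:
        done = run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # gh missing (OSError) or gh exited non-zero: leave the body for a human to paste.
        shutil.copyfile(body, sha_dir / "PENDING-issue-10-comment.md")
        return False
    # Assumption: stdout carries the URL of the created comment.
    (sha_dir / "comment-url.txt").write_text(done.stdout.strip() + "\n")
    return True
```

A caller would translate the boolean into the non-zero exit status the task describes; `file_followups.sh` could follow the same pattern once per body under `PENDING-followups/`.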
+
+## 5. Validation — execute the verification pass
+
+- [x] 5.1 Execute the audit on the current tree and capture a baseline run
+  - Run `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh` from repo root.
+  - Confirm `<sha>/cjk-grep.txt`, `cjk-grep-bucketed.txt`, `parity.txt`, `classified.csv`, `gap-report.md`, `comment-body.md`, and `PENDING-followups/01..04-*.md` all exist and are non-empty (the placeholders for empty categories may be empty by design).
+  - Confirm `parity.txt` `[missing-keys]` and `[cjk-in-en]` blocks are empty (matches the gap-analysis baseline).
+  - Confirm `classified.csv` row count matches `cjk-grep.txt` line count exactly.
+  - Observable completion: the baseline `<sha>/` directory is committed under `.kiro/specs/i18n-e2e-english-verification/audit/`.
+  - _Requirements: 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 8.1, 8.3_
+  - _Boundary: run_audit.sh and producer scripts_
+  - _Depends: 4.1_
+
+- [x] 5.2 Post the comment on issue #10 and file the follow-up issues
+  - Run `post_comment.sh <sha-dir>` and `file_followups.sh <sha-dir>` (or rely on `run_audit.sh` to invoke them) so the verification report comment is posted and follow-up issues are filed for non-empty categories.
+  - Capture `comment-url.txt` and `followup-urls.txt` under `<sha>/` so the PR description can link to them.
+  - If `gh` lacks permissions for any of the calls, the corresponding `PENDING-*` file is left in place per R7.5; the run summary surfaces the partial state.
+  - Observable completion: a comment appears on https://github.com/salestech-group/MiroFish/issues/10 mirroring `comment-body.md`; follow-up issues for non-empty categories exist and carry the `i18n` label.
+  - _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 6.4, 7.1, 7.2, 7.4, 7.5_
+  - _Boundary: post_comment.sh, file_followups.sh_
+  - _Depends: 4.2, 5.1_
@ -0,0 +1 @@
+https://github.com/salestech-group/MiroFish/issues/10#issuecomment-4400060417