Merge pull request #27 from salestech-group/chore/i18n-10-e2e-english-verification

chore(i18n): add e2e english verification spec, audit, and report
Dominik Seemann 2026-05-08 11:06:46 +02:00 committed by GitHub
commit d53f3110dd
GPG Key ID: B5690EEEBB952194
21 changed files with 11002 additions and 0 deletions


@ -0,0 +1,60 @@
### Verification report - run on commit `9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd`
This run was produced by `.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.
Captured artefacts live under `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.
**Audit summary:** 2916 CJK matches across the auditable paths.
- 237 `gap` (actionable, see follow-ups)
- 380 `review-needed` (soft signal; needs human review)
- 2299 `deliberate` (mostly backend docstrings/comments - covered by issue #7)
- 0 `non-applicable` (binary file false positives - excluded)
**Gap-category breakdown:** backend-prompt-label=143, frontend-ui-string=49, frontend-regex-parser=36, backend-log=9
---
#### Issue #10 checklist mapping
Each line below is taken from the ticket body, with an explicit status.
- [ ] **GAP** - **Frontend UI** — every label, button, modal, error toast, and tooltip in EN. No Chinese strings on screen. - 29 hard-coded CJK literal(s) in `frontend/src/views|components/`
- [ ] **GAP** - **Step 1 — Graph Build** - 5 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: Status messages in EN - not verifiable statically; awaiting live run
  - GAP: Ontology JSON descriptions in EN (depends on #2) - 14 gap(s) classified, see Section 1/3
  - GAP: Backend logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Step 2 — Env Setup** - 61 gap(s) classified, see Section 1/3
  - GAP: Generated agent profiles (`bio`, `persona`, `profession`, `interested_topics`) in EN (depends on #3) - 61 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: `gender` still the English enum (`male` / `female` / `other`) - not verifiable statically; awaiting live run
- [ ] **GAP** - **Step 3 — Simulation** - 14 gap(s) classified, see Section 1/3
  - GAP: Sim config `content`, `narrative_direction`, `hot_topics`, `reasoning` in EN (depends on #4) - 14 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: `poster_type` still PascalCase English - not verifiable statically; awaiting live run
  - MANUAL-PENDING: `stance` still one of `supportive` / `opposing` / `neutral` / `observer` - not verifiable statically; awaiting live run
  - GAP: Generated tweets / Reddit posts in EN (depends on #3 personas + #4 sim config) - 14 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Step 4 — Report** - 70 gap(s) classified, see Section 1/3
  - GAP: Report sections, headings, prose in EN (depends on #5) - 70 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: ReACT thinking trace in EN - requires live walkthrough
  - MANUAL-PENDING: Tool-call results render correctly - requires live walkthrough
- [ ] **GAP** - **Step 5 — Interaction** - 2 gap(s) classified, see Section 1/3
  - GAP: Interview chat replies in EN (depends on #3) - 2 gap(s) classified, see Section 1/3
  - GAP: Report Agent chat replies in EN (depends on #5) - 72 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Backend logs** — full pipeline-run logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Locale propagation** — confirm `Accept-Language: en` (or thread-local locale set via `set_locale`) reaches background tasks and survives the OASIS subprocess boundary. - 9 CJK log strings on EN code path
- [ ] **MANUAL-PENDING** - Every touchpoint above renders in Chinese; no English regressions. - requires live walkthrough
- [ ] **MANUAL-PENDING** - zh.json backfill (#8) covered: Step 3, Step 4, Step 5, and graph panel labels are all Chinese. - not verifiable statically; awaiting live run
---
#### How to re-run
```bash
# from the repository root, on any commit:
bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh
# artefacts at .kiro/specs/i18n-e2e-english-verification/audit/<HEAD-sha>/
```
If `gh` is not authenticated when re-running, the comment body and follow-up bodies are written to `PENDING-issue-10-comment.md` / `PENDING-followups/` for a human to post.
Out of scope for this run (per R5.3 / R7.3): live UI walkthrough, full Docker-Compose pipeline run, and any inline gap fixes.


@ -0,0 +1 @@
https://github.com/salestech-group/MiroFish/issues/10#issuecomment-4400060417


@ -0,0 +1,4 @@
https://github.com/salestech-group/MiroFish/issues/23
https://github.com/salestech-group/MiroFish/issues/24
https://github.com/salestech-group/MiroFish/issues/25
https://github.com/salestech-group/MiroFish/issues/26


@ -0,0 +1,143 @@
# Verification gap report - i18n-e2e-english-verification
**Commit:** `9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd`
## Overview
- Total CJK matches audited: **2916**
- Class distribution: deliberate=2299, review-needed=380, gap=237
- Gap categories: backend-prompt-label=143, frontend-ui-string=49, frontend-regex-parser=36, backend-log=9
- Gap pipeline steps: Report=70, Env Setup=61, n/a=47, UI=29, Simulation=14, Logs=9, Graph Build=5, Interaction=2
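
The overview figures are plain tallies over the rows of `classified.csv`. A minimal sketch of that tally, assuming the column layout `file, line, match, class, category, pipeline_step`; the sample rows and the `tally` helper are invented for illustration and are not audit data:

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the classified.csv column layout; not real audit rows.
SAMPLE = """file,line,match,class,category,pipeline_step
a.py,1,x,gap,backend-log,Logs
a.py,2,y,deliberate,backend-comment,n/a
b.vue,3,z,gap,frontend-ui-string,UI
"""


def tally(fh):
    """Return (class counts, gap-category counts) for classified rows."""
    rows = list(csv.DictReader(fh))
    classes = Counter(r["class"] for r in rows)
    gap_categories = Counter(r["category"] for r in rows if r["class"] == "gap")
    return classes, gap_categories


classes, gap_categories = tally(io.StringIO(SAMPLE))
print(dict(classes))
print(dict(gap_categories))
```

Pointing `tally` at an open `classified.csv` handle instead of the in-memory sample should reproduce the class and category lines above, provided the file keeps the same header row.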
## Section 1 - Static CJK audit
Canonical command (PCRE):
```
git grep -nIP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json
```
Raw output captured at `audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep.txt` and bucketed at `audit/9dcaecd2d27e6325bae0c53b9ab41eb86d0269cd/cjk-grep-bucketed.txt`.
`locales/en.json` CJK matches: **0** (acceptance: zero).
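
For spot checks outside git, the same code-point range can be scanned in Python. This is a sketch only: `cjk_lines` is a hypothetical helper, not part of the audit scripts. Note that the range U+4E00..U+9FFF covers the basic CJK Unified Ideographs block, so full-width punctuation on an otherwise ASCII line would not be flagged.

```python
import re

# Same code-point range as the canonical git grep pattern (U+4E00..U+9FFF).
CJK = re.compile(r"[\u4e00-\u9fff]")


def cjk_lines(text):
    """Yield (line_number, line) for every line containing a CJK ideograph."""
    for number, line in enumerate(text.splitlines(), start=1):
        if CJK.search(line):
            yield number, line


sample = 'ok = 1\nraise ValueError("LLM_API_KEY 未配置")\n'
print(list(cjk_lines(sample)))
```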
Top files by gap count:
| File | Gap count |
|------|-----------|
| `backend/app/services/oasis_profile_generator.py` | 60 |
| `frontend/src/components/Step4Report.vue` | 50 |
| `backend/app/services/zep_graph_memory_updater.py` | 47 |
| `frontend/src/views/Process.vue` | 29 |
| `backend/app/services/report_agent.py` | 20 |
| `backend/app/services/simulation_config_generator.py` | 13 |
| `backend/app/services/ontology_generator.py` | 5 |
| `backend/app/utils/retry.py` | 4 |
| `backend/app/api/graph.py` | 3 |
| `frontend/src/components/Step2EnvSetup.vue` | 3 |
| `frontend/src/components/Step5Interaction.vue` | 2 |
| `frontend/src/components/Step3Simulation.vue` | 1 |
## Section 2 - Locale catalogue parity
```
# Locale parity for HEAD
# en keys: 953
# zh keys: 953
[missing-keys]
# (none)
[cjk-in-en]
# (none)
[identical-values]
# (none)
```
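
The `[missing-keys]` block is a set difference over flattened, dotted keys. A small sketch of that comparison on two invented catalogues (the keys and values below are illustrative, not taken from `locales/*.json`):

```python
def flatten(d, prefix=""):
    """Yield (dotted-key, value) pairs from a nested dict."""
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical catalogues: the ZH side lacks one key.
en = {"step1": {"title": "Graph Build", "hint": "Upload a file"}}
zh = {"step1": {"title": "图谱构建"}}

en_keys = {key for key, _ in flatten(en)}
zh_keys = {key for key, _ in flatten(zh)}

print("en-only:", sorted(en_keys - zh_keys))
print("zh-only:", sorted(zh_keys - en_keys))
```

Equal key counts alone do not prove parity; the symmetric difference being empty is the stronger check, which is why the report lists the missing keys rather than just the totals.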
## Section 3 - LLM-prompt locale verification
Backend prompt-label gaps (CJK string literals inside services that compose LLM prompts): **143**
First 10 examples (file:line - match):
- `backend/app/services/oasis_profile_generator.py:65` - "username": self.user_name, # OASIS 库要求字段名为 username无下划线
- `backend/app/services/oasis_profile_generator.py:93` - "username": self.user_name, # OASIS 库要求字段名为 username无下划线
- `backend/app/services/oasis_profile_generator.py:194` - raise ValueError("LLM_API_KEY 未配置")
- `backend/app/services/oasis_profile_generator.py:384` - all_summaries.add(f"相关实体: {node.name}")
- `backend/app/services/oasis_profile_generator.py:390` - context_parts.append("事实信息:\n" + "\n".join(f"- {f}" for f in results["facts"][:20]))
- `backend/app/services/oasis_profile_generator.py:392` - context_parts.append("相关实体:\n" + "\n".join(f"- {s}" for s in results["node_summaries"][:10]))
- `backend/app/services/oasis_profile_generator.py:422` - context_parts.append("### 实体属性\n" + "\n".join(attrs))
- `backend/app/services/oasis_profile_generator.py:438` - relationships.append(f"- {entity.name} --[{edge_name}]--> (相关实体)")
- `backend/app/services/oasis_profile_generator.py:440` - relationships.append(f"- (相关实体) --[{edge_name}]--> {entity.name}")
- `backend/app/services/oasis_profile_generator.py:443` - context_parts.append("### 相关事实和关系\n" + "\n".join(relationships))
- ... and 133 more (see `classified.csv`)
These prompts feed the LLM verbatim; CJK labels bias the model toward Chinese output even when the requested locale is English.
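
One possible shape for a fix is to look the context labels up by locale instead of hard-coding them. The sketch below is an assumption, not the project's implementation: `PROMPT_LABELS`, `label`, and `build_context` are hypothetical names, and the English wording of the labels is illustrative.

```python
# Hypothetical label catalogue; keys, helper names, and EN wording are
# illustrative only and do not exist in the repository.
PROMPT_LABELS = {
    "en": {"related_entities": "Related entities", "facts": "Facts"},
    "zh": {"related_entities": "相关实体", "facts": "事实信息"},
}


def label(key, locale):
    """Look up a prompt label, falling back to English for unknown locales."""
    return PROMPT_LABELS.get(locale, PROMPT_LABELS["en"])[key]


def build_context(facts, locale):
    """Compose a prompt fragment whose heading follows the requested locale."""
    body = "\n".join(f"- {fact}" for fact in facts)
    return f"{label('facts', locale)}:\n{body}"


print(build_context(["A acquired B"], "en"))
```

Keeping the labels in one table also gives the static audit a single place to check, rather than string literals scattered through each service.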
## Section 4 - Locale propagation surface
| Boundary | Status | Evidence |
|----------|--------|----------|
| HTTP -> Flask handler | manual-pending | runtime not exercised in sandbox; static review showed no per-request locale carrier |
| Flask handler -> Task worker | manual-pending | thread-local `set_locale` referenced in CLAUDE.md but not statically verified end-to-end |
| Task worker -> OASIS subprocess | manual-pending | subprocess boundary requires live run |
| Backend logger | gap | 9 hard-coded CJK log line(s) on EN code path |
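
The three `manual-pending` boundaries share one failure mode: a thread-local value does not follow work into a new thread or a child process unless it is passed along explicitly. The sketch below illustrates that under stated assumptions. The `set_locale` / `get_locale` bodies, the `"zh"` default, the `run_in_background` helper, and the `APP_LOCALE` environment variable are all hypothetical and are not taken from the backend.

```python
import os
import subprocess
import sys
import threading

_state = threading.local()


def set_locale(locale):
    """Store the locale for the current thread only."""
    _state.locale = locale


def get_locale():
    # Assumed default for illustration; the real default is not verified here.
    return getattr(_state, "locale", "zh")


def run_in_background(func, results):
    """Run func in a new thread, carrying the caller's locale across."""
    locale = get_locale()  # capture in the calling (request) thread

    def worker():
        set_locale(locale)  # re-establish inside the new thread
        results.append(func())

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def child_locale():
    """Hand the locale to a child process through an environment variable."""
    env = dict(os.environ, APP_LOCALE=get_locale())
    code = "import os; print(os.environ['APP_LOCALE'])"
    done = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()


set_locale("en")
results = []
run_in_background(get_locale, results).join()
print(results, child_locale())
```

Without the capture-and-reset step in `run_in_background`, the worker would see the default locale rather than the caller's, which is the behaviour a live run would need to rule out.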
First 10 backend-log gap examples:
- `backend/app/api/graph.py:385` - build_logger.info(f"[{task_id}] 开始构建图谱...")
- `backend/app/api/graph.py:494` - build_logger.info(f"[{task_id}] 图谱构建完成: graph_id={graph_id}, 节点={node_count}, 边={edge_count}")
- `backend/app/api/graph.py:513` - build_logger.error(f"[{task_id}] 图谱构建失败: {str(e)}")
- `backend/app/services/oasis_profile_generator.py:945` - print(f"开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}")
- `backend/app/services/oasis_profile_generator.py:1001` - print(f"人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent")
- `backend/app/utils/retry.py:55` - logger.error(f"函数 {func.__name__} 在 {max_retries} 次重试后仍失败: {str(e)}")
- `backend/app/utils/retry.py:108` - logger.error(f"异步函数 {func.__name__} 在 {max_retries} 次重试后仍失败: {str(e)}")
- `backend/app/utils/retry.py:179` - logger.error(f"API调用在 {self.max_retries} 次重试后仍失败: {str(e)}")
- `backend/app/utils/retry.py:227` - logger.error(f"处理第 {idx + 1} 项失败: {str(e)}")
## Section 5 - Issue #10 checklist mapping
Each line below is taken from the ticket body, with an explicit status.
- [ ] **GAP** - **Frontend UI** — every label, button, modal, error toast, and tooltip in EN. No Chinese strings on screen. - 29 hard-coded CJK literal(s) in `frontend/src/views|components/`
- [ ] **GAP** - **Step 1 — Graph Build** - 5 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: Status messages in EN - not verifiable statically; awaiting live run
  - GAP: Ontology JSON descriptions in EN (depends on #2) - 14 gap(s) classified, see Section 1/3
  - GAP: Backend logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Step 2 — Env Setup** - 61 gap(s) classified, see Section 1/3
  - GAP: Generated agent profiles (`bio`, `persona`, `profession`, `interested_topics`) in EN (depends on #3) - 61 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: `gender` still the English enum (`male` / `female` / `other`) - not verifiable statically; awaiting live run
- [ ] **GAP** - **Step 3 — Simulation** - 14 gap(s) classified, see Section 1/3
  - GAP: Sim config `content`, `narrative_direction`, `hot_topics`, `reasoning` in EN (depends on #4) - 14 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: `poster_type` still PascalCase English - not verifiable statically; awaiting live run
  - MANUAL-PENDING: `stance` still one of `supportive` / `opposing` / `neutral` / `observer` - not verifiable statically; awaiting live run
  - GAP: Generated tweets / Reddit posts in EN (depends on #3 personas + #4 sim config) - 14 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Step 4 — Report** - 70 gap(s) classified, see Section 1/3
  - GAP: Report sections, headings, prose in EN (depends on #5) - 70 gap(s) classified, see Section 1/3
  - MANUAL-PENDING: ReACT thinking trace in EN - requires live walkthrough
  - MANUAL-PENDING: Tool-call results render correctly - requires live walkthrough
- [ ] **GAP** - **Step 5 — Interaction** - 2 gap(s) classified, see Section 1/3
  - GAP: Interview chat replies in EN (depends on #3) - 2 gap(s) classified, see Section 1/3
  - GAP: Report Agent chat replies in EN (depends on #5) - 72 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Backend logs** — full pipeline-run logs in EN (depends on #6) - 9 gap(s) classified, see Section 1/3
- [ ] **GAP** - **Locale propagation** — confirm `Accept-Language: en` (or thread-local locale set via `set_locale`) reaches background tasks and survives the OASIS subprocess boundary. - 9 CJK log strings on EN code path
- [ ] **MANUAL-PENDING** - Every touchpoint above renders in Chinese; no English regressions. - requires live walkthrough
- [ ] **MANUAL-PENDING** - zh.json backfill (#8) covered: Step 3, Step 4, Step 5, and graph panel labels are all Chinese. - not verifiable statically; awaiting live run
## Section 6 - ZH regression check
- Locale catalogues at full key parity (953 EN keys / 953 ZH keys, symmetric difference 0 - see Section 2).
- No ZH-specific regression detected in static review. Live ZH walkthrough is `manual-pending`.
## Section 7 - Follow-up plan
Per R7.2, gaps are grouped into the following follow-up issues (placeholder bodies in `PENDING-followups/`):
1. **Frontend hard-coded UI strings** (49 matches + 36 regex parsers depending on CJK backend output).
2. **Backend log strings** (9 matches).
3. **Backend LLM-prompt context labels** (143 matches).
4. **Permanent CI guard** (preventative - re-run this audit on every PR).
Backend docstring/comment matches (the bulk of `deliberate` rows) are covered by the existing issue #7 and are not re-filed here.
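
A CI guard along the lines of item 4 could be a ratchet: fail the build when the `gap` count in `classified.csv` rises above a recorded baseline (237 for this run). The sketch below is one way to express that, not the planned implementation; `count_gaps`, `guard`, and the sample row are hypothetical.

```python
import csv
import io


def count_gaps(fh):
    """Count rows whose class column is 'gap'."""
    return sum(1 for row in csv.DictReader(fh) if row["class"] == "gap")


def guard(fh, baseline):
    """Return a process exit code: 0 if gaps <= baseline, 1 otherwise."""
    gaps = count_gaps(fh)
    if gaps > baseline:
        print(f"CJK gap count rose: {gaps} > baseline {baseline}")
        return 1
    return 0


# Hypothetical one-row sample in the classified.csv layout.
SAMPLE = (
    "file,line,match,class,category,pipeline_step\n"
    "a.py,1,x,gap,backend-log,Logs\n"
)
print(guard(io.StringIO(SAMPLE), baseline=0))
```

Lowering the baseline as follow-up issues land would keep the guard meaningful; a fixed baseline only prevents regressions.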


@ -0,0 +1,13 @@
# Locale parity for HEAD
# en keys: 953
# zh keys: 953
[missing-keys]
# (none)
[cjk-in-en]
# (none)
[identical-values]
# (none)


@ -0,0 +1,62 @@
#!/usr/bin/env bash
# Run the canonical CJK grep with PCRE, then write the raw output and a
# bucketed summary partitioned by top-level path. Excludes binary file
# matches (e.g. .jpeg) since ripgrep / git grep can otherwise score them.
set -euo pipefail
if [ "$#" -ne 1 ]; then
printf 'usage: %s <sha-dir>\n' "$0" >&2
exit 64
fi
sha_dir="$1"
mkdir -p "${sha_dir}"
raw="${sha_dir}/cjk-grep.txt"
bucketed="${sha_dir}/cjk-grep-bucketed.txt"
# Canonical PCRE grep against the three top-level paths owned by this audit.
# git grep -P uses PCRE2 - ranges like \x{4e00}-\x{9fff} are valid here.
# `-I` (--no-binary) excludes binary-file matches outright so the audit
# reports only text content.
git grep -nIP '[\x{4e00}-\x{9fff}]' \
-- backend/app frontend/src locales/en.json \
> "${raw}" \
|| true
awk_script='
function bucket(path) {
  if (path ~ /^backend\/app\//) return "backend/app"
  if (path ~ /^frontend\/src\//) return "frontend/src"
  if (path ~ /^locales\/en\.json/) return "locales/en.json"
  return "other"
}
{
  split($0, parts, ":")
  path = parts[1]
  b = bucket(path)
  counts[b]++
  lines[b] = (b in lines ? lines[b] "\n" : "") $0
}
END {
  order[1] = "backend/app"
  order[2] = "frontend/src"
  order[3] = "locales/en.json"
  order[4] = "other"
  for (i = 1; i <= 4; i++) {
    b = order[i]
    c = (b in counts ? counts[b] : 0)
    printf("[%s] (%d lines)\n", b, c)
    if (c > 0) {
      print lines[b]
    }
    print ""
  }
}
'
awk "${awk_script}" "${raw}" > "${bucketed}"
raw_lines=$(wc -l < "${raw}" | tr -d ' ')
printf ' cjk-grep.txt: %s lines\n' "${raw_lines}"
printf ' cjk-grep-bucketed.txt: written\n'


@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""Diff locales/en.json against locales/zh.json and emit parity.txt.
Three labelled blocks are written:
* `[missing-keys]` - keys present on one side but not the other.
* `[cjk-in-en]` - EN catalogue values that contain CJK characters.
* `[identical-values]` - keys whose EN and ZH value are identical AND the
value is non-empty AND has more than two ASCII words.
These are review-needed signals, not gaps.
Run from the repository root.
"""
from __future__ import annotations
import json
import re
import sys
from pathlib import Path
from typing import Dict, Iterator, Tuple
# CJK Unified Ideographs, U+4E00..U+9FFF (same range as the canonical git grep).
CJK_RANGE = re.compile(r"[\u4e00-\u9fff]")
def flatten(d: Dict[str, object], prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Recursively yield (dotted-key, value) pairs from a nested dict."""
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value
def is_non_trivial_english_prose(value: object) -> bool:
    """Heuristic for the identical-value 'review-needed' signal.

    True when:
      * value is a string,
      * value is non-empty after strip,
      * value contains more than two whitespace-separated tokens,
      * value contains no CJK characters (otherwise it's just an untranslated
        ZH original which is not a review-needed signal here).
    """
    if not isinstance(value, str):
        return False
    text = value.strip()
    if not text:
        return False
    if CJK_RANGE.search(text):
        return False
    return len(text.split()) > 2
def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <sha-dir>", file=sys.stderr)
        return 64

    sha_dir = Path(argv[1])
    sha_dir.mkdir(parents=True, exist_ok=True)
    out_path = sha_dir / "parity.txt"

    en_path = Path("locales/en.json")
    zh_path = Path("locales/zh.json")
    if not en_path.exists() or not zh_path.exists():
        print(f"missing locale files: {en_path}, {zh_path}", file=sys.stderr)
        return 1

    en = json.loads(en_path.read_text(encoding="utf-8"))
    zh = json.loads(zh_path.read_text(encoding="utf-8"))
    en_flat = dict(flatten(en))
    zh_flat = dict(flatten(zh))

    # Keys present on one side only.
    en_only = sorted(set(en_flat) - set(zh_flat))
    zh_only = sorted(set(zh_flat) - set(en_flat))

    # EN values that still contain CJK characters.
    cjk_in_en = []
    for key, value in sorted(en_flat.items()):
        if isinstance(value, str) and CJK_RANGE.search(value):
            cjk_in_en.append((key, value))

    # Shared keys whose EN and ZH values are the same non-trivial prose.
    identical = []
    for key in sorted(set(en_flat) & set(zh_flat)):
        en_val = en_flat[key]
        zh_val = zh_flat[key]
        if en_val == zh_val and is_non_trivial_english_prose(en_val):
            identical.append((key, en_val))

    lines: list[str] = []
    lines.append("# Locale parity for HEAD")
    lines.append(f"# en keys: {len(en_flat)}")
    lines.append(f"# zh keys: {len(zh_flat)}")
    lines.append("")
    lines.append("[missing-keys]")
    if not en_only and not zh_only:
        lines.append("# (none)")
    for key in en_only:
        lines.append(f"en-only: {key}")
    for key in zh_only:
        lines.append(f"zh-only: {key}")
    lines.append("")
    lines.append("[cjk-in-en]")
    if not cjk_in_en:
        lines.append("# (none)")
    for key, value in cjk_in_en:
        snippet = value if len(value) <= 80 else value[:77] + "..."
        lines.append(f"{key}: {snippet}")
    lines.append("")
    lines.append("[identical-values]")
    if not identical:
        lines.append("# (none)")
    for key, value in identical:
        snippet = value if len(value) <= 80 else value[:77] + "..."
        lines.append(f"{key}: {snippet}")
    lines.append("")

    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    print(
        f" parity.txt written: missing={len(en_only) + len(zh_only)}, "
        f"cjk-in-en={len(cjk_in_en)}, identical-values={len(identical)}"
    )
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))


@ -0,0 +1,182 @@
#!/usr/bin/env python3
"""Classify each CJK match into a 4-class label and a category tag.
Inputs (read from <sha-dir>):
cjk-grep.txt - raw `git grep -nIP` output, one match per line.
parity.txt - output of check_parity.py (used to harvest cjk-in-en gaps).
Output (written to <sha-dir>/classified.csv):
CSV columns: file, line, match, class, category, pipeline_step
Classes are a closed set: deliberate / gap / non-applicable / review-needed.
Categories and pipeline-step tags are likewise closed sets - see classify_match.
Run from the repository root.
"""
from __future__ import annotations
import csv
import re
import sys
from pathlib import Path
from typing import Iterable, Tuple
# CJK Unified Ideographs, U+4E00..U+9FFF (same range as the canonical git grep).
CJK_RANGE = re.compile(r"[\u4e00-\u9fff]")
PROMPT_FILES = (
"backend/app/services/ontology_generator.py",
"backend/app/services/oasis_profile_generator.py",
"backend/app/services/simulation_config_generator.py",
"backend/app/services/report_agent.py",
"backend/app/services/zep_graph_memory_updater.py",
)
LOG_HINTS = ("logger.", "log.", "print(", "build_logger.", "logging.")
BINARY_EXTS = (
".jpg", ".jpeg", ".png", ".gif", ".pdf",
".woff", ".woff2", ".ttf", ".eot", ".ico",
)
def classify_match(file: str, raw_line: str) -> Tuple[str, str, str]:
    """Return (class, category, pipeline_step) for one grep match line."""
    if any(file.lower().endswith(ext) for ext in BINARY_EXTS):
        return ("non-applicable", "binary-false-positive", "n/a")

    if file == "locales/en.json":
        return ("gap", "catalogue-parity", "UI")

    stripped = raw_line.lstrip()
    pipeline_step = pipeline_step_for(file)

    if file.endswith(".vue"):
        if re.search(r"\.match\s*\(\s*/", raw_line):
            return ("gap", "frontend-regex-parser", pipeline_step)
        if re.search(r"['\"`].*[一-鿿].*['\"`]", raw_line):
            return ("gap", "frontend-ui-string", pipeline_step)
        if stripped.startswith("//") or stripped.startswith("/*") or stripped.startswith("*"):
            return ("deliberate", "frontend-comment", pipeline_step)
        return ("review-needed", "frontend-other", pipeline_step)

    if file.endswith(".py"):
        if stripped.startswith("#"):
            return ("deliberate", "backend-comment", pipeline_step)
        if stripped.startswith('"""') or stripped.startswith("'''"):
            return ("deliberate", "backend-docstring", pipeline_step)
        if not re.search(r"['\"]", raw_line):
            # bare CJK on a non-string line: most likely an unterminated docstring
            # body. Treat as a docstring continuation.
            return ("deliberate", "backend-docstring", pipeline_step)
        if any(hint in raw_line for hint in LOG_HINTS):
            return ("gap", "backend-log", "Logs")
        if file in PROMPT_FILES:
            return ("gap", "backend-prompt-label", pipeline_step)
        return ("review-needed", "backend-string", pipeline_step)

    if file.endswith(".js") or file.endswith(".ts"):
        if stripped.startswith("//") or stripped.startswith("*"):
            return ("deliberate", "frontend-comment", pipeline_step)
        return ("review-needed", "frontend-other", pipeline_step)

    return ("review-needed", "uncategorised", pipeline_step)
def pipeline_step_for(file: str) -> str:
    """Map a path to one of the closed-set pipeline-step tags."""
    if "ontology_generator" in file or "graph_builder" in file or "graph.py" in file:
        return "Graph Build"
    if "oasis_profile_generator" in file or "Step2" in file:
        return "Env Setup"
    if "simulation_config_generator" in file or "simulation" in file or "Step3" in file:
        return "Simulation"
    if "report_agent" in file or "Step4" in file:
        return "Report"
    if "Step5" in file or "interaction" in file.lower() or "interview" in file.lower():
        return "Interaction"
    if "logger" in file or "retry" in file:
        return "Logs"
    if file.startswith("frontend/src/views/") or file.startswith("frontend/src/components/"):
        return "UI"
    return "n/a"
def parse_grep_line(line: str) -> Tuple[str, str, str]:
    """Split a `git grep -n` line into (file, line-number, match-text)."""
    parts = line.split(":", 2)
    if len(parts) < 3:
        return ("", "", line)
    return (parts[0], parts[1], parts[2])
def parity_to_rows(parity_path: Path) -> Iterable[Tuple[str, str, str, str, str, str]]:
    """Promote `[cjk-in-en]` block entries from parity.txt into classified rows."""
    if not parity_path.exists():
        return
    in_block = False
    for raw in parity_path.read_text(encoding="utf-8").splitlines():
        if raw.startswith("["):
            in_block = raw.strip() == "[cjk-in-en]"
            continue
        if not in_block:
            continue
        if not raw or raw.startswith("#"):
            continue
        yield (
            "locales/en.json",
            "0",
            raw,
            "gap",
            "catalogue-parity",
            "UI",
        )
def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <sha-dir>", file=sys.stderr)
        return 64

    sha_dir = Path(argv[1])
    grep_path = sha_dir / "cjk-grep.txt"
    parity_path = sha_dir / "parity.txt"
    out_path = sha_dir / "classified.csv"

    if not grep_path.exists():
        print(f"missing input: {grep_path}", file=sys.stderr)
        return 1

    rows: list[Tuple[str, str, str, str, str, str]] = []
    grep_lines = grep_path.read_text(encoding="utf-8").splitlines()
    for raw_line in grep_lines:
        if not raw_line:
            continue
        file, lineno, match = parse_grep_line(raw_line)
        if not file:
            continue
        cls, category, step = classify_match(file, match)
        rows.append((file, lineno, match.strip(), cls, category, step))

    rows.extend(parity_to_rows(parity_path))

    # Sanity check: every non-empty grep line must yield exactly one row.
    # Rows promoted from parity.txt carry line "0" and are excluded here.
    raw_count = sum(1 for line in grep_lines if line.strip())
    grep_rows = [r for r in rows if r[0] != "locales/en.json" or r[1] != "0"]
    if len(grep_rows) != raw_count:
        print(
            f"row-count drift: input={raw_count}, classified={len(grep_rows)}",
            file=sys.stderr,
        )
        return 1

    with out_path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "line", "match", "class", "category", "pipeline_step"])
        writer.writerows(rows)

    summary: dict[str, int] = {}
    for row in rows:
        summary[row[3]] = summary.get(row[3], 0) + 1
    summary_str = ", ".join(f"{cls}={n}" for cls, n in sorted(summary.items()))
    print(f" classified.csv: {len(rows)} rows ({summary_str})")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))


@ -0,0 +1,79 @@
#!/usr/bin/env bash
# Iterate <sha-dir>/PENDING-followups/*.md and file each non-empty body
# as a GitHub issue. The first markdown heading line (`# title`) becomes
# the issue title; any `<!-- labels: a,b,c -->` line at the bottom of the
# body becomes the --label argument.
#
# On per-category failure the body is left in place and the script exits
# non-zero at the end (after attempting all categories).
set -uo pipefail
if [ "$#" -ne 1 ]; then
printf 'usage: %s <sha-dir>\n' "$0" >&2
exit 64
fi
sha_dir="$1"
pending_dir="${sha_dir}/PENDING-followups"
urls_path="${sha_dir}/followup-urls.txt"
if [ ! -d "${pending_dir}" ]; then
printf 'missing PENDING-followups dir: %s\n' "${pending_dir}" >&2
exit 1
fi
# Append-only URL log so retries on the same sha-dir preserve previous filings.
touch "${urls_path}"
if ! command -v gh >/dev/null 2>&1; then
printf ' gh not available; leaving all bodies in PENDING-followups/\n'
exit 2
fi
if ! gh auth status >/dev/null 2>&1; then
printf ' gh not authenticated; leaving all bodies in PENDING-followups/\n'
exit 2
fi
partial=0
for body in "${pending_dir}"/[0-9]*-*.md; do
  [ -f "${body}" ] || continue
  if [ ! -s "${body}" ]; then
    # Empty placeholder - the corresponding category had zero gaps in this run.
    continue
  fi

  title="$(awk 'NR==1 && /^# /{sub(/^# /, ""); print; exit}' "${body}")"
  if [ -z "${title}" ]; then
    title="i18n: follow-up from issue #10 verification ($(basename "${body}" .md))"
  fi

  label_line="$(grep -oE '<!-- labels: [^>]+-->' "${body}" | head -1 || true)"
  labels="$(printf '%s' "${label_line}" | sed -E 's/<!-- labels: //; s/ *-->//' || true)"
  label_args=()
  if [ -n "${labels}" ]; then
    IFS=',' read -ra parts <<< "${labels}"
    for label in "${parts[@]}"; do
      label_args+=( --label "$(echo "${label}" | tr -d ' ')" )
    done
  fi

  printf ' filing: %s\n' "${title}"
  # The ${arr[@]+"${arr[@]}"} form keeps `set -u` from aborting on an empty
  # array in bash releases older than 4.4 (e.g. when a body has no labels).
  if url="$(gh issue create --repo salestech-group/MiroFish \
      --title "${title}" \
      --body-file "${body}" \
      ${label_args[@]+"${label_args[@]}"} 2>&1)"; then
    printf '%s\n' "${url}" >> "${urls_path}"
    printf ' -> %s\n' "${url}"
    rm -f "${body}"
  else
    printf ' !! gh issue create failed: %s\n' "${url}" >&2
    partial=1
  fi
done
if [ "${partial}" -eq 1 ]; then
exit 2
fi
exit 0


@ -0,0 +1,42 @@
#!/usr/bin/env bash
# Post comment-body.md as a comment on issue #10.
#
# Falls back to writing PENDING-issue-10-comment.md when gh is unavailable
# or the post fails - exits non-zero in that case so the orchestrator can
# downgrade its overall status.
set -euo pipefail
if [ "$#" -ne 1 ]; then
printf 'usage: %s <sha-dir>\n' "$0" >&2
exit 64
fi
sha_dir="$1"
body="${sha_dir}/comment-body.md"
if [ ! -f "${body}" ]; then
  printf 'missing comment body: %s\n' "${body}" >&2
  exit 1
fi

if ! command -v gh >/dev/null 2>&1; then
  printf ' gh not available; writing PENDING-issue-10-comment.md\n'
  cp "${body}" "${sha_dir}/PENDING-issue-10-comment.md"
  exit 2
fi

if ! gh auth status >/dev/null 2>&1; then
  printf ' gh not authenticated; writing PENDING-issue-10-comment.md\n'
  cp "${body}" "${sha_dir}/PENDING-issue-10-comment.md"
  exit 2
fi

if url="$(gh issue comment 10 --repo salestech-group/MiroFish --body-file "${body}" 2>&1)"; then
  printf '%s\n' "${url}" > "${sha_dir}/comment-url.txt"
  printf ' posted: %s\n' "${url}"
  rm -f "${sha_dir}/PENDING-issue-10-comment.md"
  exit 0
fi

printf ' gh post failed; writing PENDING-issue-10-comment.md\n'
cp "${body}" "${sha_dir}/PENDING-issue-10-comment.md"
exit 2


@ -0,0 +1,419 @@
#!/usr/bin/env python3
"""Render the gap report and the issue-#10 comment body.
Inputs (from <sha-dir>):
classified.csv - per-match classification rows.
parity.txt - en/zh catalogue parity output.
cjk-grep-bucketed.txt - human-readable bucketed grep output.
Inputs (from repo):
.ticket/10.md - snapshot of issue #10's body (used to mirror its checklist).
Outputs (to <sha-dir>):
gap-report.md - full structured report (seven sections).
comment-body.md - markdown comment to be posted on issue #10.
PENDING-followups/01..04-*.md - one body per gap category (placeholders allowed).
Usage:
python3 render_report.py <sha-dir> <commit-sha>
"""
from __future__ import annotations
import csv
import re
import sys
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, List
ISSUE_NUMBER = 10
REPO_SLUG = "salestech-group/MiroFish"
def load_rows(csv_path: Path) -> list[dict]:
    with csv_path.open(encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))


def load_ticket_body(ticket_path: Path) -> str:
    """Strip the YAML frontmatter and return the markdown body."""
    text = ticket_path.read_text(encoding="utf-8")
    if text.startswith("---\n"):
        end = text.find("\n---\n", 4)
        if end != -1:
            return text[end + 5 :]
    return text


CHECKBOX_RE = re.compile(r"^(\s*)- \[ \] (.+)$")
SUBBULLET_RE = re.compile(r"^(\s+)- (.+)$")


def evidence_for_step(rows: list[dict], step: str) -> list[dict]:
    """Return gap rows whose pipeline_step matches the given UI tag."""
    return [r for r in rows if r["class"] == "gap" and r["pipeline_step"] == step]
def render_section_5(ticket_body: str, rows: list[dict]) -> str:
    """Map every checklist item from the ticket body to a status."""
    gaps_by_step = defaultdict(list)
    for row in rows:
        if row["class"] == "gap":
            gaps_by_step[row["pipeline_step"]].append(row)

    out: list[str] = []
    out.append("## Section 5 - Issue #10 checklist mapping\n")
    out.append("Each line below is taken from the ticket body, with an explicit status.\n")

    in_checklist = False
    for line in ticket_body.splitlines():
        match = CHECKBOX_RE.match(line)
        if match:
            in_checklist = True
            indent, text = match.group(1), match.group(2)
            status, note = status_for_checklist_item(text, gaps_by_step)
            out.append(f"{indent}- [{('x' if status == 'pass' else ' ')}] **{status.upper()}** - {text}{note}")
            continue
        sub = SUBBULLET_RE.match(line)
        if in_checklist and sub:
            indent, text = sub.group(1), sub.group(2)
            status, note = status_for_checklist_item(text, gaps_by_step)
            out.append(f"{indent}- {status.upper()}: {text}{note}")
            continue
        if line.startswith("##") or line.startswith("---"):
            in_checklist = False
    return "\n".join(out) + "\n"
def status_for_checklist_item(text: str, gaps_by_step: Dict[str, list]) -> tuple[str, str]:
    """Return (status, suffix-note) for one checklist line.

    Pure-UI items default to manual-pending in this run; items with a
    backing pipeline-step that has gaps are reported as gap with a count.
    """
    lower = text.lower()
    candidates: list[str] = []
    if "graph build" in lower or "ontology" in lower:
        candidates.append("Graph Build")
    if "env setup" in lower or "agent profile" in lower or "profession" in lower:
        candidates.append("Env Setup")
    if "simulation" in lower or "tweet" in lower or "reddit" in lower or "sim config" in lower:
        candidates.append("Simulation")
    if "report" in lower:
        candidates.append("Report")
    if "interaction" in lower or "interview" in lower or "chat repl" in lower:
        candidates.append("Interaction")
    # Match "log"/"logs" as a whole word: a bare substring test also fires on
    # "ontology", which would add the Logs gaps to the ontology item's count.
    if re.search(r"\blogs?\b", lower):
        candidates.append("Logs")

    relevant_gaps = []
    for step in candidates:
        relevant_gaps.extend(gaps_by_step.get(step, []))

    if "frontend ui" in lower or "no chinese strings on screen" in lower or "every label" in lower:
        ui_gaps = gaps_by_step.get("UI", [])
        if ui_gaps:
            return ("gap", f" - {len(ui_gaps)} hard-coded CJK literal(s) in `frontend/src/views|components/`")
        return ("manual-pending", " - live UI walkthrough not run in this sandbox")

    if "locale propagation" in lower or "set_locale" in lower:
        prop = gaps_by_step.get("Logs", [])
        if prop:
            return ("gap", f" - {len(prop)} CJK log strings on EN code path")
        return ("manual-pending", " - locale-propagation runtime check not run in this sandbox")

    if relevant_gaps:
        return ("gap", f" - {len(relevant_gaps)} gap(s) classified, see Section 1/3")

    if any(c in lower for c in ("ui", "screenshot", "chat", "modal", "tooltip", "render", "trace", "thinking")):
        return ("manual-pending", " - requires live walkthrough")

    return ("manual-pending", " - not verifiable statically; awaiting live run")


def render_gap_report(rows: list[dict], ticket_body: str, parity_text: str, sha: str) -> str:
    classes = Counter(r["class"] for r in rows)
    gap_rows = [r for r in rows if r["class"] == "gap"]
    gap_categories = Counter(r["category"] for r in gap_rows)
    gap_steps = Counter(r["pipeline_step"] for r in gap_rows)
    out: list[str] = []
    out.append("# Verification gap report - i18n-e2e-english-verification\n")
    out.append(f"**Commit:** `{sha}`\n")
    out.append("")
    out.append("## Overview\n")
    out.append(f"- Total CJK matches audited: **{len(rows)}**")
    out.append(f"- Class distribution: {format_counter(classes)}")
    out.append(f"- Gap categories: {format_counter(gap_categories)}")
    out.append(f"- Gap pipeline steps: {format_counter(gap_steps)}")
    out.append("")
    out.append("## Section 1 - Static CJK audit\n")
    out.append("Canonical command (PCRE):\n")
    out.append("```")
    out.append('git grep -nIP "[\\x{4e00}-\\x{9fff}]" -- backend/app frontend/src locales/en.json')
    out.append("```")
    out.append("")
    out.append(f"Raw output captured at `audit/{sha}/cjk-grep.txt` and bucketed at `audit/{sha}/cjk-grep-bucketed.txt`.")
    out.append("")
    out.append(f"`locales/en.json` CJK matches: **{sum(1 for r in rows if r['file'] == 'locales/en.json')}** (acceptance: zero).")
    out.append("")
    out.append("Top files by gap count:")
    out.append("")
    out.append("| File | Gap count |")
    out.append("|------|-----------|")
    by_file = Counter(r["file"] for r in gap_rows)
    for file, count in by_file.most_common(15):
        out.append(f"| `{file}` | {count} |")
    out.append("")
    out.append("## Section 2 - Locale catalogue parity\n")
    out.append("```")
    out.append(parity_text.strip())
    out.append("```")
    out.append("")
    out.append("## Section 3 - LLM-prompt locale verification\n")
    prompt_gaps = [r for r in gap_rows if r["category"] == "backend-prompt-label"]
    out.append(f"Backend prompt-label gaps (CJK string literals inside services that compose LLM prompts): **{len(prompt_gaps)}**")
    out.append("")
    if prompt_gaps:
        out.append("First 10 examples (file:line - match):")
        out.append("")
        for row in prompt_gaps[:10]:
            out.append(f"- `{row['file']}:{row['line']}` - {row['match']}")
        if len(prompt_gaps) > 10:
            out.append(f"- ... and {len(prompt_gaps) - 10} more (see `classified.csv`)")
        out.append("")
        out.append(
            "These prompts feed the LLM verbatim; CJK labels bias the model toward Chinese output even when "
            "the requested locale is English."
        )
        out.append("")
    out.append("## Section 4 - Locale propagation surface\n")
    log_gaps = [r for r in gap_rows if r["category"] == "backend-log"]
    out.append("| Boundary | Status | Evidence |")
    out.append("|----------|--------|----------|")
    out.append(
        "| HTTP -> Flask handler | manual-pending | runtime not exercised in sandbox; static review showed no per-request locale carrier |"
    )
    out.append(
        "| Flask handler -> Task worker | manual-pending | thread-local `set_locale` referenced in CLAUDE.md but not statically verified end-to-end |"
    )
    out.append(
        "| Task worker -> OASIS subprocess | manual-pending | subprocess boundary requires live run |"
    )
    out.append(
        f"| Backend logger | {'gap' if log_gaps else 'pass'} | {len(log_gaps)} hard-coded CJK log line(s) on EN code path |"
    )
    out.append("")
    if log_gaps:
        out.append("First 10 backend-log gap examples:")
        out.append("")
        for row in log_gaps[:10]:
            out.append(f"- `{row['file']}:{row['line']}` - {row['match']}")
        out.append("")
    out.append(render_section_5(ticket_body, rows))
    out.append("## Section 6 - ZH regression check\n")
    out.append(
        "- Locale catalogues at full key parity (953 EN keys / 953 ZH keys, symmetric difference 0 - "
        "see Section 2).\n"
        "- No ZH-specific regression detected in static review. Live ZH walkthrough is `manual-pending`.\n"
    )
    out.append("## Section 7 - Follow-up plan\n")
    out.append("Per R7.2, gaps are grouped into the following follow-up issues (placeholder bodies in `PENDING-followups/`):")
    out.append("")
    out.append(
        f"1. **Frontend hard-coded UI strings** ({len(by_category(rows, 'frontend-ui-string'))} matches + "
        f"{len(by_category(rows, 'frontend-regex-parser'))} regex parsers depending on CJK backend output)."
    )
    out.append(f"2. **Backend log strings** ({len(by_category(rows, 'backend-log'))} matches).")
    out.append(f"3. **Backend LLM-prompt context labels** ({len(by_category(rows, 'backend-prompt-label'))} matches).")
    out.append("4. **Permanent CI guard** (preventative - re-run this audit on every PR).")
    out.append("")
    out.append(
        "Backend docstring/comment matches (the bulk of `deliberate` rows) are covered by the existing issue #7 and are not re-filed here."
    )
    return "\n".join(out) + "\n"


def by_category(rows: list[dict], category: str) -> list[dict]:
    return [r for r in rows if r["category"] == category and r["class"] == "gap"]


def format_counter(c: Counter) -> str:
    return ", ".join(f"{k}={v}" for k, v in c.most_common())


def render_comment_body(rows: list[dict], ticket_body: str, sha: str) -> str:
    classes = Counter(r["class"] for r in rows)
    gap_rows = [r for r in rows if r["class"] == "gap"]
    gap_categories = Counter(r["category"] for r in gap_rows)
    out: list[str] = []
    out.append(f"### Verification report - run on commit `{sha}`\n")
    out.append("This run was produced by `.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.")
    out.append("Captured artefacts live under `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.\n")
    out.append("")
    out.append(f"**Audit summary:** {sum(classes.values())} CJK matches across the auditable paths.")
    out.append(f"- {classes.get('gap', 0)} `gap` (actionable, see follow-ups)")
    out.append(f"- {classes.get('review-needed', 0)} `review-needed` (soft signal; needs human eyeball)")
    out.append(f"- {classes.get('deliberate', 0)} `deliberate` (mostly backend docstrings/comments - covered by issue #7)")
    out.append(
        f"- {classes.get('non-applicable', 0)} `non-applicable` (binary file false positives - excluded)"
    )
    out.append("")
    out.append(f"**Gap-category breakdown:** {format_counter(gap_categories)}")
    out.append("")
    out.append("---")
    out.append("")
    out.append("#### Issue checklist mapping")
    out.append("")
    out.append(render_section_5(ticket_body, rows))
    out.append("---")
    out.append("")
    out.append("#### How to re-run")
    out.append("")
    out.append("```bash")
    out.append("# from the repository root, on any commit:")
    out.append("bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh")
    out.append("# artefacts at .kiro/specs/i18n-e2e-english-verification/audit/<HEAD-sha>/")
    out.append("```")
    out.append("")
    out.append(
        "If `gh` is not authenticated when re-running, the comment body and follow-up bodies are written to "
        "`PENDING-issue-10-comment.md` / `PENDING-followups/` for a human to post."
    )
    out.append("")
    out.append("Out of scope for this run (per R5.3 / R7.3): live UI walkthrough, full Docker-Compose pipeline run, and any inline gap fixes.")
    return "\n".join(out) + "\n"


def render_followup_bodies(rows: list[dict], sha_dir: Path, sha: str) -> None:
    pending_dir = sha_dir / "PENDING-followups"
    pending_dir.mkdir(parents=True, exist_ok=True)
    ui_gaps = by_category(rows, "frontend-ui-string") + by_category(rows, "frontend-regex-parser")
    log_gaps = by_category(rows, "backend-log")
    prompt_gaps = by_category(rows, "backend-prompt-label")
    files = [
        (
            "01-frontend-ui-strings.md",
            "i18n: replace hard-coded chinese ui strings in process and step components with i18n keys",
            ui_gaps,
            (
                "Several `.vue` templates in `frontend/src/views/` and `frontend/src/components/` still emit "
                "Chinese strings directly instead of routing them through `vue-i18n` keys. Some `Step4Report.vue` "
                "regex parsers also rely on Chinese tokens emitted by the backend (so they will silently break "
                "once the backend prompts are translated)."
            ),
            ["i18n", "bug"],
        ),
        (
            "02-backend-log-strings.md",
            "i18n: externalise remaining chinese log strings in flask api and utils",
            log_gaps,
            (
                "After issue #6 externalised most backend log messages, a handful of `logger.info` / "
                "`logger.error` call sites in `backend/app/api/graph.py` and `backend/app/utils/retry.py` "
                "still hard-code Chinese strings, so backend logs leak Chinese under EN locale."
            ),
            ["i18n"],
        ),
        (
            "03-backend-prompt-labels.md",
            "i18n: translate chinese context labels inside llm-prompt assembly in backend services",
            prompt_gaps,
            (
                "Several `services/*_generator.py` files compose LLM prompts that still embed Chinese "
                "context labels (e.g. `\"事实信息:\"`, `\"相关实体:\"`) into the prompt string verbatim. These "
                "labels bias the LLM toward Chinese output even when the requested locale is English."
            ),
            ["i18n"],
        ),
        (
            "04-permanent-ci-guard.md",
            "i18n: add a permanent ci guard that runs the e2e cjk audit on every pr",
            [],
            (
                "Promote the audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` to "
                "a permanent CI check. The guard should fail when `locales/en.json` contains any CJK character "
                "and when the gap count regresses against a committed baseline."
            ),
            ["i18n", "enhancement"],
        ),
    ]
    for name, title, gaps, summary, labels in files:
        # An empty placeholder marks "nothing to file" for this category;
        # the CI-guard follow-up (04-) is always written.
        if not gaps and not name.startswith("04-"):
            (pending_dir / name).write_text("", encoding="utf-8")
            continue
        body = [
            f"# {title}",
            "",
            "## Summary",
            "",
            summary,
            "",
            "## Linked from",
            "",
            f"- Issue #{ISSUE_NUMBER} (verification report comment).",
            f"- Spec: `.kiro/specs/i18n-e2e-english-verification/` at commit `{sha}`.",
            "",
            "## Evidence",
            "",
        ]
        if gaps:
            for row in gaps[:50]:
                body.append(f"- `{row['file']}:{row['line']}` - {row['match']}")
            if len(gaps) > 50:
                body.append(f"- ... and {len(gaps) - 50} more (see `classified.csv` in the spec dir)")
        else:
            body.append("- (No gaps in this run; this is a preventative follow-up only.)")
        body.append("")
        body.append("## Acceptance")
        body.append("")
        body.append("- [ ] Each `file:line` above is fixed (or explicitly classified as `deliberate`).")
        body.append("- [ ] Re-running `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh` shows zero gaps in this category.")
        body.append("")
        body.append(f"<!-- labels: {','.join(labels)} -->")
        body.append("")
        (pending_dir / name).write_text("\n".join(body), encoding="utf-8")


def main(argv: list[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} <sha-dir> <commit-sha>", file=sys.stderr)
        return 64
    sha_dir = Path(argv[1])
    sha = argv[2]
    rows = load_rows(sha_dir / "classified.csv")
    parity_text = (sha_dir / "parity.txt").read_text(encoding="utf-8")
    ticket_body = load_ticket_body(Path(".ticket/10.md"))
    gap_report = render_gap_report(rows, ticket_body, parity_text, sha)
    (sha_dir / "gap-report.md").write_text(gap_report, encoding="utf-8")
    comment_body = render_comment_body(rows, ticket_body, sha)
    (sha_dir / "comment-body.md").write_text(comment_body, encoding="utf-8")
    render_followup_bodies(rows, sha_dir, sha)
    print(f" gap-report.md, comment-body.md, PENDING-followups/ written under {sha_dir}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))


@ -0,0 +1,71 @@
#!/usr/bin/env bash
# Orchestrate the i18n end-to-end verification audit.
#
# Reads working-tree state via git (no production-source modifications),
# captures classified output under audit/<commit-sha>/, and posts the
# verification report comment + follow-up issues via gh when available.
#
# Exit codes:
# 0 - audit succeeded and all GitHub side effects applied
# 1 (or the failing step's own non-zero status, propagated by `set -e`) - audit step failed (read-only producer aborted)
# 2 - audit succeeded but at least one GitHub side effect was deferred to PENDING
set -euo pipefail
repo_root="$(git rev-parse --show-toplevel)"
cd "$repo_root"
spec_root=".kiro/specs/i18n-e2e-english-verification"
scripts_dir="${spec_root}/audit/scripts"
sha="$(git rev-parse HEAD)"
sha_dir="${spec_root}/audit/${sha}"
mkdir -p "${sha_dir}"
printf 'Verification audit\n repo: %s\n sha: %s\n out: %s\n\n' \
  "${repo_root}" "${sha}" "${sha_dir}"
ghs_exit=0
step() {
  local label="$1"
  shift
  printf '== %s ==\n' "${label}"
  "$@"
}
step "audit_cjk.sh" bash "${scripts_dir}/audit_cjk.sh" "${sha_dir}"
step "check_parity.py" python3 "${scripts_dir}/check_parity.py" "${sha_dir}"
step "classify.py" python3 "${scripts_dir}/classify.py" "${sha_dir}"
step "render_report.py" python3 "${scripts_dir}/render_report.py" "${sha_dir}" "${sha}"
# GitHub side effects: failures here downgrade the run to exit 2 but
# do not abort the rest of the side effects.
set +e
step "post_comment.sh" bash "${scripts_dir}/post_comment.sh" "${sha_dir}"
[ $? -ne 0 ] && ghs_exit=2
step "file_followups.sh" bash "${scripts_dir}/file_followups.sh" "${sha_dir}"
[ $? -ne 0 ] && ghs_exit=2
set -e
printf '\n== summary ==\n'
printf 'sha-dir: %s\n' "${sha_dir}"
if [ -f "${sha_dir}/comment-url.txt" ]; then
  printf 'comment: %s\n' "$(cat "${sha_dir}/comment-url.txt")"
else
  printf 'comment: PENDING (see %s/PENDING-issue-10-comment.md)\n' "${sha_dir}"
fi
if [ -f "${sha_dir}/followup-urls.txt" ]; then
  printf 'follow-ups posted:\n'
  sed 's/^/ /' "${sha_dir}/followup-urls.txt"
fi
if compgen -G "${sha_dir}/PENDING-followups/[0-9]*-*.md" > /dev/null; then
  printf 'follow-ups PENDING:\n'
  for body in "${sha_dir}"/PENDING-followups/[0-9]*-*.md; do
    if [ -s "${body}" ]; then
      printf ' %s\n' "${body}"
    fi
  done
fi
exit "${ghs_exit}"


@ -0,0 +1,560 @@
# Design — i18n-e2e-english-verification
## Overview
**Purpose**: This spec produces a deterministic, re-runnable verification pass that proves (or disproves) the MiroFish 5-step pipeline runs cleanly in English, and posts a structured report on issue #10 with a `pass` / `gap` / `manual-pending` status per checklist item.
**Users**: i18n maintainers reviewing the epic (#11), and any future verifier re-running the audit after subsequent merges. The deliverable is read by humans on GitHub (issue comment) and re-run by humans (or CI in a future iteration) to confirm parity.
**Impact**: No production code is modified. The repository gains one new directory tree (`.kiro/specs/i18n-e2e-english-verification/`) containing the spec, the audit scripts, and the captured outputs. One GitHub comment is posted on #10. Up to four follow-up issues are filed.
### Goals
- Static-audit `backend/app`, `frontend/src`, `locales/en.json` for CJK characters; classify every match.
- Verify EN / ZH locale catalogue parity and flag suspect untranslated entries.
- Verify LLM-prompt assets respect the requested locale.
- Document locale-propagation gaps across Flask → `Task` → OASIS subprocess → ReACT agent.
- Post a single canonical comment on issue #10 with per-checklist statuses.
- File follow-up issues for every gap (no inline fixes).
- Make the audit re-runnable by capturing artefacts under `.kiro/specs/.../audit/<commit-sha>/`.
### Non-Goals
- Patching any `gap` discovered (R7.3 — strictly verification).
- Performance / load testing.
- Adding new locales beyond EN / ZH.
- Building a permanent CI guard (filed as a follow-up issue, not implemented here).
- Live UI / Docker walkthrough — captured as `manual-pending` in this run's report.
## Boundary Commitments
### This Spec Owns
- The audit scripts and the captured audit outputs under `.kiro/specs/i18n-e2e-english-verification/audit/`.
- The `gap-report.md` artefact and the comment body posted on issue #10.
- The grouping rule for follow-up issues (one per category — UI strings, backend log strings, backend LLM-prompt labels, suggested CI guard).
- The `pass` / `gap` / `manual-pending` status scheme for checklist items and the `deliberate` / `gap` / `non-applicable` / `review-needed` class labels for individual matches.
### Out of Boundary
- Any modification of files under `backend/app/`, `frontend/src/`, or `locales/`.
- Fixing the gaps the audit discovers — those land in their own follow-up issues.
- Live UI walkthrough, Docker run, or LLM execution.
- A permanent CI check — filed as a separate follow-up issue.
### Allowed Dependencies
- `git` (for `git grep`, capturing HEAD sha).
- `gh` CLI (for the comment + follow-up issues; with documented fallback when unavailable).
- `python3` (for the catalogue parity diff, classification, and report rendering).
- The repo working tree at HEAD of the working branch.
### Revalidation Triggers
- Any merge to `main` that touches `locales/`, `backend/app/`, or `frontend/src/` invalidates the captured audit; a re-run should produce a new `audit/<commit-sha>/` directory.
- A change to issue #10's checklist body (e.g. a new sub-item) requires re-mapping in `gap-report.md`.
- A change to the four follow-up categories (e.g. project decides to file one issue per file) requires re-running the issue-filing script with new grouping.
## Architecture
### Existing Architecture Analysis
- The MiroFish backend is Flask + Python `Task` workers + an OASIS subprocess (per CLAUDE.md). i18n surfaces are: `vue-i18n` for the SPA, `locales/*.json` shared by both ends, a backend logger that resolves keys per locale, and inline LLM prompts in `backend/app/services/*.py`.
- The verification pass does **not** hook into any of these — it reads files only. No Flask blueprint, no `Task` model, no Neo4j query.
### Architecture Pattern & Boundary Map
```mermaid
graph TB
Verifier[Verifier shell entrypoint]
Audit[audit_cjk.sh]
Parity[check_parity.py]
Classify[classify.py]
Report[render_report.py]
Comment[post_comment.sh]
FollowUp[file_followups.sh]
Repo[Working tree]
Captures[audit slash sha slash]
GH[GitHub via gh CLI]
Verifier --> Audit
Verifier --> Parity
Audit --> Classify
Parity --> Classify
Classify --> Report
Report --> Captures
Report --> Comment
Report --> FollowUp
Audit --> Repo
Parity --> Repo
Comment --> GH
FollowUp --> GH
```
**Architecture Integration**:
- **Selected pattern**: Linear pipeline of read-only scripts that each emit a single artefact, composed by a thin shell entrypoint. No mutable state outside `audit/<sha>/`.
- **Domain boundaries**: `audit_cjk.sh` owns the raw grep; `check_parity.py` owns the catalogue diff; `classify.py` owns the four-class labels; `render_report.py` owns the comment body; `post_comment.sh` and `file_followups.sh` own GitHub side effects.
- **Existing patterns preserved**: Shell + Python script pair (matches the project's existing `setup`/`run` style); no new test runner, no new linter.
- **New components rationale**: Each script is single-purpose so failures (e.g. `gh` permission issues) are isolated and the pipeline can resume from the failed step.
- **Steering compliance**: No production-code touch (R7.3); 4-space indent in any committed Python; double quotes; `snake_case`; Bash scripts exit with a non-zero status on any uncaught error.
### Technology Stack
| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| CLI / Audit runner | Bash 5+, `git grep -P` (PCRE) | Run the canonical CJK audit | `\x{...}` ranges require PCRE — `git grep -E` will fail on this regex (verified). |
| Static checks | Python 3.11 (project minimum per CLAUDE.md) | Catalogue parity + classification + report rendering | Standard library only — no new deps. |
| GitHub integration | `gh` CLI | Post the comment, file follow-ups | Falls back to `audit/<sha>/PENDING-*` files when missing. |
| Output formats | Plain text + Markdown | Captures + comment body | No HTML, no JSON beyond `gh`'s own. |
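The `\x{...}` escape in the table above is PCRE syntax. For the Python steps, the same code-point range can be written with `\u` escapes. The following is a minimal sketch under that assumption; the names `CJK_RE`, `has_cjk`, and `cjk_chars` are illustrative and are not taken from the audit scripts.

```python
import re

# Same code-point range as the PCRE pattern [\x{4e00}-\x{9fff}]
# (the CJK Unified Ideographs block). Names are hypothetical.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def has_cjk(text: str) -> bool:
    """Return True if the text contains at least one character in the audited range."""
    return CJK_RE.search(text) is not None


def cjk_chars(text: str) -> list[str]:
    """Return every matching character in order of appearance."""
    return CJK_RE.findall(text)
```

Note that this range covers ideographs only: CJK punctuation such as `。` (U+3002) lies outside it, so a line containing only such punctuation would not be flagged.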
## File Structure Plan
### Directory Structure
```
.kiro/specs/i18n-e2e-english-verification/
├── spec.json
├── requirements.md
├── gap-analysis.md
├── research.md
├── design.md
├── tasks.md
├── HANDOFF.md # only if implementation hits the 3-cycle remediation cap
└── audit/
├── scripts/
│ ├── run_audit.sh # entrypoint - chains the steps below
│ ├── audit_cjk.sh # git grep PCRE + bucket counts
│ ├── check_parity.py # locales/en.json vs zh.json key + identical-value diff
│ ├── classify.py # apply 4-class labels to grep matches
│ ├── render_report.py # produce gap-report.md + comment-body.md
│ ├── post_comment.sh # gh issue comment 10 with comment-body.md (or PENDING-*)
│ └── file_followups.sh # gh issue create per category (or PENDING-*)
└── <commit-sha>/ # captured outputs of one verification run
├── cjk-grep.txt # raw `git grep -nP ...` output
├── cjk-grep-bucketed.txt # the same, partitioned by top-level path
├── parity.txt # en/zh diff summary
├── classified.csv # match-by-match label
├── gap-report.md # the canonical structured report
├── comment-body.md # the markdown posted to issue #10
├── PENDING-issue-10-comment.md # only if gh comment failed
└── PENDING-followups/ # only if gh issue create failed
├── 01-frontend-ui-strings.md
├── 02-backend-log-strings.md
├── 03-backend-prompt-labels.md
└── 04-permanent-ci-guard.md
```
### Modified Files
- *(None.)* The spec explicitly forbids touching production source.
## System Flows
```mermaid
sequenceDiagram
participant V as Verifier
participant Run as run_audit.sh
participant FS as Working tree
participant GH as GitHub
V->>Run: bash run_audit.sh
Run->>FS: git grep -nP, git rev-parse HEAD
FS-->>Run: cjk-grep.txt + sha
Run->>FS: read locales json
FS-->>Run: en/zh dicts
Run->>Run: classify
Run->>FS: write audit slash sha slash artefacts
Run->>GH: gh issue comment 10
alt gh succeeds
GH-->>Run: comment URL
Run->>GH: gh issue create x N follow-ups
GH-->>Run: issue URLs
else gh fails
Run->>FS: write PENDING markdown to audit slash sha slash
end
Run-->>V: exit 0 success or exit 2 PENDING
```
**Key flow decisions**:
- The audit always writes the captured artefacts to disk first (idempotent, re-runnable). The GitHub side effects are the *last* steps so any earlier failure leaves a complete capture for inspection.
- A non-zero `gh` exit shifts the pipeline to PENDING mode rather than failing the whole run; the script exits `2` to flag "audit ran but GitHub side-effects didn't apply".
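The exit-code rule above can be sketched as follows. `run_audit.sh` implements it in Bash; this Python version is only an illustration, and `run_side_effects`, `EXIT_OK`, and `EXIT_PENDING` are hypothetical names.

```python
from typing import Callable, Iterable

EXIT_OK = 0
EXIT_PENDING = 2


def run_side_effects(effects: Iterable[Callable[[], int]]) -> int:
    """Run every side effect in order and return the aggregate exit code.

    A failing side effect downgrades the run to EXIT_PENDING but does not
    stop the remaining side effects from running.
    """
    exit_code = EXIT_OK
    for effect in effects:
        if effect() != 0:
            exit_code = EXIT_PENDING  # downgrade, but keep going
    return exit_code
```

Each callable stands in for one GitHub step (posting the comment, filing the follow-ups) and returns that step's exit status.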
## Requirements Traceability
| Requirement | Summary | Components | Interfaces / Artefacts | Flows |
|-------------|---------|------------|------------------------|-------|
| 1.1 | Run canonical `git grep` | audit_cjk.sh | `cjk-grep.txt` | Audit step |
| 1.2 | Classify each match | classify.py | `classified.csv` | Audit step |
| 1.3 | Record file:line + step tag for `gap` | classify.py | `classified.csv` (`pipeline_step` column) | Audit step |
| 1.4 | No file modifications during audit | run_audit.sh | scripts are read-only | — |
| 1.5 | `en.json` CJK = always `gap` | classify.py | hard rule in classifier | Audit step |
| 2.1 | Enumerate keys recursively | check_parity.py | `parity.txt` | Audit step |
| 2.2 | Missing-key gaps recorded | check_parity.py | `parity.txt` (missing-key block) | Audit step |
| 2.3 | EN catalogue CJK = `gap` | check_parity.py | `parity.txt` (cjk-in-en block) | Audit step |
| 2.4 | EN/ZH identical = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | Audit step |
| 2.5 | No catalogue edits | check_parity.py | read-only stdlib JSON load | — |
| 3.1 | Enumerate prompt files | classify.py (heuristic — known files list) | `gap-report.md` Section 3 | — |
| 3.2 | Confirm locale-aware or EN-only | classify.py | `gap-report.md` Section 3 | — |
| 3.3 | Hard-coded ZH directive = `gap` | classify.py | `classified.csv` (`category=backend-prompt-label`) | — |
| 3.4 | #3, #4, #5 prompts post-merge check | classify.py | `gap-report.md` Section 3 | — |
| 4.1 | Identify handoff boundaries | render_report.py | `gap-report.md` Section 4 | — |
| 4.2 | Confirm explicit or re-derived locale | render_report.py | `gap-report.md` Section 4 | — |
| 4.3 | Silent default = `gap` | classify.py | `classified.csv` (`category=propagation`) | — |
| 4.4 | Backend logger EN under EN | classify.py | `classified.csv` (`category=backend-log`) | — |
| 5.1 | Comment lists every checklist item | render_report.py | `comment-body.md` | Comment-post |
| 5.2 | Each `gap` includes file:line + follow-up link | render_report.py | `comment-body.md` | Comment-post |
| 5.3 | `manual-pending` items state repro steps | render_report.py | `comment-body.md` | Comment-post |
| 5.4 | Comment includes raw audit (or path) | render_report.py | `comment-body.md` (path reference) | Comment-post |
| 5.5 | Post via `gh issue comment 10` | post_comment.sh | `comment-body.md` | Comment-post |
| 6.1 | ZH covers every EN key | check_parity.py | (already passes per gap-analysis) | — |
| 6.2 | Locale-aware prompts symmetric | render_report.py | `gap-report.md` Section 6 | — |
| 6.3 | EN-only ZH value = `review-needed` | check_parity.py | `parity.txt` (identical-value block) | — |
| 6.4 | ZH regression filed as gap | classify.py | `classified.csv` | — |
| 7.1 | File issue per gap | file_followups.sh | `gh issue create` | Follow-up |
| 7.2 | Group by category | file_followups.sh | one body per category in `PENDING-followups/` | Follow-up |
| 7.3 | No production-code edits | run_audit.sh | only writes under `.kiro/specs/.../` | — |
| 7.4 | Label follow-ups `i18n` | file_followups.sh | `gh issue create --label i18n` | Follow-up |
| 7.5 | Fallback inline list when no `gh` | file_followups.sh | `PENDING-followups/*.md` | Follow-up |
| 8.1 | Capture raw output | run_audit.sh | `audit/<sha>/` directory | Audit step |
| 8.2 | Preserve previous run | run_audit.sh | `<sha>` subdirectory naming | Audit step |
| 8.3 | Record HEAD sha | run_audit.sh | `git rev-parse HEAD` | Audit step |
| 8.4 | Idempotent re-run | run_audit.sh | re-running on same sha overwrites that sha's dir | Audit step |
## Components and Interfaces
| Component | Domain | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------|--------|--------------|--------------------------|-----------|
| run_audit.sh | Verification pipeline | Compose the audit and route artefacts | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 | git (P0), python3 (P0), gh (P1) | Batch |
| audit_cjk.sh | Static audit | Run `git grep -nP` and bucket | 1.1, 1.5 | git (P0) | Batch |
| check_parity.py | Catalogue diff | Diff en/zh + identical-value heuristic | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 | python3 stdlib (P0) | Batch |
| classify.py | Classification | Apply the 4-class label per match | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 | cjk-grep.txt (P0), parity.txt (P0) | Batch |
| render_report.py | Report assembly | Produce gap-report.md + comment-body.md | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 | classified.csv (P0) | Batch |
| post_comment.sh | GitHub side-effect | Post the comment on #10 | 5.5 | gh (P0), comment-body.md (P0) | Service |
| file_followups.sh | GitHub side-effect | Open follow-up issues | 7.1, 7.2, 7.4, 7.5 | gh (P0), PENDING-followups/* (P0) | Service |
### Verification pipeline
#### `run_audit.sh`
| Field | Detail |
|-------|--------|
| Intent | Single shell entrypoint that runs every step in order and persists artefacts under `audit/<commit-sha>/` |
| Requirements | 1.4, 7.3, 8.1, 8.2, 8.3, 8.4 |
**Responsibilities & Constraints**
- Must NOT modify any file outside `.kiro/specs/i18n-e2e-english-verification/`.
- Must capture HEAD sha before any other step (so the artefact path is set).
- Must exit `0` on full success (audit + GitHub side effects) and `2` on PENDING (audit succeeded, side effects didn't).
- Must be safely re-runnable on the same sha (overwriting that sha's directory is acceptable).
**Dependencies**
- Inbound: invoked manually by the verifier (`bash run_audit.sh`) — Criticality: P0.
- Outbound: `audit_cjk.sh`, `check_parity.py`, `classify.py`, `render_report.py`, `post_comment.sh`, `file_followups.sh` — Criticality: P0 each.
- External: `git`, `python3`, `gh` (P1 — fallback supported).
**Contracts**: Service [ ] / API [ ] / Event [ ] / Batch [x] / State [ ]
##### Batch / Job Contract
- **Trigger**: manual `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`.
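- **Input / validation**: working tree at any commit. A dirty or detached tree is not rejected: `git grep` searches the working-tree copies of tracked files, so untracked files are ignored but uncommitted edits to tracked files are included. Run the audit on a clean checkout so the capture matches `<commit-sha>`.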
- **Output / destination**: `.kiro/specs/i18n-e2e-english-verification/audit/<commit-sha>/`.
- **Idempotency & recovery**: Re-running on the same sha overwrites that sha's directory. PENDING outputs survive across runs until a `gh`-enabled run replaces them.
**Implementation Notes**
- Integration: invoked by humans only — no CI hookup in this spec.
- Validation: confirm `gh auth status` before attempting comment/issue posts; on failure, branch to PENDING.
- Risks: shell quoting around the PCRE pattern (`[\x{4e00}-\x{9fff}]`) — use single-quoted argument to `git grep -P`.
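The quoting risk noted above goes away when the pattern is passed as a single argv element rather than through a shell. A minimal Python sketch of that idea follows. It is not how `audit_cjk.sh` works (that script is Bash), and `build_git_grep_argv` and `run_git_grep` are hypothetical helpers.

```python
import subprocess

# The pattern is passed as one argv element, so no shell quoting is involved.
CJK_PATTERN = r"[\x{4e00}-\x{9fff}]"
AUDIT_PATHS = ["backend/app", "frontend/src", "locales/en.json"]


def build_git_grep_argv(paths: list[str]) -> list[str]:
    """Build the argv list for the canonical audit command."""
    return ["git", "grep", "-nIP", CJK_PATTERN, "--", *paths]


def run_git_grep(paths: list[str]) -> str:
    """Run the audit command and return its stdout.

    `git grep` exits with status 1 when nothing matches, which is a valid
    (clean) result here; any other non-zero status is treated as an error.
    """
    result = subprocess.run(
        build_git_grep_argv(paths), capture_output=True, text=True, check=False
    )
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```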
#### `audit_cjk.sh`
| Field | Detail |
|-------|--------|
| Intent | Run the canonical PCRE grep + per-bucket counts |
| Requirements | 1.1, 1.5 |
**Responsibilities & Constraints**
- Output: `cjk-grep.txt` (raw `git grep -nP` lines) and `cjk-grep-bucketed.txt` (one section per top-level path: `backend/app`, `frontend/src`, `locales/en.json`).
- Excludes binary file matches (e.g. `.jpeg` false positives) by passing `-I` to `git grep`.
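The bucketing step can be pictured as grouping `path:line:match` rows by the audited path they fall under. This is a sketch only, assuming the raw output uses the standard `git grep -n` row format; `bucket_grep_lines` and the sample file names in the usage are hypothetical.

```python
BUCKETS = ("backend/app", "frontend/src", "locales/en.json")


def bucket_grep_lines(lines: list[str]) -> dict[str, list[str]]:
    """Group `path:line:match` rows by the audited path prefix they fall under."""
    buckets: dict[str, list[str]] = {name: [] for name in BUCKETS}
    for line in lines:
        path = line.split(":", 1)[0]
        for name in BUCKETS:
            # Exact match covers the single-file bucket (locales/en.json);
            # the prefix match covers the directory buckets.
            if path == name or path.startswith(name + "/"):
                buckets[name].append(line)
                break
    return buckets
```

Rows outside the three audited paths are not bucketed in this sketch; the canonical command is already scoped to those paths, so none are expected.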
**Dependencies**
- Inbound: `run_audit.sh` (P0).
- External: `git` 2.x (P0 — must support `-P` for PCRE).
**Contracts**: Batch [x]
##### Batch / Job Contract
- **Trigger**: invoked by `run_audit.sh`.
- **Input / validation**: receives the target output directory as argv[1]; aborts if missing.
- **Output / destination**: `cjk-grep.txt`, `cjk-grep-bucketed.txt` in `<sha>/`.
- **Idempotency & recovery**: deterministic — same tree → same output.
**Implementation Notes**
- Integration: pure read-only against `git`.
- Validation: `git --version` precondition; abort with a clear error if PCRE unsupported.
- Risks: ripgrep is NOT used (avoids a hard `rg` dependency); `git grep -P` works only when git itself was built with PCRE2 support, which is why the precondition above aborts when PCRE is unsupported.
#### `check_parity.py`
| Field | Detail |
|-------|--------|
| Intent | Compare `locales/en.json` and `locales/zh.json`: key parity, CJK in EN, identical-value heuristic |
| Requirements | 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3 |
**Responsibilities & Constraints**
- Recursively flattens nested-dict keys with dotted paths.
- Reports three blocks: `missing-keys`, `cjk-in-en`, `identical-values`.
- Treats values as `review-needed` only if (a) en value == zh value, (b) value is non-empty, (c) value is more than two ASCII words.
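The three responsibilities above can be sketched as two small functions. This is a sketch under stated assumptions, not the actual `check_parity.py`: `flatten` and `parity_blocks` are hypothetical names, and "more than two ASCII words" is read here as more than two whitespace-separated tokens in an all-ASCII string.

```python
import re

CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def flatten(node: dict, prefix: str = "") -> dict[str, object]:
    """Flatten nested dicts into {"a.b.c": leaf} using dotted key paths."""
    flat: dict[str, object] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def parity_blocks(en: dict, zh: dict) -> dict[str, list[str]]:
    """Compute the three report blocks from two parsed catalogues."""
    en_flat, zh_flat = flatten(en), flatten(zh)
    # Keys present in exactly one catalogue (symmetric difference).
    missing = sorted(set(en_flat) ^ set(zh_flat))
    cjk_in_en = sorted(
        k for k, v in en_flat.items() if isinstance(v, str) and CJK_RE.search(v)
    )
    identical = sorted(
        k
        for k, v in en_flat.items()
        if isinstance(v, str)
        and v.strip()               # (b) non-empty
        and zh_flat.get(k) == v     # (a) en value == zh value
        and v.isascii()             # (c) ASCII only ...
        and len(v.split()) > 2      # ... and more than two words
    )
    return {"missing-keys": missing, "cjk-in-en": cjk_in_en, "identical-values": identical}
```

Short identical values such as `OK` are deliberately skipped by condition (c), since brand names and abbreviations are often the same in both locales.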
**Dependencies**
- Inbound: `run_audit.sh` (P0).
- External: `json` from Python stdlib (P0).
**Contracts**: Batch [x]
##### Batch / Job Contract
- **Trigger**: invoked by `run_audit.sh` with the `<sha>` directory as argv[1].
- **Input / validation**: reads `locales/en.json` and `locales/zh.json` from cwd (must be invoked from repo root); fails fast on JSON parse error.
- **Output / destination**: `parity.txt` in `<sha>/`.
- **Idempotency & recovery**: pure function of catalogue contents.
**Implementation Notes**
- Integration: invoked from repo root so relative paths resolve.
- Validation: parse-on-load, both files must be objects.
- Risks: the "more than two ASCII words" heuristic may produce noise — `review-needed` is intentionally a soft label not a `gap`.
#### `classify.py`
| Field | Detail |
|-------|--------|
| Intent | Apply the 4-class label (`deliberate` / `gap` / `non-applicable` / `review-needed`) and a category tag per match |
| Requirements | 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4 |
**Responsibilities & Constraints**
- Reads `cjk-grep.txt` and `parity.txt`; emits `classified.csv` with columns: `file`, `line`, `match`, `class`, `category`, `pipeline_step`.
- Categories (closed set): `frontend-ui-string`, `frontend-regex-parser`, `backend-docstring`, `backend-comment`, `backend-log`, `backend-prompt-label`, `propagation`, `catalogue-parity`, `binary-false-positive`.
- Pipeline-step tags (closed set): `Graph Build`, `Env Setup`, `Simulation`, `Report`, `Interaction`, `Logs`, `UI`, `n/a`.
- Classification rules:
- `locales/en.json` CJK → always `gap` / `catalogue-parity` / `n/a` (R1.5).
- File path under `frontend/src/views/` or `frontend/src/components/` AND match is inside a string literal (heuristic: enclosed in `'…'`/`"…"`/`` `…` ``) → `gap` / `frontend-ui-string`.
- Match inside a `text.match(/.../)` call in a `.vue` file → `gap` / `frontend-regex-parser` (cause: backend emits CJK).
- Backend `.py` file, line starts with `#` or appears inside a triple-quoted docstring → `deliberate` / `backend-docstring` (or `backend-comment`). These rows are blocked by #7: they are counted but not filed as a fresh follow-up since #7 already covers them.
- Backend `.py` file, line contains `logger.`, `log.`, or `print(` and CJK in a string literal → `gap` / `backend-log` / appropriate step tag.
- Backend `.py` file in `services/{ontology,oasis_profile,simulation_config,report_agent}_generator.py` and CJK appears inside an LLM-prompt context label (heuristic: a string literal not preceded by `#`) → `gap` / `backend-prompt-label`.
- Binary files (e.g. `.jpeg` ripgrep matches): `non-applicable` / `binary-false-positive`.
- Anything else: `review-needed` (forces a human look).
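The rule cascade above might look something like the following first-match-wins function. It is a sketch under stated assumptions, not the real `classify.py`: the regex, the constant names, the `in_docstring` flag (which a caller would have to compute), the rule ordering, and the empty category on fallthrough are all illustrative choices. The pipeline-step tag is left out to keep the example short.

```python
import re

# A quoted literal ('…', "…" or `…`) that contains at least one CJK ideograph.
CJK_LITERAL = re.compile(r"""(['"`])[^'"`]*[\u4e00-\u9fff][^'"`]*\1""")
LOG_MARKERS = ("logger.", "log.", "print(")
PROMPT_FILES = (  # assumed file names for the prompt-bearing services
    "services/ontology_generator.py",
    "services/oasis_profile_generator.py",
    "services/simulation_config_generator.py",
    "services/report_agent.py",
)
UI_DIRS = ("frontend/src/views/", "frontend/src/components/")


def classify(path: str, text: str, in_docstring: bool = False) -> tuple[str, str]:
    """Return (class, category) for one grep match; the first matching rule wins."""
    if path == "locales/en.json":
        return "gap", "catalogue-parity"
    if path.endswith((".jpeg", ".jpg", ".png")):
        return "non-applicable", "binary-false-positive"
    if path.endswith(".vue") and ".match(/" in text:
        return "gap", "frontend-regex-parser"
    if path.startswith(UI_DIRS) and CJK_LITERAL.search(text):
        return "gap", "frontend-ui-string"
    if path.startswith("backend/") and path.endswith(".py"):
        if text.lstrip().startswith("#"):
            return "deliberate", "backend-comment"
        if in_docstring:
            return "deliberate", "backend-docstring"
        if CJK_LITERAL.search(text):
            if any(marker in text for marker in LOG_MARKERS):
                return "gap", "backend-log"
            if path.endswith(PROMPT_FILES):
                return "gap", "backend-prompt-label"
    return "review-needed", ""  # fallthrough forces a human look
```

Checking the log markers by plain substring (rather than a word-boundary regex) is deliberate here, so that names such as `build_logger.info(` still count as a log call.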
**Dependencies**
- Inbound: `audit_cjk.sh`, `check_parity.py` (P0).
- External: `csv` from Python stdlib.
**Contracts**: Batch [x]
##### Batch / Job Contract
- **Trigger**: invoked by `run_audit.sh` after the two preceding steps.
- **Input / validation**: `cjk-grep.txt` and `parity.txt` must exist in `<sha>/`.
- **Output / destination**: `classified.csv`.
- **Idempotency & recovery**: deterministic — same inputs → same csv.
**Implementation Notes**
- Integration: classification rules are heuristics, not a parser; correctness is bounded by careful regexes and an explicit "fallthrough = `review-needed`" rule.
- Validation: every input row produces an output row (no silent drops); a count-equality assertion runs at the end.
- Risks: false negatives (e.g. a Chinese log string that doesn't contain `logger.` on the same line) — `review-needed` fallthrough catches these.
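The "no silent drops" rule and the closing count-equality assertion could be sketched as below. This assumes `cjk-grep.txt` holds `path:line:text` lines as produced by `git grep -n`, and that paths contain no colon; `to_csv`, `assert_counts_match`, and the `classify` callback signature are hypothetical names for illustration.

```python
import csv
import io

COLUMNS = ["file", "line", "match", "class", "category", "pipeline_step"]


def to_csv(grep_text: str, classify) -> str:
    """Turn `path:line:text` grep lines into classified.csv content."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for raw in grep_text.splitlines():
        if not raw.strip():
            continue
        path, line_no, text = raw.split(":", 2)   # assumes no ':' in the path
        try:
            cls, category, step = classify(path, text)
        except Exception:
            # A failing rule must not drop the row; fall through instead.
            cls, category, step = "review-needed", "", "n/a"
        writer.writerow([path, int(line_no), text.strip(), cls, category, step])
    return out.getvalue()


def assert_counts_match(grep_text: str, csv_text: str) -> int:
    """Raise if the CSV row count differs from the grep line count."""
    expected = sum(1 for line in grep_text.splitlines() if line.strip())
    actual = sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))
    if expected != actual:
        raise AssertionError(f"row count mismatch: {expected} in, {actual} out")
    return actual
```

Re-reading the CSV with `csv.DictReader` (instead of counting writes) means the check also catches rows that were mangled by quoting.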
#### `render_report.py`
| Field | Detail |
|-------|--------|
| Intent | Produce `gap-report.md` and `comment-body.md` |
| Requirements | 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2 |
**Responsibilities & Constraints**
- `gap-report.md`: Sections: Overview, Section 1 (static audit), Section 2 (parity), Section 3 (prompt verification), Section 4 (propagation), Section 5 (issue-#10 checklist mapping), Section 6 (ZH regression), Section 7 (follow-up plan).
- `comment-body.md`: Markdown comment for issue #10 — mirrors the issue's checklist with `pass` / `gap` / `manual-pending` for each line, plus a "How to re-run" footer.
- Reads `classified.csv` and the issue body (snapshot at `.ticket/10.md`).
**Dependencies**
- Inbound: `classify.py` (P0), `.ticket/10.md` (P0).
- External: Python stdlib only.
**Contracts**: Batch [x]
##### Batch / Job Contract
- **Trigger**: `run_audit.sh` after `classify.py`.
- **Input / validation**: `classified.csv` and `.ticket/10.md` must exist.
- **Output / destination**: `gap-report.md`, `comment-body.md` in `<sha>/`.
- **Idempotency & recovery**: deterministic.
**Implementation Notes**
- Integration: the comment body must include a `Run on commit <sha>` header so the comment is traceable.
- Validation: confirm every issue-body checkbox has been mapped (count check).
- Risks: rendering CJK characters in markdown. Python's `open()` default encoding is platform-dependent, so output files should be written with an explicit `encoding="utf-8"`; comment body is verified to round-trip via `gh`.
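A rough sketch of the checklist mirroring and its count check is shown below. It assumes the issue body uses standard `- [ ]` task-list lines and that statuses are keyed by the exact checklist text; `render_comment` and the `statuses` mapping are hypothetical, not taken from `render_report.py`.

```python
import re

CHECKBOX = re.compile(r"^\s*- \[[ xX]\] (.+)$", re.MULTILINE)


def render_comment(issue_body: str, statuses: dict[str, str], sha: str) -> str:
    """Mirror the issue checklist, one status per item, in the original order."""
    items = CHECKBOX.findall(issue_body)
    unmapped = [item for item in items if item not in statuses]
    if unmapped:
        # Count check: every checkbox in the issue body must have a status.
        raise ValueError(f"{len(unmapped)} checklist item(s) have no status")
    lines = [f"Run on commit {sha}", ""]
    for item in items:
        status = statuses[item]                  # pass / gap / manual-pending
        box = "x" if status == "pass" else " "
        lines.append(f"- [{box}] **{status.upper()}** - {item}")
    return "\n".join(lines) + "\n"
```

Raising on an unmapped item, rather than defaulting it to some status, keeps the comment from silently under-reporting the ticket's checklist.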
#### `post_comment.sh`
| Field | Detail |
|-------|--------|
| Intent | Post `comment-body.md` as a comment on issue #10 |
| Requirements | 5.5 |
**Responsibilities & Constraints**
- `gh issue comment 10 --repo salestech-group/MiroFish --body-file <sha>/comment-body.md`.
- On non-zero exit, copies the body to `<sha>/PENDING-issue-10-comment.md` and exits non-zero.
**Dependencies**
- External: `gh` (P0; degrades to PENDING when missing).
**Contracts**: Service [x]
##### Service Interface
```text
post_comment.sh <sha-dir>
precondition: <sha-dir>/comment-body.md exists
postcondition (success): comment posted; URL printed to stdout
postcondition (failure): <sha-dir>/PENDING-issue-10-comment.md present; exit code 2
```
**Implementation Notes**
- Integration: must be the second-to-last step (so failures don't block the issue-filing fallback).
- Validation: parses `gh`'s URL output and writes it to `<sha>/comment-url.txt` on success.
- Risks: PR-time rate limits — unlikely for a single comment.
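The success and failure postconditions above can be illustrated in Python, although the real step is a shell script. This is only a sketch: `post_comment` and its injectable `run` parameter are hypothetical, and it relies on `gh` printing the comment URL to stdout as the contract states.

```python
import shutil
import subprocess
from pathlib import Path


def post_comment(sha_dir: Path, run=subprocess.run) -> int:
    """Post comment-body.md to issue #10; fall back to a PENDING file on failure."""
    body = sha_dir / "comment-body.md"
    cmd = [
        "gh", "issue", "comment", "10",
        "--repo", "salestech-group/MiroFish",
        "--body-file", str(body),
    ]
    try:
        result = run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # Covers both a missing `gh` binary and a non-zero exit.
        shutil.copyfile(body, sha_dir / "PENDING-issue-10-comment.md")
        return 2
    url = result.stdout.strip()
    (sha_dir / "comment-url.txt").write_text(url + "\n", encoding="utf-8")
    print(url)
    return 0
```

Passing `run` as a parameter is a design convenience for dry runs: a stub can stand in for `gh` without touching GitHub.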
#### `file_followups.sh`
| Field | Detail |
|-------|--------|
| Intent | Open one follow-up issue per gap category |
| Requirements | 7.1, 7.2, 7.4, 7.5 |
**Responsibilities & Constraints**
- Iterates `<sha>/PENDING-followups/*.md` (which `render_report.py` always writes; the ones whose category had zero gaps stay empty placeholders).
- For each non-empty body, runs `gh issue create --repo salestech-group/MiroFish --title <title> --body-file <body> --label i18n`.
- On `gh` failure for any single category, leaves the corresponding `PENDING-followups/<n>-*.md` in place and exits non-zero at the end (after attempting all categories).
**Dependencies**
- External: `gh` (P0; degrades to PENDING).
**Contracts**: Service [x]
##### Service Interface
```text
file_followups.sh <sha-dir>
precondition: <sha-dir>/PENDING-followups/*.md exist (possibly empty placeholders)
postcondition (success): all non-empty bodies posted; URLs appended to <sha-dir>/followup-urls.txt; bodies removed from PENDING-followups/
postcondition (partial): URLs in followup-urls.txt for the ones that posted; the rest stay in PENDING-followups/; exit code 2
```
**Implementation Notes**
- Integration: must be the last step.
- Validation: post-hoc count check (`gh` URLs + remaining PENDING bodies = total categories).
- Risks: a category that the spec already considers covered (e.g. backend docstrings → blocked by #7) is not re-filed; the spec's category list is closed and excludes that case.
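The "attempt every category, then exit non-zero" behaviour and the post-hoc count check might be sketched as follows. `file_followups` and the `create_issue` callable (assumed to wrap `gh issue create` and return the new issue URL, raising on failure) are hypothetical; the count check here covers non-empty bodies only, which is one possible reading of "total categories".

```python
from pathlib import Path


def file_followups(sha_dir: Path, create_issue) -> int:
    """Post every non-empty follow-up body; return 0 on full success, 2 otherwise."""
    pending = sorted((sha_dir / "PENDING-followups").glob("*.md"))
    bodies = [p for p in pending if p.read_text(encoding="utf-8").strip()]
    posted = []
    for body in bodies:             # keep going even after a failure
        try:
            url = create_issue(body)
        except Exception:
            continue                # leave the body in place for a later retry
        posted.append(url)
        body.unlink()
    if posted:
        with open(sha_dir / "followup-urls.txt", "a", encoding="utf-8") as fh:
            fh.writelines(url + "\n" for url in posted)
    remaining = [p for p in bodies if p.exists()]
    assert len(posted) + len(remaining) == len(bodies)   # post-hoc count check
    return 2 if remaining else 0
```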
## Data Models
### Domain Model
The audit operates on three logical concepts:
- **Match**: a single line of `git grep` output, `(file, line, raw_text)`.
- **Classification**: `(match, class ∈ {deliberate, gap, non-applicable, review-needed}, category ∈ closed-set, pipeline_step ∈ closed-set)`.
- **Follow-up**: `(category, title, body, status ∈ {posted, pending}, url?)`.
Invariant: every `Match` produces exactly one `Classification`; every `Classification` with `class == gap` belongs to exactly one `Follow-up` category (which may aggregate multiple gaps).
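The domain concepts and the gap-to-follow-up invariant could be modelled along these lines. The dataclass layout, the `cls` field name, and `group_gaps` are assumptions made for illustration; only the field tuples and the class set come from the model above.

```python
from collections import defaultdict
from dataclasses import dataclass

CLASSES = {"deliberate", "gap", "non-applicable", "review-needed"}


@dataclass(frozen=True)
class Match:
    file: str
    line: int
    raw_text: str


@dataclass(frozen=True)
class Classification:
    match: Match
    cls: str            # `class` is a Python keyword, hence `cls`
    category: str
    pipeline_step: str

    def __post_init__(self):
        if self.cls not in CLASSES:
            raise ValueError(f"unknown class: {self.cls!r}")


def group_gaps(classifications):
    """Map each follow-up category to its gaps; non-gap rows are ignored."""
    groups = defaultdict(list)
    for item in classifications:
        if item.cls == "gap":
            groups[item.category].append(item.match)
    return dict(groups)
```

Because each `Classification` carries exactly one category, every gap lands in exactly one follow-up group, and a group may aggregate several gaps.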
### Logical Data Model
**`classified.csv` schema** (CSV, UTF-8, header row):
| Column | Type | Notes |
|--------|------|-------|
| `file` | string | repo-relative path |
| `line` | int | 1-indexed |
| `match` | string | trimmed grep line |
| `class` | enum | `deliberate` / `gap` / `non-applicable` / `review-needed` |
| `category` | enum | closed set listed in classify.py rules |
| `pipeline_step` | enum | closed set listed in classify.py rules |
Natural key: `(file, line)`.
**`parity.txt` structure** (text, three labelled blocks):
```
[missing-keys]
en-only: <key.path>
zh-only: <key.path>
[cjk-in-en]
<key.path>: <value snippet>
[identical-values]
<key.path>: <value> # review-needed if non-trivial English prose
```
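Since `classify.py` consumes `parity.txt`, a reader for the three labelled blocks might look like this. It is a minimal sketch; `parse_parity` is a hypothetical helper and it assumes block headers sit alone on a line in square brackets, as in the layout shown.

```python
def parse_parity(text: str) -> dict[str, list[str]]:
    """Split parity.txt into {block name: [entry lines]}."""
    blocks: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # e.g. "missing-keys"
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)  # entry belongs to the open block
    return blocks
```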
### Data Contracts & Integration
- **`comment-body.md`** must be valid GitHub-flavoured Markdown; checkbox lines preserve the issue's original ordering.
- **Follow-up issue body** must be valid GitHub-flavoured Markdown; first line is a one-sentence summary; subsequent sections are: `## Evidence` (file:line list), `## Linked from` (#10 + comment URL), `## Acceptance` (a small checklist).
## Error Handling
### Error Strategy
- **Read-only operations** (steps 1-4): on any uncaught error (missing file, JSON parse error), the script aborts with a non-zero exit before any artefact is half-written. The orchestrator uses `set -euo pipefail`.
- **GitHub side effects** (steps 5-6): wrapped, so a failure routes to PENDING outputs and the orchestrator exits `2`.
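One common way to guarantee that no artefact is left half-written is to write to a temporary file in the same directory and rename it into place. The sketch below shows that pattern; `write_atomic` is a hypothetical helper, and whether the actual scripts use it is not stated in this spec.

```python
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, text: str) -> None:
    """Write text to path so readers see either the old file or the full new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)       # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)          # clean up the partial temp file
        raise
```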
### Error Categories and Responses
- **User errors**: invoked from wrong directory → fail fast with "must be run from repo root".
- **System errors**: `git`/`python3`/`gh` missing → fail fast with "install <tool>"; `gh auth status` not OK → branch to PENDING.
- **Business errors**: classification produces 0 matches but `cjk-grep.txt` non-empty → assertion failure (count-equality bug).
### Monitoring
- The orchestrator prints a one-line status per step.
- Final summary block to stdout: total matches, gaps, `manual-pending`, follow-ups posted vs PENDING.
## Testing Strategy
- **Unit tests**: not introduced — the scripts are simple enough that a one-shot dry run on the live tree is the canonical validation.
- **Integration test**: a single `bash run_audit.sh` against the working tree; success criteria below.
- **Validation checklist** (run during implementation):
- The audit produces a non-empty `cjk-grep.txt`.
- `parity.txt` reports 0 missing keys (matches the live state at HEAD).
- `classified.csv` row count == `cjk-grep.txt` line count.
- `gap-report.md` and `comment-body.md` parse as valid markdown (manual eyeball — no toolchain required).
- The classifier marks every `locales/en.json` CJK as `gap` (currently zero such matches, so this asserts the negative).
- With `gh` available: a comment is posted on #10 and follow-up issues are created.
- With `gh` simulated as absent (e.g. `PATH=/dev/null`): PENDING outputs appear under `<sha>/`.
### Out of scope for testing
- The live UI walkthrough is `manual-pending` (R5.3) and not part of the test plan.
- Performance, scalability, security: nothing to test — read-only single-shot scripts.

@ -0,0 +1,136 @@
# Gap Analysis — i18n-e2e-english-verification
## 1. Current state investigation
### Domain-relevant assets in the repo
| Concern | Location | Notes |
|---|---|---|
| Locale catalogues | `locales/en.json`, `locales/zh.json`, `locales/languages.json` | Flat-namespaced JSON, loaded by `vue-i18n` and the backend logger. |
| Frontend i18n loader | `frontend/src/i18n/` | Provides `useI18n()` to components. |
| Frontend UI surface | `frontend/src/views/`, `frontend/src/components/` | Step 1-5 components + `Process.vue` orchestrator. |
| Backend logger | `backend/app/utils/logger.py` (per CLAUDE.md) | Externalised log messages (#6 work). |
| Locale helpers | `backend/app/utils/` | Per CLAUDE.md, locale propagation lives here. |
| Prompt assets that emit user-visible text | `backend/app/services/ontology_generator.py` (#2, #3?), `oasis_profile_generator.py` (#3), `simulation_config_generator.py` (#4), `report_agent.py` (#5) | Prompts are inline Python strings, not separate files. |
| Pipeline boundaries | `backend/app/api/*.py` (Flask), `services/simulation_runner.py` + `simulation_ipc.py` (subprocess), `services/report_agent.py` (ReACT) | Locale must propagate across all of these. |
### Project conventions surfaced
- `Task` model used for any long-running operation (CLAUDE.md). Verification doesn't introduce one — it is a one-shot batch.
- A reasoning-model output stripping convention exists but is irrelevant here.
- Per-project `group_id` isolation in Neo4j — verification queries should NOT touch Neo4j; we run a static audit only.
- "Match the surrounding file's style" (no enforced formatter).
### Live audit baseline (commit `9dcaecd`)
```
git grep -nP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json | wc -l
→ 2918 lines across 36 files
```
Bucketed:
| Bucket | Files | Lines | Notes |
|---|---|---|---|
| `locales/en.json` | 0 | 0 | ✅ clean |
| `frontend/src/views/Process.vue` | 1 | 65 | hard-coded UI strings (template + JS literals), not i18n keys |
| `frontend/src/components/Step{2,3,4,5}*.vue` | 4 | ~50 (mostly Step4Report.vue regex parsers) | depends-on-backend regex parsers + a few literals |
| `backend/app/services/*.py` | 13 | majority | docstrings + comments + a few prompt assembly fragments + agent context labels (e.g. `"事实信息:"` in `oasis_profile_generator.py`) |
| `backend/app/api/*.py` | 4 | many | docstrings + comments + log-message Chinese (`build_logger.info(f"[{task_id}] 开始构建图谱...")` etc) |
| `backend/app/utils/*.py` | 7 | many | docstrings + comments + log strings (e.g. `retry.py` "函数 {func} 在 N 次重试后仍失败") |
| `backend/app/models/*.py` | 3 | docstrings | docstrings only (probably) |
### Locale catalogue parity (Python check)
```
en keys: 953
zh keys: 953
symmetric diff: 0
```
→ R2 (parity) passes. ZH backfill (#8) closed the gap and en/zh are now lock-step.
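The key-count check behind these numbers can be reproduced with a small recursive flattening. This is a sketch of the idea rather than the script that produced the figures; `flatten_keys` and `key_parity` are hypothetical names, and dotted key paths are an assumed convention.

```python
def flatten_keys(obj: dict, prefix: str = "") -> set[str]:
    """Collect dotted key paths for every leaf value in a nested dict."""
    keys: set[str] = set()
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)   # descend into nested objects
        else:
            keys.add(path)
    return keys


def key_parity(en: dict, zh: dict) -> tuple[list[str], list[str]]:
    """Return (en-only keys, zh-only keys); both empty means parity."""
    en_keys, zh_keys = flatten_keys(en), flatten_keys(zh)
    return sorted(en_keys - zh_keys), sorted(zh_keys - en_keys)
```

In use, the two catalogues would be loaded with `json.load` from `locales/en.json` and `locales/zh.json` and passed to `key_parity`; a symmetric difference of 0 corresponds to both returned lists being empty.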
### Boundary review surface (R4)
- `backend/app/api/graph.py`: `build_logger.info(f"[{task_id}] 开始构建图谱...")` shows the backend logger is still emitting Chinese on the build path, which is exactly the kind of leak #6 was supposed to externalise.
- `backend/app/utils/retry.py`: `logger.error(f"函数 {func.__name__} 在 {max_retries} 次重试后仍失败...")` has the same problem; log strings remain hard-coded Chinese.
- ReACT/agent context labels in `oasis_profile_generator.py` (`"事实信息:"`, `"相关实体:"`) feed directly into the LLM prompt — these will bias the model toward Chinese output.
## 2. Requirements feasibility
### Mapping requirements → existing assets
| Req | Need | Existing asset | Gap tag |
|---|---|---|---|
| R1 (static audit) | run `git grep` and capture output | git, ripgrep | None — straightforward |
| R1.5 (`en.json` CJK check) | inspect catalogue | already at 0 hits | None — passes |
| R2 (parity) | enumerate keys recursively, diff | small Python script | None — already passes |
| R3 (prompt verification) | read prompt strings in `services/*.py` | inline Python strings | **Constraint** — prompts are inline, not standalone files; verification must read source not assets |
| R4 (propagation) | trace locale across Flask → Task → OASIS → ReACT | source code review | **Research needed** in design phase: where exactly is locale stored today? CLAUDE.md hints `set_locale` thread-local exists but path not yet read |
| R5 (post comment) | `gh issue comment 10` | `gh` CLI | None |
| R6 (ZH regression) | confirm zh values are non-English | small Python script | None |
| R7 (file follow-ups) | `gh issue create` | `gh` CLI | None |
| R8 (capture & idempotence) | write under `.kiro/specs/.../audit/` | filesystem | None |
### Complexity signals
- Algorithmic: trivial — grep + count + diff.
- Workflow: post a comment + open follow-up issues — one-shot.
- External integrations: GitHub via `gh`. No DB, no Neo4j, no LLM calls.
### Constraints from existing architecture
- **No code edits to `backend/app/`, `frontend/src/`, `locales/`** — the spec is verification-only. The change-set is confined to `.kiro/specs/i18n-e2e-english-verification/` (audit captures, gap report, follow-up issue list) and any commit message / PR description.
- Manual UI walkthrough is not feasible in a sandboxed CLI — must be marked `manual-pending` per R5.3.
- Live `docker-compose up` likewise unavailable — same handling.
## 3. Implementation approach options
### Option A — Pure shell + Python script kept under `.kiro/specs/.../audit/`
- A single Bash + Python pipeline that emits `audit/cjk-grep.txt`, `audit/parity.txt`, `audit/gap-report.md`.
- Posts the comment via `gh` and opens follow-ups via `gh issue create`.
- Scripts are read-only against production source.
✅ Simplest, no production-code touch.
✅ Easy to re-run.
❌ Scripts only relevant to this ticket — scoped to `.kiro/specs/.../audit/scripts/`, not promoted to a reusable `tools/`.
### Option B — Build a reusable `tools/i18n-audit/` checker
- Create a permanent CLI under `tools/` so future verifiers can re-run.
- Integrates with CI (could become a check that fails when `en.json` contains CJK).
❌ Adds a tool & directory the project doesn't have. Scope creep — the spec is for one verification pass, not a CI check.
❌ A reusable tool wants its own ticket; ramming it in here violates the "no inline fixes" rule.
### Option C — Hybrid: ad-hoc script for this run, plus open a follow-up issue requesting the reusable CI check
- Run the verification with disposable scripts (Option A) AND file a follow-up issue asking for the reusable CI check (Option B as a future ticket).
✅ Keeps current ticket scoped.
✅ Captures the value of B without bloating this PR.
## 4. Out-of-scope items deferred
- Any **production code edits** that would close gaps. R7 makes this explicit.
- Live UI walkthrough / dynamic verification — captured as `manual-pending` in the report.
## 5. Effort & risk
- **Effort**: S (1 day) — auditing scripts + report writing + issue filings.
- **Risk**: Low — read-only operations, no architectural change, the failure mode (`gh` lacking permissions) is handled by R7.5 (fallback inline list).
## 6. Recommendations for design phase
- **Preferred approach**: Option C (hybrid).
- **Key decisions to make in design**:
- Concrete script layout under `.kiro/specs/i18n-e2e-english-verification/audit/`.
- Format of `audit/gap-report.md` (the artefact echoed into the issue comment).
- Exact follow-up issue grouping rule (R7.2): one issue per pipeline step? per file? per category (UI / logs / prompts / docstrings)?
- Reproducibility (R8.2): do we keep `audit/<commit-sha>/` per run, or `audit/latest/` + `audit/previous/`?
- Whether the scripts are committed to the repo (they live under `.kiro/specs/...` — yes by default) or only the captured outputs.
- **Research items to carry forward**:
- Read `backend/app/utils/` to confirm whether a locale helper / `set_locale` exists today (R4 detail).
- Read `backend/app/utils/logger.py` to confirm where externalised log keys live and how the locale is selected at log time (R4 + Step-1 logs checklist item).
- Confirm whether any `services/*.py` Chinese match is part of an LLM **prompt** vs a comment — only prompt matches block R3.

@ -0,0 +1,122 @@
# Requirements Document
## Project Description (Input)
Issue #10: i18n end-to-end verification of full pipeline. Run a verification pass to prove the entire 5-step pipeline (Graph Build, Env Setup, Simulation, Report, Interaction) works cleanly in English, with locale propagating across Flask routes, background tasks, OASIS subprocess, Graphiti/Neo4j, and the ReACT report agent. Produce a verification report (posted as a comment on issue #10) summarising pass/fail per checklist item and listing any leftover Chinese strings as `file:line` refs. Run the static audit `git grep -nE "[\\x{4e00}-\\x{9fff}]" -- backend/app frontend/src locales/en.json` and confirm only deliberately-kept Chinese remains. File any newly discovered gaps as follow-up issues (do NOT patch silently in this ticket). Acceptance: all checklist items pass for both EN and ZH; report posted; no surprise Chinese in EN paths. Out of scope: fixing newly discovered gaps inline; perf/load testing; new locales beyond EN/ZH.
## Introduction
This spec covers the final verification pass for the i18n epic (#11). After issues #2-#9 and #12 land, the entire 5-step MiroFish pipeline must demonstrably run in English across the UI, background work, LLM-generated artifacts (ontologies, agent profiles, sim configs, reports, chat replies), and backend logs, without any unintended Chinese leaking into English-locale paths. The pass also regression-checks that switching locale back to Chinese still produces fully Chinese output. Because the pipeline crosses a Flask app, background `Task` workers, an OASIS subprocess, Graphiti/Neo4j, and a ReACT report agent, the verification has both a static (grep + locale-file) component and a dynamic (live walkthrough of Step 1 → 5) component.
The deliverables are: (a) a static audit + categorization of any remaining Chinese strings under English paths, (b) a verification report posted as a comment on issue #10 summarising pass/fail per checklist item with `file:line` evidence, and (c) follow-up GitHub issues for every gap found — fixes are explicitly **out of scope** here.
## Boundary Context
- **In scope**:
- Static audit (`git grep` for CJK Unified Ideographs) of `backend/app/`, `frontend/src/`, and `locales/en.json`.
- Inspection of locale catalogues (`locales/en.json`, `locales/zh.json`) for parity, key coverage, and accidental Chinese in the EN catalogue.
- Inspection of LLM-prompt assets that drive Step 1-5 outputs (ontology, profile, sim-config, report-agent prompts) to confirm they emit English under EN locale.
- Inspection of locale propagation paths: HTTP request → Flask handler → `Task` background worker → OASIS subprocess → ReACT agent.
- Verification report posted as a comment on issue #10.
- Follow-up issues filed for every gap found.
- **Out of scope**:
- Fixing any newly discovered gaps inline in this ticket — they are filed as separate issues.
- Performance or load testing.
- Adding new locales beyond EN/ZH.
- The live UI walkthrough with screenshots, when no human or browser is available — the static audit results plus prompt/locale-catalogue evidence stand in. The verification report explicitly marks UI-only checklist items as "manual-pending" if not run live.
- **Adjacent expectations**:
- Closes the i18n epic #11 once #12 also lands.
- Depends on (and re-verifies) the work in #2, #3, #4, #5, #6, #8, #9, #12.
## Requirements
### Requirement 1: Static CJK audit of English code paths
**Objective:** As an i18n verifier, I want a deterministic grep-based audit of files that should be English-only, so that any Chinese leaking into the EN-locale code path is detected and recorded.
#### Acceptance Criteria
1. The Verification System shall execute `git grep -nP "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json` (PCRE mode; the `-E` form quoted in the issue body does not handle `\x{...}` ranges) and capture every match with `file:line` precision.
2. The Verification System shall classify each match as one of: (a) `deliberate` (e.g. test fixture demonstrating ZH input, doc example, comment explicitly retained per project convention), (b) `gap` (unintended Chinese in EN-facing code), or (c) `non-applicable` (false positive such as a regex character class).
3. When a match is classified as `gap`, the Verification System shall record `file:line`, the Chinese substring, and the affected pipeline step (Graph Build / Env Setup / Simulation / Report / Interaction / Logs / UI).
4. The Verification System shall not modify any matched file as part of this audit; remediation is filed as a follow-up issue per Requirement 7.
5. While the audit is running, the Verification System shall additionally inspect `locales/en.json` for entries whose value contains CJK characters and report those separately (an EN catalogue value containing Chinese is always a `gap`).
### Requirement 2: Locale catalogue parity check
**Objective:** As an i18n verifier, I want to confirm that the EN and ZH catalogues stay in lockstep, so that switching locale never falls back to a missing key or leaks the other locale.
#### Acceptance Criteria
1. The Verification System shall enumerate the key set of `locales/en.json` and `locales/zh.json` (recursively across nested objects) and compute the symmetric difference.
2. If a key is present in `en.json` but missing from `zh.json` (or vice versa), the Verification System shall record the missing key path and treat it as a `gap`.
3. If any value in `en.json` contains a CJK character, the Verification System shall record it as a `gap` (as in Requirement 1.5).
4. If any value in `zh.json` is identical to its `en.json` counterpart and the EN value is non-trivial English prose (more than two ASCII words), the Verification System shall flag it as a candidate untranslated entry — these are reported as `review-needed`, not auto-classified `gap`, since some technical terms (URLs, identifiers, single tokens) legitimately stay identical.
5. The Verification System shall not edit either catalogue file as part of this check.
### Requirement 3: LLM-prompt locale verification
**Objective:** As an i18n verifier, I want to confirm that every LLM prompt that drives a Step 1-5 output respects the requested locale, so that ontology entries, agent profiles, simulation configs, report prose, and chat replies render in the user's selected language.
#### Acceptance Criteria
1. The Verification System shall enumerate the prompt files that produce user-visible output for Steps 1-5 (e.g. ontology generator, OASIS profile generator, simulation-config generator, report agent prompts, interview chat).
2. For each prompt file, the Verification System shall confirm that it either (a) is fully English with an explicit "respond in ${locale}" directive, or (b) is rendered through a locale-aware template that injects the active locale.
3. If a prompt file hard-codes a Chinese-only directive (e.g. "请用中文回答") on the EN code path, the Verification System shall record it as a `gap`.
4. The Verification System shall confirm that the prompt files referenced by issues #3, #4, #5 are no longer Chinese-only post-merge; if any still are, they are recorded as `gap` blocking #10.
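For illustration, a prompt that satisfies criterion 2 might be assembled like this, with context labels and the response directive both selected by locale. Everything here is hypothetical (the function, the label table, the English label wording); it is not the code in the generators under review, only a sketch of what a locale-aware prompt would look like to the verifier.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")
LANGUAGE_NAMES = {"en": "English", "zh": "Chinese"}
CONTEXT_LABELS = {
    "en": {"facts": "Facts:", "entities": "Related entities:"},
    "zh": {"facts": "事实信息:", "entities": "相关实体:"},
}


def build_profile_prompt(locale: str, facts: list[str], entities: list[str]) -> str:
    """Assemble a prompt whose labels and directive follow the active locale."""
    labels = CONTEXT_LABELS[locale]
    return "\n".join([
        labels["facts"], *facts,
        labels["entities"], *entities,
        f"Respond in {LANGUAGE_NAMES[locale]}.",   # explicit locale directive
    ])


def has_cjk(text: str) -> bool:
    """True if the text contains any CJK Unified Ideograph."""
    return bool(CJK.search(text))
```

A verifier could then assert that the EN rendering contains no CJK characters while the ZH rendering still does, which also covers the symmetry expected by the ZH regression check.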
### Requirement 4: Locale propagation surface review
**Objective:** As an i18n verifier, I want to confirm that the active locale survives every process boundary, so that an EN request still produces EN output after it crosses into a `Task` worker, the OASIS subprocess, or the ReACT agent.
#### Acceptance Criteria
1. The Verification System shall identify each handoff boundary: HTTP → Flask handler, Flask handler → `Task` worker, `Task` worker → OASIS subprocess, ReACT agent → tool calls.
2. For each handoff, the Verification System shall confirm that the locale is either (a) carried explicitly in the call payload / kwargs, or (b) re-derived deterministically (e.g. from per-project config, `Accept-Language` header, or `set_locale` thread-local equivalent) on the receiving side.
3. If a boundary discards the locale and the receiving side defaults silently to Chinese (or any non-EN locale) under an EN request, the Verification System shall record the boundary as a `gap`.
4. The Verification System shall examine the backend logger to confirm that log messages on the EN code path resolve to English templates (depends on #6).
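The failure mode in criterion 3 is easy to reproduce with a thread-local locale: state set on the request thread is invisible to a worker thread unless it is passed along. The sketch below is an assumption-laden illustration; `set_locale` is only known from CLAUDE.md by name, and this implementation, the `zh` default, and the helper functions are hypothetical.

```python
import threading

DEFAULT_LOCALE = "zh"          # assumed silent default on the receiving side
_state = threading.local()


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale() -> str:
    return getattr(_state, "locale", DEFAULT_LOCALE)


def run_in_worker(fn, *args):
    """Run fn in a fresh thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn(*args)))
    worker.start()
    worker.join()
    return result[0]


def implicit_task() -> str:
    return get_locale()        # relies on state the new thread never received


def explicit_task(locale: str) -> str:
    set_locale(locale)         # locale carried in the call payload
    return get_locale()
```

With the request thread set to `en`, `run_in_worker(implicit_task)` falls back to the default, while `run_in_worker(explicit_task, "en")` keeps the request locale. The same reasoning applies, more strongly, to a subprocess boundary.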
### Requirement 5: Verification report comment on issue #10
**Objective:** As the issue owner, I want a single canonical verification report posted as a comment on issue #10, so that reviewers can see pass/fail per checklist item and trace every `gap` to a `file:line` and a follow-up issue.
#### Acceptance Criteria
1. When the static audit, parity check, prompt verification, and propagation review are complete, the Verification System shall compose a markdown comment on issue #10 that lists every checklist item from the ticket body with one of the statuses `pass` / `gap` / `manual-pending`.
2. For each `gap` status, the comment shall include `file:line` references and a link to the follow-up issue filed per Requirement 7.
3. For each `manual-pending` status, the comment shall state explicitly that the item requires a live UI walkthrough (or full-stack run) which was not performed in this verification environment, and shall list the exact reproduction steps the next reviewer needs to run.
4. The comment shall include the raw output (or a path to the captured output) of the `git grep` audit so future verifiers can diff against the baseline.
5. The Verification System shall post the comment using `gh issue comment 10 --repo salestech-group/MiroFish` and shall record the resulting comment URL in the spec / commit message.
### Requirement 6: ZH regression check
**Objective:** As an i18n verifier, I want to confirm that the ZH locale still renders fully Chinese, so that the EN work has not regressed the original-language experience.
#### Acceptance Criteria
1. The Verification System shall confirm that `locales/zh.json` covers every key present in `locales/en.json` (Requirement 2) so that no UI string falls back to English under ZH.
2. The Verification System shall confirm that prompts rendered through locale-aware templates produce a Chinese variant when locale=zh (i.e. the templating mechanism is symmetric between EN and ZH).
3. If a UI string is English-only under ZH (i.e. `zh.json` value is identical to the EN value and the value is non-trivial English prose), the Verification System shall flag it per Requirement 2.4 as `review-needed`.
4. The Verification System shall record any ZH-specific regression as a separate `gap` and file a follow-up issue per Requirement 7.
### Requirement 7: Follow-up issues for every discovered gap
**Objective:** As the project owner, I want every gap discovered in this verification pass tracked as its own GitHub issue, so that fixes are sequenced separately and #10 stays scoped to verification only.
#### Acceptance Criteria
1. When a `gap` is recorded by Requirements 1-6, the Verification System shall file a GitHub issue against `salestech-group/MiroFish` containing: a one-sentence summary, the affected pipeline step, the `file:line` evidence, and a link back to issue #10 and to the verification report comment.
2. If grouping is sensible (e.g. five `gap`s in a single locale-catalogue file), the Verification System shall consolidate them into a single follow-up issue with a checklist body, instead of filing five micro-issues.
3. The Verification System shall not patch any gap inline in this ticket; the spec change-set must be limited to the verification artefacts (spec docs + report capture under `.kiro/specs/i18n-e2e-english-verification/`) and must not modify production source files under `backend/app/`, `frontend/src/`, or `locales/`.
4. The Verification System shall label every follow-up issue with the `i18n` label (and `bug` if the gap is regressing existing behaviour) so they aggregate under the i18n epic.
5. If the verification environment cannot file issues (e.g. no `gh` permissions), the Verification System shall list the would-be issues inline in the verification report as a fallback so a human can file them, and shall mark the corresponding checklist item `gap-pending-issue` instead of `gap`.
### Requirement 8: Reproducibility and idempotence
**Objective:** As a future verifier, I want this verification pass to be re-runnable, so that we can re-baseline after each subsequent merge to the i18n epic.
#### Acceptance Criteria
1. The Verification System shall capture the raw audit output to `.kiro/specs/i18n-e2e-english-verification/audit/` so the next verifier can diff against the previous run.
2. While a previous capture exists, the Verification System shall preserve it (timestamped or under a `previous/` subdirectory) rather than overwriting it silently.
3. The Verification System shall record the commit SHA at the time of the audit so the report comment can be tied to a specific tree state.
4. If the audit is re-run and the gap set is unchanged, the Verification System shall produce a no-op report comment that confirms parity rather than spamming a new gap list.
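The "unchanged gap set" test in criterion 4 could be implemented by comparing the gap rows of two captures. This is a sketch only: it assumes both runs left a CSV with `file`, `match`, and `class` columns, and it keys gaps on `(file, match)` rather than line number so that unrelated line shifts do not register as changes. Both the helper names and that keying choice are assumptions.

```python
import csv
import io


def gap_set(csv_text: str) -> set[tuple[str, str]]:
    """Extract the (file, match) pairs classified as `gap`."""
    return {
        (row["file"], row["match"])
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["class"] == "gap"
    }


def compare_runs(previous_csv: str, current_csv: str) -> dict:
    """Summarise how the gap set moved between two audit captures."""
    prev, cur = gap_set(previous_csv), gap_set(current_csv)
    return {
        "unchanged": prev == cur,          # True means a no-op report is enough
        "new": sorted(cur - prev),
        "resolved": sorted(prev - cur),
    }
```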

@ -0,0 +1,112 @@
# Research & Design Decisions — i18n-e2e-english-verification
## Summary
- **Feature**: `i18n-e2e-english-verification`
- **Discovery Scope**: Extension (verification-only against existing i18n surface)
- **Key Findings**:
- `locales/en.json` is already CJK-clean (0 hits) and `locales/zh.json` is at perfect parity (953/953 keys).
- Bulk of remaining CJK is in backend Python source (~26 files across `services/`, `api/`, `utils/`, `models/`) — overwhelmingly docstrings, comments, and a non-trivial number of log strings + LLM-prompt context labels. This is blocked by issue #7 (translate Chinese docstrings/comments).
- Frontend `Process.vue` still has ~65 hard-coded Chinese strings in template/JS literals (not routed through `t()` keys); 4 step components have a smaller surface (mainly Step4Report's regex parsers that match Chinese backend output).
- Live UI/full-stack walkthrough is not feasible in this sandboxed CLI environment — that portion of the verification will be reported as `manual-pending` with reproduction steps.
## Research Log
### Audit baseline
- **Context**: R1 requires running the canonical `git grep` audit and bucketing the matches.
- **Sources consulted**: ripgrep / `git grep -P` against the working tree at `9dcaecd` (HEAD of `docs/i18n-9-translate-frontend-comments`).
- **Findings**:
- Total CJK lines: **2918** across **36** files (counting 2 binary `.jpeg` false positives that ripgrep matches when scanning the assets folder).
- Bucket distribution: `locales/en.json` 0 / `frontend/src` 7 files (5 source + 2 binary) / `backend/app` 29 files.
- The shell-style regex `[\x{4e00}-\x{9fff}]` in the issue body must be passed to `git grep` with `-P` (PCRE) — POSIX ERE rejects `\x{...}` ranges. The verification scripts must use `-P` or document the deviation.
- **Implications**: The audit script must use PCRE; binary files should be excluded explicitly so the `.jpeg` false positives do not pollute the gap report.
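
The filtering and bucketing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the audit script itself: the extension list, the bucket names as a tuple, and the function names are hypothetical, and input lines are assumed to have the `path:line:content` shape that `git grep -n` prints.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Same code-point range as the PCRE class [\x{4e00}-\x{9fff}].
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumed extension list; the real script may rely on git attributes instead.
BINARY_SUFFIXES = {".jpeg", ".jpg", ".png", ".gif", ".ico"}
BUCKETS = ("backend/app", "frontend/src", "locales/en.json")


def is_probably_binary(path: str) -> bool:
    return PurePosixPath(path).suffix.lower() in BINARY_SUFFIXES


def bucket_hits(grep_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group hits by top-level path, dropping binary paths and non-CJK lines."""
    buckets: Dict[str, List[str]] = {name: [] for name in BUCKETS}
    for line in grep_lines:
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # e.g. "Binary file ... matches" has no path:line:content shape
        path, _, content = parts
        if is_probably_binary(path) or not CJK.search(content):
            continue
        for name in BUCKETS:
            if path == name or path.startswith(name + "/"):
                buckets[name].append(line)
                break
    return buckets
```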
### Locale-catalogue parity
- **Context**: R2 demands key-set parity between `en.json` and `zh.json`.
- **Sources consulted**: small Python diff over the catalogues (recursive nested-dict key flattening).
- **Findings**: 953 keys each, symmetric difference 0. Already passing.
- **Implications**: R2.1 and R2.2 will trivially pass; the R2.4 check (untranslated-but-identical entries) still needs to be run.
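
A small sketch of the key-flattening diff and the identical-value check, assuming the catalogues are plain nested dicts of strings. The function names are hypothetical, and the "more than two ASCII words" threshold is borrowed from the `check_parity.py` task description; counting runs of ASCII letters as words is an assumption.

```python
import re
from typing import Any, Dict, Set

ASCII_WORD = re.compile(r"[A-Za-z]+")


def flatten_keys(node: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect dotted paths of all leaf keys in a nested dict."""
    keys: Set[str] = set()
    for name, value in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def key_mismatch(en: Dict[str, Any], zh: Dict[str, Any]) -> Set[str]:
    """Keys present in exactly one catalogue (symmetric difference)."""
    return flatten_keys(en) ^ flatten_keys(zh)


def is_suspicious_identical(en_value: str, zh_value: str) -> bool:
    """Flag non-empty values that are equal in both locales and look like a sentence."""
    if not en_value or en_value != zh_value:
        return False
    return len(ASCII_WORD.findall(en_value)) > 2
```

The word threshold keeps short shared tokens such as `OK` or product names out of the `review-needed` bucket.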
### Locale propagation surface
- **Context**: R4 requires confirming that locale survives Flask handler → `Task` → OASIS subprocess → ReAct agent.
- **Sources consulted**: `backend/app/api/graph.py`, `backend/app/services/` skim, CLAUDE.md (mentions `set_locale` thread-local).
- **Findings**:
- `backend/app/api/graph.py` (line 385 and other call sites) still emits Chinese log strings inline, e.g. `build_logger.info(f"[{task_id}] 开始构建图谱...")` ("Starting graph build..."); the log externalisation work (#6) didn't reach these call sites.
- `backend/app/utils/retry.py` log strings are still hard-coded Chinese, e.g. `logger.error(f"函数 {func.__name__} ...")` (`函数` means "function").
- `oasis_profile_generator.py` LLM-prompt context labels (`"事实信息:"`, i.e. "Factual information:", and `"相关实体:"`, i.e. "Related entities:") feed into the agent prompt verbatim, so they will bias the LLM toward Chinese output even under the EN locale.
- **Implications**: R4.3 (locale discarded silently → defaults non-EN) has live evidence; multiple `gap` items will be filed.
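
To illustrate why a thread-local locale is easy to lose at a boundary, here is a self-contained sketch. It assumes a `set_locale`/`get_locale` pair backed by `threading.local` and a non-EN default of `zh`; both the helper names beyond `set_locale` and the default are illustrative assumptions, not the project's actual code.

```python
import threading

_state = threading.local()


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale(default: str = "zh") -> str:
    # Assumed default; a thread that never called set_locale falls back to it.
    return getattr(_state, "locale", default)


def run_in_worker(fn, *args):
    """Run fn in a fresh thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn(*args)))
    worker.start()
    worker.join()
    return result[0]


def task_without_locale() -> str:
    # The handler's locale is not visible here: thread-locals are per thread.
    return get_locale()


def task_with_locale(locale: str) -> str:
    # Passing the locale explicitly and re-setting it restores the expected value.
    set_locale(locale)
    return get_locale()
```

The same reasoning applies more strongly to a subprocess, which shares no memory with the handler at all, so the locale has to travel as an argument or environment value.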
## Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| Pure shell + Python script (Option A) | One-shot scripts in `.kiro/specs/.../audit/scripts/` produce `audit/<sha>/*.txt` and `audit/<sha>/gap-report.md` | Simplest; no production-code touch; easy to re-run; fits R8 capture format | Scoped to this ticket — not a permanent CI guard | Selected |
| Reusable `tools/i18n-audit/` CLI (Option B) | Promote the audit to a permanent project tool wired into CI | Long-term safety net; future PRs would fail on regressions | Out of scope per R7.3 (verification-only); adds new top-level directory | Filed as a follow-up issue, not implemented here |
| Hybrid (Option C) | Run Option A now; file an issue requesting Option B as future work | Captures B's value without bloating this PR | None material | Adopted |
## Design Decisions
### Decision: Audit lives entirely under `.kiro/specs/i18n-e2e-english-verification/`
- **Context**: R7.3 forbids modifying production source in this ticket; the verification artefacts (scripts and captures) need a home.
- **Alternatives considered**:
1. Top-level `tools/i18n-audit/` — rejected (creates a long-lived asset out of a one-shot ticket).
2. `scripts/` next to existing project scripts — rejected (project has no convention for verification scripts; `.kiro/specs/` is the canonical home for spec-scoped work).
3. `.kiro/specs/.../audit/` — selected.
- **Selected approach**: Scripts at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` and outputs at `.kiro/specs/.../audit/<commit-sha>/`.
- **Rationale**: Co-locates spec, requirements, design, and the artefacts a future verifier needs to re-run the pass. Honours the steering rule that the spec dir is the source of truth for spec-scoped state.
- **Trade-offs**: Scripts aren't reused beyond this ticket. Re-runs require checking out the spec dir (which is committed).
- **Follow-up**: File a follow-up issue suggesting Option B (a permanent CI guard) for the next iteration of the i18n epic.
### Decision: Manual UI walkthrough → `manual-pending`, not `gap`
- **Context**: R5.3 already permits `manual-pending` when a checklist item requires running the live stack. This run is a sandboxed CLI session: no browser, no Docker.
- **Alternatives considered**:
1. Mark UI items `gap` because they weren't proven — rejected (a `gap` is a *known* failure; UI items are simply untested in this run).
2. Skip them silently — rejected (R5.1 requires every checklist item to have a status).
3. Mark `manual-pending` with reproduction steps — selected.
- **Rationale**: Honest about the verification environment's limits. Future verifiers can flip `manual-pending` to `pass` or `gap` after running the live walkthrough.
- **Trade-offs**: Issue #10 cannot be fully closed by this run alone; the verification-pass comment will say so explicitly.
### Decision: Gap classification = (deliberate / gap / non-applicable / review-needed)
- **Context**: R1.2 lists three classes; R2.4 introduces a fourth (`review-needed`).
- **Alternatives considered**:
1. Three-class only — rejected (forces premature decisions on identical en/zh values).
2. Four-class with explicit semantics — selected.
- **Rationale**: A four-class scheme keeps the `gap` count truthful (it counts only known-bad lines), and `review-needed` is a soft signal that a human should re-check.
- **Trade-offs**: Slightly more complex schema; mitigated by documenting the four labels at the top of `gap-report.md`.
### Decision: Follow-up grouping by category, not by file
- **Context**: R7.2 allows consolidation. There are too many CJK-bearing files (29) to file one issue each.
- **Alternatives considered**:
1. One issue per file — rejected (29 micro-issues).
2. One issue per pipeline step (R1.3 step tag) — feasible but cross-cuts existing per-component issues like #7.
3. One issue per **gap category** — selected: (a) frontend hard-coded UI strings, (b) backend log strings, (c) backend LLM-prompt context labels, (d) recommend a permanent CI check.
- **Rationale**: Categories already align with how the i18n epic broke down work (#3, #4, #5 = LLM-prompts; #6 = log externalisation; #7 = docstrings/comments; #9 = frontend comments). Categories also map cleanly to single PRs, which is how subsequent fixes will land.
- **Trade-offs**: Some files appear in multiple categories. Mitigated by listing `file:line` evidence inside each category issue.
### Decision: Issue-comment fallback when `gh` is unavailable
- **Context**: R7.5 mandates a fallback if `gh` permissions are missing.
- **Selected approach**: If `gh` posts fail, the script writes the comment body to `audit/<sha>/PENDING-issue-10-comment.md` and the would-be follow-up issue bodies to `audit/<sha>/PENDING-followups/*.md` so a human can paste them.
- **Rationale**: Keeps the audit re-runnable offline; keeps the artefact set faithful to what *would* have been posted.
- **Trade-offs**: Verification doesn't truly close until a human posts. Surfaced loudly in the run-summary.
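
The fallback can be pictured with the following sketch. The actual scripts are shell wrappers around `gh`; this Python version only illustrates the control flow, and `post_or_defer` plus the injected `post` callable are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def post_or_defer(body: str, post: Callable[[str], str], pending_path: Path) -> bool:
    """Try to post the body; on any failure, park it in pending_path.

    Returns True when the post succeeded, False when the body was deferred.
    """
    try:
        post(body)
        return True
    except Exception:
        # Keep exactly what would have been posted so a human can paste it later.
        pending_path.parent.mkdir(parents=True, exist_ok=True)
        pending_path.write_text(body, encoding="utf-8")
        return False
```

Injecting the posting function keeps the sketch runnable offline and makes the failure path easy to exercise in a test.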
## Risks & Mitigations
- **Risk**: A `gap` is mis-classified as `non-applicable` (e.g. a regex character class versus a real Chinese label) → Mitigation: classification tracked in a small CSV alongside the raw grep, so re-classification is auditable.
- **Risk**: `gh` rate limits hit when filing follow-ups → Mitigation: file at most 4 follow-ups (one per category) — far below any rate limit.
- **Risk**: Re-running the audit on a divergent branch produces a noisy diff → Mitigation: `audit/<commit-sha>/` directories preserve history; comparison is opt-in via `diff -ru`.
- **Risk**: Live walkthrough never happens, leaving #10 in `manual-pending` indefinitely → Mitigation: the verification report comment names a concrete "next reviewer" reproduction script; `manual-pending` items have explicit acceptance criteria.
## References
- Issue #10 — https://github.com/salestech-group/MiroFish/issues/10
- Epic #11 — https://github.com/salestech-group/MiroFish/issues/11
- `gap-analysis.md` — bucketed audit baseline
- `requirements.md` — EARS acceptance criteria for this spec


@ -0,0 +1,24 @@
{
"feature_name": "i18n-e2e-english-verification",
"created_at": "2026-05-07T18:25:18Z",
"updated_at": "2026-05-07T18:25:18Z",
"language": "en",
"phase": "tasks-generated",
"ticket": 10,
"ticket_url": "https://github.com/salestech-group/MiroFish/issues/10",
"approvals": {
"requirements": {
"generated": true,
"approved": true
},
"design": {
"generated": true,
"approved": true
},
"tasks": {
"generated": true,
"approved": true
}
},
"ready_for_implementation": true
}


@ -0,0 +1,87 @@
# Tasks — i18n-e2e-english-verification
## 1. Foundation — audit workspace and entrypoint
- [x] 1.1 Create the audit script directory and the read-only orchestrator skeleton
- Establish `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` with a `run_audit.sh` skeleton that uses `set -euo pipefail`.
- The orchestrator captures HEAD sha (`git rev-parse HEAD`) and creates `.kiro/specs/i18n-e2e-english-verification/audit/<sha>/` as the artefact root.
- Observable completion: running `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh` from repo root creates an empty `audit/<sha>/` directory and exits `0`.
- _Requirements: 1.4, 7.3, 8.1, 8.2, 8.3, 8.4_
- _Boundary: run_audit.sh_
## 2. Core — read-only audit producers
- [x] 2.1 (P) Implement the canonical CJK grep with PCRE
- `audit_cjk.sh` runs `git grep -nP '[\x{4e00}-\x{9fff}]' -- backend/app frontend/src locales/en.json` and writes the raw output to `<sha>/cjk-grep.txt`.
- Produces a partitioned `<sha>/cjk-grep-bucketed.txt` with one section per top-level path (`backend/app`, `frontend/src`, `locales/en.json`).
- Excludes binary file matches (e.g. `.jpeg`) by skipping paths whose `git check-attr` reports `binary` (or by file-extension allowlist if check-attr is unset).
- Observable completion: `<sha>/cjk-grep.txt` contains exactly the same lines as a manual `git grep -nP …` run, and `<sha>/cjk-grep-bucketed.txt` has the three labelled sections with line counts.
- _Requirements: 1.1, 1.5_
- _Boundary: audit_cjk.sh_
- [x] 2.2 (P) Implement the locale-catalogue parity diff
- `check_parity.py` loads `locales/en.json` and `locales/zh.json`, recursively flattens nested-dict keys with dotted paths, and writes `<sha>/parity.txt` with three labelled blocks: `[missing-keys]`, `[cjk-in-en]`, `[identical-values]`.
- The `[identical-values]` block flags entries only when EN value equals ZH value AND the value is non-empty AND has more than two ASCII words.
- Observable completion: `<sha>/parity.txt` exists; on the current tree `[missing-keys]` is empty and `[cjk-in-en]` is empty (matching the gap-analysis baseline).
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 6.1, 6.3_
- _Boundary: check_parity.py_
- [x] 2.3 Implement the four-class classifier
- `classify.py` consumes `<sha>/cjk-grep.txt` and `<sha>/parity.txt` and writes `<sha>/classified.csv` with columns `file,line,match,class,category,pipeline_step`.
- Implements the closed-set rules from design.md "classify.py": `locales/en.json` CJK → `gap`/`catalogue-parity`; `frontend/src/{views,components}/*.vue` string literal → `gap`/`frontend-ui-string`; `text.match(/.../)` regex pattern with CJK → `gap`/`frontend-regex-parser`; `.py` line starting with `#` or inside a triple-quoted block → `deliberate`/`backend-{comment,docstring}`; `.py` `logger.|log.|print(` line with CJK in a string literal → `gap`/`backend-log` with appropriate step tag; `.py` LLM-prompt label in `services/{ontology,oasis_profile,simulation_config,report_agent}_generator.py` → `gap`/`backend-prompt-label`; binary file → `non-applicable`/`binary-false-positive`; everything else → `review-needed`.
- Asserts row-count equality with the input grep (no silent drops).
- Observable completion: `<sha>/classified.csv` row count == `cjk-grep.txt` line count, and at least one row of each non-empty class is present (verified by counting per-class rows in stdout summary).
- _Requirements: 1.2, 1.3, 1.5, 3.1, 3.2, 3.3, 3.4, 4.3, 4.4, 6.4_
- _Boundary: classify.py_
- _Depends: 2.1, 2.2_
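
The closed-set rules in 2.3 can be read as an ordered first-match-wins table. The sketch below is a simplified illustration, not the real `classify.py`: the signature is hypothetical, docstring and binary detection are passed in as flags rather than computed, pipeline-step tagging is omitted, and any hit in a `views/` or `components/` `.vue` file that is not a regex parser is treated as a UI string.

```python
import re
from typing import Tuple

LOG_CALL = re.compile(r"(?:logger|log)\.\w+\(|print\(")
PROMPT_FILE = re.compile(
    r"services/(ontology|oasis_profile|simulation_config|report_agent)_generator\.py$"
)
VUE_FILE = re.compile(r"^frontend/src/(views|components)/.*\.vue$")


def classify(path: str, text: str, in_docstring: bool = False,
             binary: bool = False) -> Tuple[str, str]:
    """Return (class, category) for one grep hit; the first matching rule wins."""
    if binary:
        return ("non-applicable", "binary-false-positive")
    if path == "locales/en.json":
        return ("gap", "catalogue-parity")
    if VUE_FILE.match(path):
        if ".match(/" in text:
            return ("gap", "frontend-regex-parser")
        return ("gap", "frontend-ui-string")
    if path.endswith(".py"):
        if text.lstrip().startswith("#"):
            return ("deliberate", "backend-comment")
        if in_docstring:
            return ("deliberate", "backend-docstring")
        if LOG_CALL.search(text):
            return ("gap", "backend-log")
        if PROMPT_FILE.search(path):
            return ("gap", "backend-prompt-label")
    # Anything no rule claims is left for a human to decide.
    return ("review-needed", "")
```

Because every input row falls through to `review-needed` at worst, the classifier returns exactly one label per hit, which is what makes the row-count equality assertion meaningful.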
## 3. Core — report assembly
- [x] 3.1 Render the gap report and the issue-#10 comment body
- `render_report.py` reads `<sha>/classified.csv` and `.ticket/10.md`; writes `<sha>/gap-report.md` (with the seven sections from design.md) and `<sha>/comment-body.md` (mirroring the issue's checklist with `pass`/`gap`/`manual-pending` per line + a "How to re-run" footer + a `Run on commit <sha>` header).
- Section 4 of `gap-report.md` enumerates the four propagation boundaries and reports each as `pass`/`gap`/`unknown`, with file:line evidence drawn from `classified.csv`.
- Section 5 maps every checklist item from `.ticket/10.md` to a `pass` / `gap` / `manual-pending` status. UI-checklist items default to `manual-pending` (live walkthrough not feasible in sandbox) and include a concrete reproduction script.
- Always writes the four follow-up issue body templates to `<sha>/PENDING-followups/`: `01-frontend-ui-strings.md`, `02-backend-log-strings.md`, `03-backend-prompt-labels.md`, `04-permanent-ci-guard.md` — empty placeholder if the corresponding category had zero `gap` rows.
- Observable completion: `<sha>/gap-report.md`, `<sha>/comment-body.md`, and `<sha>/PENDING-followups/01..04-*.md` all exist; opening `<sha>/comment-body.md` shows every checkbox from `.ticket/10.md` mapped to a status.
- _Requirements: 4.1, 4.2, 5.1, 5.2, 5.3, 5.4, 6.2_
- _Boundary: render_report.py_
## 4. Integration — orchestrator and GitHub side effects
- [x] 4.1 Wire run_audit.sh to the four producer steps and add the GitHub posting hooks
- `run_audit.sh` invokes (in order) `audit_cjk.sh`, `check_parity.py`, `classify.py`, `render_report.py`, then `post_comment.sh` and `file_followups.sh`.
- On any error in steps 1-4 the orchestrator aborts (`set -euo pipefail`) before any subsequent step runs.
- On `gh` failure in steps 5 or 6, the orchestrator continues to the next step but exits `2` at the end (audit succeeded, side effects didn't fully apply).
- Observable completion: a clean run on the current tree creates a complete `<sha>/` directory; if `gh` is made unavailable (e.g. by putting a stub `gh` that exits non-zero ahead of the real one on `PATH`, since replacing `PATH` wholesale would also hide `git` and `python`), the orchestrator still produces all four producer artefacts and the `PENDING-followups/`, and exits with `2`.
- _Requirements: 1.4, 7.3, 8.1, 8.2, 8.3, 8.4_
- _Boundary: run_audit.sh_
- _Depends: 2.3, 3.1_
- [x] 4.2 Implement post_comment.sh and file_followups.sh with PENDING fallback
- `post_comment.sh` calls `gh issue comment 10 --repo salestech-group/MiroFish --body-file <sha>/comment-body.md`; on failure it copies the body to `<sha>/PENDING-issue-10-comment.md` and exits non-zero. On success it writes the resulting URL to `<sha>/comment-url.txt`.
- `file_followups.sh` iterates `<sha>/PENDING-followups/*.md`; for each non-empty body it calls `gh issue create --repo salestech-group/MiroFish --title <title-from-body-first-line> --body-file <body> --label i18n` (and `--label bug` when the body's frontmatter declares regression). On per-category failure it leaves that body in place; on success it removes the body and appends the issue URL to `<sha>/followup-urls.txt`.
- Observable completion: with `gh` available, the comment URL appears in `<sha>/comment-url.txt` and any non-empty follow-up body produces an issue URL in `<sha>/followup-urls.txt`; with `gh` absent, both bodies stay under `<sha>/PENDING-*` and exit codes are non-zero.
- _Requirements: 5.5, 7.1, 7.2, 7.4, 7.5_
- _Boundary: post_comment.sh, file_followups.sh_
- _Depends: 3.1_
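
The exit-code policy from 4.1 (abort on producer failure, continue past `gh` failures but exit `2`) can be sketched like this. The real orchestrator is a shell script; this Python version only models the control flow with injected callables, and the exit code `1` for a producer failure is an assumption, since under `set -e` the script would exit with the failing command's own status.

```python
from typing import Callable, Sequence

Step = Callable[[], None]


def run_pipeline(producers: Sequence[Step], side_effects: Sequence[Step]) -> int:
    """Return 0 on full success, 1 if a producer failed, 2 if only side effects failed."""
    for step in producers:
        try:
            step()
        except Exception:
            return 1  # abort: no later producer or side effect runs
    failed = False
    for step in side_effects:
        try:
            step()
        except Exception:
            failed = True  # keep going so the remaining side effect still runs
    return 2 if failed else 0
```

In a shell script running under `set -e`, the equivalent of the second loop needs each `gh` step guarded (for example with `|| status=2`) so that a failure does not abort the run.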
## 5. Validation — execute the verification pass
- [x] 5.1 Execute the audit on the current tree and capture a baseline run
- Run `bash .kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh` from repo root.
- Confirm `<sha>/cjk-grep.txt`, `cjk-grep-bucketed.txt`, `parity.txt`, `classified.csv`, `gap-report.md`, `comment-body.md`, and `PENDING-followups/01..04-*.md` all exist and are non-empty (the placeholders for empty categories may be empty by design).
- Confirm `parity.txt` `[missing-keys]` and `[cjk-in-en]` blocks are empty (matches the gap-analysis baseline).
- Confirm `classified.csv` row count matches `cjk-grep.txt` line count exactly.
- Observable completion: the baseline `<sha>/` directory is committed under `.kiro/specs/i18n-e2e-english-verification/audit/`.
- _Requirements: 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 8.1, 8.3_
- _Boundary: run_audit.sh and producer scripts_
- _Depends: 4.1_
- [x] 5.2 Post the comment on issue #10 and file the follow-up issues
- Run `post_comment.sh <sha-dir>` and `file_followups.sh <sha-dir>` (or rely on `run_audit.sh` to invoke them) so the verification report comment is posted and follow-up issues are filed for non-empty categories.
- Capture `comment-url.txt` and `followup-urls.txt` under `<sha>/` so the PR description can link to them.
- If `gh` lacks permissions for any of the calls, the corresponding `PENDING-*` file is left in place per R7.5; the run summary surfaces the partial state.
- Observable completion: a comment appears on https://github.com/salestech-group/MiroFish/issues/10 mirroring `comment-body.md`; follow-up issues for non-empty categories exist and carry the `i18n` label.
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 6.4, 7.1, 7.2, 7.4, 7.5_
- _Boundary: post_comment.sh, file_followups.sh_
- _Depends: 4.2, 5.1_