9.4 KiB
9.4 KiB
Research & Design Decisions — i18n-e2e-english-verification
Summary
- Feature:
i18n-e2e-english-verification - Discovery Scope: Extension (verification-only against existing i18n surface)
- Key Findings:
locales/en.jsonis already CJK-clean (0 hits) andlocales/zh.jsonis at perfect parity (953/953 keys).- Bulk of remaining CJK is in backend Python source (~26 files across
services/,api/,utils/,models/) — overwhelmingly docstrings, comments, and a non-trivial number of log strings + LLM-prompt context labels. This is blocked by issue #7 (translate Chinese docstrings/comments). - Frontend
Process.vuestill has ~65 hard-coded Chinese strings in template/JS literals (not routed throught()keys); 4 step components have a smaller surface (mainly Step4Report's regex parsers that match Chinese backend output). - Live UI/full-stack walkthrough is not feasible in this sandboxed CLI environment — that portion of the verification will be reported as
manual-pendingwith reproduction steps.
Research Log
Audit baseline
- Context: R1 requires running the canonical
git grepaudit and bucketing the matches. - Sources consulted: ripgrep /
git grep -Pagainst the working tree at9dcaecd(HEAD ofdocs/i18n-9-translate-frontend-comments). - Findings:
- Total CJK lines: 2918 across 36 files (counting 2 binary
.jpegfalse positives that ripgrep matches when scanning the assets folder). - Bucket distribution:
locales/en.json0 /frontend/src7 files (5 source + 2 binary) /backend/app29 files. - The shell-style regex
[\x{4e00}-\x{9fff}]in the issue body must be passed togit grepwith-P(PCRE) — POSIX ERE rejects\x{...}ranges. The verification scripts must use-Por document the deviation.
- Total CJK lines: 2918 across 36 files (counting 2 binary
- Implications: The audit script must use PCRE; binary files should be excluded explicitly so the
.jpegfalse positives do not pollute the gap report.
Locale-catalogue parity
- Context: R2 demands key-set parity between
en.jsonandzh.json. - Sources consulted: small Python diff over the catalogues (recursive nested-dict key flattening).
- Findings: 953 keys each, symmetric difference 0. Already passing.
- Implications: R2.1, R2.2 will trivially pass; R2.4 (untranslated-but-identical entries) still needs running.
Locale propagation surface
- Context: R4 requires confirming that locale survives Flask handler →
Task→ OASIS subprocess → ReACT agent. - Sources consulted:
backend/app/api/graph.py,backend/app/services/skim, CLAUDE.md (mentionsset_localethread-local). - Findings:
backend/app/api/graph.pyline 385 etc still emit Chinese log strings inline (build_logger.info(f"[{task_id}] 开始构建图谱...")) — the log externalisation work (#6) didn't reach these call sites.backend/app/utils/retry.pylog strings are still hard-coded Chinese (logger.error(f"函数 {func.__name__} ...")).oasis_profile_generator.pyLLM-prompt context labels ("事实信息:","相关实体:") feed into the agent prompt verbatim — these will bias the LLM toward Chinese output even under EN locale.
- Implications: R4.3 (locale discarded silently → defaults non-EN) has live evidence; multiple
gapitems will be filed.
Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|---|---|---|---|---|
| Pure shell + Python script (Option A) | One-shot scripts in .kiro/specs/.../audit/scripts/ produce audit/<sha>/*.txt and audit/<sha>/gap-report.md |
Simplest; no production-code touch; easy to re-run; fits R8 capture format | Scoped to this ticket — not a permanent CI guard | Selected |
Reusable tools/i18n-audit/ CLI (Option B) |
Promote the audit to a permanent project tool wired into CI | Long-term safety net; future PRs would fail on regressions | Out of scope per R7.3 (verification-only); adds new top-level directory | Filed as a follow-up issue, not implemented here |
| Hybrid (Option C) | Run Option A now; file an issue requesting Option B as future work | Captures B's value without bloating this PR | None material | Adopted |
Design Decisions
Decision: Audit lives entirely under .kiro/specs/i18n-e2e-english-verification/
- Context: R7.3 forbids modifying production source in this ticket; the verification artefacts (scripts and captures) need a home.
- Alternatives considered:
- Top-level
tools/i18n-audit/— rejected (creates a long-lived asset out of a one-shot ticket). scripts/next to existing project scripts — rejected (project has no convention for verification scripts;.kiro/specs/is the canonical home for spec-scoped work)..kiro/specs/.../audit/— selected.
- Top-level
- Selected approach: Scripts at
.kiro/specs/i18n-e2e-english-verification/audit/scripts/and outputs at.kiro/specs/.../audit/<commit-sha>/. - Rationale: Co-locates spec, requirements, design, and the artefacts a future verifier needs to re-run the pass. Honours the steering rule that the spec dir is the source of truth for spec-scoped state.
- Trade-offs: Scripts aren't reused beyond this ticket. Re-runs require checking out the spec dir (which is committed).
- Follow-up: File a follow-up issue suggesting Option B (a permanent CI guard) for the next iteration of the i18n epic.
Decision: Manual UI walkthrough → manual-pending, not gap
- Context: R5.3 already permits
manual-pendingwhen a checklist item requires running the live stack. This run is sandboxed CLI — no browser, no Docker. - Alternatives considered:
- Mark UI items
gapbecause they weren't proven — rejected (agapis a known failure; UI items are simply untested in this run). - Skip them silently — rejected (R5.1 requires every checklist item to have a status).
- Mark
manual-pendingwith reproduction steps — selected.
- Mark UI items
- Rationale: Honest about the verification environment's limits. Future verifiers can flip
manual-pendingtopassorgapafter running the live walkthrough. - Trade-offs: Issue #10 cannot be fully closed by this run alone; the verification-pass comment will say so explicitly.
Decision: Gap classification = (deliberate / gap / non-applicable / review-needed)
- Context: R1.2 lists three classes; R2.4 introduces a fourth (
review-needed). - Alternatives considered:
- Three-class only — rejected (forces premature decisions on identical en/zh values).
- Four-class with explicit semantics — selected.
- Rationale: A four-class scheme keeps the
gapcount truthful (it counts only known-bad lines), andreview-neededis a soft signal that a human should re-check. - Trade-offs: Slightly more complex schema; mitigated by documenting the four labels at the top of
gap-report.md.
Decision: Follow-up grouping by category, not by file
- Context: R7.2 allows consolidation. There are too many CJK-bearing files (29) to file one issue each.
- Alternatives considered:
- One issue per file — rejected (29 micro-issues).
- One issue per pipeline step (R1.3 step tag) — feasible but cross-cuts existing per-component issues like #7.
- One issue per gap category — selected: (a) frontend hard-coded UI strings, (b) backend log strings, (c) backend LLM-prompt context labels, (d) recommend a permanent CI check.
- Rationale: Categories already align with how the i18n epic broke down work (#3, #4, #5, #6 = LLM-prompts; #7 = docstrings/comments; #9 = frontend comments). Categories also map cleanly to single PRs, which is how subsequent fixes will land.
- Trade-offs: Some files appear in multiple categories. Mitigated by listing
file:lineevidence inside each category issue.
Decision: Issue-comment fallback when gh is unavailable
- Context: R7.5 mandates a fallback if
ghpermissions are missing. - Selected approach: If
ghposts fail, the script writes the comment body toaudit/<sha>/PENDING-issue-10-comment.mdand the would-be follow-up issue bodies toaudit/<sha>/PENDING-followups/*.mdso a human can paste them. - Rationale: Keeps the audit re-runnable offline; keeps the artefact set faithful to what would have been posted.
- Trade-offs: Verification doesn't truly close until a human posts. Surfaced loudly in the run-summary.
Risks & Mitigations
- Risk: A
gapis mis-classified asnon-applicable(e.g. a regex character class versus a real Chinese label) → Mitigation: classification tracked in a small CSV alongside the raw grep, so re-classification is auditable. - Risk:
ghrate limits hit when filing follow-ups → Mitigation: file at most 4 follow-ups (one per category) — far below any rate limit. - Risk: Re-running the audit on a divergent branch produces a noisy diff → Mitigation:
audit/<commit-sha>/directories preserve history; comparison is opt-in viadiff -ru. - Risk: Live walkthrough never happens, leaving #10 in
manual-pendingindefinitely → Mitigation: the verification report comment names a concrete "next reviewer" reproduction script;manual-pendingitems have explicit acceptance criteria.
References
- Issue #10 — https://github.com/salestech-group/MiroFish/issues/10
- Epic #11 — https://github.com/salestech-group/MiroFish/issues/11
gap-analysis.md— bucketed audit baselinerequirements.md— EARS acceptance criteria for this spec