9.4 KiB

Raw Blame History

Research & Design Decisions — i18n-e2e-english-verification

Summary

Feature: i18n-e2e-english-verification
Discovery Scope: Extension (verification-only against existing i18n surface)
Key Findings:
- locales/en.json is already CJK-clean (0 hits) and locales/zh.json is at perfect parity (953/953 keys).
- Bulk of remaining CJK is in backend Python source (~26 files across services/, api/, utils/, models/) — overwhelmingly docstrings, comments, and a non-trivial number of log strings + LLM-prompt context labels. This is blocked by issue #7 (translate Chinese docstrings/comments).
- Frontend Process.vue still has ~65 hard-coded Chinese strings in template/JS literals (not routed through t() keys); 4 step components have a smaller surface (mainly Step4Report's regex parsers that match Chinese backend output).
- Live UI/full-stack walkthrough is not feasible in this sandboxed CLI environment — that portion of the verification will be reported as manual-pending with reproduction steps.

Research Log

Audit baseline

Context: R1 requires running the canonical git grep audit and bucketing the matches.
Sources consulted: ripgrep / git grep -P against the working tree at 9dcaecd (HEAD of docs/i18n-9-translate-frontend-comments).
Findings:
- Total CJK lines: 2918 across 36 files (counting 2 binary .jpeg false positives that ripgrep matches when scanning the assets folder).
- Bucket distribution: locales/en.json 0 / frontend/src 7 files (5 source + 2 binary) / backend/app 29 files.
- The shell-style regex [\x{4e00}-\x{9fff}] in the issue body must be passed to git grep with -P (PCRE) — POSIX ERE rejects \x{...} ranges. The verification scripts must use -P or document the deviation.
Implications: The audit script must use PCRE; binary files should be excluded explicitly so the .jpeg false positives do not pollute the gap report.

Locale-catalogue parity

Context: R2 demands key-set parity between en.json and zh.json.
Sources consulted: small Python diff over the catalogues (recursive nested-dict key flattening).
Findings: 953 keys each, symmetric difference 0. Already passing.
Implications: R2.1, R2.2 will trivially pass; R2.4 (untranslated-but-identical entries) still needs running.

Locale propagation surface

Context: R4 requires confirming that locale survives Flask handler → Task → OASIS subprocess → ReACT agent.
Sources consulted: backend/app/api/graph.py, backend/app/services/ skim, CLAUDE.md (mentions set_locale thread-local).
Findings:
- backend/app/api/graph.py line 385 etc still emit Chinese log strings inline (build_logger.info(f"[{task_id}] 开始构建图谱...")) — the log externalisation work (#6) didn't reach these call sites.
- backend/app/utils/retry.py log strings are still hard-coded Chinese (logger.error(f"函数 {func.__name__} ...")).
- oasis_profile_generator.py LLM-prompt context labels ("事实信息:", "相关实体:") feed into the agent prompt verbatim — these will bias the LLM toward Chinese output even under EN locale.
Implications: R4.3 (locale discarded silently → defaults non-EN) has live evidence; multiple gap items will be filed.

Architecture Pattern Evaluation

Option	Description	Strengths	Risks / Limitations	Notes
Pure shell + Python script (Option A)	One-shot scripts in `.kiro/specs/.../audit/scripts/` produce `audit/<sha>/*.txt` and `audit/<sha>/gap-report.md`	Simplest; no production-code touch; easy to re-run; fits R8 capture format	Scoped to this ticket — not a permanent CI guard	Selected
Reusable `tools/i18n-audit/` CLI (Option B)	Promote the audit to a permanent project tool wired into CI	Long-term safety net; future PRs would fail on regressions	Out of scope per R7.3 (verification-only); adds new top-level directory	Filed as a follow-up issue, not implemented here
Hybrid (Option C)	Run Option A now; file an issue requesting Option B as future work	Captures B's value without bloating this PR	None material	Adopted

Design Decisions

Decision: Audit lives entirely under `.kiro/specs/i18n-e2e-english-verification/`

Context: R7.3 forbids modifying production source in this ticket; the verification artefacts (scripts and captures) need a home.
Alternatives considered:
1. Top-level tools/i18n-audit/ — rejected (creates a long-lived asset out of a one-shot ticket).
2. scripts/ next to existing project scripts — rejected (project has no convention for verification scripts; .kiro/specs/ is the canonical home for spec-scoped work).
3. .kiro/specs/.../audit/ — selected.
Selected approach: Scripts at .kiro/specs/i18n-e2e-english-verification/audit/scripts/ and outputs at .kiro/specs/.../audit/<commit-sha>/.
Rationale: Co-locates spec, requirements, design, and the artefacts a future verifier needs to re-run the pass. Honours the steering rule that the spec dir is the source of truth for spec-scoped state.
Trade-offs: Scripts aren't reused beyond this ticket. Re-runs require checking out the spec dir (which is committed).
Follow-up: File a follow-up issue suggesting Option B (a permanent CI guard) for the next iteration of the i18n epic.

Decision: Manual UI walkthrough → `manual-pending`, not `gap`

Context: R5.3 already permits manual-pending when a checklist item requires running the live stack. This run is sandboxed CLI — no browser, no Docker.
Alternatives considered:
1. Mark UI items gap because they weren't proven — rejected (a gap is a known failure; UI items are simply untested in this run).
2. Skip them silently — rejected (R5.1 requires every checklist item to have a status).
3. Mark manual-pending with reproduction steps — selected.
Rationale: Honest about the verification environment's limits. Future verifiers can flip manual-pending to pass or gap after running the live walkthrough.
Trade-offs: Issue #10 cannot be fully closed by this run alone; the verification-pass comment will say so explicitly.

Decision: Gap classification = (deliberate / gap / non-applicable / review-needed)

Context: R1.2 lists three classes; R2.4 introduces a fourth (review-needed).
Alternatives considered:
1. Three-class only — rejected (forces premature decisions on identical en/zh values).
2. Four-class with explicit semantics — selected.
Rationale: A four-class scheme keeps the gap count truthful (it counts only known-bad lines), and review-needed is a soft signal that a human should re-check.
Trade-offs: Slightly more complex schema; mitigated by documenting the four labels at the top of gap-report.md.

Decision: Follow-up grouping by category, not by file

Context: R7.2 allows consolidation. There are too many CJK-bearing files (29) to file one issue each.
Alternatives considered:
1. One issue per file — rejected (29 micro-issues).
2. One issue per pipeline step (R1.3 step tag) — feasible but cross-cuts existing per-component issues like #7.
3. One issue per gap category — selected: (a) frontend hard-coded UI strings, (b) backend log strings, (c) backend LLM-prompt context labels, (d) recommend a permanent CI check.
Rationale: Categories already align with how the i18n epic broke down work (#3, #4, #5, #6 = LLM-prompts; #7 = docstrings/comments; #9 = frontend comments). Categories also map cleanly to single PRs, which is how subsequent fixes will land.
Trade-offs: Some files appear in multiple categories. Mitigated by listing file:line evidence inside each category issue.

Decision: Issue-comment fallback when `gh` is unavailable

Context: R7.5 mandates a fallback if gh permissions are missing.
Selected approach: If gh posts fail, the script writes the comment body to audit/<sha>/PENDING-issue-10-comment.md and the would-be follow-up issue bodies to audit/<sha>/PENDING-followups/*.md so a human can paste them.
Rationale: Keeps the audit re-runnable offline; keeps the artefact set faithful to what would have been posted.
Trade-offs: Verification doesn't truly close until a human posts. Surfaced loudly in the run-summary.

Risks & Mitigations

Risk: A gap is mis-classified as non-applicable (e.g. a regex character class versus a real Chinese label) → Mitigation: classification tracked in a small CSV alongside the raw grep, so re-classification is auditable.
Risk: gh rate limits hit when filing follow-ups → Mitigation: file at most 4 follow-ups (one per category) — far below any rate limit.
Risk: Re-running the audit on a divergent branch produces a noisy diff → Mitigation: audit/<commit-sha>/ directories preserve history; comparison is opt-in via diff -ru.
Risk: Live walkthrough never happens, leaving #10 in manual-pending indefinitely → Mitigation: the verification report comment names a concrete "next reviewer" reproduction script; manual-pending items have explicit acceptance criteria.

References

Issue #10 — https://github.com/salestech-group/MiroFish/issues/10
Epic #11 — https://github.com/salestech-group/MiroFish/issues/11
gap-analysis.md — bucketed audit baseline
requirements.md — EARS acceptance criteria for this spec

9.4 KiB Raw Blame History

Research & Design Decisions — i18n-e2e-english-verification

Summary

Research Log

Audit baseline

Locale-catalogue parity

Locale propagation surface

Architecture Pattern Evaluation

Design Decisions

Decision: Audit lives entirely under .kiro/specs/i18n-e2e-english-verification/

Decision: Manual UI walkthrough → manual-pending, not gap

Decision: Gap classification = (deliberate / gap / non-applicable / review-needed)

Decision: Follow-up grouping by category, not by file

Decision: Issue-comment fallback when gh is unavailable

Risks & Mitigations

References

9.4 KiB

Raw Blame History

Decision: Audit lives entirely under `.kiro/specs/i18n-e2e-english-verification/`

Decision: Manual UI walkthrough → `manual-pending`, not `gap`

Decision: Issue-comment fallback when `gh` is unavailable