MicroFish/.kiro/specs/i18n-e2e-english-verification/requirements.md

13 KiB
Raw Blame History

Requirements Document

Project Description (Input)

Issue #10: i18n end-to-end verification of full pipeline. Run a verification pass to prove the entire 5-step pipeline (Graph Build, Env Setup, Simulation, Report, Interaction) works cleanly in English, with locale propagating across Flask routes, background tasks, OASIS subprocess, Graphiti/Neo4j, and the ReACT report agent. Produce a verification report (posted as a comment on issue #10) summarising pass/fail per checklist item and listing any leftover Chinese strings as file:line refs. Run the static audit git grep -nE "[\\x{4e00}-\\x{9fff}]" -- backend/app frontend/src locales/en.json and confirm only deliberately-kept Chinese remains. File any newly discovered gaps as follow-up issues (do NOT patch silently in this ticket). Acceptance: all checklist items pass for both EN and ZH; report posted; no surprise Chinese in EN paths. Out of scope: fixing newly discovered gaps inline; perf/load testing; new locales beyond EN/ZH.

Introduction

This spec covers the final verification pass for the i18n epic (#11). After issues #2#9, #12 land, the entire 5-step MiroFish pipeline must demonstrably run in English — UI, background work, LLM-generated artifacts (ontologies, agent profiles, sim configs, reports, chat replies), and backend logs — without any unintended Chinese leaking into English-locale paths. The pass also regression-checks that switching locale back to Chinese still produces fully Chinese output. Because the pipeline crosses a Flask app, background Task workers, an OASIS subprocess, Graphiti/Neo4j, and a ReACT report agent, the verification has both a static (grep + locale-file) component and a dynamic (live walkthrough of Step 1 → 5) component.

The deliverables are: (a) a static audit + categorization of any remaining Chinese strings under English paths, (b) a verification report posted as a comment on issue #10 summarising pass/fail per checklist item with file:line evidence, and (c) follow-up GitHub issues for every gap found — fixes are explicitly out of scope here.

Boundary Context

  • In scope:
    • Static audit (git grep for CJK Unified Ideographs) of backend/app/, frontend/src/, and locales/en.json.
    • Inspection of locale catalogues (locales/en.json, locales/zh.json) for parity, key coverage, and accidental Chinese in the EN catalogue.
    • Inspection of LLM-prompt assets that drive Step 15 outputs (ontology, profile, sim-config, report-agent prompts) to confirm they emit English under EN locale.
    • Inspection of locale propagation paths: HTTP request → Flask handler → Task background worker → OASIS subprocess → ReACT agent.
    • Verification report posted as a comment on issue #10.
    • Follow-up issues filed for every gap found.
  • Out of scope:
    • Fixing any newly discovered gaps inline in this ticket — they are filed as separate issues.
    • Performance or load testing.
    • Adding new locales beyond EN/ZH.
    • The live UI walkthrough with screenshots, when no human or browser is available — the static audit results plus prompt/locale-catalogue evidence stand in. The verification report explicitly marks UI-only checklist items as "manual-pending" if not run live.
  • Adjacent expectations:
    • Closes the i18n epic #11 once #12 also lands.
    • Depends on (and re-verifies) the work in #2, #3, #4, #5, #6, #8, #9, #12.

Requirements

Requirement 1: Static CJK audit of English code paths

Objective: As an i18n verifier, I want a deterministic grep-based audit of files that should be English-only, so that any Chinese leaking into the EN-locale code path is detected and recorded.

Acceptance Criteria

  1. The Verification System shall execute git grep -nE "[\x{4e00}-\x{9fff}]" -- backend/app frontend/src locales/en.json and capture every match with file:line precision.
  2. The Verification System shall classify each match as one of: (a) deliberate (e.g. test fixture demonstrating ZH input, doc example, comment explicitly retained per project convention), (b) gap (unintended Chinese in EN-facing code), or (c) non-applicable (false positive such as a regex character class).
  3. When a match is classified as gap, the Verification System shall record file:line, the Chinese substring, and the affected pipeline step (Graph Build / Env Setup / Simulation / Report / Interaction / Logs / UI).
  4. The Verification System shall not modify any matched file as part of this audit; remediation is filed as a follow-up issue per Requirement 7.
  5. While the audit is running, the Verification System shall additionally inspect locales/en.json for entries whose value contains CJK characters and report those separately (an EN catalogue value containing Chinese is always a gap).

Requirement 2: Locale catalogue parity check

Objective: As an i18n verifier, I want to confirm that the EN and ZH catalogues stay in lockstep, so that switching locale never falls back to a missing key or leaks the other locale.

Acceptance Criteria

  1. The Verification System shall enumerate the key set of locales/en.json and locales/zh.json (recursively across nested objects) and compute the symmetric difference.
  2. If a key is present in en.json but missing from zh.json (or vice versa), the Verification System shall record the missing key path and treat it as a gap.
  3. If any value in en.json contains a CJK character, the Verification System shall record it as a gap (as in Requirement 1.5).
  4. If any value in zh.json is identical to its en.json counterpart and the EN value is non-trivial English prose (more than two ASCII words), the Verification System shall flag it as a candidate untranslated entry — these are reported as review-needed, not auto-classified gap, since some technical terms (URLs, identifiers, single tokens) legitimately stay identical.
  5. The Verification System shall not edit either catalogue file as part of this check.

Requirement 3: LLM-prompt locale verification

Objective: As an i18n verifier, I want to confirm that every LLM prompt that drives a Step 15 output respects the requested locale, so that ontology entries, agent profiles, simulation configs, report prose, and chat replies render in the user's selected language.

Acceptance Criteria

  1. The Verification System shall enumerate the prompt files that produce user-visible output for Steps 15 (e.g. ontology generator, OASIS profile generator, simulation-config generator, report agent prompts, interview chat).
  2. For each prompt file, the Verification System shall confirm that it either (a) is fully English with an explicit "respond in ${locale}" directive, or (b) is rendered through a locale-aware template that injects the active locale.
  3. If a prompt file hard-codes a Chinese-only directive (e.g. "请用中文回答") on the EN code path, the Verification System shall record it as a gap.
  4. The Verification System shall confirm that the prompt files referenced by issues #3, #4, #5 are no longer Chinese-only post-merge; if any still are, they are recorded as gap blocking #10.

Requirement 4: Locale propagation surface review

Objective: As an i18n verifier, I want to confirm that the active locale survives every process boundary, so that an EN request still produces EN output after it crosses into a Task worker, the OASIS subprocess, or the ReACT agent.

Acceptance Criteria

  1. The Verification System shall identify each handoff boundary: HTTP → Flask handler, Flask handler → Task worker, Task worker → OASIS subprocess, ReACT agent → tool calls.
  2. For each handoff, the Verification System shall confirm that the locale is either (a) carried explicitly in the call payload / kwargs, or (b) re-derived deterministically (e.g. from per-project config, Accept-Language header, or set_locale thread-local equivalent) on the receiving side.
  3. If a boundary discards the locale and the receiving side defaults silently to Chinese (or any non-EN locale) under an EN request, the Verification System shall record the boundary as a gap.
  4. The Verification System shall examine the backend logger to confirm that log messages on the EN code path resolve to English templates (depends on #6).

Requirement 5: Verification report comment on issue #10

Objective: As the issue owner, I want a single canonical verification report posted as a comment on issue #10, so that reviewers can see pass/fail per checklist item and trace every gap to a file:line and a follow-up issue.

Acceptance Criteria

  1. When the static audit, parity check, prompt verification, and propagation review are complete, the Verification System shall compose a markdown comment on issue #10 that lists every checklist item from the ticket body with one of the statuses pass / gap / manual-pending.
  2. For each gap status, the comment shall include file:line references and a link to the follow-up issue filed per Requirement 7.
  3. For each manual-pending status, the comment shall state explicitly that the item requires a live UI walkthrough (or full-stack run) which was not performed in this verification environment, and shall list the exact reproduction steps the next reviewer needs to run.
  4. The comment shall include the raw output (or a path to the captured output) of the git grep audit so future verifiers can diff against the baseline.
  5. The Verification System shall post the comment using gh issue comment 10 --repo salestech-group/MiroFish and shall record the resulting comment URL in the spec / commit message.

Requirement 6: ZH regression check

Objective: As an i18n verifier, I want to confirm that the ZH locale still renders fully Chinese, so that the EN work has not regressed the original-language experience.

Acceptance Criteria

  1. The Verification System shall confirm that locales/zh.json covers every key present in locales/en.json (Requirement 2) so that no UI string falls back to English under ZH.
  2. The Verification System shall confirm that prompts rendered through locale-aware templates produce a Chinese variant when locale=zh (i.e. the templating mechanism is symmetric between EN and ZH).
  3. If a UI string is English-only under ZH (i.e. zh.json value is identical to the EN value and the value is non-trivial English prose), the Verification System shall flag it per Requirement 2.4 as review-needed.
  4. The Verification System shall record any ZH-specific regression as a separate gap and file a follow-up issue per Requirement 7.

Requirement 7: Follow-up issues for every discovered gap

Objective: As the project owner, I want every gap discovered in this verification pass tracked as its own GitHub issue, so that fixes are sequenced separately and #10 stays scoped to verification only.

Acceptance Criteria

  1. When a gap is recorded by Requirements 16, the Verification System shall file a GitHub issue against salestech-group/MiroFish containing: a one-sentence summary, the affected pipeline step, the file:line evidence, and a link back to issue #10 and to the verification report comment.
  2. If grouping is sensible (e.g. five gaps in a single locale-catalogue file), the Verification System shall consolidate them into a single follow-up issue with a checklist body, instead of filing five micro-issues.
  3. The Verification System shall not patch any gap inline in this ticket; the spec change-set must be limited to the verification artefacts (spec docs + report capture under .kiro/specs/i18n-e2e-english-verification/) and must not modify production source files under backend/app/, frontend/src/, or locales/.
  4. The Verification System shall label every follow-up issue with the i18n label (and bug if the gap is regressing existing behaviour) so they aggregate under the i18n epic.
  5. If the verification environment cannot file issues (e.g. no gh permissions), the Verification System shall list the would-be issues inline in the verification report as a fallback so a human can file them, and shall mark the corresponding checklist item gap-pending-issue instead of gap.

Requirement 8: Reproducibility and idempotence

Objective: As a future verifier, I want this verification pass to be re-runnable, so that we can re-baseline after each subsequent merge to the i18n epic.

Acceptance Criteria

  1. The Verification System shall capture the raw audit output to .kiro/specs/i18n-e2e-english-verification/audit/ so the next verifier can diff against the previous run.
  2. While a previous capture exists, the Verification System shall preserve it (timestamped or under a previous/ subdirectory) rather than overwriting it silently.
  3. The Verification System shall record the commit SHA at the time of the audit so the report comment can be tied to a specific tree state.
  4. If the audit is re-run and the gap set is unchanged, the Verification System shall produce a no-op report comment that confirms parity rather than spamming a new gap list.