MicroFish/.kiro/specs/i18n-report-agent-prompts/research.md

14 KiB
Raw Blame History

Research & Design Decisions — i18n-report-agent-prompts

Summary

  • Feature: i18n-report-agent-prompts
  • Discovery Scope: Extension (single-file in-place translation, established sibling-spec precedent)
  • Key Findings:
    • The full LLM-message-stream Chinese surface in report_agent.py is ~2680 chars across 7 top-level prompt constants, 4 tool-description blocks, ~10 inline f-strings/templates inside _generate_section_react and chat, the _execute_tool error returns, and the plan_outline defaults — ~4× the ticket's "~609 char" estimate.
    • The four sibling i18n PRs (#2/#3/#4 commits 0806832, 9d1d29b, 6c2a412) established in-place translation in a single file as the pattern; reviewer expectations and PR shape are already locked in.
    • Two cross-cutting literals must be preserved byte-for-byte: the trigger string Final Answer: (matched by _generate_section_react line 1327) and the XML tag <tool_call> (matched by _parse_tool_calls regex). All translated prompts continue to reference these literals exactly.

Research Log

Topic: How does locale switching work today and what guarantees does it give?

  • Context: R9 of requirements depends on get_language_instruction() continuing to bias the model into the requested locale even after the base prompt is English.
  • Sources Consulted: backend/app/utils/locale.py; locales/languages.json; locales/en.json; locales/zh.json; sibling-spec gap-analysis for issue #4 (.kiro/specs/i18n-simulation-config-generator-prompts/gap-analysis.md).
  • Findings:
    • get_language_instruction() resolves locale from the Flask Accept-Language header (or thread-local in background threads) and returns a per-locale postfix string (Please respond in English. for en, 请使用中文回答。 for zh, etc.).
    • languages.json registers zh, en, es, fr, pt, ru, de. All non-zh postfixes are already in English.
    • Sibling spec #4 verified that an English-base prompt + 请使用中文回答。 postfix produces Chinese output of equivalent quality to the prior Chinese-base prompt for the simulation-config flow. The same mechanism applies here — there is no report-agent-specific locale path.
  • Implications: The translation does not need to touch locale.py or /locales/*. Preserving the three get_language_instruction() call sites verbatim is sufficient for R9.

Topic: Which literal trigger strings does the ReACT loop parser match?

  • Context: R2 acceptance criterion 6 and R4 acceptance criterion 4 require that translated prompts continue to use literal trigger strings the parser depends on.
  • Sources Consulted: backend/app/services/report_agent.py lines 10671126 (_parse_tool_calls, _is_valid_tool_call); 1327 (has_final_answer = "Final Answer:" in response); 1838, 1874 (chat regex <tool_call>.*?</tool_call>).
  • Findings:
    • Final Answer: is matched as a Python literal substring (case-sensitive). Translation must keep this English token byte-for-byte.
    • <tool_call> and </tool_call> are matched by re.search(r'<tool_call>(.*?)</tool_call>', response, re.DOTALL) (line 1080-ish, in _parse_tool_calls). Translation must keep these XML tags byte-for-byte.
    • _is_valid_tool_call accepts both {"name": ..., "parameters": ...} and {"tool": ..., "params": ...} shapes, normalizing to name/parameters. Translation does not affect this.
  • Implications: Translated prompts continue to instruct the model using the same literal example block; only the surrounding natural-language Chinese is rewritten in English.

Topic: Are there Chinese illustrations embedded inside the section system prompt that must also translate?

  • Context: SECTION_SYSTEM_PROMPT_TEMPLATE (615767) contains code-fenced "正确示例" / "错误示例" blocks with Chinese sample text. These are formatting-contract illustrations, not data.
  • Sources Consulted: report_agent.py lines 678703.
  • Findings:
    • The "正确示例" block (lines 678694) shows a sample paragraph using **bold**, > block quotes, and lists — Chinese text demonstrating the no-headings rule.
    • The "错误示例" block (lines 696703) shows wrong patterns (## 执行摘要, ### 一、首发阶段, etc.) with Chinese text labeled as errors.
    • These are illustrative only — the model uses them to internalize the format contract. Translating them to semantically equivalent English (sample paragraph about, e.g., a generic event using English bold/quotation/list patterns; wrong patterns showing English headings labeled as errors) preserves the contract.
  • Implications: The section system prompt translation must rewrite both example blocks in English while keeping the structural rule (use **bold**, >, lists; do not use #, ##, ###, ####).

Topic: Are there Chinese strings that flow through t(...) keys (vs raw literals)?

  • Context: R12 carves out logger.* calls already routed via t('...') (issue #6). Need to confirm we are not double-counting strings.
  • Sources Consulted: report_agent.py; locales/en.json; locales/zh.json.
  • Findings:
    • 47 of 48 logger.* calls in report_agent.py already use t('report.*') or t('progress.*') keys — those are out of scope.
    • One raw Chinese f-string remains: logger.debug(f"LLM响应: {response[:200]}...") at line 1322. This is a logger call (not a prompt string sent to the LLM). It belongs to issue #6, and leaving it untouched is consistent with the ticket boundary "logger calls in this file are covered by #6".
    • progress_callback(...) calls receive t('progress.*') localized strings — those flow to the frontend, not to the LLM, and are out of scope.
  • Implications: After translation, a single Chinese f-string in report_agent.py remains (line 1322 logger.debug). This is acceptable per the ticket's R12 carve-out.

Topic: How do consumers downstream of Report.to_dict() and chat() deal with localized output?

  • Context: R10 / R11 — preserving the public surface so the report API and the chat endpoint continue to work unchanged.
  • Sources Consulted: backend/app/api/report.py; frontend/src/api/report.js; frontend/src/components/Step5*.vue.
  • Findings:
    • The report API blueprint hands Report.to_dict() and the chat response payload to the frontend without locale-specific post-processing. The frontend renders report.title, report.summary, and report.sections[*].title/content as plain text/Markdown.
    • There are no string-equality checks against Chinese substrings on the consumer side. Translating the fallback outline to English is safe.
  • Implications: R8 (translate fallback outline) and R10 (preserve surface) are independently verifiable — no consumer-side adaptation required.

Architecture Pattern Evaluation

Option Description Strengths Risks / Limitations Notes
In-place translation (A) Edit Chinese string-literals in report_agent.py directly Matches precedent of #2/#3/#4; minimal blast radius; no new files Translations are baked in; non-zh/en locales still rely on postfix bias Selected — same pattern, same scope
Externalize to /locales/ (B) Move all prompt content to locales/en.json / locales/zh.json and resolve via t(...) Genuinely locale-agnostic; could later support es/fr/pt/ru/de natively Touches /locales/ (forbidden by R12); diverges from sibling pattern; brace-escape risk in JSON Rejected — breaks R12
Hybrid externalization (C) Externalize top-level constants; keep inline f-strings in code Captures largest blocks in localizable form Two-tier inconsistency; same R12 violation; no precedent Rejected — same R12 issue

Design Decisions

Decision: In-place translation in a single file

  • Context: Translate ~2680 Chinese characters across all LLM-facing string-literals in backend/app/services/report_agent.py.
  • Alternatives Considered:
    1. Externalize all prompts to /locales/*.json and resolve via t(...).
    2. Hybrid: externalize the seven top-level constants only.
  • Selected Approach: In-place translation in report_agent.py. Edit each string-literal directly. No new files, no new abstractions.
  • Rationale: Four sibling i18n PRs (issues #2/#3/#4) used the same pattern. Precedent is locked in; reviewer expectations are clear; PR shape is predictable. R12 explicitly forbids /locales/ edits.
  • Trade-offs: smallest blast radius, matches reviewer pattern. es/fr/pt/ru/de still rely on postfix bias (already true today; not a regression).
  • Follow-up: Run python -m py_compile backend/app/services/report_agent.py post-edit; run a regex sweep verifying zero Chinese chars in any LLM-facing string-literal (the line-1322 logger.debug is exempt — issue #6).

Decision: Translate embedded Chinese examples in SECTION_SYSTEM_PROMPT_TEMPLATE

  • Context: The section system prompt contains "正确示例" / "错误示例" code blocks (lines 678703) with Chinese sample text. R2 AC1 demands zero Chinese in any string-literal content.
  • Alternatives Considered:
    1. Drop the example blocks entirely (shorter prompt, less guidance for the model).
    2. Translate to semantically equivalent English illustrations.
    3. Keep Chinese examples and append an English translation in parallel.
  • Selected Approach: Translate to English. The "Correct Example" shows a sample paragraph about a generic scenario using **bold**, > block quotes, lists, and no headings. The "Wrong Example" shows wrong English headings (## Executive Summary, ### 1. First Stage, etc.) labeled as errors.
  • Rationale: The examples drive the model's understanding of the no-headings format contract. Removing them risks regressing format compliance. Parallel Chinese-then-English bloats the prompt and re-introduces Chinese tokens. English-only is the cleanest match for an English-base prompt.
  • Trade-offs: preserves format contract, single-language base prompt. slight prompt-length growth (negligible vs total context).
  • Follow-up: Spot-check a single end-to-end report run under Accept-Language: en to confirm the model still avoids Markdown headings in section bodies.

Decision: Switch "、".join(unused_tools) to ", ".join(...)

  • Context: Line 1454 currently does unused_tools_str = "、".join(unused_tools), where is the Chinese enumeration comma. This list flows into REACT_OBSERVATION_TEMPLATE and into the inline f-strings at lines 1380 and 1476.
  • Alternatives Considered:
    1. Keep "、" (Chinese punctuation).
    2. Switch to ", " (English-friendly).
    3. Keep "、" for zh, ", " for en (locale-conditional).
  • Selected Approach: Switch to ", " unconditionally.
  • Rationale: The join result is interpolated into the now-English ReACT templates. Keeping the Chinese enumeration comma in English context reads as a typo. Locale-conditional behavior here would re-introduce Chinese tokens into the message stream when zh is the locale (acceptable but inconsistent with the rest of the message). The model already follows get_language_instruction() for output, so the join punctuation does not need to localize.
  • Trade-offs: natural English rendering, single code path. a zh-locale developer reading the code might find the all-English separator slightly off — minor stylistic concern only.
  • Follow-up: None — this is a one-line change.

Decision: Standard English phrasing for recurring framing terms

  • Context: The Chinese prompts use recurring framing tokens that need consistent English equivalents. Inconsistent translations (e.g. "scenario" in one place, "brief" in another) hurt the prompt's coherence.
  • Selected Approach: Pick once, use everywhere:
    • 上帝视角 → "all-seeing observer's perspective" / "god's-eye view" (use the latter; shorter, more idiomatic)
    • 未来预演 → "forecast simulation" / "simulated future"
    • 模拟需求 → "simulation requirement" (matches the variable name simulation_requirement)
    • 上下文 → "context"
    • 章节 → "section"
    • 报告 → "report"
    • 大纲 → "outline"
    • 正确示例 → "Correct Example"
    • 错误示例 → "Wrong Example"
    • 重要 → "IMPORTANT"
    • 注意 → "Note"
    • 工具 → "tool"
    • 检索 → "retrieval"
    • 章节标题 → "section title"
    • 模拟世界 → "simulated world"
    • 引用 → "quote" / "quotation"
  • Rationale: Internal consistency lets the model build a coherent mental model of the task vocabulary. Aligning with variable names (e.g. simulation_requirement) reduces translation surface ambiguity.
  • Trade-offs: consistent vocabulary across translated regions. none.
  • Follow-up: None — list serves as a glossary for the implementer.

Risks & Mitigations

  • Risk: Translated section system prompt drops a structural cue the Chinese version was carrying, regressing zh quality. Mitigation: Preserve all interpolations, the JSON schema, the no-headings rule, the language-consistency rule, the format-contract examples (now in English), and the get_language_instruction() postfix. Spot-check a zh run if feasible.
  • Risk: A Chinese substring slips through (e.g. inside a hard-to-spot ReACT message). Mitigation: Run a regex sweep re.findall(r'[一-鿿]', source) after the edit; the only allowed remaining match is the line-1322 logger.debug Chinese f-string.
  • Risk: Reformatting SECTION_SYSTEM_PROMPT_TEMPLATE damages the literal <tool_call> example or shifts the Final Answer: token. Mitigation: Use targeted Edit replacements that preserve the surrounding code block; verify after edit that "Final Answer:" in response still triggers the parser branch.
  • Risk: The "、".join(...) separator change leaks into a Chinese-language render path. Mitigation: The separator only flows into ReACT templates that are already monolingually English in this PR; no zh-specific render path consumes it.

References

  • Issue #5 ticket body: .ticket/5.md.
  • Sibling spec: .kiro/specs/i18n-simulation-config-generator-prompts/{requirements,design,gap-analysis}.md.
  • Sibling commits: 0806832 (#2 ontology), 9d1d29b (#3 oasis profile), 6c2a412 (#4 simulation config).
  • Locale module: backend/app/utils/locale.py.
  • Locale registry: locales/languages.json, locales/en.json, locales/zh.json.