Research & Design Decisions — i18n-report-agent-prompts
Summary
- Feature: `i18n-report-agent-prompts`
- Discovery Scope: Extension (single-file in-place translation, established sibling-spec precedent)
- Key Findings:
  - The full LLM-message-stream Chinese surface in `report_agent.py` is ~2680 chars across 7 top-level prompt constants, 4 tool-description blocks, ~10 inline f-strings/templates inside `_generate_section_react` and `chat`, the `_execute_tool` error returns, and the `plan_outline` defaults: roughly 4× the ticket's "~609 char" estimate.
  - The four sibling i18n PRs (#2/#3/#4, commits `0806832`, `9d1d29b`, `6c2a412`) established in-place translation in a single file as the pattern; reviewer expectations and PR shape are already locked in.
  - Two cross-cutting literals must be preserved byte-for-byte: the trigger string `Final Answer:` (matched by `_generate_section_react` line 1327) and the XML tag `<tool_call>` (matched by the `_parse_tool_calls` regex). All translated prompts continue to reference these literals exactly.
Research Log
Topic: How does locale switching work today and what guarantees does it give?
- Context: R9 of requirements depends on `get_language_instruction()` continuing to bias the model into the requested locale even after the base prompt is English.
- Sources Consulted: `backend/app/utils/locale.py`; `locales/languages.json`; `locales/en.json`; `locales/zh.json`; sibling-spec gap-analysis for issue #4 (`.kiro/specs/i18n-simulation-config-generator-prompts/gap-analysis.md`).
- Findings:
  - `get_language_instruction()` resolves locale from the Flask `Accept-Language` header (or thread-local in background threads) and returns a per-locale postfix string (`Please respond in English.` for `en`, `请使用中文回答。` for `zh`, etc.).
  - `languages.json` registers `zh`, `en`, `es`, `fr`, `pt`, `ru`, `de`. All non-`zh` postfixes are already in English.
  - Sibling spec #4 verified that an English-base prompt + `请使用中文回答。` postfix produces Chinese output of equivalent quality to the prior Chinese-base prompt for the simulation-config flow. The same mechanism applies here; there is no report-agent-specific locale path.
- Implications: The translation does not need to touch `locale.py` or `/locales/*`. Preserving the three `get_language_instruction()` call sites verbatim is sufficient for R9.
Topic: Which literal trigger strings does the ReACT loop parser match?
- Context: R2 acceptance criterion 6 and R4 acceptance criterion 4 require that translated prompts continue to use literal trigger strings the parser depends on.
- Sources Consulted: `backend/app/services/report_agent.py` lines 1067–1126 (`_parse_tool_calls`, `_is_valid_tool_call`); 1327 (`has_final_answer = "Final Answer:" in response`); 1838, 1874 (chat regex `<tool_call>.*?</tool_call>`).
- Findings:
  - `Final Answer:` is matched as a Python literal substring (case-sensitive). Translation must keep this English token byte-for-byte.
  - `<tool_call>` and `</tool_call>` are matched by `re.search(r'<tool_call>(.*?)</tool_call>', response, re.DOTALL)` (around line 1080, in `_parse_tool_calls`). Translation must keep these XML tags byte-for-byte.
  - `_is_valid_tool_call` accepts both `{"name": ..., "parameters": ...}` and `{"tool": ..., "params": ...}` shapes, normalizing to `name`/`parameters`. Translation does not affect this.
- Implications: Translated prompts continue to instruct the model using the same literal example block; only the surrounding natural-language Chinese is rewritten in English.
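
To make the dependency on these literals concrete, here is a minimal sketch of a parser with the behaviour described in the findings. It is an approximation for illustration, not a copy of `_parse_tool_calls` or `_is_valid_tool_call`: the function names, the use of `findall` to collect several calls, and the tool name `quick_search` in the usage note are assumptions.

```python
import json
import re

# Same pattern quoted in the findings; the tags are matched literally.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def normalize_tool_call(raw):
    """Accept either key shape and return {"name", "parameters"}, or None if invalid."""
    name = raw.get("name") or raw.get("tool")
    params = raw.get("parameters", raw.get("params", {}))
    if not isinstance(name, str) or not isinstance(params, dict):
        return None
    return {"name": name, "parameters": params}


def parse_response(response):
    """Return (tool_calls, has_final_answer) for one model reply."""
    calls = []
    for body in TOOL_CALL_RE.findall(response):
        try:
            raw = json.loads(body.strip())
        except json.JSONDecodeError:
            continue  # malformed JSON inside the tags is skipped
        if isinstance(raw, dict):
            call = normalize_tool_call(raw)
            if call is not None:
                calls.append(call)
    # Case-sensitive substring check, as in the findings.
    return calls, "Final Answer:" in response
```

A reply containing `<tool_call>{"tool": "quick_search", "params": {"query": "x"}}</tool_call>` would come back normalized to `name`/`parameters`, while a prompt that taught the model a translated or lower-cased variant of `Final Answer:` would never set the second return value.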
Topic: Are there Chinese illustrations embedded inside the section system prompt that must also translate?
- Context: `SECTION_SYSTEM_PROMPT_TEMPLATE` (615–767) contains code-fenced "正确示例" ("Correct Example") / "错误示例" ("Wrong Example") blocks with Chinese sample text. These are formatting-contract illustrations, not data.
- Sources Consulted: `report_agent.py` lines 678–703.
- Findings:
  - The "正确示例" block (lines 678–694) shows a sample paragraph using `**bold**`, `>` block quotes, and lists: Chinese text demonstrating the no-headings rule.
  - The "错误示例" block (lines 696–703) shows wrong patterns (`## 执行摘要`, `### 一、首发阶段`, etc.) with Chinese text labeled as errors.
  - These are illustrative only; the model uses them to internalize the format contract. Translating them to semantically equivalent English (sample paragraph about, e.g., a generic event using English bold/quotation/list patterns; wrong patterns showing English headings labeled as errors) preserves the contract.
- Implications: The section system prompt translation must rewrite both example blocks in English while keeping the structural rule (use `**bold**`, `>`, lists; do not use `#`, `##`, `###`, `####`).
Topic: Are there Chinese strings that flow through t(...) keys (vs raw literals)?
- Context: R12 carves out `logger.*` calls already routed via `t('...')` (issue #6). Need to confirm we are not double-counting strings.
- Sources Consulted: `report_agent.py`; `locales/en.json`; `locales/zh.json`.
- Findings:
  - 47 of 48 `logger.*` calls in `report_agent.py` already use `t('report.*')` or `t('progress.*')` keys; those are out of scope.
  - One raw Chinese f-string remains: `logger.debug(f"LLM响应: {response[:200]}...")` at line 1322. This is a logger call (not a prompt string sent to the LLM). It belongs to issue #6, and leaving it untouched is consistent with the ticket boundary "logger calls in this file are covered by #6".
  - `progress_callback(...)` calls receive `t('progress.*')` localized strings; those flow to the frontend, not to the LLM, and are out of scope.
- Implications: After translation, a single Chinese f-string in `report_agent.py` remains (line 1322 `logger.debug`). This is acceptable per the ticket's R12 carve-out.
Topic: How do consumers downstream of Report.to_dict() and chat() deal with localized output?
- Context: R10 / R11 — preserving the public surface so the report API and the chat endpoint continue to work unchanged.
- Sources Consulted: `backend/app/api/report.py`; `frontend/src/api/report.js`; `frontend/src/components/Step5*.vue`.
- Findings:
  - The report API blueprint hands `Report.to_dict()` and the chat response payload to the frontend without locale-specific post-processing. The frontend renders `report.title`, `report.summary`, and `report.sections[*].title`/`content` as plain text/Markdown.
  - There are no string-equality checks against Chinese substrings on the consumer side. Translating the fallback outline to English is safe.
- Implications: R8 (translate fallback outline) and R10 (preserve surface) are independently verifiable — no consumer-side adaptation required.
Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|---|---|---|---|---|
| In-place translation (A) | Edit Chinese string-literals in `report_agent.py` directly | Matches precedent of #2/#3/#4; minimal blast radius; no new files | Translations are baked in; non-zh/en locales still rely on postfix bias | Selected: same pattern, same scope |
| Externalize to `/locales/` (B) | Move all prompt content to `locales/en.json` / `locales/zh.json` and resolve via `t(...)` | Genuinely locale-agnostic; could later support es/fr/pt/ru/de natively | Touches `/locales/` (forbidden by R12); diverges from sibling pattern; brace-escape risk in JSON | Rejected: breaks R12 |
| Hybrid externalization (C) | Externalize top-level constants; keep inline f-strings in code | Captures largest blocks in localizable form | Two-tier inconsistency; same R12 violation; no precedent | Rejected: same R12 issue |
Design Decisions
Decision: In-place translation in a single file
- Context: Translate ~2680 Chinese characters across all LLM-facing string-literals in `backend/app/services/report_agent.py`.
- Alternatives Considered:
  - Externalize all prompts to `/locales/*.json` and resolve via `t(...)`.
  - Hybrid: externalize the seven top-level constants only.
- Selected Approach: In-place translation in `report_agent.py`. Edit each string-literal directly. No new files, no new abstractions.
- Rationale: Four sibling i18n PRs (issues #2/#3/#4) used the same pattern. Precedent is locked in; reviewer expectations are clear; PR shape is predictable. R12 explicitly forbids `/locales/` edits.
- Trade-offs: ✅ smallest blast radius, ✅ matches reviewer pattern. ❌ es/fr/pt/ru/de still rely on postfix bias (already true today; not a regression).
- Follow-up: Run `python -m py_compile backend/app/services/report_agent.py` post-edit; run a regex sweep verifying zero Chinese chars in any LLM-facing string-literal (the line-1322 `logger.debug` is exempt; it belongs to issue #6).
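
One possible shape for that sweep is sketched below. It walks the AST rather than the raw text, so comments are ignored and only string constants are reported; docstrings are string constants too and would be flagged. The helper name `chinese_literals` and the `allowed_lines` parameter are assumptions for the example. The character class is also widened beyond the ideograph block U+4E00–U+9FFF to include CJK punctuation (U+3000–U+303F) and fullwidth forms (U+FF00–U+FFEF), since characters such as `、` (U+3001) fall outside the ideograph block; treat that widening as a suggestion rather than part of the ticket.

```python
import ast
import re

# Ideographs plus CJK punctuation and fullwidth forms (the last two are an added assumption).
CJK = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")


def chinese_literals(source, allowed_lines=frozenset()):
    """Return sorted (lineno, text) pairs for string constants that contain CJK characters.

    ast.parse raises SyntaxError on a broken file, so this doubles as a parse check.
    Lines listed in allowed_lines (e.g. an exempt logger call) are skipped.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if CJK.search(node.value) and node.lineno not in allowed_lines:
                hits.append((node.lineno, node.value))
    return sorted(hits)
```

Run against the real file, the call would look something like `chinese_literals(open(path, encoding="utf-8").read(), allowed_lines={1322})`, with an empty result meaning the sweep passed. The line number 1322 comes from the notes above and would need re-checking if the edit shifts lines.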
Decision: Translate embedded Chinese examples in SECTION_SYSTEM_PROMPT_TEMPLATE
- Context: The section system prompt contains "正确示例" / "错误示例" code blocks (lines 678–703) with Chinese sample text. R2 AC1 demands zero Chinese in any string-literal content.
- Alternatives Considered:
  - Drop the example blocks entirely (shorter prompt, less guidance for the model).
  - Translate to semantically equivalent English illustrations.
  - Keep Chinese examples and append an English translation in parallel.
- Selected Approach: Translate to English. The "Correct Example" shows a sample paragraph about a generic scenario using `**bold**`, `>` block quotes, lists, and no headings. The "Wrong Example" shows wrong English headings (`## Executive Summary`, `### 1. First Stage`, etc.) labeled as errors.
- Rationale: The examples drive the model's understanding of the no-headings format contract. Removing them risks regressing format compliance. Parallel Chinese-then-English bloats the prompt and re-introduces Chinese tokens. English-only is the cleanest match for an English-base prompt.
- Trade-offs: ✅ preserves format contract, ✅ single-language base prompt. ❌ slight prompt-length growth (negligible vs total context).
- Follow-up: Spot-check a single end-to-end report run under `Accept-Language: en` to confirm the model still avoids Markdown headings in section bodies.
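
The spot-check could be partly automated with a small helper like the one below. This is a hypothetical aid, not something in the codebase: `find_headings` is an invented name, and the pattern follows the usual ATX-heading shape (up to three leading spaces, one to six `#`, then whitespace or end of line), which is slightly broader than the four levels the prompt forbids.

```python
import re

# ATX heading: optional indent of up to 3 spaces, 1-6 '#', then whitespace or end of line.
HEADING_RE = re.compile(r"^ {0,3}#{1,6}(?:[ \t]+|$)")


def find_headings(section_body):
    """Return the lines of a generated section body that look like Markdown headings."""
    return [line for line in section_body.splitlines() if HEADING_RE.match(line)]
```

An empty list for every section of the spot-check run would suggest the translated examples still carry the no-headings rule; any returned line points at the offending heading.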
Decision: Switch "、".join(unused_tools) to ", ".join(...)
- Context: Line 1454 currently does `unused_tools_str = "、".join(unused_tools)`, where `、` is the Chinese enumeration comma. This list flows into `REACT_OBSERVATION_TEMPLATE` and into the inline f-strings at lines 1380 and 1476.
- Alternatives Considered:
  - Keep `"、"` (Chinese punctuation).
  - Switch to `", "` (English-friendly).
  - Keep `"、"` for `zh`, `", "` for `en` (locale-conditional).
- Selected Approach: Switch to `", "` unconditionally.
- Rationale: The join result is interpolated into the now-English ReACT templates. Keeping the Chinese enumeration comma in English context reads as a typo. Locale-conditional behavior here would re-introduce Chinese tokens into the message stream when `zh` is the locale (acceptable but inconsistent with the rest of the message). The model already follows `get_language_instruction()` for output, so the join punctuation does not need to localize.
- Trade-offs: ✅ natural English rendering, ✅ single code path. ❌ a `zh`-locale developer reading the code might find the all-English separator slightly off; this is a minor stylistic concern only.
- Follow-up: None; this is a one-line change.
Decision: Standard English phrasing for recurring framing terms
- Context: The Chinese prompts use recurring framing tokens that need consistent English equivalents. Inconsistent translations (e.g. "scenario" in one place, "brief" in another) hurt the prompt's coherence.
- Selected Approach: Pick once, use everywhere:
  - 上帝视角 → "all-seeing observer's perspective" / "god's-eye view" (use the latter; shorter, more idiomatic)
  - 未来预演 → "forecast simulation" / "simulated future"
  - 模拟需求 → "simulation requirement" (matches the variable name `simulation_requirement`)
  - 上下文 → "context"
  - 章节 → "section"
  - 报告 → "report"
  - 大纲 → "outline"
  - 正确示例 → "Correct Example"
  - 错误示例 → "Wrong Example"
  - 重要 → "IMPORTANT"
  - 注意 → "Note"
  - 工具 → "tool"
  - 检索 → "retrieval"
  - 章节标题 → "section title"
  - 模拟世界 → "simulated world"
  - 引用 → "quote" / "quotation"
- Rationale: Internal consistency lets the model build a coherent mental model of the task vocabulary. Aligning with variable names (e.g. `simulation_requirement`) reduces translation surface ambiguity.
- Trade-offs: ✅ consistent vocabulary across translated regions. ❌ none.
- Follow-up: None — list serves as a glossary for the implementer.
Risks & Mitigations
- Risk: Translated section system prompt drops a structural cue the Chinese version was carrying, regressing `zh` quality. Mitigation: Preserve all interpolations, the JSON schema, the no-headings rule, the language-consistency rule, the format-contract examples (now in English), and the `get_language_instruction()` postfix. Spot-check a `zh` run if feasible.
- Risk: A Chinese substring slips through (e.g. inside a hard-to-spot ReACT message). Mitigation: Run a regex sweep `re.findall(r'[一-鿿]', source)` after the edit; the only allowed remaining match is the line-1322 `logger.debug` Chinese f-string.
- Risk: Reformatting `SECTION_SYSTEM_PROMPT_TEMPLATE` damages the literal `<tool_call>` example or shifts the `Final Answer:` token. Mitigation: Use targeted `Edit` replacements that preserve the surrounding code block; verify after edit that `"Final Answer:" in response` still triggers the parser branch.
- Risk: The `"、".join(...)` separator change leaks into a Chinese-language render path. Mitigation: The separator only flows into ReACT templates that are already monolingually English in this PR; no `zh`-specific render path consumes it.
References
- Issue #5 ticket body: `.ticket/5.md`.
- Sibling spec: `.kiro/specs/i18n-simulation-config-generator-prompts/{requirements,design,gap-analysis}.md`.
- Sibling commits: `0806832` (#2 ontology), `9d1d29b` (#3 oasis profile), `6c2a412` (#4 simulation config).
- Locale module: `backend/app/utils/locale.py`.
- Locale registry: `locales/languages.json`, `locales/en.json`, `locales/zh.json`.