MicroFish/.kiro/specs/i18n-report-agent-prompts/research.md

146 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Research & Design Decisions — i18n-report-agent-prompts
## Summary
- **Feature**: `i18n-report-agent-prompts`
- **Discovery Scope**: Extension (single-file in-place translation, established sibling-spec precedent)
- **Key Findings**:
- The full LLM-message-stream Chinese surface in `report_agent.py` is ~2680 chars across 7 top-level prompt constants, 4 tool-description blocks, ~10 inline f-strings/templates inside `_generate_section_react` and `chat`, the `_execute_tool` error returns, and the `plan_outline` defaults — ~4× the ticket's "~609 char" estimate.
- The four sibling i18n PRs (#2/#3/#4 commits `0806832`, `9d1d29b`, `6c2a412`) established in-place translation in a single file as the pattern; reviewer expectations and PR shape are already locked in.
- Two cross-cutting literals must be preserved byte-for-byte: the trigger string `Final Answer:` (matched by `_generate_section_react` line 1327) and the XML tag `<tool_call>` (matched by `_parse_tool_calls` regex). All translated prompts continue to reference these literals exactly.
## Research Log
### Topic: How does locale switching work today and what guarantees does it give?
- **Context**: R9 of requirements depends on `get_language_instruction()` continuing to bias the model into the requested locale even after the base prompt is English.
- **Sources Consulted**: `backend/app/utils/locale.py`; `locales/languages.json`; `locales/en.json`; `locales/zh.json`; sibling-spec gap-analysis for issue #4 (`.kiro/specs/i18n-simulation-config-generator-prompts/gap-analysis.md`).
- **Findings**:
- `get_language_instruction()` resolves locale from the Flask `Accept-Language` header (or thread-local in background threads) and returns a per-locale postfix string (`Please respond in English.` for `en`, `请使用中文回答。` for `zh`, etc.).
- `languages.json` registers `zh`, `en`, `es`, `fr`, `pt`, `ru`, `de`. All non-`zh` postfixes are already in English.
- Sibling spec #4 verified that an English-base prompt + `请使用中文回答。` postfix produces Chinese output of equivalent quality to the prior Chinese-base prompt for the simulation-config flow. The same mechanism applies here — there is no report-agent-specific locale path.
- **Implications**: The translation does not need to touch `locale.py` or `/locales/*`. Preserving the three `get_language_instruction()` call sites verbatim is sufficient for R9.
### Topic: Which literal trigger strings does the ReACT loop parser match?
- **Context**: R2 acceptance criterion 6 and R4 acceptance criterion 4 require that translated prompts continue to use literal trigger strings the parser depends on.
- **Sources Consulted**: `backend/app/services/report_agent.py` lines 10671126 (`_parse_tool_calls`, `_is_valid_tool_call`); 1327 (`has_final_answer = "Final Answer:" in response`); 1838, 1874 (chat regex `<tool_call>.*?</tool_call>`).
- **Findings**:
- `Final Answer:` is matched as a Python literal substring (case-sensitive). Translation must keep this English token byte-for-byte.
- `<tool_call>` and `</tool_call>` are matched by `re.search(r'<tool_call>(.*?)</tool_call>', response, re.DOTALL)` (line 1080-ish, in `_parse_tool_calls`). Translation must keep these XML tags byte-for-byte.
- `_is_valid_tool_call` accepts both `{"name": ..., "parameters": ...}` and `{"tool": ..., "params": ...}` shapes, normalizing to `name`/`parameters`. Translation does not affect this.
- **Implications**: Translated prompts continue to instruct the model using the same literal example block; only the surrounding natural-language Chinese is rewritten in English.
### Topic: Are there Chinese illustrations embedded inside the section system prompt that must also translate?
- **Context**: `SECTION_SYSTEM_PROMPT_TEMPLATE` (615767) contains code-fenced "正确示例" / "错误示例" blocks with Chinese sample text. These are formatting-contract illustrations, not data.
- **Sources Consulted**: `report_agent.py` lines 678703.
- **Findings**:
- The "正确示例" block (lines 678694) shows a sample paragraph using `**bold**`, `>` block quotes, and lists — Chinese text demonstrating the no-headings rule.
- The "错误示例" block (lines 696703) shows wrong patterns (`## 执行摘要`, `### 一、首发阶段`, etc.) with Chinese text labeled as errors.
- These are illustrative only — the model uses them to internalize the format contract. Translating them to semantically equivalent English (sample paragraph about, e.g., a generic event using English bold/quotation/list patterns; wrong patterns showing English headings labeled as errors) preserves the contract.
- **Implications**: The section system prompt translation must rewrite both example blocks in English while keeping the structural rule (use `**bold**`, `>`, lists; do not use `#`, `##`, `###`, `####`).
### Topic: Are there Chinese strings that flow through `t(...)` keys (vs raw literals)?
- **Context**: R12 carves out `logger.*` calls already routed via `t('...')` (issue #6). Need to confirm we are not double-counting strings.
- **Sources Consulted**: `report_agent.py`; `locales/en.json`; `locales/zh.json`.
- **Findings**:
- 47 of 48 `logger.*` calls in `report_agent.py` already use `t('report.*')` or `t('progress.*')` keys — those are out of scope.
- One raw Chinese f-string remains: `logger.debug(f"LLM响应: {response[:200]}...")` at line 1322. This is a logger call (not a prompt string sent to the LLM). It belongs to issue #6, and leaving it untouched is consistent with the ticket boundary "logger calls in this file are covered by #6".
- `progress_callback(...)` calls receive `t('progress.*')` localized strings — those flow to the frontend, not to the LLM, and are out of scope.
- **Implications**: After translation, a single Chinese f-string in `report_agent.py` remains (line 1322 logger.debug). This is acceptable per the ticket's R12 carve-out.
### Topic: How do consumers downstream of `Report.to_dict()` and `chat()` deal with localized output?
- **Context**: R10 / R11 — preserving the public surface so the report API and the chat endpoint continue to work unchanged.
- **Sources Consulted**: `backend/app/api/report.py`; `frontend/src/api/report.js`; `frontend/src/components/Step5*.vue`.
- **Findings**:
- The report API blueprint hands `Report.to_dict()` and the chat response payload to the frontend without locale-specific post-processing. The frontend renders `report.title`, `report.summary`, and `report.sections[*].title/content` as plain text/Markdown.
- There are no string-equality checks against Chinese substrings on the consumer side. Translating the fallback outline to English is safe.
- **Implications**: R8 (translate fallback outline) and R10 (preserve surface) are independently verifiable — no consumer-side adaptation required.
## Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| In-place translation (A) | Edit Chinese string-literals in `report_agent.py` directly | Matches precedent of #2/#3/#4; minimal blast radius; no new files | Translations are baked in; non-`zh`/`en` locales still rely on postfix bias | **Selected** — same pattern, same scope |
| Externalize to `/locales/` (B) | Move all prompt content to `locales/en.json` / `locales/zh.json` and resolve via `t(...)` | Genuinely locale-agnostic; could later support es/fr/pt/ru/de natively | Touches `/locales/` (forbidden by R12); diverges from sibling pattern; brace-escape risk in JSON | Rejected — breaks R12 |
| Hybrid externalization (C) | Externalize top-level constants; keep inline f-strings in code | Captures largest blocks in localizable form | Two-tier inconsistency; same R12 violation; no precedent | Rejected — same R12 issue |
## Design Decisions
### Decision: In-place translation in a single file
- **Context**: Translate ~2680 Chinese characters across all LLM-facing string-literals in `backend/app/services/report_agent.py`.
- **Alternatives Considered**:
1. Externalize all prompts to `/locales/*.json` and resolve via `t(...)`.
2. Hybrid: externalize the seven top-level constants only.
- **Selected Approach**: In-place translation in `report_agent.py`. Edit each string-literal directly. No new files, no new abstractions.
- **Rationale**: Four sibling i18n PRs (issues #2/#3/#4) used the same pattern. Precedent is locked in; reviewer expectations are clear; PR shape is predictable. R12 explicitly forbids `/locales/` edits.
- **Trade-offs**: ✅ smallest blast radius, ✅ matches reviewer pattern. ❌ es/fr/pt/ru/de still rely on postfix bias (already true today; not a regression).
- **Follow-up**: Run `python -m py_compile backend/app/services/report_agent.py` post-edit; run a regex sweep verifying zero Chinese chars in any LLM-facing string-literal (the line-1322 `logger.debug` is exempt — issue #6).
### Decision: Translate embedded Chinese examples in `SECTION_SYSTEM_PROMPT_TEMPLATE`
- **Context**: The section system prompt contains "正确示例" / "错误示例" code blocks (lines 678703) with Chinese sample text. R2 AC1 demands zero Chinese in any string-literal content.
- **Alternatives Considered**:
1. Drop the example blocks entirely (shorter prompt, less guidance for the model).
2. Translate to semantically equivalent English illustrations.
3. Keep Chinese examples and append an English translation in parallel.
- **Selected Approach**: Translate to English. The "Correct Example" shows a sample paragraph about a generic scenario using `**bold**`, `>` block quotes, lists, and no headings. The "Wrong Example" shows wrong English headings (`## Executive Summary`, `### 1. First Stage`, etc.) labeled as errors.
- **Rationale**: The examples drive the model's understanding of the no-headings format contract. Removing them risks regressing format compliance. Parallel Chinese-then-English bloats the prompt and re-introduces Chinese tokens. English-only is the cleanest match for an English-base prompt.
- **Trade-offs**: ✅ preserves format contract, ✅ single-language base prompt. ❌ slight prompt-length growth (negligible vs total context).
- **Follow-up**: Spot-check a single end-to-end report run under `Accept-Language: en` to confirm the model still avoids Markdown headings in section bodies.
### Decision: Switch `"、".join(unused_tools)` to `", ".join(...)`
- **Context**: Line 1454 currently does `unused_tools_str = "、".join(unused_tools)`, where `、` is the Chinese enumeration comma. This list flows into `REACT_OBSERVATION_TEMPLATE` and into the inline f-strings at lines 1380 and 1476.
- **Alternatives Considered**:
1. Keep `"、"` (Chinese punctuation).
2. Switch to `", "` (English-friendly).
3. Keep `"、"` for `zh`, `", "` for `en` (locale-conditional).
- **Selected Approach**: Switch to `", "` unconditionally.
- **Rationale**: The join result is interpolated into the now-English ReACT templates. Keeping the Chinese enumeration comma in English context reads as a typo. Locale-conditional behavior here would re-introduce Chinese tokens into the message stream when `zh` is the locale (acceptable but inconsistent with the rest of the message). The model already follows `get_language_instruction()` for output, so the join punctuation does not need to localize.
- **Trade-offs**: ✅ natural English rendering, ✅ single code path. ❌ a `zh`-locale developer reading the code might find the all-English separator slightly off — minor stylistic concern only.
- **Follow-up**: None — this is a one-line change.
### Decision: Standard English phrasing for recurring framing terms
- **Context**: The Chinese prompts use recurring framing tokens that need consistent English equivalents. Inconsistent translations (e.g. "scenario" in one place, "brief" in another) hurt the prompt's coherence.
- **Selected Approach**: Pick once, use everywhere:
- 上帝视角 → "all-seeing observer's perspective" / "god's-eye view" (use the latter; shorter, more idiomatic)
- 未来预演 → "forecast simulation" / "simulated future"
- 模拟需求 → "simulation requirement" (matches the variable name `simulation_requirement`)
- 上下文 → "context"
- 章节 → "section"
- 报告 → "report"
- 大纲 → "outline"
- 正确示例 → "Correct Example"
- 错误示例 → "Wrong Example"
- 重要 → "IMPORTANT"
- 注意 → "Note"
- 工具 → "tool"
- 检索 → "retrieval"
- 章节标题 → "section title"
- 模拟世界 → "simulated world"
- 引用 → "quote" / "quotation"
- **Rationale**: Internal consistency lets the model build a coherent mental model of the task vocabulary. Aligning with variable names (e.g. `simulation_requirement`) reduces translation surface ambiguity.
- **Trade-offs**: ✅ consistent vocabulary across translated regions. ❌ none.
- **Follow-up**: None — list serves as a glossary for the implementer.
## Risks & Mitigations
- **Risk**: Translated section system prompt drops a structural cue the Chinese version was carrying, regressing `zh` quality. **Mitigation**: Preserve all interpolations, the JSON schema, the no-headings rule, the language-consistency rule, the format-contract examples (now in English), and the `get_language_instruction()` postfix. Spot-check a `zh` run if feasible.
- **Risk**: A Chinese substring slips through (e.g. inside a hard-to-spot ReACT message). **Mitigation**: Run a regex sweep `re.findall(r'[一-鿿]', source)` after the edit; the only allowed remaining match is the line-1322 `logger.debug` Chinese f-string.
- **Risk**: Reformatting `SECTION_SYSTEM_PROMPT_TEMPLATE` damages the literal `<tool_call>` example or shifts the `Final Answer:` token. **Mitigation**: Use targeted `Edit` replacements that preserve the surrounding code block; verify after edit that `"Final Answer:" in response` still triggers the parser branch.
- **Risk**: The `"、".join(...)` separator change leaks into a Chinese-language render path. **Mitigation**: The separator only flows into ReACT templates that are already monolingually English in this PR; no `zh`-specific render path consumes it.
## References
- Issue #5 ticket body: `.ticket/5.md`.
- Sibling spec: `.kiro/specs/i18n-simulation-config-generator-prompts/{requirements,design,gap-analysis}.md`.
- Sibling commits: `0806832` (#2 ontology), `9d1d29b` (#3 oasis profile), `6c2a412` (#4 simulation config).
- Locale module: `backend/app/utils/locale.py`.
- Locale registry: `locales/languages.json`, `locales/en.json`, `locales/zh.json`.