MicroFish/.kiro/specs/i18n-report-agent-prompts/gap-analysis.md

155 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Gap Analysis — i18n-report-agent-prompts
## 1. Current-State Investigation
### Domain assets
- **Target file**: `backend/app/services/report_agent.py` (2572 lines).
- **Tool description constants** (LLM-facing, injected via `_get_tools_description`):
- `TOOL_DESC_INSIGHT_FORGE` (lines 476492)
- `TOOL_DESC_PANORAMA_SEARCH` (lines 494509)
- `TOOL_DESC_QUICK_SEARCH` (lines 511521)
- `TOOL_DESC_INTERVIEW_AGENTS` (lines 523548)
- **PLAN-phase prompts** (`plan_outline`, line ~1137):
- `PLAN_SYSTEM_PROMPT` (lines 552589) — `system_prompt = f"{PLAN_SYSTEM_PROMPT}\n\n{get_language_instruction()}"` at line 1166.
- `PLAN_USER_PROMPT_TEMPLATE` (lines 591611).
- **EXEC-phase prompts** (`_generate_section_react`, line ~1221):
- `SECTION_SYSTEM_PROMPT_TEMPLATE` (lines 615767) — appended postfix at line 1262.
- `SECTION_USER_PROMPT_TEMPLATE` (lines 769792).
- **ReACT loop conversation templates** (consumed inside `_generate_section_react`):
- `REACT_OBSERVATION_TEMPLATE` (796806)
- `REACT_INSUFFICIENT_TOOLS_MSG` (808811)
- `REACT_INSUFFICIENT_TOOLS_MSG_ALT` (813816)
- `REACT_TOOL_LIMIT_MSG` (818821)
- `REACT_UNUSED_TOOLS_HINT` (823)
- `REACT_FORCE_FINAL_MSG` (825)
- **CHAT-phase prompts** (`chat`, line ~1766):
- `CHAT_SYSTEM_PROMPT_TEMPLATE` (829855) — appended postfix at line 1808.
- `CHAT_OBSERVATION_SUFFIX` (857).
- **Inline LLM-visible Chinese strings** (sent into `messages`):
- `_define_tools` parameter-description dict values (925952).
- `_get_tools_description` leader `"可用工具:"` (1129).
- `_execute_tool` error returns `f"未知工具: {tool_name}..."` (1058) and `f"工具执行失败: {str(e)}"` (1062).
- `_generate_section_react`: `report_context = f"章节标题: ...\n模拟需求: ..."` (1294); empty-response messages `"(响应为空)"` / `"请继续生成内容。"` (13161317); conflict-handling block (13421346); inline `unused_hint` literals at 1380 and 1476.
- `chat`: report-truncated marker `"\n\n... [报告内容已截断] ..."` (1799); no-report fallback `"(暂无报告)"` (1805); observation joiner `f"[{r['tool']}结果]\n{r['result']}"` (1861).
- **Default / fallback outline content** in `plan_outline()`:
- Success-path default title `"模拟分析报告"` (1197).
- Exception-path fallback `ReportOutline` title `"未来预测报告"`, summary `"基于模拟预测的未来趋势与风险分析"`, three section titles (12121218).
- **Locale resolution**: `backend/app/utils/locale.py` `get_locale`/`get_language_instruction`/`t` resolves locale from `Accept-Language` header (or thread-local in background threads). `languages.json` registers `zh`, `en`, `es`, `fr`, `pt`, `ru`, `de`.
### Counts (verified, in-scope only)
| Region | Approx Chinese chars |
| --- | --- |
| `TOOL_DESC_INSIGHT_FORGE` | 110 |
| `TOOL_DESC_PANORAMA_SEARCH` | 95 |
| `TOOL_DESC_QUICK_SEARCH` | 50 |
| `TOOL_DESC_INTERVIEW_AGENTS` | 215 |
| `PLAN_SYSTEM_PROMPT` | 250 |
| `PLAN_USER_PROMPT_TEMPLATE` | 130 |
| `SECTION_SYSTEM_PROMPT_TEMPLATE` | 950 |
| `SECTION_USER_PROMPT_TEMPLATE` | 150 |
| `REACT_*` templates | 130 |
| `CHAT_SYSTEM_PROMPT_TEMPLATE` + `CHAT_OBSERVATION_SUFFIX` | 130 |
| `_define_tools` parameter dict values | 110 |
| `_execute_tool` error returns | 30 |
| `_generate_section_react` inline messages | 230 |
| `chat` inline messages | 60 |
| `plan_outline` defaults | 50 |
| **In-scope total** | **~2680** |
The ticket's "~609 Chinese characters" undercounts — it apparently only counted the three system-prompt blocks. The full LLM-message-stream Chinese surface is ~4× that. Logger calls (~17), docstrings, and module/class/method/inline `#` comments are out of scope (covered by #6 / #7).
### Conventions (extracted)
- Sister specs `i18n-ontology-generator-prompts` (commit `0806832`, issue #2), `i18n-oasis-profile-generator-prompts` (commit `9d1d29b`, issue #3), and `i18n-simulation-config-generator-prompts` (commit `6c2a412`, issue #4) established the pattern: **in-place translation of all LLM-facing string literals in a single file; preserve `get_language_instruction()` call sites; preserve all interpolations; do not touch logger, docstrings, comments, or other files.**
- 4-space indent, snake_case, double quotes for strings, `f"""..."""` for multi-line prompts. Existing Chinese-then-English mix is acceptable in comments/docstrings (steering tech.md: "preserve both; do not translate one into the other unless asked").
- No linter/formatter — match surrounding style.
- File mixes top-level constant prompts (e.g. `PLAN_SYSTEM_PROMPT`) with inline f-strings and `.format()` templates inside method bodies. Translation must respect both placement conventions.
### Integration surfaces
- `ReportAgent.generate_report(...)` is called from the report API blueprint (`api/report.py`). The returned `Report.to_dict()` payload is consumed by the frontend report panel; field shapes and types must remain unchanged.
- `ReportAgent.chat(...)` is called from the chat endpoint; the returned `{"response", "tool_calls", "sources"}` shape is consumed by the frontend chat UI.
- The four primary tools (`insight_forge`, `panorama_search`, `quick_search`, `interview_agents`) are dispatched in `_execute_tool` to `self.zep_tools.*` — those callees are unchanged.
- `_parse_tool_calls` matches the literal `<tool_call>...</tool_call>` XML tag and a fallback bare-JSON form via regex. Translation must preserve those literals byte-for-byte.
- `chat()` strips `<tool_call>` blocks from the user-visible response via `re.sub(r'<tool_call>.*?</tool_call>', '', ...)` (lines 1838, 1874). Translation does not affect this.
- `_clean_section_content` and `_post_process_report` post-process generated section content under the assumption that the LLM does not emit Markdown headings (`#`, `##`, `###`, etc.) inside section bodies. The translated `SECTION_SYSTEM_PROMPT_TEMPLATE` must continue to forbid headings.
- Locale-switching contract: when locale = `zh`, `get_language_instruction()` returns `请使用中文回答。`; when `en`, `Please respond in English.` — verified.
## 2. Requirement-to-Asset Map
| Requirement | Existing asset | Gap | Tag |
| --- | --- | --- | --- |
| R1 — PLAN prompts EN | `PLAN_SYSTEM_PROMPT` (552), `PLAN_USER_PROMPT_TEMPLATE` (591) | Translate text; preserve JSON schema (`title`, `summary`, `sections[]` w/ `title`, `description`); preserve 25 section count; preserve all interpolations | Missing (translation) |
| R2 — EXEC prompts EN | `SECTION_SYSTEM_PROMPT_TEMPLATE` (615), `SECTION_USER_PROMPT_TEMPLATE` (769) | Translate text; preserve `Final Answer:` / `<tool_call>` literals; preserve no-headings instruction; preserve language-consistency rule; preserve interpolation tokens | Missing (translation) |
| R3 — CHAT prompts EN | `CHAT_SYSTEM_PROMPT_TEMPLATE` (829), `CHAT_OBSERVATION_SUFFIX` (857) | Translate text; preserve `<tool_call>` literal; preserve `MAX_TOOL_CALLS_PER_CHAT` semantics | Missing (translation) |
| R4 — ReACT loop templates EN | `REACT_OBSERVATION_TEMPLATE` and 5 message constants (796825) | Translate text; preserve `Final Answer:` literal; preserve emoji/box-drawing visuals; reconcile `"、".join(...)` separator | Missing (translation) |
| R5 — Tool-description constants EN | 4 `TOOL_DESC_*` blocks (476548); `_define_tools` parameter dict (925952); `_get_tools_description` leader (1129) | Translate text; preserve tool-name literals; preserve parameter dict keys; preserve OASIS-running warning | Missing (translation) |
| R6 — Inline LLM-visible strings EN | 7 inline strings across `_generate_section_react` and `chat` (1294, 13161317, 13421346, 1380, 1476, 1799, 1805, 1861) | Translate text; preserve `{section.title}`, `{self.simulation_requirement}`, `{r['tool']}`, `{r['result']}`, `{', '.join(unused_tools)}` interpolations | Missing (translation) |
| R7 — `_execute_tool` error returns EN | 2 f-strings (1058, 1062) | Translate text; preserve `{tool_name}` and `{str(e)}` interpolations; remain locale-agnostic | Missing (translation) |
| R8 — `plan_outline` defaults EN | 1 success-path default (1197), 5 exception-path strings (12121218) | Translate text; remain locale-agnostic; preserve `ReportOutline` shape (3 sections) | Missing (translation) |
| R9 — Locale switching preserved | `get_language_instruction()` calls at 1166, 1262, 1808 | None — keep call sites untouched | Constraint |
| R10 — Public API stable | Class/method/dataclass surface | None — text-only changes | Constraint |
| R11 — End-to-end parity | API blueprint, frontend report panel, OASIS interview API | Verification only — `Report.to_dict()` shape unchanged | Constraint |
| R12 — Out-of-scope guardrails | logger calls (~17 in this file), docstrings, comments, all other files | None — leave untouched | Constraint |
### Unknown / Research-needed
- **R11 verification feasibility**: Running an end-to-end report generation flow under `Accept-Language: en` and `Accept-Language: zh` requires Neo4j, an LLM key, a populated graph, and a running OASIS simulation (for `interview_agents`). In a sandboxed CI-like environment, this is not practical. Defer to a lightweight fixture-based check, matching the precedent set by issues #2/#3/#4: (a) `python -m py_compile` lint pass on `report_agent.py`; (b) zero-Chinese assertion on the in-scope string set via a script that imports the module and inspects each constant + a regex sweep over the module source; (c) shape parity by constructing a mock `ReportAgent` and confirming `_get_tools_description()`, `system_prompt`, and `user_prompt` render to the expected interpolation set without raising. **Decision (autonomous run)**: adopt option (c) — reviewer-trust is the precedent for issues #2/#3/#4 and the scope here is identical (single-file translation).
- **`logger.debug(f"LLM响应: {response[:200]}...")` at line 1322**: This is the one raw-Chinese logger call in this file (all others use `t('...')`). It is OUT OF SCOPE for issue #5 — it falls under issue #6 (logger translation). Note for the reviewer: this leaves one Chinese f-string in `report_agent.py` after this PR; the acceptance criterion in the ticket explicitly carves out logger lines.
- **`SECTION_SYSTEM_PROMPT_TEMPLATE` includes a "正确示例" / "错误示例" code block (lines 678703) with embedded Chinese sample text** (`微博`, `抖音`, `校方` etc.). These are example illustrations of the formatting contract, not data. Translating them to English is required (R2 acceptance criterion 1: "zero Chinese characters"). The translated examples should still illustrate the same format rule (use `**bold**` not `##`, use `>` for block quotes, no headings).
## 3. Implementation Approach Options
### Option A — In-place translation (recommended)
**What**: Edit every Chinese string-literal in `backend/app/services/report_agent.py` directly, in place. No new files.
**Trade-offs**:
- ✅ Matches the precedent set by commits `0806832` (issue #2), `9d1d29b` (issue #3), and `6c2a412` (issue #4) — same file scope, same approach. Reviewer pattern recognition is the lowest possible.
- ✅ Smallest possible diff at the file system level (1 file).
- ✅ No new abstractions, no new files, no dependency churn.
- ❌ Translations are baked in — switching to `es`/`fr`/`pt`/`ru`/`de` still relies on the `get_language_instruction()` postfix to bias the model. (This is also true under the current Chinese-base baseline; not a regression.)
- ❌ The diff is non-trivial (~2680 chars to retranslate plus structural rewriting of the section system prompt). Reviewer must read the prompts side-by-side; line counts shift.
### Option B — Externalize prompts to `/locales/`
**What**: Move all prompt content to `locales/en.json` / `locales/zh.json` and look them up via `t('prompts.report.plan.system')` etc.
**Trade-offs**:
- ✅ Genuinely locale-agnostic prompts; could deliver native-quality Spanish, French, etc. with future translation work.
- ✅ Separates content from code, easing future prompt edits without code review.
- ❌ Diverges from the established pattern of issues #2/#3/#4 — those translated in place. Adopting a new pattern for the same kind of work re-opens architectural design questions and inflates this PR's blast radius.
- ❌ Touches `backend/app/utils/locale.py` (or its caller surface) and `/locales/`, which the spec's R12 and the ticket's "Out of scope" boundary explicitly forbid.
- ❌ Increases JSON-escape-hell risk for the section system prompt's literal `{{` and `}}` braces and triple-quote contents.
### Option C — Hybrid (top-level constants stay externalized; inline strings stay in code)
**What**: Externalize only the seven top-level prompt constants (`PLAN_*`, `SECTION_*`, `CHAT_*`, `TOOL_DESC_*`) to `/locales/`; translate inline f-strings in code in place.
**Trade-offs**:
- ✅ Captures the largest blocks (highest character count) in a localizable way.
- ❌ Two-tier inconsistency: some prompt content in `/locales/`, some in code. Future maintainers must trace both.
- ❌ Same R12 violation as Option B (touches `/locales/`).
- ❌ No precedent in the four sibling i18n efforts already in flight.
## 4. Implementation Complexity & Risk
- **Effort**: **M** (35 days for one focused engineer). Larger than the 247-char ticket estimate suggested, but smaller than a typical M because the work is mechanical translation with strict guardrails. Most of the work is high-quality English rewriting of the section system prompt (~950 Chinese chars, the largest block in the file), getting reviewer-acceptable phrasing for the "上帝视角" / "未来预演" framing, and verifying that the no-headings instruction stays semantically equivalent.
- **Risk**: **Low**. Familiar tech, established sibling-spec precedent, clear guardrails (R9R12), single file, no new dependencies, no API changes. The only non-trivial risk is a regression in `zh` quality if a translated prompt drops a structural cue the Chinese version was carrying — mitigated by preserving every interpolation, the JSON schema, the format-contract instructions, and the `get_language_instruction()` postfix.
## 5. Recommendations for Design Phase
- **Preferred approach**: **Option A — in-place translation in `backend/app/services/report_agent.py`.** Rationale: matches the four sibling i18n PRs, smallest blast radius, respects R12.
- **Key decisions to lock in design**:
1. Translation of the Chinese **examples inside** `SECTION_SYSTEM_PROMPT_TEMPLATE` (lines 678703): replace with semantically equivalent English illustrations of the same formatting contract (use `**bold**`, `>` block quotes, no headings).
2. Treatment of the `"、".join(unused_tools)` separator at line 1454 → switch to `", ".join(...)` for natural English rendering, since the join result is interpolated into now-English `REACT_OBSERVATION_TEMPLATE` and `REACT_UNUSED_TOOLS_HINT`.
3. Standard English phrasing for the recurring framing terms: `上帝视角` → "all-seeing observer", `未来预演` → "future rehearsal" (or "forecast simulation"), `模拟需求` → "simulation prompt" / "scenario brief", `上下文` → "context". Pick once, use everywhere.
4. Handling of the `_get_tools_description` leader (1129): English equivalent `"Available tools:"` (verified by precedent in `_build_context` translation in #4).
5. Treatment of the conflict-handling message (lines 13421346): keep the same two-mode contract, but rephrase in English while preserving the literal `<tool_call>` tag and `'Final Answer:'` mentions.
- **Research items to carry forward**:
1. Confirm that the `Final Answer:` literal is matched case-sensitively in `_generate_section_react` (it is — line 1327: `"Final Answer:" in response`). Translation must keep it byte-for-byte.
2. Confirm that no tooling outside this file consumes the Chinese fallback outline strings as keys (e.g. translation tables, frontend lookups). Quick grep confirms none do — the strings flow into `Report.title` / `ReportOutline.title` only.
3. Verify after translation that `python -m py_compile backend/app/services/report_agent.py` passes and that the file's net Chinese-character count drops to the 17 logger lines + docstrings + comments scope (i.e. zero Chinese in any string literal that is sent into an LLM messages array).