# Design Document — i18n-report-agent-prompts
## Overview
**Purpose**: Translate every Chinese string-literal that flows into the LLM message stream of `backend/app/services/report_agent.py` into English so that, under `Accept-Language: en`, the Report-Agent produces English-flavoured analytical reports and chat replies rather than the Chinese-biased output that today's Chinese-based prompts produce despite the English postfix appended by `get_language_instruction()`.
**Users**: MiroFish operators running the 5-step pipeline under English locale; reviewers tracking the i18n epic (#11); developers maintaining sibling i18n issues (#6, #7, #8, #10) downstream of this change.
**Impact**: Behavioural — under `Accept-Language: en`, the report's section titles, section bodies, embedded quotations, and chat replies become English-flavoured. No public-API change. No `Report.to_dict()` shape change. No new dependencies.
### Goals
- Replace every Chinese string-literal in `report_agent.py` that is sent to the LLM (system prompt, user prompt, ReACT loop messages, tool descriptions, `_define_tools` parameter hints, `_execute_tool` error returns, `plan_outline` defaults) with English equivalents.
- Preserve every variable interpolation, every JSON schema key, every literal trigger string (`Final Answer:`, `<tool_call>`, tool-name strings), every `get_language_instruction()` call site.
- Keep the public surface of `ReportAgent`, `ReportManager`, `Report`, `ReportOutline`, `ReportSection`, `ReportStatus` byte-for-byte equivalent in shape.
### Non-Goals
- Logger calls (`logger.info`, `logger.warning`, `logger.error`, `logger.debug`) inside the same file — owned by issue #6. Notably, the single raw-Chinese `logger.debug(f"LLM响应: ...")` at line 1322 is left untouched.
- Module docstring (lines 1-11), class docstrings, dataclass docstrings, method docstrings, inline `#` comments (owned by issue #7).
- Refactoring prompt structure, the JSON output schema of `PLAN_SYSTEM_PROMPT`, the ReACT loop control flow, conflict-resolution branches, or the chat tool-budget caps.
- Externalizing prompts into `/locales/*.json`.
- Live end-to-end report generation under both `en` and `zh` (deferred to fixture-based static checks; reviewer trust on quality parity, matching the precedent of issues #2/#3/#4).
## Boundary Commitments
### This Spec Owns
- The string-literal **content** of all LLM-facing regions in `backend/app/services/report_agent.py`:
  - Tool description constants `TOOL_DESC_INSIGHT_FORGE` (476-492), `TOOL_DESC_PANORAMA_SEARCH` (494-509), `TOOL_DESC_QUICK_SEARCH` (511-521), `TOOL_DESC_INTERVIEW_AGENTS` (523-548).
  - PLAN-phase prompts `PLAN_SYSTEM_PROMPT` (552-589), `PLAN_USER_PROMPT_TEMPLATE` (591-611).
  - EXEC-phase prompts `SECTION_SYSTEM_PROMPT_TEMPLATE` (615-767), `SECTION_USER_PROMPT_TEMPLATE` (769-792), including the embedded "Correct Example" / "Wrong Example" code blocks.
  - ReACT loop conversation templates `REACT_OBSERVATION_TEMPLATE` (796-806), `REACT_INSUFFICIENT_TOOLS_MSG` (808-811), `REACT_INSUFFICIENT_TOOLS_MSG_ALT` (813-816), `REACT_TOOL_LIMIT_MSG` (818-821), `REACT_UNUSED_TOOLS_HINT` (823), `REACT_FORCE_FINAL_MSG` (825).
  - CHAT-phase prompts `CHAT_SYSTEM_PROMPT_TEMPLATE` (829-855), `CHAT_OBSERVATION_SUFFIX` (857).
  - The `_define_tools` parameter-description dict values (925-952) and the `_get_tools_description` leader `"可用工具:"` (1129).
  - The `_execute_tool` error returns at lines 1058 and 1062.
  - The inline LLM-visible strings inside `_generate_section_react`: `report_context` f-string (1294), empty-response retry (1316-1317), conflict-handling block (1342-1346), inline `unused_hint` literals (1380, 1476).
  - The inline LLM-visible strings inside `chat`: report-truncated marker (1799), no-report fallback (1805), observation joiner (1861).
  - The default / fallback outline content in `plan_outline`: success-path default title (1197), exception-path fallback `ReportOutline` (1212-1218).
- The `unused_tools_str` join separator at line 1454 — switch from `"、"` to `", "` for natural English rendering inside the now-English ReACT templates.
### Out of Boundary
- All `logger.*` calls in this file (issue #6), including the one raw-Chinese `logger.debug` at line 1322.
- All `"""..."""` docstrings and `#` comments in this file (issue #7).
- `backend/app/utils/locale.py`, `/locales/*.json`, `/locales/languages.json`.
- `backend/app/services/zep_tools.py`, `zep_entity_reader.py`, `zep_graph_memory_updater.py`.
- `backend/app/api/report.py`, `backend/app/api/simulation.py`, `backend/app/api/graph.py`.
- `backend/app/services/simulation_runner.py`, `simulation_ipc.py`, OASIS subprocess source.
- `backend/app/config.py` constants.
- `backend/pyproject.toml`, `backend/uv.lock`.
- All other files in the repository.
### Allowed Dependencies
- Read access to `get_language_instruction()` from `backend/app/utils/locale.py` — three call sites preserved verbatim (lines 1166, 1262, 1808).
- Read access to `t(...)` from `backend/app/utils/locale.py` — call sites preserved verbatim.
- No new external dependencies.
### Revalidation Triggers
- A change to the `Report.to_dict()` payload shape would force the report API blueprint and the frontend report panel to re-validate. **This spec does not change the shape.**
- A change to the `PLAN_SYSTEM_PROMPT` JSON output schema (`title`, `summary`, `sections[].title`, `sections[].description`) would force `plan_outline()`'s response parser to re-validate. **This spec preserves the schema verbatim.**
- A change to the `Final Answer:` literal trigger or the `<tool_call>...</tool_call>` XML tag would force `_generate_section_react`'s parser branches to re-validate. **This spec preserves both byte-for-byte.**
- A change to the four primary tool names (`insight_forge`, `panorama_search`, `quick_search`, `interview_agents`) or the legacy aliases (`search_graph`, `get_graph_statistics`, `get_entity_summary`, `get_simulation_context`, `get_entities_by_type`) would force `_execute_tool` and `_is_valid_tool_call` to re-validate. **This spec does not rename tools.**
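
To illustrate why these literals count as revalidation triggers, here is a minimal sketch of a parser that branches on them. The helper names `parse_tool_call` and `extract_final_answer`, the JSON payload shape inside the tag, and the `VALID_TOOLS` set are assumptions made for illustration; the real parsing code in `report_agent.py` may be structured differently.

```python
import json
import re

# Hypothetical stand-ins for illustration; not copied from report_agent.py.
FINAL_ANSWER_TRIGGER = "Final Answer:"
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
VALID_TOOLS = {"insight_forge", "panorama_search", "quick_search", "interview_agents"}


def parse_tool_call(response: str):
    """Return (name, parameters) for the first well-formed tool call, else None."""
    match = TOOL_CALL_RE.search(response)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    name = payload.get("name")
    if name not in VALID_TOOLS:
        # A renamed or translated tool name would silently fall through here.
        return None
    return name, payload.get("parameters", {})


def extract_final_answer(response: str):
    """Return the text after the trigger literal, or None when it is absent."""
    _, found, tail = response.partition(FINAL_ANSWER_TRIGGER)
    return tail.strip() if found else None
```

Under these assumptions, a prompt that teaches the model a translated trigger (for example a Chinese rendering of "Final Answer:") or a renamed tool would produce responses that both helpers reject, which is why the translated prompts must keep the literals unchanged.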
## Architecture
### Existing Architecture Analysis
`ReportAgent` is a single Python class in `backend/app/services/report_agent.py`. The three LLM invocation paths (PLAN, SECTION, CHAT) follow a uniform pattern:
```
system_prompt = <chinese system prompt template>
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"
user_prompt = <chinese user prompt template with {interpolations}>
response = self.llm.chat(messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
])
```
`_generate_section_react` extends this with a multi-turn ReACT loop where the user-role messages re-injected after each tool call (`REACT_OBSERVATION_TEMPLATE`, etc.) are also Chinese today. There is no abstraction layer between prompt construction and LLM invocation — the prompt text and the call site are colocated. This matches sister modules (`simulation_config_generator.py`, `oasis_profile_generator.py`, `ontology_generator.py`).
### Architecture Pattern & Boundary Map
**Selected pattern**: In-place string-literal translation. No new components, no new modules, no new abstractions.
```mermaid
flowchart TB
subgraph Caller["Caller — api/report.py"]
api["POST /api/report/generate<br/>POST /api/report/chat"]
end
subgraph ReportAgentMod["report_agent.py — IN SCOPE"]
plan["plan_outline<br/>**translate PLAN_*, defaults**"]
sec["_generate_section_react<br/>**translate SECTION_*, REACT_*, inline strings**"]
chat["chat<br/>**translate CHAT_*, inline strings**"]
tools["_define_tools / _get_tools_description<br/>**translate TOOL_DESC_*, params, leader**"]
exec["_execute_tool<br/>**translate error returns**"]
parse["_parse_tool_calls<br/>UNCHANGED (matches literals)"]
manager["ReportManager<br/>UNCHANGED (persistence)"]
end
subgraph Locale["utils/locale.py — UNCHANGED"]
gli[get_language_instruction]
tr[t]
end
subgraph ZepTools["services/zep_tools.py — UNCHANGED"]
zt[ZepTools dispatch]
end
api --> plan
api --> sec
api --> chat
plan --> gli
sec --> gli
chat --> gli
sec --> tools
chat --> tools
sec --> parse
sec --> exec
chat --> parse
chat --> exec
exec --> zt
plan --> manager
sec --> manager
```
**Architecture Integration**:
- Selected pattern: in-place string-literal translation; matches the precedent of issues #2/#3/#4.
- Domain/feature boundaries: prompt-content is the only boundary that moves. Logger / docstring / comment boundaries (issues #6, #7) and persistence-layer boundary (`ReportManager`) are explicitly preserved.
- Existing patterns preserved: `get_language_instruction()` postfix injection at three call sites; `<tool_call>` XML protocol; `Final Answer:` literal trigger; tool-name registry; JSON output schema for outline planning.
- New components rationale: none — no new components.
- Steering compliance: respects `tech.md` "preserve both styles working" for comments/docstrings (those are out of scope); respects `structure.md` per-project file isolation; respects `commits.md` Conventional Commits format for the eventual commit message.
### Technology Stack
| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Frontend / CLI | n/a | Frontend renders the translated `Report` payload as plain text/Markdown | No frontend change required |
| Backend / Services | Python 3.11, Flask 3.0 | Hosts `ReportAgent` and the report API | Single-file edit |
| Data / Storage | Neo4j + Graphiti | Source of retrieval results consumed by `zep_tools` | Unchanged |
| Messaging / Events | n/a | Report generation runs as a background `Task` | Unchanged |
| Infrastructure / Runtime | uv-managed venv | Backend dependency manager | No new dependencies |
> No new external dependencies, libraries, or infrastructure components are introduced. Detailed locale-resolution mechanics are documented in `research.md`.
## File Structure Plan
### Modified Files
- `backend/app/services/report_agent.py` — translate every Chinese string-literal that is sent to the LLM, plus the one separator literal at line 1454. No structural code changes; no new methods; no new constants. Line counts will shift due to the typically larger English character count, but the file's overall organization is unchanged.
### Unmodified Files (explicitly verified)
- `backend/app/utils/locale.py`
- `backend/app/services/zep_tools.py`, `zep_entity_reader.py`, `zep_graph_memory_updater.py`
- `backend/app/api/report.py`, `simulation.py`, `graph.py`
- `backend/app/services/simulation_runner.py`, `simulation_ipc.py`
- `backend/app/config.py`
- `backend/pyproject.toml`, `backend/uv.lock`
- `/locales/en.json`, `/locales/zh.json`, `/locales/languages.json`
- All frontend files
## System Flows
The PLAN / SECTION / CHAT flows are unchanged at the control-flow level — only the string content of system / user / observation messages is translated. No new diagram is required; `research.md` records the relevant parser-trigger details.
## Requirements Traceability
| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 | Translate `PLAN_SYSTEM_PROMPT` and `PLAN_USER_PROMPT_TEMPLATE`; preserve schema, count limits, interpolations, postfix call site | `PLAN_SYSTEM_PROMPT` (552), `PLAN_USER_PROMPT_TEMPLATE` (591), `plan_outline` (1137) | `plan_outline()` LLM `chat_json` invocation at line 1177 | PLAN flow |
| 2.1-2.9 | Translate `SECTION_SYSTEM_PROMPT_TEMPLATE` (incl. examples) and `SECTION_USER_PROMPT_TEMPLATE`; preserve `Final Answer:` / `<tool_call>` literals; preserve no-headings instruction | `SECTION_SYSTEM_PROMPT_TEMPLATE` (615), `SECTION_USER_PROMPT_TEMPLATE` (769), `_generate_section_react` (1221) | `_generate_section_react()` LLM `chat` invocation at line 1305 | SECTION ReACT flow |
| 3.1-3.7 | Translate `CHAT_SYSTEM_PROMPT_TEMPLATE` and `CHAT_OBSERVATION_SUFFIX`; preserve `<tool_call>` literal and prefix-injection contract | `CHAT_SYSTEM_PROMPT_TEMPLATE` (829), `CHAT_OBSERVATION_SUFFIX` (857), `chat` (1766) | `chat()` LLM `chat` invocations at lines 1828, 1868 | CHAT flow |
| 4.1-4.6 | Translate ReACT loop conversation templates; preserve `Final Answer:` literal; switch separator to `", "` | `REACT_*` constants (796-825) | `_generate_section_react()` ReACT loop branches | SECTION ReACT flow |
| 5.1-5.7 | Translate four `TOOL_DESC_*` blocks, `_define_tools` parameter dict values, `_get_tools_description` leader; preserve tool names | `TOOL_DESC_*` (476-548), `_define_tools` (919), `_get_tools_description` (1127) | `_define_tools()` and `_get_tools_description()` return values | SECTION + CHAT flows |
| 6.1-6.7 | Translate inline LLM-visible strings in `_generate_section_react` and `chat` | Inline strings at 1294, 1316-1317, 1342-1346, 1380, 1476, 1799, 1805, 1861 | Direct `messages.append(...)` calls | SECTION + CHAT flows |
| 7.1-7.3 | Translate `_execute_tool` error returns | f-strings at 1058, 1062 | `_execute_tool()` return value | SECTION + CHAT flows (error path) |
| 8.1-8.4 | Translate `plan_outline` defaults; preserve `ReportOutline` shape | `plan_outline` defaults at 1197, 1212-1218 | `plan_outline()` return value | PLAN flow (default + fallback paths) |
| 9.1-9.5 | Locale switching continues to work | `get_language_instruction()` call sites at 1166, 1262, 1808 | unchanged | All flows |
| 10.1-10.5 | Public API stable | `ReportAgent`, `ReportManager`, `Report`, `ReportOutline`, `ReportSection`, `ReportStatus` | unchanged | All flows |
| 11.1-11.5 | End-to-end Step 4 / Step 5 parity | Verification only | unchanged | All flows |
| 12.1-12.6 | Out-of-scope guardrail | None edited | unchanged | n/a |
## Components and Interfaces
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------------|--------|--------------|--------------------------|-----------|
| Tool description constants | Module-scope constants in `report_agent.py` | LLM-facing tool catalog injected into SECTION + CHAT system prompts via `_get_tools_description` | 5.1, 5.2, 5.7 | `_define_tools` (P0), `_get_tools_description` (P0) | State (string literals only) |
| `PLAN_*` prompts | Module-scope constants | Outline planning system + user prompts | 1.1, 1.2, 1.5, 1.6 | `get_language_instruction` (P0), `plan_outline` (P0) | State |
| `SECTION_*` prompts | Module-scope constants | Section ReACT system + user prompts | 2.1, 2.2, 2.3, 2.4, 2.6, 2.7 | `get_language_instruction` (P0), `_generate_section_react` (P0), `_get_tools_description` (P1) | State |
| `REACT_*` templates | Module-scope constants | ReACT loop user-role messages re-injected after tool calls | 4.1, 4.2, 4.3, 4.4, 4.5 | `_generate_section_react` (P0) | State |
| `CHAT_*` prompts | Module-scope constants | Chat system prompt + observation suffix | 3.1, 3.2, 3.3, 3.4, 3.5, 3.6 | `get_language_instruction` (P0), `chat` (P0), `_get_tools_description` (P1) | State |
| `_define_tools` parameter dict | `ReportAgent` instance method | Catalog of tools + parameter hints, exposed to LLM via `_get_tools_description` | 5.3, 5.4, 5.6 | `_get_tools_description` (P0) | Service |
| `_get_tools_description` | `ReportAgent` instance method | Renders `_define_tools` output as a single string for SECTION + CHAT prompts | 5.5 | `_define_tools` (P0) | Service |
| `_execute_tool` error returns | `ReportAgent` instance method | Returns observation strings to the LLM for unknown-tool / execution-error paths | 7.1, 7.2, 7.3 | `_execute_tool` (P0) | Service |
| `_generate_section_react` inline strings | `ReportAgent` instance method body | LLM-visible strings appended to `messages` during ReACT loop | 6.1, 6.2, 6.3, 6.4 | `_generate_section_react` (P0) | Service |
| `chat` inline strings | `ReportAgent` instance method body | LLM-visible strings appended to `messages` during chat loop | 6.5, 6.6 | `chat` (P0) | Service |
| `plan_outline` defaults | `ReportAgent` instance method body | Default / fallback `ReportOutline` content emitted on success-without-title or exception path | 8.1, 8.2, 8.3, 8.4 | `plan_outline` (P0) | State |
> All components are existing module-scope constants or method-internal expressions. None require a full detail block — the responsibility boundary is "translate the string content; preserve the structural shape". The summary table above plus the requirement-level acceptance criteria in `requirements.md` form a complete contract.
### Implementation Notes (cross-cutting)
- **Translation glossary** (consistent across all components — see `research.md` Decision: Standard English phrasing): 上帝视角 → "god's-eye view"; 未来预演 → "forecast simulation" / "simulated future"; 模拟需求 → "simulation requirement"; 模拟世界 → "simulated world"; 章节 → "section"; 大纲 → "outline"; 引用 → "quote"/"quotation"; 正确示例 → "Correct Example"; 错误示例 → "Wrong Example"; 注意 → "Note"; 重要 → "IMPORTANT"; 工具 → "tool"; 检索 → "retrieval".
- **Literal preservation**: `Final Answer:`, `<tool_call>`, `</tool_call>`, all tool names (`insight_forge`, `panorama_search`, `quick_search`, `interview_agents`, plus legacy aliases), all `{interpolation}` tokens, all JSON schema keys, all emoji / box-drawing characters (`💡`, `═`).
- **Locale-agnostic strings**: `_execute_tool` error returns and `plan_outline` default / fallback outline content are returned regardless of locale (no `get_language_instruction()` injection at those sites). They become locale-agnostic English under this PR.
- **Separator change**: `unused_tools_str = "、".join(unused_tools)` at line 1454 → `", ".join(unused_tools)`. This is the only non-string-literal code change.
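
The effect of the separator change can be seen in a small sketch. The tool list and the `HINT_TEMPLATE` text below are hypothetical placeholders, not the actual `REACT_UNUSED_TOOLS_HINT` content; only the two `join` expressions mirror the change described above.

```python
# Hypothetical tool list and template text, used only to show the rendering difference.
unused_tools = ["panorama_search", "interview_agents"]
HINT_TEMPLATE = "Unused tools you may still try: {unused_list}"

# Current behaviour: the ideographic comma reads oddly inside an English sentence.
before = HINT_TEMPLATE.format(unused_list="、".join(unused_tools))

# Proposed behaviour: comma plus space matches the English template.
after = HINT_TEMPLATE.format(unused_list=", ".join(unused_tools))

print(before)
print(after)
```

Because the separator is passed to `str.join` rather than embedded in a prompt constant, it is not caught by translating the constants alone, which is presumably why it is called out as a separate change.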
## Data Models
No data-model changes. `Report`, `ReportOutline`, `ReportSection`, `ReportStatus`, `Task`, and the report API JSON contract are all preserved verbatim. `Report.to_dict()` and `ReportOutline.to_dict()` shapes are unchanged. The persistence schema under `reports/<id>/` (`meta.json`, `outline.json`, `progress.json`, `section_NN.md`, `full_report.md`, `agent_log.jsonl`, `console_log.txt`) is unchanged.
## Error Handling
### Error Strategy
No new error types or recovery strategies. The translated `_execute_tool` error returns and `plan_outline` exception-path fallback continue to behave identically — the only change is the string content.
### Error Categories and Responses
- **Unknown-tool error**: `_execute_tool` returns a translated English string `"Unknown tool: {tool_name}. Please use one of: insight_forge, panorama_search, quick_search"`. The string is fed back to the LLM as the next user-role observation.
- **Tool-execution exception**: `_execute_tool` returns a translated English string `"Tool execution failed: {str(e)}"`. Same flow.
- **`plan_outline` LLM exception**: returns the translated English fallback `ReportOutline` (3 sections). Downstream report assembly proceeds normally.
- **Empty-response retry / conflict-handling / insufficient-tools**: translated English messages re-injected into the LLM message stream (R6, R4 acceptance criteria). Loop control flow unchanged.
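
The two `_execute_tool` error paths above could be sketched as follows. This is a simplified free function with an injected `registry` dict, both of which are assumptions for illustration; the real `_execute_tool` is an instance method and dispatches to `zep_tools` in its own way. Only the two returned strings are taken from the descriptions above.

```python
from typing import Any, Callable, Dict


def execute_tool(tool_name: str,
                 parameters: Dict[str, Any],
                 registry: Dict[str, Callable[..., str]]) -> str:
    """Dispatch a tool call; on failure return an English observation string."""
    handler = registry.get(tool_name)
    if handler is None:
        # Unknown-tool path: the string is fed back to the LLM as an observation.
        return (f"Unknown tool: {tool_name}. Please use one of: "
                "insight_forge, panorama_search, quick_search")
    try:
        return handler(**parameters)
    except Exception as e:
        # Tool-execution path: the exception text is exposed, as it is today.
        return f"Tool execution failed: {str(e)}"


def failing_search(**kwargs: Any) -> str:
    """Hypothetical handler that always fails, to exercise the error path."""
    raise RuntimeError("graph offline")


print(execute_tool("get_weather", {}, {}))
print(execute_tool("quick_search", {"query": "x"}, {"quick_search": failing_search}))
```

In both cases the function returns a string instead of raising, so the loop control flow is the same before and after translation; only the observation text the LLM sees changes.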
## Testing Strategy
### Default sections (adapted to translation work)
- **Static lint**: `python -m py_compile backend/app/services/report_agent.py` — must pass.
- **Zero-Chinese assertion** (in-scope regions): a verification harness (a small ad-hoc script under `scripts/` if needed, deleted before PR) imports `report_agent` and runs `re.findall(r'[一-鿿]', literal)` over each in-scope constant, expecting an empty list. The single permitted Chinese remnant is the `logger.debug` f-string at line 1322 (not in scope).
- **Interpolation-shape parity**: invoke `PLAN_USER_PROMPT_TEMPLATE.format(simulation_requirement="x", total_nodes=0, total_edges=0, entity_types=[], total_entities=0, related_facts_json="[]")`, `SECTION_SYSTEM_PROMPT_TEMPLATE.format(report_title="x", report_summary="y", simulation_requirement="z", section_title="t", tools_description="d")`, `SECTION_USER_PROMPT_TEMPLATE.format(previous_content="x", section_title="t")`, `CHAT_SYSTEM_PROMPT_TEMPLATE.format(simulation_requirement="x", report_content="r", tools_description="d")`, `REACT_OBSERVATION_TEMPLATE.format(tool_name="x", result="y", tool_calls_count=1, max_tool_calls=5, used_tools_str="a, b", unused_hint="z")`, etc. — each must render without raising `KeyError`.
- **Trigger-literal preservation**: assert that `"Final Answer:"` is a substring of the translated `SECTION_SYSTEM_PROMPT_TEMPLATE`, `SECTION_USER_PROMPT_TEMPLATE`, `REACT_OBSERVATION_TEMPLATE`, `REACT_TOOL_LIMIT_MSG`, and `REACT_FORCE_FINAL_MSG`; assert that `"<tool_call>"` is a substring of the translated `SECTION_SYSTEM_PROMPT_TEMPLATE` and `CHAT_SYSTEM_PROMPT_TEMPLATE`.
- **Tool-name preservation**: assert that all four primary tool names appear unchanged in the translated `_define_tools` keys and in the translated `TOOL_DESC_*` blocks.
- **End-to-end (deferred)**: per the precedent of issues #2/#3/#4, full pipeline runs under `Accept-Language: en` and `Accept-Language: zh` are not part of CI for this PR. Reviewer trust applies. If feasible in the implementer's local environment, the implementer may do a single sample run under `en` to confirm that no Markdown headings leak into section bodies, and a single sample run under `zh` to confirm that Chinese output quality is preserved. Both are optional confidence boosters, not gates.
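
The static checks in this list could be combined into a small harness along the following lines. It is a sketch under stated assumptions: `SAMPLE_TEMPLATE` and `EXPECTED_FIELDS` are hypothetical stand-ins for a translated constant and its interpolation set, and a real harness would import the constants from `report_agent` instead. The regex targets U+4E00 to U+9FFF, the CJK Unified Ideographs block that the character class in the zero-Chinese item appears to denote.

```python
import re
import string

# CJK Unified Ideographs block, written with escapes to avoid confusable characters.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def cjk_chars(text: str) -> list:
    """Return every CJK ideograph found in text (empty list means none)."""
    return CJK_RE.findall(text)


def template_fields(template: str) -> set:
    """Return the set of named {fields} that str.format would require."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


# Hypothetical stand-in for a translated ReACT template; not the real constant.
SAMPLE_TEMPLATE = (
    "Observation from {tool_name}:\n{result}\n"
    "Tool calls used: {tool_calls_count}/{max_tool_calls}. "
    'When ready, reply starting with "Final Answer:".'
)
EXPECTED_FIELDS = {"tool_name", "result", "tool_calls_count", "max_tool_calls"}

# Zero-Chinese assertion.
assert cjk_chars(SAMPLE_TEMPLATE) == []
# Interpolation-shape parity: same field names as before translation.
assert template_fields(SAMPLE_TEMPLATE) == EXPECTED_FIELDS
# Trigger-literal preservation.
assert "Final Answer:" in SAMPLE_TEMPLATE
# Rendering must not raise KeyError.
rendered = SAMPLE_TEMPLATE.format(
    tool_name="quick_search", result="ok", tool_calls_count=1, max_tool_calls=5
)
```

Comparing field-name sets catches a dropped or misspelled `{interpolation}` without needing realistic argument values, while the final `format` call mirrors the render check described in the interpolation-shape item.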
## Security Considerations
No new security surface. Translated prompts do not expose new endpoints, do not add new external calls, and do not change authorization semantics. The `_execute_tool` error returns continue to expose `str(e)` from any caught exception — pre-existing behavior, unchanged by this PR.
## Performance & Scalability
No performance regression expected. English prompts may be ~10-30% longer in token count than the equivalent Chinese (English requires more tokens for the same semantic content), but this is well within the 4096 `max_tokens` ceiling on the section LLM call and the model's overall context budget. No caching, no batching, no concurrency change.
## Migration Strategy
No data or schema migration. The change is a single in-place edit. Rollback strategy: revert the single commit on `feat/i18n-5-translate-report-agent-prompts` if a regression is detected.
## Supporting References
- Detailed discovery, alternatives evaluation, decision rationale, and risk register: `.kiro/specs/i18n-report-agent-prompts/research.md`.
- Sibling spec (i18n-simulation-config-generator-prompts): `.kiro/specs/i18n-simulation-config-generator-prompts/{requirements,design,gap-analysis,research}.md`.
- Sibling commits: `0806832` (#2), `9d1d29b` (#3), `6c2a412` (#4).
- Ticket snapshot: `.ticket/5.md`.