23 KiB

Raw Blame History

Design Document — i18n-report-agent-prompts

Overview

Purpose: Translate every Chinese string-literal that flows into the LLM message stream of backend/app/services/report_agent.py into English so that, under Accept-Language: en, the Report-Agent produces English-flavoured analytical reports and chat replies — and not the Chinese-biased output that today's Chinese-base prompts produce despite the get_language_instruction() English postfix.

Users: MiroFish operators running the 5-step pipeline under English locale; reviewers tracking the i18n epic (#11); developers maintaining sibling i18n issues (#6, #7, #8, #10) downstream of this change.

Impact: Behavioural — under Accept-Language: en, the report's section titles, section bodies, embedded quotations, and chat replies become English-flavoured. No public-API change. No Report.to_dict() shape change. No new dependencies.

Goals

Replace every Chinese string-literal in report_agent.py that is sent to the LLM (system prompt, user prompt, ReACT loop messages, tool descriptions, _define_tools parameter hints, _execute_tool error returns, plan_outline defaults) with English equivalents.
Preserve every variable interpolation, every JSON schema key, every literal trigger string (Final Answer:, <tool_call>, tool-name strings), every get_language_instruction() call site.
Keep the public surface of ReportAgent, ReportManager, Report, ReportOutline, ReportSection, ReportStatus byte-for-byte equivalent in shape.

Non-Goals

Logger calls (logger.info, logger.warning, logger.error, logger.debug) inside the same file — owned by issue #6. Notably, the single raw-Chinese logger.debug(f"LLM响应: ...") at line 1322 is left untouched.
Module docstring (lines 1–11), class docstrings, dataclass docstrings, method docstrings, inline # comments — owned by issue #7.
Refactoring prompt structure, the JSON output schema of PLAN_SYSTEM_PROMPT, the ReACT loop control flow, conflict-resolution branches, or the chat tool-budget caps.
Externalizing prompts into /locales/*.json.
Live end-to-end report generation under both en and zh (deferred to fixture-based static checks; reviewer trust on quality parity, matching the precedent of issues #2/#3/#4).

Boundary Commitments

This Spec Owns

The string-literal content of all LLM-facing regions in backend/app/services/report_agent.py:
- Tool description constants TOOL_DESC_INSIGHT_FORGE (476–492), TOOL_DESC_PANORAMA_SEARCH (494–509), TOOL_DESC_QUICK_SEARCH (511–521), TOOL_DESC_INTERVIEW_AGENTS (523–548).
- PLAN-phase prompts PLAN_SYSTEM_PROMPT (552–589), PLAN_USER_PROMPT_TEMPLATE (591–611).
- EXEC-phase prompts SECTION_SYSTEM_PROMPT_TEMPLATE (615–767), SECTION_USER_PROMPT_TEMPLATE (769–792), including the embedded "Correct Example" / "Wrong Example" code blocks.
- ReACT loop conversation templates REACT_OBSERVATION_TEMPLATE (796–806), REACT_INSUFFICIENT_TOOLS_MSG (808–811), REACT_INSUFFICIENT_TOOLS_MSG_ALT (813–816), REACT_TOOL_LIMIT_MSG (818–821), REACT_UNUSED_TOOLS_HINT (823), REACT_FORCE_FINAL_MSG (825).
- CHAT-phase prompts CHAT_SYSTEM_PROMPT_TEMPLATE (829–855), CHAT_OBSERVATION_SUFFIX (857).
- The _define_tools parameter-description dict values (925–952) and the _get_tools_description leader "可用工具：" (1129).
- The _execute_tool error returns at lines 1058 and 1062.
- The inline LLM-visible strings inside _generate_section_react: report_context f-string (1294), empty-response retry (1316–1317), conflict-handling block (1342–1346), inline unused_hint literals (1380, 1476).
- The inline LLM-visible strings inside chat: report-truncated marker (1799), no-report fallback (1805), observation joiner (1861).
- The default / fallback outline content in plan_outline: success-path default title (1197), exception-path fallback ReportOutline (1212–1218).
The unused_tools_str join separator at line 1454 — switch from "、" to ", " for natural English rendering inside the now-English ReACT templates.

Out of Boundary

All logger.* calls in this file (issue #6), including the one raw-Chinese logger.debug at line 1322.
All """...""" docstrings and # comments in this file (issue #7).
backend/app/utils/locale.py, /locales/*.json, /locales/languages.json.
backend/app/services/zep_tools.py, zep_entity_reader.py, zep_graph_memory_updater.py.
backend/app/api/report.py, backend/app/api/simulation.py, backend/app/api/graph.py.
backend/app/services/simulation_runner.py, simulation_ipc.py, OASIS subprocess source.
backend/app/config.py constants.
backend/pyproject.toml, backend/uv.lock.
All other files in the repository.

Allowed Dependencies

Read access to get_language_instruction() from backend/app/utils/locale.py — three call sites preserved verbatim (lines 1166, 1262, 1808).
Read access to t(...) from backend/app/utils/locale.py — call sites preserved verbatim.
No new external dependencies.

Revalidation Triggers

A change to the Report.to_dict() payload shape would force the report API blueprint and the frontend report panel to re-validate. This spec does not change the shape.
A change to the PLAN_SYSTEM_PROMPT JSON output schema (title, summary, sections[].title, sections[].description) would force plan_outline()'s response parser to re-validate. This spec preserves the schema verbatim.
A change to the Final Answer: literal trigger or the <tool_call>...</tool_call> XML tag would force _generate_section_react's parser branches to re-validate. This spec preserves both byte-for-byte.
A change to the four primary tool names (insight_forge, panorama_search, quick_search, interview_agents) or the legacy aliases (search_graph, get_graph_statistics, get_entity_summary, get_simulation_context, get_entities_by_type) would force _execute_tool and _is_valid_tool_call to re-validate. This spec does not rename tools.

Architecture

Existing Architecture Analysis

ReportAgent is a single Python class in backend/app/services/report_agent.py. The three LLM invocation paths (PLAN, SECTION, CHAT) follow a uniform pattern:

system_prompt = <chinese system prompt template>
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"
user_prompt = <chinese user prompt template with {interpolations}>
response = self.llm.chat(messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
])

_generate_section_react extends this with a multi-turn ReACT loop where the user-role messages re-injected after each tool call (REACT_OBSERVATION_TEMPLATE, etc.) are also Chinese today. There is no abstraction layer between prompt construction and LLM invocation — the prompt text and the call site are colocated. This matches sister modules (simulation_config_generator.py, oasis_profile_generator.py, ontology_generator.py).

Architecture Pattern & Boundary Map

Selected pattern: In-place string-literal translation. No new components, no new modules, no new abstractions.

flowchart TB
    subgraph Caller["Caller — api/report.py"]
        api["POST /api/report/generate<br/>POST /api/report/chat"]
    end

    subgraph ReportAgentMod["report_agent.py — IN SCOPE"]
        plan["plan_outline<br/>**translate PLAN_*, defaults**"]
        sec["_generate_section_react<br/>**translate SECTION_*, REACT_*, inline strings**"]
        chat["chat<br/>**translate CHAT_*, inline strings**"]
        tools["_define_tools / _get_tools_description<br/>**translate TOOL_DESC_*, params, leader**"]
        exec["_execute_tool<br/>**translate error returns**"]
        parse["_parse_tool_calls<br/>UNCHANGED (matches literals)"]
        manager["ReportManager<br/>UNCHANGED (persistence)"]
    end

    subgraph Locale["utils/locale.py — UNCHANGED"]
        gli[get_language_instruction]
        tr[t]
    end

    subgraph ZepTools["services/zep_tools.py — UNCHANGED"]
        zt[ZepTools dispatch]
    end

    api --> plan
    api --> sec
    api --> chat
    plan --> gli
    sec --> gli
    chat --> gli
    sec --> tools
    chat --> tools
    sec --> parse
    sec --> exec
    chat --> parse
    chat --> exec
    exec --> zt
    plan --> manager
    sec --> manager

Architecture Integration:

Selected pattern: in-place string-literal translation; matches the precedent of issues #2/#3/#4.
Domain/feature boundaries: prompt-content is the only boundary that moves. Logger / docstring / comment boundaries (issues #6, #7) and persistence-layer boundary (ReportManager) are explicitly preserved.
Existing patterns preserved: get_language_instruction() postfix injection at three call sites; <tool_call> XML protocol; Final Answer: literal trigger; tool-name registry; JSON output schema for outline planning.
New components rationale: none — no new components.
Steering compliance: respects tech.md "preserve both styles working" for comments/docstrings (those are out of scope); respects structure.md per-project file isolation; respects commits.md Conventional Commits format for the eventual commit message.

Technology Stack

Layer	Choice / Version	Role in Feature	Notes
Frontend / CLI	n/a	Frontend renders the translated `Report` payload as plain text/Markdown	No frontend change required
Backend / Services	Python 3.11, Flask 3.0	Hosts `ReportAgent` and the report API	Single-file edit
Data / Storage	Neo4j + Graphiti	Source of retrieval results consumed by `zep_tools`	Unchanged
Messaging / Events	n/a	Report generation runs as a background `Task`	Unchanged
Infrastructure / Runtime	uv-managed venv	Backend dependency manager	No new dependencies

No new external dependencies, libraries, or infrastructure components are introduced. Detailed locale-resolution mechanics are documented in research.md.

File Structure Plan

Modified Files

backend/app/services/report_agent.py — translate every Chinese string-literal that is sent to the LLM, plus the one separator literal at line 1454. No structural code changes; no new methods; no new constants. Line counts will shift due to the typically larger English character count, but the file's overall organization is unchanged.

Unmodified Files (explicitly verified)

backend/app/utils/locale.py
backend/app/services/zep_tools.py, zep_entity_reader.py, zep_graph_memory_updater.py
backend/app/api/report.py, simulation.py, graph.py
backend/app/services/simulation_runner.py, simulation_ipc.py
backend/app/config.py
backend/pyproject.toml, backend/uv.lock
/locales/en.json, /locales/zh.json, /locales/languages.json
All frontend files

System Flows

The PLAN / SECTION / CHAT flows are unchanged at the control-flow level — only the string content of system / user / observation messages is translated. No new diagram is required; research.md records the relevant parser-trigger details.

Requirements Traceability

Requirement	Summary	Components	Interfaces	Flows
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7	Translate `PLAN_SYSTEM_PROMPT` and `PLAN_USER_PROMPT_TEMPLATE`; preserve schema, count limits, interpolations, postfix call site	`PLAN_SYSTEM_PROMPT` (552), `PLAN_USER_PROMPT_TEMPLATE` (591), `plan_outline` (1137)	`plan_outline()` LLM `chat_json` invocation at line 1177	PLAN flow
2.1–2.9	Translate `SECTION_SYSTEM_PROMPT_TEMPLATE` (incl. examples) and `SECTION_USER_PROMPT_TEMPLATE`; preserve `Final Answer:` / `<tool_call>` literals; preserve no-headings instruction	`SECTION_SYSTEM_PROMPT_TEMPLATE` (615), `SECTION_USER_PROMPT_TEMPLATE` (769), `_generate_section_react` (1221)	`_generate_section_react()` LLM `chat` invocation at line 1305	SECTION ReACT flow
3.1–3.7	Translate `CHAT_SYSTEM_PROMPT_TEMPLATE` and `CHAT_OBSERVATION_SUFFIX`; preserve `<tool_call>` literal and prefix-injection contract	`CHAT_SYSTEM_PROMPT_TEMPLATE` (829), `CHAT_OBSERVATION_SUFFIX` (857), `chat` (1766)	`chat()` LLM `chat` invocations at lines 1828, 1868	CHAT flow
4.1–4.6	Translate ReACT loop conversation templates; preserve `Final Answer:` literal; switch separator to `", "`	`REACT_*` constants (796–825)	`_generate_section_react()` ReACT loop branches	SECTION ReACT flow
5.1–5.7	Translate four `TOOL_DESC_*` blocks, `_define_tools` parameter dict values, `_get_tools_description` leader; preserve tool names	`TOOL_DESC_*` (476–548), `_define_tools` (919), `_get_tools_description` (1127)	`_define_tools()` and `_get_tools_description()` return values	SECTION + CHAT flows
6.1–6.7	Translate inline LLM-visible strings in `_generate_section_react` and `chat`	Inline strings at 1294, 1316–1317, 1342–1346, 1380, 1476, 1799, 1805, 1861	Direct `messages.append(...)` calls	SECTION + CHAT flows
7.1–7.3	Translate `_execute_tool` error returns	f-strings at 1058, 1062	`_execute_tool()` return value	SECTION + CHAT flows (error path)
8.1–8.4	Translate `plan_outline` defaults; preserve `ReportOutline` shape	`plan_outline` defaults at 1197, 1212–1218	`plan_outline()` return value	PLAN flow (default + fallback paths)
9.1–9.5	Locale switching continues to work	`get_language_instruction()` call sites at 1166, 1262, 1808	unchanged	All flows
10.1–10.5	Public API stable	`ReportAgent`, `ReportManager`, `Report`, `ReportOutline`, `ReportSection`, `ReportStatus`	unchanged	All flows
11.1–11.5	End-to-end Step 4 / Step 5 parity	Verification only	unchanged	All flows
12.1–12.6	Out-of-scope guardrail	None edited	unchanged	n/a

Components and Interfaces

Component	Domain/Layer	Intent	Req Coverage	Key Dependencies (P0/P1)	Contracts
Tool description constants	Module-scope constants in `report_agent.py`	LLM-facing tool catalog injected into SECTION + CHAT system prompts via `_get_tools_description`	5.1, 5.2, 5.7	`_define_tools` (P0), `_get_tools_description` (P0)	State (string literals only)
`PLAN_*` prompts	Module-scope constants	Outline planning system + user prompts	1.1, 1.2, 1.5, 1.6	`get_language_instruction` (P0), `plan_outline` (P0)	State
`SECTION_*` prompts	Module-scope constants	Section ReACT system + user prompts	2.1, 2.2, 2.3, 2.4, 2.6, 2.7	`get_language_instruction` (P0), `_generate_section_react` (P0), `_get_tools_description` (P1)	State
`REACT_*` templates	Module-scope constants	ReACT loop user-role messages re-injected after tool calls	4.1, 4.2, 4.3, 4.4, 4.5	`_generate_section_react` (P0)	State
`CHAT_*` prompts	Module-scope constants	Chat system prompt + observation suffix	3.1, 3.2, 3.3, 3.4, 3.5, 3.6	`get_language_instruction` (P0), `chat` (P0), `_get_tools_description` (P1)	State
`_define_tools` parameter dict	`ReportAgent` instance method	Catalog of tools + parameter hints, exposed to LLM via `_get_tools_description`	5.3, 5.4, 5.6	`_get_tools_description` (P0)	Service
`_get_tools_description`	`ReportAgent` instance method	Renders `_define_tools` output as a single string for SECTION + CHAT prompts	5.5	`_define_tools` (P0)	Service
`_execute_tool` error returns	`ReportAgent` instance method	Returns observation strings to the LLM for unknown-tool / execution-error paths	7.1, 7.2, 7.3	`_execute_tool` (P0)	Service
`_generate_section_react` inline strings	`ReportAgent` instance method body	LLM-visible strings appended to `messages` during ReACT loop	6.1, 6.2, 6.3, 6.4	`_generate_section_react` (P0)	Service
`chat` inline strings	`ReportAgent` instance method body	LLM-visible strings appended to `messages` during chat loop	6.5, 6.6	`chat` (P0)	Service
`plan_outline` defaults	`ReportAgent` instance method body	Default / fallback `ReportOutline` content emitted on success-without-title or exception path	8.1, 8.2, 8.3, 8.4	`plan_outline` (P0)	State

All components are existing module-scope constants or method-internal expressions. None require a full detail block — the responsibility boundary is "translate the string content; preserve the structural shape". The summary table above plus the requirement-level acceptance criteria in requirements.md form a complete contract.

Implementation Notes (cross-cutting)

Translation glossary (consistent across all components — see research.md Decision: Standard English phrasing): 上帝视角 → "god's-eye view"; 未来预演 → "forecast simulation" / "simulated future"; 模拟需求 → "simulation requirement"; 模拟世界 → "simulated world"; 章节 → "section"; 大纲 → "outline"; 引用 → "quote"/"quotation"; 正确示例 → "Correct Example"; 错误示例 → "Wrong Example"; 注意 → "Note"; 重要 → "IMPORTANT"; 工具 → "tool"; 检索 → "retrieval".
Literal preservation: Final Answer:, <tool_call>, </tool_call>, all tool names (insight_forge, panorama_search, quick_search, interview_agents, plus legacy aliases), all {interpolation} tokens, all JSON schema keys, all emoji / box-drawing characters (💡, ═).
Locale-agnostic strings: _execute_tool error returns and plan_outline default / fallback outline content are returned regardless of locale (no get_language_instruction() injection at those sites). They become locale-agnostic English under this PR.
Separator change: unused_tools_str = "、".join(unused_tools) at line 1454 → ", ".join(unused_tools). This is the only non-string-literal code change.

Data Models

No data-model changes. Report, ReportOutline, ReportSection, ReportStatus, Task, and the report API JSON contract are all preserved verbatim. Report.to_dict() and ReportOutline.to_dict() shapes are unchanged. The persistence schema under reports/<id>/ (meta.json, outline.json, progress.json, section_NN.md, full_report.md, agent_log.jsonl, console_log.txt) is unchanged.

Error Handling

Error Strategy

No new error types or recovery strategies. The translated _execute_tool error returns and plan_outline exception-path fallback continue to behave identically — the only change is the string content.

Error Categories and Responses

Unknown-tool error: _execute_tool returns a translated English string "Unknown tool: {tool_name}. Please use one of: insight_forge, panorama_search, quick_search". The string is fed back to the LLM as the next user-role observation.
Tool-execution exception: _execute_tool returns a translated English string "Tool execution failed: {str(e)}". Same flow.
plan_outline LLM exception: returns the translated English fallback ReportOutline (3 sections). Downstream report assembly proceeds normally.
Empty-response retry / conflict-handling / insufficient-tools: translated English messages re-injected into the LLM message stream (R6, R4 acceptance criteria). Loop control flow unchanged.

Testing Strategy

Default sections (adapted to translation work)

Static lint: python -m py_compile backend/app/services/report_agent.py — must pass.
Zero-Chinese assertion (in-scope regions): a verification harness (a small ad-hoc script under scripts/ if needed, deleted before PR) imports report_agent and runs re.findall(r'[一-鿿]', literal) over each in-scope constant, expecting an empty list. The single permitted Chinese remnant is the logger.debug f-string at line 1322 (not in scope).
Interpolation-shape parity: invoke PLAN_USER_PROMPT_TEMPLATE.format(simulation_requirement="x", total_nodes=0, total_edges=0, entity_types=[], total_entities=0, related_facts_json="[]"), SECTION_SYSTEM_PROMPT_TEMPLATE.format(report_title="x", report_summary="y", simulation_requirement="z", section_title="t", tools_description="d"), SECTION_USER_PROMPT_TEMPLATE.format(previous_content="x", section_title="t"), CHAT_SYSTEM_PROMPT_TEMPLATE.format(simulation_requirement="x", report_content="r", tools_description="d"), REACT_OBSERVATION_TEMPLATE.format(tool_name="x", result="y", tool_calls_count=1, max_tool_calls=5, used_tools_str="a, b", unused_hint="z"), etc. — each must render without raising KeyError.
Trigger-literal preservation: assert that "Final Answer:" is a substring of the translated SECTION_SYSTEM_PROMPT_TEMPLATE, SECTION_USER_PROMPT_TEMPLATE, REACT_OBSERVATION_TEMPLATE, REACT_TOOL_LIMIT_MSG, and REACT_FORCE_FINAL_MSG; assert that "<tool_call>" is a substring of the translated SECTION_SYSTEM_PROMPT_TEMPLATE and CHAT_SYSTEM_PROMPT_TEMPLATE.
Tool-name preservation: assert that all four primary tool names appear unchanged in the translated _define_tools keys and in the translated TOOL_DESC_* blocks.
End-to-end (deferred): per the precedent of issues #2/#3/#4, full pipeline runs under Accept-Language: en and Accept-Language: zh are not part of CI for this PR. Reviewer trust applies. If feasible in the implementer's local environment, a single sample run under en to confirm no Markdown headings leak into section bodies and a single sample run under zh to confirm Chinese output quality is preserved — both optional confidence boosters, not gates.

Security Considerations

No new security surface. Translated prompts do not expose new endpoints, do not add new external calls, and do not change authorization semantics. The _execute_tool error returns continue to expose str(e) from any caught exception — pre-existing behavior, unchanged by this PR.

Performance & Scalability

No performance regression expected. English prompts may be ~10–30% longer in token count than the equivalent Chinese (English requires more tokens for the same semantic content), but this is well within the 4096 max_tokens ceiling on the section LLM call and the model's overall context budget. No caching, no batching, no concurrency change.

Migration Strategy

No data or schema migration. The change is a single in-place edit. Rollback strategy: revert the single commit on feat/i18n-5-translate-report-agent-prompts if a regression is detected.

Supporting References

Detailed discovery, alternatives evaluation, decision rationale, and risk register: .kiro/specs/i18n-report-agent-prompts/research.md.
Sibling spec (i18n-simulation-config-generator-prompts): .kiro/specs/i18n-simulation-config-generator-prompts/{requirements,design,gap-analysis,research}.md.
Sibling commits: 0806832 (#2), 9d1d29b (#3), 6c2a412 (#4).
Ticket snapshot: .ticket/5.md.

23 KiB Raw Blame History Unescape Escape