Design Document — i18n-report-agent-prompts

Overview

Purpose: Translate every Chinese string-literal that flows into the LLM message stream of backend/app/services/report_agent.py into English so that, under Accept-Language: en, the Report-Agent produces English-flavoured analytical reports and chat replies — and not the Chinese-biased output that today's Chinese-base prompts produce despite the get_language_instruction() English postfix.

Users: MiroFish operators running the 5-step pipeline under English locale; reviewers tracking the i18n epic (#11); developers maintaining sibling i18n issues (#6, #7, #8, #10) downstream of this change.

Impact: Behavioural — under Accept-Language: en, the report's section titles, section bodies, embedded quotations, and chat replies become English-flavoured. No public-API change. No Report.to_dict() shape change. No new dependencies.

Goals

  • Replace every Chinese string-literal in report_agent.py that is sent to the LLM (system prompt, user prompt, ReACT loop messages, tool descriptions, _define_tools parameter hints, _execute_tool error returns, plan_outline defaults) with English equivalents.
  • Preserve every variable interpolation, every JSON schema key, every literal trigger string (Final Answer:, <tool_call>, tool-name strings), every get_language_instruction() call site.
  • Keep the public surface of ReportAgent, ReportManager, Report, ReportOutline, ReportSection, ReportStatus byte-for-byte equivalent in shape.

Non-Goals

  • Logger calls (logger.info, logger.warning, logger.error, logger.debug) inside the same file (owned by issue #6). Notably, the single raw-Chinese logger.debug(f"LLM响应: ...") at line 1322 is left untouched.
  • Module docstring (lines 1–11), class docstrings, dataclass docstrings, method docstrings, inline # comments (owned by issue #7).
  • Refactoring prompt structure, the JSON output schema of PLAN_SYSTEM_PROMPT, the ReACT loop control flow, conflict-resolution branches, or the chat tool-budget caps.
  • Externalizing prompts into /locales/*.json.
  • Live end-to-end report generation under both en and zh (deferred to fixture-based static checks; reviewer trust on quality parity, matching the precedent of issues #2/#3/#4).

Boundary Commitments

This Spec Owns

  • The string-literal content of all LLM-facing regions in backend/app/services/report_agent.py:
    • Tool description constants TOOL_DESC_INSIGHT_FORGE (476–492), TOOL_DESC_PANORAMA_SEARCH (494–509), TOOL_DESC_QUICK_SEARCH (511–521), TOOL_DESC_INTERVIEW_AGENTS (523–548).
    • PLAN-phase prompts PLAN_SYSTEM_PROMPT (552–589), PLAN_USER_PROMPT_TEMPLATE (591–611).
    • EXEC-phase prompts SECTION_SYSTEM_PROMPT_TEMPLATE (615–767), SECTION_USER_PROMPT_TEMPLATE (769–792), including the embedded "Correct Example" / "Wrong Example" code blocks.
    • ReACT loop conversation templates REACT_OBSERVATION_TEMPLATE (796–806), REACT_INSUFFICIENT_TOOLS_MSG (808–811), REACT_INSUFFICIENT_TOOLS_MSG_ALT (813–816), REACT_TOOL_LIMIT_MSG (818–821), REACT_UNUSED_TOOLS_HINT (823), REACT_FORCE_FINAL_MSG (825).
    • CHAT-phase prompts CHAT_SYSTEM_PROMPT_TEMPLATE (829–855), CHAT_OBSERVATION_SUFFIX (857).
    • The _define_tools parameter-description dict values (925–952) and the _get_tools_description leader "可用工具:" (1129).
    • The _execute_tool error returns at lines 1058 and 1062.
    • The inline LLM-visible strings inside _generate_section_react: report_context f-string (1294), empty-response retry (1316–1317), conflict-handling block (1342–1346), inline unused_hint literals (1380, 1476).
    • The inline LLM-visible strings inside chat: report-truncated marker (1799), no-report fallback (1805), observation joiner (1861).
    • The default / fallback outline content in plan_outline: success-path default title (1197), exception-path fallback ReportOutline (1212–1218).
  • The unused_tools_str join separator at line 1454 — switch from "、" to ", " for natural English rendering inside the now-English ReACT templates.

Out of Boundary

  • All logger.* calls in this file (issue #6), including the one raw-Chinese logger.debug at line 1322.
  • All """...""" docstrings and # comments in this file (issue #7).
  • backend/app/utils/locale.py, /locales/*.json, /locales/languages.json.
  • backend/app/services/zep_tools.py, zep_entity_reader.py, zep_graph_memory_updater.py.
  • backend/app/api/report.py, backend/app/api/simulation.py, backend/app/api/graph.py.
  • backend/app/services/simulation_runner.py, simulation_ipc.py, OASIS subprocess source.
  • backend/app/config.py constants.
  • backend/pyproject.toml, backend/uv.lock.
  • All other files in the repository.

Allowed Dependencies

  • Read access to get_language_instruction() from backend/app/utils/locale.py — three call sites preserved verbatim (lines 1166, 1262, 1808).
  • Read access to t(...) from backend/app/utils/locale.py — call sites preserved verbatim.
  • No new external dependencies.

Revalidation Triggers

  • A change to the Report.to_dict() payload shape would force the report API blueprint and the frontend report panel to re-validate. This spec does not change the shape.
  • A change to the PLAN_SYSTEM_PROMPT JSON output schema (title, summary, sections[].title, sections[].description) would force plan_outline()'s response parser to re-validate. This spec preserves the schema verbatim.
  • A change to the Final Answer: literal trigger or the <tool_call>...</tool_call> XML tag would force _generate_section_react's parser branches to re-validate. This spec preserves both byte-for-byte.
  • A change to the four primary tool names (insight_forge, panorama_search, quick_search, interview_agents) or the legacy aliases (search_graph, get_graph_statistics, get_entity_summary, get_simulation_context, get_entities_by_type) would force _execute_tool and _is_valid_tool_call to re-validate. This spec does not rename tools.
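
To illustrate why the trigger literals must survive translation byte-for-byte, here is a minimal sketch of a parser that keys on them. It is not the code in report_agent.py: the function names, the regular expression, and the assumption that a tool call carries a JSON payload are all hypothetical.

```python
import json
import re
from typing import Any, Dict, List, Optional

# The two literals the spec preserves verbatim.
FINAL_ANSWER_TRIGGER = "Final Answer:"
# Hypothetical pattern: assumes a JSON object sits between the tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the trigger, or None if the trigger is absent."""
    if FINAL_ANSWER_TRIGGER not in response:
        return None
    return response.split(FINAL_ANSWER_TRIGGER, 1)[1].strip()


def extract_tool_calls(response: str) -> List[Dict[str, Any]]:
    """Collect every well-formed JSON payload wrapped in <tool_call> tags."""
    calls: List[Dict[str, Any]] = []
    for raw in TOOL_CALL_RE.findall(response):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of raising
    return calls
```

If a prompt taught the model a translated or reworded trigger, `extract_final_answer` would return None and a loop built on it would never see a final answer, which is the failure the revalidation triggers guard against.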

Architecture

Existing Architecture Analysis

ReportAgent is a single Python class in backend/app/services/report_agent.py. The three LLM invocation paths (PLAN, SECTION, CHAT) follow a uniform pattern:

system_prompt = <chinese system prompt template>
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"
user_prompt = <chinese user prompt template with {interpolations}>
response = self.llm.chat(messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
])

_generate_section_react extends this with a multi-turn ReACT loop where the user-role messages re-injected after each tool call (REACT_OBSERVATION_TEMPLATE, etc.) are also Chinese today. There is no abstraction layer between prompt construction and LLM invocation — the prompt text and the call site are colocated. This matches sister modules (simulation_config_generator.py, oasis_profile_generator.py, ontology_generator.py).
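
The pattern above can be sketched as a small runnable example. This is an illustration under stated assumptions: `get_language_instruction` is replaced by a stand-in whose return text is invented, `build_messages` is a hypothetical helper that does not exist in report_agent.py, and the prompt strings are placeholders.

```python
from typing import Dict, List

Message = Dict[str, str]


def get_language_instruction() -> str:
    """Stand-in for the helper in backend/app/utils/locale.py (text is invented)."""
    return "Please respond in English."


def build_messages(system_template: str, user_template: str, **fields: object) -> List[Message]:
    """Hypothetical helper mirroring the colocated prompt-construction pattern."""
    # The locale postfix is appended after the base system prompt.
    system_prompt = f"{system_template}\n\n{get_language_instruction()}"
    # Interpolation tokens in the user template are filled by name.
    user_prompt = user_template.format(**fields)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages(
    "You are a report-planning assistant.",  # placeholder English base prompt
    "Simulation requirement: {simulation_requirement}",
    simulation_requirement="x",
)
```

Because the postfix is concatenated after the base prompt, translating the base prompt leaves the call site and the message structure untouched; only the `content` strings change.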

Architecture Pattern & Boundary Map

Selected pattern: In-place string-literal translation. No new components, no new modules, no new abstractions.

flowchart TB
    subgraph Caller["Caller — api/report.py"]
        api["POST /api/report/generate<br/>POST /api/report/chat"]
    end

    subgraph ReportAgentMod["report_agent.py — IN SCOPE"]
        plan["plan_outline<br/>**translate PLAN_*, defaults**"]
        sec["_generate_section_react<br/>**translate SECTION_*, REACT_*, inline strings**"]
        chat["chat<br/>**translate CHAT_*, inline strings**"]
        tools["_define_tools / _get_tools_description<br/>**translate TOOL_DESC_*, params, leader**"]
        exec["_execute_tool<br/>**translate error returns**"]
        parse["_parse_tool_calls<br/>UNCHANGED (matches literals)"]
        manager["ReportManager<br/>UNCHANGED (persistence)"]
    end

    subgraph Locale["utils/locale.py — UNCHANGED"]
        gli[get_language_instruction]
        tr[t]
    end

    subgraph ZepTools["services/zep_tools.py — UNCHANGED"]
        zt[ZepTools dispatch]
    end

    api --> plan
    api --> sec
    api --> chat
    plan --> gli
    sec --> gli
    chat --> gli
    sec --> tools
    chat --> tools
    sec --> parse
    sec --> exec
    chat --> parse
    chat --> exec
    exec --> zt
    plan --> manager
    sec --> manager

Architecture Integration:

  • Selected pattern: in-place string-literal translation; matches the precedent of issues #2/#3/#4.
  • Domain/feature boundaries: prompt-content is the only boundary that moves. Logger / docstring / comment boundaries (issues #6, #7) and persistence-layer boundary (ReportManager) are explicitly preserved.
  • Existing patterns preserved: get_language_instruction() postfix injection at three call sites; <tool_call> XML protocol; Final Answer: literal trigger; tool-name registry; JSON output schema for outline planning.
  • New components rationale: none — no new components.
  • Steering compliance: respects tech.md "preserve both styles working" for comments/docstrings (those are out of scope); respects structure.md per-project file isolation; respects commits.md Conventional Commits format for the eventual commit message.

Technology Stack

| Layer | Choice / Version | Role in Feature | Notes |
|---|---|---|---|
| Frontend / CLI | n/a | Frontend renders the translated Report payload as plain text/Markdown | No frontend change required |
| Backend / Services | Python 3.11, Flask 3.0 | Hosts ReportAgent and the report API | Single-file edit |
| Data / Storage | Neo4j + Graphiti | Source of retrieval results consumed by zep_tools | Unchanged |
| Messaging / Events | n/a | Report generation runs as a background Task | Unchanged |
| Infrastructure / Runtime | uv-managed venv | Backend dependency manager | No new dependencies |

No new external dependencies, libraries, or infrastructure components are introduced. Detailed locale-resolution mechanics are documented in research.md.

File Structure Plan

Modified Files

  • backend/app/services/report_agent.py — translate every Chinese string-literal that is sent to the LLM, plus the one separator literal at line 1454. No structural code changes; no new methods; no new constants. Line counts will shift due to the typically larger English character count, but the file's overall organization is unchanged.

Unmodified Files (explicitly verified)

  • backend/app/utils/locale.py
  • backend/app/services/zep_tools.py, zep_entity_reader.py, zep_graph_memory_updater.py
  • backend/app/api/report.py, simulation.py, graph.py
  • backend/app/services/simulation_runner.py, simulation_ipc.py
  • backend/app/config.py
  • backend/pyproject.toml, backend/uv.lock
  • /locales/en.json, /locales/zh.json, /locales/languages.json
  • All frontend files

System Flows

The PLAN / SECTION / CHAT flows are unchanged at the control-flow level — only the string content of system / user / observation messages is translated. No new diagram is required; research.md records the relevant parser-trigger details.

Requirements Traceability

| Requirement | Summary | Components | Interfaces | Flows |
|---|---|---|---|---|
| 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 | Translate PLAN_SYSTEM_PROMPT and PLAN_USER_PROMPT_TEMPLATE; preserve schema, count limits, interpolations, postfix call site | PLAN_SYSTEM_PROMPT (552), PLAN_USER_PROMPT_TEMPLATE (591), plan_outline (1137) | plan_outline() LLM chat_json invocation at line 1177 | PLAN flow |
| 2.1–2.9 | Translate SECTION_SYSTEM_PROMPT_TEMPLATE (incl. examples) and SECTION_USER_PROMPT_TEMPLATE; preserve Final Answer: / <tool_call> literals; preserve no-headings instruction | SECTION_SYSTEM_PROMPT_TEMPLATE (615), SECTION_USER_PROMPT_TEMPLATE (769), _generate_section_react (1221) | _generate_section_react() LLM chat invocation at line 1305 | SECTION ReACT flow |
| 3.1–3.7 | Translate CHAT_SYSTEM_PROMPT_TEMPLATE and CHAT_OBSERVATION_SUFFIX; preserve <tool_call> literal and prefix-injection contract | CHAT_SYSTEM_PROMPT_TEMPLATE (829), CHAT_OBSERVATION_SUFFIX (857), chat (1766) | chat() LLM chat invocations at lines 1828, 1868 | CHAT flow |
| 4.1–4.6 | Translate ReACT loop conversation templates; preserve Final Answer: literal; switch separator to ", " | REACT_* constants (796–825) | _generate_section_react() ReACT loop branches | SECTION ReACT flow |
| 5.1–5.7 | Translate four TOOL_DESC_* blocks, _define_tools parameter dict values, _get_tools_description leader; preserve tool names | TOOL_DESC_* (476–548), _define_tools (919), _get_tools_description (1127) | _define_tools() and _get_tools_description() return values | SECTION + CHAT flows |
| 6.1–6.7 | Translate inline LLM-visible strings in _generate_section_react and chat | Inline strings at 1294, 1316–1317, 1342–1346, 1380, 1476, 1799, 1805, 1861 | Direct messages.append(...) calls | SECTION + CHAT flows |
| 7.1–7.3 | Translate _execute_tool error returns | f-strings at 1058, 1062 | _execute_tool() return value | SECTION + CHAT flows (error path) |
| 8.1–8.4 | Translate plan_outline defaults; preserve ReportOutline shape | plan_outline defaults at 1197, 1212–1218 | plan_outline() return value | PLAN flow (default + fallback paths) |
| 9.1–9.5 | Locale switching continues to work | get_language_instruction() call sites at 1166, 1262, 1808 | unchanged | All flows |
| 10.1–10.5 | Public API stable | ReportAgent, ReportManager, Report, ReportOutline, ReportSection, ReportStatus | unchanged | All flows |
| 11.1–11.5 | End-to-end Step 4 / Step 5 parity | Verification only | unchanged | All flows |
| 12.1–12.6 | Out-of-scope guardrail | None edited | unchanged | n/a |

Components and Interfaces

| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|---|---|---|---|---|---|
| Tool description constants | Module-scope constants in report_agent.py | LLM-facing tool catalog injected into SECTION + CHAT system prompts via _get_tools_description | 5.1, 5.2, 5.7 | _define_tools (P0), _get_tools_description (P0) | State (string literals only) |
| PLAN_* prompts | Module-scope constants | Outline planning system + user prompts | 1.1, 1.2, 1.5, 1.6 | get_language_instruction (P0), plan_outline (P0) | State |
| SECTION_* prompts | Module-scope constants | Section ReACT system + user prompts | 2.1, 2.2, 2.3, 2.4, 2.6, 2.7 | get_language_instruction (P0), _generate_section_react (P0), _get_tools_description (P1) | State |
| REACT_* templates | Module-scope constants | ReACT loop user-role messages re-injected after tool calls | 4.1, 4.2, 4.3, 4.4, 4.5 | _generate_section_react (P0) | State |
| CHAT_* prompts | Module-scope constants | Chat system prompt + observation suffix | 3.1, 3.2, 3.3, 3.4, 3.5, 3.6 | get_language_instruction (P0), chat (P0), _get_tools_description (P1) | State |
| _define_tools parameter dict | ReportAgent instance method | Catalog of tools + parameter hints, exposed to LLM via _get_tools_description | 5.3, 5.4, 5.6 | _get_tools_description (P0) | Service |
| _get_tools_description | ReportAgent instance method | Renders _define_tools output as a single string for SECTION + CHAT prompts | 5.5 | _define_tools (P0) | Service |
| _execute_tool error returns | ReportAgent instance method | Returns observation strings to the LLM for unknown-tool / execution-error paths | 7.1, 7.2, 7.3 | _execute_tool (P0) | Service |
| _generate_section_react inline strings | ReportAgent instance method body | LLM-visible strings appended to messages during ReACT loop | 6.1, 6.2, 6.3, 6.4 | _generate_section_react (P0) | Service |
| chat inline strings | ReportAgent instance method body | LLM-visible strings appended to messages during chat loop | 6.5, 6.6 | chat (P0) | Service |
| plan_outline defaults | ReportAgent instance method body | Default / fallback ReportOutline content emitted on success-without-title or exception path | 8.1, 8.2, 8.3, 8.4 | plan_outline (P0) | State |

All components are existing module-scope constants or method-internal expressions. None require a full detail block — the responsibility boundary is "translate the string content; preserve the structural shape". The summary table above plus the requirement-level acceptance criteria in requirements.md form a complete contract.

Implementation Notes (cross-cutting)

  • Translation glossary (consistent across all components — see research.md Decision: Standard English phrasing): 上帝视角 → "god's-eye view"; 未来预演 → "forecast simulation" / "simulated future"; 模拟需求 → "simulation requirement"; 模拟世界 → "simulated world"; 章节 → "section"; 大纲 → "outline"; 引用 → "quote"/"quotation"; 正确示例 → "Correct Example"; 错误示例 → "Wrong Example"; 注意 → "Note"; 重要 → "IMPORTANT"; 工具 → "tool"; 检索 → "retrieval".
  • Literal preservation: Final Answer:, <tool_call>, </tool_call>, all tool names (insight_forge, panorama_search, quick_search, interview_agents, plus legacy aliases), all {interpolation} tokens, all JSON schema keys, all emoji / box-drawing characters (such as 💡).
  • Locale-agnostic strings: _execute_tool error returns and plan_outline default / fallback outline content are returned regardless of locale (no get_language_instruction() injection at those sites). They become locale-agnostic English under this PR.
  • Separator change: unused_tools_str = "、".join(unused_tools) at line 1454 → ", ".join(unused_tools). This is the only non-string-literal code change.
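
A quick illustration of the separator change, assuming an English hint template. The template text and its `unused_list` placeholder are hypothetical; the real REACT_UNUSED_TOOLS_HINT wording is not reproduced in this document.

```python
unused_tools = ["panorama_search", "interview_agents"]

# Current behaviour: ideographic comma, natural inside a Chinese sentence.
before = "、".join(unused_tools)
# Proposed behaviour: comma plus space, natural inside an English sentence.
after = ", ".join(unused_tools)

# Hypothetical English hint template; the placeholder name is an assumption.
HINT_TEMPLATE = "Tools you have not used yet: {unused_list}"

print(HINT_TEMPLATE.format(unused_list=before))
print(HINT_TEMPLATE.format(unused_list=after))
```

Tool names are untouched in both renderings; only the glue between them changes.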

Data Models

No data-model changes. Report, ReportOutline, ReportSection, ReportStatus, Task, and the report API JSON contract are all preserved verbatim. Report.to_dict() and ReportOutline.to_dict() shapes are unchanged. The persistence schema under reports/<id>/ (meta.json, outline.json, progress.json, section_NN.md, full_report.md, agent_log.jsonl, console_log.txt) is unchanged.

Error Handling

Error Strategy

No new error types or recovery strategies. The translated _execute_tool error returns and plan_outline exception-path fallback continue to behave identically — the only change is the string content.

Error Categories and Responses

  • Unknown-tool error: _execute_tool returns a translated English string "Unknown tool: {tool_name}. Please use one of: insight_forge, panorama_search, quick_search". The string is fed back to the LLM as the next user-role observation.
  • Tool-execution exception: _execute_tool returns a translated English string "Tool execution failed: {str(e)}". Same flow.
  • plan_outline LLM exception: returns the translated English fallback ReportOutline (3 sections). Downstream report assembly proceeds normally.
  • Empty-response retry / conflict-handling / insufficient-tools: translated English messages re-injected into the LLM message stream (R6, R4 acceptance criteria). Loop control flow unchanged.
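
The two _execute_tool error paths above can be sketched as follows. The error strings are the ones quoted in this section; everything else (the function name, the registry argument, the dispatch shape) is a hypothetical stand-in for the real method, which dispatches to ZepTools.

```python
from typing import Any, Callable, Dict


def execute_tool_sketch(
    tool_name: str,
    params: Dict[str, Any],
    registry: Dict[str, Callable[..., str]],
) -> str:
    """Return a tool result, or an English observation string on failure."""
    handler = registry.get(tool_name)
    if handler is None:
        # Unknown-tool path: the string is returned, not raised, so that it
        # can be fed back to the LLM as the next observation.
        return (
            f"Unknown tool: {tool_name}. Please use one of: "
            "insight_forge, panorama_search, quick_search"
        )
    try:
        return handler(**params)
    except Exception as e:
        # Execution-error path: the exception text is exposed as-is.
        return f"Tool execution failed: {str(e)}"
```

Both failure paths return plain strings without consulting the locale, which matches the note that these returns become locale-agnostic English.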

Testing Strategy

Default sections (adapted to translation work)

  • Static lint: python -m py_compile backend/app/services/report_agent.py — must pass.
  • Zero-Chinese assertion (in-scope regions): a verification harness (a small ad-hoc script under scripts/ if needed, deleted before PR) imports report_agent and runs re.findall(r'[一-鿿]', literal) over each in-scope constant, expecting an empty list. The single permitted Chinese remnant is the logger.debug f-string at line 1322 (not in scope).
  • Interpolation-shape parity: invoke PLAN_USER_PROMPT_TEMPLATE.format(simulation_requirement="x", total_nodes=0, total_edges=0, entity_types=[], total_entities=0, related_facts_json="[]"), SECTION_SYSTEM_PROMPT_TEMPLATE.format(report_title="x", report_summary="y", simulation_requirement="z", section_title="t", tools_description="d"), SECTION_USER_PROMPT_TEMPLATE.format(previous_content="x", section_title="t"), CHAT_SYSTEM_PROMPT_TEMPLATE.format(simulation_requirement="x", report_content="r", tools_description="d"), REACT_OBSERVATION_TEMPLATE.format(tool_name="x", result="y", tool_calls_count=1, max_tool_calls=5, used_tools_str="a, b", unused_hint="z"), etc. — each must render without raising KeyError.
  • Trigger-literal preservation: assert that "Final Answer:" is a substring of the translated SECTION_SYSTEM_PROMPT_TEMPLATE, SECTION_USER_PROMPT_TEMPLATE, REACT_OBSERVATION_TEMPLATE, REACT_TOOL_LIMIT_MSG, and REACT_FORCE_FINAL_MSG; assert that "<tool_call>" is a substring of the translated SECTION_SYSTEM_PROMPT_TEMPLATE and CHAT_SYSTEM_PROMPT_TEMPLATE.
  • Tool-name preservation: assert that all four primary tool names appear unchanged in the translated _define_tools keys and in the translated TOOL_DESC_* blocks.
  • End-to-end (deferred): per the precedent of issues #2/#3/#4, full pipeline runs under Accept-Language: en and Accept-Language: zh are not part of CI for this PR; reviewer trust applies. If feasible in the implementer's local environment, run one sample under en to confirm that no Markdown headings leak into section bodies, and one sample under zh to confirm that Chinese output quality is preserved. Both runs are optional confidence boosters, not gates.
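
The static checks listed above could be collected in a throwaway harness along these lines. This is a sketch only: the helper names are hypothetical, the sample strings are stand-ins rather than the real constants, and a real run would read the constants from report_agent instead of a hand-written dict. The character class is written with \u escapes for the CJK Unified Ideographs block.

```python
import re
from typing import Dict, List

# CJK Unified Ideographs block (U+4E00 to U+9FFF), written with escapes.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def chinese_remnants(constants: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each constant name to the CJK characters still present in it."""
    remnants: Dict[str, List[str]] = {}
    for name, text in constants.items():
        hits = CJK_RE.findall(text)
        if hits:
            remnants[name] = hits
    return remnants


def missing_literal(constants: Dict[str, str], literal: str, names: List[str]) -> List[str]:
    """Return the names whose text no longer contains the trigger literal."""
    return [name for name in names if literal not in constants[name]]


def renders(template: str, **fields: object) -> bool:
    """True if str.format succeeds with the given keyword fields."""
    try:
        template.format(**fields)
    except (KeyError, IndexError):
        return False
    return True


# Stand-in strings, not the real prompt text.
sample = {
    "SECTION_USER_PROMPT_TEMPLATE": (
        "Completed sections:\n{previous_content}\n\n"
        'Now write the section "{section_title}" and finish with Final Answer:'
    ),
    "REACT_FORCE_FINAL_MSG": 'Tool budget exhausted. Reply starting with "Final Answer:".',
    "UNTRANSLATED_LEADER": "可用工具:",
}

print(chinese_remnants(sample))
print(missing_literal(sample, "Final Answer:", ["SECTION_USER_PROMPT_TEMPLATE", "REACT_FORCE_FINAL_MSG"]))
print(renders(sample["SECTION_USER_PROMPT_TEMPLATE"], previous_content="x", section_title="t"))
```

An empty dict from `chinese_remnants` and an empty list from `missing_literal` would correspond to the zero-Chinese and trigger-literal assertions passing; `renders` returning False flags a renamed or dropped interpolation token.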

Security Considerations

No new security surface. Translated prompts do not expose new endpoints, do not add new external calls, and do not change authorization semantics. The _execute_tool error returns continue to expose str(e) from any caught exception — pre-existing behavior, unchanged by this PR.

Performance & Scalability

No performance regression expected. English prompts may be ~10–30% longer in token count than the equivalent Chinese (English requires more tokens for the same semantic content), but this is well within the 4096 max_tokens ceiling on the section LLM call and the model's overall context budget. No caching, no batching, no concurrency change.

Migration Strategy

No data or schema migration. The change is a single in-place edit. Rollback strategy: revert the single commit on feat/i18n-5-translate-report-agent-prompts if a regression is detected.

Supporting References

  • Detailed discovery, alternatives evaluation, decision rationale, and risk register: .kiro/specs/i18n-report-agent-prompts/research.md.
  • Sibling spec (i18n-simulation-config-generator-prompts): .kiro/specs/i18n-simulation-config-generator-prompts/{requirements,design,gap-analysis,research}.md.
  • Sibling commits: 0806832 (#2), 9d1d29b (#3), 6c2a412 (#4).
  • Ticket snapshot: .ticket/5.md.