20 KiB
Implementation Plan
1. Foundation: stage a verification harness
- 1.1 Stage a one-shot verification harness for prompt-string content
- Add a small, isolated verification script under
backend/scripts/that, given the path toreport_agent.py, asserts: (a) the file compiles viapy_compile; (b) every in-scope LLM-facing string-literal contains zero[一-鿿]matches; (c) the literal trigger stringsFinal Answer:and<tool_call>are still present in the relevant translated templates; (d) the four primary tool names (insight_forge,panorama_search,quick_search,interview_agents) are still byte-equal in_define_toolsand the fourTOOL_DESC_*constants; (e) the threeget_language_instruction()call sites are byte-equal at the same logical positions; (f) the only Chinese remaining in the module is inlogger.*lines,"""..."""docstrings, or#comments (i.e. issue #6/#7 scope). - Wire the script to be runnable via
cd backend && uv run python scripts/verify_report_agent_prompts.py. - Observable completion: running the script before any translation prints concrete failures (~2680 Chinese chars in in-scope regions); after translation it prints "all checks passed" and exits 0.
- Requirements: 1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, 4.2, 5.1, 5.5, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 7.1, 7.2, 8.1, 8.2, 9.1, 12.1, 12.2
- Add a small, isolated verification script under
2. Core: translate tool-description constants and _define_tools parameter hints
-
2.1 Translate the four
TOOL_DESC_*constants to English- Rewrite
TOOL_DESC_INSIGHT_FORGE,TOOL_DESC_PANORAMA_SEARCH,TOOL_DESC_QUICK_SEARCH,TOOL_DESC_INTERVIEW_AGENTSto English while preserving the per-tool semantics:insight_forgeis deep multi-angle analytical retrieval;panorama_searchis breadth/timeline overview retrieval;quick_searchis lightweight literal-keyword retrieval;interview_agentsis a real OASIS dual-platform agent-interview API. - Preserve byte-for-byte the literal tool name mentions and the operational warning about needing a running OASIS environment in
TOOL_DESC_INTERVIEW_AGENTS. - Observable completion: harness from 1.1 reports zero Chinese in the four constants; tool-name byte-equality check passes.
- Requirements: 5.1, 5.2, 5.3, 5.7
- Boundary: report_agent module-scope TOOL_DESC* constants_
- Rewrite
-
2.2 Translate
_define_toolsparameter dict values and_get_tools_descriptionleader- Rewrite the parameter-description string values inside
_define_tools(the values forquery,report_context,include_expired,limit,interview_topic,max_agentsper tool) to English. Preserve the parameter dict keys byte-for-byte. - Rewrite the leading literal
"可用工具:"in_get_tools_descriptionto English (e.g."Available tools:"). - Observable completion: harness reports zero Chinese in
_define_toolsparameter values and in the_get_tools_descriptionleader; calling_get_tools_description()on a stubReportAgentinstance returns a string starting with the English leader. - Requirements: 5.4, 5.5, 5.6
- _Boundary: report_agent.ReportAgent._define_tools, get_tools_description
- Rewrite the parameter-description string values inside
3. Core: translate the PLAN-phase prompts
-
3.1 (P) Translate
PLAN_SYSTEM_PROMPTandPLAN_USER_PROMPT_TEMPLATEto English- Rewrite both constants to English while keeping the JSON output schema (
title,summary,sections[].title,sections[].description), the 2–5 section count constraint, and the all-seeing-observer / forecast-simulation framing. - Preserve every variable interpolation:
{simulation_requirement},{total_nodes},{total_edges},{entity_types},{total_entities},{related_facts_json}. Leave thesystem_prompt = f"{PLAN_SYSTEM_PROMPT}\n\n{get_language_instruction()}"injection at line 1166 untouched. - Observable completion: harness reports zero Chinese in
PLAN_SYSTEM_PROMPTandPLAN_USER_PROMPT_TEMPLATE; renderingPLAN_USER_PROMPT_TEMPLATE.format(simulation_requirement="x", total_nodes=0, total_edges=0, entity_types=[], total_entities=0, related_facts_json="[]")raises noKeyError. - Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 9.1
- Boundary: report_agent module-scope PLAN_SYSTEM_PROMPT, PLAN_USER_PROMPT_TEMPLATE
- Rewrite both constants to English while keeping the JSON output schema (
-
3.2 Translate
plan_outlinedefault / fallback outline content to English- Replace the success-path default title
"模拟分析报告"(line 1197) with a locale-agnostic English equivalent (e.g."Simulation Analysis Report"). - Replace the exception-path fallback
ReportOutlinecontent (lines 1212–1218): title"未来预测报告"→ e.g."Future Prediction Report"; summary"基于模拟预测的未来趋势与风险分析"→ e.g."Trend and risk analysis based on simulation predictions"; three section titles"预测场景与核心发现","人群行为预测分析","趋势展望与风险提示"→ e.g."Scenario and Key Findings","Population Behavior Predictions","Trend Outlook and Risk Notes". - Preserve the existing
ReportOutlineshape: 3ReportSectionitems, no field additions/removals. - Observable completion: forcing
plan_outline()into the exception path (e.g. by stubbingself.llm.chat_jsonto raise) returns aReportOutlinewhose title, summary, and section titles are locale-agnostic English; harness reports zero Chinese in lines 1197, 1212–1218. - Requirements: 8.1, 8.2, 8.3, 8.4
- Boundary: report_agent.ReportAgent.plan_outline
- Replace the success-path default title
4. Core: translate the EXEC-phase prompts (section ReACT)
-
4.1 Translate
SECTION_SYSTEM_PROMPT_TEMPLATEto English (incl. embedded examples)- Rewrite the template to English while preserving: every
{report_title},{report_summary},{simulation_requirement},{section_title},{tools_description}interpolation; the no-headings rule (no#,##,###,####); the language-consistency rule for translating quoted tool output to the report language; the must-call-tools instruction with min 3 / max 5 calls; the two-mode reply contract; the literalFinal Answer:trigger string; the literal<tool_call>...</tool_call>example block; the box-drawing═separators and the⚠️/❌/✅markers. - Translate the embedded "正确示例" / "错误示例" code blocks (lines 678–703) to semantically equivalent English illustrations: the "Correct Example" should show a sample paragraph using
**bold**,>block quotes, and lists (no headings); the "Wrong Example" should show wrong English headings (## Executive Summary,### 1. First Stage, etc.) labelled as errors. - Leave the
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"injection at line 1262 untouched. - Observable completion: harness reports zero Chinese in
SECTION_SYSTEM_PROMPT_TEMPLATE;Final Answer:and<tool_call>literals are present byte-equal; rendering with stub interpolations raises noKeyError. - Requirements: 2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 9.1
- Boundary: report_agent module-scope SECTION_SYSTEM_PROMPT_TEMPLATE
- Rewrite the template to English while preserving: every
-
4.2 (P) Translate
SECTION_USER_PROMPT_TEMPLATEto English- Rewrite the template to English while preserving the
{previous_content}and{section_title}interpolations. Note that{section_title}is referenced twice — once as a literal in the do-not-write-as-opening warning and once as a body reference; both must be retained. - Preserve the must-call-tools / mix-tools / no-headings reminders. Preserve the closing three-step instruction (think → call tool → output Final Answer).
- Observable completion: harness reports zero Chinese in
SECTION_USER_PROMPT_TEMPLATE; renderingSECTION_USER_PROMPT_TEMPLATE.format(previous_content="x", section_title="t")raises noKeyErrorand the rendered string containstin two places. - Requirements: 2.2, 2.3, 2.7
- Boundary: report_agent module-scope SECTION_USER_PROMPT_TEMPLATE
- Rewrite the template to English while preserving the
5. Core: translate the ReACT loop conversation templates
-
5.1 Translate
REACT_OBSERVATION_TEMPLATEand the fiveREACT_*_MSGconstants to English- Rewrite
REACT_OBSERVATION_TEMPLATE,REACT_INSUFFICIENT_TOOLS_MSG,REACT_INSUFFICIENT_TOOLS_MSG_ALT,REACT_TOOL_LIMIT_MSG,REACT_UNUSED_TOOLS_HINT,REACT_FORCE_FINAL_MSGto English. - Preserve the
{tool_name},{result},{tool_calls_count},{max_tool_calls},{used_tools_str},{unused_hint},{min_tool_calls},{unused_list}interpolations across these templates. Preserve theFinal Answer:literal trigger insideREACT_OBSERVATION_TEMPLATEandREACT_TOOL_LIMIT_MSG. - Preserve the emoji and box-drawing characters (
💡,═). - Observable completion: harness reports zero Chinese in the six
REACT_*constants;Final Answer:substring check passes for the two templates that reference it; renderingREACT_OBSERVATION_TEMPLATE.format(tool_name="x", result="y", tool_calls_count=1, max_tool_calls=5, used_tools_str="a, b", unused_hint="z")raises noKeyError. - Requirements: 4.1, 4.2, 4.3, 4.4, 4.5
- Boundary: report_agent module-scope REACT* constants_
- Rewrite
-
5.2 Switch the
unused_tools_strjoin separator at line 1454 from"、"to", "- Change
unused_tools_str = "、".join(unused_tools)tounused_tools_str = ", ".join(unused_tools)so the result reads naturally inside the now-EnglishREACT_OBSERVATION_TEMPLATE. - Observable completion: a grep over
report_agent.pyfor"、"returns zero matches;unused_tools_strrendered with two stub tool names yields"insight_forge, panorama_search"(English-friendly). - Requirements: 4.6
- _Boundary: report_agent.ReportAgent.generate_section_react
- Change
6. Core: translate the CHAT-phase prompts
- 6.1 Translate
CHAT_SYSTEM_PROMPT_TEMPLATEandCHAT_OBSERVATION_SUFFIXto English- Rewrite both constants to English while preserving the
{simulation_requirement},{report_content},{tools_description}interpolations and the literal<tool_call>...</tool_call>example block. - Preserve the chat tool-budget hint (
MAX_TOOL_CALLS_PER_CHATsemantics: 1–2 per session) and the answer-style instructions (concise, lead with conclusion, use>for quoted material). - Leave the
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"injection at line 1808 untouched. - Observable completion: harness reports zero Chinese in both constants;
<tool_call>substring check passes; renderingCHAT_SYSTEM_PROMPT_TEMPLATE.format(simulation_requirement="x", report_content="r", tools_description="d")raises noKeyError. - Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 9.1
- Boundary: report_agent module-scope CHAT_SYSTEM_PROMPT_TEMPLATE, CHAT_OBSERVATION_SUFFIX
- Rewrite both constants to English while preserving the
7. Core: translate inline LLM-visible strings inside _generate_section_react and chat
-
7.1 Translate the inline strings in
_generate_section_reactto English- Replace
report_context = f"章节标题: {section.title}\n模拟需求: {self.simulation_requirement}"(line 1294) with an English equivalent (e.g.f"Section title: {section.title}\nSimulation requirement: {self.simulation_requirement}"), preserving both interpolations. - Replace the empty-response retry messages
"(响应为空)"(line 1316) and"请继续生成内容。"(line 1317) with English equivalents (e.g."(empty response)"and"Please continue generating content."). - Replace the conflict-handling assistant→user message at lines 1342–1346 with an English equivalent that preserves the literal mention of
<tool_call>and'Final Answer:'and the two-mode contract (call one tool OR output Final Answer; never both). - Replace the inline
unused_hintliterals at lines 1380 and 1476 (f"(这些工具还未使用,推荐用一下他们: {', '.join(unused_tools)})") with English equivalents (e.g.f"(These tools have not been used yet, you may try them: {', '.join(unused_tools)})"), preserving the{', '.join(unused_tools)}interpolation. Both sites should convey the same hint and remain syntactically equivalent. - Observable completion: harness reports zero Chinese in
_generate_section_reactoutside oflogger.*, docstrings, and#comments; the four targeted regions render with their interpolations intact. - Requirements: 6.1, 6.2, 6.3, 6.4, 6.7
- _Boundary: report_agent.ReportAgent.generate_section_react
- Replace
-
7.2 (P) Translate the inline strings in
chatto English- Replace
"\n\n... [报告内容已截断] ..."(line 1799) with an English equivalent (e.g."\n\n... [report content truncated] ..."). - Replace
"(暂无报告)"(line 1805) with an English equivalent (e.g."(no report yet)"). - Replace the observation joiner format
f"[{r['tool']}结果]\n{r['result']}"(line 1861) with an English equivalent (e.g.f"[{r['tool']} result]\n{r['result']}"), preserving the{r['tool']}and{r['result']}interpolations. - Observable completion: harness reports zero Chinese in
chatoutside oflogger.*, docstrings, and#comments; the three targeted regions render with their interpolations intact. - Requirements: 6.5, 6.6, 6.7
- Boundary: report_agent.ReportAgent.chat
- Replace
8. Core: translate _execute_tool error returns
- 8.1 Translate the
_execute_toolerror returns to English- Replace
f"未知工具: {tool_name}。请使用以下工具之一: insight_forge, panorama_search, quick_search"(line 1058) with an English equivalent (e.g.f"Unknown tool: {tool_name}. Please use one of: insight_forge, panorama_search, quick_search"), preserving the{tool_name}interpolation and the literal tool-name list. - Replace
f"工具执行失败: {str(e)}"(line 1062) with an English equivalent (e.g.f"Tool execution failed: {str(e)}"), preserving the{str(e)}interpolation. - Both translated strings remain locale-agnostic English (no
get_language_instruction()injection at this site). - Observable completion: harness reports zero Chinese in lines 1058 and 1062; both error returns are locale-agnostic English; the literal tool-name list is byte-equal.
- Requirements: 7.1, 7.2, 7.3
- _Boundary: report_agent.ReportAgent.execute_tool
- Replace
9. Validation: locale and integration checks
-
9.1 Confirm
get_language_instruction()call sites are byte-equal at lines 1166, 1262, 1808- After translation, run the harness from 1.1; it must verify that the three
system_prompt = f"{...}\n\n{get_language_instruction()}"injection lines remain unchanged in syntactic form (the only allowed deltas are inside{...}itself, which the prompt-content checks already covered). - Observable completion: harness prints a "locale-postfix injection unchanged at lines 1166/1262/1808" line and exits 0.
- Requirements: 1.6, 2.4, 3.4, 9.1
- Depends: 3.1, 4.1, 6.1
- After translation, run the harness from 1.1; it must verify that the three
-
9.2 Confirm public-API and constants are byte-stable
- Programmatically inspect the module after translation and confirm:
ReportAgent.__init__,plan_outline,generate_report,chat,_generate_section_react,_execute_tool,_define_tools,_get_tools_description,_parse_tool_calls,_is_valid_tool_callall retain their existing parameter names and return annotations; the dataclass-equivalent definitionsReport,ReportOutline,ReportSection,ReportStatusare unchanged; the class-level constantsMAX_TOOL_CALLS_PER_SECTION,MAX_REFLECTION_ROUNDS,MAX_TOOL_CALLS_PER_CHAT,REPORTS_DIRare unchanged. - Inspection can be by
inspect.signaturechecks plusre.searchfor the constant declarations. - Observable completion: a single signature/constant-stability check runs from the harness and prints "public surface stable" before exit.
- Requirements: 10.1, 10.2, 10.3, 10.4, 10.5
- Depends: 3.1, 4.1, 6.1
- Programmatically inspect the module after translation and confirm:
-
9.3 Confirm out-of-scope guardrails: logger calls, docstrings, comments, adjacent files
- Run a targeted check that confirms: every
logger.info/logger.warning/logger.error/logger.debugcall line retains its pre-existing Chinese content (no translation creep into #6's scope) — the line-1322logger.debug(f"LLM响应: ...")is the canary;"""..."""docstrings (module, classesReportLogger,ReportConsoleLogger,Report,ReportOutline,ReportSection,ReportAgent,ReportManager, dataclasses, methods) retain their pre-existing Chinese content (no translation creep into #7's scope);git statusshows onlybackend/app/services/report_agent.py(and optionallybackend/scripts/verify_report_agent_prompts.py) modified — no edits tobackend/app/config.py,backend/app/services/zep_tools.py,backend/app/utils/locale.py,backend/app/api/report.py,/locales/,backend/pyproject.toml, orbackend/uv.lock. - Observable completion: a check prints "out-of-scope guardrails respected" listing the count of Chinese chars remaining in logger lines (>0 expected) and in docstrings (>0 expected) as positive indicators;
git statusis clean except for the two allowed paths. - Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6
- Depends: 3.1, 3.2, 4.1, 4.2, 5.1, 5.2, 6.1, 7.1, 7.2, 8.1
- Run a targeted check that confirms: every
-
9.4 Locale-switching static evidence:
enandzh- Sandbox lacks runtime dependencies for an end-to-end report run. Substitute runtime smoke with static evidence that locale switching is preserved: (a) harness check confirms
get_language_instruction()call-site count is exactly 3 at the expected logical positions; (b) harness check confirms the three injection lines are syntactically byte-equal in form; (c)git statusconfirmsbackend/app/utils/locale.pyandlocales/*.jsonare unchanged. Together these guarantee that underAccept-Language: enthe postfixPlease respond in English.continues to be appended and underAccept-Language: zhthe postfix请使用中文回答。continues to be appended at the same call sites with no semantic delta. Sister specs (#2, #3, #4) used the same static-only posture. - Observable completion: harness exits 0 with all three checks reported as PASS.
- Requirements: 9.1, 9.2, 9.3, 9.4, 9.5
- Depends: 3.1, 4.1, 6.1
- Sandbox lacks runtime dependencies for an end-to-end report run. Substitute runtime smoke with static evidence that locale switching is preserved: (a) harness check confirms
-
9.5* Optional fixture-based render-shape parity check
- Build a stub
ReportAgent(with stubbedzep_toolsandllm) and patch the LLM client to return well-shaped JSON forplan_outline()and well-shaped tool-call + Final-Answer responses for_generate_section_react(). Rungenerate_report(...)end-to-end against the stub. Assert that the returnedReporthas a non-empty title, non-empty summary, ≥2 and ≤5 sections, each section non-empty. - Confirms R11 functional coverage without depending on a live Neo4j / OASIS environment. Marked optional because R10 + R11.5 already lock the shape stability via guard checks (9.2) and design-level reasoning; this is auxiliary belt-and-braces test coverage.
- Observable completion: a single fixture-based test prints the
Report.to_dict()keys and asserts the non-emptiness invariants; exits 0. - Requirements: 11.1, 11.2, 11.3, 11.4, 11.5
- Depends: 3.1, 3.2, 4.1, 4.2, 5.1, 5.2, 6.1, 7.1, 7.2, 8.1
- Build a stub
10. Cleanup
- 10.1 Remove or move the verification harness as appropriate
- If the verification harness from 1.1 is intended as a one-shot check, delete
backend/scripts/verify_report_agent_prompts.pyafter the implementation passes its checks. If it is intended as a permanent regression test, keep it underbackend/scripts/and ensure it is callable viauv run python scripts/verify_report_agent_prompts.pywith no test framework required. - Decision rule: keep the harness only if it costs less than 30 lines and reads as a usable smoke check; otherwise remove it. Sister specs (#2, #3, #4) shipped without permanent harnesses, so the default is "remove."
- Observable completion:
git statusshows onlybackend/app/services/report_agent.pymodified, with no harness artefacts left behind (preferred); or, if kept, the harness lives underbackend/scripts/with a one-line module docstring linking back to speci18n-report-agent-prompts. - Requirements: 12.4
- Depends: 9.1, 9.2, 9.3, 9.4
- If the verification harness from 1.1 is intended as a one-shot check, delete