MicroFish/.kiro/specs/i18n-report-agent-prompts/gap-analysis.md

16 KiB
Raw Blame History

Gap Analysis — i18n-report-agent-prompts

1. Current-State Investigation

Domain assets

  • Target file: backend/app/services/report_agent.py (2572 lines).
  • Tool description constants (LLM-facing, injected via _get_tools_description):
    • TOOL_DESC_INSIGHT_FORGE (lines 476492)
    • TOOL_DESC_PANORAMA_SEARCH (lines 494509)
    • TOOL_DESC_QUICK_SEARCH (lines 511521)
    • TOOL_DESC_INTERVIEW_AGENTS (lines 523548)
  • PLAN-phase prompts (plan_outline, line ~1137):
    • PLAN_SYSTEM_PROMPT (lines 552589) — system_prompt = f"{PLAN_SYSTEM_PROMPT}\n\n{get_language_instruction()}" at line 1166.
    • PLAN_USER_PROMPT_TEMPLATE (lines 591611).
  • EXEC-phase prompts (_generate_section_react, line ~1221):
    • SECTION_SYSTEM_PROMPT_TEMPLATE (lines 615767) — appended postfix at line 1262.
    • SECTION_USER_PROMPT_TEMPLATE (lines 769792).
  • ReACT loop conversation templates (consumed inside _generate_section_react):
    • REACT_OBSERVATION_TEMPLATE (796806)
    • REACT_INSUFFICIENT_TOOLS_MSG (808811)
    • REACT_INSUFFICIENT_TOOLS_MSG_ALT (813816)
    • REACT_TOOL_LIMIT_MSG (818821)
    • REACT_UNUSED_TOOLS_HINT (823)
    • REACT_FORCE_FINAL_MSG (825)
  • CHAT-phase prompts (chat, line ~1766):
    • CHAT_SYSTEM_PROMPT_TEMPLATE (829855) — appended postfix at line 1808.
    • CHAT_OBSERVATION_SUFFIX (857).
  • Inline LLM-visible Chinese strings (sent into messages):
    • _define_tools parameter-description dict values (925952).
    • _get_tools_description leader "可用工具:" (1129).
    • _execute_tool error returns f"未知工具: {tool_name}..." (1058) and f"工具执行失败: {str(e)}" (1062).
    • _generate_section_react: report_context = f"章节标题: ...\n模拟需求: ..." (1294); empty-response messages "(响应为空)" / "请继续生成内容。" (13161317); conflict-handling block (13421346); inline unused_hint literals at 1380 and 1476.
    • chat: report-truncated marker "\n\n... [报告内容已截断] ..." (1799); no-report fallback "(暂无报告)" (1805); observation joiner f"[{r['tool']}结果]\n{r['result']}" (1861).
  • Default / fallback outline content in plan_outline():
    • Success-path default title "模拟分析报告" (1197).
    • Exception-path fallback ReportOutline title "未来预测报告", summary "基于模拟预测的未来趋势与风险分析", three section titles (12121218).
  • Locale resolution: backend/app/utils/locale.py get_locale/get_language_instruction/t resolves locale from Accept-Language header (or thread-local in background threads). languages.json registers zh, en, es, fr, pt, ru, de.

Counts (verified, in-scope only)

Region Approx Chinese chars
TOOL_DESC_INSIGHT_FORGE 110
TOOL_DESC_PANORAMA_SEARCH 95
TOOL_DESC_QUICK_SEARCH 50
TOOL_DESC_INTERVIEW_AGENTS 215
PLAN_SYSTEM_PROMPT 250
PLAN_USER_PROMPT_TEMPLATE 130
SECTION_SYSTEM_PROMPT_TEMPLATE 950
SECTION_USER_PROMPT_TEMPLATE 150
REACT_* templates 130
CHAT_SYSTEM_PROMPT_TEMPLATE + CHAT_OBSERVATION_SUFFIX 130
_define_tools parameter dict values 110
_execute_tool error returns 30
_generate_section_react inline messages 230
chat inline messages 60
plan_outline defaults 50
In-scope total ~2680

The ticket's "~609 Chinese characters" undercounts — it apparently only counted the three system-prompt blocks. The full LLM-message-stream Chinese surface is ~4× that. Logger calls (~17), docstrings, and module/class/method/inline # comments are out of scope (covered by #6 / #7).

Conventions (extracted)

  • Sister specs i18n-ontology-generator-prompts (commit 0806832, issue #2), i18n-oasis-profile-generator-prompts (commit 9d1d29b, issue #3), and i18n-simulation-config-generator-prompts (commit 6c2a412, issue #4) established the pattern: in-place translation of all LLM-facing string literals in a single file; preserve get_language_instruction() call sites; preserve all interpolations; do not touch logger, docstrings, comments, or other files.
  • 4-space indent, snake_case, double quotes for strings, f"""...""" for multi-line prompts. Existing Chinese-then-English mix is acceptable in comments/docstrings (steering tech.md: "preserve both; do not translate one into the other unless asked").
  • No linter/formatter — match surrounding style.
  • File mixes top-level constant prompts (e.g. PLAN_SYSTEM_PROMPT) with inline f-strings and .format() templates inside method bodies. Translation must respect both placement conventions.

Integration surfaces

  • ReportAgent.generate_report(...) is called from the report API blueprint (api/report.py). The returned Report.to_dict() payload is consumed by the frontend report panel; field shapes and types must remain unchanged.
  • ReportAgent.chat(...) is called from the chat endpoint; the returned {"response", "tool_calls", "sources"} shape is consumed by the frontend chat UI.
  • The four primary tools (insight_forge, panorama_search, quick_search, interview_agents) are dispatched in _execute_tool to self.zep_tools.* — those callees are unchanged.
  • _parse_tool_calls matches the literal <tool_call>...</tool_call> XML tag and a fallback bare-JSON form via regex. Translation must preserve those literals byte-for-byte.
  • chat() strips <tool_call> blocks from the user-visible response via re.sub(r'<tool_call>.*?</tool_call>', '', ...) (lines 1838, 1874). Translation does not affect this.
  • _clean_section_content and _post_process_report post-process generated section content under the assumption that the LLM does not emit Markdown headings (#, ##, ###, etc.) inside section bodies. The translated SECTION_SYSTEM_PROMPT_TEMPLATE must continue to forbid headings.
  • Locale-switching contract: when locale = zh, get_language_instruction() returns 请使用中文回答。; when en, Please respond in English. — verified.

2. Requirement-to-Asset Map

Requirement Existing asset Gap Tag
R1 — PLAN prompts EN PLAN_SYSTEM_PROMPT (552), PLAN_USER_PROMPT_TEMPLATE (591) Translate text; preserve JSON schema (title, summary, sections[] w/ title, description); preserve 25 section count; preserve all interpolations Missing (translation)
R2 — EXEC prompts EN SECTION_SYSTEM_PROMPT_TEMPLATE (615), SECTION_USER_PROMPT_TEMPLATE (769) Translate text; preserve Final Answer: / <tool_call> literals; preserve no-headings instruction; preserve language-consistency rule; preserve interpolation tokens Missing (translation)
R3 — CHAT prompts EN CHAT_SYSTEM_PROMPT_TEMPLATE (829), CHAT_OBSERVATION_SUFFIX (857) Translate text; preserve <tool_call> literal; preserve MAX_TOOL_CALLS_PER_CHAT semantics Missing (translation)
R4 — ReACT loop templates EN REACT_OBSERVATION_TEMPLATE and 5 message constants (796825) Translate text; preserve Final Answer: literal; preserve emoji/box-drawing visuals; reconcile "、".join(...) separator Missing (translation)
R5 — Tool-description constants EN 4 TOOL_DESC_* blocks (476548); _define_tools parameter dict (925952); _get_tools_description leader (1129) Translate text; preserve tool-name literals; preserve parameter dict keys; preserve OASIS-running warning Missing (translation)
R6 — Inline LLM-visible strings EN 7 inline strings across _generate_section_react and chat (1294, 13161317, 13421346, 1380, 1476, 1799, 1805, 1861) Translate text; preserve {section.title}, {self.simulation_requirement}, {r['tool']}, {r['result']}, {', '.join(unused_tools)} interpolations Missing (translation)
R7 — _execute_tool error returns EN 2 f-strings (1058, 1062) Translate text; preserve {tool_name} and {str(e)} interpolations; remain locale-agnostic Missing (translation)
R8 — plan_outline defaults EN 1 success-path default (1197), 5 exception-path strings (12121218) Translate text; remain locale-agnostic; preserve ReportOutline shape (3 sections) Missing (translation)
R9 — Locale switching preserved get_language_instruction() calls at 1166, 1262, 1808 None — keep call sites untouched Constraint
R10 — Public API stable Class/method/dataclass surface None — text-only changes Constraint
R11 — End-to-end parity API blueprint, frontend report panel, OASIS interview API Verification only — Report.to_dict() shape unchanged Constraint
R12 — Out-of-scope guardrails logger calls (~17 in this file), docstrings, comments, all other files None — leave untouched Constraint

Unknown / Research-needed

  • R11 verification feasibility: Running an end-to-end report generation flow under Accept-Language: en and Accept-Language: zh requires Neo4j, an LLM key, a populated graph, and a running OASIS simulation (for interview_agents). In a sandboxed CI-like environment, this is not practical. Defer to a lightweight fixture-based check, matching the precedent set by issues #2/#3/#4: (a) python -m py_compile lint pass on report_agent.py; (b) zero-Chinese assertion on the in-scope string set via a script that imports the module and inspects each constant + a regex sweep over the module source; (c) shape parity by constructing a mock ReportAgent and confirming _get_tools_description(), system_prompt, and user_prompt render to the expected interpolation set without raising. Decision (autonomous run): adopt option (c) — reviewer-trust is the precedent for issues #2/#3/#4 and the scope here is identical (single-file translation).
  • logger.debug(f"LLM响应: {response[:200]}...") at line 1322: This is the one raw-Chinese logger call in this file (all others use t('...')). It is OUT OF SCOPE for issue #5 — it falls under issue #6 (logger translation). Note for the reviewer: this leaves one Chinese f-string in report_agent.py after this PR; the acceptance criterion in the ticket explicitly carves out logger lines.
  • SECTION_SYSTEM_PROMPT_TEMPLATE includes a "正确示例" / "错误示例" code block (lines 678703) with embedded Chinese sample text (微博, 抖音, 校方 etc.). These are example illustrations of the formatting contract, not data. Translating them to English is required (R2 acceptance criterion 1: "zero Chinese characters"). The translated examples should still illustrate the same format rule (use **bold** not ##, use > for block quotes, no headings).

3. Implementation Approach Options

What: Edit every Chinese string-literal in backend/app/services/report_agent.py directly, in place. No new files.

Trade-offs:

  • Matches the precedent set by commits 0806832 (issue #2), 9d1d29b (issue #3), and 6c2a412 (issue #4) — same file scope, same approach. Reviewer pattern recognition is the lowest possible.
  • Smallest possible diff at the file system level (1 file).
  • No new abstractions, no new files, no dependency churn.
  • Translations are baked in — switching to es/fr/pt/ru/de still relies on the get_language_instruction() postfix to bias the model. (This is also true under the current Chinese-base baseline; not a regression.)
  • The diff is non-trivial (~2680 chars to retranslate plus structural rewriting of the section system prompt). Reviewer must read the prompts side-by-side; line counts shift.

Option B — Externalize prompts to /locales/

What: Move all prompt content to locales/en.json / locales/zh.json and look them up via t('prompts.report.plan.system') etc.

Trade-offs:

  • Genuinely locale-agnostic prompts; could deliver native-quality Spanish, French, etc. with future translation work.
  • Separates content from code, easing future prompt edits without code review.
  • Diverges from the established pattern of issues #2/#3/#4 — those translated in place. Adopting a new pattern for the same kind of work re-opens architectural design questions and inflates this PR's blast radius.
  • Touches backend/app/utils/locale.py (or its caller surface) and /locales/, which the spec's R12 and the ticket's "Out of scope" boundary explicitly forbid.
  • Increases JSON-escape-hell risk for the section system prompt's literal {{ and }} braces and triple-quote contents.

Option C — Hybrid (top-level constants stay externalized; inline strings stay in code)

What: Externalize only the seven top-level prompt constants (PLAN_*, SECTION_*, CHAT_*, TOOL_DESC_*) to /locales/; translate inline f-strings in code in place.

Trade-offs:

  • Captures the largest blocks (highest character count) in a localizable way.
  • Two-tier inconsistency: some prompt content in /locales/, some in code. Future maintainers must trace both.
  • Same R12 violation as Option B (touches /locales/).
  • No precedent in the four sibling i18n efforts already in flight.

4. Implementation Complexity & Risk

  • Effort: M (35 days for one focused engineer). Larger than the 247-char ticket estimate suggested, but smaller than a typical M because the work is mechanical translation with strict guardrails. Most of the work is high-quality English rewriting of the section system prompt (~950 Chinese chars, the largest block in the file), getting reviewer-acceptable phrasing for the "上帝视角" / "未来预演" framing, and verifying that the no-headings instruction stays semantically equivalent.
  • Risk: Low. Familiar tech, established sibling-spec precedent, clear guardrails (R9R12), single file, no new dependencies, no API changes. The only non-trivial risk is a regression in zh quality if a translated prompt drops a structural cue the Chinese version was carrying — mitigated by preserving every interpolation, the JSON schema, the format-contract instructions, and the get_language_instruction() postfix.

5. Recommendations for Design Phase

  • Preferred approach: Option A — in-place translation in backend/app/services/report_agent.py. Rationale: matches the four sibling i18n PRs, smallest blast radius, respects R12.
  • Key decisions to lock in design:
    1. Translation of the Chinese examples inside SECTION_SYSTEM_PROMPT_TEMPLATE (lines 678703): replace with semantically equivalent English illustrations of the same formatting contract (use **bold**, > block quotes, no headings).
    2. Treatment of the "、".join(unused_tools) separator at line 1454 → switch to ", ".join(...) for natural English rendering, since the join result is interpolated into now-English REACT_OBSERVATION_TEMPLATE and REACT_UNUSED_TOOLS_HINT.
    3. Standard English phrasing for the recurring framing terms: 上帝视角 → "all-seeing observer", 未来预演 → "future rehearsal" (or "forecast simulation"), 模拟需求 → "simulation prompt" / "scenario brief", 上下文 → "context". Pick once, use everywhere.
    4. Handling of the _get_tools_description leader (1129): English equivalent "Available tools:" (verified by precedent in _build_context translation in #4).
    5. Treatment of the conflict-handling message (lines 13421346): keep the same two-mode contract, but rephrase in English while preserving the literal <tool_call> tag and 'Final Answer:' mentions.
  • Research items to carry forward:
    1. Confirm that the Final Answer: literal is matched case-sensitively in _generate_section_react (it is — line 1327: "Final Answer:" in response). Translation must keep it byte-for-byte.
    2. Confirm that no tooling outside this file consumes the Chinese fallback outline strings as keys (e.g. translation tables, frontend lookups). Quick grep confirms none do — the strings flow into Report.title / ReportOutline.title only.
    3. Verify after translation that python -m py_compile backend/app/services/report_agent.py passes and that the file's net Chinese-character count drops to the 17 logger lines + docstrings + comments scope (i.e. zero Chinese in any string literal that is sent into an LLM messages array).