Implementation Plan
1. Translate the system-prompt base string in `_get_system_prompt`
   - Replace the body of `base_prompt` (currently `"你是社交媒体用户画像生成专家。生成详细、真实的人设用于舆论模拟,最大程度还原已有现实情况。必须返回有效的JSON格式,所有字符串值不能包含未转义的换行符。"`) with an English equivalent that preserves the same intent: define the LLM as an expert social-media-persona generator; require detailed, realistic personas grounded in supplied context; require valid JSON output; forbid unescaped newlines in string values
   - Preserve the trailing `f"{base_prompt}\n\n{get_language_instruction()}"` concatenation site exactly
   - Preserve the `is_individual` parameter (still accepted, still unused; no signature change)
   - Observable completion: `_get_system_prompt(...)` returns an English-only base prompt followed by the locale-appropriate `get_language_instruction()` postfix
   - Requirements: 1.1, 1.2, 1.3, 1.4
2. Translate the individual-persona user-message template in `_build_individual_persona_prompt`
   - Replace the introductory line (`"为实体生成详细的社交媒体用户人设,..."`) with an English equivalent
   - Replace the field-label rows (`实体名称`, `实体类型`, `实体摘要`, `实体属性`, `上下文信息`) with English equivalents
   - Replace the `请生成JSON,包含以下字段:` enumeration block with an English equivalent that preserves the eight required output keys verbatim by name (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`)
   - Translate the per-field guidance: `bio` is a 200-character social-media bio; `persona` is a coherent ~2000-character text containing basic info, background, personality (with MBTI), social-media behavior, stance, distinctive traits, and event-specific memories; `age` must be an integer; `gender` must be the literal English token `"male"` or `"female"`; `mbti` is an MBTI four-letter code; `country` is a free-form country name; `profession` is a free-form occupation; `interested_topics` is a list of topics
   - Replace the trailing `重要:` rules block with an English equivalent: all field values must be strings or numbers, no embedded newlines; persona must be a coherent single text block; `gender` must use English `male`/`female`; content must remain consistent with the entity information; `age` must be a valid integer
   - Preserve the call to `get_language_instruction()` interpolated into the rules block
   - Replace the `attrs_str` no-attributes placeholder `"无"` with `"None"` (or English equivalent) at line 677
   - Replace the `context_str` no-context placeholder `"无额外上下文"` with `"No additional context"` (or English equivalent) at line 678
   - Preserve every f-string interpolation by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`
   - Observable completion: `_build_individual_persona_prompt(...)` produces an English-only message body for any input combination, with zero CJK characters in any string literal it contributes; under the same inputs as before, all interpolated values still appear in the rendered output
   - Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9
3. Translate the group-persona user-message template in `_build_group_persona_prompt`
   - Replace the introductory line (`"为机构/群体实体生成详细的社交媒体账号设定,..."`) with an English equivalent
   - Replace the field-label rows (`实体名称`, `实体类型`, `实体摘要`, `实体属性`, `上下文信息`) with English equivalents (matching task 2)
   - Replace the `请生成JSON,包含以下字段:` enumeration block with an English equivalent that preserves the eight required output keys verbatim by name (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`)
   - Translate the per-field guidance: `bio` is a polished ~200-character official-account bio; `persona` is a coherent ~2000-character text covering institutional background, account positioning, voice, content patterns, official stance, distinctive traits, and event-specific memories; `age` must be the integer literal `30`; `gender` must be the literal English token `"other"`; `mbti` describes account voice; `country` is a free-form country name; `profession` is the institution's role; `interested_topics` is a list of focus areas
   - Replace the trailing `重要:` rules block with an English equivalent: all field values must be strings or numbers (no nulls); persona must be a coherent single text block (no embedded newlines); `gender` must use English `"other"`; `age` must be the integer `30`; the institutional account's voice must match its identity
   - Preserve the call to `get_language_instruction()` interpolated into the rules block
   - Replace the `attrs_str` and `context_str` placeholders the same way as in task 2 (lines 726, 727)
   - Preserve every f-string interpolation by name and position
   - Observable completion: `_build_group_persona_prompt(...)` produces an English-only message body for any input combination, with zero CJK characters; under the same inputs as before, all interpolated values still appear in the rendered output
   - Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9
4. Translate the section labels in `_search_zep_for_entity` and `_build_entity_context`
   - Replace the related-node prefix `f"相关实体: {node.name}"` with an English equivalent (e.g. `f"Related entity: {node.name}"`) at line 384
   - Replace the facts block heading `"事实信息:\n"` with `"Facts:\n"` (or equivalent) at line 390
   - Replace the related-entities block heading `"相关实体:\n"` with `"Related entities:\n"` (or equivalent) at line 392
   - Replace the entity-attributes section heading `"### 实体属性\n"` with `"### Entity attributes\n"` (or equivalent) at line 422
   - Replace the inline edge-direction placeholder `(相关实体)` with `(related entity)` (or equivalent) at lines 438 and 440 (both outgoing and incoming branches)
   - Replace the related-facts/relationships section heading `"### 相关事实和关系\n"` with `"### Related facts and relationships\n"` (or equivalent) at line 443
   - Replace the related-entity-information section heading `"### 关联实体信息\n"` with `"### Related entity information\n"` (or equivalent) at line 463
   - Replace the Zep-retrieved facts section heading `"### Zep检索到的事实信息\n"` with `"### Facts retrieved from the graph\n"` (or equivalent) at line 472
   - Replace the Zep-retrieved related-nodes section heading `"### Zep检索到的相关节点\n"` with `"### Related nodes retrieved from the graph\n"` (or equivalent) at line 475
   - Preserve the structure (heading + bulleted body, joined by `"\n".join(...)`)
   - Observable completion: the context string returned by `_build_entity_context(...)` contains zero CJK characters in section labels for any input
   - Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 4.10
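The "heading + bulleted body" structure that task 4 asks to preserve might be pictured as follows. This is a simplified, hypothetical stand-in, not the real `_build_entity_context`: the parameter names, the subset of sections, and the blank-line separator between sections are all assumptions made for illustration.

```python
from typing import Dict, List


def build_entity_context(
    attributes: Dict[str, str], facts: List[str], related: List[str]
) -> str:
    """Hypothetical, simplified stand-in for _build_entity_context."""
    parts: List[str] = []
    if attributes:
        attr_lines = [f"- {key}: {value}" for key, value in attributes.items()]
        # English heading replacing "### 实体属性\n"
        parts.append("### Entity attributes\n" + "\n".join(attr_lines))
    if facts:
        # English heading replacing "### 相关事实和关系\n"
        fact_lines = [f"- {fact}" for fact in facts]
        parts.append("### Related facts and relationships\n" + "\n".join(fact_lines))
    if related:
        # English heading replacing "### 关联实体信息\n"
        related_lines = [f"- {name}" for name in related]
        parts.append("### Related entity information\n" + "\n".join(related_lines))
    return "\n\n".join(parts)


context = build_entity_context(
    {"role": "journalist"}, ["Alice works for Daily News"], ["Daily News"]
)
print(context)
```

Only the heading literals differ from the Chinese version; the joins and bullet prefixes stay the same, which is what keeps the change purely textual.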
5. Translate the fallback persona templates
   - Replace `f"{entity_name}是一个{entity_type}。"` with `f"{entity_name} is a {entity_type}."` (or equivalent) at line 547 (`_generate_profile_with_llm`, missing-persona branch)
   - Replace the same template at line 644 (`_try_fix_json`, regex-extraction branch)
   - Replace the same template at line 659 (`_try_fix_json`, catastrophic-failure branch)
   - Preserve the `entity_summary or template` priority order at every site
   - Observable completion: when the LLM fails JSON parse and the fallback template is invoked, the resulting `persona` value is English
   - Requirements: 5.1, 5.2, 5.3
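The priority order in task 5 can be shown with a tiny sketch. `fallback_persona` is a hypothetical helper name used only here; in the real file the expression is inlined at each of the three sites.

```python
def fallback_persona(entity_name: str, entity_type: str, entity_summary: str) -> str:
    # The summary wins when it is non-empty; the English template is the last resort.
    return entity_summary or f"{entity_name} is a {entity_type}."


print(fallback_persona("Alice", "Journalist", ""))
print(fallback_persona("Alice", "Journalist", "Alice covers local politics."))
```

Because `or` short-circuits on any truthy summary, the template text only reaches the output when `entity_summary` is empty or `None`.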
6. Translate the console-output formatting in `_print_generated_profile` and the surrounding banners
   - Replace the section headings in `_print_generated_profile`: `f"【简介】"` → English equivalent (e.g. `"[Bio]"`), `f"【详细人设】"` → English equivalent (e.g. `"[Persona]"`), `f"【基本属性】"` → English equivalent (e.g. `"[Basic attributes]"`)
   - Replace the row labels in `_print_generated_profile`: `f"用户名:"` → `f"Username: {profile.user_name}"`, `f"年龄: {profile.age} | 性别: {profile.gender} | MBTI: {profile.mbti}"` → `f"Age: {profile.age} | Gender: {profile.gender} | MBTI: {profile.mbti}"`, `f"职业: {profile.profession} | 国家: {profile.country}"` → `f"Profession: {profile.profession} | Country: {profile.country}"`, `f"兴趣话题: {topics_str}"` → `f"Interested topics: {topics_str}"`
   - Replace the empty-topics sentinel `'无'` with `'None'` (or equivalent) at line 1011
   - Replace the start-of-batch banner in `generate_profiles_from_entities` (currently `f"开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}"` at line 945) with an English equivalent (e.g. `f"Generating agent profiles - {total} entities, parallel: {parallel_count}"`)
   - Replace the end-of-batch banner (currently `f"人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent"` at line 1001) with an English equivalent (e.g. `f"Profile generation complete - produced {len([p for p in profiles if p])} agents"`)
   - Preserve all f-string interpolations
   - Preserve the existing `t('progress.profileGenerated', name=entity_name, type=entity_type)` call (already locale-keyed)
   - Observable completion: the console output stream contains zero CJK characters in literals contributed by `_print_generated_profile` and the two batch banners (the entity name itself may still contain CJK because it is data, not a literal)
   - Requirements: 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7
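To make the target console format concrete, here is a minimal sketch using the English labels suggested in task 6. The `Profile` dataclass is a hypothetical stand-in for `OasisAgentProfile`, the line layout is an assumption, and the function returns a string instead of printing so it is easy to check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    """Hypothetical stand-in for OasisAgentProfile (only the printed fields)."""
    user_name: str
    bio: str
    persona: str
    age: int
    gender: str
    mbti: str
    profession: str
    country: str
    interested_topics: List[str] = field(default_factory=list)


def format_profile(profile: Profile) -> str:
    # 'None' replaces the Chinese empty-topics sentinel.
    topics_str = ", ".join(profile.interested_topics) if profile.interested_topics else "None"
    lines = [
        "[Bio]",
        profile.bio,
        "[Persona]",
        profile.persona,
        "[Basic attributes]",
        f"Username: {profile.user_name}",
        f"Age: {profile.age} | Gender: {profile.gender} | MBTI: {profile.mbti}",
        f"Profession: {profile.profession} | Country: {profile.country}",
        f"Interested topics: {topics_str}",
    ]
    return "\n".join(lines)


demo = Profile("alice_reports", "Local news reporter.", "Alice is a Journalist.",
               34, "female", "INTJ", "Journalist", "Canada")
print(format_profile(demo))
```

Field values such as the user name remain data, so they may still contain CJK even though every label literal is English.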
7. Confirm boundary commitments around the translation
   - Confirm `logger.warning(...)`, `logger.info(...)`, `logger.error(...)`, `logger.debug(...)` calls and their `t("log.profile_generator.*")` keys in this file are unchanged
   - Confirm the module/class/method docstrings and inline comments are unchanged (including lines 65, 93, 641, 804–807, 816–819)
   - Confirm the `_normalize_gender` mapping table (Chinese keys `男`/`女`/`机构`/`其他`) is unchanged
   - Confirm the rule-based `country: "中国"` default at lines 807, 819 is unchanged
   - Confirm the `ValueError("LLM_API_KEY 未配置")` raise at line 194 is unchanged
   - Confirm public signatures (`__init__`, `generate_profile_from_entity`, `generate_profiles_from_entities`, `set_graph_id`, `save_profiles`, `save_profiles_to_json`) and private helper signatures are unchanged
   - Confirm the `OasisAgentProfile` dataclass schema is unchanged
   - Confirm the LLM call (`response_format={"type": "json_object"}`, `temperature=0.7 - (attempt * 0.1)`, no `max_tokens`) is unchanged
   - Confirm `backend/app/utils/locale.py`, `/locales/languages.json`, `/locales/en.json`, `/locales/zh.json` are not modified
   - Confirm `backend/pyproject.toml`, `backend/uv.lock`, and any file outside `backend/app/services/oasis_profile_generator.py` are not modified
   - Observable completion: a `git diff` review against `main` shows changes only inside `backend/app/services/oasis_profile_generator.py`, only inside the seven owned regions
   - Requirements: 7.1, 7.4, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
8. Verify CJK-free invariant in the seven owned regions
   - Run a one-shot script that imports `OasisProfileGenerator`, calls `_build_individual_persona_prompt(...)`, `_build_group_persona_prompt(...)`, `_get_system_prompt(...)`, and `_build_entity_context(...)` with representative inputs that contain no CJK in the inputs themselves, and asserts the rendered output contains zero matches against the regex `[一-鿿]`
   - Manually inspect the seven owned regions in the patched file with a CJK regex (`grep -nP '[\x{4e00}-\x{9fff}]'`) and confirm there are no remaining matches inside the owned regions
   - Observable completion: the inspection passes; if it fails, fix the offending region and re-run before completing this task
   - Requirements: 1.1, 2.8, 3.8, 4.10, 5.3, 6.6
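The assertion half of the one-shot script in task 8 could be sketched as below. To stay self-contained it checks placeholder strings; in the real script the `rendered` dictionary would hold the return values of the four `OasisProfileGenerator` methods. The helper names `find_cjk` and `assert_cjk_free` are hypothetical.

```python
import re
from typing import Dict, List

# Same range as the grep pattern in this task: U+4E00 through U+9FFF.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def find_cjk(text: str) -> List[str]:
    """Return every character in text that falls inside the checked range."""
    return CJK_PATTERN.findall(text)


def assert_cjk_free(rendered: Dict[str, str]) -> None:
    """Raise AssertionError naming each output that still contains CJK."""
    offenders = {}
    for name, text in rendered.items():
        matches = find_cjk(text)
        if matches:
            offenders[name] = matches
    if offenders:
        raise AssertionError(f"CJK characters remain: {offenders}")


# Placeholder strings; the real script would render these from the generator.
rendered = {
    "system_prompt": "You are an expert at generating social media user personas.",
    "entity_context": "### Entity attributes\n- role: journalist",
}
assert_cjk_free(rendered)
print("CJK-free check passed")
```

Note that this range covers CJK ideographs only. Full-width punctuation such as `:`, `。`, or `【】` lies outside U+4E00 to U+9FFF, so leftovers of that kind would need the manual inspection or a wider pattern to be caught.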
9. Verify locale-driven output language under both `en` and `zh`
   - Set the thread-local locale to `en` via `set_locale("en")`, run `OasisProfileGenerator().generate_profile_from_entity(...)` against the configured LLM with a small representative entity, and confirm the returned `bio` and `persona` are in English
   - Set the thread-local locale to `zh` via `set_locale("zh")`, run the same round-trip, and confirm the returned `bio` and `persona` are in Chinese, equivalent in quality to the pre-change baseline
   - Observable completion: both runs succeed; the `en` run is CJK-free in `bio` and `persona`; the `zh` run continues to produce Chinese; results recorded in the PR description
   - Requirements: 7.2, 7.3