Implementation Plan

1. Translate the system-prompt builder to English
   - Replace the Chinese `base_prompt` literal inside `_get_system_prompt` (currently `"你是社交媒体用户画像生成专家。…"` at line ~664) with an English rendering that conveys the same role and intent: it identifies the model as an expert in social-media user-persona generation, asks for detailed and realistic personas suitable for opinion simulation that faithfully reflect existing real-world conditions, mandates valid JSON output, and forbids unescaped newlines inside string values
   - Preserve the assembled return shape `f"{base_prompt}\n\n{get_language_instruction()}"` exactly; the call to `get_language_instruction()` is unchanged in name and position
   - Preserve the method signature `_get_system_prompt(self, is_individual: bool) -> str`; do not branch on `is_individual` (current behaviour preserved)
   - Observable completion: `_get_system_prompt(True)` and `_get_system_prompt(False)` both return non-empty English strings ending with the per-locale postfix from `get_language_instruction()`; the `base_prompt` body contains zero CJK characters
   - Requirements: 1.1, 1.2, 1.3, 1.4
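
The target shape for Task 1 can be sketched as follows. This is a minimal illustration, not the final translation: the English wording of `base_prompt` is an assumption, `SystemPromptSketch` is a hypothetical stand-in for the real class, and `get_language_instruction()` is stubbed here to return the English postfix.

```python
def get_language_instruction() -> str:
    """Hypothetical stand-in for the project's locale helper (English locale assumed)."""
    return "Please respond in English."


class SystemPromptSketch:
    """Stand-in for the real generator class; only the method under discussion is shown."""

    def _get_system_prompt(self, is_individual: bool) -> str:
        # Illustrative English wording only; the final translation may differ.
        base_prompt = (
            "You are an expert in generating social-media user personas. "
            "Generate detailed, realistic personas for opinion simulation that "
            "faithfully reflect existing real-world conditions. "
            "You must return valid JSON, and string values must not contain "
            "unescaped newlines."
        )
        # is_individual is deliberately not branched on, matching current behaviour.
        return f"{base_prompt}\n\n{get_language_instruction()}"
```

Both `_get_system_prompt(True)` and `_get_system_prompt(False)` return the same string in this sketch, which is the behaviour the task asks to preserve.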

2. Translate the individual-persona user-message builder to English
   - Replace the Chinese f-string body inside `_build_individual_persona_prompt` (currently lines ~680–714) with an English rendering structured as: a lead sentence requesting a detailed social-media persona faithful to existing reality; an entity-context block with English labels for `entity_name`, `entity_type`, `entity_summary`, `entity_attributes`; a `Context information:` block; a `Generate JSON with the following fields:` enumeration of the eight output keys (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`); and a trailing `Important:` rules block
   - Translate the field-level descriptions verbatim in spirit: `bio` ≈ 200 chars; `persona` ≈ 2000 chars covering basic info (age, profession, education, location), background (notable experience, event association, social ties), personality (MBTI, core traits, emotional expression), social-media behaviour (posting frequency, content preferences, interaction style, language traits), stance (attitudes toward the topic, emotional triggers), unique features (catchphrases, special experiences, hobbies), and personal memory (the entity's relation to the event and prior actions/reactions); `age` integer; `gender` MUST be the literal `"male"` or `"female"`; `mbti` four-letter type; `country` country name; `profession`; `interested_topics` array
   - Translate the trailing rules block to English while keeping every locale-independent constraint intact: all values are strings or numbers; `persona` is a single coherent text without unescaped newlines; the inline `{get_language_instruction()}` call remains followed by the parenthetical reminder that `gender` MUST use the English values `"male"`/`"female"`; content stays consistent with the entity; `age` MUST be a valid integer
   - Replace the `attrs_str` and `context_str` Chinese fallback defaults with English: `"无"` → `"None"` (used when `entity_attributes` is empty/falsy) and `"无额外上下文"` → `"No additional context"` (used when `context` is empty/falsy)
   - Drop the country-language hint `(使用中文,如"中国")` so `get_language_instruction()` steers the country language; preserve the country line as a neutral `country: country name` entry
   - Preserve every f-string interpolation by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`
   - Preserve the `context[:3000]` truncation behaviour and the method signature `_build_individual_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`
   - Observable completion: calling `_build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")` returns a non-empty English string with all six interpolations resolved, with zero CJK characters in any literal contributed by this method, and the string contains the `gender` enum lock-in `"male"`/`"female"` exactly once
   - Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 4.1, 4.5
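
The fallback defaults, the `context[:3000]` truncation, and the six interpolations from Task 2 can be sketched as below. Several things here are assumptions rather than the project's code: the function is written as a free function instead of a method, `attrs_str` is assumed to be built with `json.dumps`, the English wording of every label and rule is illustrative, and `get_language_instruction()` is a stub.

```python
import json
from typing import Any, Dict


def get_language_instruction() -> str:
    """Hypothetical stand-in for the project's locale helper (English locale assumed)."""
    return "Please respond in English."


def build_individual_persona_prompt(
    entity_name: str,
    entity_type: str,
    entity_summary: str,
    entity_attributes: Dict[str, Any],
    context: str,
) -> str:
    # English fallbacks replace the former Chinese defaults.
    # Assumption: attributes are serialised with json.dumps in the real method.
    attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else "None"
    # Truncate long context to 3000 characters, as the existing method does.
    context_str = context[:3000] if context else "No additional context"

    # Illustrative wording; persona sub-topics are abbreviated for brevity.
    return f"""Generate a detailed social-media user persona for the entity below, staying faithful to existing reality.

Entity name: {entity_name}
Entity type: {entity_type}
Entity summary: {entity_summary}
Entity attributes: {attrs_str}

Context information:
{context_str}

Generate JSON with the following fields:
1. bio: social-media bio, about 200 characters
2. persona: detailed persona description, about 2000 characters of plain text
3. age: integer
4. gender: MUST be the English literal "male" or "female"
5. mbti: four-letter MBTI type
6. country: country name
7. profession: profession
8. interested_topics: array of topics of interest

Important:
- All field values must be strings or numbers
- persona must be a single coherent text without unescaped newlines
- {get_language_instruction()} (the gender field MUST use the English values "male"/"female")
- Content must stay consistent with the entity information
- age MUST be a valid integer
"""
```

Calling `build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")` on this sketch yields a prompt in which the `"male"`/`"female"` lock-in appears once, in the rules block.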

3. Translate the group/institution-persona user-message builder to English
   - Replace the Chinese f-string body inside `_build_group_persona_prompt` (currently lines ~729–762) with an English rendering structured the same way as Task 2 but adapted for institutional voice: lead sentence requesting a detailed social-media account profile for an institution/group faithful to existing reality; entity-context block; `Context information:` block; `Generate JSON with the following fields:` enumeration of the eight output keys; trailing `Important:` rules block
   - Translate the field-level descriptions verbatim in spirit: `bio` ≈ 200 chars in an official-account voice; `persona` ≈ 2000 chars covering institutional basics (formal name, type, founding background, primary functions), account positioning (account type, target audience, core function), voice (language traits, common phrasing, taboo topics), publishing pattern (content types, publishing frequency, active hours), stance (official position on the core topic, controversy-handling style), special notes (group portrait represented, operational habits), and institutional memory (the institution's relation to the event and prior actions/reactions); `age` MUST be the integer `30`; `gender` MUST be the literal `"other"`; `mbti` four-letter type characterizing account voice; `country`; `profession` describes institutional function; `interested_topics` array
   - Translate the trailing rules block to English while keeping every locale-independent constraint intact: all values are strings or numbers, no `null` allowed; `persona` is a single coherent text without unescaped newlines; the inline `{get_language_instruction()}` call remains followed by the parenthetical reminder that `gender` MUST use the English value `"other"`; `age` MUST be the integer `30` and `gender` MUST be the string `"other"`; account voice must match identity positioning
   - Replace the `attrs_str` and `context_str` Chinese fallback defaults with the same English replacements applied in Task 2 (`"None"` and `"No additional context"`)
   - Drop the country-language hint as in Task 2
   - Preserve every f-string interpolation by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`
   - Preserve the `context[:3000]` truncation behaviour and the method signature `_build_group_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`
   - Observable completion: calling `_build_group_persona_prompt("ACME Corp", "Organization", "summary", {"k": "v"}, "ctx")` returns a non-empty English string with all six interpolations resolved, with zero CJK characters in any literal contributed by this method, and the string contains both the `age == 30` lock-in and the `gender == "other"` lock-in
   - Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.1, 4.5

4. Confirm boundary commitments around the translation
   - Confirm every existing `get_language_instruction()` call site is preserved verbatim: the system-prompt assembly inside `_get_system_prompt`, the inline call inside the trailing rules block of `_build_individual_persona_prompt`, and the inline call inside the trailing rules block of `_build_group_persona_prompt`
   - Confirm the locale-thread plumbing in `generate_profiles_for_entities` (capture `current_locale = get_locale()` at line ~910 and `set_locale(current_locale)` inside the worker at line ~914) is byte-identical
   - Confirm the public signatures of `OasisProfileGenerator.__init__`, `generate_profile_from_entity`, `generate_profiles_for_entities`, `set_graph_id`, and the private helpers `_call_llm_with_retry`, `_generate_profile_rule_based`, `_print_generated_profile`, `_fix_truncated_json`, `_try_fix_json`, `_save_twitter_csv`, `_save_reddit_json`, `_generate_username` are unchanged
   - Confirm the `OasisAgentProfile` dataclass field set, default values, and the `to_reddit_format`, `to_twitter_format`, `to_full_dict` serializers are unchanged
   - Confirm class constants `MBTI_TYPES`, `COUNTRIES`, `INDIVIDUAL_ENTITY_TYPES`, `GROUP_ENTITY_TYPES` are unchanged
   - Confirm the LLM invocation parameters at the call site that consumes the translated prompts (`response_format={"type": "json_object"}`, `temperature=0.7 - (attempt * 0.1)`, `max_attempts=3`) are unchanged
   - Confirm `_fix_truncated_json` and `_try_fix_json` (including their Chinese persona fragments such as `f"{entity_name}是一个{entity_type}。"`) are not modified; these are runtime data fallbacks, not prompts, and are out of scope
   - Confirm `_generate_profile_rule_based` is not modified, including its Chinese country defaults `"中国"` at lines ~807 and ~819
   - Confirm `backend/app/utils/locale.py`, `/locales/languages.json`, `/locales/en.json`, and `/locales/zh.json` are not modified
   - Confirm `logger.warning(...)`, `logger.info(...)`, `logger.error(...)`, the print banner at line ~945, module / class / method docstrings, and inline comments in `oasis_profile_generator.py` are not modified (owned by issues #6 and #7)
   - Confirm `backend/scripts/test_profile_format.py`, `backend/pyproject.toml`, `backend/uv.lock`, and any file outside `backend/app/services/oasis_profile_generator.py` are not modified
   - Observable completion: a `git diff` review against `main` shows changes only inside `backend/app/services/oasis_profile_generator.py`, and only inside `_get_system_prompt`, `_build_individual_persona_prompt`, and `_build_group_persona_prompt`; the surrounding lines (method headers, neighbouring methods) are byte-identical
   - Requirements: 1.4, 2.6, 3.6, 4.1, 4.2, 4.6, 5.1, 5.2, 5.3, 5.4, 6.1, 6.3, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6
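
As a complement to the manual `git diff` review in Task 4, one possible helper compares two versions of the module at the syntax-tree level and reports which functions differ. This is a sketch under stated assumptions: `changed_functions` and `ALLOWED` are hypothetical names, functions are keyed by bare name (so same-named methods in different classes would collide), and `ast.dump` ignores comments and formatting, so it does not prove byte-identity.

```python
import ast
from typing import Dict, Set

# The only functions the translation is allowed to touch.
ALLOWED = {
    "_get_system_prompt",
    "_build_individual_persona_prompt",
    "_build_group_persona_prompt",
}


def function_dumps(source: str) -> Dict[str, str]:
    """Map each function name in the source to a structural dump of its AST."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # line numbers are excluded by default
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def changed_functions(old_source: str, new_source: str) -> Set[str]:
    """Return names of functions that were added, removed, or structurally changed."""
    before = function_dumps(old_source)
    after = function_dumps(new_source)
    return {
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    }
```

A review script could read the old and new file contents (for example from `git show main:<path>` and the working tree) and check that `changed_functions(old, new) <= ALLOWED`.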

5. Verify smoke import and OASIS profile-format pytest
   - Run `cd backend && uv run python -c "from app.services.oasis_profile_generator import OasisProfileGenerator, OasisAgentProfile"` and confirm it exits 0 (catches f-string syntax errors)
   - Run `cd backend && uv run python -m pytest scripts/test_profile_format.py` (or equivalent invocation per project convention) and confirm it passes; the test does not exercise prompts, so a pure-translation diff must keep it green
   - Construct an instance of `OasisProfileGenerator` (using `OasisProfileGenerator.__new__(OasisProfileGenerator)` to skip `__init__` if the LLM key is unavailable, mirroring the pattern in `test_profile_format.py`) and confirm `_get_system_prompt(True)`, `_build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")`, and `_build_group_persona_prompt("ACME", "Organization", "summary", {"k": "v"}, "ctx")` each return a string whose prompt body has zero CJK matches against the regex `[一-鿿]`
   - Observable completion: smoke import exits 0; pytest passes with zero regressions; the three prompt-builder calls each produce English-only output under the default `zh` locale (the `get_language_instruction()` postfix is the only place where Chinese is allowed to appear, and only when the locale is `zh`)
   - Requirements: 6.4, 7.1, 7.2, 7.3, 7.4
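
The `__new__` pattern and the CJK check from Task 5 can be sketched as below. `GeneratorStub` is a hypothetical stand-in for `OasisProfileGenerator` whose `__init__` fails the way a missing LLM key might; the regex is the same range as `[一-鿿]`, written with escapes.

```python
import re
from typing import List

# Same range as the project-level CJK guard: U+4E00 through U+9FFF.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def cjk_matches(text: str) -> List[str]:
    """Return every CJK character found in the text."""
    return CJK_RE.findall(text)


class GeneratorStub:
    """Hypothetical stand-in for the real generator class."""

    def __init__(self) -> None:
        # Simulates construction failing when no LLM key is configured.
        raise RuntimeError("LLM API key is not configured")

    def _get_system_prompt(self, is_individual: bool) -> str:
        return "You are an expert in generating social-media user personas."


# __new__ allocates the instance without running __init__,
# so the prompt builders can be exercised without an LLM key.
generator = GeneratorStub.__new__(GeneratorStub)
prompt = generator._get_system_prompt(True)
print(cjk_matches(prompt))
```

Note that the range covers CJK ideographs only; full-width punctuation such as `。` (U+3002) falls outside it.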

6. Verify locale-driven output language under both `en` and `zh`
   - With the thread-local locale forced via `set_locale("en")`, render each of the three builders against representative inputs and confirm: each output contains zero CJK characters; each ends with the English locale postfix `"Please respond in English."`; the `gender` enum constraint appears as English `"male"`/`"female"` (individual) or `"other"` (group)
   - With `set_locale("zh")`, render the same three builders and confirm: the per-prompt body remains English-only (the translated base prompt does not depend on locale); each ends with the Chinese locale postfix `"请使用中文回答。"`; the `gender` enum constraint still appears as the English literal values
   - Optionally, with a configured LLM key, run `OasisProfileGenerator().generate_profile_from_entity(...)` end-to-end under each locale against a synthetic `EntityNode` and spot-check that the produced `bio`, `persona`, `profession` are English under `en` and Chinese under `zh`, while `gender` is one of the three English enum literals under both
   - Observable completion: the locale-`en` rendering is CJK-free in the prompt body and ends with the English locale postfix; the locale-`zh` rendering preserves the prompt body in English and ends with the Chinese locale postfix; if the LLM round-trip is exercised, results are recorded in the PR description
   - Requirements: 4.3, 4.4, 4.5
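
The locale check in Task 6 can be sketched with a thread-local stub. Everything below is a stand-in written for illustration: the real `set_locale`, `get_locale`, and `get_language_instruction` live in the project's locale module and may behave differently; only the two postfix strings and the default `zh` locale are taken from this plan.

```python
import re
import threading
from typing import Tuple

CJK_RE = re.compile(r"[\u4e00-\u9fff]")

# Stand-ins for the project's locale helpers (assumed behaviour).
_state = threading.local()
_POSTFIX = {"en": "Please respond in English.", "zh": "请使用中文回答。"}


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale() -> str:
    return getattr(_state, "locale", "zh")  # default locale is zh


def get_language_instruction() -> str:
    return _POSTFIX[get_locale()]


def render_system_prompt() -> str:
    """Illustrative English body followed by the per-locale postfix."""
    base_prompt = "You are an expert in generating social-media user personas."
    return f"{base_prompt}\n\n{get_language_instruction()}"


def split_body_and_postfix(prompt: str) -> Tuple[str, str]:
    """Separate the locale postfix from the prompt body so each can be audited."""
    postfix = get_language_instruction()
    if not prompt.endswith(postfix):
        raise ValueError("prompt does not end with the locale postfix")
    return prompt[: -len(postfix)], postfix


def has_cjk(text: str) -> bool:
    return CJK_RE.search(text) is not None
```

Under either locale the body returned by `split_body_and_postfix` should be CJK-free, while the postfix switches language with the locale.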

7. Final CJK regression sweep on the three builders
   - Run a regex audit limited to the three method bodies (`_get_system_prompt`, `_build_individual_persona_prompt`, `_build_group_persona_prompt`) using the project-level CJK guard regex (`[一-鿿]`) and confirm zero matches inside their string literals
   - Run a CJK audit on the rendered output of the three builders for representative inputs and confirm zero matches in the prompt body (the locale postfix is excluded; its Chinese form is deliberately kept under `zh`)
   - Confirm the file-level `git grep -nE '[\\x{4e00}-\\x{9fff}]' -- backend/app/services/oasis_profile_generator.py` output still flags only known out-of-scope locations (docstrings, comments, logger keys, rule-based fallback country `"中国"` defaults, and resilience-helper Chinese fragments) and does not flag any line inside the three translated method bodies
   - Observable completion: the targeted regex audit returns zero matches inside the three method bodies; the file-level audit's residual CJK lines all fall outside the three method bodies and match the out-of-scope inventory in `design.md` § Boundary Commitments → Out of Boundary
   - Requirements: 1.1, 2.8, 3.8, 8.1, 8.2, 8.3
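
The targeted audit in Task 7 could be automated along these lines. This is a sketch, not the project's tooling: `cjk_literals_in_targets` is a hypothetical helper that walks the syntax tree, visits only the three named methods, skips their docstrings (which are out of scope), and reports string literals, including the literal parts of f-strings, that contain CJK characters.

```python
import ast
import re
from typing import List, Optional, Tuple

CJK_RE = re.compile(r"[\u4e00-\u9fff]")

TARGETS = {
    "_get_system_prompt",
    "_build_individual_persona_prompt",
    "_build_group_persona_prompt",
}


def _docstring_node(func: ast.FunctionDef) -> Optional[ast.Constant]:
    """Return the Constant node holding the function's docstring, if any."""
    first = func.body[0] if func.body else None
    if (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    ):
        return first.value
    return None


def cjk_literals_in_targets(source: str) -> List[Tuple[str, int, str]]:
    """List (method name, line number, literal) for CJK string literals in the targets."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.FunctionDef) and node.name in TARGETS):
            continue
        docstring = _docstring_node(node)
        for child in ast.walk(node):
            if child is docstring:
                continue  # docstrings are out of scope for this audit
            if (
                isinstance(child, ast.Constant)
                and isinstance(child.value, str)
                and CJK_RE.search(child.value)
            ):
                hits.append((node.name, child.lineno, child.value))
    return hits
```

An empty result for the real module source would correspond to the "zero matches inside the three method bodies" completion criterion; comments are not visible to `ast`, so they are ignored automatically.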