Translate the system prompt and the individual / group persona prompt
builders in backend/app/services/oasis_profile_generator.py from
Chinese to English. The base prompt language was biasing persona
prose (bio, persona, profession, interested_topics) toward Chinese
even under Accept-Language: en, despite the existing
get_language_instruction() postfix mechanism. Translating the base
prompts removes that bias.

All locale-steering call sites are preserved verbatim (the inline
{get_language_instruction()} in each builder, the system-prompt
assembly), so non-English locales continue to receive Chinese output
of equivalent quality. Locale-independent constraints stay English
inside the prompt: gender stays the literal "male"/"female" enum
for individuals and "other" for groups; age stays an integer (30
for institutional accounts). The two attrs_str / context_str fallback
defaults ("无", "无额外上下文") are translated to "None" /
"No additional context" so they compose with the English body.
The country-language hint country: 国家(使用中文,如"中国"), which asked
for the country name in Chinese, is dropped during translation; locale
now decides the country language
via the postfix.
Out of scope (untouched): logger calls (issue #6, already merged),
docstrings and comments (issue #7), the rule-based fallback
_generate_profile_rule_based, and the resilience helpers
_fix_truncated_json / _try_fix_json. No public API change, no new
dependencies, no edits outside the target file.

Closes #3

The oasis_profile_generator.py system prompt, both user-message
templates (individual + group personas), the context-builder section
labels embedded into the prompt context, the fallback persona templates,
and the per-batch console output banners were all written in Chinese.
Even when Accept-Language was en, the Chinese base prompt and embedded
section labels biased the LLM toward Chinese persona output.

Translate every owned prompt-assembly literal to English while
preserving all functional contracts: f-string interpolations, the
required JSON output keys, the gender/age literal-token rules, the
get_language_instruction() postfix call sites, the _normalize_gender
mapping (which still accepts Chinese gender keys from upstream),
and the rule-based country: "中国" data default. Logger calls,
docstrings, and inline comments are out of scope (issues #6 / #7)
and were not touched.
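
To illustrate why the gender mapping must keep its Chinese keys, here is a
rough sketch of such a normalizer. The function name normalize_gender_sketch,
the exact keys, and the "other" default are assumptions for illustration; the
real _normalize_gender table may differ.

```python
# Hypothetical mapping; the real table in oasis_profile_generator.py may differ.
_GENDER_MAP = {
    "男": "male",      # Chinese keys are kept because upstream data may still use them
    "女": "female",
    "其他": "other",
    "male": "male",
    "female": "female",
    "other": "other",
}


def normalize_gender_sketch(raw):
    """Map an upstream gender value to the literal enum token used in the prompt."""
    if not raw:
        return "other"  # assumed default for missing values
    # Strip whitespace and lowercase; lowercasing leaves Chinese keys unchanged.
    return _GENDER_MAP.get(str(raw).strip().lower(), "other")
```

Translating the prompt text does not change what upstream sources emit, so
dropping the Chinese keys would silently turn valid inputs into the default.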
Closes #25