Research & Design Decisions — i18n-oasis-profile-generator-prompts
Summary
- Feature: `i18n-oasis-profile-generator-prompts`
- Discovery Scope: Extension (single-file translation in an existing brownfield service; sibling pattern already merged in #2, #4, #5)
- Key Findings:
  - The existing `get_language_instruction()` postfix mechanism (defined in `backend/app/utils/locale.py`) is the project-canonical way to steer LLM output language. Translating the base prompt does not interfere with it, and it is the same approach taken in the already-merged sibling specs.
  - The only Chinese surfaces inside the prompt-rendering path are `_get_system_prompt`, `_build_individual_persona_prompt`, `_build_group_persona_prompt`, and the four occurrences of the `attrs_str`/`context_str` fallback literals `"无"` ("none") and `"无额外上下文"` ("no additional context"). All other Chinese in the file is logger keys (already handled by #6), docstrings/comments (out of scope, #7), or rule-based fallback data (out of scope).
  - `backend/scripts/test_profile_format.py` does not exercise prompts; it only constructs `OasisAgentProfile` and round-trips it through `_save_twitter_csv`/`_save_reddit_json`. A pure-translation diff cannot break it.
Research Log
Locale steering mechanism
- Context: Confirm that translating the base prompt does not regress Chinese output under `Accept-Language: zh`.
- Sources Consulted:
  - `backend/app/utils/locale.py` (lines 50–96).
  - `locales/languages.json` (entries for `en` and `zh` with the `llmInstruction` field).
  - Sibling spec `i18n-ontology-generator-prompts/design.md` and the merged commits it references.
- Findings:
  - `get_language_instruction()` returns `Please respond in English.` for locale `en` and `请使用中文回答。` ("Please answer in Chinese.") for locale `zh`.
  - The function is called as an inline f-string interpolation in the individual-persona and group-persona prompt bodies, and is explicitly appended in `_get_system_prompt`. All three call sites must be preserved byte-for-byte.
  - The thread-local locale is captured in `generate_profiles_for_entities` (line ~910) and restored inside the worker via `set_locale(current_locale)` (line ~914). This plumbing is untouched by the change.
- Implications:
  - Design lock-in: the `get_language_instruction()` call must remain in each of the three builders (inline `{get_language_instruction()}` in the two persona builders, appended in `_get_system_prompt`). Removing or renaming it would silently regress non-English locales.
  - The Chinese hint `country: 国家(使用中文,如"中国")` (roughly "country: country (use Chinese, e.g. 'China')") in the original prompt overrides the locale postfix and forces Chinese output for that one field. The English translation drops the hint so the locale postfix decides the country language. The rule-based fallback (out of scope) has its own Chinese defaults and is not affected.
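To make the capture-and-restore plumbing concrete, here is a minimal sketch of how a thread-local locale plus a per-locale instruction table behaves when prompts are built on pool threads. Everything in it is a hypothetical stand-in: `LLM_INSTRUCTIONS`, `build_persona_prompt`, and `generate_profiles` are illustrative names, and the block mirrors only the behaviour of `get_language_instruction()` and `set_locale(...)` described above, not the actual code in `backend/app/utils/locale.py`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mirror of the llmInstruction entries in locales/languages.json.
LLM_INSTRUCTIONS = {
    "en": "Please respond in English.",
    "zh": "请使用中文回答。",
}

_state = threading.local()


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale() -> str:
    # Assumption: a thread that never called set_locale() defaults to English.
    return getattr(_state, "locale", "en")


def get_language_instruction() -> str:
    return LLM_INSTRUCTIONS.get(get_locale(), LLM_INSTRUCTIONS["en"])


def build_persona_prompt(entity_name: str) -> str:
    # The postfix is evaluated each time the builder runs, in the calling thread.
    return (
        f"Generate a social-media persona for: {entity_name}\n\n"
        f"{get_language_instruction()}"
    )


def generate_profiles(entity_names: list[str]) -> list[str]:
    current_locale = get_locale()  # capture in the request thread

    def worker(name: str) -> str:
        # Pool threads do not inherit thread-locals, so restore explicitly.
        set_locale(current_locale)
        return build_persona_prompt(name)

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(worker, entity_names))


set_locale("zh")
print(generate_profiles(["Alice"])[0].endswith("请使用中文回答。"))  # True
```

Dropping the `set_locale(current_locale)` line in `worker` would make fresh pool threads fall back to the default locale, which is the silent regression the lock-in above guards against.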
Test contract
- Context: Verify that `backend/scripts/test_profile_format.py` remains green after a prompt-only translation.
- Sources Consulted: `backend/scripts/test_profile_format.py`, `oasis_profile_generator.py:_save_twitter_csv`, `oasis_profile_generator.py:_save_reddit_json`, `oasis_profile_generator.py:to_reddit_format`, `oasis_profile_generator.py:to_twitter_format`.
- Findings:
  - The pytest function `test_profile_formats` constructs `OasisAgentProfile` instances directly, without invoking the LLM.
  - It calls `_save_twitter_csv` and `_save_reddit_json` to verify CSV and JSON shape. Required CSV header: `user_id, user_name, name, bio, friend_count, follower_count, statuses_count, created_at`. Required JSON keys: `realname, username, bio, persona`.
- Implications:
  - Translating prompts cannot regress this test. The validation requirement (Requirement 7) is satisfied automatically as long as serializer code is not edited.
  - No new tests are required for this change.
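For reference, the shape contract the test enforces can be expressed as two small checks. This is a hedged sketch, not the contents of `test_profile_format.py`: the helper names are hypothetical, and it assumes the Reddit JSON file holds a list of profile objects.

```python
import csv
import json

# Shape contract as listed in the findings above.
REQUIRED_CSV_HEADER = [
    "user_id", "user_name", "name", "bio",
    "friend_count", "follower_count", "statuses_count", "created_at",
]
REQUIRED_JSON_KEYS = {"realname", "username", "bio", "persona"}


def missing_csv_columns(path: str) -> list[str]:
    """Return the required columns absent from the CSV header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])
    return [col for col in REQUIRED_CSV_HEADER if col not in header]


def profiles_missing_keys(path: str) -> dict[int, set[str]]:
    """Map profile index to the required keys it lacks (assumes a JSON list of objects)."""
    with open(path, encoding="utf-8") as fh:
        profiles = json.load(fh)
    problems: dict[int, set[str]] = {}
    for index, profile in enumerate(profiles):
        missing = REQUIRED_JSON_KEYS - set(profile)
        if missing:
            problems[index] = missing
    return problems
```

Since both checks read only serializer output, they stay green for any prompt wording, which is the point made in the implications above.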
Sibling specs already shipped
- Context: Confirm there is an established project pattern this work must mirror.
- Sources Consulted:
  - `.kiro/specs/i18n-ontology-generator-prompts/{design,tasks,requirements}.md`
  - `.kiro/specs/i18n-report-agent-prompts/`
  - `.kiro/specs/i18n-simulation-config-generator-prompts/`
  - Recent merged commits referencing #2, #4, #5.
- Findings:
  - All three siblings used a single-file, in-place translation diff.
  - All three preserved every `get_language_instruction()` call site.
  - All three left logger calls and docstrings to companion issues (#6 / #7).
  - None externalized prompts to `/locales/*.json`.
- Implications:
  - The same approach is correct here. Reviewer expectations are set by the sibling diffs.
OASIS profile schema
- Context: Verify that the translated prompts continue to satisfy the OASIS subprocess's expected schema (especially the `gender` enum and the `age` integer).
- Sources Consulted: the `OasisAgentProfile` dataclass, `to_reddit_format`, `to_twitter_format`, and the sibling `_generate_profile_rule_based`.
- Findings:
  - OASIS-required fields are produced by the serializers, not by the prompt: `user_id`, `username`, `name`, `bio`, `karma`/`friend_count`/`follower_count`/`statuses_count`, `created_at`.
  - The prompt-defined fields land in optional positions: `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`.
  - The `gender` enum constraint (`"male"`/`"female"` for individuals, `"other"` for groups) is locale-independent and must remain in English inside the translated prompt.
- Implications:
  - The English prompt must explicitly call out `gender ∈ {male, female}` (individual) and `gender == "other"` (group), independent of the `get_language_instruction()` postfix.
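A post-generation check for the locale-independent fields might look like the following. `persona_field_errors` and its `is_group` flag are hypothetical; the sketch only encodes the contract stated above (`gender` enum per entity kind, integer `age` when present) and assumes the LLM output has already been parsed into a dict.

```python
from typing import Any

# Locale-independent contract from the findings above.
INDIVIDUAL_GENDERS = ("male", "female")  # tuple: membership works even for unhashable values
GROUP_GENDER = "other"


def persona_field_errors(fields: dict[str, Any], is_group: bool) -> list[str]:
    """List contract violations in parsed, LLM-produced persona fields (hypothetical helper)."""
    errors: list[str] = []
    gender = fields.get("gender")
    if is_group:
        if gender != GROUP_GENDER:
            errors.append(f"group gender must be 'other', got {gender!r}")
    elif gender not in INDIVIDUAL_GENDERS:
        errors.append(f"individual gender must be 'male' or 'female', got {gender!r}")

    age = fields.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if age is not None and (isinstance(age, bool) or not isinstance(age, int)):
        errors.append(f"age must be an integer, got {age!r}")
    return errors
```

A localized value such as `"女"` for `gender` is exactly the kind of drift this would flag if a translated prompt lost its English-literal instruction.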
Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|---|---|---|---|---|
| A: In-place builder edit | Translate three method bodies + four fallback literals directly | Smallest diff; matches sibling pattern; zero API change | None of note | Selected |
| B: Module-level constants | Hoist prompts to `INDIVIDUAL_PERSONA_PROMPT_TEMPLATE` etc. | Easier `git grep` | Larger diff; the inline `{get_language_instruction()}` call would need to become a `.format()` kwarg, which is a behavioural change beyond translation | Diverges from #4 / #5 |
| C: Externalize to `locales/*.json` | Move every prompt sentence into `t(...)` keys | Most i18n-pure | Three-file diff; diverges from project rationale (prompts use the postfix mechanism, not key files) | Rejected |
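The Option B risk is easy to miss: an f-string is evaluated where it is written. The sketch below uses hypothetical names and placeholder prompt wording, with `get_language_instruction()` stubbed to read a thread-local. It shows that hoisting the current f-string to module level would freeze the import-time locale, so a working Option B needs a `{language_instruction}` placeholder filled with `.format()` at call time.

```python
import threading

_state = threading.local()
INSTRUCTIONS = {"en": "Please respond in English.", "zh": "请使用中文回答。"}


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_language_instruction() -> str:
    # Stub: read the thread-local locale, defaulting to English (assumption).
    return INSTRUCTIONS[getattr(_state, "locale", "en")]


# Option A: the f-string sits inside the builder, so the postfix is
# recomputed on every call under the caller's current locale.
def build_system_prompt() -> str:
    return f"You generate social-media personas.\n\n{get_language_instruction()}"


# Naive Option B: a module-level f-string runs once at import time and
# freezes whichever locale was active then (here, the default).
SYSTEM_PROMPT_FROZEN = f"You generate social-media personas.\n\n{get_language_instruction()}"

# Working Option B: a plain-string template with a named placeholder that
# each caller fills via str.format(), a change to the call-site code.
SYSTEM_PROMPT_TEMPLATE = "You generate social-media personas.\n\n{language_instruction}"


def build_system_prompt_from_template() -> str:
    return SYSTEM_PROMPT_TEMPLATE.format(language_instruction=get_language_instruction())


set_locale("zh")
print(build_system_prompt().endswith("请使用中文回答。"))                # True
print(build_system_prompt_from_template().endswith("请使用中文回答。"))  # True
print(SYSTEM_PROMPT_FROZEN.endswith("请使用中文回答。"))                 # False
```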
Design Decisions
Decision: In-place edit of the three prompt builders (Option A)
- Context: Three methods build prompt strings. One is a one-line system prompt; the other two are large f-string templates with embedded `{variable}` interpolations and an inline `{get_language_instruction()}` call.
- Alternatives Considered:
  - Option B: module-level constants.
  - Option C: externalize to `/locales/*.json` keys.
- Selected Approach: Translate each method body in place. Replace the four `"无"`/`"无额外上下文"` fallbacks with English equivalents (`"None"` and `"No additional context"`). Preserve all `{...}` interpolations and the inline `{get_language_instruction()}` call.
- Rationale: Mirrors the approach of the merged sibling specs. Smallest review surface. Zero API change. Out-of-scope surfaces (logger, docstrings, rule-based fallback) are cleanly avoided.
- Trade-offs: Leaves the file mixed-language in non-prompt parts (docstrings, rule-based fallback) until #7 lands. Acceptable per the scope split.
- Follow-up: During implementation, run a regex audit for any Chinese codepoints inside the three method bodies after the edit, and confirm the diff stays within `backend/app/services/oasis_profile_generator.py`.
Decision: Drop the "use Chinese country names" hint
- Context: The current prompt reads `country: 国家(使用中文,如"中国")` (roughly "country: country (use Chinese, e.g. 'China')") at both line 704 and line 753. This forces Chinese for the `country` field even under `Accept-Language: en`.
- Alternatives Considered:
  - Translate literally into English: `country: country (use English, e.g. "China")`.
  - Drop the language hint entirely: `country: country name string`.
- Selected Approach: Drop the language hint. Let `get_language_instruction()` steer the country language alongside every other free-text field.
- Rationale: Hard-coding a language in the prompt defeats the locale-steering mechanism. The rule-based fallback (out of scope) carries its own Chinese defaults; on the LLM path, the locale should decide.
- Trade-offs: Under `Accept-Language: zh`, the LLM may produce a Chinese country name (e.g. `中国`); this is the desired behaviour. Under `Accept-Language: en`, the LLM produces English (`China`), matching `COUNTRIES = ["China", "US", ...]` already in the file.
- Follow-up: In the validation phase, verify that a sample run under locale `en` produces an English country name.
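A validation-phase spot check for this follow-up could be as small as the function below. It is a hypothetical helper: it only asserts that an `en` run yields a non-empty country containing no CJK ideographs, and it places no script constraint on `zh` runs.

```python
import re

# Illustrative ideograph ranges (CJK Unified Ideographs and Extension A).
_IDEOGRAPH_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def country_ok_for_locale(country: str, locale: str) -> bool:
    """Hypothetical validation helper for the LLM-produced `country` field."""
    if not country.strip():
        return False
    if locale == "en":
        # Under `en`, the postfix should yield an English name such as "China".
        return _IDEOGRAPH_RE.search(country) is None
    # Other locales: this sketch accepts any non-empty name.
    return True
```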
Decision: Keep gender enum constraint in English inside the prompt
- Context: `gender` must be one of `"male"`/`"female"`/`"other"` regardless of locale, because OASIS consumers and the `_generate_profile_rule_based` fallback assume English values.
- Alternatives Considered: None; the constraint is a contract.
- Selected Approach: The translated prompt states the enum explicitly in English, even when the locale postfix asks for Chinese output: `gender MUST be one of "male" or "female" (English literal)`.
- Rationale: Same as the existing Chinese prompt, which already states `必须是英文: "male" 或 "female"` ("must be in English: 'male' or 'female'"). The translation preserves the same lock-in.
- Trade-offs: None.
- Follow-up: The validation phase will check that, under both locales, the produced `gender` is one of the three English literals.
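Pulling the three decisions together, a heavily shortened, hypothetical version of a translated individual-persona builder could look like this. The function signature, the prompt wording, and the mapping of `"无"` to `attrs_str` and `"无额外上下文"` to `context_str` are assumptions, and `get_language_instruction()` is stubbed so the block runs on its own. What it illustrates is the English fallbacks, the English-literal `gender` enum, the hint-free `country` line, and the inline postfix at the end.

```python
def get_language_instruction() -> str:
    # Stub so this sketch runs standalone; the real helper in
    # backend/app/utils/locale.py reads the thread-local locale.
    return "Please respond in English."


def build_individual_persona_prompt(entity_name: str, attrs_str: str, context_str: str) -> str:
    """Hypothetical, heavily shortened stand-in for the translated builder."""
    attrs_str = attrs_str or "None"                       # previously "无"
    context_str = context_str or "No additional context"  # previously "无额外上下文"
    return f"""Generate a detailed social-media persona for the entity below.

Entity name: {entity_name}
Entity attributes: {attrs_str}
Context: {context_str}

Return JSON with these fields:
- age: integer
- gender: MUST be one of "male" or "female" (English literal)
- country: country name string
- persona: one paragraph, no line breaks

{get_language_instruction()}"""
```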
Risks & Mitigations
- Risk: A mistranslation drops a locale-independent constraint (e.g. the `gender` enum, the `age` integer rule, the `persona` no-newline rule).
  - Mitigation: The implementation task list will enumerate every constraint inline so reviewers can check each one against the diff.
- Risk: A variable-name typo inside an f-string raises a `NameError` (or a `KeyError` for a mistyped dict key) at runtime.
  - Mitigation: The implementation task verifies that the set of `{variable}` interpolations in each translated block matches the pre-change set 1:1; a `python -c "import ..."` smoke import and a `pytest backend/scripts/test_profile_format.py` run are mandatory.
- Risk: A CJK codepoint is accidentally left inside the three builders.
  - Mitigation: The final implementation step runs the project's repo-level CJK guard regex (added by #26), constrained to the three builders' line ranges.
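The 1:1 interpolation check from the second mitigation can be automated by comparing the expressions inside every f-string's `{...}` before and after the edit. This sketch is one assumed way to write such a check; the function names are hypothetical, and it needs Python 3.9+ for `ast.unparse`.

```python
import ast


def interpolations(source: str, func_name: str) -> set[str]:
    """Collect the source text of every {expression} in f-strings inside func_name."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            return {
                ast.unparse(part.value)
                for part in ast.walk(node)
                if isinstance(part, ast.FormattedValue)
            }
    raise LookupError(f"function {func_name!r} not found")


def interpolation_drift(before: str, after: str, func_name: str) -> tuple[set[str], set[str]]:
    """Return (dropped, added) interpolations between two versions of func_name."""
    old = interpolations(before, func_name)
    new = interpolations(after, func_name)
    return old - new, new - old
```

Because the inline postfix is collected as `get_language_instruction()`, the same comparison also catches an accidentally deleted locale call.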
References
- `backend/app/services/oasis_profile_generator.py`: target file.
- `backend/app/utils/locale.py`: locale infrastructure.
- `locales/languages.json`, `locales/en.json`, `locales/zh.json`: locale registries.
- `.kiro/specs/i18n-ontology-generator-prompts/`: sibling spec #2.
- `.kiro/specs/i18n-simulation-config-generator-prompts/`: sibling spec #4.
- `.kiro/specs/i18n-report-agent-prompts/`: sibling spec #5.
- GitHub issue #3.