11 KiB

Raw Blame History

Research & Design Decisions — i18n-oasis-profile-generator-prompts

Summary

Feature: i18n-oasis-profile-generator-prompts
Discovery Scope: Extension (single-file translation in an existing brownfield service; sibling pattern already merged in #2, #4, #5)
Key Findings:
- The existing get_language_instruction() postfix mechanism (defined in backend/app/utils/locale.py) is the project-canonical way to steer LLM output language. Translating the base prompt does not interfere with it and is the same approach taken in already-merged sibling specs.
- The only Chinese surfaces inside the prompt-rendering path are _get_system_prompt, _build_individual_persona_prompt, _build_group_persona_prompt, and the four attrs_str/context_str fallback literals ("无", "无额外上下文"). All other Chinese in the file is logger keys (already done by #6), docstrings/comments (out-of-scope, #7), or rule-based fallback data (out-of-scope).
- backend/scripts/test_profile_format.py does not exercise prompts; it only constructs OasisAgentProfile and round-trips through _save_twitter_csv / _save_reddit_json. A pure-translation diff cannot break it.

Research Log

Locale steering mechanism

Context: Confirm that translating the base prompt does not regress Chinese output under Accept-Language: zh.
Sources Consulted:
- backend/app/utils/locale.py (lines 50–96).
- locales/languages.json (entries for en and zh with llmInstruction field).
- Sibling spec i18n-ontology-generator-prompts/design.md and the merged commits referenced by it.
Findings:
- get_language_instruction() returns Please respond in English. for locale en, 请使用中文回答。 for locale zh.
- The function is called as an inline f-string interpolation in the individual-persona and group-persona prompt bodies, and explicitly appended in _get_system_prompt. All three sites must be preserved byte-for-byte.
- The thread-local locale is captured in generate_profiles_for_entities (line ~910) and restored inside the worker via set_locale(current_locale) (line ~914). This plumbing is untouched by the change.
Implications:
- Design lock-in: the inline {get_language_instruction()} call must remain in each of the three builders. Removing or renaming it would silently regress non-English locales.
- The Chinese hint country: 国家（使用中文，如"中国"） in the original prompt overrides the locale postfix and forces Chinese output for one field. The English translation drops that hint so the locale postfix decides the country language. The rule-based fallback (out of scope) has its own (Chinese) defaults and is not affected.

Test contract

Context: Verify that backend/scripts/test_profile_format.py remains green after a prompt-only translation.
Sources Consulted: backend/scripts/test_profile_format.py, oasis_profile_generator.py:_save_twitter_csv, oasis_profile_generator.py:_save_reddit_json, oasis_profile_generator.py:to_reddit_format, oasis_profile_generator.py:to_twitter_format.
Findings:
- The pytest function test_profile_formats constructs OasisAgentProfile instances directly without invoking the LLM.
- It calls _save_twitter_csv and _save_reddit_json to verify CSV and JSON shape. Required CSV header: user_id, user_name, name, bio, friend_count, follower_count, statuses_count, created_at. Required JSON keys: realname, username, bio, persona.
Implications:
- Translating prompts cannot regress this test. The validation requirement (Requirement 7) is satisfied automatically as long as serializer code is not edited.
- No new tests are required for this change.

Sibling specs already shipped

Context: Confirm there is an established project pattern this work must mirror.
Sources Consulted:
- .kiro/specs/i18n-ontology-generator-prompts/{design,tasks,requirements}.md
- .kiro/specs/i18n-report-agent-prompts/
- .kiro/specs/i18n-simulation-config-generator-prompts/
- Recent merged commits referencing #2, #4, #5.
Findings:
- All three siblings used a single-file in-place translation diff.
- All three preserved every get_language_instruction() call site.
- All three left logger calls and docstrings to companion issues (#6 / #7).
- None externalized prompts to /locales/*.json.
Implications:
- The same approach is correct here. Reviewer expectations are set by the sibling diffs.

OASIS profile schema

Context: Verify that translated prompts continue to satisfy the OASIS subprocess's expected schema (especially gender enum and age integer).
Sources Consulted: OasisAgentProfile dataclass, to_reddit_format, to_twitter_format, sibling _generate_profile_rule_based.
Findings:
- OASIS-required fields are produced by serializers, not by the prompt: user_id, username, name, bio, karma/friend_count/follower_count/statuses_count, created_at.
- The prompt-defined fields land in optional positions: age, gender, mbti, country, profession, interested_topics.
- The gender enum constraint ("male"/"female" for individuals, "other" for groups) is locale-independent and must remain in English text inside the translated prompt.
Implications:
- The English prompt must explicitly call out gender ∈ {male, female} (individual) and gender == "other" (group), independent of the get_language_instruction() postfix.

Architecture Pattern Evaluation

Option	Description	Strengths	Risks / Limitations	Notes
A — In-place builder edit	Translate three method bodies + four fallback literals directly	Smallest diff; matches sibling pattern; zero API change	None of note	Selected
B — Module-level constants	Hoist prompts to `INDIVIDUAL_PERSONA_PROMPT_TEMPLATE` etc.	Easier `git grep`	Larger diff; the inline `{get_language_instruction()}` call would need to become a `.format()` kwarg, which is a behavioural change beyond translation	Diverges from #4 / #5
C — Externalize to `locales/*.json`	Move every prompt sentence into `t(...)` keys	Most i18n-pure	Three-file diff; diverges from project rationale (prompts use postfix mechanism, not key files)	Rejected

Design Decisions

Decision: In-place edit of the three prompt builders (Option A)

Context: Three methods build prompt strings; one of them is a one-line system prompt, the other two are large f-string templates with embedded {variable} interpolations and an inline {get_language_instruction()} call.
Alternatives Considered:
1. Option B — module-level constants.
2. Option C — externalize to /locales/*.json keys.
Selected Approach: Translate each method body in place. Replace the four "无" / "无额外上下文" fallbacks with English equivalents ("None" and "No additional context"). Preserve all {...} interpolations and the inline {get_language_instruction()} call.
Rationale: Matches merged sibling specs verbatim. Smallest review surface. Zero API change. Out-of-scope surfaces (logger, docstrings, rule-based fallback) cleanly avoided.
Trade-offs: Leaves the file mixed-language in non-prompt parts (docstrings, rule fallback) until #7 lands. Acceptable per scope split.
Follow-up: During implementation, run a regex audit for any Chinese codepoints inside the three method bodies after the edit and confirm the diff stays within backend/app/services/oasis_profile_generator.py.

Decision: Drop the "use Chinese country names" hint

Context: The current prompt at line 704 reads country: 国家（使用中文，如"中国"） and at line 753 country: 国家（使用中文，如"中国"）. This forces Chinese for the country field even under Accept-Language: en.
Alternatives Considered:
1. Translate to English literally: country: country (use English, e.g. "China").
2. Drop the language hint entirely: country: country name string.
Selected Approach: Drop the language hint. Let get_language_instruction() steer the country language alongside every other free-text field.
Rationale: Hard-coding a language in the prompt defeats the locale-steering mechanism. The rule-based fallback (out of scope) carries its own Chinese defaults; under the LLM path, locale should decide.
Trade-offs: Under Accept-Language: zh, the LLM may produce a Chinese country name (e.g. 中国) — this is the desired behaviour. Under Accept-Language: en, the LLM produces English (China), matching COUNTRIES = ["China", "US", ...] already in the file.
Follow-up: Verify in the validation phase that a sample run under locale en produces an English country name.

Decision: Keep `gender` enum constraint in English inside the prompt

Context: gender must be one of "male"/"female"/"other" regardless of locale, because OASIS consumers and the _generate_profile_rule_based fallback assume English values.
Alternatives Considered: None — the constraint is a contract.
Selected Approach: The translated prompt explicitly states the enum in English, even when the locale postfix asks for Chinese output: gender MUST be one of "male" or "female" (English literal).
Rationale: Same as the existing Chinese prompt (which already states 必须是英文: "male" 或 "female"). The translation preserves the same lock-in.
Trade-offs: None.
Follow-up: Validation phase will check that under both locales the produced gender is one of the three English literals.

Risks & Mitigations

Risk: Mistranslation drops a locale-independent constraint (e.g. gender enum, age integer rule, persona no-newline rule).
- Mitigation: The implementation task list will enumerate every constraint inline so reviewers can check by diff.
Risk: Variable-name typo inside an f-string causes a KeyError at runtime.
- Mitigation: Implementation task verifies that the set of {variable} interpolations in each translated block matches the pre-change set 1:1; a python -c "import ..." smoke import and a pytest backend/scripts/test_profile_format.py run are mandatory.
Risk: Accidentally leaving a CJK codepoint inside the three builders.
- Mitigation: Final implementation step runs the project's repo-level CJK guard regex (added by #26) constrained to the three builders' line ranges.

References

backend/app/services/oasis_profile_generator.py — target file.
backend/app/utils/locale.py — locale infrastructure.
locales/languages.json, locales/en.json, locales/zh.json — locale registries.
.kiro/specs/i18n-ontology-generator-prompts/ — sibling spec #2.
.kiro/specs/i18n-simulation-config-generator-prompts/ — sibling spec #4.
.kiro/specs/i18n-report-agent-prompts/ — sibling spec #5.
GitHub issue #3.

11 KiB Raw Blame History Unescape Escape

Research & Design Decisions — i18n-oasis-profile-generator-prompts

Summary

Research Log

Locale steering mechanism

Test contract

Sibling specs already shipped

OASIS profile schema

Architecture Pattern Evaluation

Design Decisions

Decision: In-place edit of the three prompt builders (Option A)

Decision: Drop the "use Chinese country names" hint

Decision: Keep gender enum constraint in English inside the prompt

Risks & Mitigations

References

11 KiB

Raw Blame History

Decision: Keep `gender` enum constraint in English inside the prompt