Design Document — i18n-oasis-profile-generator-prompts
Overview
Purpose: Translate the Chinese prompt strings, context-builder section labels, fallback persona templates, and console-output formatting in backend/app/services/oasis_profile_generator.py to English while preserving every functional contract — LLM JSON output schema, the _normalize_gender mapping that must continue to accept Chinese gender values, the _generate_profile_rule_based default country: "中国" data value, all f-string interpolations, and the get_language_instruction() locale-postfix mechanism. The goal is to remove the Chinese-language base-prompt and context-label bias that currently leaks Chinese structure and word choice into OASIS profile output even when Accept-Language: en.
Users: MiroFish operators running the Step 2 OASIS profile generation under any locale; downstream OASIS / CAMEL-OASIS consumers of the agent JSON / CSV produced by OasisProfileGenerator.
Impact: Replaces approximately one base-prompt string, two large user-message templates, four context-builder section labels, three fallback persona templates, and ten console-output strings with English equivalents inside one file. No API surface change. No new dependencies. No new files. Callers (backend/app/api/simulation.py, etc.) and OASIS consumers are unaffected.
Goals
- Zero CJK characters in any prompt string literal contributed by oasis_profile_generator.py to the system prompt, the user message, or the context block.
- Zero CJK characters in any console-output literal in _print_generated_profile and the surrounding banners.
- English bio/persona output under Accept-Language: en.
- Continued Chinese bio/persona output under Accept-Language: zh, of equivalent quality to the pre-change behaviour.
- No diff to public signatures, dataclass schema, LLM-call parameters, or call sites.
Non-Goals
- Externalizing prompts to /locales/*.json (out of scope per ticket and consistent with i18n-ontology-generator-prompts).
- Translating logger calls in this file (covered by issue #6).
- Translating module/class/method docstrings or inline comments in this file (covered by issue #7).
- Refactoring the OASIS profile JSON schema, the OASIS adapter, or the simulation flow.
- Modifying the _normalize_gender mapping table (it must keep accepting Chinese gender keys).
- Modifying the _generate_profile_rule_based default "中国" country value (data, not prompt).
- Modifying the ValueError("LLM_API_KEY 未配置") raise (covered by issue #6).
- Modifying backend/app/utils/locale.py, the locale registries, or any non-target file.
Boundary Commitments
This Spec Owns
- The English content of the base_prompt string in OasisProfileGenerator._get_system_prompt (line 664).
- The English content of every string literal in OasisProfileGenerator._build_individual_persona_prompt (lines 677–714).
- The English content of every string literal in OasisProfileGenerator._build_group_persona_prompt (lines 726–762).
- The English content of the section-label literals embedded in OasisProfileGenerator._search_zep_for_entity (lines 384, 390, 392) and OasisProfileGenerator._build_entity_context (lines 422, 438, 440, 443, 463, 472, 475).
- The English content of the fallback persona templates in OasisProfileGenerator._generate_profile_with_llm (line 547) and OasisProfileGenerator._try_fix_json (lines 644, 659).
- The English content of the no-attributes / no-context placeholder literals ("无", "无额外上下文") at lines 677, 678, 726, 727.
- The English content of every string literal in OasisProfileGenerator._print_generated_profile (lines 1011, 1017, 1019, 1022, 1025, 1026, 1027, 1028) and the surrounding banners in OasisProfileGenerator.generate_profiles_from_entities (lines 945, 1001).
Out of Boundary
- Locale resolution machinery (backend/app/utils/locale.py).
- Per-locale llmInstruction definitions (/locales/languages.json).
- Reasoning-model output stripping (backend/app/utils/llm_client.py).
- All logger.* calls (already keyed via t("log.profile_generator.*"); covered by issue #6).
- Module / class / method docstrings and inline comments (covered by issue #7), including the inline comments at lines 65, 93, 641, 804–807, 816–819.
- The _normalize_gender mapping table (lines 1123–1132), which must continue to accept Chinese gender keys from upstream.
- The hard-coded country: "中国" default in _generate_profile_rule_based (lines 807, 819); this is a data value, not a prompt.
- The ValueError("LLM_API_KEY 未配置") raise (line 194), covered by issue #6.
- All callers of OasisProfileGenerator, including backend/app/api/simulation.py.
- Tests, scripts, and frontend code.
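To illustrate why the gender mapping stays out of boundary, here is a minimal sketch of a normalizer that accepts both Chinese and English values. The table entries and the function name are assumptions for illustration only; the actual mapping at lines 1123–1132 is not reproduced in this document.

```python
# Hypothetical stand-in for _normalize_gender. The real mapping table is not
# shown in this design, so every entry below is an assumption.
_GENDER_MAP = {
    "男": "male",
    "女": "female",
    "其他": "other",
    "male": "male",
    "female": "female",
    "other": "other",
}


def normalize_gender(value):
    """Map a raw gender value (Chinese or English) to an English token."""
    if not value:
        return "other"
    key = str(value).strip().lower()
    # Unknown values fall back to "other" in this sketch.
    return _GENDER_MAP.get(key, "other")
```

Because upstream data may still carry Chinese keys after the prompts are translated, dropping the Chinese entries from a table like this would silently change data handling, which is why the table is left alone.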
Allowed Dependencies
- Existing get_language_instruction, get_locale, set_locale, t imports from ..utils.locale (already imported; unchanged).
- Existing OpenAI SDK invocation (unchanged).
- No new imports.
Revalidation Triggers
The following changes elsewhere would invalidate this design and require revisiting the prompt:
- A change to the JSON contract emitted by the LLM (bio, persona, age, gender, mbti, country, profession, interested_topics).
- A change to OasisAgentProfile field semantics.
- A change to get_language_instruction() semantics or the per-locale llmInstruction strings.
- A change to OASIS / CAMEL-OASIS profile field expectations (e.g. if gender accepts more than male/female/other).
Architecture
Existing Architecture Analysis
OasisProfileGenerator lives in backend/app/services/, follows the in-process service pattern with bounded thread-pool fan-out for batched profile generation, and is invoked from backend/app/api/simulation.py inside a background Task. It depends on:
- OpenAI SDK for the LLM call.
- GraphitiAdapter (legacy zep_client field name) for the Zep / Graphiti graph search.
- get_language_instruction() for locale steering.
- t() for already-keyed log strings.
The relevant flow is:
- The Flask handler resolves the request locale via Accept-Language; the locale is propagated to thread-pool workers via the set_locale(current_locale) capture in generate_profiles_from_entities (line 914).
- For each entity, _build_entity_context() is called: it composes a context block by concatenating headed sub-sections (entity attributes, related facts/edges, related node summaries, Graphiti-search facts, Graphiti-search nodes). Some of these labels are currently in Chinese.
- The context string is interpolated into the user-message template by either _build_individual_persona_prompt or _build_group_persona_prompt. Both templates are currently in Chinese, with English gender token directives interleaved.
- The system prompt is built by _get_system_prompt: a Chinese base prompt followed by the locale-appropriate get_language_instruction().
- The two messages are sent to chat.completions.create with response_format={"type": "json_object"}. The result flows through the json.loads → _try_fix_json → _fix_truncated_json fallback chain. Synthesized fallback personas use the Chinese template f"{entity_name}是一个{entity_type}。" if the LLM result is unusable.
- After per-profile completion, _print_generated_profile writes a Chinese-headed banner to stdout, and generate_profiles_from_entities writes Chinese batch banners.
This design preserves all of the above structurally. The change is purely lexical inside the seven regions of one file.
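The system-prompt step in the flow above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the real get_language_instruction() lives in backend/app/utils/locale.py and resolves the locale itself, whereas this stand-in takes the locale as an argument, and both the instruction strings and the base-prompt wording are placeholders.

```python
# Hypothetical stand-ins: instruction strings and base-prompt wording are
# placeholders, not the project's actual llmInstruction values.
_LLM_INSTRUCTIONS = {
    "en": "Respond in English.",
    "zh": "请使用中文回答。",
}


def get_language_instruction(locale):
    """Return the locale postfix (assumed wording, for illustration only)."""
    return _LLM_INSTRUCTIONS.get(locale, _LLM_INSTRUCTIONS["en"])


def build_system_prompt(locale):
    """English base prompt followed by the locale-appropriate postfix."""
    base_prompt = (
        "You are an expert at writing social-media user personas. "
        "Return valid JSON only."
    )  # placeholder wording, not the final translation
    return f"{base_prompt}\n\n{get_language_instruction(locale)}"
```

The point of the sketch is that the base prompt is identical across locales and only the postfix varies, so an English base no longer pulls English-locale output toward Chinese phrasing.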
Architecture Pattern & Boundary Map
    graph TB
        Caller[simulation.py handler]
        Generator[OasisProfileGenerator]
        Locale[locale.get_language_instruction]
        Graph[GraphitiAdapter graph.search]
        LLM[OpenAI chat.completions]
        Caller -->|generate_profiles_from_entities| Generator
        Generator -->|build context block| Generator
        Generator -->|read locale postfix| Locale
        Generator -->|search facts/nodes| Graph
        Generator -->|JSON request| LLM
        LLM -->|raw JSON| Generator
        Generator -->|OasisAgentProfile| Caller
Architecture Integration:
- Selected pattern: In-place lexical translation of seven regions of an existing service. No structural change.
- Domain/feature boundaries: locale machinery vs. prompt assembly vs. LLM transport remain cleanly separated.
- Existing patterns preserved: prompt-as-f-string user-message construction; Chinese-keyed _normalize_gender mapping; t(...) for log strings; get_language_instruction() postfix concatenation.
- New components rationale: none; no new components.
- Steering compliance: matches the established i18n-*-prompts family pattern (issues #2, #3, #4, #5) of in-place translation rather than t() keying for prompt bodies. Respects the steering note that "existing files mix English and Chinese in comments/docstrings — preserve both; do not translate one into the other unless asked." This ticket is the explicit ask for prompt strings, scoped to exclude comments/docstrings.
Technology Stack
| Layer | Choice / Version | Role in Feature | Notes |
|---|---|---|---|
| Backend / Services | Python 3.11+ | Hosts OasisProfileGenerator | Existing, unchanged. |
| Backend / Services | openai SDK | Issues the prompt; returns JSON | Existing, unchanged. |
| Backend / Services | backend/app/utils/locale.py | Resolves Accept-Language → llmInstruction postfix | Existing, unchanged. |
| Backend / Services | GraphitiAdapter | Provides Graphiti graph search facts/nodes | Existing, unchanged. |
No new dependencies. No version changes.
File Structure Plan
Modified Files
backend/app/services/oasis_profile_generator.py: replace the body of the _get_system_prompt base_prompt; replace every Chinese string literal in _build_individual_persona_prompt and _build_group_persona_prompt with English equivalents; replace the four section labels in _search_zep_for_entity and the six section labels in _build_entity_context; replace the three fallback persona templates; replace the two "无" / "无额外上下文" placeholders; replace the console-output literals in _print_generated_profile and the two print(...) banners in generate_profiles_from_entities. Preserve every other character of the file.
No new files. No deletions. No moves.
System Flows
The control-flow diagram in Architecture Pattern & Boundary Map covers the relevant flow; no additional diagrams are needed for this string-literal change.
Requirements Traceability
| Requirement | Summary | Components | Interfaces | Flows |
|---|---|---|---|---|
| 1.1–1.4 | English _get_system_prompt base_prompt; preserve get_language_instruction() site | OasisProfileGenerator → _get_system_prompt | None changed | Architecture diagram |
| 2.1–2.9 | English _build_individual_persona_prompt; preserve interpolations and JSON keys | OasisProfileGenerator → _build_individual_persona_prompt | f-string interpolation | n/a |
| 3.1–3.9 | English _build_group_persona_prompt; preserve fixed-value rules and interpolations | OasisProfileGenerator → _build_group_persona_prompt | f-string interpolation | n/a |
| 4.1–4.10 | English context-builder section labels | OasisProfileGenerator → _search_zep_for_entity, _build_entity_context | Prompt-only | n/a |
| 5.1–5.3 | English fallback persona templates | OasisProfileGenerator → _generate_profile_with_llm, _try_fix_json | None changed | n/a |
| 6.1–6.7 | English console-output formatting | OasisProfileGenerator → _print_generated_profile, generate_profiles_from_entities | None changed | n/a |
| 7.1–7.4 | Locale switching preserved via get_language_instruction() | OasisProfileGenerator + Locale | get_language_instruction() | Architecture diagram |
| 8.1–8.6 | Public API and call-site stability; preserve _normalize_gender and country: "中国" data default | OasisProfileGenerator (signatures, dataclass) | Public surface | n/a |
| 9.1–9.3 | Reasoning-model compatibility | OasisProfileGenerator → chat.completions.create + _try_fix_json | OpenAI SDK | Architecture diagram |
| 10.1–10.7 | Out-of-scope surfaces untouched | OasisProfileGenerator (boundary commitment) | n/a | n/a |
Components and Interfaces
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|---|---|---|---|---|---|
| OasisProfileGenerator (modified) | Backend / Service | Render English profile-generation prompts and context labels; preserve all behaviour | 1.1–10.7 | OpenAI.chat.completions.create (P0), get_language_instruction (P0), GraphitiAdapter.graph.search (P1), _normalize_gender (P0) | Service |
Backend / Service
OasisProfileGenerator (modified)
| Field | Detail |
|---|---|
| Intent | Translate prompt strings, context labels, fallback persona templates, and console output to English while preserving every functional contract. |
| Requirements | 1.1, 1.2, 1.3, 1.4, 2.1–2.9, 3.1–3.9, 4.1–4.10, 5.1–5.3, 6.1–6.7, 7.1–7.4, 8.1–8.6, 9.1–9.3, 10.1–10.7 |
Responsibilities & Constraints
- Owns: the English wording of the system prompt body, the two user-message templates, the context-builder section labels, the fallback persona templates, the no-attributes / no-context placeholders, and the console-output formatting.
- Domain boundary: prompt content and proximate console output only. Does not own locale resolution, transport, validation, or data values like the OASIS country default.
- Invariants:
  - All seven owned regions after translation MUST contain zero CJK characters.
  - The translated user-message templates MUST present the same eight required JSON keys: bio, persona, age, gender, mbti, country, profession, interested_topics.
  - The translated individual-persona template MUST require gender ∈ {"male", "female"} and age to be a valid integer.
  - The translated group-persona template MUST require age == 30 and gender == "other".
  - The translated user-message templates MUST preserve the f-string interpolations: {entity_name}, {entity_type}, {entity_summary}, {attrs_str}, {context_str}, {get_language_instruction()}.
  - The translated context-builder labels MUST preserve the section structure (heading + bulleted body).
  - The translated fallback persona templates MUST preserve the entity_summary or template priority order.
  - The call to get_language_instruction() MUST remain at its current locations.
  - The call to self.client.chat.completions.create(...) MUST remain unchanged.
  - All public signatures, dataclass schema, and the private helper signatures MUST remain unchanged.
  - All logger.* calls (already keyed) and inline comments and docstrings in this file MUST remain unchanged (out of scope per #6 and #7).
  - The _normalize_gender mapping table MUST remain unchanged.
  - The rule-based country: "中国" default MUST remain unchanged.
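The JSON-contract invariants above lend themselves to a small checker that could run against parsed LLM responses during review. The following is a sketch only; check_profile_contract and its is_group flag are hypothetical names that do not exist in the codebase.

```python
REQUIRED_KEYS = (
    "bio", "persona", "age", "gender",
    "mbti", "country", "profession", "interested_topics",
)


def check_profile_contract(data, is_group):
    """Return a list of contract violations for one parsed LLM response."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    age, gender = data.get("age"), data.get("gender")
    if is_group:
        # Group personas use fixed values per the invariants.
        if age != 30:
            problems.append("group age must be 30")
        if gender != "other":
            problems.append("group gender must be 'other'")
    else:
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            problems.append("individual age must be an integer")
        if gender not in ("male", "female"):
            problems.append("individual gender must be male or female")
    return problems
```

An empty list means the response satisfies the eight-key contract and the age/gender rules for its persona kind; the checker says nothing about the language of bio or persona.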
Dependencies
- Inbound: backend/app/api/simulation.py, the production caller (P0).
- Outbound: backend/app/utils/locale.get_language_instruction (locale postfix, P0); backend/app/utils/locale.t (already-keyed log strings, P0); backend/app/services/graphiti_adapter.GraphitiAdapter.graph.search (facts/nodes retrieval, P1); OpenAI.chat.completions.create (JSON LLM transport, P0).
- External: none.
Contracts: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
Service Interface
The public Python interface is unchanged. Representative signatures:
    class OasisProfileGenerator:
        def __init__(
            self,
            api_key: Optional[str] = None,
            base_url: Optional[str] = None,
            model_name: Optional[str] = None,
            zep_api_key: Optional[str] = None,
            graph_id: Optional[str] = None,
        ) -> None: ...

        def generate_profile_from_entity(
            self,
            entity: EntityNode,
            user_id: int,
            use_llm: bool = True,
        ) -> OasisAgentProfile: ...

        def generate_profiles_from_entities(
            self,
            entities: List[EntityNode],
            use_llm: bool = True,
            progress_callback: Optional[callable] = None,
            graph_id: Optional[str] = None,
            parallel_count: int = 5,
            realtime_output_path: Optional[str] = None,
            output_platform: str = "reddit",
        ) -> List[OasisAgentProfile]: ...

        def save_profiles(
            self,
            profiles: List[OasisAgentProfile],
            file_path: str,
            platform: str = "reddit",
        ) -> None: ...
- Preconditions: a configured LLM provider; a configured Graphiti / Neo4j graph; a non-empty entities list when batching.
- Postconditions: OasisAgentProfile instances with English bio and persona under locale en, Chinese under locale zh, and structurally equivalent across locales.
- Invariants: see Responsibilities & Constraints.
Implementation Notes
- Integration: No new imports. No call-site changes. The diff is confined to seven regions of one file.
- Validation: After implementation, run a targeted regex check ([一-鿿], i.e. the range U+4E00 to U+9FFF) over the seven owned regions to confirm zero CJK; smoke-test _build_individual_persona_prompt(...) and _build_group_persona_prompt(...) with representative inputs to confirm interpolations still work; round-trip a single profile end-to-end under both en and zh locales.
- Risks: English-base bias on Chinese-locale output (mitigated by the llmInstruction postfix already present in both system and user messages). Reduced LLM compliance with gender ∈ {male, female} for individual entities (mitigated by retaining the explicit English-token directive verbatim in the rules block).
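One way to run the interpolation smoke test described under Validation is to pass a unique sentinel for each input and confirm every sentinel appears in the rendered prompt. The sketch below uses a shortened, hypothetical stand-in builder; the real _build_individual_persona_prompt is a method with its own signature and calls get_language_instruction() internally, so it would need a thin wrapper to be checked this way.

```python
FIELDS = (
    "entity_name", "entity_type", "entity_summary",
    "attrs_str", "context_str", "language_instruction",
)


def build_individual_prompt(entity_name, entity_type, entity_summary,
                            attrs_str, context_str, language_instruction):
    """Hypothetical, shortened stand-in for the translated template."""
    return (
        "Generate a detailed social-media persona for the entity below.\n"
        f"Entity name: {entity_name}\n"
        f"Entity type: {entity_type}\n"
        f"Entity summary: {entity_summary}\n"
        f"Entity attributes: {attrs_str}\n"
        f"Context:\n{context_str}\n"
        'gender must be "male" or "female"; age must be an integer.\n'
        f"{language_instruction}"
    )


def missing_interpolations(builder):
    """Call builder with unique sentinels; return the names that got lost."""
    sentinels = {name: f"<<{name}>>" for name in FIELDS}
    prompt = builder(**sentinels)
    return [name for name, token in sentinels.items() if token not in prompt]
```

A non-empty result names the interpolation a translation pass dropped, which is easier to act on than diffing two long prompt strings by eye.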
Data Models
No data-model changes. The OasisAgentProfile dataclass is preserved verbatim.
Error Handling
Error Strategy
Error handling is unchanged from the existing implementation:
- LLM transport errors propagate from chat.completions.create.
- Truncation (finish_reason == "length") is repaired by _fix_truncated_json.
- Invalid JSON falls through to _try_fix_json, then to a synthesized fallback profile (now with English persona text).
- Per-entity exceptions are caught and a fallback OasisAgentProfile is constructed with English fallback strings.
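The parse, repair, fallback sequence described above can be illustrated with a much-simplified sketch. The repair step here (keeping only the outermost brace-delimited span) is an assumption and far cruder than whatever _try_fix_json and _fix_truncated_json actually do; parse_profile_json and the returned dictionary shape are hypothetical.

```python
import json


def _loads_object(text):
    """Return the parsed JSON object, or None if text is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_profile_json(raw, entity_name, entity_type, entity_summary=""):
    """Parse LLM output; fall back to a synthesized English persona."""
    parsed = _loads_object(raw)
    if parsed is not None:
        return parsed
    # Naive repair (assumption): keep only the outermost {...} span and retry.
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        parsed = _loads_object(raw[start:end + 1])
        if parsed is not None:
            return parsed
    # Preserve the "entity_summary or template" priority order.
    persona = entity_summary or f"{entity_name} is a {entity_type}."
    return {"bio": persona[:200], "persona": persona}
```

The last two lines are the part this design changes: the synthesized sentence is English, while the summary-first priority stays as it was.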
Error Categories and Responses
- User errors (4xx): not applicable at this layer; surfaced by the API handler.
- System errors (5xx): LLM/network failures propagate to the API handler, which converts them to JSON error responses.
- Business logic errors: malformed JSON is auto-repaired or replaced with a fallback profile.
Monitoring
Existing logger.* calls (keyed via t("log.profile_generator.*")) cover progress and warnings; no new monitoring is added.
Testing Strategy
Unit Tests
Given the project's intentionally minimal test harness (backend/scripts/test_profile_format.py only), the change is verified via:
- Static check: a one-shot regex assertion against the patched module ensuring zero CJK characters in the seven owned regions. This can be a quick python -c invocation during PR review.
- Round-trip smoke test: instantiate OasisProfileGenerator(), call _build_individual_persona_prompt(...) and _build_group_persona_prompt(...) with representative inputs, and verify all required interpolations appear in the output and no CJK characters remain.
- Fallback rendering: simulate a JSON parse failure and verify the English fallback persona template is produced.
Integration Tests
- Step 2 profile generation under EN locale: run a small batched profile generation against a real Graphiti graph with locale en. Verify produced profiles have English bio/persona and pass the existing OASIS profile-format check.
E2E/UI Tests
Not applicable — change does not affect frontend.
Performance/Load
Not applicable — token counts may differ slightly between Chinese and English renderings, but the LLM call has no max_tokens cap and remains within provider-acceptable limits.
Optional Sections
Security Considerations
Not applicable. Translation does not introduce new authentication, authorization, data-handling, or input-validation paths.
Performance & Scalability
Not applicable.
Migration Strategy
Not applicable. The change is a single in-place edit; no data migration. Rollback is git revert.
Supporting References
- backend/app/services/oasis_profile_generator.py: current Chinese prompt content (the source of translation).
- backend/app/utils/locale.py: locale resolver.
- backend/app/api/simulation.py: call site.
- .kiro/specs/i18n-ontology-generator-prompts/design.md: adjacent reference design for in-place prompt translation.
- .ticket/25.md: ticket snapshot.