# Design Document — i18n-oasis-profile-generator-prompts
## Overview
**Purpose**: Translate the Chinese prompt strings in
`backend/app/services/oasis_profile_generator.py` (the system prompt
inside `_get_system_prompt`, the individual-persona f-string template
inside `_build_individual_persona_prompt`, the group-persona f-string
template inside `_build_group_persona_prompt`, and the four
`attrs_str`/`context_str` fallback literals) to English while
preserving every functional contract — JSON output keys, the `gender`
English enum, the `age` integer rule, the `persona` no-newline rule,
all `{variable}` interpolations, and every `get_language_instruction()`
call site. The goal is to remove the Chinese-language base-prompt bias
that currently leaks Chinese structure and word choice into persona
output even when `Accept-Language: en`.
**Users**: MiroFish operators running the Step 2 environment-setup
pipeline under any locale; downstream Step 3 (CAMEL-OASIS subprocess)
which consumes the produced persona dictionaries.
**Impact**: Replaces one one-line system prompt, two large f-string
templates, and four fallback literals with English equivalents inside
a single file. No
API change, no new dependencies, no new files. The two production
callers (`backend/app/services/simulation_manager.py:316` and
`backend/app/api/simulation.py:1413`) and the OASIS subprocess are
unaffected.
### Goals
- Zero CJK characters in any prompt string literal contributed by
`oasis_profile_generator.py` to the system prompt or the two
user-message bodies (including the `attrs_str`/`context_str`
fallback literals).
- English persona prose (`bio`, `persona`, `profession`,
`interested_topics`) under `Accept-Language: en`.
- Continued Chinese persona prose under `Accept-Language: zh`, of
equivalent quality to the pre-change behaviour.
- `gender` field stays exactly one of `"male"`/`"female"`/`"other"`
regardless of locale.
- No diff to public signatures, taxonomy lists, LLM-call parameters,
or call sites.
### Non-Goals
- Externalizing prompts to `/locales/*.json` (out of scope per ticket).
- Translating logger calls in this file (covered by issue #6).
- Translating module/class/method docstrings or inline comments
(covered by issue #7).
- Refactoring the `OasisAgentProfile` schema, `MBTI_TYPES` /
`COUNTRIES` lists, or the `INDIVIDUAL_ENTITY_TYPES` /
`GROUP_ENTITY_TYPES` taxonomies.
- Modifying the rule-based fallback (`_generate_profile_rule_based`)
including its Chinese country defaults.
- Modifying the resilience helpers `_fix_truncated_json` /
`_try_fix_json` and the Chinese persona fallback fragments inside
them (e.g. `f"{entity_name}是一个{entity_type}。"`).
- Modifying `backend/app/utils/locale.py`, the locale registries, or
any non-target file.
- Modifying `backend/scripts/test_profile_format.py`.
## Boundary Commitments
### This Spec Owns
- The English content of `_get_system_prompt`'s `base_prompt` literal.
- The English content of the f-string template body in
`_build_individual_persona_prompt`.
- The English content of the f-string template body in
`_build_group_persona_prompt`.
- The English replacements for the four `"无"` / `"无额外上下文"`
fallback literals (in both individual and group builders).
### Out of Boundary
- Locale resolution machinery (`backend/app/utils/locale.py`).
- Per-locale `llmInstruction` definitions
(`/locales/languages.json`).
- Reasoning-model output stripping inside `_fix_truncated_json` /
`_try_fix_json`.
- Logger calls and translation keys (`t("log.profile_generator.*")`)
inside `oasis_profile_generator.py` (issue #6, already merged).
- Module / class / method docstrings and inline comments inside
`oasis_profile_generator.py` (issue #7).
- Rule-based fallback (`_generate_profile_rule_based`) including its
Chinese country defaults `"中国"`.
- Chinese persona fragments inside the resilience helpers (e.g.
`f"{entity_name}是一个{entity_type}。"`) — those are runtime data
fallbacks, not LLM prompts.
- All callers of `OasisProfileGenerator`
(`simulation_manager.py`, `api/simulation.py`).
- Tests, scripts, and frontend code.
- The `print(...)` banner at line 945 (closely associated with logger
externalization #6).
### Allowed Dependencies
- Existing imports in the target file (no additions). Specifically:
`get_language_instruction`, `get_locale`, `set_locale`, `t` from
`..utils.locale` are already imported and remain unchanged.
- Existing LLM transport via `self.client.chat.completions.create`
(unchanged).
### Revalidation Triggers
The following changes elsewhere would invalidate this design:
- A change to the JSON contract emitted by the LLM (`bio`, `persona`,
`age`, `gender`, `mbti`, `country`, `profession`,
`interested_topics` keys).
- A change to the `OasisAgentProfile` dataclass field set or the
Reddit/Twitter serializers.
- A change to `get_language_instruction()` semantics or the per-locale
`llmInstruction` strings.
- A change to OASIS subprocess profile-format expectations (verified
via `backend/scripts/test_profile_format.py`).
## Architecture
### Existing Architecture Analysis
`OasisProfileGenerator` lives in `backend/app/services/`, follows the
in-process service pattern, and is invoked from a Flask handler inside
a background task. The relevant flow:
1. The Flask handler resolves the request locale via `Accept-Language`;
`set_locale()` is propagated into worker threads in
`generate_profiles_for_entities` (locale captured at line ~910 and
restored inside `generate_single_profile` at line ~914).
2. For each entity, `generate_profile_from_entity` decides between the
individual or group prompt builder via
`self._is_individual_entity(entity_type)`.
3. The chosen builder produces a user-message string; `_get_system_prompt`
produces a system-message string. Both are sent to the LLM via
`self.client.chat.completions.create(..., response_format={"type": "json_object"})`.
4. The LLM response is JSON-decoded; on failure, `_try_fix_json` and
`_fix_truncated_json` attempt recovery; on terminal failure,
`_generate_profile_rule_based` produces a rule-based persona.
5. The result is wrapped in an `OasisAgentProfile` dataclass and
serialized to Reddit JSON or Twitter CSV via `_save_reddit_json` /
`_save_twitter_csv`.
This design preserves all of the above. The change is purely lexical
inside three method bodies and four literal defaults.
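Step 1 of the flow depends on the request locale being captured before the worker threads start and re-applied inside each worker. A minimal sketch of that pattern follows. It assumes a thread-local store; `set_locale` and `get_locale` here are stand-ins rather than the real `backend/app/utils/locale.py` code, the `"zh"` default is an assumption, and `generate_all` / `generate_one` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real locale module: a thread-local store (assumption).
_state = threading.local()


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale() -> str:
    # Threads that never called set_locale fall back to "zh" (assumed default).
    return getattr(_state, "locale", "zh")


def generate_all(entities: list[str], workers: int = 2) -> list[str]:
    """Capture the caller's locale, then restore it inside each worker."""
    current_locale = get_locale()  # captured on the request thread

    def generate_one(entity: str) -> str:
        set_locale(current_locale)  # restored on the worker thread
        return f"{entity}:{get_locale()}"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_one, entities))
```

Without the `set_locale(current_locale)` call inside the worker, each pool thread would see the default locale, and the prompt postfix would no longer follow `Accept-Language`.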
### Architecture Pattern & Boundary Map
```mermaid
graph TB
Caller["simulation_manager.py / api/simulation.py"]
Generator["OasisProfileGenerator"]
Sys["_get_system_prompt"]
Ind["_build_individual_persona_prompt"]
Grp["_build_group_persona_prompt"]
Locale["locale.get_language_instruction"]
Client["openai.chat.completions.create"]
Parser["_try_fix_json / _fix_truncated_json"]
Fallback["_generate_profile_rule_based"]
Serializer["_save_reddit_json / _save_twitter_csv"]
Caller --> Generator
Generator --> Sys
Generator --> Ind
Generator --> Grp
Sys -. inline call .-> Locale
Ind -. inline call .-> Locale
Grp -. inline call .-> Locale
Sys --> Client
Ind --> Client
Grp --> Client
Client --> Parser
Parser --> Fallback
Generator --> Serializer
classDef change fill:#fff4ce,stroke:#a16207,color:#000
class Sys,Ind,Grp change
```
The three highlighted nodes (`_get_system_prompt`,
`_build_individual_persona_prompt`,
`_build_group_persona_prompt`) are the only nodes whose **string
contents** change. Every edge — including each call to
`get_language_instruction()` — remains intact.
**Architecture Integration**:
- **Selected pattern**: In-place lexical translation of the three
prompt builders (Option A from `gap-analysis.md` / `research.md`).
- **Domain/feature boundaries**: Same as today; `OasisProfileGenerator`
remains the sole owner of persona prompt content. `LocaleService`
remains the sole owner of locale-postfix steering.
- **Existing patterns preserved**: locale-thread propagation, retry
logic with temperature decay, JSON resilience helpers, rule-based
fallback, two-platform serialization.
- **New components rationale**: none — no new components.
- **Steering compliance**: aligns with `tech.md` ("LLM prompts use the
`get_language_instruction()` postfix mechanism, not key files") and
`structure.md` ("services own their own prompt strings").
### Technology Stack & Alignment
| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Backend / Services | Python ≥3.11 | Hosts the prompt builders | No version change |
| LLM transport | `openai` SDK against any OpenAI-compatible endpoint | Sends translated prompts | Unchanged |
| i18n | `backend/app/utils/locale.py` | Resolves locale and provides `get_language_instruction()` postfix | Unchanged |
| Storage | None | — | No persistence change |
No new dependencies. No version bumps. The locale infrastructure used
by the change is the same one used by every sibling i18n spec already
merged.
## File Structure Plan
### Modified Files
- `backend/app/services/oasis_profile_generator.py` — only file that
changes.
- `_get_system_prompt(self, is_individual: bool) -> str` — translate
`base_prompt` literal to English. Keep
`f"{base_prompt}\n\n{get_language_instruction()}"` shape.
- `_build_individual_persona_prompt(self, entity_name, entity_type,
entity_summary, entity_attributes, context) -> str` — translate
the f-string body to English; replace `"无"` and `"无额外上下文"`
defaults; keep every `{variable}` interpolation and the inline
`{get_language_instruction()}` call.
- `_build_group_persona_prompt(self, entity_name, entity_type,
entity_summary, entity_attributes, context) -> str` — same
treatment as the individual builder.
No other files in the repository are touched by this change.
## System Flows
The runtime flow does not change. The call graph in the Architecture
diagram above is identical before and after the edit, so no separate
sequence diagram is provided.
## Requirements Traceability
| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1 | `base_prompt` contains zero Chinese characters | `_get_system_prompt` | `(self, is_individual: bool) -> str` | system-message construction |
| 1.2 | Preserve `f"{base_prompt}\n\n{get_language_instruction()}"` | `_get_system_prompt` | inline `get_language_instruction()` | system-message construction |
| 1.3 | Preserve role/intent semantics | `_get_system_prompt` | — | — |
| 1.4 | Preserve signature `_get_system_prompt(self, is_individual: bool) -> str` | `_get_system_prompt` | (signature) | — |
| 2.1 | Individual prompt body in English | `_build_individual_persona_prompt` | f-string body | user-message construction |
| 2.2 | Preserve `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}` | `_build_individual_persona_prompt` | f-string interpolations | — |
| 2.3 | Preserve JSON keys `bio, persona, age, gender, mbti, country, profession, interested_topics` | `_build_individual_persona_prompt` | prompt content | — |
| 2.4 | Preserve field-level constraints (lengths, MBTI, gender enum, age int) | `_build_individual_persona_prompt` | prompt content | — |
| 2.5 | Preserve trailing-rules block semantics | `_build_individual_persona_prompt` | prompt content | — |
| 2.6 | Preserve method signature | `_build_individual_persona_prompt` | (signature) | — |
| 2.7 | Translate `"无"` and `"无额外上下文"` defaults | `_build_individual_persona_prompt` | literal defaults | — |
| 2.8 | Zero Chinese in assembled body | `_build_individual_persona_prompt` | — | — |
| 3.1 | Group prompt body in English | `_build_group_persona_prompt` | f-string body | user-message construction |
| 3.2 | Preserve interpolations | `_build_group_persona_prompt` | f-string interpolations | — |
| 3.3 | Preserve JSON keys | `_build_group_persona_prompt` | prompt content | — |
| 3.4 | Preserve field-level constraints (age=30, gender="other", etc.) | `_build_group_persona_prompt` | prompt content | — |
| 3.5 | Preserve trailing-rules semantics | `_build_group_persona_prompt` | prompt content | — |
| 3.6 | Preserve method signature | `_build_group_persona_prompt` | (signature) | — |
| 3.7 | Translate `"无"` / `"无额外上下文"` defaults | `_build_group_persona_prompt` | literal defaults | — |
| 3.8 | Zero Chinese in assembled body | `_build_group_persona_prompt` | — | — |
| 4.1 | Preserve every `get_language_instruction()` call site | all three builders | inline call | system + user message construction |
| 4.2 | Preserve locale-thread plumbing | `generate_profiles_for_entities` (untouched) | `set_locale(current_locale)` | worker thread spawn |
| 4.3 | Locale=zh produces Chinese personas | runtime behaviour | locale postfix | LLM call |
| 4.4 | Locale=en produces English personas | runtime behaviour | locale postfix | LLM call |
| 4.5 | `gender` ∈ {male, female, other} regardless of locale | prompt content | — | — |
| 4.6 | Don't alter locale.py / locales/ | (none) | — | — |
| 5.1 | Preserve `OasisAgentProfile` dataclass | (untouched) | dataclass | — |
| 5.2 | Preserve method signatures | (untouched) | signatures | — |
| 5.3 | Preserve LLM invocation parameters | (untouched) | `chat.completions.create(...)` | — |
| 5.4 | Preserve `MBTI_TYPES`, `COUNTRIES`, taxonomy lists | (untouched) | class constants | — |
| 6.1 | Preserve `_fix_truncated_json` / `_try_fix_json` | (untouched) | helpers | — |
| 6.2 | Reasoning-model recovery still works | (untouched) | resilience helpers | — |
| 6.3 | No new prompt-language-dependent pre-processing | (none added) | — | — |
| 6.4 | Round-trip yields non-empty `bio` and `persona` | runtime behaviour | LLM call | — |
| 7.1 | `pytest test_profile_format.py` passes | runtime behaviour | serializers | — |
| 7.2 | Reddit format schema preserved | (untouched) | `to_reddit_format` | — |
| 7.3 | Twitter format schema preserved | (untouched) | `to_twitter_format` | — |
| 7.4 | `gender` enum preserved | prompt content | — | — |
| 8.1 | No logger edits | (untouched) | — | — |
| 8.2 | No docstring/comment edits | (untouched) | — | — |
| 8.3 | No rule-based fallback edits | (untouched) | — | — |
| 8.4 | No edits outside the target file | (none) | — | — |
| 8.5 | No new dependencies | (none) | `pyproject.toml` / `uv.lock` untouched | — |
| 8.6 | No edits to `test_profile_format.py` | (untouched) | — | — |
## Components and Interfaces
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------------|--------|--------------|--------------------------|-----------|
| `_get_system_prompt` | backend service / prompt builder | Produce the system message (English base + locale postfix) | 1.1, 1.2, 1.3, 1.4, 4.1, 4.5 | `get_language_instruction` (P0) | Service |
| `_build_individual_persona_prompt` | backend service / prompt builder | Produce the individual-entity user message in English | 2.x, 4.1, 4.5 | `get_language_instruction` (P0); JSON encoder (P1) | Service |
| `_build_group_persona_prompt` | backend service / prompt builder | Produce the group/institution user message in English | 3.x, 4.1, 4.5 | `get_language_instruction` (P0); JSON encoder (P1) | Service |
Only the three prompt-builder methods change. They all live inside the
single class `OasisProfileGenerator` in
`backend/app/services/oasis_profile_generator.py`. No new components.
### Backend / Services
#### `_get_system_prompt`
| Field | Detail |
|-------|--------|
| Intent | Build the `system` message: a one-line English directive that frames the model as a social-media persona expert + the per-locale postfix. |
| Requirements | 1.1, 1.2, 1.3, 1.4, 4.1, 4.5 |
**Responsibilities & Constraints**
- Construct and return a single string of the form
`f"{base_prompt}\n\n{get_language_instruction()}"`.
- Preserve the signature
`_get_system_prompt(self, is_individual: bool) -> str`.
- The English `base_prompt` MUST convey: (a) expert role in
social-media persona generation; (b) intent to produce detailed,
realistic personas for opinion-simulation, faithful to existing
reality; (c) the JSON-output requirement and the no-unescaped-newline
rule.
- The English `base_prompt` MUST NOT contain any CJK codepoint.
**Dependencies**
- Outbound: `get_language_instruction()` from
`backend/app/utils/locale.py` (P0, criticality high — the entire
locale-steering chain depends on it).
**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
##### Service Interface
```python
def _get_system_prompt(self, is_individual: bool) -> str:
"""Return the LLM system message: English base + locale postfix."""
...
```
- Preconditions: none.
- Postconditions: returns a non-empty string ending with the locale
postfix produced by `get_language_instruction()`.
- Invariants: the `base_prompt` literal contains zero CJK codepoints;
the appended locale postfix may contain them (for example under `zh`).
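The required shape of the return value can be sketched as follows. The English `base_prompt` wording is illustrative only and may differ from the shipped text. `get_language_instruction` is a stand-in that takes the locale as an argument so the sketch is self-contained; the real helper takes no arguments and reads the active locale.

```python
def get_language_instruction(locale: str = "en") -> str:
    """Stand-in for the real helper in backend/app/utils/locale.py."""
    return {"en": "Please respond in English.", "zh": "请使用中文回答。"}[locale]


def build_system_prompt(locale: str = "en") -> str:
    # Illustrative wording only: expert role, realistic personas for
    # opinion simulation, JSON output, no unescaped newlines.
    base_prompt = (
        "You are an expert in generating social-media user personas. "
        "Produce detailed, realistic personas for opinion simulation that "
        "stay as faithful as possible to existing reality. Return valid "
        "JSON only; string values must not contain unescaped newlines."
    )
    # The shape the design must preserve: base, blank line, locale postfix.
    return f"{base_prompt}\n\n{get_language_instruction(locale)}"
```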
**Implementation Notes**
- Integration: called only from `_call_llm_with_retry` (line ~523)
with `is_individual` decided upstream. The `is_individual` flag is
reserved for future divergence between system prompts; the current
implementation does not branch on it, and this design preserves
that.
- Validation: a CJK regex audit on the method body after the edit must
match zero codepoints.
- Risks: dropping one of the three role/intent pieces (expert framing,
JSON output requirement, no-newline rule). The implementation task
list calls out all three explicitly.
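A CJK regex audit of this kind could look like the sketch below. The character ranges are a common but not exhaustive choice, and `find_cjk` is a hypothetical helper name, not something that exists in the repository.

```python
import re

# Common CJK ranges: CJK punctuation, Extension A, Unified Ideographs,
# compatibility ideographs, and fullwidth forms. Not exhaustive.
CJK_PATTERN = re.compile(
    "[\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff00-\uffef]"
)


def find_cjk(text: str) -> list[str]:
    """Return every CJK codepoint found in text, in order of appearance."""
    return CJK_PATTERN.findall(text)
```

Running `find_cjk` over the source of the edited method body should return an empty list; any hit points at a literal that still needs translating.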
#### `_build_individual_persona_prompt`
| Field | Detail |
|-------|--------|
| Intent | Build the user-message string for an individual entity in English. Preserve every `{variable}` interpolation, the inline `{get_language_instruction()}` call, every JSON-output key, and every locale-independent constraint. |
| Requirements | 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 4.1, 4.5 |
**Responsibilities & Constraints**
- Preserve signature
`_build_individual_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`.
- Preserve `attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else <fallback>` with `<fallback>` translated to English (`"None"`).
- Preserve `context_str = context[:3000] if context else <fallback>` with `<fallback>` translated to English (`"No additional context"`).
- Translate the f-string body to English with these structural sections (mirror the original Chinese intent):
1. **Lead sentence** — instruct the model to generate a detailed
social-media persona for the entity, faithful to existing reality.
2. **Entity context block** — labelled lines for `entity_name`,
`entity_type`, `entity_summary`, `entity_attributes` (English
labels; values via `{...}` interpolation).
3. **Context information block**: `Context information:` heading
followed by `{context_str}`.
4. **JSON-fields enumeration** — `Generate JSON with the following
fields:` followed by the eight numbered items (`bio`, `persona`,
`age`, `gender`, `mbti`, `country`, `profession`,
`interested_topics`) with English descriptions matching
Requirement 2.4.
5. **Trailing rules block**: `Important:` followed by:
- `All field values must be strings or numbers; do not use newlines.`
- `persona must be a single coherent block of text.`
- `{get_language_instruction()} (gender field MUST use English values: "male" or "female")`
- `Content must remain consistent with the entity information.`
- `age must be a valid integer; gender must be exactly "male" or "female".`
- Preserve every `{variable}` interpolation present in the original by
name: `{entity_name}`, `{entity_type}`, `{entity_summary}`,
`{attrs_str}`, `{context_str}`, `{get_language_instruction()}`.
- The translated body MUST NOT contain any CJK codepoint.
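The constraints above can be sketched as a standalone function. This is an abbreviated illustration, not the final prompt text: the eight-field enumeration is elided, the English labels are assumptions, and `get_language_instruction` is a stand-in fixed to the English postfix.

```python
import json
from typing import Any, Dict


def get_language_instruction() -> str:
    """Stand-in for the real locale helper (fixed to English here)."""
    return "Please respond in English."


def build_individual_prompt(
    entity_name: str,
    entity_type: str,
    entity_summary: str,
    entity_attributes: Dict[str, Any],
    context: str,
) -> str:
    # English fallbacks replace the former Chinese literals.
    attrs_str = (
        json.dumps(entity_attributes, ensure_ascii=False)
        if entity_attributes
        else "None"
    )
    context_str = context[:3000] if context else "No additional context"

    # Abbreviated body: the JSON field list is elided to keep this short.
    return f"""Generate a detailed social-media persona for the entity below, staying faithful to existing reality.

Entity name: {entity_name}
Entity type: {entity_type}
Entity summary: {entity_summary}
Entity attributes: {attrs_str}

Context information:
{context_str}

Important:
- All field values must be strings or numbers; do not use newlines.
- {get_language_instruction()} (gender field MUST use English values: "male" or "female")
- age must be a valid integer; gender must be exactly "male" or "female".
"""
```

Because `ensure_ascii=False` is kept, attribute values that contain Chinese text still pass through unescaped; only the template literals are expected to be CJK-free.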
**Dependencies**
- Outbound: `json.dumps(..., ensure_ascii=False)` (P1, formatting the
attributes dict) — unchanged.
- Outbound: `get_language_instruction()` (P0) — interpolated inline.
**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
##### Service Interface
```python
def _build_individual_persona_prompt(
self,
entity_name: str,
entity_type: str,
entity_summary: str,
entity_attributes: Dict[str, Any],
context: str,
) -> str:
"""Return the LLM user message for an individual-entity persona."""
...
```
- Preconditions: `entity_name`, `entity_type`, `entity_summary`
are strings (may be empty); `entity_attributes` is a dict (may be
empty); `context` is a string (may be empty).
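- Postconditions: returns a non-empty string whose template text is
English, with all six interpolations resolved.
- Invariants: the literal template text contains zero CJK codepoints
(interpolated values and the locale postfix may contain them); every
`{variable}` interpolation is preserved by name.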
**Implementation Notes**
- Integration: called from `_call_llm_with_retry` (line ~506) when
`is_individual` is true.
- Validation: post-edit CJK regex audit; interpolation-set audit
(verify the multiset of `{...}` tokens equals the pre-change set);
smoke import + `pytest backend/scripts/test_profile_format.py`.
- Risks: dropping the `gender` enum lock when translating; dropping
the inline `{get_language_instruction()}` call. The implementation
task list calls these out as discrete checks.
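One way to run the interpolation-set audit is to parse the method source with the standard-library `ast` module and count the expressions inside each f-string. This is a sketch under that assumption; `fstring_interpolations` is a hypothetical helper, and the spec does not prescribe how the audit is performed.

```python
import ast
from collections import Counter


def fstring_interpolations(source: str) -> Counter:
    """Count every {expression} used inside f-strings in the given source."""
    found: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        # Each {...} inside an f-string becomes a FormattedValue node.
        if isinstance(node, ast.FormattedValue):
            found[ast.unparse(node.value)] += 1
    return found
```

Comparing the `Counter` from the pre-change source with the one from the post-change source catches both a dropped `{get_language_instruction()}` call and an accidentally duplicated variable, which a plain set comparison would miss.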
#### `_build_group_persona_prompt`
| Field | Detail |
|-------|--------|
| Intent | Build the user-message string for a group/institution entity in English. Preserve every `{variable}` interpolation, the inline `{get_language_instruction()}` call, every JSON-output key, and every locale-independent constraint (notably `age == 30` and `gender == "other"`). |
| Requirements | 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.1, 4.5 |
**Responsibilities & Constraints**
- Preserve signature
`_build_group_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`.
- Preserve the `attrs_str` and `context_str` fallback handling with
English defaults (`"None"`, `"No additional context"`), identical to
the individual builder.
- Translate the f-string body to English with these structural
sections (mirror the original Chinese intent for institutions):
1. **Lead sentence** — instruct the model to generate a detailed
social-media account profile for the institution/group, faithful
to existing reality.
2. **Entity context block** — labelled lines for `entity_name`,
`entity_type`, `entity_summary`, `entity_attributes`.
3. **Context information block**: `Context information:` heading
followed by `{context_str}`.
4. **JSON-fields enumeration** — `Generate JSON with the following
fields:` followed by the eight numbered items as defined in
Requirement 3.4: `bio` (~200 chars, official voice), `persona`
(~2000 chars, single coherent text covering institutional
basics, account positioning, voice, publishing pattern, stance,
special notes, institutional memory), `age` (= integer 30,
institutional virtual age), `gender` (= literal `"other"`),
`mbti` (e.g. ISTJ for strict/conservative), `country` (country
name string), `profession` (institutional function),
`interested_topics` (array).
5. **Trailing rules block**: `Important:` followed by:
- `All field values must be strings or numbers; null is not allowed.`
- `persona must be a single coherent block of text without newlines.`
- `{get_language_instruction()} (gender field MUST use English value "other")`
- `age must be the integer 30; gender must be the string "other".`
- `Account voice must match its identity positioning.`
- Preserve every `{variable}` interpolation present in the original.
- The translated body MUST NOT contain any CJK codepoint.
**Dependencies**
- Outbound: same as individual builder.
**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
##### Service Interface
```python
def _build_group_persona_prompt(
self,
entity_name: str,
entity_type: str,
entity_summary: str,
entity_attributes: Dict[str, Any],
context: str,
) -> str:
"""Return the LLM user message for a group/institution persona."""
...
```
- Preconditions / Postconditions / Invariants: same shape as the
individual builder.
**Implementation Notes**
- Integration: called from `_call_llm_with_retry` (line ~510) when
`is_individual` is false.
- Validation: same checks as the individual builder, plus an explicit
audit that the institutional sentinels (`age == 30`,
`gender == "other"`) appear in English in the trailing-rules block.
- Risks: same as the individual builder; additionally, the `country`
language hint (`"使用中文,如\"中国\""`) is intentionally dropped
during translation — the validation task verifies that under
`Accept-Language: en` a sample run produces an English country
name.
## Data Models
No data-model changes. The persona JSON schema, the
`OasisAgentProfile` dataclass, the Reddit/Twitter serializers, and the
OASIS subprocess profile-format expectations are all preserved
verbatim.
## Error Handling
### Error Strategy
No new error paths. The existing flow is preserved:
- `json.JSONDecodeError` → `_try_fix_json` → `_fix_truncated_json`
partial-extract via regex → `_generate_profile_rule_based`.
- LLM call failure → retry with temperature decay (`0.7 - attempt * 0.1`)
up to `max_attempts = 3`.
- Terminal failure → rule-based fallback persona.
- Per-entity worker exception → fallback `OasisAgentProfile` produced
inside `generate_single_profile` at line ~932.
The translated prompts do not introduce new failure modes. Translating
prompt language has no semantic effect on JSON parsing or on the
`response_format={"type": "json_object"}` constraint.
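The retry-then-fallback behaviour can be sketched in simplified form. The function and parameter names are hypothetical, `RuntimeError` stands in for whatever transport errors the real client raises, and the JSON repair helpers are deliberately left out.

```python
import json
from typing import Any, Callable, Dict


def generate_with_retry(
    call_llm: Callable[[float], str],
    rule_based: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Try the LLM up to max_attempts times, then fall back to rules."""
    for attempt in range(max_attempts):
        temperature = 0.7 - attempt * 0.1  # decay from the list above
        try:
            return json.loads(call_llm(temperature))
        except (json.JSONDecodeError, RuntimeError):
            # The real code first tries _try_fix_json / _fix_truncated_json;
            # that repair step is omitted from this sketch.
            continue
    return rule_based()
```

Nothing in this control flow inspects the prompt language, which is why translating the prompts leaves the error paths untouched.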
### Error Categories and Responses
- **User errors**: not applicable (this is an internal pipeline).
- **System errors**: LLM transport errors are retried; logger emits
`t("log.profile_generator.m011")` etc. Logger keys already exist in
`locales/{en,zh}.json`.
- **Business-logic errors**: `gender` not in the English enum, `age`
not an integer — the prompt explicitly mandates them; the validator
inside `_try_fix_json` does not enforce these but the OASIS
subprocess does. No change in either direction.
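For manual spot checks of LLM output, the locale-independent parts of the contract can be expressed as a small checker. This is a hypothetical helper for reviewers only; the design adds no validator to the pipeline, and `contract_violations` does not exist in the codebase.

```python
from typing import Any, Dict, List

ALLOWED_GENDERS = {"male", "female", "other"}


def contract_violations(profile: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = []
    if profile.get("gender") not in ALLOWED_GENDERS:
        problems.append("gender must be one of male/female/other")
    age = profile.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append("age must be an integer")
    persona = profile.get("persona")
    if not isinstance(persona, str) or "\n" in persona:
        problems.append("persona must be a single-line string")
    return problems
```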
### Monitoring
Existing logger calls are unchanged. Logger keys already i18n-keyed via
`t("log.profile_generator.*")`.
## Testing Strategy
### Unit Tests
- **(Existing)**
`backend/scripts/test_profile_format.py::test_profile_formats`
must continue to pass without modification.
- **(Manual)** Smoke import:
`cd backend && uv run python -c "from app.services.oasis_profile_generator import OasisProfileGenerator"`
— confirms no syntax errors after editing f-strings.
### Integration Tests
- **(Manual)** Run the prompt builders directly under each locale:
- `set_locale("en")` →
`OasisProfileGenerator()._build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")`
— assert no CJK codepoints in the output, assert the English
locale postfix appears via `get_language_instruction()` (which is
`"Please respond in English."`).
- `set_locale("zh")` → same call → assert the locale postfix is
`"请使用中文回答。"`.
- These do not require an LLM call; they only verify the rendered
prompt string.
### E2E Tests
- **(Manual, optional)** Run `npm run dev` and trigger Step 2 profile
generation from the UI under the English locale on a small entity
set; spot-check that `bio` and `persona` prose are in English. This
check is preferred whenever a live LLM key is available and may be
skipped in CI when it is not. Sibling specs #2/#4/#5 used the same
manual E2E approach.
### Performance / Load
Not applicable. Prompt translation has no measurable performance
impact.
## Optional Sections
### Security Considerations
No security implications. No new external surfaces; no new data
retention; no change to authentication or authorization.
### Migration Strategy
No migration required. The change is forward-compatible: a deployment
that picks up the translated prompts continues to serve users on the
`zh` locale via the unchanged
`get_language_instruction()` postfix mechanism.
## Supporting References
- `gap-analysis.md` — option evaluation and effort/risk sizing.
- `research.md` — discovery findings, design decisions (in particular
the "drop the country language hint" decision), and risk register.
- `requirements.md` — EARS requirements with numeric IDs.
- Sibling specs `i18n-ontology-generator-prompts`,
`i18n-simulation-config-generator-prompts`,
`i18n-report-agent-prompts` — same translation pattern, already
merged.