Merge pull request #32 from salestech-group/feat/25-i18n-oasis-profile-generator-prompts
fix(i18n): translate oasis profile generator prompts to english
commit
54d7fb7828
|
|
@@ -2,616 +2,316 @@
|
|||
|
||||
## Overview
|
||||
|
||||
**Purpose**: Translate the Chinese prompt strings in
|
||||
`backend/app/services/oasis_profile_generator.py` (the system prompt
|
||||
inside `_get_system_prompt`, the individual-persona f-string template
|
||||
inside `_build_individual_persona_prompt`, the group-persona f-string
|
||||
template inside `_build_group_persona_prompt`, and the four
|
||||
`attrs_str`/`context_str` fallback literals) to English while
|
||||
preserving every functional contract — JSON output keys, the `gender`
|
||||
English enum, the `age` integer rule, the `persona` no-newline rule,
|
||||
all `{variable}` interpolations, and every `get_language_instruction()`
|
||||
call site. The goal is to remove the Chinese-language base-prompt bias
|
||||
that currently leaks Chinese structure and word choice into persona
|
||||
output even when `Accept-Language: en`.
|
||||
**Purpose**: Translate the Chinese prompt strings, context-builder section labels, fallback persona templates, and console-output formatting in `backend/app/services/oasis_profile_generator.py` to English while preserving every functional contract — LLM JSON output schema, the `_normalize_gender` mapping that must continue to accept Chinese gender values, the `_generate_profile_rule_based` default `country: "中国"` data value, all f-string interpolations, and the `get_language_instruction()` locale-postfix mechanism. The goal is to remove the Chinese-language base-prompt and context-label bias that currently leaks Chinese structure and word choice into OASIS profile output even when `Accept-Language: en`.
|
||||
|
||||
**Users**: MiroFish operators running the Step 2 environment-setup
|
||||
pipeline under any locale; downstream Step 3 (CAMEL-OASIS subprocess)
|
||||
which consumes the produced persona dictionaries.
|
||||
**Users**: MiroFish operators running the Step 2 OASIS profile generation under any locale; downstream OASIS / CAMEL-OASIS consumers of the agent JSON / CSV produced by `OasisProfileGenerator`.
|
||||
|
||||
**Impact**: Replaces one one-line system prompt and two
|
||||
large f-string templates with English equivalents inside one file. No
|
||||
API change, no new dependencies, no new files. The two production
|
||||
callers (`backend/app/services/simulation_manager.py:316` and
|
||||
`backend/app/api/simulation.py:1413`) and the OASIS subprocess are
|
||||
unaffected.
|
||||
**Impact**: Replaces roughly the following with English equivalents inside one file: one base-prompt string, two large user-message templates, four context-builder section labels, three fallback persona templates, and ten console-output strings. No API surface change. No new dependencies. No new files. Callers (`backend/app/api/simulation.py`, etc.) and OASIS consumers are unaffected.
|
||||
|
||||
### Goals
|
||||
|
||||
- Zero CJK characters in any prompt string literal contributed by
|
||||
`oasis_profile_generator.py` to the system prompt or the two
|
||||
user-message bodies (including the `attrs_str`/`context_str`
|
||||
fallback literals).
|
||||
- English persona prose (`bio`, `persona`, `profession`,
|
||||
`interested_topics`) under `Accept-Language: en`.
|
||||
- Continued Chinese persona prose under `Accept-Language: zh`, of
|
||||
equivalent quality to the pre-change behaviour.
|
||||
- `gender` field stays exactly one of `"male"`/`"female"`/`"other"`
|
||||
regardless of locale.
|
||||
- No diff to public signatures, taxonomy lists, LLM-call parameters,
|
||||
or call sites.
|
||||
- Zero CJK characters in any prompt string literal contributed by `oasis_profile_generator.py` to the system prompt, the user message, or the context block.
|
||||
- Zero CJK characters in any console-output literal in `_print_generated_profile` and the surrounding banners.
|
||||
- English `bio` / `persona` output under `Accept-Language: en`.
|
||||
- Continued Chinese `bio` / `persona` output under `Accept-Language: zh`, of equivalent quality to the pre-change behaviour.
|
||||
- No diff to public signatures, dataclass schema, LLM-call parameters, or call sites.
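
The zero-CJK goals can be checked mechanically. The sketch below is one possible check, not part of the change itself. The helper name `contains_cjk` and the chosen Unicode ranges are assumptions made for illustration, since the spec does not define which code points count as CJK.

```python
import re

# Assumed definition of "CJK" for this check: CJK punctuation, Extension A,
# the main Unified Ideographs block, compatibility ideographs, and
# full-width forms. The spec does not pin down the exact ranges.
_CJK_PATTERN = re.compile(
    "[\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff00-\uffef]"
)


def contains_cjk(text: str) -> bool:
    """Return True if the text contains any character in the assumed ranges."""
    return _CJK_PATTERN.search(text) is not None


print(contains_cjk("No additional context"))  # False
print(contains_cjk("无额外上下文"))  # True
```

A check like this could be run over each translated literal during review; it deliberately ignores which literal a string came from.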
|
||||
|
||||
### Non-Goals
|
||||
|
||||
- Externalizing prompts to `/locales/*.json` (out of scope per ticket).
|
||||
- Externalizing prompts to `/locales/*.json` (out of scope per ticket and consistent with `i18n-ontology-generator-prompts`).
|
||||
- Translating logger calls in this file (covered by issue #6).
|
||||
- Translating module/class/method docstrings or inline comments
|
||||
(covered by issue #7).
|
||||
- Refactoring the `OasisAgentProfile` schema, `MBTI_TYPES` /
|
||||
`COUNTRIES` lists, or the `INDIVIDUAL_ENTITY_TYPES` /
|
||||
`GROUP_ENTITY_TYPES` taxonomies.
|
||||
- Modifying the rule-based fallback (`_generate_profile_rule_based`)
|
||||
including its Chinese country defaults.
|
||||
- Modifying the resilience helpers `_fix_truncated_json` /
|
||||
`_try_fix_json` and the Chinese persona fallback fragments inside
|
||||
them (e.g. `f"{entity_name}是一个{entity_type}。"`).
|
||||
- Modifying `backend/app/utils/locale.py`, the locale registries, or
|
||||
any non-target file.
|
||||
- Modifying `backend/scripts/test_profile_format.py`.
|
||||
- Translating module/class/method docstrings or inline comments in this file (covered by issue #7).
|
||||
- Refactoring the OASIS profile JSON schema, the OASIS adapter, or the simulation flow.
|
||||
- Modifying the `_normalize_gender` mapping table (it must keep accepting Chinese gender keys).
|
||||
- Modifying the `_generate_profile_rule_based` default `"中国"` country value (data, not prompt).
|
||||
- Modifying the `ValueError("LLM_API_KEY 未配置")` raise (covered by issue #6).
|
||||
- Modifying `backend/app/utils/locale.py`, the locale registries, or any non-target file.
|
||||
|
||||
## Boundary Commitments
|
||||
|
||||
### This Spec Owns
|
||||
|
||||
- The English content of `_get_system_prompt`'s `base_prompt` literal.
|
||||
- The English content of the f-string template body in
|
||||
`_build_individual_persona_prompt`.
|
||||
- The English content of the f-string template body in
|
||||
`_build_group_persona_prompt`.
|
||||
- The English replacements for the four `"无"` / `"无额外上下文"`
|
||||
fallback literals (in both individual and group builders).
|
||||
- The English content of the `base_prompt` string in `OasisProfileGenerator._get_system_prompt` (line 664).
|
||||
- The English content of every string literal in `OasisProfileGenerator._build_individual_persona_prompt` (lines 677–714).
|
||||
- The English content of every string literal in `OasisProfileGenerator._build_group_persona_prompt` (lines 726–762).
|
||||
- The English content of the section-label literals embedded in `OasisProfileGenerator._search_zep_for_entity` (lines 384, 390, 392) and `OasisProfileGenerator._build_entity_context` (lines 422, 438, 440, 443, 463, 472, 475).
|
||||
- The English content of the fallback persona templates in `OasisProfileGenerator._generate_profile_with_llm` (line 547) and `OasisProfileGenerator._try_fix_json` (lines 644, 659).
|
||||
- The English content of the no-attributes / no-context placeholder literals (`"无"`, `"无额外上下文"`) at lines 677, 678, 726, 727.
|
||||
- The English content of every string literal in `OasisProfileGenerator._print_generated_profile` (lines 1011, 1017, 1019, 1022, 1025, 1026, 1027, 1028) and the surrounding banners in `OasisProfileGenerator.generate_profiles_from_entities` (lines 945, 1001).
|
||||
|
||||
### Out of Boundary
|
||||
|
||||
- Locale resolution machinery (`backend/app/utils/locale.py`).
|
||||
- Per-locale `llmInstruction` definitions
|
||||
(`/locales/languages.json`).
|
||||
- Reasoning-model output stripping inside `_fix_truncated_json` /
|
||||
`_try_fix_json`.
|
||||
- Logger calls and translation keys (`t("log.profile_generator.*")`)
|
||||
inside `oasis_profile_generator.py` (issue #6, already merged).
|
||||
- Module / class / method docstrings and inline comments inside
|
||||
`oasis_profile_generator.py` (issue #7).
|
||||
- Rule-based fallback (`_generate_profile_rule_based`) including its
|
||||
Chinese country defaults `"中国"`.
|
||||
- Chinese persona fragments inside the resilience helpers (e.g.
|
||||
`f"{entity_name}是一个{entity_type}。"`) — those are runtime data
|
||||
fallbacks, not LLM prompts.
|
||||
- All callers of `OasisProfileGenerator`
|
||||
(`simulation_manager.py`, `api/simulation.py`).
|
||||
- Per-locale `llmInstruction` definitions (`/locales/languages.json`).
|
||||
- Reasoning-model output stripping (`backend/app/utils/llm_client.py`).
|
||||
- All `logger.*` calls (already keyed via `t("log.profile_generator.*")`; covered by issue #6).
|
||||
- Module / class / method docstrings and inline comments (covered by issue #7), including the inline comments at lines 65, 93, 641, 804–807, 816–819.
|
||||
- The `_normalize_gender` mapping table (lines 1123–1132) — must continue to accept Chinese gender keys from upstream.
|
||||
- The hard-coded `country: "中国"` default in `_generate_profile_rule_based` (lines 807, 819) — this is a data value, not a prompt.
|
||||
- The `ValueError("LLM_API_KEY 未配置")` raise (line 194) — covered by issue #6.
|
||||
- All callers of `OasisProfileGenerator`, including `backend/app/api/simulation.py`.
|
||||
- Tests, scripts, and frontend code.
|
||||
- The `print(...)` banner at line 945 (closely associated with logger
|
||||
externalization #6).
|
||||
|
||||
### Allowed Dependencies
|
||||
|
||||
- Existing imports in the target file (no additions). Specifically:
|
||||
`get_language_instruction`, `get_locale`, `set_locale`, `t` from
|
||||
`..utils.locale` are already imported and remain unchanged.
|
||||
- Existing LLM transport via `self.client.chat.completions.create`
|
||||
(unchanged).
|
||||
- Existing `get_language_instruction`, `get_locale`, `set_locale`, `t` imports from `..utils.locale` (already imported; unchanged).
|
||||
- Existing `OpenAI` SDK invocation (unchanged).
|
||||
- No new imports.
|
||||
|
||||
### Revalidation Triggers
|
||||
|
||||
The following changes elsewhere would invalidate this design:
|
||||
The following changes elsewhere would invalidate this design and require revisiting the prompt:
|
||||
|
||||
- A change to the JSON contract emitted by the LLM (`bio`, `persona`,
|
||||
`age`, `gender`, `mbti`, `country`, `profession`,
|
||||
`interested_topics` keys).
|
||||
- A change to the `OasisAgentProfile` dataclass field set or the
|
||||
Reddit/Twitter serializers.
|
||||
- A change to `get_language_instruction()` semantics or the per-locale
|
||||
`llmInstruction` strings.
|
||||
- A change to OASIS subprocess profile-format expectations (verified
|
||||
via `backend/scripts/test_profile_format.py`).
|
||||
- A change to the JSON contract emitted by the LLM (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`).
|
||||
- A change to `OasisAgentProfile` field semantics.
|
||||
- A change to `get_language_instruction()` semantics or the per-locale `llmInstruction` strings.
|
||||
- A change to OASIS / CAMEL-OASIS profile field expectations (e.g. if `gender` accepts more than `male` / `female` / `other`).
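
The JSON contract named in these triggers can be expressed as a small check. This is a sketch under stated assumptions: the function name `check_profile_contract` is hypothetical, and the rules encoded are only the ones this spec states (eight required keys, the English `gender` enum, an integer `age`, and no newline in `persona`).

```python
import json

REQUIRED_KEYS = {
    "bio", "persona", "age", "gender",
    "mbti", "country", "profession", "interested_topics",
}
ALLOWED_GENDERS = {"male", "female", "other"}


def check_profile_contract(raw: str) -> list[str]:
    """Return a list of contract violations for one LLM response (empty if none)."""
    data = json.loads(raw)
    problems = []
    missing = REQUIRED_KEYS - set(data)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if data.get("gender") not in ALLOWED_GENDERS:
        problems.append(f"unexpected gender: {data.get('gender')!r}")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append(f"age is not an integer: {age!r}")
    if "\n" in str(data.get("persona", "")):
        problems.append("persona contains a newline")
    return problems
```

If any of the triggers above fires, a check of this kind would need to change along with the prompt.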
|
||||
|
||||
## Architecture
|
||||
|
||||
### Existing Architecture Analysis
|
||||
|
||||
`OasisProfileGenerator` lives in `backend/app/services/`, follows the
|
||||
in-process service pattern, and is invoked from a Flask handler inside
|
||||
a background task. The relevant flow:
|
||||
`OasisProfileGenerator` lives in `backend/app/services/`, follows the in-process service pattern with bounded thread-pool fan-out for batched profile generation, and is invoked from `backend/app/api/simulation.py` inside a background `Task`. It depends on:
|
||||
|
||||
1. The Flask handler resolves the request locale via `Accept-Language`;
|
||||
`set_locale()` is propagated into worker threads in
|
||||
`generate_profiles_for_entities` (locale captured at line ~910 and
|
||||
restored inside `generate_single_profile` at line ~914).
|
||||
2. For each entity, `generate_profile_from_entity` decides between the
|
||||
individual or group prompt builder via
|
||||
`self._is_individual_entity(entity_type)`.
|
||||
3. The chosen builder produces a user-message string; `_get_system_prompt`
|
||||
produces a system-message string. Both are sent to the LLM via
|
||||
`self.client.chat.completions.create(..., response_format={"type": "json_object"})`.
|
||||
4. The LLM response is JSON-decoded; on failure, `_try_fix_json` and
|
||||
`_fix_truncated_json` attempt recovery; on terminal failure,
|
||||
`_generate_profile_rule_based` produces a rule-based persona.
|
||||
5. The result is wrapped in an `OasisAgentProfile` dataclass and
|
||||
serialized to Reddit JSON or Twitter CSV via `_save_reddit_json` /
|
||||
`_save_twitter_csv`.
|
||||
- `OpenAI` SDK for the LLM call.
|
||||
- `GraphitiAdapter` (legacy `zep_client` field name) for the Zep / Graphiti graph search.
|
||||
- `get_language_instruction()` for locale steering.
|
||||
- `t()` for already-keyed log strings.
|
||||
|
||||
This design preserves all of the above. The change is purely lexical
|
||||
inside three method bodies and four literal defaults.
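
The locale capture-and-restore step in the flow can be sketched as below. The `set_locale` / `get_locale` functions here are thread-local stand-ins written for this example, and the `"zh"` default is an assumption; the real helpers live in `backend/app/utils/locale.py` and may be implemented differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_state = threading.local()  # stand-in for the storage behind locale.py


def set_locale(locale: str) -> None:
    _state.locale = locale


def get_locale() -> str:
    return getattr(_state, "locale", "zh")  # assumed default


def generate_all(entity_names: list[str]) -> list[str]:
    current_locale = get_locale()  # captured in the request thread

    def generate_single(name: str) -> str:
        set_locale(current_locale)  # restored inside the worker thread
        return f"{name}:{get_locale()}"

    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(generate_single, entity_names))


set_locale("en")
print(generate_all(["alice", "acme"]))  # ['alice:en', 'acme:en']
```

Without the `set_locale(current_locale)` call inside the worker, each pool thread in this sketch would fall back to the default, which is why the spec treats this plumbing as something to preserve.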
|
||||
The relevant flow is:
|
||||
|
||||
1. The Flask handler resolves the request locale via `Accept-Language`; the locale is propagated to thread-pool workers via the `set_locale(current_locale)` capture in `generate_profiles_from_entities` (line 914).
|
||||
2. For each entity, `_build_entity_context()` is called: it composes a context block by concatenating headed sub-sections (entity attributes, related facts/edges, related node summaries, Graphiti-search facts, Graphiti-search nodes). Some of these labels are currently in Chinese.
|
||||
3. The context string is interpolated into the user-message template by either `_build_individual_persona_prompt` or `_build_group_persona_prompt`. Both templates are currently in Chinese, with English `gender` token directives interleaved.
|
||||
4. The system prompt is built by `_get_system_prompt`: a Chinese base prompt followed by the locale-appropriate `get_language_instruction()`.
|
||||
5. The two messages are sent to `chat.completions.create` with `response_format={"type": "json_object"}`. The result flows through the `json.loads` → `_try_fix_json` → `_fix_truncated_json` fallback chain. If the LLM result is unusable, the synthesized fallback personas use the Chinese template `f"{entity_name}是一个{entity_type}。"`.
|
||||
6. After per-profile completion, `_print_generated_profile` writes a Chinese-headed banner to stdout, and `generate_profiles_from_entities` writes Chinese batch banners.
|
||||
|
||||
This design preserves all of the above structurally. The change is purely lexical inside the seven regions of one file.
|
||||
|
||||
### Architecture Pattern & Boundary Map
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
Caller["simulation_manager.py / api/simulation.py"]
|
||||
Generator["OasisProfileGenerator"]
|
||||
Sys["_get_system_prompt"]
|
||||
Ind["_build_individual_persona_prompt"]
|
||||
Grp["_build_group_persona_prompt"]
|
||||
Locale["locale.get_language_instruction"]
|
||||
Client["openai.chat.completions.create"]
|
||||
Parser["_try_fix_json / _fix_truncated_json"]
|
||||
Fallback["_generate_profile_rule_based"]
|
||||
Serializer["_save_reddit_json / _save_twitter_csv"]
|
||||
Caller[simulation.py handler]
|
||||
Generator[OasisProfileGenerator]
|
||||
Locale[locale.get_language_instruction]
|
||||
Graph[GraphitiAdapter graph.search]
|
||||
LLM[OpenAI chat.completions]
|
||||
|
||||
Caller --> Generator
|
||||
Generator --> Sys
|
||||
Generator --> Ind
|
||||
Generator --> Grp
|
||||
Sys -. inline call .-> Locale
|
||||
Ind -. inline call .-> Locale
|
||||
Grp -. inline call .-> Locale
|
||||
Sys --> Client
|
||||
Ind --> Client
|
||||
Grp --> Client
|
||||
Client --> Parser
|
||||
Parser --> Fallback
|
||||
Generator --> Serializer
|
||||
|
||||
classDef change fill:#fff4ce,stroke:#a16207,color:#000
|
||||
class Sys,Ind,Grp change
|
||||
Caller -->|generate_profiles_from_entities| Generator
|
||||
Generator -->|build context block| Generator
|
||||
Generator -->|read locale postfix| Locale
|
||||
Generator -->|search facts/nodes| Graph
|
||||
Generator -->|JSON request| LLM
|
||||
LLM -->|raw JSON| Generator
|
||||
Generator -->|OasisAgentProfile| Caller
|
||||
```
|
||||
|
||||
The three highlighted nodes (`_get_system_prompt`,
|
||||
`_build_individual_persona_prompt`,
|
||||
`_build_group_persona_prompt`) are the only nodes whose **string
|
||||
contents** change. Every edge — including each call to
|
||||
`get_language_instruction()` — remains intact.
|
||||
|
||||
**Architecture Integration**:
|
||||
|
||||
- **Selected pattern**: In-place lexical translation of the three
|
||||
prompt builders (Option A from `gap-analysis.md` / `research.md`).
|
||||
- **Domain/feature boundaries**: Same as today; `OasisProfileGenerator`
|
||||
remains the sole owner of persona prompt content. `LocaleService`
|
||||
remains the sole owner of locale-postfix steering.
|
||||
- **Existing patterns preserved**: locale-thread propagation, retry
|
||||
logic with temperature decay, JSON resilience helpers, rule-based
|
||||
fallback, two-platform serialization.
|
||||
- **New components rationale**: none — no new components.
|
||||
- **Steering compliance**: aligns with `tech.md` ("LLM prompts use the
|
||||
`get_language_instruction()` postfix mechanism, not key files") and
|
||||
`structure.md` ("services own their own prompt strings").
|
||||
- Selected pattern: **In-place lexical translation** of seven regions of an existing service. No structural change.
|
||||
- Domain/feature boundaries: locale machinery vs. prompt assembly vs. LLM transport remain cleanly separated.
|
||||
- Existing patterns preserved: prompt-as-f-string user-message construction; Chinese-keyed `_normalize_gender` mapping; `t(...)` for log strings; `get_language_instruction()` postfix concatenation.
|
||||
- New components rationale: none — no new components.
|
||||
- Steering compliance: matches the established `i18n-*-prompts` family pattern (issues #2, #3, #4, #5) of in-place translation rather than `t()` keying for prompt bodies. Respects the steering note that "existing files mix English and Chinese in comments/docstrings — preserve both; do not translate one into the other unless asked." This ticket is the explicit ask for prompt strings, scoped to exclude comments/docstrings.
|
||||
|
||||
### Technology Stack & Alignment
|
||||
### Technology Stack
|
||||
|
||||
| Layer | Choice / Version | Role in Feature | Notes |
|
||||
|-------|------------------|-----------------|-------|
|
||||
| Backend / Services | Python ≥3.11 | Hosts the prompt builders | No version change |
|
||||
| LLM transport | `openai` SDK against any OpenAI-compatible endpoint | Sends translated prompts | Unchanged |
|
||||
| i18n | `backend/app/utils/locale.py` | Resolves locale and provides `get_language_instruction()` postfix | Unchanged |
|
||||
| Storage | None | — | No persistence change |
|
||||
| Backend / Services | Python 3.11+ | Hosts `OasisProfileGenerator` | Existing — unchanged. |
|
||||
| Backend / Services | `openai` SDK | Issues the prompt; returns JSON | Existing — unchanged. |
|
||||
| Backend / Services | `backend/app/utils/locale.py` | Resolves `Accept-Language` → `llmInstruction` postfix | Existing — unchanged. |
|
||||
| Backend / Services | `GraphitiAdapter` | Provides Graphiti graph search facts/nodes | Existing — unchanged. |
|
||||
|
||||
No new dependencies. No version bumps. The locale infrastructure used
|
||||
by the change is the same one used by every sibling i18n spec already
|
||||
merged.
|
||||
No new dependencies. No version changes.
|
||||
|
||||
## File Structure Plan
|
||||
|
||||
### Modified Files
|
||||
|
||||
- `backend/app/services/oasis_profile_generator.py` — only file that
|
||||
changes.
|
||||
- `_get_system_prompt(self, is_individual: bool) -> str` — translate
|
||||
`base_prompt` literal to English. Keep
|
||||
`f"{base_prompt}\n\n{get_language_instruction()}"` shape.
|
||||
- `_build_individual_persona_prompt(self, entity_name, entity_type,
|
||||
entity_summary, entity_attributes, context) -> str` — translate
|
||||
the f-string body to English; replace `"无"` and `"无额外上下文"`
|
||||
defaults; keep every `{variable}` interpolation and the inline
|
||||
`{get_language_instruction()}` call.
|
||||
- `_build_group_persona_prompt(self, entity_name, entity_type,
|
||||
entity_summary, entity_attributes, context) -> str` — same
|
||||
treatment as the individual builder.
|
||||
- `backend/app/services/oasis_profile_generator.py`: replace the `base_prompt` literal in `_get_system_prompt`; replace every Chinese string literal in `_build_individual_persona_prompt` and `_build_group_persona_prompt` with English equivalents; replace the four section labels in `_search_zep_for_entity` and the six section labels in `_build_entity_context`; replace the three fallback persona templates; replace the two `"无"` / `"无额外上下文"` placeholder literals (four occurrences across the two builders); replace the console-output literals in `_print_generated_profile` and the two `print(...)` banners in `generate_profiles_from_entities`. Preserve every other character of the file.
|
||||
|
||||
No other files in the repository are touched by this change.
|
||||
No new files. No deletions. No moves.
|
||||
|
||||
## System Flows
|
||||
|
||||
The runtime flow does not change. The only way to demonstrate this is
|
||||
to compare the call graph before and after — and the call graph is
|
||||
already shown in the Architecture diagram above, so a separate
sequence diagram is omitted.
|
||||
The control-flow diagram in *Architecture Pattern & Boundary Map* covers the relevant flow; no additional diagrams are needed for this string-literal change.
|
||||
|
||||
## Requirements Traceability
|
||||
|
||||
| Requirement | Summary | Components | Interfaces | Flows |
|
||||
|-------------|---------|------------|------------|-------|
|
||||
| 1.1 | `base_prompt` contains zero Chinese characters | `_get_system_prompt` | `(self, is_individual: bool) -> str` | system-message construction |
|
||||
| 1.2 | Preserve `f"{base_prompt}\n\n{get_language_instruction()}"` | `_get_system_prompt` | inline `get_language_instruction()` | system-message construction |
|
||||
| 1.3 | Preserve role/intent semantics | `_get_system_prompt` | — | — |
|
||||
| 1.4 | Preserve signature `_get_system_prompt(self, is_individual: bool) -> str` | `_get_system_prompt` | (signature) | — |
|
||||
| 2.1 | Individual prompt body in English | `_build_individual_persona_prompt` | f-string body | user-message construction |
|
||||
| 2.2 | Preserve `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}` | `_build_individual_persona_prompt` | f-string interpolations | — |
|
||||
| 2.3 | Preserve JSON keys `bio, persona, age, gender, mbti, country, profession, interested_topics` | `_build_individual_persona_prompt` | prompt content | — |
|
||||
| 2.4 | Preserve field-level constraints (lengths, MBTI, gender enum, age int) | `_build_individual_persona_prompt` | prompt content | — |
|
||||
| 2.5 | Preserve trailing-rules block semantics | `_build_individual_persona_prompt` | prompt content | — |
|
||||
| 2.6 | Preserve method signature | `_build_individual_persona_prompt` | (signature) | — |
|
||||
| 2.7 | Translate `"无"` and `"无额外上下文"` defaults | `_build_individual_persona_prompt` | literal defaults | — |
|
||||
| 2.8 | Zero Chinese in assembled body | `_build_individual_persona_prompt` | — | — |
|
||||
| 3.1 | Group prompt body in English | `_build_group_persona_prompt` | f-string body | user-message construction |
|
||||
| 3.2 | Preserve interpolations | `_build_group_persona_prompt` | f-string interpolations | — |
|
||||
| 3.3 | Preserve JSON keys | `_build_group_persona_prompt` | prompt content | — |
|
||||
| 3.4 | Preserve field-level constraints (age=30, gender="other", etc.) | `_build_group_persona_prompt` | prompt content | — |
|
||||
| 3.5 | Preserve trailing-rules semantics | `_build_group_persona_prompt` | prompt content | — |
|
||||
| 3.6 | Preserve method signature | `_build_group_persona_prompt` | (signature) | — |
|
||||
| 3.7 | Translate `"无"` / `"无额外上下文"` defaults | `_build_group_persona_prompt` | literal defaults | — |
|
||||
| 3.8 | Zero Chinese in assembled body | `_build_group_persona_prompt` | — | — |
|
||||
| 4.1 | Preserve every `get_language_instruction()` call site | all three builders | inline call | system + user message construction |
|
||||
| 4.2 | Preserve locale-thread plumbing | `generate_profiles_for_entities` (untouched) | `set_locale(current_locale)` | worker thread spawn |
|
||||
| 4.3 | Locale=zh produces Chinese personas | runtime behaviour | locale postfix | LLM call |
|
||||
| 4.4 | Locale=en produces English personas | runtime behaviour | locale postfix | LLM call |
|
||||
| 4.5 | `gender` ∈ {male, female, other} regardless of locale | prompt content | — | — |
|
||||
| 4.6 | Don't alter locale.py / locales/ | (none) | — | — |
|
||||
| 5.1 | Preserve `OasisAgentProfile` dataclass | (untouched) | dataclass | — |
|
||||
| 5.2 | Preserve method signatures | (untouched) | signatures | — |
|
||||
| 5.3 | Preserve LLM invocation parameters | (untouched) | `chat.completions.create(...)` | — |
|
||||
| 5.4 | Preserve `MBTI_TYPES`, `COUNTRIES`, taxonomy lists | (untouched) | class constants | — |
|
||||
| 6.1 | Preserve `_fix_truncated_json` / `_try_fix_json` | (untouched) | helpers | — |
|
||||
| 6.2 | Reasoning-model recovery still works | (untouched) | resilience helpers | — |
|
||||
| 6.3 | No new prompt-language-dependent pre-processing | (none added) | — | — |
|
||||
| 6.4 | Round-trip yields non-empty `bio` and `persona` | runtime behaviour | LLM call | — |
|
||||
| 7.1 | `pytest test_profile_format.py` passes | runtime behaviour | serializers | — |
|
||||
| 7.2 | Reddit format schema preserved | (untouched) | `to_reddit_format` | — |
|
||||
| 7.3 | Twitter format schema preserved | (untouched) | `to_twitter_format` | — |
|
||||
| 7.4 | `gender` enum preserved | prompt content | — | — |
|
||||
| 8.1 | No logger edits | (untouched) | — | — |
|
||||
| 8.2 | No docstring/comment edits | (untouched) | — | — |
|
||||
| 8.3 | No rule-based fallback edits | (untouched) | — | — |
|
||||
| 8.4 | No edits outside the target file | (none) | — | — |
|
||||
| 8.5 | No new dependencies | (none) | `pyproject.toml` / `uv.lock` untouched | — |
|
||||
| 8.6 | No edits to `test_profile_format.py` | (untouched) | — | — |
|
||||
| 1.1–1.4 | English `_get_system_prompt` `base_prompt`; preserve `get_language_instruction()` site | OasisProfileGenerator → `_get_system_prompt` | None changed | Architecture diagram |
|
||||
| 2.1–2.9 | English `_build_individual_persona_prompt`; preserve interpolations and JSON keys | OasisProfileGenerator → `_build_individual_persona_prompt` | f-string interpolation | n/a |
|
||||
| 3.1–3.9 | English `_build_group_persona_prompt`; preserve fixed-value rules and interpolations | OasisProfileGenerator → `_build_group_persona_prompt` | f-string interpolation | n/a |
|
||||
| 4.1–4.10 | English context-builder section labels | OasisProfileGenerator → `_search_zep_for_entity`, `_build_entity_context` | Prompt-only | n/a |
|
||||
| 5.1–5.3 | English fallback persona templates | OasisProfileGenerator → `_generate_profile_with_llm`, `_try_fix_json` | None changed | n/a |
|
||||
| 6.1–6.7 | English console-output formatting | OasisProfileGenerator → `_print_generated_profile`, `generate_profiles_from_entities` | None changed | n/a |
|
||||
| 7.1–7.4 | Locale switching preserved via `get_language_instruction()` | OasisProfileGenerator + Locale | `get_language_instruction()` | Architecture diagram |
|
||||
| 8.1–8.6 | Public API and call-site stability; preserve `_normalize_gender` and `country: "中国"` data default | OasisProfileGenerator (signatures, dataclass) | Public surface | n/a |
|
||||
| 9.1–9.3 | Reasoning-model compatibility | OasisProfileGenerator → `chat.completions.create` + `_try_fix_json` | OpenAI SDK | Architecture diagram |
|
||||
| 10.1–10.7 | Out-of-scope surfaces untouched | OasisProfileGenerator (boundary commitment) | n/a | n/a |
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|
||||
|-----------|--------------|--------|--------------|--------------------------|-----------|
|
||||
| `_get_system_prompt` | backend service / prompt builder | Produce the system message (English base + locale postfix) | 1.1, 1.2, 1.3, 1.4, 4.1, 4.5 | `get_language_instruction` (P0) | Service |
|
||||
| `_build_individual_persona_prompt` | backend service / prompt builder | Produce the individual-entity user message in English | 2.x, 4.1, 4.5 | `get_language_instruction` (P0); JSON encoder (P1) | Service |
|
||||
| `_build_group_persona_prompt` | backend service / prompt builder | Produce the group/institution user message in English | 3.x, 4.1, 4.5 | `get_language_instruction` (P0); JSON encoder (P1) | Service |
|
||||
| OasisProfileGenerator (modified) | Backend / Service | Render English profile-generation prompts and context labels; preserve all behaviour | 1.1–10.7 | `OpenAI.chat.completions.create` (P0), `get_language_instruction` (P0), `GraphitiAdapter.graph.search` (P1), `_normalize_gender` (P0) | Service |
|
||||
|
||||
Only the three prompt-builder methods change. They all live inside the
|
||||
single class `OasisProfileGenerator` in
|
||||
`backend/app/services/oasis_profile_generator.py`. No new components.
|
||||
### Backend / Service
|
||||
|
||||
### Backend / Services
|
||||
|
||||
#### `_get_system_prompt`
|
||||
#### OasisProfileGenerator (modified)
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Build the `system` message: a one-line English directive that frames the model as a social-media persona expert + the per-locale postfix. |
|
||||
| Requirements | 1.1, 1.2, 1.3, 1.4, 4.1, 4.5 |
|
||||
| Intent | Translate prompt strings, context labels, fallback persona templates, and console output to English while preserving every functional contract. |
|
||||
| Requirements | 1.1, 1.2, 1.3, 1.4, 2.1–2.9, 3.1–3.9, 4.1–4.10, 5.1–5.3, 6.1–6.7, 7.1–7.4, 8.1–8.6, 9.1–9.3, 10.1–10.7 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Construct and return a single string of the form
|
||||
`f"{base_prompt}\n\n{get_language_instruction()}"`.
|
||||
- Preserve the signature
|
||||
`_get_system_prompt(self, is_individual: bool) -> str`.
|
||||
- The English `base_prompt` MUST convey: (a) expert role in
|
||||
social-media persona generation; (b) intent to produce detailed,
|
||||
realistic personas for opinion-simulation, faithful to existing
|
||||
reality; (c) the JSON-output requirement and the no-unescaped-newline
|
||||
rule.
|
||||
- The English `base_prompt` MUST NOT contain any CJK codepoint.
|
||||
- Owns: the English wording of the system prompt body, the two user-message templates, the context-builder section labels, the fallback persona templates, the no-attributes / no-context placeholders, and the console-output formatting.
|
||||
- Domain boundary: prompt content and proximate console output only. Does not own locale resolution, transport, validation, or data values like the OASIS `country` default.
|
||||
- Invariants:
|
||||
- All seven owned regions after translation MUST contain zero CJK characters.
|
||||
- The translated user-message templates MUST present the same eight required JSON keys: `bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`.
|
||||
- The translated individual-persona template MUST require `gender ∈ {"male", "female"}` and `age` to be a valid integer.
|
||||
- The translated group-persona template MUST require `age == 30` and `gender == "other"`.
|
||||
- The translated user-message templates MUST preserve the f-string interpolations: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`.
|
||||
- The translated context-builder labels MUST preserve the section structure (heading + bulleted body).
|
||||
- The translated fallback persona templates MUST preserve the `entity_summary or template` priority order.
|
||||
- The call to `get_language_instruction()` MUST remain at its current locations.
|
||||
- The call to `self.client.chat.completions.create(...)` MUST remain unchanged.
|
||||
- All public signatures, dataclass schema, and the private helper signatures MUST remain unchanged.
|
||||
- All `logger.*` calls (already keyed) and inline comments and docstrings in this file MUST remain unchanged (out of scope per #6 and #7).
|
||||
- The `_normalize_gender` mapping table MUST remain unchanged.
|
||||
- The rule-based `country: "中国"` default MUST remain unchanged.
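
The system-prompt assembly contract listed above (English base, blank line, locale postfix) can be sketched as follows. This is a minimal illustration, not the shipped code: `get_language_instruction` here is a hypothetical stand-in that takes a `locale` argument (the real helper in `backend/app/utils/locale.py` resolves the locale itself), and the `BASE_PROMPT` wording is an assumed example.

```python
# Hypothetical stand-in for backend/app/utils/locale.get_language_instruction.
# Assumption: the real helper takes no argument and reads the active locale.
_INSTRUCTIONS = {
    "en": "Please respond in English.",
    "zh": "请使用中文回答。",
}


def get_language_instruction(locale: str = "en") -> str:
    """Return the per-locale postfix, defaulting to English."""
    return _INSTRUCTIONS.get(locale, _INSTRUCTIONS["en"])


# Illustrative wording only; the actual base prompt may differ.
BASE_PROMPT = (
    "You are an expert in generating social-media user personas. "
    "Produce detailed, realistic personas for opinion simulation that stay "
    "faithful to existing reality. Return valid JSON; string values must not "
    "contain unescaped newlines."
)


def build_system_prompt(locale: str = "en") -> str:
    """English base prompt, a blank line, then the locale postfix."""
    return f"{BASE_PROMPT}\n\n{get_language_instruction(locale)}"
```

Keeping the base ASCII-only while the postfix varies is what lets a `zh` request still end with a Chinese instruction.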
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Outbound: `get_language_instruction()` from
|
||||
`backend/app/utils/locale.py` (P0, criticality high — the entire
|
||||
locale-steering chain depends on it).
|
||||
- Inbound: `backend/app/api/simulation.py` — production caller (P0).
|
||||
- Outbound: `backend/app/utils/locale.get_language_instruction` — locale postfix (P0); `backend/app/utils/locale.t` — already-keyed log strings (P0); `backend/app/services/graphiti_adapter.GraphitiAdapter.graph.search` — facts/nodes retrieval (P1); `OpenAI.chat.completions.create` — JSON LLM transport (P0).
|
||||
- External: none.
|
||||
|
||||
**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
The public Python interface is unchanged. Representative signatures:
|
||||
|
||||
```python
def _get_system_prompt(self, is_individual: bool) -> str:
    """Return the LLM system message: English base + locale postfix."""
    ...

class OasisProfileGenerator:
    def __init__(
        self,
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        model_name: Optional[str] = None,
        zep_api_key: Optional[str] = None,
        graph_id: Optional[str] = None,
    ) -> None: ...

    def generate_profile_from_entity(
        self,
        entity: EntityNode,
        user_id: int,
        use_llm: bool = True,
    ) -> OasisAgentProfile: ...

    def generate_profiles_from_entities(
        self,
        entities: List[EntityNode],
        use_llm: bool = True,
        progress_callback: Optional[callable] = None,
        graph_id: Optional[str] = None,
        parallel_count: int = 5,
        realtime_output_path: Optional[str] = None,
        output_platform: str = "reddit",
    ) -> List[OasisAgentProfile]: ...

    def save_profiles(
        self,
        profiles: List[OasisAgentProfile],
        file_path: str,
        platform: str = "reddit",
    ) -> None: ...
```
|
||||
|
||||
- Preconditions: none.
|
||||
- Postconditions: returns a non-empty string ending with the locale
|
||||
postfix produced by `get_language_instruction()`.
|
||||
- Invariants: contains zero CJK codepoints.
|
||||
- Preconditions: a configured LLM provider; a configured Graphiti / Neo4j graph; a non-empty `entities` list when batching.
|
||||
- Postconditions: `OasisAgentProfile` instances with English `bio` and `persona` under locale `en`, Chinese under locale `zh`, and structurally equivalent across locales.
|
||||
- Invariants: see *Responsibilities & Constraints*.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: called only from `_call_llm_with_retry` (line ~523)
|
||||
with `is_individual` decided upstream. The `is_individual` flag is
|
||||
reserved for future divergence between system prompts; the current
|
||||
implementation does not branch on it, and this design preserves
|
||||
that.
|
||||
- Validation: a CJK regex audit on the method body after the edit must
|
||||
match zero codepoints.
|
||||
- Risks: dropping one of the three role/intent pieces (expert framing,
|
||||
JSON output requirement, no-newline rule). Implementation task lists
|
||||
all three explicitly.
|
||||
|
||||
#### `_build_individual_persona_prompt`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Build the user-message string for an individual entity in English. Preserve every `{variable}` interpolation, the inline `{get_language_instruction()}` call, every JSON-output key, and every locale-independent constraint. |
|
||||
| Requirements | 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 4.1, 4.5 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Preserve signature
|
||||
`_build_individual_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`.
|
||||
- Preserve `attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else <fallback>` with `<fallback>` translated to English (`"None"`).
|
||||
- Preserve `context_str = context[:3000] if context else <fallback>` with `<fallback>` translated to English (`"No additional context"`).
|
||||
- Translate the f-string body to English with these structural sections (mirror the original Chinese intent):
|
||||
1. **Lead sentence** — instruct the model to generate a detailed
|
||||
social-media persona for the entity, faithful to existing reality.
|
||||
2. **Entity context block** — labelled lines for `entity_name`,
|
||||
`entity_type`, `entity_summary`, `entity_attributes` (English
|
||||
labels; values via `{...}` interpolation).
|
||||
3. **Context information block** — `Context information:` heading
|
||||
followed by `{context_str}`.
|
||||
4. **JSON-fields enumeration** — `Generate JSON with the following
|
||||
fields:` followed by the eight numbered items (`bio`, `persona`,
|
||||
`age`, `gender`, `mbti`, `country`, `profession`,
|
||||
`interested_topics`) with English descriptions matching
|
||||
Requirement 2.4.
|
||||
5. **Trailing rules block** — `Important:` followed by:
|
||||
- `All field values must be strings or numbers; do not use newlines.`
|
||||
- `persona must be a single coherent block of text.`
|
||||
- `{get_language_instruction()} (gender field MUST use English values: "male" or "female")`
|
||||
- `Content must remain consistent with the entity information.`
|
||||
- `age must be a valid integer; gender must be exactly "male" or "female".`
|
||||
- Preserve every `{variable}` interpolation present in the original by
|
||||
name: `{entity_name}`, `{entity_type}`, `{entity_summary}`,
|
||||
`{attrs_str}`, `{context_str}`, `{get_language_instruction()}`.
|
||||
- The translated body MUST NOT contain any CJK codepoint.
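
The two fallback expressions quoted above can be exercised in isolation. The sketch below assumes a hypothetical helper name, `render_prompt_inputs`; in the spec these expressions live inline in the builder rather than in a separate function.

```python
import json
from typing import Any, Dict, Tuple


def render_prompt_inputs(
    entity_attributes: Dict[str, Any], context: str
) -> Tuple[str, str]:
    """Apply the English fallbacks for missing attributes and context."""
    # ensure_ascii=False keeps non-ASCII attribute values readable in the prompt.
    attrs_str = (
        json.dumps(entity_attributes, ensure_ascii=False)
        if entity_attributes
        else "None"
    )
    # Context is capped at 3000 characters, as in the spec.
    context_str = context[:3000] if context else "No additional context"
    return attrs_str, context_str
```

Note that attribute values are data and may legitimately contain Chinese text; only the fallback literals are in scope for translation.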
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Outbound: `json.dumps(..., ensure_ascii=False)` (P1, formatting the
|
||||
attributes dict) — unchanged.
|
||||
- Outbound: `get_language_instruction()` (P0) — interpolated inline.
|
||||
|
||||
**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
def _build_individual_persona_prompt(
    self,
    entity_name: str,
    entity_type: str,
    entity_summary: str,
    entity_attributes: Dict[str, Any],
    context: str,
) -> str:
    """Return the LLM user message for an individual-entity persona."""
    ...
```
|
||||
|
||||
- Preconditions: `entity_name`, `entity_type`, `entity_summary`
|
||||
are strings (may be empty); `entity_attributes` is a dict (may be
|
||||
empty); `context` is a string (may be empty).
|
||||
- Postconditions: returns a non-empty English string with all six
|
||||
interpolations resolved.
|
||||
- Invariants: contains zero CJK codepoints; preserves every
|
||||
`{variable}` interpolation by name.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: called from `_call_llm_with_retry` (line ~506) when
|
||||
`is_individual` is true.
|
||||
- Validation: post-edit CJK regex audit; interpolation-set audit
|
||||
(verify the multiset of `{...}` tokens equals the pre-change set);
|
||||
smoke import + `pytest backend/scripts/test_profile_format.py`.
|
||||
- Risks: dropping the `gender` enum lock when translating; dropping
|
||||
the inline `{get_language_instruction()}` call. The implementation
|
||||
task list calls these out as discrete checks.
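
One way to run the interpolation-set audit mentioned above is to compare the multiset of `{...}` tokens in the raw template text before and after translation. This is a sketch under assumptions: `placeholder_multiset` is a hypothetical helper, and the `before`/`after` strings are illustrative fragments, not the real template text.

```python
import re
from collections import Counter

# Matches a single-brace placeholder such as {entity_name}.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def placeholder_multiset(template_source: str) -> Counter:
    """Count each {...} token in the raw (un-rendered) template text."""
    # Drop escaped braces ({{ and }}) so literal JSON examples are not counted.
    stripped = template_source.replace("{{", "").replace("}}", "")
    return Counter(_PLACEHOLDER.findall(stripped))


# Illustrative fragments only (plain strings, so the braces stay literal).
before = "名称: {entity_name}\n类型: {entity_type}\n{get_language_instruction()}"
after = "Name: {entity_name}\nType: {entity_type}\n{get_language_instruction()}"

tokens_match = placeholder_multiset(before) == placeholder_multiset(after)
```

A multiset rather than a set is used so that a placeholder that appears twice in the original and once in the translation is still flagged.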
|
||||
|
||||
#### `_build_group_persona_prompt`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Build the user-message string for a group/institution entity in English. Preserve every `{variable}` interpolation, the inline `{get_language_instruction()}` call, every JSON-output key, and every locale-independent constraint (notably `age == 30` and `gender == "other"`). |
|
||||
| Requirements | 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.1, 4.5 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Preserve signature
|
||||
`_build_group_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`.
|
||||
- Preserve the `attrs_str` and `context_str` fallback handling with
|
||||
English defaults (`"None"`, `"No additional context"`), identical to
|
||||
the individual builder.
|
||||
- Translate the f-string body to English with these structural
|
||||
sections (mirror the original Chinese intent for institutions):
|
||||
1. **Lead sentence** — instruct the model to generate a detailed
|
||||
social-media account profile for the institution/group, faithful
|
||||
to existing reality.
|
||||
2. **Entity context block** — labelled lines for `entity_name`,
|
||||
`entity_type`, `entity_summary`, `entity_attributes`.
|
||||
3. **Context information block** — `Context information:` heading
|
||||
followed by `{context_str}`.
|
||||
4. **JSON-fields enumeration** — `Generate JSON with the following
|
||||
fields:` followed by the eight numbered items as defined in
|
||||
Requirement 3.4: `bio` (~200 chars, official voice), `persona`
|
||||
(~2000 chars, single coherent text covering institutional
|
||||
basics, account positioning, voice, publishing pattern, stance,
|
||||
special notes, institutional memory), `age` (= integer 30,
|
||||
institutional virtual age), `gender` (= literal `"other"`),
|
||||
`mbti` (e.g. ISTJ for strict/conservative), `country` (country
|
||||
name string), `profession` (institutional function),
|
||||
`interested_topics` (array).
|
||||
5. **Trailing rules block** — `Important:` followed by:
|
||||
- `All field values must be strings or numbers; null is not allowed.`
|
||||
- `persona must be a single coherent block of text without newlines.`
|
||||
- `{get_language_instruction()} (gender field MUST use English value "other")`
|
||||
- `age must be the integer 30; gender must be the string "other".`
|
||||
- `Account voice must match its identity positioning.`
|
||||
- Preserve every `{variable}` interpolation present in the original.
|
||||
- The translated body MUST NOT contain any CJK codepoint.
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Outbound: same as individual builder.
|
||||
|
||||
**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [ ]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
def _build_group_persona_prompt(
    self,
    entity_name: str,
    entity_type: str,
    entity_summary: str,
    entity_attributes: Dict[str, Any],
    context: str,
) -> str:
    """Return the LLM user message for a group/institution persona."""
    ...
```
|
||||
|
||||
- Preconditions / Postconditions / Invariants: same shape as the
|
||||
individual builder.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: called from `_call_llm_with_retry` (line ~510) when
|
||||
`is_individual` is false.
|
||||
- Validation: same checks as the individual builder, plus an explicit
|
||||
audit that the institutional sentinels (`age == 30`,
|
||||
`gender == "other"`) appear in English in the trailing-rules block.
|
||||
- Risks: same as the individual builder; additionally, the `country`
|
||||
language hint (`"使用中文,如\"中国\""`) is intentionally dropped
|
||||
during translation — the validation task verifies that under
|
||||
`Accept-Language: en` a sample run produces an English country
|
||||
name.
|
||||
- **Integration**: No new imports. No call-site changes. The diff is confined to seven regions of one file.
|
||||
- **Validation**: After implementation, run a targeted regex check (`[一-鿿]`, i.e. U+4E00 to U+9FFF) over the seven owned regions to confirm zero CJK; smoke-test `_build_individual_persona_prompt(...)` and `_build_group_persona_prompt(...)` with representative inputs to confirm interpolations still work; round-trip a single profile end-to-end under both `en` and `zh` locales.
|
||||
- **Risks**: English-base bias on Chinese-locale output (mitigated by the `llmInstruction` postfix already present in both system and user messages). Reduced LLM compliance with `gender ∈ {male, female}` for individual entities (mitigated by retaining the explicit English-token directive verbatim in the rules block).
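
The targeted CJK check named in the validation note can be sketched as a small helper. The function name `find_cjk` is hypothetical, and how the seven owned regions are extracted from the module is left to the reviewer.

```python
import re
from typing import List

# Same range as the spec's regex: CJK Unified Ideographs, U+4E00 to U+9FFF.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def find_cjk(text: str) -> List[str]:
    """Return every CJK ideograph found in text (empty list means clean)."""
    return CJK_RE.findall(text)
```

This range covers ideographs only; full-width punctuation such as `。` (U+3002) falls outside it, so a stricter audit may want a wider pattern.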
|
||||
|
||||
## Data Models
|
||||
|
||||
No data-model changes. The persona JSON schema, the
|
||||
`OasisAgentProfile` dataclass, the Reddit/Twitter serializers, and the
|
||||
OASIS subprocess profile-format expectations are all preserved
|
||||
verbatim.
|
||||
No data-model changes. The `OasisAgentProfile` dataclass is preserved verbatim.
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Error Strategy
|
||||
|
||||
No new error paths. The existing flow is preserved:
|
||||
Error handling is unchanged from the existing implementation:
|
||||
|
||||
- `json.JSONDecodeError` → `_try_fix_json` → `_fix_truncated_json` →
|
||||
partial-extract via regex → `_generate_profile_rule_based`.
|
||||
- LLM call failure → retry with temperature decay (`0.7 - attempt * 0.1`)
|
||||
up to `max_attempts = 3`.
|
||||
- Terminal failure → rule-based fallback persona.
|
||||
- Per-entity worker exception → fallback `OasisAgentProfile` produced
|
||||
inside `generate_single_profile` at line ~932.
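
The retry-with-temperature-decay step in the list above can be sketched as below. `call_with_temperature_decay` and its `call` parameter are hypothetical names for illustration; the real retry loop lives in `_call_llm_with_retry` and may differ in how it catches and logs errors.

```python
from typing import Callable, Optional


def call_with_temperature_decay(
    call: Callable[[float], str], max_attempts: int = 3
) -> Optional[str]:
    """Retry call(temperature), lowering the temperature on each attempt."""
    for attempt in range(max_attempts):
        # 0.7, then roughly 0.6, then roughly 0.5, per the formula in the spec.
        temperature = 0.7 - attempt * 0.1
        try:
            return call(temperature)
        except Exception:
            # Assumption: any failure triggers the next, cooler attempt.
            continue
    # None signals the caller to use the rule-based fallback persona.
    return None
```

Lowering the temperature on each retry trades variety for a better chance of well-formed JSON on the later attempts.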
|
||||
|
||||
The translated prompts do not introduce new failure modes. Translating
|
||||
prompt language has no semantic effect on JSON parsing or on the
|
||||
`response_format={"type": "json_object"}` constraint.
|
||||
- LLM transport errors propagate from `chat.completions.create`.
|
||||
- Truncation (`finish_reason == "length"`) is repaired by `_fix_truncated_json`.
|
||||
- Invalid JSON falls through to `_try_fix_json`, then to a synthesized fallback profile (now with English persona text).
|
||||
- Per-entity exceptions are caught and a fallback `OasisAgentProfile` is constructed with English fallback strings.
|
||||
|
||||
### Error Categories and Responses
|
||||
|
||||
- **User errors**: not applicable (this is an internal pipeline).
|
||||
- **System errors**: LLM transport errors are retried; logger emits
|
||||
`t("log.profile_generator.m011")` etc. Logger keys already exist in
|
||||
`locales/{en,zh}.json`.
|
||||
- **Business-logic errors**: `gender` not in the English enum, `age`
|
||||
not an integer — the prompt explicitly mandates them; the validator
|
||||
inside `_try_fix_json` does not enforce these but the OASIS
|
||||
subprocess does. No change in either direction.
|
||||
- **User errors (4xx)**: not applicable at this layer; surfaced by the API handler.
|
||||
- **System errors (5xx)**: LLM/network failures propagate to the API handler, which converts them to JSON error responses.
|
||||
- **Business logic errors**: malformed JSON is auto-repaired or replaced with a fallback profile.
|
||||
|
||||
### Monitoring
|
||||
|
||||
Existing logger calls are unchanged. Logger keys already i18n-keyed via
|
||||
`t("log.profile_generator.*")`.
|
||||
Existing `logger.*` calls (keyed via `t("log.profile_generator.*")`) cover progress and warnings; no new monitoring is added.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests
|
||||
|
||||
- **(Existing)**
|
||||
`backend/scripts/test_profile_format.py::test_profile_formats` —
|
||||
must continue to pass without modification.
|
||||
- **(Manual)** Smoke import:
|
||||
`cd backend && uv run python -c "from app.services.oasis_profile_generator import OasisProfileGenerator"`
|
||||
— confirms no syntax errors after editing f-strings.
|
||||
Given the project's intentionally minimal test harness (`backend/scripts/test_profile_format.py` only), the change is verified via:
|
||||
|
||||
- **Static check**: a one-shot regex assertion against the patched module ensuring zero CJK characters in the seven owned regions. This can be a quick `python -c` invocation during PR review.
|
||||
- **Round-trip smoke test**: instantiate `OasisProfileGenerator()`, call `_build_individual_persona_prompt(...)` and `_build_group_persona_prompt(...)` with representative inputs, and verify all required interpolations appear in the output and no CJK characters remain.
|
||||
- **Fallback rendering**: simulate a JSON parse failure and verify the English fallback persona template is produced.
|
||||
|
||||
### Integration Tests
|
||||
|
||||
- **(Manual)** Run the prompt builders directly under each locale:
|
||||
- `set_locale("en")` →
|
||||
`OasisProfileGenerator()._build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")`
|
||||
— assert no CJK codepoints in the output, assert the English
|
||||
locale postfix appears via `get_language_instruction()` (which is
|
||||
`"Please respond in English."`).
|
||||
- `set_locale("zh")` → same call → assert the locale postfix is
|
||||
`"请使用中文回答。"`.
|
||||
- These do not require an LLM call; they only verify the rendered
|
||||
prompt string.
|
||||
- **Step 2 profile generation under EN locale**: run a small batched profile generation against a real Graphiti graph with locale `en`. Verify produced profiles have English `bio` / `persona` and pass the existing OASIS profile-format check.
|
||||
|
||||
### E2E Tests
|
||||
### E2E/UI Tests
|
||||
|
||||
- **(Manual, optional)** Run `npm run dev` and trigger Step 2 profile
generation from the UI under the English locale on a small entity set;
spot-check that bios and persona prose are in English. Skip this check
when no live LLM key is available (for example in CI); sibling specs
#2/#4/#5 used the same manual E2E approach.
|
||||
Not applicable — change does not affect frontend.
|
||||
|
||||
### Performance / Load
|
||||
### Performance/Load
|
||||
|
||||
Not applicable. Prompt translation has no measurable performance
|
||||
impact.
|
||||
Not applicable — token counts may differ slightly between Chinese and English renderings, but the LLM call has no `max_tokens` cap and remains within provider-acceptable limits.
|
||||
|
||||
## Optional Sections
|
||||
|
||||
### Security Considerations
|
||||
|
||||
No security implications. No new external surfaces; no new data
|
||||
retention; no change to authentication or authorization.
|
||||
Not applicable. Translation does not introduce new authentication, authorization, data-handling, or input-validation paths.
|
||||
|
||||
### Performance & Scalability
|
||||
|
||||
Not applicable.
|
||||
|
||||
### Migration Strategy
|
||||
|
||||
No migration required. The change is forward-compatible: a deployment
|
||||
that picks up the translated prompts continues to serve users on the
|
||||
`zh` locale via the unchanged
|
||||
`get_language_instruction()` postfix mechanism.
|
||||
Not applicable. The change is a single in-place edit; no data migration. Rollback is `git revert`.
|
||||
|
||||
## Supporting References
|
||||
|
||||
- `gap-analysis.md` — option evaluation and effort/risk sizing.
|
||||
- `research.md` — discovery findings, design decisions (in particular
|
||||
the "drop the country language hint" decision), and risk register.
|
||||
- `requirements.md` — EARS requirements with numeric IDs.
|
||||
- Sibling specs `i18n-ontology-generator-prompts`,
|
||||
`i18n-simulation-config-generator-prompts`,
|
||||
`i18n-report-agent-prompts` — same translation pattern, already
|
||||
merged.
|
||||
- `backend/app/services/oasis_profile_generator.py` — current Chinese prompt content (the source of translation).
|
||||
- `backend/app/utils/locale.py` — locale resolver.
|
||||
- `backend/app/api/simulation.py` — call site.
|
||||
- `.kiro/specs/i18n-ontology-generator-prompts/design.md` — adjacent reference design for in-place prompt translation.
|
||||
- `.ticket/25.md` — ticket snapshot.
|
||||
|
|
|
|||
|
|
@ -2,144 +2,167 @@
|
|||
|
||||
## Introduction
|
||||
|
||||
This specification covers the English translation of the prompt strings in `backend/app/services/oasis_profile_generator.py`. The file converts Graphiti graph entities into OASIS agent persona dictionaries that drive Step 2 (Environment Setup) of the MiroFish pipeline. Today, the system prompt and the two `_build_*_persona_prompt` user-message templates are written in Chinese; the language is steered at runtime by appending `get_language_instruction()` to the system prompt and inside the user prompt body. While that postfix instructs the model *which* language to respond in, the base-prompt language biases the model's structural and lexical output, so persona prose (bio, persona, profession, interested_topics) skews Chinese under `Accept-Language: en`. Translating the base prompts to English removes that bias while preserving the existing locale-switching mechanism for non-English locales (`get_language_instruction()` returns `请使用中文回答。` when locale is `zh`, so a Chinese model response remains achievable from an English base prompt).
|
||||
This specification covers the English translation of the LLM-prompt assembly strings in `backend/app/services/oasis_profile_generator.py`. The file generates OASIS Agent profiles (bio, persona, demographics) from Graphiti/Zep entities during pipeline Step 2. Today, the system prompt and the two user-message builders (`_build_individual_persona_prompt`, `_build_group_persona_prompt`) are written in Chinese, and the runtime context-builders (`_search_zep_for_entity`, `_build_entity_context`) embed Chinese section labels (`事实信息:`, `相关实体:`, `### 实体属性`, `### 关联实体信息`, etc.) into the prompt context that is later interpolated into the user message. Locale is steered at runtime by appending `get_language_instruction()` to the system message and the user-message rules block, but the base-prompt language and the embedded context labels bias the LLM toward Chinese output even when `Accept-Language: en`. Translating the prompt body and the context labels removes that bias while preserving the existing locale-switching mechanism for non-English locales.
|
||||
|
||||
This work tracks GitHub issue [#3](https://github.com/salestech-group/MiroFish/issues/3) and is sibling to the already-merged ontology-generator (#2), simulation-config-generator (#4), and report-agent (#5) prompt translation specs.
|
||||
This work tracks GitHub issue [#25](https://github.com/salestech-group/MiroFish/issues/25).
|
||||
|
||||
## Boundary Context
|
||||
|
||||
- **In scope**:
|
||||
- Translating the system-prompt base string in `OasisProfileGenerator._get_system_prompt` (currently `"你是社交媒体用户画像生成专家。…"` at line ~664) from Chinese to English.
|
||||
- Translating the individual-persona user-message template in `OasisProfileGenerator._build_individual_persona_prompt` (currently lines ~680–714) from Chinese to English.
|
||||
- Translating the group/institution-persona user-message template in `OasisProfileGenerator._build_group_persona_prompt` (currently lines ~729–762) from Chinese to English.
|
||||
- Translating the small `attrs_str` and `context_str` fallback default literals (`"无"`, `"无额外上下文"`) to English equivalents.
|
||||
- Preserving all functional contracts: every `get_language_instruction()` call site, all variable interpolations, all JSON output keys, the `gender` enum constraint, the `age` integer constraint, and the institutional age=30 / gender="other" rule.
|
||||
- Translating the system-prompt base string in `_get_system_prompt` (`base_prompt = "你是社交媒体用户画像生成专家..."`).
|
||||
- Translating the user-message body in `_build_individual_persona_prompt` (header line, field labels, JSON-field descriptions, "重要" rules block).
|
||||
- Translating the user-message body in `_build_group_persona_prompt` (header line, field labels, JSON-field descriptions, "重要" rules block).
|
||||
- Translating the placeholder values used inside those builders: `"无"` and `"无额外上下文"` (substituted when an entity has no attributes or no context).
|
||||
- Translating the section-heading labels prepended to context fragments by `_search_zep_for_entity` (`"相关实体: "` prefix on node-name labels; `"事实信息:"`, `"相关实体:"` block headings).
|
||||
- Translating the section-heading labels prepended to context fragments by `_build_entity_context` (`"### 实体属性"`, `"### 相关事实和关系"`, `"### 关联实体信息"`, `"### Zep检索到的事实信息"`, `"### Zep检索到的相关节点"`, plus the inline `(相关实体)` placeholder in edge-direction fragments).
|
||||
- Translating the fallback persona templates (`f"{entity_name}是一个{entity_type}。"`) used when LLM JSON parsing fails or fields are missing.
|
||||
- Translating the console-output formatting in `_print_generated_profile` (the `【简介】`, `【详细人设】`, `【基本属性】` headings and the `用户名:`, `年龄:`, `性别:`, `MBTI:`, `职业:`, `国家:`, `兴趣话题:` row labels) and the surrounding `print` banners in `generate_profiles_from_entities` (`开始生成Agent人设...`, `人设生成完成!...`).
|
||||
- Translating the `'无'` sentinel emitted when `interested_topics` is empty in `_print_generated_profile`.
|
||||
- Preserving all functional contracts: f-string interpolations; the JSON output schema; the `get_language_instruction()` postfix call sites; the `_normalize_gender` mappings (the Chinese `男`/`女`/`机构`/`其他` keys remain because input data may still arrive in those forms); the `country: "中国"` rule-based default in `_generate_profile_rule_based`; and the inline comments owned by issue #7, including `OASIS 库要求字段名为 username(无下划线)` at lines 65 and 93 and `# 可能被截断`, `# 机构虚拟年龄`, and similar code-level documentation.
|
||||
- **Out of scope**:
|
||||
- Logger calls (`logger.info`, `logger.warning`, `logger.error`) and the printed banner text inside `oasis_profile_generator.py` — covered by issue #6.
|
||||
- Module docstring, class docstrings, method docstrings, and inline comments — covered by issue #7.
|
||||
- The fallback Chinese string literals embedded in non-prompt code paths (e.g. `f"{entity_name}是一个{entity_type}。"` inside `_try_fix_json` and the rule-based fallback). These are runtime data fallbacks, not LLM prompts; they are deferred to the comments/docstrings work in #7 or to a later cleanup, and they are not user-visible while the LLM path succeeds.
|
||||
- Refactoring the OASIS profile JSON schema, the `OasisAgentProfile` dataclass, the MBTI list, the `COMMON_COUNTRIES` list, the entity-type taxonomy splits (`PERSONAL_ENTITY_TYPES` vs `GROUP_ENTITY_TYPES`), or persona-generation flow control.
|
||||
- Changing OASIS profile-format compatibility — verified by `backend/scripts/test_profile_format.py`.
|
||||
- Editing the locale plumbing block (currently the `current_locale = get_locale()` capture and the `set_locale(current_locale)` call inside `generate_single_profile` around lines ~910–916).
|
||||
- Logger calls in this file (covered by issue #6 and the in-flight #24/#25 backend-log work — the logger calls already use `t("log.profile_generator.*")` keys).
|
||||
- Module/class/method docstrings and inline code comments (covered by issue #7 — including the `# OASIS 库要求字段名为 username` and `# 机构虚拟年龄` style comments).
|
||||
- The `_normalize_gender` mapping table (it must continue to accept Chinese gender inputs that may still arrive from upstream LLM output or user-supplied data).
|
||||
- The hard-coded `"中国"` rule-based country default (this is a data value that downstream OASIS expects in a free-form `country` field; changing the default is a data migration, not a translation).
|
||||
- The Chinese text in the `ValueError("LLM_API_KEY 未配置")` raise: it is an exception message, not a prompt fragment, and will be translated under issue #6 (already partially in progress under #24).
|
||||
- Externalising prompt strings to `/locales/*.json` (out of scope per the `i18n-*-prompts` family of tickets — same pattern as issues #2/#3/#4/#5).
|
||||
- Editing call sites of `OasisProfileGenerator` (`api/simulation.py`, etc.).
|
||||
- Editing `backend/app/utils/locale.py`, the locale registries, or `/locales/`.
|
||||
- **Adjacent expectations**:
|
||||
- The Step 2 environment-setup pipeline must continue to consume the OASIS profile output unchanged. The Reddit (`to_reddit_format`) and Twitter (`to_twitter_format`) serializers are not coupled to prompt language; this is verified via the JSON schema contract preservation.
|
||||
- The locale resolution chain (`Accept-Language` header → `get_locale()` → `get_language_instruction()`) is owned by `backend/app/utils/locale.py` and is unchanged by this work.
|
||||
- Companion i18n issues (#6 logs, #7 comments/docstrings, #9 frontend comments, #10 e2e verification, #12 README) operate on different files or scopes and must not be touched here.
|
||||
- The OASIS / CAMEL-OASIS simulation layer must continue to consume profile JSON unchanged. No coupling to prompt language exists in the OASIS adapter.
|
||||
- The locale resolution chain (`Accept-Language` header → `get_locale()` → `get_language_instruction()`) is owned by `backend/app/utils/locale.py` and is unchanged by this work. Translating the base prompt does not modify locale resolution semantics.
|
||||
- Companion i18n issues (#3, #4, #5, #6, #7, #9, #10, #23, #24, #26) operate on different files or scopes and should not be touched here.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: English Translation of the System Prompt
|
||||
### Requirement 1: English Translation of the Profile-Generation System Prompt
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the persona-generation system prompt to be authored in English, so that the LLM's persona prose is not biased toward Chinese structure or word choice.
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the profile-generation system prompt to be authored in English, so that the LLM's persona output is not biased toward Chinese structure or word choice.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall set the `base_prompt` constant inside `_get_system_prompt` to an English string containing zero Chinese characters.
|
||||
2. The OASIS Profile Generator shall preserve the system-prompt assembly contract verbatim: the format `f"{base_prompt}\n\n{get_language_instruction()}"` and the call to `get_language_instruction()` at exactly that site.
|
||||
3. The OASIS Profile Generator shall preserve the role and intent semantics of the original prompt: identifying the model as an expert in social-media user-persona generation, requesting detailed and realistic personas for opinion simulation that reflect existing real-world conditions, and mandating valid JSON output where string values must not contain unescaped newlines.
|
||||
4. The OASIS Profile Generator shall preserve the function signature `_get_system_prompt(self, is_individual: bool) -> str`.
|
||||
1. The OASIS Profile Generator shall define `base_prompt` (in `_get_system_prompt`) as a string literal containing zero CJK characters.
|
||||
2. The OASIS Profile Generator shall preserve the system-prompt requirement that the model returns valid JSON whose string values do not contain unescaped newline characters.
|
||||
3. The OASIS Profile Generator shall preserve the call to `get_language_instruction()` appended to `base_prompt`, exactly at the existing concatenation site, so locale steering continues to work for non-English locales.
|
||||
4. The OASIS Profile Generator shall preserve the `is_individual` parameter of `_get_system_prompt` and continue to return a single concatenated system-prompt string of the form `"{base_prompt}\n\n{language_instruction}"`.
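
The assembly contract in the criteria above can be sketched as follows. This is a minimal illustration, not the shipped code: `get_language_instruction` here is a hypothetical stand-in that takes an explicit `locale` argument (the real helper lives in `backend/app/utils/locale.py` and resolves the locale itself), and the English `BASE_PROMPT` wording is only one possible rendering.

```python
# Hypothetical stand-in for the real helper in backend/app/utils/locale.py;
# the explicit `locale` parameter and the instruction texts are assumptions.
def get_language_instruction(locale: str = "en") -> str:
    instructions = {
        "en": "Please respond in English.",
        "zh": "Please respond in Chinese.",
    }
    return instructions.get(locale, instructions["en"])


# Example English wording only; the actual translated prompt may differ.
BASE_PROMPT = (
    "You are an expert in generating social-media user personas. "
    "Generate detailed, realistic personas for opinion simulation. "
    "Return valid JSON; string values must not contain unescaped newlines."
)


def build_system_prompt(locale: str = "en") -> str:
    # Same shape as the contract: "{base_prompt}\n\n{language_instruction}"
    return f"{BASE_PROMPT}\n\n{get_language_instruction(locale)}"
```

Keeping the locale postfix as the last segment means the base prompt can be reworded without touching locale steering.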
|
||||
|
||||
### Requirement 2: English Translation of the Individual-Persona User-Message Template
|
||||
|
||||
**Objective:** As a MiroFish operator generating personas for individual entities under `Accept-Language: en`, I want the user-message template constructed by `_build_individual_persona_prompt` to be authored in English, so that the rendered prompt does not interleave English `get_language_instruction()` directives with Chinese section headings.
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the individual-persona user-message template constructed by `_build_individual_persona_prompt` to be authored in English, so that the rendered prompt does not interleave English instructions with Chinese section headings, and the LLM is not biased toward Chinese output.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall render the individual-persona user message with English section headings and prose in place of the current Chinese (entity name, entity type, entity summary, entity attributes, context section, JSON-fields enumeration, "important" trailing block).
|
||||
2. The OASIS Profile Generator shall preserve all variable interpolations verbatim by name: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, and the inline `{get_language_instruction()}` call inside the trailing rules block.
|
||||
3. The OASIS Profile Generator shall preserve the JSON output contract enumerated in the prompt: the keys `bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics` (verbatim, English).
|
||||
4. The OASIS Profile Generator shall preserve the field-level constraints in the prompt:
|
||||
- `bio` ≈ 200 characters, social-media biography.
|
||||
- `persona` ≈ 2000 characters, single coherent text covering: basic information (age, profession, education, location), background (notable experience, event association, social ties), personality (MBTI, core traits, emotional expression), social-media behavior (posting frequency, content preferences, interaction style, language traits), stance (attitudes toward the topic, emotional triggers), unique features (catchphrases, special experiences, hobbies), and personal memory (the entity's relation to the event and prior actions/reactions in it).
|
||||
- `age` MUST be an integer.
|
||||
- `gender` MUST be one of `"male"` or `"female"` (English enum value, locale-independent).
|
||||
- `mbti` MUST be an MBTI four-letter type (e.g. INTJ, ENFP).
|
||||
- `country` MUST be a country name string.
|
||||
- `profession` MUST be a profession string.
|
||||
- `interested_topics` MUST be an array.
|
||||
5. The OASIS Profile Generator shall preserve the trailing-block rules in substance: every value is a string or number, no newlines inside string values, `persona` is a single coherent text, `gender` must be the English `male`/`female` enum even when locale is `zh`, content must stay consistent with the source entity, `age` must be a valid integer.
|
||||
6. The OASIS Profile Generator shall preserve the function signature `_build_individual_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`.
|
||||
7. The OASIS Profile Generator shall preserve the `context[:3000]` truncation behaviour and the conditional fallback (`"无额外上下文"` translated to `"No additional context"`) when `context` is empty/falsy. Likewise, `attrs_str` shall fall back to an English placeholder (`"None"`) when `entity_attributes` is empty/falsy, replacing the current `"无"` literal.
|
||||
8. The OASIS Profile Generator shall return zero Chinese characters across all string literals contributed to the assembled individual-persona prompt body.
|
||||
1. The OASIS Profile Generator shall render the individual-persona user message with English field labels in place of `实体名称`, `实体类型`, `实体摘要`, `实体属性`, and `上下文信息`.
|
||||
2. The OASIS Profile Generator shall render the JSON-field descriptions (the `请生成JSON,包含以下字段` enumeration) in English while preserving the eight required output keys verbatim by name (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`).
|
||||
3. The OASIS Profile Generator shall preserve the requirement language that `gender` MUST be the literal English token `"male"` or `"female"` for individual entities, and that `age` MUST be a valid integer.
|
||||
4. The OASIS Profile Generator shall preserve the trailing rules block (the `重要:` enumeration) in English, conveying the same constraints: all field values must be strings or numbers, no embedded newlines; persona must be a coherent single text block; the `gender` field uses English `male`/`female`; content must remain consistent with the entity information; `age` must be a valid integer.
|
||||
5. The OASIS Profile Generator shall preserve the call to `get_language_instruction()` interpolated into the rules block.
|
||||
6. The OASIS Profile Generator shall preserve all f-string interpolations verbatim by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`.
|
||||
7. The OASIS Profile Generator shall replace the no-attributes placeholder `"无"` with the English `"None"` when `entity_attributes` is empty / falsy, and the no-context placeholder `"无额外上下文"` with an English equivalent (e.g. `"No additional context"`) when `context` is empty / falsy.
|
||||
8. The OASIS Profile Generator shall return zero CJK characters across all string literals contributed by `_build_individual_persona_prompt`.
|
||||
9. The OASIS Profile Generator shall preserve the existing `country` field instruction semantics (a free-form country name is requested) but replace the example `"中国"` with a locale-neutral English phrasing that does not bias the model toward any single country (e.g. `Free-form country name`).
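
The placeholder and truncation behaviour described in this requirement might look like the sketch below. The helper name `render_placeholders` is hypothetical, and serialising the attributes with `json.dumps` is an assumption about how `attrs_str` is built; only the `"None"` and `"No additional context"` fallbacks and the `context[:3000]` truncation come from the requirement text.

```python
import json
from typing import Any, Dict, Tuple


def render_placeholders(
    entity_attributes: Dict[str, Any], context: str
) -> Tuple[str, str]:
    # Empty or falsy attributes fall back to the English placeholder "None".
    attrs_str = (
        json.dumps(entity_attributes, ensure_ascii=False)
        if entity_attributes
        else "None"
    )
    # Context is truncated to 3000 characters; empty context gets a placeholder.
    context_str = context[:3000] if context else "No additional context"
    return attrs_str, context_str
```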
|
||||
|
||||
### Requirement 3: English Translation of the Group/Institution-Persona User-Message Template
|
||||
|
||||
**Objective:** As a MiroFish operator generating personas for institutional/group entities under `Accept-Language: en`, I want the user-message template constructed by `_build_group_persona_prompt` to be authored in English, so that the rendered prompt does not interleave English `get_language_instruction()` directives with Chinese section headings.
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the group-persona user-message template constructed by `_build_group_persona_prompt` to be authored in English, with the same scope and contract as Requirement 2 but for institutional entities.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall render the group-persona user message with English section headings and prose in place of the current Chinese.
|
||||
2. The OASIS Profile Generator shall preserve all variable interpolations verbatim by name: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, and the inline `{get_language_instruction()}` call inside the trailing rules block.
|
||||
3. The OASIS Profile Generator shall preserve the JSON output contract enumerated in the prompt: the keys `bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics` (verbatim, English).
|
||||
4. The OASIS Profile Generator shall preserve the field-level constraints in the prompt:
|
||||
- `bio` ≈ 200 characters, an official-account biography that reads as professionally appropriate.
|
||||
- `persona` ≈ 2000 characters, single coherent text covering: institutional basics (formal name, type, founding background, primary functions), account positioning (account type, target audience, core function), voice (language traits, common phrasing, taboo topics), publishing pattern (content types, publishing frequency, active hours), stance (official position on the core topic, controversy-handling style), special notes (group portrait represented, operational habits), and institutional memory (the institution's relation to the event and prior actions/reactions in it).
|
||||
- `age` MUST be the integer `30` (the institutional virtual-age sentinel).
|
||||
- `gender` MUST be the literal `"other"` (English enum value, locale-independent), indicating non-individual.
|
||||
- `mbti` MUST be an MBTI four-letter type used to characterize account voice (e.g. ISTJ for strict/conservative).
|
||||
- `country` MUST be a country name string.
|
||||
- `profession` MUST describe institutional function.
|
||||
- `interested_topics` MUST be an array of focus areas.
|
||||
5. The OASIS Profile Generator shall preserve the trailing-block rules in substance: every value is a string or number, no `null` values, no newlines in string values, `persona` is a single coherent text, `gender` must be the English `"other"` enum even when locale is `zh`, the institutional account voice must match its identity positioning, and `age` must be the integer `30`.
|
||||
6. The OASIS Profile Generator shall preserve the function signature `_build_group_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`.
|
||||
7. The OASIS Profile Generator shall preserve the `context[:3000]` truncation behaviour and the conditional English-equivalent fallback for empty `context` and empty `entity_attributes`, mirroring Requirement 2.
|
||||
8. The OASIS Profile Generator shall return zero Chinese characters across all string literals contributed to the assembled group-persona prompt body.
|
||||
1. The OASIS Profile Generator shall render the group-persona user message with English field labels in place of `实体名称`, `实体类型`, `实体摘要`, `实体属性`, and `上下文信息`.
|
||||
2. The OASIS Profile Generator shall render the JSON-field descriptions (the `请生成JSON,包含以下字段` enumeration) in English while preserving the eight required output keys verbatim by name (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`).
|
||||
3. The OASIS Profile Generator shall preserve the fixed-value requirements: `age` MUST be the integer literal `30`; `gender` MUST be the literal English token `"other"`.
|
||||
4. The OASIS Profile Generator shall preserve the trailing rules block (the `重要:` enumeration) in English, conveying the same constraints: all field values must be strings or numbers (no nulls); persona must be a coherent single text block (no embedded newlines); the `gender` field uses English `"other"`; `age` must be the integer `30`; the institutional account's voice must match its identity.
|
||||
5. The OASIS Profile Generator shall preserve the call to `get_language_instruction()` interpolated into the rules block.
|
||||
6. The OASIS Profile Generator shall preserve all f-string interpolations verbatim by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`.
|
||||
7. The OASIS Profile Generator shall use the same English placeholders as Requirement 2 for the no-attributes and no-context cases.
|
||||
8. The OASIS Profile Generator shall return zero CJK characters across all string literals contributed by `_build_group_persona_prompt`.
|
||||
9. The OASIS Profile Generator shall preserve the semantics of the existing `country` field instruction while using a locale-neutral English phrasing (matching Requirement 2.9).
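
The fixed-value rules for group profiles lend themselves to a small verification helper. The sketch below is a hypothetical check for a test harness, not part of `OasisProfileGenerator`; the function name and message wording are illustrative.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {
    "bio", "persona", "age", "gender",
    "mbti", "country", "profession", "interested_topics",
}


def group_profile_violations(profile: Dict[str, Any]) -> List[str]:
    """Return human-readable rule violations for a group-persona JSON object."""
    problems: List[str] = []
    missing = REQUIRED_KEYS - set(profile)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    age = profile.get("age")
    # type() check rejects "30", 30.0 and True, which would pass a loose == 30.
    if type(age) is not int or age != 30:
        problems.append("age must be the integer 30")
    if profile.get("gender") != "other":
        problems.append('gender must be "other"')
    for key, value in profile.items():
        if value is None:
            problems.append(f"{key} must not be null")
        elif isinstance(value, str) and "\n" in value:
            problems.append(f"{key} contains a newline")
    return problems
```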
|
||||
|
||||
### Requirement 4: Locale Switching Continues to Work via `get_language_instruction()`
|
||||
### Requirement 4: English Translation of the Context-Builder Section Labels
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: zh` (or any other configured non-English locale), I want generated personas to remain in the requested locale at equivalent quality, so that translating the base prompt does not regress non-English support.
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the section labels embedded in the context string by `_search_zep_for_entity` and `_build_entity_context` to be in English, so that the prompt context block interpolated into the user message is fully English and the LLM is not biased toward Chinese output by the context labels.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall preserve every existing `get_language_instruction()` call site exactly: the system-prompt site in `_get_system_prompt`, the inline call inside the trailing rules block of `_build_individual_persona_prompt`, and the inline call inside the trailing rules block of `_build_group_persona_prompt`.
|
||||
2. The OASIS Profile Generator shall preserve the locale-capture/restore plumbing inside `generate_profiles_for_entities` (currently the `current_locale = get_locale()` capture and the `set_locale(current_locale)` call inside `generate_single_profile`) — this code is not modified by the change.
|
||||
3. While the locale is `zh`, the OASIS Profile Generator shall produce profiles whose `bio`, `persona`, `profession`, and `interested_topics` content is in Chinese, equivalent in quality to the pre-change behaviour.
|
||||
4. While the locale is `en`, the OASIS Profile Generator shall produce profiles whose `bio`, `persona`, `profession`, and `interested_topics` content is in English.
|
||||
5. While the locale is `en` or `zh`, the OASIS Profile Generator shall produce profiles whose `gender` field is one of the literal English values `"male"`, `"female"` (individual entities) or `"other"` (group entities), regardless of locale.
|
||||
6. The OASIS Profile Generator shall not alter `backend/app/utils/locale.py`, the `_languages` registry, the `_translations` registries, or the locales under `/locales/`.
|
||||
1. The OASIS Profile Generator shall render the related-node prefix (currently `"相关实体: "`) in English (e.g. `"Related entity: "`) in `_search_zep_for_entity`.
|
||||
2. The OASIS Profile Generator shall render the facts block heading (currently `"事实信息:"`) in English (e.g. `"Facts:"`) in `_search_zep_for_entity`.
|
||||
3. The OASIS Profile Generator shall render the related-entities block heading (currently `"相关实体:"`) in English (e.g. `"Related entities:"`) in `_search_zep_for_entity`.
|
||||
4. The OASIS Profile Generator shall render the entity-attributes section heading (currently `"### 实体属性"`) in English (e.g. `"### Entity attributes"`) in `_build_entity_context`.
|
||||
5. The OASIS Profile Generator shall render the related-facts/relationships section heading (currently `"### 相关事实和关系"`) in English (e.g. `"### Related facts and relationships"`) in `_build_entity_context`.
|
||||
6. The OASIS Profile Generator shall render the related-entity-information section heading (currently `"### 关联实体信息"`) in English (e.g. `"### Related entity information"`) in `_build_entity_context`.
|
||||
7. The OASIS Profile Generator shall render the Zep-retrieved facts section heading (currently `"### Zep检索到的事实信息"`) in English (e.g. `"### Facts retrieved from the graph"`) in `_build_entity_context`.
|
||||
8. The OASIS Profile Generator shall render the Zep-retrieved related-nodes section heading (currently `"### Zep检索到的相关节点"`) in English (e.g. `"### Related nodes retrieved from the graph"`) in `_build_entity_context`.
|
||||
9. The OASIS Profile Generator shall render the inline edge-direction placeholder (currently `(相关实体)`) in English (e.g. `(related entity)`) in both outgoing and incoming branches of `_build_entity_context`.
|
||||
10. The OASIS Profile Generator shall return zero CJK characters across all section-label string literals contributed by `_search_zep_for_entity` and `_build_entity_context`.
|
||||
|
||||
### Requirement 5: Public API and Call-Site Stability
|
||||
### Requirement 5: English Translation of the Fallback Persona Templates
|
||||
|
||||
**Objective:** As a developer maintaining the rest of the MiroFish backend pipeline, I want the public surface of `OasisProfileGenerator` and `OasisAgentProfile` to remain unchanged, so that the Step 2 environment-setup flow and existing callers continue to work without modification.
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, when the LLM JSON parse fails or returns missing fields and the code falls back to a synthesized persona template, I want the fallback persona to be in English so that the resulting profile JSON does not contain unintended Chinese strings.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall preserve the dataclass `OasisAgentProfile`, including its field set (`user_id`, `user_name`, `name`, `bio`, `persona`, `karma`, `friend_count`, `follower_count`, `statuses_count`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`, `source_entity_uuid`, `source_entity_type`, `created_at`), default values, and the `to_reddit_format`, `to_twitter_format`, `to_full_dict` serializers.
|
||||
2. The OASIS Profile Generator shall preserve the signatures and call semantics of `OasisProfileGenerator.__init__`, `generate_profile_from_entity`, `generate_profiles_for_entities`, `_call_llm_with_retry`, `_generate_profile_rule_based`, `_get_system_prompt`, `_build_individual_persona_prompt`, `_build_group_persona_prompt`, `_print_generated_profile`, `_fix_truncated_json`, `_try_fix_json`, and `_generate_username`.
|
||||
3. The OASIS Profile Generator shall preserve the LLM invocation parameters (`temperature`, `max_tokens`, model selection, retry behaviour) at the call sites that consume the prompts produced by the translated builders.
|
||||
4. The OASIS Profile Generator shall preserve the `PERSONAL_ENTITY_TYPES` and `GROUP_ENTITY_TYPES` taxonomies, the `MBTI_TYPES` list, and the `COMMON_COUNTRIES` list verbatim.
|
||||
1. The OASIS Profile Generator shall replace the fallback persona template `f"{entity_name}是一个{entity_type}。"` at every occurrence (currently at the persona-validation branch in `_generate_profile_with_llm` line 547, the regex-extraction branch in `_try_fix_json` line 644, and the catastrophic-failure branch line 659) with an English equivalent (e.g. `f"{entity_name} is a {entity_type}."`).
|
||||
2. The OASIS Profile Generator shall preserve the priority order of the fallback chain (`entity_summary or template`).
|
||||
3. The OASIS Profile Generator shall return zero CJK characters across all fallback persona literals.
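
The fallback chain above reduces to a single `or` expression. The sketch below isolates it in a hypothetical helper (`fallback_persona` is not a name from the codebase) to show that a non-empty `entity_summary` still takes priority over the English template.

```python
def fallback_persona(entity_name: str, entity_type: str, entity_summary: str) -> str:
    # `entity_summary or template`: the summary wins whenever it is non-empty.
    return entity_summary or f"{entity_name} is a {entity_type}."
```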
|
||||
|
||||
### Requirement 6: Reasoning-Model Output Compatibility
|
||||
### Requirement 6: English Translation of the Console-Output Formatting
|
||||
|
||||
**Objective:** As a MiroFish operator using a reasoning-model provider (e.g. MiniMax, GLM with `<think>` tags or markdown code fences), I want JSON parsing of the persona response to continue working, so that translating the base prompt does not regress provider compatibility.
|
||||
**Objective:** As a MiroFish operator monitoring profile generation in the console under `Accept-Language: en`, I want the per-profile diagnostic banner and the start/end batch banners to be in English so that the entire console stream is consistent with the requested locale.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall preserve the existing `_fix_truncated_json` and `_try_fix_json` resilience helpers exactly, including their regex-based extraction of `bio` and `persona` from partial output.
|
||||
2. If a reasoning-model provider returns truncated, `<think>`-tagged, or markdown-fenced output, then the existing parsing/recovery flow shall continue to apply unchanged.
|
||||
3. The OASIS Profile Generator shall not introduce any new pre-processing of the LLM response that depends on prompt language.
|
||||
4. After translation, the OASIS Profile Generator shall continue to round-trip a representative entity through `generate_profile_from_entity` and produce a JSON object with at minimum a non-empty `bio` and a non-empty `persona`, matching the pre-change behaviour.
|
||||
1. The OASIS Profile Generator shall render the per-profile section headings in English in `_print_generated_profile`: `【简介】` → `[Bio]`, `【详细人设】` → `[Persona]`, `【基本属性】` → `[Basic attributes]` (or equivalent English markers).
|
||||
2. The OASIS Profile Generator shall render the per-profile row labels in English in `_print_generated_profile`: `用户名:` → `Username:`, `年龄:` → `Age:`, `性别:` → `Gender:`, `职业:` → `Profession:`, `国家:` → `Country:`, `兴趣话题:` → `Interested topics:`.
|
||||
3. The OASIS Profile Generator shall replace the empty-topics sentinel `'无'` in `_print_generated_profile` with an English equivalent (e.g. `'None'`).
|
||||
4. The OASIS Profile Generator shall render the start-of-batch and end-of-batch banners in `generate_profiles_from_entities` in English: `开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}` → `Generating agent profiles — {total} entities, parallel: {parallel_count}` (or equivalent); `人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent` → `Profile generation complete — produced {n} agents` (or equivalent).
|
||||
5. The OASIS Profile Generator shall preserve all f-string interpolations in the banners verbatim (`{total}`, `{parallel_count}`, the count expression).
|
||||
6. The OASIS Profile Generator shall return zero CJK characters across all string literals contributed by `_print_generated_profile` and the surrounding `print(...)` banners in `generate_profiles_from_entities`.
|
||||
7. The OASIS Profile Generator shall continue to use the existing `t('progress.profileGenerated', ...)` key for the per-profile heading row, since that key is already locale-keyed via the `t()` helper.
|
||||
|
||||
### Requirement 7: Step 2 Environment-Setup Parity (OASIS Format Compatibility)
|
||||
### Requirement 7: Locale Switching Continues to Work via `get_language_instruction()`
|
||||
|
||||
**Objective:** As a MiroFish operator validating the change, I want the OASIS subprocess to accept the generated profiles unchanged, so that the translation does not silently break Step 2 → Step 3 hand-off.
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: zh` (or any other configured non-English locale), I want the profile output to remain in the requested locale at equivalent quality, so that translating the base prompt does not regress non-English support.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. While `uv run python -m pytest backend/scripts/test_profile_format.py` runs against the changed code, the test suite shall pass with zero regressions versus the pre-change baseline.
|
||||
2. While a representative Reddit-format profile dictionary is produced under locale `en`, every field name shall match the existing OASIS-required schema: `user_id`, `username`, `name`, `bio`, `persona`, `karma`, `created_at`, plus optional `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`.
|
||||
3. While a representative Twitter-format profile dictionary is produced under locale `en`, every field name shall match the existing OASIS-required schema: `user_id`, `username`, `name`, `bio`, `persona`, `friend_count`, `follower_count`, `statuses_count`, `created_at`, plus optional `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`.
|
||||
4. The OASIS Profile Generator shall produce `gender` values that are exactly one of `"male"`, `"female"`, `"other"` regardless of locale, satisfying the OASIS subprocess's expected enum.
|
||||
1. The OASIS Profile Generator shall preserve the call to `get_language_instruction()` exactly at its existing locations (currently inside `_get_system_prompt` and inside both `_build_individual_persona_prompt` and `_build_group_persona_prompt` rules blocks), continuing to read locale via the existing thread-local / request-header resolution chain.
|
||||
2. When the locale is `zh`, the OASIS Profile Generator shall produce profile JSON whose `bio` and `persona` fields are in Chinese, equivalent in quality to the pre-change behaviour.
|
||||
3. When the locale is `en`, the OASIS Profile Generator shall produce profile JSON whose `bio` and `persona` fields are in English.
|
||||
4. The OASIS Profile Generator shall not alter `backend/app/utils/locale.py`, the `_languages` registry, the `_translations` registries, or the locales under `/locales/`.
|
||||
|
||||
### Requirement 8: Out-of-Scope Surfaces Remain Untouched
|
||||
### Requirement 8: Public API and Call-Site Stability
|
||||
|
||||
**Objective:** As a reviewer of this PR, I want the change to remain narrowly scoped to prompt strings, so that translation responsibilities for adjacent surfaces (issues #6, #7, and the rule-based fallback) are not absorbed into this change.
|
||||
**Objective:** As a developer maintaining the rest of the MiroFish backend pipeline, I want the public surface of `OasisProfileGenerator` to remain unchanged, so that the simulation pipeline and existing callers continue to work without modification.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The change shall not modify any `logger.warning(...)`, `logger.info(...)`, `logger.error(...)`, or `logger.debug(...)` call in `oasis_profile_generator.py` (covered by issue #6).
|
||||
2. The change shall not modify the module docstring, class docstrings, method docstrings, or inline comments in `oasis_profile_generator.py` (covered by issue #7).
|
||||
3. The change shall not modify the rule-based fallback Chinese fragments inside `_try_fix_json` (e.g. `f"{entity_name}是一个{entity_type}。"`) or the rule-based path inside `_generate_profile_rule_based`; those are runtime data fallbacks, not LLM prompts, and remain out of scope here.
|
||||
4. The change shall not edit any file outside `backend/app/services/oasis_profile_generator.py` for production code.
|
||||
5. The change shall not introduce a new dependency or modify `backend/pyproject.toml` / `backend/uv.lock`.
|
||||
6. The change shall not modify `backend/scripts/test_profile_format.py` (the test is the contract; the implementation must match it).
|
||||
1. The OASIS Profile Generator shall preserve the signatures of `OasisProfileGenerator.__init__`, `generate_profile_from_entity`, `generate_profiles_from_entities`, `set_graph_id`, `save_profiles`, and `save_profiles_to_json`.
|
||||
2. The OASIS Profile Generator shall preserve the signatures of all private helpers, including `_generate_profile_with_llm`, `_build_individual_persona_prompt`, `_build_group_persona_prompt`, `_get_system_prompt`, `_build_entity_context`, `_search_zep_for_entity`, `_print_generated_profile`, `_normalize_gender`, `_save_twitter_csv`, `_save_reddit_json`, `_try_fix_json`, `_fix_truncated_json`, `_is_individual_entity`, `_is_group_entity`, `_generate_profile_rule_based`, `_generate_username`.
|
||||
3. The OASIS Profile Generator shall preserve the return shape of `generate_profile_from_entity` (a populated `OasisAgentProfile` dataclass instance) and `generate_profiles_from_entities` (a `List[OasisAgentProfile]`).
|
||||
4. The OASIS Profile Generator shall preserve the LLM invocation parameters (`response_format={"type": "json_object"}`, the `temperature=0.7 - (attempt * 0.1)` schedule, the absence of `max_tokens`) and the call to `self.client.chat.completions.create(...)`.
|
||||
5. The OASIS Profile Generator shall preserve the `_normalize_gender` mapping table verbatim (the Chinese keys `男`, `女`, `机构`, `其他` continue to accept upstream Chinese input).
|
||||
6. The OASIS Profile Generator shall preserve the rule-based `country: "中国"` default in `_generate_profile_rule_based` (this is a data value, not a prompt; changing it is out of scope per the boundary commitments).
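
To illustrate why the Chinese keys in the `_normalize_gender` table must stay, here is a rough sketch of such a mapping. Only the four Chinese keys are taken from the requirement; the English target values, the pass-through English keys, and the `"other"` default are assumptions made for illustration.

```python
# Sketch only: target values and the default are assumed, not confirmed.
GENDER_MAP = {
    "男": "male",
    "女": "female",
    "机构": "other",
    "其他": "other",
    "male": "male",
    "female": "female",
    "other": "other",
}


def normalize_gender(gender) -> str:
    # Missing or empty input falls back to the assumed default.
    if not gender:
        return "other"
    return GENDER_MAP.get(str(gender).strip().lower(), "other")
```

Under these assumptions, a model that still answers `男` under locale `zh` would be normalised to the English enum value the downstream consumer expects.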
|
||||
|
||||
### Requirement 9: Reasoning-Model Output Compatibility
|
||||
|
||||
**Objective:** As a MiroFish operator using a reasoning-model provider (e.g. MiniMax, GLM with `<think>` tags or markdown code fences), I want JSON parsing of the profile response to continue working, so that translating the base prompt does not regress provider compatibility.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The OASIS Profile Generator shall continue to call `self.client.chat.completions.create(...)` with `response_format={"type": "json_object"}` and parse the response via the existing `json.loads` / `_try_fix_json` / `_fix_truncated_json` chain unchanged.
|
||||
2. The OASIS Profile Generator shall not introduce any new pre-processing of the LLM response that depends on prompt language.
|
||||
3. The fallback persona templates from Requirement 5 shall be safe to embed in JSON (no embedded raw newlines, balanced quotes).
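
A JSON-safety check for a fallback string could be as simple as the sketch below. The helper name `is_json_safe` is hypothetical; it rejects raw newlines and confirms the text survives a `json.dumps` / `json.loads` round trip.

```python
import json


def is_json_safe(text: str) -> bool:
    # Raw newlines inside the value violate the persona single-line rule.
    if "\n" in text or "\r" in text:
        return False
    # Quotes are escaped by json.dumps, so the round trip must be lossless.
    return json.loads(json.dumps({"persona": text}))["persona"] == text
```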
|
||||
|
||||
### Requirement 10: Out-of-Scope Surfaces Remain Untouched
|
||||
|
||||
**Objective:** As a reviewer of this PR, I want the change to remain narrowly scoped to prompt strings and the immediately-adjacent context labels and console output, so that translation responsibilities for adjacent surfaces (issues #6 and #7) are not absorbed into this change.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The change shall not modify any `logger.warning(...)`, `logger.info(...)`, `logger.error(...)`, or `logger.debug(...)` call in `oasis_profile_generator.py` (covered by issues #6 / #24 / #25-style backend-log work — the calls already use `t("log.profile_generator.*")`).
|
||||
2. The change shall not modify the module docstring, class docstrings, method docstrings, or inline comments in `oasis_profile_generator.py` (covered by issue #7) — including the inline comments at lines 65, 93, 641, 804–807, 816–819, etc.
|
||||
3. The change shall not modify the `_normalize_gender` mapping table (Chinese gender keys must remain to handle upstream input).
|
||||
4. The change shall not modify the rule-based `country: "中国"` default in `_generate_profile_rule_based`.
|
||||
5. The change shall not modify the `ValueError("LLM_API_KEY 未配置")` raise (covered by issue #6).
|
||||
6. The change shall not edit any file outside `backend/app/services/oasis_profile_generator.py` for production code, except for adding test fixtures or scripts under a clearly-isolated directory if a verification harness is needed.
|
||||
7. The change shall not introduce a new dependency or modify `backend/pyproject.toml` / `backend/uv.lock`.
|
||||
|
|
|
|||
|
|
@ -1,10 +1,9 @@
|
|||
{
|
||||
"feature_name": "i18n-oasis-profile-generator-prompts",
|
||||
"created_at": "2026-05-08T05:26:06Z",
|
||||
"updated_at": "2026-05-08T05:30:00Z",
|
||||
"created_at": "2026-05-07T22:50:00Z",
|
||||
"updated_at": "2026-05-07T22:50:00Z",
|
||||
"language": "en",
|
||||
"phase": "tasks-generated",
|
||||
"ticket": 3,
|
||||
"approvals": {
|
||||
"requirements": {
|
||||
"generated": true,
|
||||
|
|
@ -19,5 +18,10 @@
|
|||
"approved": true
|
||||
}
|
||||
},
|
||||
"ready_for_implementation": true
|
||||
"ready_for_implementation": true,
|
||||
"ticket": {
|
||||
"number": 25,
|
||||
"url": "https://github.com/salestech-group/MiroFish/issues/25",
|
||||
"snapshot": ".ticket/25.md"
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,66 +1,92 @@
|
|||
# Implementation Plan
|
||||
|
||||
- [x] 1. Translate the system-prompt builder to English
|
||||
- Replace the Chinese `base_prompt` literal inside `_get_system_prompt` (currently `"你是社交媒体用户画像生成专家。…"` at line ~664) with an English rendering that conveys the same role and intent: identifies the model as an expert in social-media user-persona generation, asks for detailed and realistic personas suitable for opinion-simulation that faithfully reflect existing real-world conditions, mandates valid JSON output, and forbids unescaped newlines inside string values
|
||||
- Preserve the assembled return shape `f"{base_prompt}\n\n{get_language_instruction()}"` exactly — the call to `get_language_instruction()` is unchanged in name and position
|
||||
- Preserve the method signature `_get_system_prompt(self, is_individual: bool) -> str`; do not branch on `is_individual` (current behaviour preserved)
|
||||
- Observable completion: `_get_system_prompt(True)` and `_get_system_prompt(False)` both return non-empty English strings ending with the per-locale postfix from `get_language_instruction()`; the `base_prompt` body contains zero CJK characters
|
||||
- [ ] 1. Translate the system-prompt base string in `_get_system_prompt`
|
||||
- Replace the body of `base_prompt` (currently `"你是社交媒体用户画像生成专家。生成详细、真实的人设用于舆论模拟,最大程度还原已有现实情况。必须返回有效的JSON格式,所有字符串值不能包含未转义的换行符。"`) with an English equivalent that preserves the same intent: define the LLM as an expert social-media-persona generator; require detailed, realistic personas grounded in supplied context; require valid JSON output; forbid unescaped newlines in string values
|
||||
- Preserve the trailing `f"{base_prompt}\n\n{get_language_instruction()}"` concatenation site exactly
|
||||
- Preserve the `is_individual` parameter (still accepted, still unused — no signature change)
|
||||
- Observable completion: `_get_system_prompt(...)` returns an English-only base prompt followed by the locale-appropriate `get_language_instruction()` postfix
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4_
|
||||
|
||||
- [x] 2. Translate the individual-persona user-message builder to English
|
||||
- Replace the Chinese f-string body inside `_build_individual_persona_prompt` (currently lines ~680–714) with an English rendering structured as: a lead sentence requesting a detailed social-media persona faithful to existing reality; an entity-context block with English labels for `entity_name`, `entity_type`, `entity_summary`, `entity_attributes`; a `Context information:` block; a `Generate JSON with the following fields:` enumeration of the eight output keys (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`); and a trailing `Important:` rules block
|
||||
- Translate the field-level descriptions verbatim in spirit: `bio` ≈ 200 chars; `persona` ≈ 2000 chars covering basic info (age, profession, education, location), background (notable experience, event association, social ties), personality (MBTI, core traits, emotional expression), social-media behaviour (posting frequency, content preferences, interaction style, language traits), stance (attitudes toward the topic, emotional triggers), unique features (catchphrases, special experiences, hobbies), and personal memory (the entity's relation to the event and prior actions/reactions); `age` integer; `gender` MUST be the literal `"male"` or `"female"`; `mbti` four-letter type; `country` country name; `profession`; `interested_topics` array
|
||||
- Translate the trailing rules block to English while keeping every locale-independent constraint intact: all values are strings or numbers; `persona` is a single coherent text without unescaped newlines; the inline `{get_language_instruction()}` call remains followed by the parenthetical reminder that `gender` MUST use the English values `"male"` / `"female"`; content stays consistent with the entity; `age` MUST be a valid integer
|
||||
- Replace the `attrs_str` and `context_str` Chinese fallback defaults with English: `"无"` → `"None"` (used when `entity_attributes` is empty/falsy) and `"无额外上下文"` → `"No additional context"` (used when `context` is empty/falsy)
|
||||
- Drop the country-language hint `(使用中文,如"中国")` so `get_language_instruction()` steers the country language; preserve the country line as a neutral `country: country name` entry
|
||||
- [ ] 2. Translate the individual-persona user-message template in `_build_individual_persona_prompt`
|
||||
- Replace the introductory line (`"为实体生成详细的社交媒体用户人设,..."`) with an English equivalent
|
||||
- Replace the field-label rows (`实体名称`, `实体类型`, `实体摘要`, `实体属性`, `上下文信息`) with English equivalents
|
||||
- Replace the `请生成JSON,包含以下字段:` enumeration block with an English equivalent that preserves the eight required output keys verbatim by name (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`)
|
||||
- Translate the per-field guidance: `bio` is a 200-character social-media bio; `persona` is a coherent ~2000-character text containing basic info, background, personality (with MBTI), social-media behavior, stance, distinctive traits, and event-specific memories; `age` must be an integer; `gender` must be the literal English token `"male"` or `"female"`; `mbti` is an MBTI four-letter code; `country` is a free-form country name; `profession` is a free-form occupation; `interested_topics` is a list of topics
|
||||
- Replace the trailing `重要:` rules block with an English equivalent: all field values must be strings or numbers, no embedded newlines; persona must be a coherent single text block; `gender` must use English `male`/`female`; content must remain consistent with the entity information; `age` must be a valid integer
|
||||
- Preserve the call to `get_language_instruction()` interpolated into the rules block
|
||||
- Replace the `attrs_str` no-attributes placeholder `"无"` with `"None"` (or English equivalent) at line 677
|
||||
- Replace the `context_str` no-context placeholder `"无额外上下文"` with `"No additional context"` (or English equivalent) at line 678
|
||||
- Preserve every f-string interpolation by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`
|
||||
- Preserve the `context[:3000]` truncation behaviour and the method signature `_build_individual_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`
|
||||
- Observable completion: calling `_build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")` returns a non-empty English string with all six interpolations resolved, with zero CJK characters in any literal contributed by this method, and the string contains the `gender` enum lock-in `"male"` / `"female"` exactly once
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 4.1, 4.5_
|
||||
- Observable completion: `_build_individual_persona_prompt(...)` produces an English-only message body for any input combination, with zero CJK characters in any string literal it contributes; under the same inputs as before, all interpolated values still appear in the rendered output
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9_
|
||||
|
||||
- [x] 3. Translate the group/institution-persona user-message builder to English
|
||||
- Replace the Chinese f-string body inside `_build_group_persona_prompt` (currently lines ~729–762) with an English rendering structured the same way as Task 2 but adapted for institutional voice: lead sentence requesting a detailed social-media account profile for an institution/group faithful to existing reality; entity-context block; `Context information:` block; `Generate JSON with the following fields:` enumeration of the eight output keys; trailing `Important:` rules block
|
||||
- Translate the field-level descriptions verbatim in spirit: `bio` ≈ 200 chars in an official-account voice; `persona` ≈ 2000 chars covering institutional basics (formal name, type, founding background, primary functions), account positioning (account type, target audience, core function), voice (language traits, common phrasing, taboo topics), publishing pattern (content types, publishing frequency, active hours), stance (official position on the core topic, controversy-handling style), special notes (group portrait represented, operational habits), and institutional memory (the institution's relation to the event and prior actions/reactions); `age` MUST be the integer `30`; `gender` MUST be the literal `"other"`; `mbti` four-letter type characterizing account voice; `country`; `profession` describes institutional function; `interested_topics` array
|
||||
- Translate the trailing rules block to English while keeping every locale-independent constraint intact: all values are strings or numbers, no `null` allowed; `persona` is a single coherent text without unescaped newlines; the inline `{get_language_instruction()}` call remains followed by the parenthetical reminder that `gender` MUST use the English value `"other"`; `age` MUST be the integer `30` and `gender` MUST be the string `"other"`; account voice must match identity positioning
|
||||
- Replace the `attrs_str` and `context_str` Chinese fallback defaults with the same English replacements applied in Task 2 (`"None"` and `"No additional context"`)
|
||||
- Drop the country-language hint as in Task 2
|
||||
- Preserve every f-string interpolation by name and position: `{entity_name}`, `{entity_type}`, `{entity_summary}`, `{attrs_str}`, `{context_str}`, `{get_language_instruction()}`
|
||||
- Preserve the `context[:3000]` truncation behaviour and the method signature `_build_group_persona_prompt(self, entity_name: str, entity_type: str, entity_summary: str, entity_attributes: Dict[str, Any], context: str) -> str`
|
||||
- Observable completion: calling `_build_group_persona_prompt("ACME Corp", "Organization", "summary", {"k": "v"}, "ctx")` returns a non-empty English string with all six interpolations resolved, with zero CJK characters in any literal contributed by this method, and the string contains both the `age == 30` lock-in and the `gender == "other"` lock-in
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.1, 4.5_
|
||||
- [ ] 3. Translate the group-persona user-message template in `_build_group_persona_prompt`
|
||||
- Replace the introductory line (`"为机构/群体实体生成详细的社交媒体账号设定,..."`) with an English equivalent
|
||||
- Replace the field-label rows (`实体名称`, `实体类型`, `实体摘要`, `实体属性`, `上下文信息`) with English equivalents (matching task 2)
|
||||
- Replace the `请生成JSON,包含以下字段:` enumeration block with an English equivalent that preserves the eight required output keys verbatim by name (`bio`, `persona`, `age`, `gender`, `mbti`, `country`, `profession`, `interested_topics`)
|
||||
- Translate the per-field guidance: `bio` is a polished ~200-character official-account bio; `persona` is a coherent ~2000-character text covering institutional background, account positioning, voice, content patterns, official stance, distinctive traits, and event-specific memories; `age` must be the integer literal `30`; `gender` must be the literal English token `"other"`; `mbti` describes account voice; `country` is a free-form country name; `profession` is the institution's role; `interested_topics` is a list of focus areas
|
||||
- Replace the trailing `重要:` rules block with an English equivalent: all field values must be strings or numbers (no nulls); persona must be a coherent single text block (no embedded newlines); `gender` must use English `"other"`; `age` must be the integer `30`; the institutional account's voice must match its identity
|
||||
- Preserve the call to `get_language_instruction()` interpolated into the rules block
|
||||
- Replace the `attrs_str` and `context_str` placeholders the same way as in task 2 (lines 726, 727)
|
||||
- Preserve every f-string interpolation by name and position
|
||||
- Observable completion: `_build_group_persona_prompt(...)` produces an English-only message body for any input combination, with zero CJK characters; under the same inputs as before, all interpolated values still appear in the rendered output
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9_
|
||||
|
||||
- [x] 4. Confirm boundary commitments around the translation
|
||||
- Confirm every existing `get_language_instruction()` call site is preserved verbatim: the system-prompt assembly inside `_get_system_prompt`, the inline call inside the trailing rules block of `_build_individual_persona_prompt`, and the inline call inside the trailing rules block of `_build_group_persona_prompt`
|
||||
- Confirm the locale-thread plumbing in `generate_profiles_for_entities` (capture `current_locale = get_locale()` at line ~910 and `set_locale(current_locale)` inside the worker at line ~914) is byte-identical
|
||||
- Confirm the public signatures of `OasisProfileGenerator.__init__`, `generate_profile_from_entity`, `generate_profiles_for_entities`, `set_graph_id`, and the private helpers `_call_llm_with_retry`, `_generate_profile_rule_based`, `_print_generated_profile`, `_fix_truncated_json`, `_try_fix_json`, `_save_twitter_csv`, `_save_reddit_json`, `_generate_username` are unchanged
|
||||
- Confirm the `OasisAgentProfile` dataclass field set, default values, and the `to_reddit_format`, `to_twitter_format`, `to_full_dict` serializers are unchanged
|
||||
- Confirm class constants `MBTI_TYPES`, `COUNTRIES`, `INDIVIDUAL_ENTITY_TYPES`, `GROUP_ENTITY_TYPES` are unchanged
|
||||
- Confirm the LLM invocation parameters at the call site that consumes the translated prompts (`response_format={"type": "json_object"}`, `temperature=0.7 - (attempt * 0.1)`, `max_attempts=3`) are unchanged
|
||||
- Confirm `_fix_truncated_json` and `_try_fix_json` (including their Chinese persona fragments such as `f"{entity_name}是一个{entity_type}。"`) are not modified — these are runtime data fallbacks, not prompts, and are out of scope
|
||||
- Confirm `_generate_profile_rule_based` is not modified — including its Chinese country defaults `"中国"` at lines ~807 and ~819
|
||||
- Confirm `backend/app/utils/locale.py`, `/locales/languages.json`, `/locales/en.json`, and `/locales/zh.json` are not modified
|
||||
- Confirm `logger.warning(...)`, `logger.info(...)`, `logger.error(...)`, the print banner at line ~945, module / class / method docstrings, and inline comments in `oasis_profile_generator.py` are not modified (owned by issues #6 and #7)
|
||||
- Confirm `backend/scripts/test_profile_format.py`, `backend/pyproject.toml`, `backend/uv.lock`, and any file outside `backend/app/services/oasis_profile_generator.py` are not modified
|
||||
- Observable completion: a `git diff` review against `main` shows changes only inside `backend/app/services/oasis_profile_generator.py`, confined to the bodies of `_get_system_prompt`, `_build_individual_persona_prompt`, and `_build_group_persona_prompt`; the surrounding lines (method headers, neighbouring methods) are byte-identical
|
||||
- _Requirements: 1.4, 2.6, 3.6, 4.1, 4.2, 4.6, 5.1, 5.2, 5.3, 5.4, 6.1, 6.3, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_
|
||||
- [ ] 4. Translate the section labels in `_search_zep_for_entity` and `_build_entity_context`
|
||||
- Replace the related-node prefix `f"相关实体: {node.name}"` with an English equivalent (e.g. `f"Related entity: {node.name}"`) at line 384
|
||||
- Replace the facts block heading `"事实信息:\n"` with `"Facts:\n"` (or equivalent) at line 390
|
||||
- Replace the related-entities block heading `"相关实体:\n"` with `"Related entities:\n"` (or equivalent) at line 392
|
||||
- Replace the entity-attributes section heading `"### 实体属性\n"` with `"### Entity attributes\n"` (or equivalent) at line 422
|
||||
- Replace the inline edge-direction placeholder `(相关实体)` with `(related entity)` (or equivalent) at lines 438 and 440 (both outgoing and incoming branches)
|
||||
- Replace the related-facts/relationships section heading `"### 相关事实和关系\n"` with `"### Related facts and relationships\n"` (or equivalent) at line 443
|
||||
- Replace the related-entity-information section heading `"### 关联实体信息\n"` with `"### Related entity information\n"` (or equivalent) at line 463
|
||||
- Replace the Zep-retrieved facts section heading `"### Zep检索到的事实信息\n"` with `"### Facts retrieved from the graph\n"` (or equivalent) at line 472
|
||||
- Replace the Zep-retrieved related-nodes section heading `"### Zep检索到的相关节点\n"` with `"### Related nodes retrieved from the graph\n"` (or equivalent) at line 475
|
||||
- Preserve the structure (heading + bulleted body, joined by `"\n".join(...)`)
|
||||
- Observable completion: the context string returned by `_build_entity_context(...)` contains zero CJK characters in section labels for any input
|
||||
- _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 4.10_
|
||||
|
||||
- [x] 5. Verify smoke import and OASIS profile-format pytest
|
||||
- Run `cd backend && uv run python -c "from app.services.oasis_profile_generator import OasisProfileGenerator, OasisAgentProfile"` and confirm it exits 0 (catches f-string syntax errors)
|
||||
- Run `cd backend && uv run python -m pytest scripts/test_profile_format.py` (or the equivalent invocation per project convention) and confirm it passes; the test does not exercise prompts, so a pure-translation diff must keep it green
|
||||
- Construct an instance of `OasisProfileGenerator` (using `OasisProfileGenerator.__new__(OasisProfileGenerator)` to skip `__init__` if the LLM key is unavailable, mirroring the pattern in `test_profile_format.py`) and confirm `_get_system_prompt(True)`, `_build_individual_persona_prompt("Alice", "Student", "summary", {"k": "v"}, "ctx")`, and `_build_group_persona_prompt("ACME", "Organization", "summary", {"k": "v"}, "ctx")` each return a string with zero CJK matches against the regex `[一-鿿]`
|
||||
- Observable completion: smoke import exits 0; pytest passes with zero regressions; the three prompt-builder calls each produce English-only output under the default `zh` locale (the `get_language_instruction()` postfix at the end is the only place where Chinese is allowed to appear, and only when locale is `zh`)
|
||||
- _Requirements: 6.4, 7.1, 7.2, 7.3, 7.4_
|
||||
- [ ] 5. Translate the fallback persona templates
|
||||
- Replace `f"{entity_name}是一个{entity_type}。"` with `f"{entity_name} is a {entity_type}."` (or equivalent) at line 547 (`_generate_profile_with_llm`, missing-persona branch)
|
||||
- Replace the same template at line 644 (`_try_fix_json`, regex-extraction branch)
|
||||
- Replace the same template at line 659 (`_try_fix_json`, catastrophic-failure branch)
|
||||
- Preserve the `entity_summary or template` priority order at every site
|
||||
- Observable completion: when the LLM fails JSON parse and the fallback template is invoked, the resulting `persona` value is English
|
||||
- _Requirements: 5.1, 5.2, 5.3_
|
||||
|
||||
- [x] 6. Verify locale-driven output language under both `en` and `zh`
|
||||
- With the thread-local locale forced via `set_locale("en")`, render each of the three builders against representative inputs and confirm: each output contains zero CJK characters; each ends with the English locale postfix `"Please respond in English."`; the `gender` enum constraint appears as English `"male"` / `"female"` (individual) or `"other"` (group)
|
||||
- With `set_locale("zh")`, render the same three builders and confirm: the per-prompt body remains English-only (the translated base prompt does not depend on locale); each ends with the Chinese locale postfix `"请使用中文回答。"`; the `gender` enum constraint still appears as the English literal values
|
||||
- Optionally, with a configured LLM key, run `OasisProfileGenerator().generate_profile_from_entity(...)` end-to-end under each locale against a synthetic `EntityNode` and spot-check that the produced `bio`, `persona`, `profession` are English under `en` and Chinese under `zh`, while `gender` is one of the three English enum literals under both
|
||||
- Observable completion: the locale-`en` rendering is CJK-free in the prompt body and ends with the English locale postfix; the locale-`zh` rendering preserves the prompt body in English and ends with the Chinese locale postfix; if the LLM round-trip is exercised, results are recorded in the PR description
|
||||
- _Requirements: 4.3, 4.4, 4.5_
|
||||
- [ ] 6. Translate the console-output formatting in `_print_generated_profile` and the surrounding banners
|
||||
- Replace the section headings in `_print_generated_profile`: `f"【简介】"` → English equivalent (e.g. `"[Bio]"`), `f"【详细人设】"` → English equivalent (e.g. `"[Persona]"`), `f"【基本属性】"` → English equivalent (e.g. `"[Basic attributes]"`)
|
||||
- Replace the row labels in `_print_generated_profile`: `f"用户名:"` → `f"Username: {profile.user_name}"`, `f"年龄: {profile.age} | 性别: {profile.gender} | MBTI: {profile.mbti}"` → `f"Age: {profile.age} | Gender: {profile.gender} | MBTI: {profile.mbti}"`, `f"职业: {profile.profession} | 国家: {profile.country}"` → `f"Profession: {profile.profession} | Country: {profile.country}"`, `f"兴趣话题: {topics_str}"` → `f"Interested topics: {topics_str}"`
|
||||
- Replace the empty-topics sentinel `'无'` with `'None'` (or equivalent) at line 1011
|
||||
- Replace the start-of-batch banner in `generate_profiles_from_entities` (currently `f"开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}"` at line 945) with an English equivalent (e.g. `f"Generating agent profiles — {total} entities, parallel: {parallel_count}"`)
|
||||
- Replace the end-of-batch banner (currently `f"人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent"` at line 1001) with an English equivalent (e.g. `f"Profile generation complete — produced {len([p for p in profiles if p])} agents"`)
|
||||
- Preserve all f-string interpolations
|
||||
- Preserve the existing `t('progress.profileGenerated', name=entity_name, type=entity_type)` call (already locale-keyed)
|
||||
- Observable completion: the console output stream contains zero CJK characters in literals contributed by `_print_generated_profile` and the two batch banners (the entity name itself may still contain CJK because it is data, not a literal)
|
||||
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7_
|
||||
|
||||
- [x] 7. Final CJK regression sweep on the three builders
|
||||
- Run a regex audit limited to the three method bodies (`_get_system_prompt`, `_build_individual_persona_prompt`, `_build_group_persona_prompt`) using the project-level CJK guard regex (`[一-鿿]`) and confirm zero matches inside their string literals
|
||||
- Run a CJK audit on the rendered output of the three builders for representative inputs and confirm zero matches in the prompt body (the locale postfix is excluded — its Chinese form is a deliberate kept use under `zh`)
|
||||
- Confirm the file-level `git grep -nP '[\x{4e00}-\x{9fff}]' -- backend/app/services/oasis_profile_generator.py` output still flags only known out-of-scope locations (docstrings, comments, logger keys, rule-based fallback country `"中国"` defaults, and resilience-helper Chinese fragments) and does not flag any line inside the three translated method bodies
|
||||
- Observable completion: the targeted regex audit returns zero matches inside the three method bodies; the file-level audit's residual CJK lines all fall outside the three method bodies and match the out-of-scope inventory in `design.md` § Boundary Commitments → Out of Boundary
|
||||
- _Requirements: 1.1, 2.8, 3.8, 8.1, 8.2, 8.3_
|
||||
- [ ] 7. Confirm boundary commitments around the translation
|
||||
- Confirm `logger.warning(...)`, `logger.info(...)`, `logger.error(...)`, `logger.debug(...)` calls and their `t("log.profile_generator.*")` keys in this file are unchanged
|
||||
- Confirm the module/class/method docstrings and inline comments are unchanged (including lines 65, 93, 641, 804–807, 816–819)
|
||||
- Confirm `_normalize_gender` mapping table (Chinese keys `男`/`女`/`机构`/`其他`) is unchanged
|
||||
- Confirm the rule-based `country: "中国"` default at lines 807, 819 is unchanged
|
||||
- Confirm the `ValueError("LLM_API_KEY 未配置")` raise at line 194 is unchanged
|
||||
- Confirm public signatures (`__init__`, `generate_profile_from_entity`, `generate_profiles_from_entities`, `set_graph_id`, `save_profiles`, `save_profiles_to_json`) and private helper signatures are unchanged
|
||||
- Confirm the `OasisAgentProfile` dataclass schema is unchanged
|
||||
- Confirm the LLM call (`response_format={"type": "json_object"}`, `temperature=0.7 - (attempt * 0.1)`, no `max_tokens`) is unchanged
|
||||
- Confirm `backend/app/utils/locale.py`, `/locales/languages.json`, `/locales/en.json`, `/locales/zh.json` are not modified
|
||||
- Confirm `backend/pyproject.toml`, `backend/uv.lock`, and any file outside `backend/app/services/oasis_profile_generator.py` are not modified
|
||||
- Observable completion: a `git diff` review against `main` shows changes only inside `backend/app/services/oasis_profile_generator.py`, only inside the seven owned regions
|
||||
- _Requirements: 7.1, 7.4, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7_
|
||||
|
||||
- [ ] 8. Verify CJK-free invariant in the seven owned regions
|
||||
- Run a one-shot script that imports `OasisProfileGenerator`, calls `_build_individual_persona_prompt(...)`, `_build_group_persona_prompt(...)`, `_get_system_prompt(...)`, and `_build_entity_context(...)` with representative inputs that contain no CJK in the inputs themselves, and asserts the rendered output contains zero matches against the regex `[一-鿿]`
|
||||
- Manually inspect the seven owned regions in the patched file with a CJK regex (`grep -nP '[\x{4e00}-\x{9fff}]'`) and confirm there are no remaining matches inside the owned regions
|
||||
- Observable completion: the inspection passes; if it fails, fix the offending region and re-run before completing this task
|
||||
- _Requirements: 1.1, 2.8, 3.8, 4.10, 5.3, 6.6_
|
||||
|
||||
- [ ] 9. Verify locale-driven output language under both `en` and `zh`
|
||||
- Set the thread-local locale to `en` via `set_locale("en")`, run `OasisProfileGenerator().generate_profile_from_entity(...)` against the configured LLM with a small representative entity, and confirm the returned `bio` and `persona` are in English
|
||||
- Set the thread-local locale to `zh` via `set_locale("zh")`, run the same round-trip, and confirm the returned `bio` and `persona` are in Chinese, equivalent in quality to the pre-change baseline
|
||||
- Observable completion: both runs succeed; the `en` run is CJK-free in `bio` and `persona`; the `zh` run continues to produce Chinese; results recorded in the PR description
|
||||
- _Requirements: 7.2, 7.3_
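
The CJK-free invariant that tasks 8 and 9 ask for could be checked with a small helper along the following lines. This is a minimal sketch rather than the project's verification script: `find_cjk` and `assert_cjk_free` are hypothetical names introduced for illustration, and the character range is the same U+4E00 to U+9FFF block that the guard regex in the tasks covers.

```python
import re

# Same range as the guard regex quoted in the tasks (U+4E00..U+9FFF).
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def find_cjk(text: str) -> list:
    """Return every CJK character found in `text`, in order of appearance."""
    return CJK_PATTERN.findall(text)


def assert_cjk_free(label: str, text: str) -> None:
    """Raise AssertionError naming the offending characters, if any are present."""
    hits = find_cjk(text)
    assert not hits, f"{label}: found CJK characters {hits!r}"
```

In an actual run, `text` would be the string returned by one of the builders (for example `_build_entity_context(...)`) for inputs that themselves contain no CJK, so that any hit must come from a literal inside the method.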
|
||||
|
|
|
|||
|
|
@ -374,15 +374,15 @@ class OasisProfileGenerator:
|
|||
if hasattr(node, 'summary') and node.summary:
|
||||
all_summaries.add(node.summary)
|
||||
if hasattr(node, 'name') and node.name and node.name != entity_name:
|
||||
all_summaries.add(f"相关实体: {node.name}")
|
||||
all_summaries.add(f"Related entity: {node.name}")
|
||||
results["node_summaries"] = list(all_summaries)
|
||||
|
||||
# Assemble the combined context block.
|
||||
context_parts = []
|
||||
if results["facts"]:
|
||||
context_parts.append("事实信息:\n" + "\n".join(f"- {f}" for f in results["facts"][:20]))
|
||||
context_parts.append("Facts:\n" + "\n".join(f"- {f}" for f in results["facts"][:20]))
|
||||
if results["node_summaries"]:
|
||||
context_parts.append("相关实体:\n" + "\n".join(f"- {s}" for s in results["node_summaries"][:10]))
|
||||
context_parts.append("Related entities:\n" + "\n".join(f"- {s}" for s in results["node_summaries"][:10]))
|
||||
results["context"] = "\n\n".join(context_parts)
|
||||
|
||||
logger.info(t("log.profile_generator.m006", entity_name=entity_name, len=len(results['facts']), len_2=len(results['node_summaries'])))
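
The hunk above keeps the "heading plus bulleted body" shape and only swaps the heading literals. A standalone sketch of that assembly, assuming plain lists as input, is shown here; `assemble_context` is a hypothetical name, not a method of `OasisProfileGenerator`.

```python
def assemble_context(facts, node_summaries):
    """Build the combined context block with English section headings.

    Empty sections are skipped; non-empty ones are joined by a blank line,
    mirroring the "\n\n".join(context_parts) call in the hunk.
    """
    context_parts = []
    if facts:
        # At most 20 facts, one bullet per fact.
        context_parts.append("Facts:\n" + "\n".join(f"- {f}" for f in facts[:20]))
    if node_summaries:
        # At most 10 node summaries, one bullet per summary.
        context_parts.append(
            "Related entities:\n" + "\n".join(f"- {s}" for s in node_summaries[:10])
        )
    return "\n\n".join(context_parts)
```

Because only the literals change, the truncation limits and the join separator stay as they were before the translation.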
|
||||
|
|
@ -411,7 +411,7 @@ class OasisProfileGenerator:
|
|||
if value and str(value).strip():
|
||||
attrs.append(f"- {key}: {value}")
|
||||
if attrs:
|
||||
context_parts.append("### 实体属性\n" + "\n".join(attrs))
|
||||
context_parts.append("### Entity attributes\n" + "\n".join(attrs))
|
||||
|
||||
# 2. Related edges (facts / relationships).
|
||||
existing_facts = set()
|
||||
|
|
@ -427,12 +427,12 @@ class OasisProfileGenerator:
|
|||
existing_facts.add(fact)
|
||||
elif edge_name:
|
||||
if direction == "outgoing":
|
||||
relationships.append(f"- {entity.name} --[{edge_name}]--> (相关实体)")
|
||||
relationships.append(f"- {entity.name} --[{edge_name}]--> (related entity)")
|
||||
else:
|
||||
relationships.append(f"- (相关实体) --[{edge_name}]--> {entity.name}")
|
||||
|
||||
relationships.append(f"- (related entity) --[{edge_name}]--> {entity.name}")
|
||||
|
||||
if relationships:
|
||||
context_parts.append("### 相关事实和关系\n" + "\n".join(relationships))
|
||||
context_parts.append("### Related facts and relationships\n" + "\n".join(relationships))
|
||||
|
||||
# 3. Detailed information for related nodes.
|
||||
if entity.related_nodes:
|
||||
|
|
@ -452,7 +452,7 @@ class OasisProfileGenerator:
|
|||
related_info.append(f"- **{node_name}**{label_str}")
|
||||
|
||||
if related_info:
|
||||
context_parts.append("### 关联实体信息\n" + "\n".join(related_info))
|
||||
context_parts.append("### Related entity information\n" + "\n".join(related_info))
|
||||
|
||||
# 4. Augment with Zep hybrid retrieval.
|
||||
zep_results = self._search_zep_for_entity(entity)
|
||||
|
|
@ -461,10 +461,10 @@ class OasisProfileGenerator:
|
|||
# Deduplicate against already-known facts.
|
||||
new_facts = [f for f in zep_results["facts"] if f not in existing_facts]
|
||||
if new_facts:
|
||||
context_parts.append("### Zep检索到的事实信息\n" + "\n".join(f"- {f}" for f in new_facts[:15]))
|
||||
|
||||
context_parts.append("### Facts retrieved from the graph\n" + "\n".join(f"- {f}" for f in new_facts[:15]))
|
||||
|
||||
if zep_results.get("node_summaries"):
|
||||
context_parts.append("### Zep检索到的相关节点\n" + "\n".join(f"- {s}" for s in zep_results["node_summaries"][:10]))
|
||||
context_parts.append("### Related nodes retrieved from the graph\n" + "\n".join(f"- {s}" for s in zep_results["node_summaries"][:10]))
|
||||
|
||||
return "\n\n".join(context_parts)
|
||||
|
||||
|
|
@ -535,7 +535,7 @@ class OasisProfileGenerator:
|
|||
if "bio" not in result or not result["bio"]:
|
||||
result["bio"] = entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}"
|
||||
if "persona" not in result or not result["persona"]:
|
||||
result["persona"] = entity_summary or f"{entity_name}是一个{entity_type}。"
|
||||
result["persona"] = entity_summary or f"{entity_name} is a {entity_type}."
|
||||
|
||||
return result
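
The fallback priority in this hunk (use `entity_summary` when present, otherwise a templated sentence) can be illustrated in isolation. The two function names below are hypothetical; the expressions inside them are copied from the patched lines.

```python
def fallback_bio(entity_name: str, entity_type: str, entity_summary: str) -> str:
    """First 200 characters of the summary, else a "type: name" label."""
    return entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}"


def fallback_persona(entity_name: str, entity_type: str, entity_summary: str) -> str:
    """The summary if non-empty, else the translated English template."""
    return entity_summary or f"{entity_name} is a {entity_type}."
```

The template is only reached when the summary is empty or missing, so entities that already carry a Chinese summary will still get a Chinese fallback persona; that text is data rather than a literal.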
|
||||
|
||||
|
|
@ -631,7 +631,7 @@ class OasisProfileGenerator:
|
|||
persona_match = re.search(r'"persona"\s*:\s*"([^"]*)', content) # May be truncated.
|
||||
|
||||
bio = bio_match.group(1) if bio_match else (entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}")
|
||||
persona = persona_match.group(1) if persona_match else (entity_summary or f"{entity_name}是一个{entity_type}。")
|
||||
persona = persona_match.group(1) if persona_match else (entity_summary or f"{entity_name} is a {entity_type}.")
|
||||
|
||||
# If we recovered something meaningful, mark the result as fixed.
|
||||
if bio_match or persona_match:
|
||||
|
|
@ -646,12 +646,12 @@ class OasisProfileGenerator:
|
|||
logger.warning(t("log.profile_generator.m014"))
|
||||
return {
|
||||
"bio": entity_summary[:200] if entity_summary else f"{entity_type}: {entity_name}",
|
||||
"persona": entity_summary or f"{entity_name}是一个{entity_type}。"
|
||||
"persona": entity_summary or f"{entity_name} is a {entity_type}."
|
||||
}
|
||||
|
||||
def _get_system_prompt(self, is_individual: bool) -> str:
|
||||
"""Return the system prompt for persona generation."""
|
||||
base_prompt = "You are an expert in social-media user-persona generation. Produce detailed, realistic personas for opinion simulation that faithfully reflect existing real-world conditions. You MUST return valid JSON; no string value may contain unescaped newlines."
|
||||
base_prompt = "You are an expert at generating social-media user personas. Produce detailed, realistic personas for opinion-simulation, faithfully grounded in the supplied real-world context. You MUST return valid JSON; no string value may contain unescaped newline characters."
|
||||
return f"{base_prompt}\n\n{get_language_instruction()}"
|
||||
|
||||
def _build_individual_persona_prompt(
|
||||
|
|
@@ -667,40 +667,41 @@ class OasisProfileGenerator:
|
|||
attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else "None"
|
||||
context_str = context[:3000] if context else "No additional context"
|
||||
|
||||
return f"""Generate a detailed social-media user persona for the entity, faithfully reflecting existing real-world conditions.
|
||||
return f"""Generate a detailed social-media user persona for an entity, faithfully grounded in the supplied real-world context.
|
||||
|
||||
|
||||
Entity name: {entity_name}
|
||||
Entity type: {entity_type}
|
||||
Entity summary: {entity_summary}
|
||||
Entity attributes: {attrs_str}
|
||||
|
||||
Context information:
|
||||
Context:
|
||||
{context_str}
|
||||
|
||||
Generate JSON with the following fields:
|
||||
Produce a JSON object with the following fields:
|
||||
|
||||
1. bio: social-media biography, ~200 characters
|
||||
2. persona: detailed persona description (~2000 characters of plain text), covering:
|
||||
- Basic information (age, profession, education, location)
|
||||
- Background (notable experience, association with the event, social ties)
|
||||
- Personality (MBTI type, core traits, emotional expression)
|
||||
- Social-media behavior (posting frequency, content preferences, interaction style, language traits)
|
||||
- Stance (attitudes toward the topic, content likely to anger or move them)
|
||||
- Unique features (catchphrases, special experiences, hobbies)
|
||||
- Personal memory (a key part of the persona: this individual's relation to the event and prior actions/reactions in it)
|
||||
3. age: age number (MUST be an integer)
|
||||
4. gender: gender, MUST be one of the English literals: "male" or "female"
|
||||
5. mbti: MBTI type (e.g. INTJ, ENFP)
|
||||
6. country: country name
|
||||
7. profession: profession
|
||||
8. interested_topics: array of interest topics
|
||||
1. bio: ~200-character social-media bio.
|
||||
2. persona: detailed persona description as a single coherent ~2000-character plain-text passage covering:
|
||||
- basic info (age, profession, educational background, location)
|
||||
- background (notable experiences, link to the focal event, social relationships)
|
||||
- personality (MBTI type, core traits, emotional expression style)
|
||||
- social-media behaviour (posting frequency, content preferences, interaction style, voice)
|
||||
- stance and opinions (attitude toward the topic, content likely to provoke or move them)
|
||||
- distinctive traits (catchphrases, unusual experiences, hobbies)
|
||||
- personal memories (a key part of the persona; describe this individual's link to the focal event and any actions / reactions they have already taken in connection with it)
|
||||
3. age: an integer.
|
||||
4. gender: must be the literal English token "male" or "female".
|
||||
5. mbti: MBTI type (e.g. INTJ, ENFP).
|
||||
6. country: free-form country name.
|
||||
7. profession: free-form occupation.
|
||||
8. interested_topics: array of topic strings.
|
||||
|
||||
Important:
|
||||
- All field values MUST be strings or numbers; do not use unescaped newlines.
|
||||
- persona MUST be a single coherent block of text.
|
||||
- {get_language_instruction()} (gender field MUST use the English values "male" or "female")
|
||||
- Content must remain consistent with the entity information.
|
||||
- age MUST be a valid integer; gender MUST be "male" or "female".
|
||||
- All field values must be strings or numbers; do not include newline characters in any string value.
|
||||
- persona must be a single coherent prose passage.
|
||||
- {get_language_instruction()} (the gender field must remain English: male/female.)
|
||||
- The content must remain consistent with the supplied entity information.
|
||||
- age must be a valid integer; gender must be exactly "male" or "female".
|
||||
"""
|
||||
|
||||
def _build_group_persona_prompt(
|
||||
|
|
@@ -716,40 +717,41 @@ Important:
|
|||
attrs_str = json.dumps(entity_attributes, ensure_ascii=False) if entity_attributes else "None"
|
||||
context_str = context[:3000] if context else "No additional context"
|
||||
|
||||
return f"""Generate a detailed social-media account profile for the institution/group entity, faithfully reflecting existing real-world conditions.
|
||||
return f"""Generate a detailed social-media account profile for an institutional or group entity, faithfully grounded in the supplied real-world context.
|
||||
|
||||
|
||||
Entity name: {entity_name}
|
||||
Entity type: {entity_type}
|
||||
Entity summary: {entity_summary}
|
||||
Entity attributes: {attrs_str}
|
||||
|
||||
Context information:
|
||||
Context:
|
||||
{context_str}
|
||||
|
||||
Generate JSON with the following fields:
|
||||
Produce a JSON object with the following fields:
|
||||
|
||||
1. bio: official-account biography, ~200 characters, professional and appropriate
|
||||
2. persona: detailed account-profile description (~2000 characters of plain text), covering:
|
||||
- Institutional basics (formal name, institution type, founding background, primary functions)
|
||||
- Account positioning (account type, target audience, core function)
|
||||
- Voice (language traits, common phrasing, taboo topics)
|
||||
- Publishing pattern (content types, publishing frequency, active hours)
|
||||
- Stance (official position on the core topic, controversy-handling style)
|
||||
- Special notes (the group portrait represented, operational habits)
|
||||
- Institutional memory (a key part of the account profile: this institution's relation to the event and prior actions/reactions in it)
|
||||
3. age: fixed integer 30 (the institutional virtual age)
|
||||
4. gender: fixed literal "other" (institutional accounts use "other" to indicate non-individual)
|
||||
5. mbti: MBTI type used to characterize account voice (e.g. ISTJ for strict/conservative)
|
||||
6. country: country name
|
||||
7. profession: institutional function description
|
||||
8. interested_topics: array of focus areas
|
||||
1. bio: ~200-character official-account bio, polished and professional.
|
||||
2. persona: detailed account profile as a single coherent ~2000-character plain-text passage covering:
|
||||
- institution basics (formal name, type of institution, founding background, primary functions)
|
||||
- account positioning (account type, target audience, core purpose)
|
||||
- voice (linguistic style, common expressions, taboo topics)
|
||||
- content patterns (content types, posting frequency, active hours)
|
||||
- stance (official position on the focal topic, how disputes are handled)
|
||||
- special notes (the group profile it represents, operational habits)
|
||||
- institutional memory (a key part of the persona; describe this institution's link to the focal event and any actions / reactions it has already taken in connection with it)
|
||||
3. age: must be the integer 30 (a virtual age used for institutional accounts).
|
||||
4. gender: must be the literal English token "other" (institutional accounts use "other" to indicate non-individual).
|
||||
5. mbti: MBTI type used to describe the account's voice (e.g. ISTJ for a rigorous, conservative tone).
|
||||
6. country: free-form country name.
|
||||
7. profession: free-form description of the institution's role.
|
||||
8. interested_topics: array of focus areas.
|
||||
|
||||
Important:
|
||||
- All field values MUST be strings or numbers; null values are not allowed.
|
||||
- persona MUST be a single coherent block of text without unescaped newlines.
|
||||
- {get_language_instruction()} (gender field MUST use the English value "other")
|
||||
- age MUST be the integer 30; gender MUST be the string "other".
|
||||
- Account voice MUST match the institution's identity positioning."""
|
||||
- All field values must be strings or numbers; null values are not allowed.
|
||||
- persona must be a single coherent prose passage; do not include newline characters in any string value.
|
||||
- {get_language_instruction()} (the gender field must remain English: "other".)
|
||||
- age must be the integer 30; gender must be exactly the string "other".
|
||||
- The institutional account's voice must match its identity."""
|
||||
|
||||
def _generate_profile_rule_based(
|
||||
self,
|
||||
|
|
@@ -959,7 +961,7 @@ Important:
|
|||
progress_callback(
|
||||
current,
|
||||
total,
|
||||
f"已完成 {current}/{total}: {entity.name}({entity_type})"
|
||||
f"Completed {current}/{total}: {entity.name} ({entity_type})"
|
||||
)
|
||||
|
||||
if error:
|
||||
|
|
@@ -994,24 +996,25 @@ Important:
|
|||
separator = "-" * 70
|
||||
|
||||
# Assemble the full output (no truncation).
|
||||
topics_str = ', '.join(profile.interested_topics) if profile.interested_topics else '无'
|
||||
|
||||
topics_str = ', '.join(profile.interested_topics) if profile.interested_topics else 'None'
|
||||
|
||||
|
||||
output_lines = [
|
||||
f"\n{separator}",
|
||||
t('progress.profileGenerated', name=entity_name, type=entity_type),
|
||||
f"{separator}",
|
||||
f"用户名: {profile.user_name}",
|
||||
f"Username: {profile.user_name}",
|
||||
f"",
|
||||
f"【简介】",
|
||||
f"[Bio]",
|
||||
f"{profile.bio}",
|
||||
f"",
|
||||
f"【详细人设】",
|
||||
f"[Persona]",
|
||||
f"{profile.persona}",
|
||||
f"",
|
||||
f"【基本属性】",
|
||||
f"年龄: {profile.age} | 性别: {profile.gender} | MBTI: {profile.mbti}",
|
||||
f"职业: {profile.profession} | 国家: {profile.country}",
|
||||
f"兴趣话题: {topics_str}",
|
||||
f"[Basic attributes]",
|
||||
f"Age: {profile.age} | Gender: {profile.gender} | MBTI: {profile.mbti}",
|
||||
f"Profession: {profile.profession} | Country: {profile.country}",
|
||||
f"Interested topics: {topics_str}",
|
||||
separator
|
||||
]
|
||||
|
||||
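A regression check along the following lines could guard against Chinese literals creeping back into the translated prompt and console strings. This is a minimal sketch and not part of the commit: `find_cjk` and the character ranges it scans are assumptions chosen for illustration, and values that must intentionally stay Chinese (for example the inputs accepted by `_normalize_gender`) would need to be excluded from such a check.

```python
import re

# Hypothetical helper, not present in the repository.
# Ranges: CJK symbols and punctuation, CJK Unified Ideographs,
# and half-width/full-width forms (e.g. full-width parentheses).
_CJK_PATTERN = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")


def find_cjk(text: str) -> list[str]:
    """Return every CJK character found in text, in order of appearance."""
    return _CJK_PATTERN.findall(text)


# The fallback persona template before and after this change.
old_fallback = "{entity_name}是一个{entity_type}。"
new_fallback = "{entity_name} is a {entity_type}."

assert find_cjk(old_fallback)          # the old literal is flagged
assert find_cjk(new_fallback) == []    # the translated literal is clean
```

Scanning the rendered prompt strings rather than the source file keeps the check independent of comments and of data values that are deliberately left in Chinese.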