feat(i18n): translate simulation_config_generator prompts to english

translate the three llm prompt blocks plus the two prompt-feeding helpers
(_build_context, _summarize_entities) in
backend/app/services/simulation_config_generator.py from chinese to english.
the chinese base prompts were biasing the model toward chinese structure
and lexical choice for content, narrative_direction, hot_topics, and
reasoning fields even when accept-language was en, because
get_language_instruction() only steers the response language as a
postfix.

translation is in-place and preserves every functional contract: the
json output schema for all three prompts, every variable interpolation,
the per-entity-type heuristic ranges in the agent-config prompt, the
trailing english IMPORTANT directives that lock poster_type to
PascalCase and stance to {supportive,opposing,neutral,observer}, and
all three get_language_instruction() postfix call sites. the two
default-path reasoning literals are translated to locale-agnostic
english so generation_reasoning no longer mixes chinese and english on
the failure path.

logger calls, docstrings, and inline comments are intentionally left
chinese (out of scope; covered by issues #6 and #7). public api,
dataclasses, class constants, and the SimulationParameters payload
shape are unchanged.

Closes #4
parent 3b17c0b9ba
commit 6c2a412196
@ -0,0 +1,403 @@
|
|||
# Design Document — i18n-simulation-config-generator-prompts
|
||||
|
||||
## Overview
|
||||
|
||||
**Purpose**: Translate the three LLM prompt blocks and two prompt-feeding helpers in `backend/app/services/simulation_config_generator.py` from Chinese to English so that, under `Accept-Language: en`, the model emits English-flavoured output for `content`, `narrative_direction`, `hot_topics`, and `reasoning` fields. Today, these fields skew Chinese because the base prompt language biases the model — the `get_language_instruction()` postfix alone is insufficient.
|
||||
|
||||
**Users**: MiroFish operators running the 5-step pipeline under English locale; reviewers tracking the i18n epic (#11); developers maintaining sibling i18n issues (#5, #6, #7) downstream.
|
||||
|
||||
**Impact**: Behavioural — generated simulation-config string content will switch from Chinese-flavoured to English-flavoured under `Accept-Language: en`. No public-API change. No JSON-shape change. No infrastructure or dependency change.
|
||||
|
||||
### Goals
|
||||
|
||||
- Replace Chinese text inside three prompt blocks (`_generate_time_config`, `_generate_event_config`, `_generate_agent_configs_batch`) and two prompt-feeding helpers (`_build_context`, `_summarize_entities`) with English equivalents.
|
||||
- Preserve every variable interpolation, every JSON-output key, every constraint phrase (`PascalCase`, enum strings), and every `get_language_instruction()` call site.
|
||||
- Keep the public API of `SimulationConfigGenerator` and the `SimulationParameters` payload byte-for-byte equivalent in shape.
|
||||
|
||||
### Non-Goals
|
||||
|
||||
- Logger calls (`logger.info`, `logger.warning`, `logger.error`) inside the same file — owned by issue #6.
|
||||
- Module docstring, class docstrings, dataclass docstrings, inline `#` comments — owned by issue #7.
|
||||
- Refactoring prompt structure, JSON output schema, or retry/repair logic.
|
||||
- Externalizing prompts into `/locales/*.json`.
|
||||
- Default-simulation-parameter changes (rounds, action lists) — owned by `app/config.py`.
|
||||
- Live end-to-end OASIS subprocess validation: replaced here by fixture-based static checks, with Step 3 parity argued from the unchanged JSON shape rather than demonstrated by a run.
|
||||
|
||||
## Boundary Commitments
|
||||
|
||||
### This Spec Owns
|
||||
|
||||
- The string-literal **content** of the six prompt-string regions in `simulation_config_generator.py`:
  - `_generate_time_config` user prompt (~543–586) and system prompt (588).
  - `_generate_event_config` user prompt (~676–703) and system prompt (705).
  - `_generate_agent_configs_batch` user prompt (~833–867) and system prompt (869).
- The Chinese section headings and overflow markers inside `_build_context` (~393–406) and `_summarize_entities` (~422–430) that flow into prompts via `{context_truncated}`.
- The two static-literal `reasoning` values in the default paths: `_get_default_time_config` (line 608) and the `_generate_event_config` exception fallback (line 716).
|
||||
|
||||
### Out of Boundary
|
||||
|
||||
- All `logger.*` calls in this file (issue #6).
|
||||
- All docstrings (`"""..."""`) and `#` comments in this file (issue #7).
|
||||
- `backend/app/utils/locale.py`, `/locales/*.json`, `languages.json`.
|
||||
- `services/simulation_ipc.py`, `services/simulation_runner.py`, OASIS subprocess source.
|
||||
- `backend/app/config.py` constants.
|
||||
- `backend/pyproject.toml`, `backend/uv.lock`.
|
||||
- All other files in the repository.
|
||||
|
||||
### Allowed Dependencies
|
||||
|
||||
- Read access to `get_language_instruction()` from `backend/app/utils/locale.py` — call sites preserved verbatim.
|
||||
- Read access to `t(...)` from `backend/app/utils/locale.py` — call sites preserved verbatim (these already exist for progress messages around lines 296–334).
|
||||
- No new external dependencies.
|
||||
|
||||
### Revalidation Triggers
|
||||
|
||||
- A change to the `SimulationParameters.to_dict()` payload shape would force the OASIS subprocess to re-validate. **This spec does not change the shape.**
|
||||
- A change to any JSON-output key emitted by the three prompts (e.g. renaming `agent_configs` to `agents`) would force the parsing logic in `_parse_time_config` / `_parse_event_config` / `_generate_agent_configs_batch`'s response-merge to re-validate. **This spec does not rename keys.**
|
||||
- A change to the `get_language_instruction()` call sites or the trailing English `IMPORTANT:` directives' constraint semantics would force locale switching and OASIS-side enum validation to re-validate. **This spec preserves both verbatim.**
|
||||
|
||||
## Architecture
|
||||
|
||||
### Existing Architecture Analysis
|
||||
|
||||
`SimulationConfigGenerator` is a single Python class in a single module. The three target methods follow a uniform pattern:
|
||||
|
||||
```
prompt = f"""<chinese user prompt with {interpolations}>"""
system_prompt = "<chinese system prompt>"
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}<optional english IMPORTANT directive>"
return self._call_llm_with_retry(prompt, system_prompt)
```
|
||||
|
||||
`_build_context` and `_summarize_entities` are private helpers that produce the `context` string passed (truncated) into all three prompt methods via `{context_truncated}`. They emit Chinese section headings.
|
||||
|
||||
There is no abstraction layer between prompt construction and LLM invocation — the prompt text and the call site are colocated. This matches sister modules (`ontology_generator.py`, `oasis_profile_generator.py`).
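The pattern above, after translation, can be sketched as follows. This is a minimal illustration and not the module's actual text: the English prompt wording, the `build_prompts` name, and the stubbed return value of `get_language_instruction()` are all assumptions made for the example.

```python
def get_language_instruction() -> str:
    """Stand-in for the real helper in backend/app/utils/locale.py (assumed output)."""
    return "Please respond in English."


def build_prompts(context_truncated: str, max_agents_allowed: int) -> tuple[str, str]:
    # User prompt: English base text with the original interpolations kept.
    # Literal JSON braces are doubled so the f-string emits them unchanged.
    prompt = f"""Generate a time configuration for the simulation below.

{context_truncated}

At most {max_agents_allowed} agents may be active per hour.
Return JSON of the form: {{"total_simulation_hours": <int>, "reasoning": "<str>"}}"""

    # System prompt: base literal first, locale instruction appended as a postfix.
    system_prompt = "You are a social media simulation expert. Return pure JSON."
    system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"
    return prompt, system_prompt
```

With an English base prompt, the postfix and the base text point the model in the same direction instead of competing.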
|
||||
|
||||
### Architecture Pattern & Boundary Map
|
||||
|
||||
**Selected pattern**: In-place string-literal translation. No new components, no new modules, no new abstractions.
|
||||
|
||||
```mermaid
flowchart TB
    subgraph Caller["Caller: services/simulation_runner.py"]
        callsite["generate_config(...)"]
    end
    subgraph SimConfig["simulation_config_generator.py: IN SCOPE"]
        gen[generate_config]
        time[_generate_time_config<br/>**translate prompt + system_prompt**]
        event[_generate_event_config<br/>**translate prompt + system_prompt**]
        agent[_generate_agent_configs_batch<br/>**translate prompt + system_prompt**]
        ctx[_build_context<br/>**translate section headings**]
        sum[_summarize_entities<br/>**translate type headings**]
        defT[_get_default_time_config<br/>**translate reasoning literal**]
        retry[_call_llm_with_retry<br/>UNCHANGED]
    end
    subgraph Locale["backend/app/utils/locale.py: UNCHANGED"]
        gli[get_language_instruction]
        tr[t]
    end
    subgraph IPC["simulation_ipc.py + OASIS subprocess: UNCHANGED"]
        oasis[OASIS rounds]
    end

    callsite --> gen
    gen --> ctx
    gen --> sum
    gen --> time
    gen --> event
    gen --> agent
    time -.exception.-> defT
    event -.exception.-> event
    time --> retry
    event --> retry
    agent --> retry
    time --> gli
    event --> gli
    agent --> gli
    gen --> tr
    gen --> oasis
```
|
||||
|
||||
**Architecture Integration**:

- **Selected pattern**: In-place translation. Matches sister-spec implementations (`0806832`, `9d1d29b`).
- **Domain/feature boundaries**: All edits are inside `simulation_config_generator.py`. No file-boundary crossings except the read-only call to `get_language_instruction()` and `t()`.
- **Existing patterns preserved**: f-string template assembly; `system_prompt = f"{system_prompt}\n\n{get_language_instruction()}..."` postfix injection; `_call_llm_with_retry` envelope; `_parse_*` extraction with default fallbacks; per-entity-type heuristic ranges.
- **New components rationale**: None. The work is text-only.
- **Steering compliance**:
  - `.kiro/steering/tech.md` "Internationalization": base prompts are part of the i18n surface; this work brings their language in line with the locale postfix.
  - `.kiro/steering/tech.md` "Code Quality": match surrounding style; preserve mixed Chinese/English in comments and docstrings (we do; those are #7's scope).
  - `.kiro/steering/structure.md`: single-file edit; respects per-project graph isolation, since no graph code is touched.
|
||||
|
||||
### Technology Stack
|
||||
|
||||
| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Frontend / CLI | (n/a) | (n/a) | No frontend change. |
| Backend / Services | Python 3.11+, in-place file edit | Translate prompt-string content | One file modified. |
| Data / Storage | (n/a) | (n/a) | No DB schema change. |
| Messaging / Events | (n/a) | (n/a) | No IPC payload-shape change. |
| Infrastructure / Runtime | (n/a) | (n/a) | No deps, env-var, or runtime change. |
|
||||
|
||||
## File Structure Plan
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
backend/app/services/
└── simulation_config_generator.py    # All edits live here
```
|
||||
|
||||
### Modified Files
|
||||
|
||||
- `backend/app/services/simulation_config_generator.py`
  - **Lines ~393–406** (`_build_context`): translate the four Chinese section heading strings. Preserve `{simulation_requirement}`, `{len(entities)}`, `{entity_summary}`, `{doc_text}` interpolations.
  - **Lines ~422–430** (`_summarize_entities`): translate the per-entity-type heading and overflow marker. Preserve `{entity_type}`, `{len(type_entities)}`, `{e.name}`, `{summary_preview}`, `{display_count}` interpolations.
  - **Lines ~543–586** (`_generate_time_config` user prompt): translate the f-string body. Preserve `{context_truncated}`, `{max_agents_allowed}`. Preserve every JSON-output key. Keep the UTC+8 reference example as illustrative guidance.
  - **Line 588** (`_generate_time_config` system prompt): translate the literal.
  - **Line 589** (`_generate_time_config` postfix): unchanged, i.e. `system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"`.
  - **Line 608** (`_get_default_time_config` `reasoning`): translate the literal to English.
  - **Lines ~676–703** (`_generate_event_config` user prompt): translate the f-string body. Preserve `{simulation_requirement}`, `{context_truncated}`, `{type_info}`. Preserve every JSON-output key.
  - **Line 705** (`_generate_event_config` system prompt): translate the literal.
  - **Line 706** (`_generate_event_config` postfix + IMPORTANT directive): unchanged.
  - **Line 716** (`_generate_event_config` exception-path `reasoning`): translate the literal to English.
  - **Lines ~833–867** (`_generate_agent_configs_batch` user prompt): translate the f-string body. Preserve `{simulation_requirement}` and `{json.dumps(entity_list, ensure_ascii=False, indent=2)}`. Preserve every JSON-output key. Preserve the per-entity-type heuristic ranges.
  - **Line 869** (`_generate_agent_configs_batch` system prompt): translate the literal.
  - **Line 870** (`_generate_agent_configs_batch` postfix + IMPORTANT directive): unchanged.
|
||||
|
||||
> Note: Line numbers are approximate; the implementation will locate edits by string content, not by line number.
|
||||
|
||||
## System Flows
|
||||
|
||||
Skipped — no behavioural flow change. The only flow visible is "caller → `generate_config` → three internal `_generate_*` LLM calls → `SimulationParameters`," which is unchanged.
|
||||
|
||||
## Requirements Traceability
|
||||
|
||||
| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1 | Block 1 user prompt zero-Chinese | `_generate_time_config` lines 543–586 | f-string body | (n/a) |
| 1.2 | Block 1 system prompt zero-Chinese | `_generate_time_config` line 588 | string literal | (n/a) |
| 1.3 | Block 1 JSON keys preserved | `_generate_time_config` user prompt | `total_simulation_hours`, `minutes_per_round`, `agents_per_hour_min`/`max`, `peak_hours`, `off_peak_hours`, `morning_hours`, `work_hours`, `reasoning` | (n/a) |
| 1.4 | Field-level numeric constraints preserved | `_generate_time_config` user prompt | constraint ranges in prose | (n/a) |
| 1.5 | Block 1 interpolations preserved | `_generate_time_config` user prompt | `{context_truncated}`, `{max_agents_allowed}` | (n/a) |
| 1.6 | UTC+8 reference example retained | `_generate_time_config` user prompt | illustrative bullet block | (n/a) |
| 1.7 | `get_language_instruction()` call site preserved | `_generate_time_config` line 589 | call expression | (n/a) |
| 2.1 | Block 2 user prompt zero-Chinese | `_generate_event_config` lines 676–703 | f-string body | (n/a) |
| 2.2 | Block 2 system prompt zero-Chinese | `_generate_event_config` line 705 | string literal | (n/a) |
| 2.3 | Block 2 JSON keys preserved | `_generate_event_config` user prompt | `hot_topics`, `narrative_direction`, `initial_posts[].content`, `initial_posts[].poster_type`, `reasoning` | (n/a) |
| 2.4 | Block 2 interpolations preserved | `_generate_event_config` user prompt | `{simulation_requirement}`, `{context_truncated}`, `{type_info}` | (n/a) |
| 2.5 | `get_language_instruction()` call site preserved | `_generate_event_config` line 706 | call expression | (n/a) |
| 2.6 | `poster_type` PascalCase directive preserved | `_generate_event_config` line 706 | trailing `IMPORTANT:` clause | (n/a) |
| 2.7 | Type-to-author examples translated, pairings intact | `_generate_event_config` user prompt | example-list bullet block | (n/a) |
| 2.8 | `zh` locale produces Chinese output | `_generate_event_config` + `get_language_instruction()` | postfix call | (n/a) |
| 3.1 | Block 3 user prompt zero-Chinese | `_generate_agent_configs_batch` lines 833–867 | f-string body | (n/a) |
| 3.2 | Block 3 system prompt zero-Chinese | `_generate_agent_configs_batch` line 869 | string literal | (n/a) |
| 3.3 | Block 3 JSON keys preserved | `_generate_agent_configs_batch` user prompt | `agent_configs[].agent_id` and 9 sub-keys | (n/a) |
| 3.4 | Block 3 interpolations preserved | `_generate_agent_configs_batch` user prompt | `{simulation_requirement}`, `json.dumps(entity_list,...)` | (n/a) |
| 3.5 | Per-entity-type heuristic ranges preserved | `_generate_agent_configs_batch` user prompt | bullet block describing officials/media/individuals/experts ranges | (n/a) |
| 3.6 | `get_language_instruction()` call site preserved | `_generate_agent_configs_batch` line 870 | call expression | (n/a) |
| 3.7 | `stance` enum + JSON-shape directive preserved | `_generate_agent_configs_batch` line 870 | trailing `IMPORTANT:` clause | (n/a) |
| 4.1 | Three call sites preserved at same positions | lines 589, 706, 870 | postfix injections | (n/a) |
| 4.2 | `zh` locale produces Chinese output (verification) | end-to-end | (n/a) | runtime |
| 4.3 | `en` locale produces English output (verification) | end-to-end | (n/a) | runtime |
| 4.4 | Locale source files unchanged | (n/a; guard) | (n/a) | (n/a) |
| 4.5 | Reasoning-model JSON repair preserved | `_fix_truncated_json`, `_try_fix_config_json` | (unchanged) | (n/a) |
| 5.1–5.6 | Public API and constants stable | class surface | (unchanged) | (n/a) |
| 6.1 | Default-path `reasoning` non-empty | `_get_default_time_config`, exception path | string literal | (n/a) |
| 6.2 | Default-path literals translated to locale-agnostic English | lines 608, 716 | string literals | (n/a) |
| 6.3 | `generation_reasoning` join semantics preserved | `generate_config` reasoning_parts assembly | `" \| ".join(...)` | (n/a) |
| 7.1 | `_build_context` headings English | `_build_context` lines 393–406 | f-string body | (n/a) |
| 7.2 | `_summarize_entities` headings English | `_summarize_entities` lines 422–430 | f-string body | (n/a) |
| 7.3 | User-provided `entity.name`/`entity.summary` preserved verbatim | `_summarize_entities` | (data passthrough) | (n/a) |
| 7.4 | `_build_context`/`_summarize_entities` signatures unchanged | helpers | (unchanged) | (n/a) |
| 8.1–8.4 | Step 3 parity (verification) | end-to-end OASIS run | (n/a) | runtime |
| 9.1 | logger calls untouched | (guard) | (n/a) | (n/a) |
| 9.2 | docstrings/comments untouched | (guard) | (n/a) | (n/a) |
| 9.3 | No production-code edits outside target file | (guard) | (n/a) | (n/a) |
| 9.4 | No dependency change | (guard) | (n/a) | (n/a) |
| 9.5 | No edits to listed adjacent files | (guard) | (n/a) | (n/a) |
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------------|--------|--------------|--------------------------|-----------|
| `_generate_time_config` (modified) | services | Render English time-config prompts; preserve LLM contract | 1.1–1.7, 4.1 | `_call_llm_with_retry` (P0), `get_language_instruction` (P0), `_get_default_time_config` (P1) | Service |
| `_generate_event_config` (modified) | services | Render English event-config prompts; preserve LLM contract and `poster_type` constraint | 2.1–2.8, 4.1, 6.1, 6.2 | `_call_llm_with_retry` (P0), `get_language_instruction` (P0) | Service |
| `_generate_agent_configs_batch` (modified) | services | Render English agent-config prompts; preserve LLM contract and `stance` constraint | 3.1–3.7, 4.1 | `_call_llm_with_retry` (P0), `get_language_instruction` (P0), `_generate_agent_config_by_rule` (P1) | Service |
| `_build_context` (modified) | services | Emit English section headings into context string | 7.1, 7.3, 7.4 | `_summarize_entities` (P0) | State |
| `_summarize_entities` (modified) | services | Emit English type headings/overflow markers | 7.2, 7.3, 7.4 | (none) | State |
| `_get_default_time_config` (modified) | services | Emit locale-agnostic English `reasoning` on default path | 6.1, 6.2 | (none) | State |
| `_call_llm_with_retry` (unchanged) | services | LLM invocation, retry, JSON repair | 4.5, 5.6 | `OpenAI` client (P0) | Service |
| `SimulationParameters.to_dict()` (unchanged) | services | Payload to OASIS subprocess | 5.4, 8.4 | `dataclasses.asdict` (P0) | State |
|
||||
|
||||
Detailed component blocks below for the three prompt-rendering methods. The two helper methods and the default-path method need only the summary-table entry above.
|
||||
|
||||
### Domain: Simulation Config Generation
|
||||
|
||||
#### `_generate_time_config` (modified)
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Translate the time-config prompt and system prompt to English; preserve LLM contract, locale postfix, and JSON-key shape |
|
||||
| Requirements | 1.1–1.7, 4.1 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
- Render an English f-string `prompt` containing `{context_truncated}` and `{max_agents_allowed}`, preserving the JSON-output schema and per-field numeric constraints.
|
||||
- Render an English `system_prompt` literal; postfix `get_language_instruction()` exactly as today.
|
||||
- Continue to fall back to `_get_default_time_config(num_entities)` on LLM exception.
|
||||
|
||||
**Dependencies**
|
||||
- Inbound: `generate_config` — calls this method as Step 1 of the pipeline (P0).
|
||||
- Outbound: `_call_llm_with_retry` (P0), `get_language_instruction` (P0), `_get_default_time_config` (P1).
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
def _generate_time_config(self, context: str, num_entities: int) -> Dict[str, Any]:
    """Returns a dict with keys: total_simulation_hours, minutes_per_round,
    agents_per_hour_min, agents_per_hour_max, peak_hours, off_peak_hours,
    morning_hours, work_hours, reasoning."""
```
|
||||
|
||||
- Preconditions: `context` is a non-empty string; `num_entities` ≥ 1.
|
||||
- Postconditions: returned dict contains all eight numeric/array keys plus `reasoning`. On LLM failure, defaults are returned via `_get_default_time_config`.
|
||||
- Invariants: signature unchanged; exception-fallback path unchanged.
|
||||
|
||||
**Implementation Notes**
|
||||
- Integration: invoked by `generate_config` at line 298. No call-site change.
|
||||
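- Validation: zero matches for `[\u4e00-\u9fff]` (the CJK Unified Ideographs range, written literally as `[一-鿿]`) across the f-string body and `system_prompt` literal (excluding the `get_language_instruction()` postfix expression itself).
- Risks: a misspelled interpolation name raises `NameError` when the f-string is evaluated (not `KeyError`, which belongs to `str.format`), and a dropped interpolation silently removes context from the prompt. Both are caught by the fixture render check.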
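The interpolation risk can be made concrete with a small sketch. The prompt wording and the sample JSON values are invented for illustration; only the placeholder names `context_truncated` and `max_agents_allowed` come from this spec.

```python
import json


def render_time_prompt(context_truncated: str, max_agents_allowed: int) -> str:
    # Single braces interpolate; doubled braces render as literal JSON braces.
    return f"""Context:
{context_truncated}

Agent cap: {max_agents_allowed}
Example output:
{{"minutes_per_round": 60, "peak_hours": [19, 20, 21]}}"""


def render_with_typo() -> str:
    # Deliberate typo: the misspelled name is looked up when the f-string is
    # evaluated and fails with NameError, not KeyError.
    return f"Context: {context_truncatd}"  # noqa: F821
```

A render check that calls each prompt-building method once will surface a misspelled placeholder, while a dropped placeholder only shows up if the test also asserts that the interpolated value appears in the output.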
|
||||
|
||||
#### `_generate_event_config` (modified)
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Translate the event-config prompt and system prompt to English; preserve LLM contract, `poster_type` PascalCase constraint, and JSON-key shape |
|
||||
| Requirements | 2.1–2.8, 4.1, 6.1, 6.2 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
- Render an English f-string `prompt` containing `{simulation_requirement}`, `{context_truncated}`, and `{type_info}`.
|
||||
- Preserve verbatim the trailing English `IMPORTANT: The 'poster_type' field value MUST be in English PascalCase ...` directive (constraint semantics unchanged).
|
||||
- Translate the static `reasoning` literal in the exception-path fallback to English (Decision: translate default-path strings).
|
||||
|
||||
**Dependencies**
|
||||
- Inbound: `generate_config` — calls this method as Step 2 (P0).
|
||||
- Outbound: `_call_llm_with_retry` (P0), `get_language_instruction` (P0).
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
def _generate_event_config(
    self,
    context: str,
    simulation_requirement: str,
    entities: List[EntityNode],
) -> Dict[str, Any]:
    """Returns a dict with keys: hot_topics, narrative_direction,
    initial_posts (list of {content, poster_type}), reasoning."""
```
|
||||
|
||||
- Preconditions: `entities` non-empty (else `type_info` will be empty — acceptable, prompt still renders).
|
||||
- Postconditions: returned dict contains the four keys; on exception, fallback dict carries the (now-English) `"Used default config"` reasoning.
|
||||
- Invariants: signature unchanged; the trailing `IMPORTANT: ... PascalCase ...` directive remains verbatim.
|
||||
|
||||
**Implementation Notes**
|
||||
- Integration: invoked at line 304.
|
||||
- Validation: zero Chinese chars in the `prompt` and `system_prompt` literals; `IMPORTANT:` clause byte-equal.
|
||||
- Risks: accidentally dropping or paraphrasing the `IMPORTANT:` directive could allow the LLM to emit lowercase or localized `poster_type` values, which `_assign_initial_post_agents`'s alias map (line 750) might or might not handle gracefully — keep verbatim.
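As an illustration of what the `IMPORTANT:` directive protects against, the sketch below flags `poster_type` values that are not PascalCase or that do not match a known entity type. It is a hypothetical test-side helper; the function name and the sample entity types are assumptions, and nothing like it is claimed to exist in the module.

```python
import re
from typing import Dict, List, Set

# ASCII PascalCase: an uppercase first letter followed by letters or digits.
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def invalid_poster_types(initial_posts: List[Dict[str, str]], entity_types: Set[str]) -> List[str]:
    """Return poster_type values that are not PascalCase or not a known entity type."""
    bad: List[str] = []
    for post in initial_posts:
        poster_type = post.get("poster_type", "")
        if not PASCAL_CASE.match(poster_type) or poster_type not in entity_types:
            bad.append(poster_type)
    return bad
```

An empty result over a sample of LLM responses would be weak evidence that the directive still works after the surrounding prompt text changed language.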
|
||||
|
||||
#### `_generate_agent_configs_batch` (modified)
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Translate the agent-config batch prompt and system prompt to English; preserve LLM contract, `stance` enum constraint, JSON-key shape, and per-entity-type heuristic ranges |
|
||||
| Requirements | 3.1–3.7, 4.1 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
- Render an English f-string `prompt` containing `{simulation_requirement}` and `{json.dumps(entity_list, ensure_ascii=False, indent=2)}`.
|
||||
- Preserve the per-entity-type heuristic ranges currently embedded as bullet points (officials/media/individuals/experts).
|
||||
- Preserve verbatim the trailing English `IMPORTANT: The 'stance' field value MUST be one of ...` directive (constraint semantics unchanged).
|
||||
|
||||
**Dependencies**
|
||||
- Inbound: `generate_config` — calls this method N times (one per batch) at line 320 (P0).
|
||||
- Outbound: `_call_llm_with_retry` (P0), `get_language_instruction` (P0), `_generate_agent_config_by_rule` (P1, fallback per entity).
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
def _generate_agent_configs_batch(
    self,
    context: str,
    entities: List[EntityNode],
    start_idx: int,
    simulation_requirement: str,
) -> List[AgentActivityConfig]:
    """Returns one AgentActivityConfig per input entity, populated from
    LLM output where possible, else from rule-based fallback."""
```
|
||||
|
||||
- Preconditions: `entities` non-empty; `start_idx` ≥ 0.
|
||||
- Postconditions: returned list length equals `len(entities)`; each item has a populated `stance` ∈ {`supportive`, `opposing`, `neutral`, `observer`}.
|
||||
- Invariants: signature unchanged; rule-based fallback (`_generate_agent_config_by_rule`) wiring unchanged.
|
||||
|
||||
**Implementation Notes**
|
||||
- Integration: invoked at line 320 inside the batch loop.
|
||||
- Validation: zero Chinese chars; `IMPORTANT: stance ...` clause byte-equal; the four-range heuristic block is translated but the numeric ranges are preserved.
|
||||
- Risks: paraphrasing the `stance` enum constraint could allow Chinese stance values into the OASIS subprocess — keep verbatim.
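The `stance` postcondition can be checked mechanically. The sketch below is a hypothetical helper operating on plain dicts shaped like the `agent_configs` entries; only the four enum strings are taken from this spec.

```python
from typing import Any, Dict, List, Tuple

ALLOWED_STANCES = frozenset({"supportive", "opposing", "neutral", "observer"})


def check_stances(agent_configs: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (agent_id, stance) pairs whose stance falls outside the enum.

    A missing agent_id is reported as -1 and a missing stance as "".
    """
    return [
        (cfg.get("agent_id", -1), cfg.get("stance", ""))
        for cfg in agent_configs
        if cfg.get("stance") not in ALLOWED_STANCES
    ]
```

Running this over parsed LLM output in a fixture test would catch a localized value such as `中立` before it reaches the OASIS subprocess.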
|
||||
|
||||
## Data Models
|
||||
|
||||
No new or changed data models. `AgentActivityConfig`, `TimeSimulationConfig`, `EventConfig`, `PlatformConfig`, `SimulationParameters` and `to_dict()` outputs are unchanged.
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Error Strategy
|
||||
|
||||
No new error strategy. The existing two-tier fallback (LLM retry inside `_call_llm_with_retry`; per-method default-config fallback when retry exhausts) is preserved unchanged.
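The two-tier shape described above can be sketched as follows. This is a simplified illustration, not the module's code: the function names, the attempt count, and the default value `72` are assumptions, while the `"Used default config"` string is the English fallback literal named in this spec.

```python
from typing import Any, Callable, Dict, Optional


def call_with_retry(call: Callable[[], Dict[str, Any]], max_attempts: int = 3) -> Dict[str, Any]:
    """Tier 1: retry the call, then raise once the attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
    raise RuntimeError("LLM call failed after retries") from last_error


def generate_with_fallback(call: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Tier 2: return a static default config when tier 1 gives up."""
    try:
        return call_with_retry(call)
    except Exception:
        # Hypothetical default values; only the reasoning string is from the spec.
        return {"total_simulation_hours": 72, "reasoning": "Used default config"}
```

Because the fallback `reasoning` is a static literal, it is the one place where output language does not depend on the model, which is why the spec translates it directly.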
|
||||
|
||||
### Error Categories and Responses
|
||||
|
||||
- **LLM call failure (retry exhausted)** → `_get_default_time_config` (block 1) or static fallback dict (block 2). The fallback `reasoning` is now English (Decision: R6).
|
||||
- **JSON parse failure** → repaired by `_fix_truncated_json` and `_try_fix_config_json`. Unchanged.
|
||||
- **`stance` not in enum** → consumed by OASIS subprocess; behaviour unchanged. The preserved English `IMPORTANT:` directive guards against this at the LLM-output level.
|
||||
- **`poster_type` not matching available entity types** → consumed by `_assign_initial_post_agents` (line 793); falls back to highest-influence agent. Unchanged.
|
||||
|
||||
### Monitoring
|
||||
|
||||
No new logging. `logger.warning("时间配置LLM生成失败...")` and similar lines remain in their current Chinese form (issue #6's scope).
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit / fixture-based static checks (in scope for this implementation)
|
||||
|
||||
1. **Compile pass** — `python -m py_compile backend/app/services/simulation_config_generator.py` runs clean.
|
||||
2. **Zero-Chinese assertion on prompt regions**: read the file, locate the six prompt literals and the two helper bodies via AST or anchored substring match, and assert `re.findall(r'[\u4e00-\u9fff]', region) == []` (the same range as the literal class `[一-鿿]`).
3. **Render check**: instantiate `SimulationConfigGenerator` with stub credentials, monkeypatch `_call_llm_with_retry` with a stub that captures its arguments, and call `_build_context`, `_summarize_entities`, and the three prompt-rendering methods (`_generate_time_config`, `_generate_event_config`, `_generate_agent_configs_batch`) directly. The prompts are f-strings, so they render only when the enclosing method runs; there is no separate `.format` step. Assert that the value of every documented interpolation appears in the rendered prompt and that no `NameError`, `KeyError`, or `IndexError` is raised.
|
||||
4. **Constraint-clause byte-equal check** — assert the exact string `IMPORTANT: The 'poster_type' field value MUST be in English PascalCase exactly matching the available entity types.` substring is present in line 706 region; assert `IMPORTANT: The 'stance' field value MUST be one of the English strings: 'supportive', 'opposing', 'neutral', 'observer'.` substring is present in line 870 region.
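Checks 2 and 4 can be combined into one small helper, sketched below. The `check_region` name is hypothetical, and how each region string is extracted from the file is left out; the clause text is the `poster_type` directive quoted in check 4.

```python
import re
from typing import List, Optional

# CJK Unified Ideographs, U+4E00 through U+9FFF.
CJK_UNIFIED = re.compile(r"[\u4e00-\u9fff]")

POSTER_TYPE_CLAUSE = (
    "IMPORTANT: The 'poster_type' field value MUST be in English PascalCase "
    "exactly matching the available entity types."
)


def check_region(region: str, required_clause: Optional[str] = None) -> List[str]:
    """Return human-readable problems found in one prompt region."""
    problems: List[str] = []
    leftovers = CJK_UNIFIED.findall(region)
    if leftovers:
        problems.append(f"{len(leftovers)} Chinese character(s) remain")
    # Substring containment is an exact, character-for-character comparison.
    if required_clause is not None and required_clause not in region:
        problems.append("required IMPORTANT clause missing or altered")
    return problems
```

Note that this range covers the main CJK block only; ideographs in extension blocks or full-width punctuation would need a wider pattern.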
|
||||
|
||||
### Integration tests (deferred)
|
||||
|
||||
5. **OASIS Step 3 smoke run**: deferred. Sister specs (#2, #3) shipped without a live e2e run, and the same posture applies here. Step 3 parity is argued from the unchanged JSON shape (Requirement 5) rather than verified by a run, and the reviewer is asked to accept that argument.
|
||||
|
||||
### E2E / UI
|
||||
|
||||
(n/a — backend-only change.)
|
||||
|
||||
## Optional Sections
|
||||
|
||||
### Migration Strategy
|
||||
|
||||
No data migration. The change takes effect on the next call to `SimulationConfigGenerator.generate_config(...)` after deploy. For `Accept-Language: zh`, the locale-resolved Chinese postfix continues to steer the LLM toward Chinese output, so `zh` users should see no perceptible change. `en` users (and users of any other non-`zh` locale) see English-flavoured output from the first call after deploy.
|
||||
|
||||
Rollback: revert the single commit. No database, no cache, no schema concerns.
|
||||
|
||||
## Supporting References
|
||||
|
||||
- `research.md` — full discovery log, alternative-architecture evaluation, decision records.
|
||||
- Sister-spec implementations: commits `0806832` (#2), `9d1d29b` (#3).
|
||||
- Sister-spec planning artefacts: `.kiro/specs/i18n-ontology-generator-prompts/`, `.kiro/specs/i18n-oasis-profile-generator-prompts/`.
|
||||
|
|
@ -0,0 +1,109 @@
|
|||
# Gap Analysis — i18n-simulation-config-generator-prompts
|
||||
|
||||
## 1. Current-State Investigation
|
||||
|
||||
### Domain assets
|
||||
|
||||
- **Target file**: `backend/app/services/simulation_config_generator.py` (~991 lines).
- **Three prompt blocks**:
  - **Block 1**: `_generate_time_config` (lines ~535–595). User prompt at lines 543–586 (f-string), system prompt at line 588, `get_language_instruction()` postfix at 589.
  - **Block 2**: `_generate_event_config` (lines ~646–717). User prompt at lines 676–703, system prompt at 705, `get_language_instruction()` postfix at 706 (already followed by an English `IMPORTANT:` directive on `poster_type`).
  - **Block 3**: `_generate_agent_configs_batch` (lines ~813–906). User prompt at lines 833–867, system prompt at 869, `get_language_instruction()` postfix at 870 (already followed by an English `IMPORTANT:` directive on `stance`).
- **Indirect prompt content** (interpolated via `{context_truncated}`):
  - `_build_context` (lines 381–407) emits Chinese section headings: `## 模拟需求`, `## 实体信息 ({n}个)`, `## 原始文档内容`, truncation marker `(文档已截断)`.
  - `_summarize_entities` (lines 409–432) emits per-type headings `### {entity_type} ({n}个)` and overflow marker `... 还有 {n} 个`.
- **Locale resolution**: `backend/app/utils/locale.py` (`get_locale`, `get_language_instruction`, `t`) resolves locale from `Accept-Language` header in a request context, or thread-local in background threads. `languages.json` exposes `zh`, `en`, `es`, `fr`, `pt`, `ru`, `de`; all but `zh` already use English `llmInstruction` postfixes.
|
||||
|
||||
### Counts (verified)
|
||||
|
||||
| Region | Chinese chars |
| --- | --- |
| Block 1 user prompt | 417 |
| Block 1 system prompt | 38 |
| Block 2 user prompt | 173 |
| Block 2 system prompt | 27 |
| Block 3 user prompt | 207 |
| Block 3 system prompt | 36 |
| `_build_context` body | 46 |
| `_summarize_entities` body | 29 |
| File total (incl. logger, docstrings, comments) | 2415 |
|
||||
|
||||
The ticket's "~247 Chinese characters" undercounts the in-prompt total by ~3.6×. The actual prompt-string count is ~898; with context-builder headings (~75 more), the in-scope total is ~973. Logger lines, docstrings, and comments are out of scope (covered by #6/#7).
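The totals in this paragraph follow directly from the per-region counts listed above. A quick arithmetic check, where the dictionary keys are illustrative labels rather than identifiers from the code base:

```python
# Per-region Chinese-character counts copied from the counts table.
prompt_counts = {
    "block1_user": 417,
    "block1_system": 38,
    "block2_user": 173,
    "block2_system": 27,
    "block3_user": 207,
    "block3_system": 36,
}
helper_counts = {"_build_context": 46, "_summarize_entities": 29}
TICKET_ESTIMATE = 247  # the "~247 Chinese characters" figure from the ticket

prompt_total = sum(prompt_counts.values())    # six prompt string literals
helper_total = sum(helper_counts.values())    # context-builder headings
in_scope_total = prompt_total + helper_total

print(prompt_total, helper_total, in_scope_total)   # 898 75 973
print(round(prompt_total / TICKET_ESTIMATE, 1))     # 3.6
```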
|
||||
|
||||
### Conventions (extracted)
|
||||
|
||||
- Sister specs `i18n-ontology-generator-prompts` (commit `0806832`) and `i18n-oasis-profile-generator-prompts` (commit `9d1d29b`) established the exact pattern: in-place translation; preserve `get_language_instruction()` call sites; preserve all interpolations and the trailing identifier-format `IMPORTANT:` directives; do not touch logger, docstrings, comments, or any other file.
|
||||
- 4-space indent, snake_case, double quotes for strings, `f"""..."""` for multi-line prompts. Existing Chinese-then-English mix is acceptable in comments/docstrings (steering-tech.md: "preserve both; do not translate one into the other unless asked").
|
||||
- No linter/formatter — match surrounding style.
|
||||
|
||||
### Integration surfaces
|
||||
|
||||
- `SimulationConfigGenerator.generate_config(...)` is called from `services/simulation_runner.py` and the simulation API blueprint. The returned `SimulationParameters.to_dict()` is consumed by the OASIS subprocess via `services/simulation_ipc.py`. The JSON payload shape and field semantics must remain unchanged.
|
||||
- `_build_context` and `_summarize_entities` are private helpers used only inside this file — translating their headings is local-only.
|
||||
- Locale-switching contract: when locale = `zh`, `get_language_instruction()` returns `请使用中文回答。`; when `en`, `Please respond in English.` — verified.
|
||||
|
||||
## 2. Requirement-to-Asset Map
|
||||
|
||||
| Requirement | Existing asset | Gap | Tag |
| --- | --- | --- | --- |
| R1: Block 1 prompt EN | f-string at line 543, system_prompt at 588 | Translate text; preserve `{context_truncated}`, `{max_agents_allowed}`, JSON keys, field constraints, UTC+8 reference example | Missing (translation) |
| R2: Block 2 prompt EN | f-string at line 676, system_prompt at 705 | Translate text; preserve `{simulation_requirement}`, `{context_truncated}`, `{type_info}`, JSON keys, type-to-author examples, the `IMPORTANT: poster_type ... PascalCase ...` directive | Missing (translation) |
| R3: Block 3 prompt EN | f-string at line 833, system_prompt at 869 | Translate text; preserve `{simulation_requirement}` and the `json.dumps(entity_list, ensure_ascii=False, indent=2)` interpolation, JSON keys, per-entity-type heuristic ranges, the `IMPORTANT: stance ... supportive/opposing/neutral/observer` directive | Missing (translation) |
| R4: Locale switching preserved | `get_language_instruction()` calls at lines 589, 706, 870 | None; keep call sites untouched | Constraint |
| R5: Public API stable | Class/method/dataclass surface | None; text-only changes | Constraint |
| R6: Default reasoning strings | `_get_default_time_config` line 608, `_generate_event_config` exception at line 716 | Optional translation of two `reasoning` literals; non-empty contract preserved | Optional gap |
| R7: Context-builder headings EN | `_build_context` 393–406, `_summarize_entities` 422–430 | Translate Chinese section headings inside f-strings; preserve interpolations | Missing (translation) |
| R8: Step 3 parity | OASIS subprocess + `simulation_ipc.py` | Verification only; the change should not alter `SimulationParameters.to_dict()` shape | Constraint |
| R9: Out-of-scope guardrails | logger calls (≥17 occurrences), docstrings, comments | None; leave untouched | Constraint |
|
||||
|
||||
### Unknown / Research-needed
|
||||
|
||||
- **R8 verification feasibility**: Running an end-to-end OASIS simulation requires Neo4j, an LLM key, and a representative seed. In a sandboxed CI-like environment, this is not practical. Defer to a lightweight fixture-based check: (a) lint pass — `python -m py_compile`, (b) zero-Chinese assertion on the three prompt strings via `re.findall(r'[一-鿿]', ...)`, (c) shape parity by constructing a fake `entity_list` and confirming the prompts render to the expected interpolation set without raising. **Research item**: confirm with the user whether a smoke-test run of `services/simulation_runner` is required for PR acceptance, or whether the pattern of #2/#3 (no end-to-end run, reviewer-trust) is acceptable here.
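Checks (b) and (c) of the fixture-based approach could look roughly like the sketch below. The prompt text is a hypothetical stand-in, not the translated prompt from the target file, and the helper names are made up for illustration. The character class `[\u4e00-\u9fff]` is the same range as the literal `[一-鿿]` quoted above.

```python
import re

# CJK Unified Ideographs block, U+4E00..U+9FFF.
CJK = re.compile(r"[\u4e00-\u9fff]")

def chinese_chars(text: str) -> list:
    """Return every CJK ideograph found in text."""
    return CJK.findall(text)

def unresolved_placeholders(rendered: str, names: list) -> list:
    """Return placeholder names that still appear as a literal '{name}' after rendering."""
    return [n for n in names if "{" + n + "}" in rendered]

# Stub inputs and a hypothetical English prompt, rendered the same way the real f-strings are.
context_truncated = "## Simulation Requirement\nStub scenario"
max_agents_allowed = 12
prompt = f"""Based on the context below, generate a time configuration.

{context_truncated}

agents_per_hour_max must not exceed {max_agents_allowed}."""

assert chinese_chars(prompt) == []
assert unresolved_placeholders(prompt, ["context_truncated", "max_agents_allowed"]) == []
# Sanity check that the detector does fire on an untranslated heading.
assert chinese_chars("## 模拟需求") == ["模", "拟", "需", "求"]
```

Check (a) needs no fixture at all: `python -m py_compile` on the target file is enough to catch a broken string literal.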
|
||||
|
||||
## 3. Implementation Approach Options
|
||||
|
||||
### Option A — In-place translation (recommended)
|
||||
|
||||
**What**: Edit the six prompt string literals plus the two context-builder helper bodies directly in `simulation_config_generator.py`. No new files.
|
||||
|
||||
**Trade-offs**:
|
||||
- ✅ Matches the precedent set by commits `0806832` (issue #2) and `9d1d29b` (issue #3): the same approach applied to a sibling file. Reviewers already know the pattern, so review overhead is minimal.
|
||||
- ✅ Smallest possible diff, smallest possible blast radius.
|
||||
- ✅ No new abstractions, no new files, no dependency churn.
|
||||
- ❌ Translations are baked in — switching to `es`/`fr`/`pt`/`ru`/`de` still relies on the `get_language_instruction()` postfix to bias the model. (This is also true under the current Chinese-base baseline; not a regression.)
|
||||
|
||||
### Option B — Externalize prompts to `/locales/`
|
||||
|
||||
**What**: Move all six prompt strings to `locales/en.json` / `locales/zh.json` and use `t('prompts.simConfig.timeConfig.user')` etc. to look them up.
|
||||
|
||||
**Trade-offs**:
|
||||
- ✅ Genuinely locale-agnostic prompts; richer non-`zh`/`en` support.
|
||||
- ❌ Departs from the pattern set by #2 and #3 — those translations are in-line. Inconsistency between sister files.
|
||||
- ❌ Significant new surface area in `/locales/` JSON; brittle keys for f-string substitution (need to encode `{topic}`, `{n_agents}`, `{json.dumps(...)}` in JSON values).
|
||||
- ❌ Requires either templating-engine choice (Jinja, Python `str.format`) or fragile string concatenation.
|
||||
- ❌ Out of scope per the ticket's "Out of scope: refactoring prompt structure or output JSON schema" guardrail (the structure is the prompt; moving it to JSON is a refactor).
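The "brittle keys" concern can be shown with plain `str.format`, which is one of the templating choices Option B would have to consider. The prompts embed a JSON schema, so every literal brace in an externalized template would need doubling. The template strings below are hypothetical, not taken from the code base.

```python
import json

# A JSON example with one placeholder, as it might be stored in a locale file.
template_raw = '{"minutes_per_round": {minutes}}'        # JSON braces left as-is
template_escaped = '{{"minutes_per_round": {minutes}}}'  # literal braces doubled

try:
    template_raw.format(minutes=60)
    raw_ok = True
except (KeyError, ValueError, IndexError):
    # str.format treats the outer JSON brace as a replacement field and fails.
    raw_ok = False

rendered = template_escaped.format(minutes=60)
print(raw_ok)                 # False
print(json.loads(rendered))   # {'minutes_per_round': 60}
```

An expression such as `json.dumps(entity_list, ...)` cannot appear inside a `str.format` placeholder at all, so it would have to be computed by the caller and passed in by name.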
|
||||
|
||||
### Option C — Hybrid: in-place EN translation + introduce a thin prompt-loader for future extraction
|
||||
|
||||
**What**: Translate in-place now; also add a no-op `_load_prompt(...)` helper that just returns the literal, with a comment hinting at future externalization.
|
||||
|
||||
**Trade-offs**:
|
||||
- ✅ Sets a future migration path.
|
||||
- ❌ Adds an indirection that has no caller variance today — premature abstraction. Steering-tech.md explicitly discourages this style.
|
||||
- ❌ Larger diff, more reviewer surface, no behavioural benefit over Option A.
|
||||
|
||||
## 4. Effort & Risk
|
||||
|
||||
- **Effort**: **S** (1–3 days). Six string literals plus two helper bodies; no schema, no API, no dependency changes. Sister-spec implementations completed in single commits (`0806832`, `9d1d29b`).
|
||||
- **Risk**: **Low**. Translation is text-only; the JSON output contract, public API, and `get_language_instruction()` mechanism are all preserved. The only behavioural risk is the LLM emitting different lexical choices for the same fields — this is the *intended* effect (English-flavoured output under `Accept-Language: en`).
|
||||
|
||||
## 5. Recommendation for Design Phase
|
||||
|
||||
- **Preferred approach**: Option A. Match the sister-spec pattern (`0806832`, `9d1d29b`) for reviewer continuity.
|
||||
- **Key decisions to lock in design**:
  1. **Wording style**: Mirror the conversational/imperative style of `oasis_profile_generator` and `ontology_generator` post-translation prompts, e.g. `"You are a social-media simulation expert. ..."` rather than direct word-for-word ports of the Chinese.
  2. **Default-path `reasoning` strings (R6)**: Translate the two literals to English ("Default circadian-pattern config (1h per round)" and "Used default config"). Keep them locale-agnostic since they only fire on LLM failure, where locale is moot. This keeps a permanently Chinese fragment from leaking into a `generation_reasoning` string that is otherwise joined from English content.
  3. **Verification harness (R8)**: Use a lightweight fixture-only check (compile, regex for Chinese in prompt strings, render the prompts with stub data and assert interpolation completeness). No end-to-end OASIS run unless the reviewer requests it.
  4. **Wording for the `IMPORTANT:` directives**: Keep the constraint semantics identical; allow light wording polish so the directive flows naturally after a now-English system prompt (e.g. drop the `IMPORTANT:` redundancy if the translated system prompt already encodes the constraint, or move the directive to be inline with the rest of the system prompt).
|
||||
|
||||
- **Research items to carry**: confirm verification scope (fixture vs. live) with implementation reviewer; if live is required, the sandboxed environment must have `LLM_API_KEY`, Neo4j, and a seed file available — none of which are guaranteed in this run.
|
||||
|
|
@ -0,0 +1,143 @@
|
|||
# Requirements Document
|
||||
|
||||
## Introduction
|
||||
|
||||
This specification covers the English translation of the three LLM prompt blocks in `backend/app/services/simulation_config_generator.py`. The file produces the simulation parameters consumed by the OASIS subprocess (Step 3 of the MiroFish pipeline): time/event/agent/platform configuration, hot-topic extraction, narrative direction, and stance assignment. Today, all three prompts are written in Chinese; the language is steered at runtime by appending `get_language_instruction()` to each system prompt. While that postfix instructs the model *which* language to respond in, the base-prompt language biases the model's structural and lexical output. As a result, the natural-language output fields (`content`, `narrative_direction`, `hot_topics`, `reasoning`) skew Chinese under `Accept-Language: en`. Translating the base prompts to English removes that bias while preserving the existing locale-switching mechanism for non-English locales (verified: `get_language_instruction()` returns the Chinese postfix `请使用中文回答。` when locale is `zh`).
|
||||
|
||||
This work tracks GitHub issue [#4](https://github.com/salestech-group/MiroFish/issues/4).
|
||||
|
||||
## Boundary Context
|
||||
|
||||
- **In scope**:
  - Translating the time-configuration prompt and its system prompt in `_generate_time_config` (block 1, lines ~543–588).
  - Translating the event-configuration prompt and its system prompt in `_generate_event_config` (block 2, lines ~676–705).
  - Translating the per-batch agent-configuration prompt and its system prompt in `_generate_agent_configs_batch` (block 3, lines ~833–869).
  - Preserving every `get_language_instruction()` call site exactly as today (lines 589, 706, 870: the three postfix injections that follow each system prompt).
  - Preserving the existing English-only constraint directives that already follow `get_language_instruction()`: `poster_type` PascalCase English (block 2), `stance` ∈ {`supportive`, `opposing`, `neutral`, `observer`} (block 3).
  - Preserving every variable interpolation (`{context_truncated}`, `{simulation_requirement}`, `{type_info}`, `{max_agents_allowed}`, `{json.dumps(entity_list, ...)}`, etc.) verbatim by name and position.
  - Preserving the JSON output contract of each prompt (key names, value types, required fields).
|
||||
- **Out of scope**:
  - Logger messages (`logger.info`, `logger.warning`, `logger.error`) inside the same file; covered by issue #6.
  - Module docstring, class docstrings, method docstrings, and inline comments; covered by issue #7.
  - Refactoring the prompt structure, JSON output schema, retry/repair logic in `_call_llm_with_retry`, or any data-class definitions.
  - Changing default simulation parameters (rounds count, action lists, etc.; owned by `app/config.py`).
  - The fallback string in `_get_default_time_config` (`"使用默认中国人作息配置(每轮1小时)"`) and the fallback `"使用默认配置"` in `_generate_event_config` exception handler. These are returned as `reasoning` values, not prompt content. Translation of these is closer to log/comment scope (#6/#7); for symmetry with the prompt translation goal they SHOULD be translated to English when locale-agnostic, but only as long as no behavioural side effects are introduced (see Requirement 6).
  - The `_build_context` Chinese section headings (`## 模拟需求`, `## 实体信息`, `## 原始文档内容`, `...(文档已截断)`) and `_summarize_entities` headings (`### {entity_type} ({len(type_entities)}个)`, `... 还有 {n} 个`). These are interpolated into prompts as part of `{context_truncated}` and bias the model's output language. Translation of these section headings is in scope (see Requirement 7) because they contribute to the same model-output language bias the three prompt blocks address.
|
||||
- **Adjacent expectations**:
  - The OASIS simulation subprocess and IPC layer (`services/simulation_ipc.py`) consume the resulting `SimulationParameters` payload. No coupling to prompt language exists in that consumer; the JSON shape of `SimulationParameters.to_dict()` is unchanged by this work.
  - The locale resolution chain (`Accept-Language` header → `get_locale()` → `get_language_instruction()`) lives in `backend/app/utils/locale.py` and is unchanged.
  - Companion i18n issues (#2 closed, #3 closed, #5, #6, #7) operate on different files or scopes and must not be touched here.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: English Translation of the Time-Configuration Prompt (Block 1)
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the time-configuration prompt and system prompt to be authored in English, so that the LLM's `reasoning` field for time configuration is not biased toward Chinese structure or word choice.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall render the user prompt inside `_generate_time_config` containing zero Chinese characters in any string-literal content.
|
||||
2. The Simulation Config Generator shall render the system prompt inside `_generate_time_config` containing zero Chinese characters in any string-literal content.
|
||||
3. The Simulation Config Generator shall preserve the JSON output contract of the time-config prompt verbatim by key name: `total_simulation_hours`, `minutes_per_round`, `agents_per_hour_min`, `agents_per_hour_max`, `peak_hours`, `off_peak_hours`, `morning_hours`, `work_hours`, `reasoning`.
|
||||
4. The Simulation Config Generator shall preserve the field-level numeric constraints currently described in the prompt: `total_simulation_hours` ∈ 24–168, `minutes_per_round` ∈ 30–120 (recommend 60), `agents_per_hour_min`/`max` ∈ 1–`max_agents_allowed`.
|
||||
5. The Simulation Config Generator shall preserve the variable interpolations `{context_truncated}` and `{max_agents_allowed}` verbatim by name and position.
|
||||
6. The Simulation Config Generator shall preserve the prompt's guidance that the model should infer the target user group's timezone and circadian habits from the simulation scenario, with the UTC+8 reference example retained as illustrative guidance.
|
||||
7. The Simulation Config Generator shall preserve the call to `get_language_instruction()` exactly at line ~589, appended after the translated system prompt.
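The numeric ranges in criterion 4 are described to the model inside the prompt. A reviewer who wants to confirm that the translated prompt still yields in-range values could apply a post-check along these lines. This is a hypothetical helper for verification only; nothing in the source says the generator enforces these ranges in code.

```python
def time_config_violations(cfg: dict, max_agents_allowed: int) -> list:
    """Return human-readable violations of the numeric ranges listed in criterion 4."""
    bounds = {
        "total_simulation_hours": (24, 168),
        "minutes_per_round": (30, 120),
        "agents_per_hour_min": (1, max_agents_allowed),
        "agents_per_hour_max": (1, max_agents_allowed),
    }
    problems = []
    for key, (low, high) in bounds.items():
        value = cfg.get(key)
        if value is None or not low <= value <= high:
            problems.append(f"{key}={value!r} outside {low}..{high}")
    return problems

# Stub LLM output using the JSON keys from criterion 3 (values are made up).
sample = {
    "total_simulation_hours": 72,
    "minutes_per_round": 60,
    "agents_per_hour_min": 2,
    "agents_per_hour_max": 15,
}
print(time_config_violations(sample, max_agents_allowed=20))   # []
```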
|
||||
|
||||
### Requirement 2: English Translation of the Event-Configuration Prompt (Block 2)
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the event-configuration prompt and system prompt to be authored in English, so that generated `hot_topics`, `narrative_direction`, initial-post `content`, and `reasoning` fields are not biased toward Chinese structure or word choice.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall render the user prompt inside `_generate_event_config` containing zero Chinese characters in any string-literal content.
|
||||
2. The Simulation Config Generator shall render the system prompt inside `_generate_event_config` containing zero Chinese characters in any string-literal content.
|
||||
3. The Simulation Config Generator shall preserve the JSON output contract of the event-config prompt verbatim by key name: `hot_topics` (list of strings), `narrative_direction` (string), `initial_posts` (list of objects with keys `content` and `poster_type`), `reasoning` (string).
|
||||
4. The Simulation Config Generator shall preserve the variable interpolations `{simulation_requirement}`, `{context_truncated}`, and `{type_info}` verbatim by name and position.
|
||||
5. The Simulation Config Generator shall preserve the call to `get_language_instruction()` exactly at line ~706 appended after the translated system prompt.
|
||||
6. The Simulation Config Generator shall preserve the trailing English-only directive on `poster_type` formatting (currently: `IMPORTANT: The 'poster_type' field value MUST be in English PascalCase exactly matching the available entity types. Only 'content', 'narrative_direction', 'hot_topics' and 'reasoning' fields should use the specified language.`). The wording may be lightly normalized so it reads cleanly after a now-English system prompt, but the constraint semantics shall not change.
|
||||
7. The Simulation Config Generator shall preserve the prompt's example list mapping entity types to expected post authors (Official/University → official statements, MediaOutlet → news, Student → student opinions) — translated to English while keeping each pairing intact.
|
||||
8. When the locale is `zh`, the Simulation Config Generator shall produce `hot_topics`, `narrative_direction`, initial-post `content`, and `reasoning` fields in Chinese, equivalent in quality to the pre-change behaviour.
|
||||
|
||||
### Requirement 3: English Translation of the Agent-Config Batch Prompt (Block 3)
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the agent-config batch prompt and system prompt to be authored in English, so that the LLM's per-agent configuration emission is not biased by Chinese-specific behavioural priors when the seed scenario is non-Chinese.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall render the user prompt inside `_generate_agent_configs_batch` containing zero Chinese characters in any string-literal content.
|
||||
2. The Simulation Config Generator shall render the system prompt inside `_generate_agent_configs_batch` containing zero Chinese characters in any string-literal content.
|
||||
3. The Simulation Config Generator shall preserve the JSON output contract of the agent-config batch prompt verbatim by key name: `agent_configs` (list) with sub-keys `agent_id`, `activity_level`, `posts_per_hour`, `comments_per_hour`, `active_hours`, `response_delay_min`, `response_delay_max`, `sentiment_bias`, `stance`, `influence_weight`.
|
||||
4. The Simulation Config Generator shall preserve the variable interpolations `{simulation_requirement}` and the embedded `json.dumps(entity_list, ensure_ascii=False, indent=2)` rendering of the entity list verbatim.
|
||||
5. The Simulation Config Generator shall preserve the per-entity-type heuristic ranges currently embedded in the prompt: officials (University/GovernmentAgency) — low activity 0.1–0.3, work hours, slow response 60–240 min, high influence 2.5–3.0; media (MediaOutlet) — mid activity 0.4–0.6, all-day 8–23, fast response 5–30 min, high influence 2.0–2.5; individuals (Student/Person/Alumni) — high activity 0.6–0.9, evening 18–23, fast response 1–15 min, low influence 0.8–1.2; public figures/experts — mid activity 0.4–0.6, mid-high influence 1.5–2.0.
|
||||
6. The Simulation Config Generator shall preserve the call to `get_language_instruction()` exactly at line ~870, appended after the translated system prompt.
|
||||
7. The Simulation Config Generator shall preserve the trailing English-only directive on `stance` and JSON-key formatting (currently: `IMPORTANT: The 'stance' field value MUST be one of the English strings: 'supportive', 'opposing', 'neutral', 'observer'. All JSON field names and numeric values must remain unchanged. Only natural language text fields should use the specified language.`). The wording may be lightly normalized so it reads cleanly after a now-English system prompt, but the constraint semantics shall not change.
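One way to confirm that the heuristic ranges in criterion 5 survived translation is to keep them as data and compare a generated config against them. The sketch below covers only `activity_level` and `influence_weight`; the group labels (`official`, `media`, and so on) are illustrative names, not identifiers from the code base. The prompt presents these ranges as guidance to the model, so a value outside them is presumably a signal to inspect rather than a hard error.

```python
# Ranges copied from criterion 5; the group names are hypothetical labels.
HEURISTIC_RANGES = {
    "official":      {"activity_level": (0.1, 0.3), "influence_weight": (2.5, 3.0)},
    "media":         {"activity_level": (0.4, 0.6), "influence_weight": (2.0, 2.5)},
    "individual":    {"activity_level": (0.6, 0.9), "influence_weight": (0.8, 1.2)},
    "public_figure": {"activity_level": (0.4, 0.6), "influence_weight": (1.5, 2.0)},
}

def fields_outside_heuristics(group: str, agent_config: dict) -> list:
    """Return the field names whose values fall outside the group's heuristic range."""
    outside = []
    for field, (low, high) in HEURISTIC_RANGES[group].items():
        if not low <= agent_config[field] <= high:
            outside.append(field)
    return outside

media_cfg = {"activity_level": 0.5, "influence_weight": 2.2}
student_cfg = {"activity_level": 0.95, "influence_weight": 1.0}
print(fields_outside_heuristics("media", media_cfg))         # []
print(fields_outside_heuristics("individual", student_cfg))  # ['activity_level']
```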
|
||||
|
||||
### Requirement 4: Locale Switching Continues to Work via `get_language_instruction()`
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: zh` (or any other configured non-English locale), I want the simulation-config output to remain in the requested locale of equivalent quality, so that translating the base prompts does not regress non-English support.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall preserve the three call sites of `get_language_instruction()` at the same line positions (relative to each prompt block) and in the same syntactic form: `system_prompt = f"{system_prompt}\n\n{get_language_instruction()}..."`.
|
||||
2. When the locale is `zh`, the Simulation Config Generator shall produce a `time_config.reasoning`, `event_config.narrative_direction`, `event_config.hot_topics`, `event_config.initial_posts[*].content`, and a final `generation_reasoning` whose natural-language portions are in Chinese.
|
||||
3. When the locale is `en`, the Simulation Config Generator shall produce the same set of natural-language fields in English.
|
||||
4. The Simulation Config Generator shall not alter `backend/app/utils/locale.py`, the `_languages` registry, the `_translations` registry, or any file under `/locales/`.
|
||||
5. Where a locale produces JSON output that is structurally invalid (e.g. a reasoning model emits `<think>` tags), the existing JSON repair logic in `_fix_truncated_json` and `_try_fix_config_json` shall continue to apply unchanged, regardless of prompt language.
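The postfix mechanism in criterion 1 can be illustrated with a stub. The real `get_language_instruction()` takes no argument and resolves the locale itself; the version below takes the locale explicitly so the example is self-contained, and it covers only the two postfix strings quoted in this document.

```python
# Hypothetical stand-in for backend/app/utils/locale.get_language_instruction().
_POSTFIX = {
    "zh": "请使用中文回答。",
    "en": "Please respond in English.",
}

def get_language_instruction(locale: str) -> str:
    return _POSTFIX[locale]

def with_language_postfix(system_prompt: str, locale: str) -> str:
    # Same shape as the call sites: base prompt, blank line, then the postfix.
    return f"{system_prompt}\n\n{get_language_instruction(locale)}"

base = "You are a social-media simulation expert."
print(with_language_postfix(base, "en"))
print(with_language_postfix(base, "zh"))
```

The English base prompt is identical in both calls; only the trailing instruction changes with the locale.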
|
||||
|
||||
### Requirement 5: Public API and Call-Site Stability
|
||||
|
||||
**Objective:** As a developer maintaining the rest of the MiroFish backend pipeline, I want the public surface of `SimulationConfigGenerator` to remain unchanged, so that the simulation pipeline (Step 3) continues to work without modification.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall preserve the signature of `SimulationConfigGenerator.__init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None, model_name: Optional[str] = None)`.
|
||||
2. The Simulation Config Generator shall preserve the signature of `SimulationConfigGenerator.generate_config(...)` including all parameters and return type.
|
||||
3. The Simulation Config Generator shall preserve the signatures of the private methods `_generate_time_config`, `_generate_event_config`, `_generate_agent_configs_batch`, `_parse_time_config`, `_parse_event_config`, `_assign_initial_post_agents`, `_generate_agent_config_by_rule`, `_call_llm_with_retry`, `_fix_truncated_json`, `_try_fix_config_json`, `_get_default_time_config`, `_build_context`, `_summarize_entities`.
|
||||
4. The Simulation Config Generator shall preserve the dataclass definitions `AgentActivityConfig`, `TimeSimulationConfig`, `EventConfig`, `PlatformConfig`, `SimulationParameters` exactly (no field additions, removals, renames, or default-value changes).
|
||||
5. The Simulation Config Generator shall preserve the class-level constants `MAX_CONTEXT_LENGTH = 50000`, `AGENTS_PER_BATCH = 15`, `TIME_CONFIG_CONTEXT_LENGTH = 10000`, `EVENT_CONFIG_CONTEXT_LENGTH = 8000`, `ENTITY_SUMMARY_LENGTH = 300`, `AGENT_SUMMARY_LENGTH = 300`, `ENTITIES_PER_TYPE_DISPLAY = 20`.
|
||||
6. The Simulation Config Generator shall preserve the LLM invocation parameters in `_call_llm_with_retry`: `response_format={"type": "json_object"}`, `temperature=0.7 - (attempt * 0.1)`, `max_attempts = 3`, no `max_tokens` setting.
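For reference, the temperature expression in criterion 6 gives the following schedule, assuming `attempt` is a zero-based loop index (the source does not state the indexing, so treat that as an assumption):

```python
MAX_ATTEMPTS = 3

def temperature_for(attempt: int) -> float:
    # Same expression as in criterion 6; attempt is assumed to start at 0.
    return 0.7 - (attempt * 0.1)

# Rounded only for display, to hide floating-point noise.
schedule = [round(temperature_for(a), 2) for a in range(MAX_ATTEMPTS)]
print(schedule)   # [0.7, 0.6, 0.5]
```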
|
||||
|
||||
### Requirement 6: Default-Path Output Compatibility
|
||||
|
||||
**Objective:** As a MiroFish operator hitting an LLM-failure fallback path, I want the default `reasoning` strings to remain compatible with downstream consumers, so that translating prompts does not silently break the `generation_reasoning` join or any downstream display.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall continue to produce a non-empty `reasoning` field on the default path returned by `_get_default_time_config` and the exception path of `_generate_event_config`.
|
||||
2. The Simulation Config Generator may translate the two literal default-path `reasoning` strings (`"使用默认中国人作息配置(每轮1小时)"` and `"使用默认配置"`) to English. If translated, both translations shall be locale-agnostic English (no Chinese characters), and both shall remain non-empty.
|
||||
3. The Simulation Config Generator shall preserve the join semantics of `generation_reasoning = " | ".join(reasoning_parts)` — a `" | "` separator with the existing label prefixes contributed by `t('progress.timeConfigLabel')`, `t('progress.eventConfigLabel')`, etc.
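A minimal illustration of the join in criterion 3, using the two English default-path strings proposed in the gap analysis. The label prefixes here are hard-coded placeholders; in the real code they come from `t('progress.timeConfigLabel')` and `t('progress.eventConfigLabel')`, and how a label is combined with its reasoning text is an assumption.

```python
# Placeholder labels; the real ones are resolved through t(...).
time_label = "Time config"
event_label = "Event config"

reasoning_parts = [
    f"{time_label}: Default circadian-pattern config (1h per round)",
    f"{event_label}: Used default config",
]
generation_reasoning = " | ".join(reasoning_parts)
print(generation_reasoning)
```

Because both defaults are non-empty, the joined string never contains an empty segment between separators.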
|
||||
|
||||
### Requirement 7: Context-Builder Section Headings Translated
|
||||
|
||||
**Objective:** As a MiroFish operator running the pipeline under `Accept-Language: en`, I want the section headings injected into prompts via `_build_context` and `_summarize_entities` to be authored in English, so that the assembled prompt does not interleave English instruction blocks with Chinese section markers, which would otherwise re-introduce the same model-output language bias the prompt translations seek to eliminate.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The Simulation Config Generator shall render the section headings emitted by `_build_context` in English: replacing `## 模拟需求` with an English equivalent (e.g. `## Simulation Requirement`), `## 实体信息 ({n}个)` with `## Entities ({n})`, `## 原始文档内容` with `## Source Document Content`, and the truncation marker `(文档已截断)` with an English equivalent (e.g. `(document truncated)`).
|
||||
2. The Simulation Config Generator shall render the per-entity-type breakdown in `_summarize_entities` in English: replacing `### {entity_type} ({n}个)` with `### {entity_type} ({n})` and the trailing overflow marker `... 还有 {n} 个` with an English equivalent (e.g. `... and {n} more`).
|
||||
3. The Simulation Config Generator shall preserve `entity.name` and `entity.summary` data verbatim in the rendered context (no translation of user-provided content).
|
||||
4. The change to context-builder headings shall not modify the public signatures of `_build_context` or `_summarize_entities`.
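A rough sketch of what the English headings in criterion 2 might look like once rendered. This is not the body of the real `_summarize_entities`: the `Entity` class, the per-entity line format, and the way the two constants are applied are all assumptions made for illustration. Only the heading and overflow strings come from the criteria above.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Entity:  # hypothetical minimal stand-in for the real entity objects
    name: str
    entity_type: str
    summary: str

# Values of the class constants as listed in Requirement 5.
ENTITIES_PER_TYPE_DISPLAY = 20
ENTITY_SUMMARY_LENGTH = 300

def summarize_entities(entities: list) -> str:
    by_type = defaultdict(list)
    for entity in entities:
        by_type[entity.entity_type].append(entity)

    lines = []
    for entity_type, group in by_type.items():
        lines.append(f"### {entity_type} ({len(group)})")   # was: ({n}个)
        for entity in group[:ENTITIES_PER_TYPE_DISPLAY]:
            # Assumed line format; name and summary text are passed through untranslated.
            lines.append(f"- {entity.name}: {entity.summary[:ENTITY_SUMMARY_LENGTH]}")
        overflow = len(group) - ENTITIES_PER_TYPE_DISPLAY
        if overflow > 0:
            lines.append(f"... and {overflow} more")        # was: ... 还有 {n} 个
    return "\n".join(lines)

students = [Entity(f"Student {i}", "Student", "Undergraduate") for i in range(22)]
summary = summarize_entities(students)
print(summary.splitlines()[0])    # ### Student (22)
print(summary.splitlines()[-1])   # ... and 2 more
```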
|
||||
|
||||
### Requirement 8: End-to-End Step 3 Parity
|
||||
|
||||
**Objective:** As a MiroFish operator validating the change, I want the OASIS subprocess to start cleanly and run at least one round under the English-prompt configuration, so that the translation does not silently degrade the simulation pipeline.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. When a representative seed simulation requirement is processed end-to-end with locale `en`, `SimulationConfigGenerator.generate_config(...)` shall return a fully-populated `SimulationParameters` object (non-empty `agent_configs`, populated `time_config`, populated `event_config`).
|
||||
2. When the resulting `SimulationParameters` is handed to the OASIS subprocess via `simulation_ipc.py`, the subprocess shall start without raising a schema or validation error attributable to the translated prompts.
|
||||
3. When the resulting `SimulationParameters` is handed to the OASIS subprocess, the subprocess shall execute at least one simulation round without erroring on a `stance` not being one of `supportive`/`opposing`/`neutral`/`observer`, or a `poster_type` not matching an available entity type.
|
||||
4. The Simulation Config Generator shall not change the `SimulationParameters.to_dict()` payload shape consumed by the IPC layer (verified via Requirement 5).
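Criterion 3 depends on two identifier constraints that can be checked on the generated payload before any subprocess is started. A minimal sketch, assuming the payload exposes `agent_configs` and `initial_posts` as lists of dicts with the key names listed in Requirements 2 and 3 (the function itself is hypothetical):

```python
ALLOWED_STANCES = {"supportive", "opposing", "neutral", "observer"}

def identifier_violations(agent_configs: list, initial_posts: list, entity_types: set) -> list:
    """Return descriptions of stance / poster_type values that break the English-only contract."""
    problems = []
    for cfg in agent_configs:
        if cfg.get("stance") not in ALLOWED_STANCES:
            problems.append(f"agent {cfg.get('agent_id')}: stance {cfg.get('stance')!r}")
    for index, post in enumerate(initial_posts):
        if post.get("poster_type") not in entity_types:
            problems.append(f"initial_posts[{index}]: poster_type {post.get('poster_type')!r}")
    return problems

# Stub payload fragments with one deliberate violation of each kind.
agents = [{"agent_id": 0, "stance": "neutral"}, {"agent_id": 1, "stance": "中立"}]
posts = [{"content": "Statement", "poster_type": "University"},
         {"content": "Opinion", "poster_type": "student"}]
print(identifier_violations(agents, posts, {"University", "Student", "MediaOutlet"}))
```

The second agent fails because its stance is not one of the four English strings, and the second post fails because `student` does not match the PascalCase type `Student`.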
|
||||
|
||||
### Requirement 9: Out-of-Scope Surfaces Remain Untouched
|
||||
|
||||
**Objective:** As a reviewer of this PR, I want the change to remain narrowly scoped to prompt-content strings (and the directly related context-builder headings of Requirement 7), so that translation responsibilities for adjacent surfaces (issues #6 and #7) are not absorbed into this change.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The change shall not modify any `logger.info(...)`, `logger.warning(...)`, `logger.error(...)`, or `logger.debug(...)` call in `simulation_config_generator.py` (covered by issue #6).
|
||||
2. The change shall not modify the module docstring at lines 1–11, the class docstring on `SimulationConfigGenerator`, the dataclass docstrings (`AgentActivityConfig`, `TimeSimulationConfig`, `EventConfig`, `PlatformConfig`, `SimulationParameters`), or any inline `#` comment in `simulation_config_generator.py` (covered by issue #7).
|
||||
3. The change shall not modify any file outside `backend/app/services/simulation_config_generator.py` for production code, except for adding test fixtures or scripts under a clearly-isolated directory if a verification harness is needed.
|
||||
4. The change shall not introduce a new dependency or modify `backend/pyproject.toml` / `backend/uv.lock`.
|
||||
5. The change shall not edit `backend/app/config.py`, `backend/app/services/simulation_ipc.py`, `backend/app/services/simulation_runner.py`, `backend/app/utils/locale.py`, or any file under `/locales/`.
|
||||
|
|
@ -0,0 +1,126 @@
|
|||
# Research & Design Decisions — i18n-simulation-config-generator-prompts
|
||||
|
||||
## Summary
|
||||
|
||||
- **Feature**: `i18n-simulation-config-generator-prompts`
|
||||
- **Discovery Scope**: Extension (in-place translation of three prompt blocks plus two helpers in a single file)
|
||||
- **Key Findings**:
  - Two near-identical sister specs (#2, #3) have already established an in-place translation pattern. Following Option A (in-place) keeps reviewer continuity and matches commits `0806832` and `9d1d29b`.
  - The in-scope Chinese-character footprint is ~973 characters: ~898 across six string literals plus ~75 in two context-builder helpers. The six literals alone are ~3.6× the ticket's "~247 chars" estimate, and the full in-scope total is ~3.9×. The ticket undercounted by ignoring system prompts and the `_build_context`/`_summarize_entities` headings that flow into prompts via `{context_truncated}`.
  - Locale-switching contract (`Accept-Language` → `get_locale()` → `get_language_instruction()`) is unaffected by base-prompt translation; `zh` postfix is `请使用中文回答。`, all other supported locales already use English postfixes. No change to `backend/app/utils/locale.py` or `/locales/*.json` needed.
|
||||
|
||||
## Research Log
|
||||
|
||||
### Topic: Sister-spec implementation patterns (#2, #3)
|
||||
|
||||
- **Context**: Issue #4 explicitly references the same rationale as #5 ("translate prompts even though `get_language_instruction()` exists") and is one of a family of i18n issues. Sister specs already shipped — what pattern did they use?
|
||||
- **Sources Consulted**: `git show 0806832` (issue #2, ontology_generator), `git show 9d1d29b` (issue #3, oasis_profile_generator), `.kiro/specs/i18n-ontology-generator-prompts/requirements.md`, `.kiro/specs/i18n-oasis-profile-generator-prompts/`.
|
||||
- **Findings**:
|
||||
- Both sister specs translated in-place: edit prompt string literals, leave `get_language_instruction()` postfix call sites intact, leave logger/docstrings/comments alone (those are owned by #6/#7).
|
||||
- Both preserved the trailing English `IMPORTANT:` directives that lock identifier formats (`PascalCase`, `snake_case`, `UPPER_SNAKE_CASE`).
|
||||
- Both kept module/class docstrings in Chinese (out of scope per #7).
|
||||
- **Implications**: Use the same pattern. No deviation, no new abstractions, no externalization.
|
||||
|
||||
### Topic: Locale resolution & non-`zh` postfix verification
|
||||
|
||||
- **Context**: R4 requires that locale switching to `zh` continues to produce Chinese output, and to other locales continues to produce locale-appropriate output. Verify that `get_language_instruction()` returns useful postfixes for all supported locales.
|
||||
- **Sources Consulted**: `backend/app/utils/locale.py:66-69`, `locales/languages.json`.
|
||||
- **Findings**:
|
||||
- `languages.json` has 7 locales: `zh`, `en`, `es`, `fr`, `pt`, `ru`, `de`. Each provides a `llmInstruction` postfix in its native language (e.g. `de` → `Bitte antworten Sie auf Deutsch.`).
|
||||
- `get_locale()` reads `Accept-Language` header in request context, or thread-local in background threads; falls back to `zh`. Confirms the existing fallback semantics.
|
||||
- **Implications**: Translating the base prompts to English does not regress non-English support: every non-English locale already gets a native-language postfix that steers the model toward that locale's language. (Today, every locale except `zh` fights *against* a Chinese base prompt. After this change, every locale except `en` works against an English base prompt instead, and `zh` is the only locale that newly has to fight the base.)
|
||||
|
||||
### Topic: OASIS subprocess JSON contract
|
||||
|
||||
- **Context**: R8 requires Step 3 parity. What does the OASIS subprocess actually consume from `SimulationParameters.to_dict()`?
|
||||
- **Sources Consulted**: `backend/app/services/simulation_config_generator.py:176-197` (`SimulationParameters.to_dict`), grep for `simulation_ipc` consumers.
|
||||
- **Findings**:
|
||||
- `SimulationParameters.to_dict()` returns a flat dict with `simulation_id`, `project_id`, `graph_id`, `simulation_requirement`, `time_config` (dict), `agent_configs` (list of dicts), `event_config` (dict with `initial_posts`, `scheduled_events`, `hot_topics`, `narrative_direction`), `twitter_config`, `reddit_config`, `llm_model`, `llm_base_url`, `generated_at`, `generation_reasoning`.
|
||||
- Field types and shapes are entirely structural — no language-conditioned parsing exists in the consumer side.
|
||||
- **Implications**: Translation only changes the **content** of natural-language string fields; **shape** is untouched. Step 3 parity is a verification concern, not a design concern.
|
||||
|
||||
### Topic: Verification depth — fixture vs. live
|
||||
|
||||
- **Context**: R8 acceptance is "OASIS subprocess starts cleanly and runs at least one round." A sandboxed run is unlikely to have the live infra (Neo4j, LLM key, OASIS workers).
|
||||
- **Sources Consulted**: Sister-spec PRs (no live e2e captured in their commit messages or test additions); steering-tech.md (test posture: "coverage is intentionally minimal — don't add a heavy test harness without discussing scope").
|
||||
- **Findings**:
|
||||
- Sister specs shipped without live e2e tests. The verification was static: zero-Chinese assertion in the touched strings, plus reviewer trust on the JSON shape preservation.
|
||||
- The repo intentionally avoids heavy test scaffolding.
|
||||
- **Implications**: Use a fixture-based static verification:
|
||||
1. `python -m py_compile backend/app/services/simulation_config_generator.py` (compile pass).
|
||||
2. Regex `[一-鿿]` over the six prompt-string literals and the two context-builder f-strings — must yield zero matches.
|
||||
3. Construct stub entities and call `_build_context`, `_summarize_entities`, plus the three prompt-rendering paths (mocking the LLM client) — confirm every expected interpolation is present in the rendered prompt and no `KeyError` on missing variables.
|
||||
- This avoids depending on live infra while still catching every regression mode the requirements care about.
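The first two static checks above can be sketched in a few lines of Python. This is a minimal illustration, not the harness that shipped: the helper names `count_cjk` and `check_file_compiles` are hypothetical, and the character class is assumed to mean the CJK Unified Ideographs block U+4E00 through U+9FFF.

```python
import py_compile
import re

# Assumed reading of the `[一-鿿]` range: U+4E00 through U+9FFF.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def count_cjk(text: str) -> int:
    """Return how many CJK ideographs appear in `text`."""
    return len(CJK_RE.findall(text))


def check_file_compiles(path: str) -> None:
    """Raise py_compile.PyCompileError if the file at `path` does not compile."""
    py_compile.compile(path, doraise=True)
```

Running `count_cjk` over each extracted prompt literal and requiring a result of zero gives the "zero matches" assertion; how the literals are extracted from the module is left open here.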
|
||||
|
||||
## Architecture Pattern Evaluation
|
||||
|
||||
| Option | Description | Strengths | Risks / Limitations | Notes |
|
||||
|--------|-------------|-----------|---------------------|-------|
|
||||
| A — In-place translation | Edit string literals directly in `simulation_config_generator.py` | Matches sister-spec precedent (#2, #3); minimum diff; minimum risk | Translations baked in; harder to localize beyond `zh`/`en` later (but `get_language_instruction()` postfix already handles that) | **Selected** |
|
||||
| B — Externalize prompts to `/locales/` | Move prompts into JSON locale files | Genuine multi-locale prompt support | Departs from sister-spec pattern; out-of-scope per ticket guardrails ("no refactoring"); requires templating choice; brittle for `json.dumps`-shaped interpolations | Rejected |
|
||||
| C — Hybrid: in-place + indirection helper | Translate in-place, add a thin `_load_prompt()` indirection for future externalization | Sets future migration path | Premature abstraction; larger diff; no caller variance today | Rejected |
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Decision: Adopt Option A (in-place translation)
|
||||
|
||||
- **Context**: How to translate three Chinese prompt blocks plus two context-builder helpers without breaking locale switching, public API, or downstream OASIS consumption.
|
||||
- **Alternatives Considered**:
|
||||
1. Option A — In-place translation (sister-spec pattern).
|
||||
2. Option B — Externalize to `/locales/`.
|
||||
3. Option C — Hybrid with a no-op indirection.
|
||||
- **Selected Approach**: Option A. Translate the six prompt literals and the two context-builder helper bodies directly. Leave `get_language_instruction()` call sites and the trailing English `IMPORTANT:` directives intact (light wording polish allowed for grammatical flow but constraint semantics unchanged). Logger calls, docstrings, comments untouched.
|
||||
- **Rationale**: Pattern-consistent with #2 and #3, minimum-risk text edit, preserves all behavioural contracts, and produces the smallest reviewable diff.
|
||||
- **Trade-offs**: Future locales beyond `en`/`zh` continue to rely on the `get_language_instruction()` postfix to bias output — same as the current state, not a regression.
|
||||
- **Follow-up**: Confirm via fixture that prompt rendering produces no `KeyError` and zero Chinese in the targeted regions; reviewer self-check on prompt wording quality.
|
||||
|
||||
### Decision: Translate context-builder section headings (R7)
|
||||
|
||||
- **Context**: `_build_context` and `_summarize_entities` emit Chinese section headings (`## 模拟需求`, `### {entity_type} ({n}个)`, etc.) that are interpolated into all three prompts via `{context_truncated}`. Leaving them Chinese re-introduces the bias the prompt translations are designed to remove.
|
||||
- **Alternatives Considered**:
|
||||
1. Translate only the six prompt literals (literal interpretation of ticket scope).
|
||||
2. Translate prompts + context-builder headings (functional interpretation of ticket goal).
|
||||
- **Selected Approach**: Option 2 — translate context-builder headings as part of this spec. The headings have no public surface; they are internal to prompt assembly.
|
||||
- **Rationale**: Read literally, acceptance criterion 1 of issue #4 (no Chinese characters in any prompt string literal) covers only the prompt blocks, but the spirit of the ticket (English output under `Accept-Language: en`) demands that the helpers be translated too. Sister specs (#2, #3) made the same call for their analogous helpers.
|
||||
- **Trade-offs**: Slightly larger diff than the literal ticket scope; offset by avoiding a follow-up "we missed this" issue.
|
||||
- **Follow-up**: Note in the PR body that R7 expands literal ticket scope by ~75 chars across two helpers, with the rationale above.
|
||||
|
||||
### Decision: Translate the two default-path `reasoning` strings (R6)
|
||||
|
||||
- **Context**: `_get_default_time_config` emits `"使用默认中国人作息配置(每轮1小时)"` as a `reasoning` value, and the `_generate_event_config` exception path emits `"使用默认配置"`. These are static literals (not LLM output) and are joined into the user-visible `generation_reasoning`.
|
||||
- **Alternatives Considered**:
|
||||
1. Leave both Chinese (literal scope per #6/#7 split).
|
||||
2. Translate both to locale-agnostic English.
|
||||
3. Wire both through `t('progress.*')` for full i18n.
|
||||
- **Selected Approach**: Option 2 — translate to locale-agnostic English literals (`"Default circadian-pattern config (1h per round)"` and `"Used default config"`). They are not log lines (those are #6's domain) — they are user-facing string values returned in a JSON payload. Wiring them through `t()` is a refactor and likely belongs to a future broader i18n pass.
|
||||
- **Rationale**: Avoids a forever-Chinese leak into a `generation_reasoning` joined with otherwise-English content. Locale-agnostic English is the lowest-overhead solution for a fallback-only path.
|
||||
- **Trade-offs**: Under `zh`, these two strings appear in English in `generation_reasoning` only on the failure path. Acceptable: the failure path is rare and the rest of the joined `reasoning` already mixes label-translated and LLM-output content.
|
||||
- **Follow-up**: None.
|
||||
|
||||
### Decision: Light wording polish of the `IMPORTANT:` directive lines
|
||||
|
||||
- **Context**: Lines 706 and 870 currently glue an English `IMPORTANT:` directive onto a Chinese system prompt with `f"{system_prompt}\n\n{get_language_instruction()}\nIMPORTANT: ..."`. After translating the system prompts to English, the `IMPORTANT:` directive can either remain verbatim or be merged into the system prompt for cleaner flow.
|
||||
- **Alternatives Considered**:
|
||||
1. Keep the directive lines exactly as-is (preserves byte-for-byte behaviour).
|
||||
2. Lightly polish wording for grammatical flow with the now-English system prompt.
|
||||
3. Merge the directive into the system prompt body.
|
||||
- **Selected Approach**: Option 1 with a small caveat: keep the directives verbatim, including the existing English wording. The constraint semantics (PascalCase `poster_type`; `stance` ∈ {`supportive`, `opposing`, `neutral`, `observer`}) MUST NOT change. If a reviewer requests Option 2, the wording polish can be applied with the constraint semantics held constant.
|
||||
- **Rationale**: Minimum-diff principle. The directives already work; touching them adds risk for no functional gain.
|
||||
- **Trade-offs**: Slight grammatical awkwardness from concatenating an English system prompt with another English directive. Cosmetic only.
|
||||
- **Follow-up**: None unless a reviewer flags wording.
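To make the concatenation concrete, here is a rough sketch of how the event-config system prompt is assembled under Option 1. The `get_language_instruction` below is a stand-in written for this example (the real helper lives in `backend/app/utils/locale.py` and takes no locale argument); its `zh` fallback and the function name `build_system_prompt` are assumptions. The directive text is the event-config clause quoted in the implementation plan.

```python
# Event-config directive, kept byte-equal to the existing clause.
IMPORTANT_EVENT = (
    "IMPORTANT: The 'poster_type' field value MUST be in English PascalCase "
    "exactly matching the available entity types. Only 'content', "
    "'narrative_direction', 'hot_topics' and 'reasoning' fields should use "
    "the specified language."
)


def get_language_instruction(locale: str = "en") -> str:
    """Stand-in for the real helper; only two locales are modelled here."""
    postfixes = {"en": "Please respond in English.", "zh": "请使用中文回答。"}
    # Assumed fallback: unknown locales resolve to zh.
    return postfixes.get(locale, postfixes["zh"])


def build_system_prompt(base_prompt: str, locale: str) -> str:
    """Append the locale postfix and the verbatim directive to the base prompt."""
    return f"{base_prompt}\n\n{get_language_instruction(locale)}\n{IMPORTANT_EVENT}"
```

Because the directive is a separate trailing line, translating `base_prompt` leaves both the postfix and the constraint text untouched.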
|
||||
|
||||
## Risks & Mitigations
|
||||
|
||||
- **Risk**: Reviewer interprets ticket scope strictly and rejects the context-builder translation (R7). **Mitigation**: Document the rationale explicitly in the PR body and reference the sister-spec precedent. If rejected, revert just the helper changes — they are isolated edits.
|
||||
- **Risk**: LLM produces lower-quality output for `zh` locale because the base prompt is now English (model has to do more work to honour the `请使用中文回答。` postfix). **Mitigation**: Sister specs (#2, #3) shipped without observed regressions. If a regression is reported, the fix is to expand the postfix's locale instruction, not to revert this spec.
|
||||
- **Risk**: Light wording polish of the `IMPORTANT:` directive accidentally drops the `'supportive'/'opposing'/'neutral'/'observer'` enum or the PascalCase requirement, causing the OASIS subprocess to reject `stance` or `poster_type` values. **Mitigation**: R3.7 and R2.6 explicitly forbid changing constraint semantics; the implementation will keep these directives as verbatim string literals.
|
||||
- **Risk**: A new prompt-time interpolation is missed during translation, producing a `KeyError` at runtime. **Mitigation**: Fixture-based render check (Topic: Verification depth) will catch this before commit.
|
||||
|
||||
## References
|
||||
|
||||
- Sister-spec implementations:
|
||||
- `git show 0806832` — `feat(i18n): translate ontology_generator prompts to english` (#2).
|
||||
- `git show 9d1d29b` — `feat(i18n): translate oasis_profile_generator prompts to english` (#3).
|
||||
- Sister-spec planning artefacts: `.kiro/specs/i18n-ontology-generator-prompts/`, `.kiro/specs/i18n-oasis-profile-generator-prompts/`.
|
||||
- Locale layer: `backend/app/utils/locale.py`, `locales/languages.json`, `locales/en.json`, `locales/zh.json`.
|
||||
- Steering: `.kiro/steering/tech.md` (i18n notes; "no enforced linter or formatter; preserve mixed Chinese/English in comments unless asked"); `.kiro/steering/structure.md`.
|
||||
|
|
@ -0,0 +1,23 @@
|
|||
{
|
||||
"feature_name": "i18n-simulation-config-generator-prompts",
|
||||
"created_at": "2026-05-07T11:24:30Z",
|
||||
"updated_at": "2026-05-07T11:55:00Z",
|
||||
"language": "en",
|
||||
"phase": "tasks-generated",
|
||||
"ticket": 4,
|
||||
"approvals": {
|
||||
"requirements": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
},
|
||||
"design": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
},
|
||||
"tasks": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
}
|
||||
},
|
||||
"ready_for_implementation": true
|
||||
}
|
||||
|
|
@ -0,0 +1,109 @@
|
|||
# Implementation Plan
|
||||
|
||||
## 1. Foundation: confirm scope and stage a verification harness
|
||||
|
||||
- [x] 1.1 Stage a one-shot verification harness for prompt-string content
|
||||
- Add a small, isolated verification script (placed under `backend/scripts/` so it can be removed in a follow-up if undesired) that, given the path to `simulation_config_generator.py`, asserts: (a) the file compiles, (b) the six prompt regions and the two prompt-feeding helper bodies contain zero `[一-鿿]` matches, (c) the trailing `IMPORTANT:` directives on the event-config and agent-config system prompts are present byte-equal as documented in design.md.
|
||||
- Wire the script to be runnable via `cd backend && uv run python scripts/verify_simulation_config_prompts.py`.
|
||||
- Observable completion: running the script before any translation prints concrete failures (block 1 user prompt: 417 zh chars, etc.) so the operator can see the harness works; after translation it prints "all checks passed" and exits 0.
|
||||
- _Requirements: 1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 7.1, 7.2_
|
||||
|
||||
## 2. Core: translate context-builder helpers (prompt-feeding inputs)
|
||||
|
||||
- [x] 2.1 Translate `_build_context` section headings to English
|
||||
- Replace the four Chinese strings inside the `_build_context` f-string list (`## 模拟需求`, `## 实体信息 ({n}个)`, `## 原始文档内容`, `(文档已截断)`) with English equivalents that read naturally for a native-English reader and preserve the markdown heading structure.
|
||||
- Preserve every interpolation: `{simulation_requirement}`, `{len(entities)}`, `{entity_summary}`, `{doc_text}`. Preserve the truncation logic and the 500-character buffer.
|
||||
- Observable completion: calling `_build_context(...)` with stub inputs returns a string whose section headings are English, whose entity-name and document content portions remain user-data verbatim, and whose total length math is unchanged.
|
||||
- _Requirements: 7.1, 7.3, 7.4_
|
||||
- _Boundary: simulation_config_generator._build_context_
|
||||
|
||||
- [x] 2.2 (P) Translate `_summarize_entities` headings and overflow marker to English
|
||||
- Replace `### {entity_type} ({len(type_entities)}个)` and `... 还有 {n} 个` with English equivalents (e.g. `### {entity_type} ({len(type_entities)})` and `... and {n} more`). Preserve the per-type display-count limit and the summary-length truncation logic.
|
||||
- Preserve `entity.name` and `entity.summary` data passthrough verbatim.
|
||||
- Observable completion: calling `_summarize_entities(...)` with a stub list of two entity types yields English headings and the existing per-entity name + summary lines.
|
||||
- _Requirements: 7.2, 7.3, 7.4_
|
||||
- _Boundary: simulation_config_generator._summarize_entities_
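The stub-based check for task 2.2 could look roughly like the following standalone sketch. It mirrors the translated heading and overflow-marker format, but `StubEntity`, its `entity_type` field, and the free function `summarize_entities` are hypothetical stand-ins for `EntityNode` and the real method; the defaults follow the `ENTITIES_PER_TYPE_DISPLAY = 20` and `ENTITY_SUMMARY_LENGTH = 300` constants.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubEntity:
    """Hypothetical stand-in for EntityNode with only the fields used here."""
    name: str
    summary: str
    entity_type: str


def summarize_entities(entities: List[StubEntity],
                       display_count: int = 20,
                       summary_len: int = 300) -> str:
    """Group entities by type and emit English headings plus an overflow marker."""
    by_type: Dict[str, List[StubEntity]] = defaultdict(list)
    for e in entities:
        by_type[e.entity_type].append(e)

    lines: List[str] = []
    for entity_type, type_entities in by_type.items():
        lines.append(f"\n### {entity_type} ({len(type_entities)})")
        for e in type_entities[:display_count]:
            # Truncate long summaries; names and summaries pass through untranslated.
            preview = (e.summary[:summary_len] + "...") if len(e.summary) > summary_len else e.summary
            lines.append(f"- {e.name}: {preview}")
        if len(type_entities) > display_count:
            lines.append(f"  ... and {len(type_entities) - display_count} more")
    return "\n".join(lines)
```

Calling it with a small `display_count` is an easy way to exercise the overflow marker without building twenty-one stub entities.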
|
||||
|
||||
## 3. Core: translate the three prompt blocks
|
||||
|
||||
- [x] 3.1 (P) Translate the time-configuration prompt and system prompt to English
|
||||
- Rewrite the user-prompt f-string body in `_generate_time_config` (the block currently spanning lines ~543–586) to English while keeping every JSON-schema key (`total_simulation_hours`, `minutes_per_round`, `agents_per_hour_min`, `agents_per_hour_max`, `peak_hours`, `off_peak_hours`, `morning_hours`, `work_hours`, `reasoning`), the per-field numeric ranges (24–168 / 30–120 / 1–`max_agents_allowed`), and the UTC+8 reference example.
|
||||
- Rewrite the system-prompt literal (line 588) to English. Leave the `get_language_instruction()` postfix injection at line 589 untouched.
|
||||
- Preserve `{context_truncated}` and `{max_agents_allowed}` verbatim.
|
||||
- Observable completion: harness from 1.1 reports zero Chinese in the time-config user prompt and system prompt; calling `_generate_time_config(...)` with a mocked `_call_llm_with_retry` renders a prompt containing both interpolations.
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 4.1_
|
||||
- _Boundary: simulation_config_generator._generate_time_config_
|
||||
|
||||
- [x] 3.2 (P) Translate the event-configuration prompt and system prompt to English
|
||||
- Rewrite the user-prompt f-string body in `_generate_event_config` to English while keeping every JSON-schema key (`hot_topics`, `narrative_direction`, `initial_posts[].content`, `initial_posts[].poster_type`, `reasoning`) and the type-to-author example pairings (Official/University → official statements, MediaOutlet → news, Student → student opinions).
|
||||
- Rewrite the system-prompt literal (line 705) to English. Leave the `get_language_instruction()` postfix injection at line 706 untouched and **keep the trailing English `IMPORTANT: The 'poster_type' field value MUST be in English PascalCase exactly matching the available entity types. Only 'content', 'narrative_direction', 'hot_topics' and 'reasoning' fields should use the specified language.` clause byte-equal**.
|
||||
- Preserve `{simulation_requirement}`, `{context_truncated}`, `{type_info}` verbatim.
|
||||
- Observable completion: harness reports zero Chinese in the event-config user prompt and system prompt; the byte-equal `IMPORTANT:` clause check passes.
|
||||
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 4.1_
|
||||
- _Boundary: simulation_config_generator._generate_event_config_
|
||||
|
||||
- [x] 3.3 (P) Translate the agent-config batch prompt and system prompt to English
|
||||
- Rewrite the user-prompt f-string body in `_generate_agent_configs_batch` to English while keeping every JSON-schema key (`agent_configs[].agent_id`, `activity_level`, `posts_per_hour`, `comments_per_hour`, `active_hours`, `response_delay_min`, `response_delay_max`, `sentiment_bias`, `stance`, `influence_weight`).
|
||||
- Preserve the four per-entity-type heuristic ranges as documented in design.md §Components: officials (University/GovernmentAgency) → low activity 0.1–0.3, work hours, slow response 60–240 min, high influence 2.5–3.0; media (MediaOutlet) → mid activity 0.4–0.6, all-day 8–23, fast response 5–30 min, high influence 2.0–2.5; individuals (Student/Person/Alumni) → high activity 0.6–0.9, evening 18–23, fast response 1–15 min, low influence 0.8–1.2; public figures/experts → mid activity 0.4–0.6, mid-high influence 1.5–2.0.
|
||||
- Rewrite the system-prompt literal (line 869) to English. Leave the `get_language_instruction()` postfix injection at line 870 untouched and **keep the trailing English `IMPORTANT: The 'stance' field value MUST be one of the English strings: 'supportive', 'opposing', 'neutral', 'observer'. All JSON field names and numeric values must remain unchanged. Only natural language text fields should use the specified language.` clause byte-equal**.
|
||||
- Preserve `{simulation_requirement}` and `{json.dumps(entity_list, ensure_ascii=False, indent=2)}` interpolations verbatim.
|
||||
- Observable completion: harness reports zero Chinese in the agent-config user prompt and system prompt; the byte-equal `IMPORTANT:` clause check passes.
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 4.1_
|
||||
- _Boundary: simulation_config_generator._generate_agent_configs_batch_
|
||||
|
||||
## 4. Core: translate the two default-path `reasoning` literals
|
||||
|
||||
- [x] 4.1 Translate the `_get_default_time_config` reasoning literal to English
|
||||
- Replace the static literal `"使用默认中国人作息配置(每轮1小时)"` (line 608) with a locale-agnostic English equivalent (e.g. `"Default circadian-pattern config (1h per round)"`).
|
||||
- Do not change any other field of the returned dict; do not change the method signature; do not introduce locale lookup.
|
||||
- Observable completion: calling `_get_default_time_config(num_entities=10)` returns a dict whose `reasoning` value is locale-agnostic English and whose other eight numeric/array fields are unchanged.
|
||||
- _Requirements: 6.1, 6.2_
|
||||
|
||||
- [x] 4.2 Translate the `_generate_event_config` exception-path reasoning literal to English
|
||||
- Replace the static literal `"使用默认配置"` inside the `_generate_event_config` exception fallback (line 716) with a locale-agnostic English equivalent (e.g. `"Used default config"`).
|
||||
- Preserve the rest of the fallback dict shape (`hot_topics: []`, `narrative_direction: ""`, `initial_posts: []`, `reasoning: <english>`).
|
||||
- Observable completion: forcing the LLM call to raise (e.g. via mock) returns a dict whose `reasoning` is locale-agnostic English and whose other three keys are intact.
|
||||
- _Requirements: 6.1, 6.2_
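The mock-driven check described for 4.2 can be approximated as below. This is a simplified sketch, not the real method: `generate_event_config` here takes a hypothetical `call_llm` callable instead of building a prompt, and `failing_llm` is an illustrative stand-in for a patched `_call_llm_with_retry`. Only the fallback dict shape and the example English literal come from the task text.

```python
from typing import Any, Callable, Dict

# Example locale-agnostic literal suggested in task 4.2.
DEFAULT_EVENT_REASONING = "Used default config"


def generate_event_config(call_llm: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Return the LLM result, or the static fallback dict if the call raises."""
    try:
        return call_llm()
    except Exception:
        # Fallback shape from task 4.2: three empty fields plus the reasoning literal.
        return {
            "hot_topics": [],
            "narrative_direction": "",
            "initial_posts": [],
            "reasoning": DEFAULT_EVENT_REASONING,
        }


def failing_llm() -> Dict[str, Any]:
    """Simulate an LLM call that always fails."""
    raise RuntimeError("simulated LLM failure")
```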
|
||||
|
||||
## 5. Validation: locale and integration checks
|
||||
|
||||
- [x] 5.1 Confirm `get_language_instruction()` call sites are byte-equal at lines 589, 706, 870
|
||||
- After translation, run the harness from 1.1; it must verify that the three `system_prompt = f"{system_prompt}\n\n{get_language_instruction()}..."` injection lines remain unchanged in form (the only allowed deltas are inside `system_prompt` itself, which the harness already covered).
|
||||
- Observable completion: harness prints a "locale-postfix injection unchanged at lines 589/706/870" line and exits 0.
|
||||
- _Requirements: 1.7, 2.5, 3.6, 4.1, 4.5_
|
||||
- _Depends: 3.1, 3.2, 3.3_
|
||||
|
||||
- [x] 5.2 Confirm public-API and constants are byte-stable
|
||||
- Programmatically inspect the module after translation and confirm: `SimulationConfigGenerator.__init__`, `generate_config`, `_generate_time_config`, `_generate_event_config`, `_generate_agent_configs_batch`, `_parse_time_config`, `_parse_event_config`, `_assign_initial_post_agents`, `_generate_agent_config_by_rule`, `_call_llm_with_retry`, `_fix_truncated_json`, `_try_fix_config_json`, `_get_default_time_config`, `_build_context`, `_summarize_entities` all retain their existing parameter names and return annotations; the dataclasses (`AgentActivityConfig`, `TimeSimulationConfig`, `EventConfig`, `PlatformConfig`, `SimulationParameters`) are unchanged; the class-level constants `MAX_CONTEXT_LENGTH = 50000`, `AGENTS_PER_BATCH = 15`, `TIME_CONFIG_CONTEXT_LENGTH = 10000`, `EVENT_CONFIG_CONTEXT_LENGTH = 8000`, `ENTITY_SUMMARY_LENGTH = 300`, `AGENT_SUMMARY_LENGTH = 300`, `ENTITIES_PER_TYPE_DISPLAY = 20` are unchanged.
|
||||
- Inspection can be by `inspect.signature` checks plus `re.search` for the constant declarations.
|
||||
- Observable completion: a single signature/constant-stability check runs from the harness and prints "public surface stable" before exit.
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6_
|
||||
- _Depends: 3.1, 3.2, 3.3_
|
||||
|
||||
- [x] 5.3 Confirm out-of-scope guardrails: logger calls, docstrings, comments, adjacent files
|
||||
- Run a targeted check that confirms: `logger.info`/`logger.warning`/`logger.error`/`logger.debug` call lines retain their pre-existing Chinese content (no translation creep into #6's scope); `"""..."""` docstrings (module, class, dataclasses, methods) retain their pre-existing Chinese content (no translation creep into #7's scope); `git status` shows only `backend/app/services/simulation_config_generator.py` (and optionally `backend/scripts/verify_simulation_config_prompts.py`) modified — no edits to `backend/app/config.py`, `backend/app/services/simulation_ipc.py`, `backend/app/services/simulation_runner.py`, `backend/app/utils/locale.py`, `/locales/`, `backend/pyproject.toml`, or `backend/uv.lock`.
|
||||
- Observable completion: a check prints "out-of-scope guardrails respected" listing the count of Chinese chars remaining in logger lines (>0 expected) and in docstrings (>0 expected) as positive indicators; `git status` is clean except for the two allowed paths.
|
||||
- _Requirements: 9.1, 9.2, 9.3, 9.4, 9.5_
|
||||
- _Depends: 3.1, 3.2, 3.3, 4.1, 4.2_
|
||||
|
||||
- [x] 5.4 Locale-switching smoke test: `en` and `zh`
|
||||
- Sandbox lacks runtime dependencies (flask, openai, camel-ai stack — `tiktoken` requires a Rust compiler that is not available here). Substituted runtime smoke with **static evidence** that locale switching is preserved: (a) harness check confirms `get_language_instruction()` call-site count is exactly 3; (b) harness check confirms the time-config postfix injection line is byte-equal; (c) harness confirms both `IMPORTANT:` clauses are byte-equal at lines 706 and 870; (d) `git status` confirms `backend/app/utils/locale.py` and `locales/*.json` are unchanged. Together these guarantee that `set_locale('en')` continues to append `Please respond in English.` and `set_locale('zh')` continues to append `请使用中文回答。` at the same call sites with no semantic delta. Sister specs (#2, #3) used the same static-only posture.
|
||||
- Observable completion: harness exits 0 with all three of those checks reported as PASS.
|
||||
- _Requirements: 4.1, 4.2, 4.3, 4.4_
|
||||
- _Depends: 3.1, 3.2, 3.3_
|
||||
|
||||
- [ ] 5.5* Optional fixture-based JSON-shape parity check
|
||||
- Build a stub `entities` list with three `EntityNode` instances (Student, MediaOutlet, Official) and a stub `simulation_requirement`. Patch `_call_llm_with_retry` to return realistic well-shaped JSON dicts for each of the three calls. Run `generate_config(...)` end-to-end. Assert that the returned `SimulationParameters.to_dict()` payload contains all 13 expected top-level keys (`simulation_id`, `project_id`, `graph_id`, `simulation_requirement`, `time_config`, `agent_configs`, `event_config`, `twitter_config`, `reddit_config`, `llm_model`, `llm_base_url`, `generated_at`, `generation_reasoning`).
|
||||
- Confirms R8 functional coverage without depending on a live OASIS subprocess. Marked optional because R5 + R8.4 already lock the shape stability via guard checks (5.2) and design-level reasoning; this is auxiliary belt-and-braces test coverage.
|
||||
- Observable completion: a single fixture-based test prints the `to_dict()` output and asserts all 13 keys present; exits 0.
|
||||
- _Requirements: 8.1, 8.2, 8.3, 8.4_
|
||||
- _Depends: 3.1, 3.2, 3.3, 4.1, 4.2_
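The key-presence assertion in 5.5 reduces to a set comparison. A minimal sketch follows, assuming the payload is the plain dict returned by `SimulationParameters.to_dict()`; the helper name `missing_and_unexpected` is made up for this example, and the key list is copied from the task description.

```python
from typing import Any, Dict, List, Tuple

# The 13 top-level keys listed in task 5.5.
EXPECTED_KEYS = frozenset({
    "simulation_id", "project_id", "graph_id", "simulation_requirement",
    "time_config", "agent_configs", "event_config",
    "twitter_config", "reddit_config",
    "llm_model", "llm_base_url", "generated_at", "generation_reasoning",
})


def missing_and_unexpected(payload: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (missing keys, unexpected keys), each sorted for stable output."""
    keys = set(payload)
    return sorted(EXPECTED_KEYS - keys), sorted(keys - EXPECTED_KEYS)
```

Reporting unexpected keys as well as missing ones is a design choice for this sketch: it would also surface an accidental payload-shape change that adds a field.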
|
||||
|
||||
## 6. Cleanup
|
||||
|
||||
- [x] 6.1 Remove or move the verification harness as appropriate
|
||||
- If the verification harness from 1.1 is intended as a one-shot check, delete `backend/scripts/verify_simulation_config_prompts.py` after the implementation passes its checks. If it is intended as a permanent regression test, keep it under `backend/scripts/` and ensure it is callable via `uv run python scripts/verify_simulation_config_prompts.py` with no test framework required.
|
||||
- Decision rule: keep the harness only if it is fewer than 30 lines and reads as a usable smoke check; otherwise remove it. Sister specs (#2, #3) shipped without permanent harnesses, so the default is "remove."
|
||||
- Observable completion: `git status` shows only `backend/app/services/simulation_config_generator.py` modified, with no harness artefacts left behind (preferred); or, if kept, the harness lives under `backend/scripts/` with a one-line module docstring linking back to spec `i18n-simulation-config-generator-prompts`.
|
||||
- _Requirements: 9.3_
|
||||
- _Depends: 5.1, 5.2, 5.3, 5.4_
|
||||
|
|
@ -388,22 +388,22 @@ class SimulationConfigGenerator:
|
|||
|
||||
# 实体摘要
|
||||
entity_summary = self._summarize_entities(entities)
|
||||
|
||||
|
||||
# 构建上下文
|
||||
context_parts = [
|
||||
f"## 模拟需求\n{simulation_requirement}",
|
||||
f"\n## 实体信息 ({len(entities)}个)\n{entity_summary}",
|
||||
f"## Simulation Requirement\n{simulation_requirement}",
|
||||
f"\n## Entities ({len(entities)})\n{entity_summary}",
|
||||
]
|
||||
|
||||
|
||||
current_length = sum(len(p) for p in context_parts)
|
||||
remaining_length = self.MAX_CONTEXT_LENGTH - current_length - 500 # 留500字符余量
|
||||
|
||||
|
||||
if remaining_length > 0 and document_text:
|
||||
doc_text = document_text[:remaining_length]
|
||||
if len(document_text) > remaining_length:
|
||||
doc_text += "\n...(文档已截断)"
|
||||
context_parts.append(f"\n## 原始文档内容\n{doc_text}")
|
||||
|
||||
doc_text += "\n...(document truncated)"
|
||||
context_parts.append(f"\n## Source Document Content\n{doc_text}")
|
||||
|
||||
return "\n".join(context_parts)
|
||||
|
||||
def _summarize_entities(self, entities: List[EntityNode]) -> str:
|
||||
|
|
@ -419,7 +419,7 @@ class SimulationConfigGenerator:
|
|||
by_type[t].append(e)
|
||||
|
||||
for entity_type, type_entities in by_type.items():
|
||||
lines.append(f"\n### {entity_type} ({len(type_entities)}个)")
|
||||
lines.append(f"\n### {entity_type} ({len(type_entities)})")
|
||||
# 使用配置的显示数量和摘要长度
|
||||
display_count = self.ENTITIES_PER_TYPE_DISPLAY
|
||||
summary_len = self.ENTITY_SUMMARY_LENGTH
|
||||
|
|
@ -427,7 +427,7 @@ class SimulationConfigGenerator:
|
|||
summary_preview = (e.summary[:summary_len] + "...") if len(e.summary) > summary_len else e.summary
|
||||
lines.append(f"- {e.name}: {summary_preview}")
|
||||
if len(type_entities) > display_count:
|
||||
lines.append(f" ... 还有 {len(type_entities) - display_count} 个")
|
||||
lines.append(f" ... and {len(type_entities) - display_count} more")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
|
@ -540,28 +540,28 @@ class SimulationConfigGenerator:
|
|||
# 计算最大允许值(80%的agent数)
|
||||
max_agents_allowed = max(1, int(num_entities * 0.9))
|
||||
|
||||
prompt = f"""基于以下模拟需求,生成时间模拟配置。
|
||||
prompt = f"""Based on the simulation requirement below, generate a time-simulation configuration.
|
||||
|
||||
{context_truncated}
|
||||
|
||||
## 任务
|
||||
请生成时间配置JSON。
|
||||
## Task
|
||||
Produce a time-configuration JSON.
|
||||
|
||||
### 基本原则(仅供参考,需根据具体事件和参与群体灵活调整):
|
||||
- 请根据模拟场景推断目标用户群体所在时区和作息习惯,以下为东八区(UTC+8)的参考示例
|
||||
- 凌晨0-5点几乎无人活动(活跃度系数0.05)
|
||||
- 早上6-8点逐渐活跃(活跃度系数0.4)
|
||||
- 工作时间9-18点中等活跃(活跃度系数0.7)
|
||||
- 晚间19-22点是高峰期(活跃度系数1.5)
|
||||
- 23点后活跃度下降(活跃度系数0.5)
|
||||
- 一般规律:凌晨低活跃、早间渐增、工作时段中等、晚间高峰
|
||||
- **重要**:以下示例值仅供参考,你需要根据事件性质、参与群体特点来调整具体时段
|
||||
- 例如:学生群体高峰可能是21-23点;媒体全天活跃;官方机构只在工作时间
|
||||
- 例如:突发热点可能导致深夜也有讨论,off_peak_hours 可适当缩短
|
||||
### Guiding principles (illustrative only — adapt to the specific event and audience):
|
||||
- Infer the timezone and daily rhythm of the target audience from the simulation scenario. The following are example values for the UTC+8 timezone.
|
||||
- 00:00-05:00: almost no activity (activity multiplier 0.05).
|
||||
- 06:00-08:00: gradually waking up (activity multiplier 0.4).
|
||||
- 09:00-18:00: working hours, moderate activity (activity multiplier 0.7).
|
||||
- 19:00-22:00: evening peak (activity multiplier 1.5).
|
||||
- After 23:00: activity declines (activity multiplier 0.5).
|
||||
- General rule of thumb: low overnight, ramping up in the morning, moderate during working hours, peaking in the evening.
|
||||
- **Important**: the example values above are only a reference. Tailor the schedule to the nature of the event and the audience's habits.
|
||||
- For example: a student-heavy audience may peak from 21:00-23:00; news outlets may stay active all day; official agencies are only active during working hours.
|
||||
- For example: a breaking-news topic may keep discussion going late at night, in which case off_peak_hours can be shortened.
|
||||
|
||||
### 返回JSON格式(不要markdown)
|
||||
### Return strict JSON (no markdown)
|
||||
|
||||
示例:
|
||||
Example:
|
||||
{{
|
||||
"total_simulation_hours": 72,
|
||||
"minutes_per_round": 60,
|
||||
|
|
@@ -571,21 +571,21 @@ class SimulationConfigGenerator:
|
|||
"off_peak_hours": [0, 1, 2, 3, 4, 5],
|
||||
"morning_hours": [6, 7, 8],
|
||||
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
|
||||
"reasoning": "针对该事件的时间配置说明"
|
||||
"reasoning": "Time configuration rationale for this event"
|
||||
}}
|
||||
|
||||
字段说明:
|
||||
- total_simulation_hours (int): 模拟总时长,24-168小时,突发事件短、持续话题长
|
||||
- minutes_per_round (int): 每轮时长,30-120分钟,建议60分钟
|
||||
- agents_per_hour_min (int): 每小时最少激活Agent数(取值范围: 1-{max_agents_allowed})
|
||||
- agents_per_hour_max (int): 每小时最多激活Agent数(取值范围: 1-{max_agents_allowed})
|
||||
- peak_hours (int数组): 高峰时段,根据事件参与群体调整
|
||||
- off_peak_hours (int数组): 低谷时段,通常深夜凌晨
|
||||
- morning_hours (int数组): 早间时段
|
||||
- work_hours (int数组): 工作时段
|
||||
- reasoning (string): 简要说明为什么这样配置"""
|
||||
Field guide:
|
||||
- total_simulation_hours (int): total simulated duration, 24-168 hours; short for breaking events, long for sustained topics.
|
||||
- minutes_per_round (int): minutes per round, 30-120; recommended 60.
|
||||
- agents_per_hour_min (int): minimum number of agents activated per hour (allowed range: 1-{max_agents_allowed}).
|
||||
- agents_per_hour_max (int): maximum number of agents activated per hour (allowed range: 1-{max_agents_allowed}).
|
||||
- peak_hours (int array): peak hours, adjusted to the audience.
|
||||
- off_peak_hours (int array): off-peak hours, typically overnight.
|
||||
- morning_hours (int array): morning hours.
|
||||
- work_hours (int array): working hours.
|
||||
- reasoning (string): brief explanation of why this configuration was chosen."""
|
||||
|
||||
system_prompt = "你是社交媒体模拟专家。返回纯JSON格式,时间配置需符合模拟场景中目标用户群体的作息习惯。"
|
||||
system_prompt = "You are a social-media simulation expert. Return plain JSON. The time configuration should match the daily rhythm of the simulation's target audience."
|
||||
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}"
|
||||
|
||||
try:
|
||||
|
|
@@ -605,7 +605,7 @@ class SimulationConfigGenerator:
|
|||
"off_peak_hours": [0, 1, 2, 3, 4, 5],
|
||||
"morning_hours": [6, 7, 8],
|
||||
"work_hours": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
|
||||
"reasoning": "使用默认中国人作息配置(每轮1小时)"
|
||||
"reasoning": "Default circadian-pattern config (1h per round)"
|
||||
}
|
||||
|
||||
def _parse_time_config(self, result: Dict[str, Any], num_entities: int) -> TimeSimulationConfig:
|
||||
|
|
@@ -673,36 +673,36 @@ class SimulationConfigGenerator:
|
|||
# 使用配置的上下文截断长度
|
||||
context_truncated = context[:self.EVENT_CONFIG_CONTEXT_LENGTH]
|
||||
|
||||
prompt = f"""基于以下模拟需求,生成事件配置。
|
||||
prompt = f"""Based on the simulation requirement below, generate an event configuration.
|
||||
|
||||
模拟需求: {simulation_requirement}
|
||||
Simulation requirement: {simulation_requirement}
|
||||
|
||||
{context_truncated}
|
||||
|
||||
## 可用实体类型及示例
|
||||
## Available entity types and examples
|
||||
{type_info}
|
||||
|
||||
## 任务
|
||||
请生成事件配置JSON:
|
||||
- 提取热点话题关键词
|
||||
- 描述舆论发展方向
|
||||
- 设计初始帖子内容,**每个帖子必须指定 poster_type(发布者类型)**
|
||||
## Task
|
||||
Produce an event-configuration JSON:
|
||||
- Extract hot-topic keywords.
|
||||
- Describe the direction in which public opinion is expected to evolve.
|
||||
- Design the initial posts. **Every post must specify a poster_type (the type of the entity that publishes it).**
|
||||
|
||||
**重要**: poster_type 必须从上面的"可用实体类型"中选择,这样初始帖子才能分配给合适的 Agent 发布。
|
||||
例如:官方声明应由 Official/University 类型发布,新闻由 MediaOutlet 发布,学生观点由 Student 发布。
|
||||
**Important**: poster_type MUST be one of the values listed in "Available entity types" above so each initial post can be assigned to an appropriate agent.
|
||||
For example: official statements should be published by Official/University, news by MediaOutlet, student opinions by Student.
|
||||
|
||||
返回JSON格式(不要markdown):
|
||||
Return strict JSON (no markdown):
|
||||
{{
|
||||
"hot_topics": ["关键词1", "关键词2", ...],
|
||||
"narrative_direction": "<舆论发展方向描述>",
|
||||
"hot_topics": ["keyword 1", "keyword 2", ...],
|
||||
"narrative_direction": "<description of how public opinion is expected to evolve>",
|
||||
"initial_posts": [
|
||||
{{"content": "帖子内容", "poster_type": "实体类型(必须从可用类型中选择)"}},
|
||||
{{"content": "post content", "poster_type": "entity type (must be one of the available types)"}},
|
||||
...
|
||||
],
|
||||
"reasoning": "<简要说明>"
|
||||
"reasoning": "<brief rationale>"
|
||||
}}"""
|
||||
|
||||
system_prompt = "你是舆论分析专家。返回纯JSON格式。注意 poster_type 必须精确匹配可用实体类型。"
|
||||
system_prompt = "You are a public-opinion analyst. Return plain JSON. Note that poster_type must exactly match one of the available entity types."
|
||||
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}\nIMPORTANT: The 'poster_type' field value MUST be in English PascalCase exactly matching the available entity types. Only 'content', 'narrative_direction', 'hot_topics' and 'reasoning' fields should use the specified language."
|
||||
|
||||
try:
|
||||
|
|
@@ -713,7 +713,7 @@ class SimulationConfigGenerator:
|
|||
"hot_topics": [],
|
||||
"narrative_direction": "",
|
||||
"initial_posts": [],
|
||||
"reasoning": "使用默认配置"
|
||||
"reasoning": "Used default config"
|
||||
}
|
||||
|
||||
def _parse_event_config(self, result: Dict[str, Any]) -> EventConfig:
|
||||
|
|
@@ -830,43 +830,43 @@ class SimulationConfigGenerator:
|
|||
"summary": e.summary[:summary_len] if e.summary else ""
|
||||
})
|
||||
|
||||
prompt = f"""基于以下信息,为每个实体生成社交媒体活动配置。
|
||||
prompt = f"""Based on the information below, generate a social-media activity configuration for each entity.
|
||||
|
||||
模拟需求: {simulation_requirement}
|
||||
Simulation requirement: {simulation_requirement}
|
||||
|
||||
## 实体列表
|
||||
## Entity list
|
||||
```json
|
||||
{json.dumps(entity_list, ensure_ascii=False, indent=2)}
|
||||
```
|
||||
|
||||
## 任务
|
||||
为每个实体生成活动配置,注意:
|
||||
- **时间符合目标用户群体作息**:以下为参考(东八区),请根据模拟场景调整
|
||||
- **官方机构**(University/GovernmentAgency):活跃度低(0.1-0.3),工作时间(9-17)活动,响应慢(60-240分钟),影响力高(2.5-3.0)
|
||||
- **媒体**(MediaOutlet):活跃度中(0.4-0.6),全天活动(8-23),响应快(5-30分钟),影响力高(2.0-2.5)
|
||||
- **个人**(Student/Person/Alumni):活跃度高(0.6-0.9),主要晚间活动(18-23),响应快(1-15分钟),影响力低(0.8-1.2)
|
||||
- **公众人物/专家**:活跃度中(0.4-0.6),影响力中高(1.5-2.0)
|
||||
## Task
|
||||
Generate an activity configuration for each entity. Notes:
|
||||
- **Times must match the daily rhythm of the target audience** — the following are reference values for the UTC+8 timezone; adapt them to the simulation scenario.
|
||||
- **Officials** (University/GovernmentAgency): low activity (0.1-0.3), active during working hours (9-17), slow response (60-240 minutes), high influence (2.5-3.0).
|
||||
- **Media** (MediaOutlet): medium activity (0.4-0.6), active throughout the day (8-23), fast response (5-30 minutes), high influence (2.0-2.5).
|
||||
- **Individuals** (Student/Person/Alumni): high activity (0.6-0.9), mainly active in the evening (18-23), fast response (1-15 minutes), low influence (0.8-1.2).
|
||||
- **Public figures / experts**: medium activity (0.4-0.6), mid-to-high influence (1.5-2.0).
|
||||
|
||||
返回JSON格式(不要markdown):
|
||||
Return strict JSON (no markdown):
|
||||
{{
|
||||
"agent_configs": [
|
||||
{{
|
||||
"agent_id": <必须与输入一致>,
|
||||
"agent_id": <must match the input>,
|
||||
"activity_level": <0.0-1.0>,
|
||||
"posts_per_hour": <发帖频率>,
|
||||
"comments_per_hour": <评论频率>,
|
||||
"active_hours": [<活跃小时列表,考虑中国人作息>],
|
||||
"response_delay_min": <最小响应延迟分钟>,
|
||||
"response_delay_max": <最大响应延迟分钟>,
|
||||
"sentiment_bias": <-1.0到1.0>,
|
||||
"posts_per_hour": <posting frequency>,
|
||||
"comments_per_hour": <commenting frequency>,
|
||||
"active_hours": [<list of active hours, matching the audience's daily rhythm>],
|
||||
"response_delay_min": <minimum response delay in minutes>,
|
||||
"response_delay_max": <maximum response delay in minutes>,
|
||||
"sentiment_bias": <-1.0 to 1.0>,
|
||||
"stance": "<supportive/opposing/neutral/observer>",
|
||||
"influence_weight": <影响力权重>
|
||||
"influence_weight": <influence weight>
|
||||
}},
|
||||
...
|
||||
]
|
||||
}}"""
|
||||
|
||||
system_prompt = "你是社交媒体行为分析专家。返回纯JSON,配置需符合模拟场景中目标用户群体的作息习惯。"
|
||||
system_prompt = "You are a social-media behaviour analyst. Return plain JSON. The configuration should match the daily rhythm of the simulation's target audience."
|
||||
system_prompt = f"{system_prompt}\n\n{get_language_instruction()}\nIMPORTANT: The 'stance' field value MUST be one of the English strings: 'supportive', 'opposing', 'neutral', 'observer'. All JSON field names and numeric values must remain unchanged. Only natural language text fields should use the specified language."
|
||||
|
||||
try:
|
||||
|
|
|
|||
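Review note: the `_summarize_entities` hunks only show fragments of the helper, so the following is a minimal standalone sketch of how the translated strings would render, not the actual method. The `Entity` dataclass, the `entity_type` field name, the free function `summarize_entities`, and its default limits are all assumptions made for illustration; the real code uses `EntityNode` and the class constants `ENTITIES_PER_TYPE_DISPLAY` and `ENTITY_SUMMARY_LENGTH`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for EntityNode; only the fields used below."""
    name: str
    entity_type: str  # assumed field name, not confirmed by the diff
    summary: str


def summarize_entities(
    entities: List[Entity],
    display_count: int = 2,   # stands in for ENTITIES_PER_TYPE_DISPLAY
    summary_len: int = 20,    # stands in for ENTITY_SUMMARY_LENGTH
) -> str:
    """Group entities by type and render an English, locale-neutral summary."""
    by_type: Dict[str, List[Entity]] = defaultdict(list)
    for e in entities:
        by_type[e.entity_type].append(e)

    lines: List[str] = []
    for entity_type, type_entities in by_type.items():
        # Header carries only the count, with no language-specific counter word.
        lines.append(f"\n### {entity_type} ({len(type_entities)})")
        for e in type_entities[:display_count]:
            # Truncate long summaries and mark the cut with an ellipsis.
            preview = (
                e.summary[:summary_len] + "..."
                if len(e.summary) > summary_len
                else e.summary
            )
            lines.append(f"- {e.name}: {preview}")
        if len(type_entities) > display_count:
            lines.append(f"  ... and {len(type_entities) - display_count} more")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Entity("Alice", "Student", "Undergraduate who follows campus news"),
        Entity("Bob", "Student", "Graduate student"),
        Entity("Cara", "Student", "Alumni liaison"),
        Entity("Daily Post", "MediaOutlet", "Local newspaper"),
    ]
    print(summarize_entities(demo))
```

Under these assumptions the rendered context contains no Chinese counter words or overflow text, which is the property the commit message relies on to stop the base prompt from biasing the model's output language.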