MicroFish/.kiro/specs/i18n-externalize-backend-logs/requirements.md

88 lines
9.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Requirements Document
## Introduction
The MiroFish backend currently emits Chinese strings directly from `logger.{info,warning,error,debug,exception}` calls and from a number of `jsonify({"error|message": ...})` API responses. These hardcoded strings bypass the existing `t()` localization helper in `backend/app/utils/locale.py`, so log aggregators receive unreadable messages for English-speaking operators and API responses ignore the active locale. This spec defines the work required to externalize every Chinese log message and user-facing API error/message string in the listed backend modules into the locale dictionaries (`locales/en.json` and `locales/zh.json`), so logs and responses honor the request locale and English operators get a fully readable pipeline.
## Boundary Context
- **In scope**:
- Replace Chinese string literals inside `logger.{info,warning,error,debug,exception}` calls in:
- `backend/app/services/report_agent.py`
- `backend/app/services/zep_tools.py`
- `backend/app/services/simulation_runner.py`
- `backend/app/services/oasis_profile_generator.py`
- `backend/app/services/simulation_config_generator.py`
- `backend/app/services/zep_graph_memory_updater.py`
- `backend/app/services/ontology_generator.py`
- `backend/app/services/simulation_manager.py`
- `backend/app/services/zep_entity_reader.py`
- `backend/app/services/simulation_ipc.py`
- `backend/app/services/graph_builder.py`
- `backend/app/api/simulation.py`
- `backend/app/api/report.py`
- `backend/app/api/graph.py`
- Replace Chinese string literals inside user-facing `jsonify({"error": ...})` and `jsonify({"message": ...})` (or equivalent response builders) in those API modules.
- Add the corresponding keys to both `locales/en.json` (English translation) and `locales/zh.json` (preserve original Chinese verbatim) under a domain-grouped namespace (`log.<domain>.<key>`, `api.error.<scope>`, `api.message.<scope>`).
- Preserve existing interpolation by passing values through `t(key, **kwargs)` (using the helper's `{name}` placeholder syntax) instead of f-strings or `%`-formatting around the call.
- Ensure `t()` returns a safe fallback (and emits a warning, not a crash) when a key is missing.
- **Out of scope**:
- Prompt template strings (handled by tickets #2/#3/#4/#5; the report-agent prompts work is already on the current branch).
- Chinese docstrings and inline comments (handled by ticket #7).
- Re-architecting the `t()` helper, switching i18n libraries, or introducing pluralization/ICU formatting.
- Changing log levels, log structure, or response status codes beyond the string content.
- Frontend `zh.json` parity beyond the new keys this work introduces.
- **Adjacent expectations**:
- The `t()` helper at `backend/app/utils/locale.py` already exposes `set_locale`, `get_locale`, and `t` and is wired up at request time and at background-thread entry; new code must reuse the existing helper.
- Locale files (`locales/en.json`, `locales/zh.json`) currently coexist with frontend `vue-i18n` consumption; new keys must not collide with existing top-level frontend keys (`menu`, `process`, `step1`, etc.). All new backend keys live under the new top-level namespaces `log` and `api` (or extend them if already present).
- Sibling spec `i18n-report-agent-prompts` covered the *prompt* portion of `report_agent.py`; this spec must not regress those translations.
## Requirements
### Requirement 1: Externalize Chinese Logger Messages
**Objective:** As a backend operator viewing logs in an English log aggregator, I want every Chinese log message in the listed backend modules to be emitted in the active locale, so that I can read and triage logs without translation tooling.
#### Acceptance Criteria
1. The Backend Logging Layer shall emit log records whose message text is produced by `t("log.<domain>.<key>", **fmt)` for every `logger.{info,warning,error,debug,exception}` call in the listed in-scope modules that previously contained Chinese characters.
2. When the active locale is `en`, the Backend Logging Layer shall emit the English translation defined in `locales/en.json` for each externalized log key.
3. When the active locale is `zh`, the Backend Logging Layer shall emit the original Chinese text as preserved in `locales/zh.json` for each externalized log key.
4. The Backend Logging Layer shall preserve all interpolated values (entity counts, identifiers, exception text) by passing them as keyword arguments to `t()` rather than concatenating or formatting them around the `t()` call.
5. The Backend Logging Layer shall not contain any Chinese character (`U+4E00``U+9FFF`) inside the string-literal argument of any `logger.{info,warning,error,debug,exception}` call within the listed in-scope modules.
### Requirement 2: Externalize Chinese API Response Strings
**Objective:** As a frontend client (or external API consumer) reading the `Accept-Language` header, I want backend error and message responses in the listed API modules to be returned in the active locale, so that user-facing error surfaces match the rest of the localized UI.
#### Acceptance Criteria
1. The Backend API Layer shall produce the `error` and `message` field values of `jsonify({...})` responses in the listed in-scope API modules (`backend/app/api/{simulation,report,graph}.py`) by calling `t("api.error.<scope>", **fmt)` or `t("api.message.<scope>", **fmt)`.
2. When the request `Accept-Language` header is `en`, the Backend API Layer shall return the English translation for the corresponding response key.
3. When the request `Accept-Language` header is `zh` or absent, the Backend API Layer shall return the original Chinese string as preserved in `locales/zh.json`.
4. The Backend API Layer shall not contain any Chinese character inside the string value of an `error` or `message` field in any `jsonify(...)` (or equivalent response builder) call within the listed in-scope API modules.
5. The Backend API Layer shall keep the HTTP status code, response key set, and (for non-i18n keys) value structure of every modified response unchanged.
### Requirement 3: Locale Dictionary Parity and Structure
**Objective:** As a translator or developer adding a new locale, I want every backend log/API key to exist in both `en.json` and `zh.json` with identical nested structure, so that the locale files can be diffed and validated mechanically.
#### Acceptance Criteria
1. The Locale Dictionary shall contain, in `locales/en.json`, every key introduced by Requirements 1 and 2 with an English translation.
2. The Locale Dictionary shall contain, in `locales/zh.json`, every key introduced by Requirements 1 and 2 with the original Chinese text preserved verbatim from the previous source code.
3. The Locale Dictionary shall organize new backend keys under the top-level namespaces `log` (grouped by domain: `graph`, `simulation`, `report`, `agent`, `pipeline`, etc.) and `api` (grouped as `api.error.<scope>` / `api.message.<scope>`).
4. The Locale Dictionary shall expose a structurally identical key tree across `en.json` and `zh.json`, such that recursively diffing the key paths (ignoring values) of the two files produces an empty difference.
5. The Locale Dictionary shall not collide with or overwrite any pre-existing top-level frontend i18n key when the new namespaces are added.
### Requirement 4: Safe Fallback for Missing Keys
**Objective:** As a backend service author who may ship code ahead of a translation update, I want missing translation keys to produce a visible warning without crashing the request or background task, so that incomplete locale dictionaries degrade gracefully.
#### Acceptance Criteria
1. If a `t(key, ...)` call references a key that exists in neither the active locale nor the `zh` fallback, the Locale Helper shall return a non-empty string (the key itself or an explicit placeholder) rather than `None` or raising.
2. If a `t(key, ...)` call references a missing key, the Locale Helper shall emit a single warning-level log record identifying the missing key, the active locale, and (when available) the call site context.
3. The Locale Helper shall not raise `KeyError`, `AttributeError`, or `TypeError` for any key lookup, irrespective of nesting depth or invalid path segments.
4. When `t()` is invoked from a background thread that called `set_locale(...)` at entry, the Locale Helper shall resolve the locale set on that thread for the entire call chain.
### Requirement 5: Verification and Regression Guards
**Objective:** As a reviewer of this PR, I want repeatable mechanical checks that prove the in-scope files are clean of stray Chinese log/response strings, so that the acceptance criteria can be re-validated on every future change.
#### Acceptance Criteria
1. The Verification Script shall, when run against the repository, report zero matches for the regular expression `logger\.[a-z]+\(["'][^"']*[一-鿿]` across the listed in-scope modules.
2. The Verification Script shall, when run against the repository, report zero matches for any `jsonify({"error": "<chinese>"})` or `jsonify({"message": "<chinese>"})` literal in the listed in-scope API modules.
3. The Verification Script shall, when run against `locales/en.json` and `locales/zh.json`, confirm that every newly introduced key path exists in both files (structural-key parity) and exit non-zero if a key is present in only one file.
4. The Verification Script shall be runnable from the repository root using only tools already available in the dev environment (`grep`, `python`, or `jq` — no new dependencies introduced).
5. The Backend Test Suite shall continue to pass (`uv run python -m pytest`) after the externalization changes, with no new failures introduced by the rename of message strings.