MicroFish/.kiro/specs/i18n-externalize-remaining-.../requirements.md

92 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Requirements Document
## Introduction
After ticket #6 externalised most backend log/print messages into the project's `t()` localization helper, a small set of call sites in three modules still emit hard-coded Chinese strings. As a result, English operators reading backend logs under the `en` locale see Chinese text leaking from these residual sites. This spec finishes the job for ticket #24 by routing every remaining hard-coded Chinese log/print string in `backend/app/api/graph.py`, `backend/app/services/oasis_profile_generator.py`, and `backend/app/utils/retry.py` through `t("log.<domain>.<key>", **fmt)` and adding the corresponding entries to `locales/en.json` and `locales/zh.json`. The goal is locale-correct backend logs with zero behavioural drift in HTTP responses, control flow, or interpolated values.
## Boundary Context
- **In scope**:
- Replace the Chinese string literals in the nine call sites listed by ticket #24:
- `backend/app/api/graph.py:385``build_logger.info(f"[{task_id}] 开始构建图谱...")`
- `backend/app/api/graph.py:494``build_logger.info(f"[{task_id}] 图谱构建完成: graph_id={graph_id}, 节点={node_count}, 边={edge_count}")`
- `backend/app/api/graph.py:513``build_logger.error(f"[{task_id}] 图谱构建失败: {str(e)}")`
- `backend/app/services/oasis_profile_generator.py:945``print(f"开始生成Agent人设 - 共 {total} 个实体,并行数: {parallel_count}")`
- `backend/app/services/oasis_profile_generator.py:1001``print(f"人设生成完成!共生成 {len([p for p in profiles if p])} 个Agent")`
- `backend/app/utils/retry.py:55``logger.error(f"函数 {func.__name__} 在 {max_retries} 次重试后仍失败: {str(e)}")`
- `backend/app/utils/retry.py:108``logger.error(f"异步函数 {func.__name__} 在 {max_retries} 次重试后仍失败: {str(e)}")`
- `backend/app/utils/retry.py:179``logger.error(f"API调用在 {self.max_retries} 次重试后仍失败: {str(e)}")`
- `backend/app/utils/retry.py:227``logger.error(f"处理第 {idx + 1} 项失败: {str(e)}")`
- Add new locale keys for the externalised strings to both `locales/en.json` (English) and `locales/zh.json` (verbatim original Chinese) under the existing top-level `log.<domain>` namespaces (`log.graph_api`, `log.profile_generator`, and a new `log.retry`).
- Pass interpolated values (`task_id`, `graph_id`, `node_count`, `edge_count`, `total`, `parallel_count`, `func_name`, `max_retries`, `idx`, exception text, etc.) through `t()` keyword arguments using the helper's `{name}` placeholder syntax.
- **Out of scope**:
- Other Chinese strings in the same files that are not on the ticket's evidence list (Chinese docstrings, Chinese inline comments, the `task_manager.update_task(... message="...")` Chinese values in `graph.py`, the `logger.warning("…重试…")` calls in `retry.py`, and the in-loop `progress_callback(... f"已完成 …")` and `print(f"-" * 70 …)` decorations in `oasis_profile_generator.py`). Those are tracked elsewhere (#7 for docstrings/comments; #25 for prompt/context labels; future audit may pick up the remaining warning-level retry strings under a separate ticket).
- Any change to log levels, response status codes, control flow, public API surface, or to the `t()` helper itself.
- Adding a new locale or changing the per-thread / per-request locale resolution.
- Frontend `vue-i18n` files; this spec touches only backend usage of `t()` and the shared `locales/{en,zh}.json`.
- **Adjacent expectations**:
- The `t()` helper at `backend/app/utils/locale.py` already covers `set_locale`, `get_locale`, missing-key fallback, and per-thread locale (verified by ticket #6). New code reuses it without modification.
- The two top-level `log` sub-namespaces `log.graph_api` and `log.profile_generator` already exist in `locales/en.json` / `locales/zh.json` with `m###` numeric suffixes; new keys must use the next available `m###` slot in each existing namespace and must not collide with or overwrite any existing key.
- `retry.py` is module-level shared infrastructure used from request handlers, background tasks, and async coroutines — locale resolution must continue to work in each of these contexts without new wiring (Requirement 4 below documents this explicitly so behaviour is mechanically verified).
- Ticket #24's acceptance criterion mentions a verification script under `.kiro/specs/i18n-e2e-english-verification/audit/scripts/run_audit.sh`. That script is not present in the repository at this commit; this spec substitutes a deterministic regex audit (see Requirement 5) that is runnable from the repo root with `grep` + `python` only and that any future `run_audit.sh` can incorporate.
## Requirements
### Requirement 1: Externalise Remaining Chinese Log/Print Strings via `t()`
**Objective:** As a backend operator viewing logs under the `en` locale, I want every Chinese log/print string in the nine listed call sites to be emitted via the existing `t()` helper, so that backend logs no longer leak Chinese text in English deployments.
#### Acceptance Criteria
1. The Backend Logging Layer shall replace the f-string argument of each of the three `build_logger.{info,error}` calls in `backend/app/api/graph.py` at lines 385, 494, and 513 with `t("log.graph_api.<key>", task_id=task_id, ...)`, where the key is a new entry under the existing `log.graph_api` namespace.
2. The Backend Logging Layer shall replace the f-string argument of each of the two `print(...)` calls in `backend/app/services/oasis_profile_generator.py` at lines 945 and 1001 with `print(t("log.profile_generator.<key>", ...))`, keeping the `print` call (so console-output behaviour is preserved) but routing the message text through `t()` under the existing `log.profile_generator` namespace.
3. The Backend Logging Layer shall replace the f-string argument of each of the four `logger.error` calls in `backend/app/utils/retry.py` at lines 55, 108, 179, and 227 with `t("log.retry.<key>", **kwargs)`, introducing a new top-level sub-namespace `log.retry` that mirrors the structure of the other `log.<domain>` sub-namespaces.
4. The Backend Logging Layer shall preserve every interpolated value (`task_id`, `graph_id`, `node_count`, `edge_count`, `total`, `parallel_count`, `func.__name__`, `max_retries`, `idx`, exception text) by passing them as keyword arguments to `t(...)` and referencing them via `{name}` placeholders inside the locale dictionaries; no `f"..."` formatting, `%`-formatting, or string concatenation shall remain around the call.
5. The Backend Logging Layer shall not contain any Chinese character (Unicode range `U+4E00``U+9FFF`) inside the string-literal argument of any `logger.{info,warning,error,debug,exception}`, `build_logger.{info,warning,error,debug,exception}`, or `print(...)` call at the nine listed line locations after the change.
### Requirement 2: Locale Dictionary Parity for the New Keys
**Objective:** As a translator or developer adding a new locale, I want every newly externalised key to exist in both `locales/en.json` and `locales/zh.json` with identical nested structure, so that the locale files remain mechanically diffable.
#### Acceptance Criteria
1. The Locale Dictionary shall add, in `locales/en.json`, an English translation for every key introduced by Requirement 1, placed under the relevant `log.<domain>` sub-namespace (`log.graph_api`, `log.profile_generator`, or the new `log.retry`).
2. The Locale Dictionary shall add, in `locales/zh.json`, the original Chinese text (verbatim, with `{placeholder}` substitutions where the source had `f"…{var}…"`) for every key introduced by Requirement 1, under the same key path used in `en.json`.
3. The Locale Dictionary shall use the next available `m###` numeric suffix per existing sub-namespace (so it does not overwrite or shadow any pre-existing `log.graph_api.m###` or `log.profile_generator.m###` key); the new `log.retry` sub-namespace shall start its keys at `m001`.
4. The Locale Dictionary shall expose a structurally identical key tree across `locales/en.json` and `locales/zh.json` for every newly added key path: a recursive comparison of the two files' key paths (ignoring values) shall produce an empty difference for the keys this spec introduces.
5. The Locale Dictionary shall not introduce a new top-level key (the only addition is the new `log.retry` sub-key under the existing top-level `log` namespace) and shall not modify, remove, or re-order any existing key already present in `locales/{en,zh}.json`.
### Requirement 3: Behavioural and Functional Equivalence
**Objective:** As a reviewer, I want to confirm that swapping the message strings does not change runtime behaviour, so that this PR is purely a localisation change.
#### Acceptance Criteria
1. The Graph Build Pipeline shall, after the change, continue to: update `project.status` to `GRAPH_BUILDING` then `GRAPH_COMPLETED` (or `FAILED` on error), call `task_manager.update_task(...)` with the same status/progress/result payloads, and emit one log record at each of the three pre-existing log points (start, completion, failure) with identical level (`info`/`info`/`error`) and identical interpolated values; only the human-readable text and its language source shall differ.
2. The Profile Generator shall, after the change, continue to print exactly two banner messages around `concurrent.futures.ThreadPoolExecutor`-driven generation (one before, one after), retain the surrounding `'='*60` separator lines verbatim, and not emit additional log records or alter the order of `logger.info`/`logger.warning` calls.
3. The Retry Utility shall, after the change, continue to: raise the original exception after the final retry, sleep for the same backoff durations, and emit exactly one `logger.error` per call site at the same control-flow position; the helper's signature, decorator behaviour, and async/sync split shall be unchanged.
4. The Backend HTTP Layer shall return the same HTTP status code, response key set, and (for non-translated keys) value structure for `/api/graph/build` and any other endpoint that transitively triggers the touched code paths; no `jsonify(...)` payload shape shall change as a side-effect of this work.
### Requirement 4: Locale Resolution in Background and Async Contexts
**Objective:** As a backend service author, I want the new `t()` calls to resolve to the correct locale even when invoked from background threads or async coroutines, so that operators see consistent log language regardless of where the call originates.
#### Acceptance Criteria
1. When `t("log.graph_api.<key>", ...)` is called from the `build_task` background thread inside `backend/app/api/graph.py` (started via `task_manager.run_task`), the Locale Helper shall resolve to the locale that was established for that thread (per the existing per-thread / `set_locale` mechanism), not silently fall back to the default `zh`.
2. When `t("log.retry.<key>", ...)` is called from the synchronous `retry_with_backoff` decorator wrapping a Flask request handler, the Locale Helper shall resolve via the active Flask request context (`Accept-Language` header), consistent with how request-scoped `t()` calls behave elsewhere in the codebase.
3. When `t("log.retry.<key>", ...)` is called from the asynchronous `retry_with_backoff_async` decorator under `asyncio`, the Locale Helper shall resolve via whichever locale source is in scope for that coroutine (request context if present; otherwise the per-thread fallback set by the caller), without raising and without requiring any new locale-propagation wiring inside `retry.py`.
4. If a `t()` call introduced by this spec references a key that is missing from both the active locale and the `zh` fallback, the Locale Helper shall continue to behave per the existing contract: emit a single deduped warning naming the key and locale, and return the key string itself (never `None`, never raise).
### Requirement 5: Verification and Regression Guards
**Objective:** As a reviewer of this PR, I want repeatable mechanical checks that prove the in-scope files are clean of stray hard-coded Chinese log/print strings on those nine lines, so that the acceptance criteria can be re-validated on every future change.
#### Acceptance Criteria
1. The Verification Procedure shall, when run against the repository, report zero matches of any Unicode CJK character (range `U+4E00``U+9FFF`) on the nine specific lines covered by Requirement 1 in their post-change form (i.e., `grep -P "[一-鿿]"` against the replaced lines returns no hits).
2. The Verification Procedure shall, when run against `locales/en.json` and `locales/zh.json`, confirm via a Python `json.load` + recursive key walk that every newly introduced key path exists in both files, and exit non-zero if a key path is present in only one of them.
3. The Verification Procedure shall confirm via Python that for each new key in `locales/zh.json` whose source f-string contained an `{var}` placeholder, the same `{var}` placeholder appears in the new English translation in `locales/en.json` (so interpolation is not silently dropped during translation).
4. The Verification Procedure shall require only tools already available in the dev environment (`grep`, `python3`, optional `jq`) — no new runtime or dev dependencies shall be added by this spec.
5. The Backend Test Suite shall continue to pass (`uv run python -m pytest`) after the change, with no new failures introduced; in particular, any pre-existing tests that assert the prior Chinese log/print text shall be updated to assert via the same `t()` lookup or an English translation rather than removed.