# Implementation Gap Analysis
## 1. Codebase Findings
### 1.1 Existing infrastructure already covers the i18n mechanics
- `backend/app/utils/locale.py` already exports `t(key, **kwargs)` with:
  - per-thread locale (`set_locale` writes `_thread_local.locale`)
  - per-request locale (`get_locale` checks Flask `has_request_context()` then `Accept-Language`)
  - `zh` fallback when the active locale is missing a key, then key-string fallback if `zh` is missing too
  - dedup'd warning on missing keys (`_warn_missing_key_once`), no exceptions raised
- All wiring required by Requirement 4 is therefore already in place. **No `locale.py` change is needed for ticket #24.**
### 1.2 The two files we touch already use `t()`
- `backend/app/api/graph.py:21`: `from ..utils.locale import t`
- `backend/app/services/oasis_profile_generator.py:23`: `from ..utils.locale import get_language_instruction, get_locale, set_locale, t`

The third file does NOT yet import `t`:

- `backend/app/utils/retry.py` — no `from ..utils.locale import t`. Need to add the import.
### 1.3 Existing locale namespace shape (from `locales/en.json`)
- `log.graph_api` — populated `m006`–`m019` (with gaps) and `m026`. The next free slots that are *contiguous* would be `m027`, `m028`, `m029`. (The absent slots `m009`, `m010`, `m012`, `m020`–`m025` could also be reused, but appending at the tail is safer because it cannot collide with unmerged work that assumes a particular key reservation.)
- `log.profile_generator` — populated `m001`–`m023` densely. Next free: `m024`, `m025`.
- `log.retry` — does NOT exist. Will be created with `m001`–`m004`.

The `log.profile_generator.m017` key already covers a *similar* message ("Starting parallel generation of {total} agent profiles (parallelism: {parallel_count})…"). The `print(...)` at `oasis_profile_generator.py:945` and the `logger.info(t("log.profile_generator.m017", ...))` at line 943 are emitting the same logical event in two channels — log + console banner. The cleanest move is **not** to reuse `m017` (which would lose the banner-style separator/centring) but to introduce dedicated `m024` / `m025` keys for the banner text, so the banner has its own copy decoupled from the log line.
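The slot arithmetic above (append at the tail versus reuse the gaps) can be made mechanical with a small helper. A minimal sketch, assuming keys follow the `m` plus three-digit pattern seen in `en.json`; `tail_slots` and `gap_slots` are hypothetical names, and the key lists only restate the populated slots described in the bullets.

```python
import re

_SLOT = re.compile(r"m(\d{3})")


def _slot_numbers(keys):
    # Keep only keys shaped like "m006"; anything else is ignored.
    return sorted(int(m.group(1)) for k in keys if (m := _SLOT.fullmatch(k)))


def tail_slots(keys, count):
    """Next `count` slots after the highest populated one (append-at-tail rule)."""
    start = max(_slot_numbers(keys), default=0) + 1
    return [f"m{n:03d}" for n in range(start, start + count)]


def gap_slots(keys):
    """Unused slots between the lowest and highest populated ones."""
    nums = _slot_numbers(keys)
    if not nums:
        return []
    used = set(nums)
    return [f"m{n:03d}" for n in range(nums[0], nums[-1] + 1) if n not in used]


# Populated slots as listed above (m009, m010, m012 and m020-m025 are absent).
GRAPH_API = [f"m{n:03d}" for n in (6, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19, 26)]
PROFILE_GENERATOR = [f"m{n:03d}" for n in range(1, 24)]

if __name__ == "__main__":
    print(tail_slots(GRAPH_API, 3))          # ['m027', 'm028', 'm029']
    print(tail_slots(PROFILE_GENERATOR, 2))  # ['m024', 'm025']
    print(tail_slots([], 4))                 # ['m001', 'm002', 'm003', 'm004']
    print(gap_slots(GRAPH_API))
```

The `gap_slots` output is exactly the slot set that the gap-filling alternative would consume, which is why it needs cross-branch coordination while `tail_slots` does not.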
### 1.4 Translation pattern already established by ticket #6
Per the prior spec at `.kiro/specs/i18n-externalize-backend-logs/`, the project's convention is:
- `t("log.<domain>.m###", placeholder=value, …)` inside `logger.{info,warning,error,debug,exception}` calls.
- Placeholders use `{name}` syntax (replaced via `str.replace` inside `t()`); positional `{0}`/`{}` are not supported.
- f-string formatting must be removed entirely from the call argument; values are passed as kwargs.
- The Chinese source string is preserved verbatim in `zh.json`, with `f"…{var}…"` rewritten as `"…{var}…"`.

This work strictly extends the existing pattern. **No new convention is introduced.**
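The placeholder rules above can be checked against a stand-in for the substitution step inside `t()`. A minimal sketch, assuming only what this section states (plain `str.replace` on `{name}` tokens, keyword arguments only); `fill_placeholders` is a hypothetical name, and the template is the English `log.profile_generator.m017` text quoted in 1.3.

```python
def fill_placeholders(template: str, **kwargs) -> str:
    """Replace each {name} token with str(value), as t() is described to do.

    Only keyword arguments are accepted, so positional {0} / {} tokens have
    nothing to bind to and are left in the output untouched.
    """
    for name, value in kwargs.items():
        template = template.replace("{" + name + "}", str(value))
    return template


M017_EN = "Starting parallel generation of {total} agent profiles (parallelism: {parallel_count})…"

# Converted call-site shape under the ticket #6 convention:
#   before: logger.info(f"...{total}...{parallel_count}...")  # value baked in, no key lookup
#   after:  logger.info(t("log.profile_generator.m017", total=total, parallel_count=parallel_count))
if __name__ == "__main__":
    # Starting parallel generation of 12 agent profiles (parallelism: 4)…
    print(fill_placeholders(M017_EN, total=12, parallel_count=4))
    # "{0}" is not a named token, so it is printed unchanged.
    print(fill_placeholders("Attempt {0}", attempt=3))
```

In this sketch, `str.replace` means a literal brace in a translated string needs no escaping, but a misspelled kwarg is silently ignored rather than raising.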
### 1.5 `build_logger` vs. module logger
In `graph.py`, the affected calls use a locally created `build_logger = get_logger('mirofish.build')` inside the `build_task` background function (line 383). This is a different logger handle, but `t()` is logger-agnostic — it returns a plain string that any logger can emit. No special handling is needed.
### 1.6 `print(...)` calls in `oasis_profile_generator.py`
The two banner prints (lines 945 and 1001) are deliberate console-output decorations (visible on stdout for the Flask process), separate from the structured log emitted by `logger.info` on lines 943 and earlier. The task is to keep them as `print(...)` but route the message text through `t(...)`:
```python
print(t("log.profile_generator.m024", total=total, parallel_count=parallel_count))
```
This preserves the user-visible banner cosmetics (`'='*60` separators on lines 944, 946, 1000, 1002) and only changes the text content.
### 1.7 Locale resolution for `retry.py`
`retry.py` is invoked from three contexts:
1. **Flask request handlers (sync):** `has_request_context()` is true; `get_locale()` reads `Accept-Language`. Works.
2. **Background tasks:** the existing background-task entry points (e.g., `task_manager.run_task`) already call `set_locale(...)` per `i18n-externalize-backend-logs` (verified by reading `oasis_profile_generator.py` which uses the same pattern with `set_locale` imported on line 23). Works.
3. **Async coroutines (`retry_with_backoff_async`):** `get_locale()` falls back to `_thread_local.locale`. Asyncio runs coroutines on the same thread by default, so the per-thread locale propagates. If the coroutine is dispatched onto a fresh executor thread without `set_locale`, the helper falls back to `zh` (the default), which is still a valid string, just defaulting to Chinese. The default-fallback is acceptable here because (a) the helper still returns a non-None string, and (b) the audit only requires the *source code* to be free of Chinese literals, not that every emitted log record be English regardless of caller context.

**Decision:** No new locale-propagation wiring needed. Document the async fallback in the design and tasks.
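The three contexts differ only in which thread ends up calling `get_locale()`. A minimal standalone sketch, assuming that outside a Flask request `get_locale()` reduces to reading a `threading.local()` value with a `zh` default (the Flask branch is omitted; `probe` and `run_probe` are hypothetical names). A coroutine on the event-loop thread sees the locale its caller set, while work handed to `run_in_executor` runs on a worker thread that never called `set_locale`.

```python
import asyncio
import threading
from typing import Tuple

DEFAULT_LOCALE = "zh"
_thread_local = threading.local()


def set_locale(locale: str) -> None:
    _thread_local.locale = locale


def get_locale() -> str:
    # Non-request path only: per-thread value, else the zh default.
    return getattr(_thread_local, "locale", DEFAULT_LOCALE)


async def probe() -> Tuple[str, str]:
    on_loop_thread = get_locale()  # same thread that called asyncio.run()
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor, whose workers are fresh threads.
    on_executor_thread = await loop.run_in_executor(None, get_locale)
    return on_loop_thread, on_executor_thread


def run_probe(locale: str) -> Tuple[str, str]:
    set_locale(locale)  # what a background-task entry point does before starting work
    return asyncio.run(probe())


if __name__ == "__main__":
    print(run_probe("en"))  # ('en', 'zh')
```

The second element is the documented async fallback: a valid Chinese string rather than an error.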
## 2. Out-of-scope items (encountered during research)
These were observed in the same files but are explicitly **not** part of ticket #24 and will not be addressed:
- `backend/app/api/graph.py` — Chinese in `task_manager.update_task(..., message="初始化图谱构建服务...")` ("initializing graph build service...") and similar (#24 lists only the three log calls).
- `backend/app/utils/retry.py` — Chinese in `logger.warning(...)` retry messages (lines 63–66, 115–117, 185–187) and Chinese docstrings (lines 1–3, 25–35, 36–39, 90, 156–166, 200–212).
- `backend/app/services/oasis_profile_generator.py` — Chinese in `progress_callback(... f"已完成 …")` ("completed …", line 976) and Chinese docstrings/comments throughout.

These are tracked under sibling tickets (#7 for docstrings/comments; the residual `logger.warning` in `retry.py` is a candidate for a future audit ticket).
## 3. Implementation Approaches Considered
### Approach A — Append-at-tail with new `log.retry` namespace (recommended)
- New keys: `log.graph_api.m027`, `m028`, `m029`; `log.profile_generator.m024`, `m025`; new `log.retry.m001`–`m004`.
- Add `from ..utils.locale import t` to `retry.py`.
- Replace each f-string in the nine call sites with a `t(...)` call.
- Update `locales/en.json` and `locales/zh.json` in lock-step.
- **Pros:** Mirrors the conventions of #6 exactly; no risk of overwriting existing keys; minimal diff.
- **Cons:** Numbering gaps under `log.graph_api` remain (cosmetic).
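The lock-step update and the no-overwrite guarantee of Approach A can be enforced by a helper rather than by hand-editing both files. A minimal sketch, assuming both locale files are nested JSON objects keyed by namespace segments (`{"log": {"retry": {"m001": ...}}}`); `add_in_lockstep` is a hypothetical name, and the message strings in the usage are placeholders, not the real copy.

```python
import json


def _namespace_node(catalog: dict, namespace: str) -> dict:
    """Walk (creating if missing) the nested dict for a dotted namespace like "log.retry"."""
    node = catalog
    for part in namespace.split("."):
        node = node.setdefault(part, {})
    return node


def add_in_lockstep(en: dict, zh: dict, namespace: str, en_entries: dict, zh_entries: dict) -> None:
    """Add the same keys to both catalogs, refusing to overwrite anything already present."""
    if set(en_entries) != set(zh_entries):
        raise ValueError("en and zh must receive exactly the same keys")
    en_node = _namespace_node(en, namespace)
    zh_node = _namespace_node(zh, namespace)
    clashes = sorted(set(en_entries) & (set(en_node) | set(zh_node)))
    if clashes:
        raise KeyError(f"refusing to overwrite {namespace}: {clashes}")
    # Both files were validated first, so neither is updated unless both are.
    en_node.update(en_entries)
    zh_node.update(zh_entries)


if __name__ == "__main__":
    en = {"log": {"graph_api": {"m026": "existing"}}}
    zh = {"log": {"graph_api": {"m026": "已有"}}}
    add_in_lockstep(en, zh, "log.retry", {"m001": "<en text>"}, {"m001": "<zh text>"})
    print(json.dumps(en, ensure_ascii=False))
```

When writing the dicts back, `json.dump(..., ensure_ascii=False)` keeps the Chinese text readable in `zh.json`; indentation should match whatever the existing files use.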
### Approach B — Fill numbering gaps in `log.graph_api`
- Reuse missing slots `m009`, `m010`, `m012`, `m020`–`m025`.
- **Pros:** Tighter numbering.
- **Cons:** Risk of colliding with reserved-but-not-yet-merged keys from another branch; harder to review (mixed insertion sites in JSON).
- **Verdict:** Reject. The cost of conflict review is not worth the cosmetic gain.
### Approach C — Consolidate the `print(...)` banners into the existing `log.profile_generator.m017`
- Remove the two `print(...)` calls; rely solely on `logger.info(t(...))`.
- **Pros:** One fewer key to add.
- **Cons:** Deletes user-visible console banner behaviour (a behaviour change), violates Requirement 3.2 ("continue to print exactly two banner messages"), and is out-of-scope per ticket #24 which says "fixed (or explicitly classified as `deliberate`)" — i.e., translate, don't remove.
- **Verdict:** Reject.
## 4. Recommendation
Proceed with **Approach A**.
Implementation will:
1. Add four entries to `log.retry` (new sub-namespace) — one per `logger.error` line in `retry.py`.
2. Add three entries to `log.graph_api` — one per `build_logger` line in `graph.py`.
3. Add two entries to `log.profile_generator` — one per `print(...)` banner in `oasis_profile_generator.py`.
4. Replace all nine f-strings with `t(...)` calls; pass interpolated values as kwargs.
5. Add `from ..utils.locale import t` to `retry.py`.
6. Mirror every new key in `zh.json` with the verbatim original Chinese text.
7. Run a regex / Python audit to confirm parity and absence of CJK on the touched lines.
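Step 7 does not pin down the audit script. One possible shape, as a minimal sketch: flatten both catalogs to dotted keys, report keys present in only one of them, and flag touched source lines that still contain CJK ideographs. The function names, the Unicode ranges (CJK Unified Ideographs, Extension A, Compatibility Ideographs), and passing the touched line numbers explicitly are all assumptions.

```python
import re

# Assumed ranges: CJK Unified Ideographs, Extension A, Compatibility Ideographs.
CJK = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")


def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten nested locale JSON into dotted keys; already-dotted flat keys pass through."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def parity_gaps(en: dict, zh: dict):
    """Return (keys only in en, keys only in zh), each sorted."""
    en_keys, zh_keys = set(flatten(en)), set(flatten(zh))
    return sorted(en_keys - zh_keys), sorted(zh_keys - en_keys)


def cjk_lines(source: str, line_numbers) -> list:
    """1-based numbers from `line_numbers` whose line still contains CJK ideographs."""
    lines = source.splitlines()
    return [n for n in line_numbers if 1 <= n <= len(lines) and CJK.search(lines[n - 1])]
```

In use, each locale file would be read with `json.load` (opened with `encoding="utf-8"`) and the audit passes when `parity_gaps(en, zh) == ([], [])` and `cjk_lines(...)` returns an empty list for each touched file.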
## 5. Risks / open questions
| Risk | Severity | Mitigation |
|---|---|---|
| `retry.py` async path running on a fresh thread without `set_locale` returns Chinese | Low | Documented; not a blocker for #24 acceptance, which targets *source-code* CJK absence. Any improvement is a separate ticket. |
| Adding `from ..utils.locale import t` introduces a new module import into `retry.py` (low-level utility) | Low | The `locale` module has no transitive imports of `retry.py`, so no circular-import risk. Verified by reading `locale.py`. |
| Existing test that asserts Chinese log text breaks | Low | Searched for `"开始构建图谱"` / `"图谱构建完成"` / `"图谱构建失败"` / `"开始生成Agent人设"` / `"人设生成完成"` / `"重试后仍失败"` / `"处理第"` test fixtures — none found in `backend/`. |
## 6. Conclusion
**Ready to proceed to design.** The gap is small: nine string-literal replacements, nine new locale keys (each mirrored in both `en.json` and `zh.json`), and one new import. The mechanics are identical to the already-merged ticket #6 work. No design uncertainty remains; the design phase will simply formalise the key naming and the per-file edit plan.