# Research & Design Decisions — `i18n-translate-backend-comments`
## Summary
- **Feature**: `i18n-translate-backend-comments`
- **Discovery Scope**: Simple Addition (mechanical translation, no architectural change)
- **Key Findings**:
  - 37 in-scope `backend/` Python files contain Chinese characters in docstrings or `#` comments. The full list is in `gap-analysis.md`.
  - Existing docstrings mix English Google-style keys (`Args:`/`Returns:`) with Chinese descriptions, and a smaller subset uses Chinese keys (`参数:`/`返回:`/`异常:`). Both patterns must converge to canonical English Google-style.
  - Several `tests/test_locale*.py` files contain Chinese only inside string literals (intentional test data); they fall outside the ticket's enumerated paths and are out of scope.
## Research Log
### Discovery scan: where is Chinese in `backend/`?
- **Context**: Need a deterministic enumeration of files to translate.
- **Sources Consulted**: `grep`/Python-driven scan against `backend/**/*.py`.
- **Findings**:
  - 37 in-app files (under `backend/app/`, `backend/run.py`, `backend/scripts/`).
  - 2 additional test files in `backend/tests/` whose Chinese is only in string literals; not in ticket scope.
  - `.venv/` matches are noise and excluded.
- **Implications**: The ticket-listed paths are exhaustive; no unexpected location. Order of traversal can be alphabetical within package groups.
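
The enumeration described above could be reproduced with a short script along these lines. This is a minimal sketch, not the scan that was actually run: the function name `files_with_chinese`, the excluded directory set, and the use of the CJK Unified Ideographs range (`U+4E00` to `U+9FFF`) as the definition of "Chinese characters" are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: "Chinese" means the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: directories treated as noise; only .venv is named in the findings.
EXCLUDED_DIRS = {".venv", "__pycache__"}


def files_with_chinese(root: Path) -> list[Path]:
    """Return paths (relative to root) of .py files containing CJK text."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        relative = path.relative_to(root)
        # Skip anything that lives under an excluded directory.
        if EXCLUDED_DIRS.intersection(relative.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if CJK.search(text):
            hits.append(relative)
    return hits
```

A whole-file match like this does not distinguish comments from string literals, so its output is a superset of the in-scope list and still needs the per-string check described in the next section.
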
### Disambiguation: docstring vs string literal
- **Context**: A triple-quoted string is a docstring iff it is the first statement of a module, class, or function body. Otherwise it is a value (e.g. a prompt template) owned by adjacent tickets.
- **Sources Consulted**: Python language reference; spot inspection of `services/ontology_generator.py`, `services/report_agent.py`.
- **Findings**:
  - In-scope files contain both kinds of triple-quoted strings.
  - Translating only the *first-statement* triple-quoted string per scope keeps the change comments-and-docstrings-only.
- **Implications**: Translation pass must visually verify each triple-quoted string is the first statement before rewriting; otherwise leave it alone.
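
The first-statement rule can also be checked mechanically with the standard `ast` module, which may help as a cross-check on the visual pass. The sketch below is illustrative only; `split_strings` is a hypothetical helper, not part of the repository.

```python
import ast

# Scopes whose first statement can be a docstring.
SCOPES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def split_strings(source: str) -> tuple[list[int], list[int]]:
    """Return (docstring_lines, value_string_lines) as 1-based start lines."""
    tree = ast.parse(source)
    docstring_ids = set()
    for node in ast.walk(tree):
        if isinstance(node, SCOPES) and node.body:
            first = node.body[0]
            # A docstring is a bare string expression in first position.
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                docstring_ids.add(id(first.value))
    docs, values = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            target = docs if id(node) in docstring_ids else values
            target.append(node.lineno)
    return sorted(docs), sorted(values)
```

Any string whose start line lands in the second list is a value (for example a prompt template) and stays untouched under this ticket.
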
### Google-style docstring conversions
- **Context**: `dev-guidelines.md` requires Google-style docstrings; existing Chinese docstrings sometimes use Chinese keys.
- **Findings**: The following key map applies:
  - `参数:` → `Args:`
  - `返回:` → `Returns:`
  - `异常:` → `Raises:`
  - `产生:` / `生成:` → `Yields:`
  - `示例:` → `Example:` (or `Examples:`)
  - `注意:` / `备注:` → `Note:` (or `Notes:`)
- **Implications**: Document this mapping in design.md so the implementation pass is mechanical.
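
Expressed as data, the key map might look like the sketch below. It rewrites only section-key lines and leaves the descriptions for manual translation. The names `KEY_MAP` and `convert_section_keys` are hypothetical, the singular forms `Example:` and `Note:` are chosen arbitrarily from the allowed variants, and accepting a full-width colon (`:`) in addition to the ASCII one is an assumption about how the keys may appear in the codebase.

```python
import re

# Chinese section key -> canonical Google-style key (singular forms assumed).
KEY_MAP = {
    "参数": "Args",
    "返回": "Returns",
    "异常": "Raises",
    "产生": "Yields",
    "生成": "Yields",
    "示例": "Example",
    "注意": "Note",
    "备注": "Note",
}

# A key line is: indentation, one known key, a colon, nothing else.
_KEY_LINE = re.compile(
    r"^([ \t]*)(" + "|".join(KEY_MAP) + r")[ \t]*[::][ \t]*$",
    re.MULTILINE,
)


def convert_section_keys(docstring: str) -> str:
    """Replace Chinese section keys with English ones, keeping indentation."""
    return _KEY_LINE.sub(
        lambda m: f"{m.group(1)}{KEY_MAP[m.group(2)]}:", docstring
    )
```

Because the pattern requires the key to stand alone on its line, a description that merely contains one of these words is not rewritten.
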
## Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| Manual file-by-file pass | Walk in alphabetical order, package-grouped commits | Predictable, easy to review per package | Human time required | Selected approach |
| Multi-PR per package | One PR per backend package | Smaller diffs to review | Higher overhead, more PR churn | Allowed by ticket but not required |
| Tooling-assisted bulk script | LLM-driven find-and-replace tool | Reusable | Risk of touching string literals; tool itself becomes a deliverable | Out of proportion |
## Design Decisions
### Decision: Single-pass, package-grouped commits, single PR
- **Context**: 37 files, mechanical change, ticket allows either single or split PRs.
- **Alternatives Considered**:
  1. Multi-PR per package: more granular review but higher overhead.
  2. Tooling-assisted bulk script: overkill for one ticket.
- **Selected Approach**: Single PR with one or more commits, grouped by package (`models/`, `utils/`, `services/`, `api/`, `scripts/`, root) so reviewers can read the diff one package at a time.
- **Rationale**: Mechanical change with low risk; ticket explicitly allows it; reduces PR overhead; `/done` produces one PR per branch by default.
- **Trade-offs**: One large PR, but partitioned by commit. Reviewer can use commit history to navigate.
- **Follow-up**: After each package commit, re-run residual `grep` and `pytest` to maintain the invariant.
### Decision: First-statement disambiguation rule
- **Context**: Distinguish docstrings (in scope) from value strings (out of scope).
- **Selected Approach**: A triple-quoted string is treated as a docstring (in scope) only if it is the first statement of a module / class / function body. All other triple-quoted strings are values (out of scope).
- **Rationale**: Matches Python's own definition; keeps boundary with adjacent tickets unambiguous.
### Decision: Drop comments that restate code
- **Context**: R3 requires deletion of comments whose translated form would merely paraphrase the next line.
- **Selected Approach**: Apply a one-line heuristic: if the translated comment would be a verb phrase that mirrors the immediately following executable line, delete the comment instead of writing it.
- **Rationale**: Aligns with project rule "comment the why, not the what".
## Risks & Mitigations
- **Risk**: Accidental edit to a string literal (which would belong to tickets #2/#3/#4/#5/#6). **Mitigation**: After each package commit, run `git diff --stat` and sanity-check each file's diff; verify that only `#` comment lines and docstring lines change.
- **Risk**: Tests fail because the shape of a string changed. **Mitigation**: Run `uv run python -m pytest backend/scripts/test_profile_format.py` after each commit.
- **Risk**: Line-length violations after English expansion. **Mitigation**: Reflow long English text to <= 120 characters within the docstring or comment only; never reflow code.
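
The first risk above (an accidental edit to a string literal or to code) could additionally be guarded by comparing syntax trees, since comments never appear in the AST and docstrings can be stripped before comparison. The following is a hedged sketch of that idea, not a check prescribed by the ticket; `same_code` and `_strip_docstrings` are hypothetical names.

```python
import ast

SCOPES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def _strip_docstrings(tree: ast.AST) -> ast.AST:
    """Remove first-statement docstrings from every scope, in place."""
    for node in ast.walk(tree):
        if isinstance(node, SCOPES) and node.body:
            first = node.body[0]
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                node.body = node.body[1:]
    return tree


def same_code(before: str, after: str) -> bool:
    """True if the two sources differ only in comments and docstrings."""
    # ast.dump omits line numbers by default, so reflowed text is ignored.
    old = ast.dump(_strip_docstrings(ast.parse(before)))
    new = ast.dump(_strip_docstrings(ast.parse(after)))
    return old == new
```

Feeding it the pre-commit and post-commit contents of each touched file would flag any change to a value string or executable line, which is exactly what this ticket must not make.
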
## References
- `dev-guidelines.md` — repo-level coding standards, Google-style docstring requirement.
- `.claude/rules/commits.md` — Conventional Commits standard for the commit message.
- Issue #7 — salestech-group/MiroFish: source ticket.
- Issues #2/#3/#4/#5/#6 — adjacent i18n tickets that own the string-literal Chinese.