# Research & Design Decisions — `i18n-translate-backend-comments`
## Summary
- **Feature**: `i18n-translate-backend-comments`
- **Discovery Scope**: Simple Addition (mechanical translation, no architectural change)
- **Key Findings**:
  - 37 in-scope `backend/` Python files contain Chinese characters in docstrings or `#` comments. The full list is in `gap-analysis.md`.
  - Existing docstrings mix English Google-style section keys (`Args:`/`Returns:`) with Chinese descriptions, and a smaller subset uses Chinese keys (`参数:`/`返回:`/`异常:`). Both patterns must converge on canonical English Google-style docstrings.
  - Several `tests/test_locale*.py` files contain Chinese only inside string literals (intentional test data). They fall outside the ticket's enumerated paths and are therefore out of scope.
## Research Log
### Discovery scan: where is Chinese in `backend/`?
- **Context**: The translation pass needs a deterministic enumeration of the files to translate.
- **Sources Consulted**: `grep` and Python-driven scans against `backend/**/*.py`.
- **Findings**:
  - 37 in-app files (under `backend/app/`, `backend/run.py`, `backend/scripts/`).
  - 2 additional test files in `backend/tests/` whose Chinese appears only in string literals; these are not in ticket scope.
  - `.venv/` matches are noise and are excluded.
- **Implications**: The ticket-listed paths are exhaustive; the scan found Chinese in no unexpected location. Traversal can proceed alphabetically within package groups.
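
The scan described above could be reproduced with something like the following sketch. It is not the script that was actually used: the function name `files_with_chinese`, the choice of the basic CJK Unified Ideographs block as the definition of "Chinese characters", and the `exclude_dirs` parameter are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: "Chinese" means the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def files_with_chinese(
    root: Path, exclude_dirs: frozenset[str] = frozenset({".venv"})
) -> list[Path]:
    """Return the sorted .py files under root that contain a CJK character."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        # Skip files that live inside an excluded directory such as .venv/.
        if exclude_dirs.intersection(path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if CJK.search(text):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for hit in files_with_chinese(Path("backend")):
        print(hit)
```

A whole-file scan like this flags Chinese anywhere in a file, including string literals, which is why files whose Chinese is only in literals still show up and have to be ruled out separately.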
### Disambiguation: docstring vs string literal
- **Context**: A triple-quoted string is a docstring if and only if it is the first statement of a module, class, or function body. Otherwise it is a value (e.g. a prompt template) owned by adjacent tickets.
- **Sources Consulted**: Python language reference; spot inspection of `services/ontology_generator.py`, `services/report_agent.py`.
- **Findings**:
  - In-scope files contain both kinds of triple-quoted strings.
  - Translating only the *first-statement* triple-quoted string per scope keeps the change comments-and-docstrings-only.
- **Implications**: The translation pass must visually verify that each triple-quoted string is the first statement of its scope before rewriting it; otherwise the string is left alone.
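
The first-statement rule can also be checked mechanically as a cross-check on the visual pass. The sketch below uses the standard `ast` module; the helper name `classify_chinese_strings` and the CJK range are assumptions for illustration, and the selected approach remains manual inspection.

```python
import ast
import re

# Assumption: "Chinese" means the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def classify_chinese_strings(source: str) -> dict[str, list[int]]:
    """Map 'docstring' / 'value' to the start lines of Chinese string constants."""
    tree = ast.parse(source)

    # A docstring is a string expression that is the first statement of a body.
    docstring_ids = set()
    for node in ast.walk(tree):
        if isinstance(node, _DOC_OWNERS) and node.body:
            first = node.body[0]
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                docstring_ids.add(id(first.value))

    result: dict[str, list[int]] = {"docstring": [], "value": []}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, str)
            and CJK.search(node.value)
        ):
            kind = "docstring" if id(node) in docstring_ids else "value"
            result[kind].append(node.lineno)
    for lines in result.values():
        lines.sort()
    return result
```

Lines reported under `"value"` are string literals such as prompt templates and stay untouched in this ticket.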
### Google-style docstring conversions
- **Context**: `dev-guidelines.md` requires Google-style docstrings; existing Chinese docstrings sometimes use Chinese keys.
- **Findings**: The following key map applies:
  - `参数:` → `Args:`
  - `返回:` → `Returns:`
  - `异常:` → `Raises:`
  - `产生:` / `生成:` → `Yields:`
  - `示例:` → `Example:` (or `Examples:`)
  - `注意:` / `备注:` → `Note:` (or `Notes:`)
- **Implications**: Document this mapping in `design.md` so the implementation pass is mechanical.
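
To make the key map concrete, here it is written out as data with a small helper that rewrites section headers only. This is an illustration of the mapping, not a proposal to automate the pass: `SECTION_KEYS` and `normalize_section_keys` are hypothetical names, the singular `Example:` / `Note:` forms are picked arbitrarily from the two allowed spellings, and the descriptions under each key still need human translation.

```python
import re

# The key map from the findings above, as data.
SECTION_KEYS = {
    "参数": "Args",
    "返回": "Returns",
    "异常": "Raises",
    "产生": "Yields",
    "生成": "Yields",
    "示例": "Example",
    "注意": "Note",
    "备注": "Note",
}

# A section header is a line holding only a key plus an ASCII or full-width colon.
_HEADER = re.compile(
    r"^([ \t]*)(" + "|".join(SECTION_KEYS) + r")[ \t]*[:：][ \t]*$",
    re.MULTILINE,
)


def normalize_section_keys(docstring: str) -> str:
    """Rewrite Chinese section headers to English ones, keeping indentation."""
    return _HEADER.sub(
        lambda m: f"{m.group(1)}{SECTION_KEYS[m.group(2)]}:", docstring
    )
```

Matching whole header lines only means a key that appears mid-sentence in a description is left alone.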
## Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| Manual file-by-file pass | Walk in alphabetical order, package-grouped commits | Predictable, easy to review per package | Human time required | Selected approach |
| Multi-PR per package | One PR per backend package | Smaller diffs to review | Higher overhead, more PR churn | Allowed by ticket but not required |
| Tooling-assisted bulk script | LLM-driven find-and-replace tool | Reusable | Risk of touching string literals; tool itself becomes a deliverable | Out of proportion |
## Design Decisions
### Decision: Single-pass, package-grouped commits, single PR
- **Context**: 37 files, mechanical change; the ticket allows either a single PR or split PRs.
- **Alternatives Considered**:
  1. Multi-PR per package: more granular review but higher overhead.
  2. Tooling-assisted bulk script: overkill for one ticket.
- **Selected Approach**: Single PR with one or more commits, grouped by package (`models/`, `utils/`, `services/`, `api/`, `scripts/`, root) so reviewers can read the diff one package at a time.
- **Rationale**: The change is mechanical and low-risk; the ticket explicitly allows a single PR; PR overhead stays low; `/done` produces one PR per branch by default.
- **Trade-offs**: One large PR, but partitioned by commit, so the reviewer can navigate it through the commit history.
- **Follow-up**: After each package commit, re-run the residual `grep` and `pytest` to confirm the invariant still holds (no Chinese left in the finished packages, tests still passing).
### Decision: First-statement disambiguation rule
- **Context**: Docstrings (in scope) must be distinguished from value strings (out of scope).
- **Selected Approach**: A triple-quoted string is treated as a docstring (in scope) only if it is the first statement of a module / class / function body. All other triple-quoted strings are values (out of scope).
- **Rationale**: Matches Python's own definition and keeps the boundary with adjacent tickets unambiguous.
### Decision: Drop comments that restate code
- **Context**: R3 requires deletion of comments whose translated form would merely paraphrase the next line.
- **Selected Approach**: Apply a one-line heuristic: if the translated comment would be a verb phrase that mirrors the immediately following executable line, delete the comment instead of translating it.
- **Rationale**: Aligns with the project rule "comment the why, not the what".
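
A before/after pair shows how the heuristic plays out. The function and both comments are invented for illustration and do not come from the codebase.

```python
def normalize_name_before(raw: str) -> str:
    # 去除空白
    name = raw.strip()
    # 下游查找不区分大小写，因此统一存储为小写
    return name.lower()


def normalize_name_after(raw: str) -> str:
    name = raw.strip()
    # Downstream lookups are case-insensitive, so store one lowercase form.
    return name.lower()
```

The first comment would translate to "strip whitespace", a verb phrase that mirrors `raw.strip()`, so it is deleted. The second explains why the value is lowercased, so it is translated and kept.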
## Risks & Mitigations
- **Risk**: Accidental edit to a string literal (which would belong to ticket #2/#3/#4/#5/#6). **Mitigation**: After each package commit, run `git diff --stat` and a per-file diff sanity check; verify that only `#` lines and docstring lines change.
- **Risk**: Tests fail because the shape of a string changed. **Mitigation**: Run `uv run python -m pytest backend/scripts/test_profile_format.py` after each commit.
- **Risk**: Line-length violations after English expansion. **Mitigation**: Reflow long English text to at most 120 characters per line, within the docstring or comment only; never reflow code.
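
The "only `#` lines and docstring lines change" check could be backed by an automated comparison alongside the manual diff review. This sketch is an assumption, not part of the ticket: it parses the old and new versions of a file with `ast` (which already ignores comments), removes first-statement docstrings, and compares what is left. The name `only_comments_and_docstrings_changed` is hypothetical.

```python
import ast

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def _strip_docstrings(tree: ast.AST) -> ast.AST:
    """Remove first-statement docstrings in place and return the tree."""
    for node in ast.walk(tree):
        if isinstance(node, _DOC_OWNERS) and node.body:
            first = node.body[0]
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                node.body = node.body[1:]
    return tree


def _code_fingerprint(source: str) -> str:
    # ast.dump omits line/column attributes by default, so reflowed
    # comments and docstrings do not affect the fingerprint.
    return ast.dump(_strip_docstrings(ast.parse(source)))


def only_comments_and_docstrings_changed(before: str, after: str) -> bool:
    """True if the two sources differ at most in comments and docstrings."""
    return _code_fingerprint(before) == _code_fingerprint(after)
```

Because string literals are part of the compared tree, an accidental edit to a value string makes the check fail, which is exactly the risk named above.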
## References
- `dev-guidelines.md`: repo-level coding standards, Google-style docstring requirement.
- `.claude/rules/commits.md`: Conventional Commits standard for the commit message.
- Issue #7 (salestech-group/MiroFish): source ticket.
- Issues #2/#3/#4/#5/#6: adjacent i18n tickets that own the string-literal Chinese.