5.9 KiB
5.9 KiB
Research & Design Decisions — i18n-translate-backend-comments
Summary
- Feature:
i18n-translate-backend-comments - Discovery Scope: Simple Addition (mechanical translation, no architectural change)
- Key Findings:
- 37 in-scope
backend/Python files contain Chinese characters in docstrings or#comments. The full list is ingap-analysis.md. - Existing docstrings mix English-shape Google-style keys (
Args:/Returns:) with Chinese descriptions, and a smaller subset uses Chinese keys (参数:/返回:/异常:). Both patterns must converge to canonical English Google-style. - Several
tests/test_locale*.pyfiles contain Chinese only inside string literals (intentional test data) and are out of scope by the ticket's enumerated paths.
- 37 in-scope
Research Log
Discovery scan: where is Chinese in backend/?
- Context: Need a deterministic enumeration of files to translate.
- Sources Consulted:
grep/Python-driven scan againstbackend/**/*.py. - Findings:
- 37 in-app files (under
backend/app/,backend/run.py,backend/scripts/). - 2 additional test files in
backend/tests/whose Chinese is only in string literals; not in ticket scope. .venv/matches are noise and excluded.
- 37 in-app files (under
- Implications: The ticket-listed paths are exhaustive; no unexpected location. Order of traversal can be alphabetical within package groups.
Disambiguation: docstring vs string literal
- Context: A triple-quoted string is a docstring iff it is the first statement of a module, class, or function body. Otherwise it is a value (e.g. a prompt template) owned by adjacent tickets.
- Sources Consulted: Python language reference; spot inspection of
services/ontology_generator.py,services/report_agent.py. - Findings:
- In-scope files contain both kinds of triple-quoted strings.
- Translating only the first-statement triple-quoted string per scope keeps the change comments-and-docstrings-only.
- Implications: Translation pass must visually verify each triple-quoted string is the first statement before rewriting; otherwise leave it alone.
Google-style docstring conversions
- Context:
dev-guidelines.mdrequires Google-style docstrings; existing Chinese docstrings sometimes use Chinese keys. - Findings: The following key map applies:
参数:→Args:返回:→Returns:异常:→Raises:产生:/生成:→Yields:示例:→Example:(orExamples:)注意:/备注:→Note:(orNotes:)
- Implications: Document this mapping in design.md so the implementation pass is mechanical.
Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|---|---|---|---|---|
| Manual file-by-file pass | Walk in alphabetical order, package-grouped commits | Predictable, easy to review per package | Human time required | Selected approach |
| Multi-PR per package | One PR per backend package | Smaller diffs to review | Higher overhead, more PR churn | Allowed by ticket but not required |
| Tooling-assisted bulk script | LLM-driven find-and-replace tool | Reusable | Risk of touching string literals; tool itself becomes a deliverable | Out of proportion |
Design Decisions
Decision: Single-pass, package-grouped commits, single PR
- Context: 37 files, mechanical change, ticket allows either single or split PRs.
- Alternatives Considered:
- Multi-PR per package — more granular review but higher overhead.
- Tooling-assisted bulk script — overkill for one ticket.
- Selected Approach: Single PR with one or more commits, grouped by package (
models/,utils/,services/,api/,scripts/, root) so reviewers can read the diff one package at a time. - Rationale: Mechanical change with low risk; ticket explicitly allows it; reduces PR overhead;
/doneproduces one PR per branch by default. - Trade-offs: One large PR, but partitioned by commit. Reviewer can use commit history to navigate.
- Follow-up: After each package commit, re-run residual
grepandpytestto maintain the invariant.
Decision: First-statement disambiguation rule
- Context: Distinguish docstrings (in scope) from value strings (out of scope).
- Selected Approach: A triple-quoted string is treated as a docstring (in scope) only if it is the first statement of a module / class / function body. All other triple-quoted strings are values (out of scope).
- Rationale: Matches Python's own definition; keeps boundary with adjacent tickets unambiguous.
Decision: Drop comments that restate code
- Context: R3 requires deletion of comments whose translated form would merely paraphrase the next line.
- Selected Approach: Apply a one-line heuristic: if the translated comment would be a verb phrase that mirrors the immediately following executable line, delete the comment instead of writing it.
- Rationale: Aligns with project rule "comment the why, not the what".
Risks & Mitigations
- Risk: Accidental edit to a string literal (would belong to ticket #2/#3/#4/#5/#6) — Mitigation: After each package commit, run
git diff --statand a per-file diff sanity check; verify only#lines and docstring lines change. - Risk: Tests failing because a string-shape changed — Mitigation: Run
uv run python -m pytest backend/scripts/test_profile_format.pyafter each commit. - Risk: Line length violations after English expansion — Mitigation: Reflow long English at <= 120 chars within the docstring/comment only; never reflow code.
References
dev-guidelines.md— repo-level coding standards, Google-style docstring requirement..claude/rules/commits.md— Conventional Commits standard for the commit message.- Issue #7 — salestech-group/MiroFish: source ticket.
- Issues #2/#3/#4/#5/#6 — adjacent i18n tickets that own the string-literal Chinese.