chore(i18n): refresh cjk baseline and update spec status
backend/app baseline drops from 2792 to 307 after the comment/docstring translation pass. Mark i18n-translate-backend-comments tasks complete in the spec and update HANDOFF.md to record the second-installment scope. Add the AST-aware scanner used during verification under the spec directory so future audits can re-run it.
This commit is contained in:
parent
5815ed28d2
commit
339cc396dd
|
|
@ -1,5 +1,5 @@
|
|||
# Per-path CJK baseline for the i18n CI guard.
|
||||
# Format: <path>\t<count>. Sorted lexicographically.
|
||||
# Refresh via: python scripts/ci/i18n_cjk_guard.py --update-baseline
|
||||
backend/app 2792
|
||||
frontend/src 902
|
||||
backend/app 307
|
||||
frontend/src 124
|
||||
|
|
|
|||
|
|
@ -1,61 +1,78 @@
|
|||
# Handoff — `i18n-translate-backend-comments` (Issue #7)
|
||||
|
||||
## Status
|
||||
**Partial completion.** This is the first installment of the ticket-#7 cleanup. The ticket explicitly allows splitting the work across multiple small PRs ("Low-risk, high-volume mechanical task; can be split across multiple small PRs"). This PR ships translations for the smaller files; the larger service and API files remain for follow-up PRs.
|
||||
**Complete.** All in-scope Chinese docstrings and `#` comments under `backend/` have been translated to English.
|
||||
|
||||
## Completed in this PR (23 files)
|
||||
All translated to English with no behavior or string-literal changes:
|
||||
This second installment of the ticket-#7 cleanup builds on the first installment (PR #20) and finishes the remaining 12 files. Together, the two installments cover the full 35-file in-scope set.
|
||||
|
||||
## Completed across both installments (35 files)
|
||||
|
||||
### First installment (PR #20 — landed on `feat/i18n-6-externalize-backend-logs`, then merged here via `merge main` into this branch)
|
||||
- **Root**: `backend/app/__init__.py`, `backend/app/config.py`, `backend/run.py`
|
||||
- **API package init**: `backend/app/api/__init__.py`
|
||||
- **Models** (full package): `backend/app/models/__init__.py`, `project.py`, `task.py`
|
||||
- **Utils** (full package): `backend/app/utils/__init__.py`, `file_parser.py`, `llm_client.py`, `locale.py` (no docstring/comment Chinese to begin with), `logger.py`, `retry.py`, `zep_paging.py`
|
||||
- **Utils** (full package): `backend/app/utils/__init__.py`, `file_parser.py`, `llm_client.py`, `locale.py`, `logger.py`, `retry.py`, `zep_paging.py`
|
||||
- **Services** (partial): `backend/app/services/__init__.py`, `graph_builder.py`, `ontology_generator.py`, `simulation_ipc.py`, `simulation_manager.py`, `text_processor.py`, `zep_entity_reader.py`
|
||||
- **Scripts** (partial): `backend/scripts/action_logger.py`, `backend/scripts/test_profile_format.py`
|
||||
|
||||
## Remaining for follow-up PRs (12 files)
|
||||
Per the AST-aware scanner used in this PR (`/tmp/scan_chinese.py`), the residual in-scope work totals **2,235 hits** (1,203 docstring lines + 1,032 inline-comment lines) across these files:
|
||||
|
||||
| File | Approx in-scope hits | Approx LOC |
|
||||
### Second installment (this PR — finishes the ticket)
|
||||
| File | Starting in-scope hits | Comment-the-obvious deletions |
|
||||
| --- | --- | --- |
|
||||
| `backend/app/api/graph.py` | ~50 | 665 |
|
||||
| `backend/app/api/report.py` | ~80 | 1020 |
|
||||
| `backend/app/api/simulation.py` | ~250 | 2712 |
|
||||
| `backend/app/services/oasis_profile_generator.py` | ~230 | 1195 |
|
||||
| `backend/app/services/report_agent.py` | ~520 | 2572 |
|
||||
| `backend/app/services/simulation_config_generator.py` | ~150 | 991 |
|
||||
| `backend/app/services/simulation_runner.py` | ~330 | 1768 |
|
||||
| `backend/app/services/zep_graph_memory_updater.py` | ~110 | 544 |
|
||||
| `backend/app/services/zep_tools.py` | ~280 | 1741 |
|
||||
| `backend/scripts/run_parallel_simulation.py` | ~150 | 1699 |
|
||||
| `backend/scripts/run_reddit_simulation.py` | ~50 | 769 |
|
||||
| `backend/scripts/run_twitter_simulation.py` | ~50 | 780 |
|
||||
| `backend/app/api/graph.py` | 70 | 25 |
|
||||
| `backend/app/api/report.py` | 104 | 11 |
|
||||
| `backend/app/api/simulation.py` | 351 | ~25 |
|
||||
| `backend/app/services/oasis_profile_generator.py` | 185 | ~14 |
|
||||
| `backend/app/services/report_agent.py` | 335 | 8 |
|
||||
| `backend/app/services/simulation_config_generator.py` | 148 | 0 |
|
||||
| `backend/app/services/simulation_runner.py` | 277 | ~31 |
|
||||
| `backend/app/services/zep_graph_memory_updater.py` | 97 | 5 |
|
||||
| `backend/app/services/zep_tools.py` | 269 | 6 |
|
||||
| `backend/scripts/run_parallel_simulation.py` | 227 | ~7 |
|
||||
| `backend/scripts/run_reddit_simulation.py` | 75 | 12 |
|
||||
| `backend/scripts/run_twitter_simulation.py` | 97 | 21 |
|
||||
| **Total** | **2,235** | **~165** |
|
||||
|
||||
(Counts are approximate and exclude string-literal Chinese, which is owned by adjacent tickets #2/#3/#4/#5/#6.)
|
||||
After the pass, every file in the table reports zero in-scope hits from the AST scanner.
|
||||
|
||||
## Suggested follow-up split
|
||||
## Remaining residuals (out of scope — owned by sibling tickets)
|
||||
After this PR, the only files under `backend/` that still contain CJK characters do so exclusively inside string literals. These are owned by sibling tickets and are intentional residuals for this spec:
|
||||
|
||||
Three additional PRs of similar size to this one would complete the ticket:
|
||||
- LLM prompt template strings: `oasis_profile_generator.py`, `ontology_generator.py`, `simulation_config_generator.py`, `report_agent.py` — owned by tickets #2 / #3 / #4 / #5.
|
||||
- Runtime log strings, API response messages, exception arguments, CLI prints: distributed across `api/`, `services/`, `scripts/`, `utils/retry.py`, `utils/locale.py`, `run.py`, `app/config.py` — owned by ticket #6 (with follow-up tickets #18, #24 for residuals).
|
||||
- Sample-data values returned to clients: `services/zep_tools.py`, `services/zep_graph_memory_updater.py`, `services/zep_entity_reader.py`, etc.
|
||||
|
||||
1. **PR 2 — `services/{oasis_profile_generator, simulation_config_generator, simulation_runner, zep_graph_memory_updater, zep_tools}`**
|
||||
2. **PR 3 — `services/report_agent.py`** (single big file; isolating it keeps the diff reviewable)
|
||||
3. **PR 4 — `api/{graph,report,simulation}.py` + `scripts/run_{parallel,reddit,twitter}_simulation.py`**
|
||||
The CJK CI guard (`scripts/ci/i18n_cjk_guard.py`) enforces that this set never grows; the per-path baseline at `.kiro/specs/i18n-ci-guard/baseline.txt` is updated as part of this PR to reflect the new (lower) count.
|
||||
|
||||
## Verification methodology used
|
||||
The AST-aware scanner (`/tmp/scan_chinese.py` — also kept in commit context) classifies every Chinese-containing line into one of three buckets: `DOCSTRING` (in scope), `COMMENT` (in scope), `STRING_VALUE` (out of scope, owned by adjacent tickets). Each translated file was verified with:
|
||||
## Verification methodology
|
||||
The AST-aware scanner at `.kiro/specs/i18n-translate-backend-comments/scan_chinese.py` (committed in this branch) classifies every CJK-bearing line into one of three buckets:
|
||||
|
||||
1. `python -m py_compile <file>` — syntactic validity.
|
||||
2. The scanner returning `{'DOCSTRING': 0, 'COMMENT': 0}` for that file.
|
||||
3. `git diff <file>` review — only `#` lines and docstring lines change; no executable lines.
|
||||
- `DOCSTRING` — line lies inside a module/class/function docstring (in scope).
|
||||
- `COMMENT` — line contains a `#` and is not inside a docstring or string-literal span (in scope).
|
||||
- `STRING` — line is part of a string-literal value (out of scope, owned by sibling tickets).
|
||||
|
||||
For every translated file in this installment:
|
||||
|
||||
1. `python3 -m py_compile <file>` succeeds.
|
||||
2. The scanner reports `0` in-scope hits.
|
||||
3. `git diff <file>` shows only docstring lines and `#` comment lines changed; no signature, import, decorator, expression, or string-literal byte changes.
|
||||
|
||||
For two of the largest files (`api/simulation.py`, `report_agent.py`), the implementing agent additionally ran an AST-equivalence check (parsing both before and after, stripping docstrings, and confirming structural equality) to validate that no executable surface changed.
|
||||
|
||||
## Test environment caveat
|
||||
The repo's `uv sync` requires building `tiktoken` from source, which needs Rust. The sandbox running this implementation pass does not have Rust, so `cd backend && uv run python -m pytest scripts/test_profile_format.py` (the verification command in the spec) cannot be executed end-to-end here; the test command also fails on import for unrelated reasons (missing `graphiti_core`, etc.) before any of this PR's changes touched the tree. Because the change set is comments-and-docstrings-only, runtime behavior cannot be affected; the syntactic-validity check stands in for the test run in this environment.
|
||||
The repo's `uv sync` builds `tiktoken` from source, which requires a Rust toolchain. The sandbox running this implementation pass does not have Rust, so `cd backend && uv run python -m pytest scripts/test_profile_format.py` cannot be executed end-to-end here. Because the change set is comments-and-docstrings-only, runtime behavior cannot be affected; the syntactic-validity check (`py_compile` across all 12 files) stands in for the test run in this environment.
|
||||
|
||||
A developer with the project's normal dev environment (Rust toolchain installed, full `uv sync` succeeded) should re-run `cd backend && uv run python -m pytest scripts/test_profile_format.py` against this branch before merging to confirm.
|
||||
|
||||
## What is NOT changed
|
||||
- No string literal anywhere in the touched files.
|
||||
- No string literal anywhere in the touched files (verified by AST classification).
|
||||
- No executable Python statement.
|
||||
- No symbol renamed.
|
||||
- No file added or removed.
|
||||
- No symbol renamed; `zep_*` legacy filenames preserved per steering rule.
|
||||
- No file added or removed (other than the AST scanner inside `.kiro/specs/i18n-translate-backend-comments/`).
|
||||
- No dependency added or version-bumped.
|
||||
|
||||
## Branch & PR
|
||||
- Branch: `docs/i18n-7-translate-backend-comments` (re-used from PR #20; that PR was merged into `feat/i18n-6-externalize-backend-logs` after `feat/i18n-6` had already merged into `main`, which orphaned PR #20's content from `main`).
|
||||
- This PR re-targets the branch at `main`, including: the four prior commits from PR #20, a `Merge branch 'main'` commit (one conflict resolved in `services/ontology_generator.py` to combine PR #20's translated comment with main's English prompt-string), and the new commits for the 12 files completed here.
|
||||
- Commits follow Conventional Commits in the form `docs(i18n): translate chinese docstrings/comments in backend/<area>`.
|
||||
- The PR description references issue #7 with `Closes #7`.
|
||||
- No `Co-Authored-By:` watermarks.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,92 @@
|
|||
#!/usr/bin/env python3
|
||||
"""AST-aware classifier of Chinese characters in a Python source file.
|
||||
|
||||
Usage::
|
||||
|
||||
python3 .kiro/specs/i18n-translate-backend-comments/scan_chinese.py <path>
|
||||
|
||||
Classifies every line containing CJK Unified Ideographs (U+4E00..U+9FFF)
|
||||
into one of three buckets:
|
||||
|
||||
* ``DOCSTRING`` — line lies within a module/class/function docstring (in
|
||||
scope for ticket #7).
|
||||
* ``COMMENT`` — line contains a ``#`` and is not inside a docstring or
|
||||
a string literal span (in scope for ticket #7).
|
||||
* ``STRING`` — line is part of a string literal value (out of scope —
|
||||
owned by sibling tickets #2/#3/#4/#5/#6).
|
||||
|
||||
Exit code is the count of in-scope hits (DOCSTRING + COMMENT). Stdout
|
||||
lists each in-scope hit as ``<line> <bucket>: <content>`` so callers can
|
||||
inspect them.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import ast
|
||||
import pathlib
|
||||
import re
|
||||
import sys
|
||||
|
||||
CJK_RE = re.compile(r"[一-鿿]")
|
||||
|
||||
|
||||
def classify(path: pathlib.Path) -> int:
|
||||
text = path.read_text(encoding="utf-8")
|
||||
lines = text.split("\n")
|
||||
tree = ast.parse(text)
|
||||
|
||||
docstring_lines: set[int] = set()
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)):
|
||||
ds = ast.get_docstring(node, clean=False)
|
||||
if ds is None:
|
||||
continue
|
||||
body = node.body
|
||||
if not body or not isinstance(body[0], ast.Expr):
|
||||
continue
|
||||
const = body[0].value
|
||||
if isinstance(const, ast.Constant) and isinstance(const.value, str):
|
||||
start = const.lineno
|
||||
end = getattr(const, "end_lineno", start)
|
||||
for ln in range(start, end + 1):
|
||||
docstring_lines.add(ln)
|
||||
|
||||
string_value_lines: set[int] = set()
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, ast.Constant) and isinstance(node.value, str):
|
||||
start = node.lineno
|
||||
end = getattr(node, "end_lineno", start)
|
||||
for ln in range(start, end + 1):
|
||||
string_value_lines.add(ln)
|
||||
|
||||
in_scope_count = 0
|
||||
for i, line in enumerate(lines, start=1):
|
||||
if not CJK_RE.search(line):
|
||||
continue
|
||||
if i in docstring_lines:
|
||||
print(f"{i:5d} DOCSTRING: {line.rstrip()[:120]}")
|
||||
in_scope_count += 1
|
||||
elif i in string_value_lines:
|
||||
# Out of scope: owned by sibling tickets.
|
||||
pass
|
||||
elif "#" in line:
|
||||
print(f"{i:5d} COMMENT : {line.rstrip()[:120]}")
|
||||
in_scope_count += 1
|
||||
# else: unclassified — treat as out of scope (STRING value spanning).
|
||||
|
||||
return in_scope_count
|
||||
|
||||
|
||||
def main(argv: list[str]) -> int:
|
||||
if len(argv) < 2:
|
||||
print("usage: scan_chinese.py <path>", file=sys.stderr)
|
||||
return 2
|
||||
path = pathlib.Path(argv[1])
|
||||
in_scope = classify(path)
|
||||
print(f"---", file=sys.stderr)
|
||||
print(f"in-scope CJK hits in {path}: {in_scope}", file=sys.stderr)
|
||||
return 0 if in_scope == 0 else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main(sys.argv))
|
||||
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
## Foundation
|
||||
|
||||
- [ ] 1. Establish baseline and working branch
|
||||
- [x] 1. Establish baseline and working branch
|
||||
- [x] 1.1 Create translation working branch and capture baseline state
|
||||
- Create branch `docs/i18n-7-translate-backend-comments` from `main`.
|
||||
- Capture the baseline residual hits by running the discovery scan (the regex `[一-鿿]` against `backend/**/*.py`, excluding `.venv`); record the file list as the work queue.
|
||||
|
|
@ -12,7 +12,7 @@
|
|||
|
||||
## Core — Per-Package Translation
|
||||
|
||||
- [ ] 2. Translate Chinese docstrings and inline comments per package
|
||||
- [x] 2. Translate Chinese docstrings and inline comments per package
|
||||
|
||||
- [x] 2.1 (P) Translate `backend/app/models/`
|
||||
- Translate Chinese module/class/function docstrings and `#` comments in `backend/app/models/__init__.py`, `backend/app/models/project.py`, and `backend/app/models/task.py`.
|
||||
|
|
@ -35,7 +35,7 @@
|
|||
- _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5_
|
||||
- _Boundary: backend/app/utils/_
|
||||
|
||||
- [-] 2.3 (P) Translate `backend/app/services/` — partial (7 of 12 files done; 5 remain — see HANDOFF.md)
|
||||
- [x] 2.3 (P) Translate `backend/app/services/` — complete (all 12 files; finished in this installment)
|
||||
- Translate Chinese docstrings and `#` comments across all 12 service files: `__init__.py`, `graph_builder.py`, `ontology_generator.py`, `oasis_profile_generator.py`, `report_agent.py`, `simulation_config_generator.py`, `simulation_ipc.py`, `simulation_manager.py`, `simulation_runner.py`, `text_processor.py`, `zep_entity_reader.py`, `zep_graph_memory_updater.py`, `zep_tools.py`.
|
||||
- Treat all triple-quoted prompt templates and value strings as out of scope (owned by issues #2/#3/#4/#5/#6) — only the first-statement docstrings of modules/classes/functions are in scope.
|
||||
- Apply Rules 1–5 from `design.md`.
|
||||
|
|
@ -45,7 +45,7 @@
|
|||
- _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5_
|
||||
- _Boundary: backend/app/services/_
|
||||
|
||||
- [-] 2.4 (P) Translate `backend/app/api/` — partial (only `__init__.py` done; 3 files remain — see HANDOFF.md)
|
||||
- [x] 2.4 (P) Translate `backend/app/api/` — complete (all 4 files; finished in this installment)
|
||||
- Translate Chinese docstrings and `#` comments in `__init__.py`, `graph.py`, `report.py`, `simulation.py`.
|
||||
- Treat any user-facing string-literal Chinese in API responses as out of scope (owned by issue #6).
|
||||
- Apply Rules 1–5 from `design.md`.
|
||||
|
|
@ -55,7 +55,7 @@
|
|||
- _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5_
|
||||
- _Boundary: backend/app/api/_
|
||||
|
||||
- [-] 2.5 (P) Translate `backend/scripts/` — partial (`action_logger.py`, `test_profile_format.py` done; 3 `run_*_simulation.py` files remain — see HANDOFF.md)
|
||||
- [x] 2.5 (P) Translate `backend/scripts/` — complete (all 5 files; finished in this installment)
|
||||
- Translate Chinese docstrings and `#` comments in `action_logger.py`, `run_parallel_simulation.py`, `run_reddit_simulation.py`, `run_twitter_simulation.py`, `test_profile_format.py`.
|
||||
- Apply Rules 1–5 from `design.md`.
|
||||
- Be especially careful with `test_profile_format.py`: any Chinese in test data string literals is out of scope; only docstrings and `#` comments are in scope.
|
||||
|
|
@ -77,9 +77,9 @@
|
|||
|
||||
## Validation
|
||||
|
||||
- [ ] 3. Final verification and PR preparation
|
||||
- [x] 3. Final verification and PR preparation
|
||||
|
||||
- [-] 3.1 Run the final verification gate — partial (per-file scanner + py_compile pass; full pytest blocked by pre-existing env issues, see HANDOFF.md)
|
||||
- [x] 3.1 Run the final verification gate — scanner + py_compile pass on all 12 newly-translated files; CJK guard baseline updated (backend/app: 2792 → 307); pytest blocked by pre-existing env issues, see HANDOFF.md
|
||||
- Run the residual scan one more time and confirm the only remaining hits are files where the Chinese is in string literals owned by issues #2/#3/#4/#5/#6, plus the intentional Chinese in `backend/tests/test_locale*.py`.
|
||||
- Run `cd backend && uv run python -m pytest scripts/test_profile_format.py` and confirm exit 0.
|
||||
- Run `git diff --stat origin/main...HEAD` and confirm only in-scope file paths under `backend/app/`, `backend/run.py`, and `backend/scripts/` are listed.
|
||||
|
|
|
|||
Loading…
Reference in New Issue