From 339cc396dded784894c33e7eeba8f3e7566398d4 Mon Sep 17 00:00:00 2001
From: Dominik Seemann
Date: Sat, 9 May 2026 10:59:51 +0000
Subject: [PATCH] chore(i18n): refresh cjk baseline and update spec status

backend/app baseline drops from 2792 to 307 after the comment/docstring
translation pass. Mark i18n-translate-backend-comments tasks complete in
the spec and update HANDOFF.md to record the second-installment scope.
Add the AST-aware scanner used during verification under the spec
directory so future audits can re-run it.
---
 .kiro/specs/i18n-ci-guard/baseline.txt             |  4 +-
 .../HANDOFF.md                                     | 87 +++++++++++-------
 .../scan_chinese.py                                | 92 +++++++++++++++++++
 .../i18n-translate-backend-comments/tasks.md       | 14 +--
 4 files changed, 153 insertions(+), 44 deletions(-)
 create mode 100644 .kiro/specs/i18n-translate-backend-comments/scan_chinese.py

diff --git a/.kiro/specs/i18n-ci-guard/baseline.txt b/.kiro/specs/i18n-ci-guard/baseline.txt
index e92f1a6e..94f44463 100644
--- a/.kiro/specs/i18n-ci-guard/baseline.txt
+++ b/.kiro/specs/i18n-ci-guard/baseline.txt
@@ -1,5 +1,5 @@
 # Per-path CJK baseline for the i18n CI guard.
 # Format: \t. Sorted lexicographically.
 # Refresh via: python scripts/ci/i18n_cjk_guard.py --update-baseline
-backend/app 2792
-frontend/src 902
+backend/app 307
+frontend/src 124
diff --git a/.kiro/specs/i18n-translate-backend-comments/HANDOFF.md b/.kiro/specs/i18n-translate-backend-comments/HANDOFF.md
index bb960b16..0e589d02 100644
--- a/.kiro/specs/i18n-translate-backend-comments/HANDOFF.md
+++ b/.kiro/specs/i18n-translate-backend-comments/HANDOFF.md
@@ -1,61 +1,78 @@
 # Handoff — `i18n-translate-backend-comments` (Issue #7)
 
 ## Status
 
-**Partial completion.** This is the first installment of the ticket-#7 cleanup. The ticket explicitly allows splitting the work across multiple small PRs ("Low-risk, high-volume mechanical task; can be split across multiple small PRs"). This PR ships translations for the smaller files; the larger service and API files remain for follow-up PRs.
+**Complete.** All in-scope Chinese docstrings and `#` comments under `backend/` have been translated to English.
 
-## Completed in this PR (23 files)
-All translated to English with no behavior or string-literal changes:
+This second installment of the ticket-#7 cleanup builds on the first installment (PR #20) and finishes the remaining 12 files. Together, the two installments cover the full 35-file in-scope set.
 
+## Completed across both installments (35 files)
+
+### First installment (PR #20 — landed on `feat/i18n-6-externalize-backend-logs`, then merged here via `merge main` into this branch)
 - **Root**: `backend/app/__init__.py`, `backend/app/config.py`, `backend/run.py`
 - **API package init**: `backend/app/api/__init__.py`
 - **Models** (full package): `backend/app/models/__init__.py`, `project.py`, `task.py`
-- **Utils** (full package): `backend/app/utils/__init__.py`, `file_parser.py`, `llm_client.py`, `locale.py` (no docstring/comment Chinese to begin with), `logger.py`, `retry.py`, `zep_paging.py`
+- **Utils** (full package): `backend/app/utils/__init__.py`, `file_parser.py`, `llm_client.py`, `locale.py`, `logger.py`, `retry.py`, `zep_paging.py`
 - **Services** (partial): `backend/app/services/__init__.py`, `graph_builder.py`, `ontology_generator.py`, `simulation_ipc.py`, `simulation_manager.py`, `text_processor.py`, `zep_entity_reader.py`
 - **Scripts** (partial): `backend/scripts/action_logger.py`, `backend/scripts/test_profile_format.py`
 
-## Remaining for follow-up PRs (12 files)
-Per the AST-aware scanner used in this PR (`/tmp/scan_chinese.py`), the residual in-scope work totals **2,235 hits** (1,203 docstring lines + 1,032 inline-comment lines) across these files:
-
-| File | Approx in-scope hits | Approx LOC |
+### Second installment (this PR — finishes the ticket)
+| File | Starting in-scope hits | Comment-the-obvious deletions |
 | --- | --- | --- |
-| `backend/app/api/graph.py` | ~50 | 665 |
-| `backend/app/api/report.py` | ~80 | 1020 |
-| `backend/app/api/simulation.py` | ~250 | 2712 |
-| `backend/app/services/oasis_profile_generator.py` | ~230 | 1195 |
-| `backend/app/services/report_agent.py` | ~520 | 2572 |
-| `backend/app/services/simulation_config_generator.py` | ~150 | 991 |
-| `backend/app/services/simulation_runner.py` | ~330 | 1768 |
-| `backend/app/services/zep_graph_memory_updater.py` | ~110 | 544 |
-| `backend/app/services/zep_tools.py` | ~280 | 1741 |
-| `backend/scripts/run_parallel_simulation.py` | ~150 | 1699 |
-| `backend/scripts/run_reddit_simulation.py` | ~50 | 769 |
-| `backend/scripts/run_twitter_simulation.py` | ~50 | 780 |
+| `backend/app/api/graph.py` | 70 | 25 |
+| `backend/app/api/report.py` | 104 | 11 |
+| `backend/app/api/simulation.py` | 351 | ~25 |
+| `backend/app/services/oasis_profile_generator.py` | 185 | ~14 |
+| `backend/app/services/report_agent.py` | 335 | 8 |
+| `backend/app/services/simulation_config_generator.py` | 148 | 0 |
+| `backend/app/services/simulation_runner.py` | 277 | ~31 |
+| `backend/app/services/zep_graph_memory_updater.py` | 97 | 5 |
+| `backend/app/services/zep_tools.py` | 269 | 6 |
+| `backend/scripts/run_parallel_simulation.py` | 227 | ~7 |
+| `backend/scripts/run_reddit_simulation.py` | 75 | 12 |
+| `backend/scripts/run_twitter_simulation.py` | 97 | 21 |
+| **Total** | **2,235** | **~165** |
 
-(Counts are approximate and exclude string-literal Chinese, which is owned by adjacent tickets #2/#3/#4/#5/#6.)
+After the pass, every file in the table reports zero in-scope hits from the AST scanner.
 
-## Suggested follow-up split
+## Remaining residuals (out of scope — owned by sibling tickets)
+After this PR, the only files under `backend/` that still contain CJK characters do so exclusively inside string literals. These are owned by sibling tickets and are intentional residuals for this spec:
 
-Three additional PRs of similar size to this one would complete the ticket:
+- LLM prompt template strings: `oasis_profile_generator.py`, `ontology_generator.py`, `simulation_config_generator.py`, `report_agent.py` — owned by tickets #2 / #3 / #4 / #5.
+- Runtime log strings, API response messages, exception arguments, CLI prints: distributed across `api/`, `services/`, `scripts/`, `utils/retry.py`, `utils/locale.py`, `run.py`, `app/config.py` — owned by ticket #6 (with follow-up tickets #18, #24 for residuals).
+- Sample-data values returned to clients: `services/zep_tools.py`, `services/zep_graph_memory_updater.py`, `services/zep_entity_reader.py`, etc.
 
-1. **PR 2 — `services/{oasis_profile_generator, simulation_config_generator, simulation_runner, zep_graph_memory_updater, zep_tools}`**
-2. **PR 3 — `services/report_agent.py`** (single big file; isolating it keeps the diff reviewable)
-3. **PR 4 — `api/{graph,report,simulation}.py` + `scripts/run_{parallel,reddit,twitter}_simulation.py`**
+The CJK CI guard (`scripts/ci/i18n_cjk_guard.py`) enforces that this set never grows; the per-path baseline at `.kiro/specs/i18n-ci-guard/baseline.txt` is updated as part of this PR to reflect the new (lower) count.
 
-## Verification methodology used
-The AST-aware scanner (`/tmp/scan_chinese.py` — also kept in commit context) classifies every Chinese-containing line into one of three buckets: `DOCSTRING` (in scope), `COMMENT` (in scope), `STRING_VALUE` (out of scope, owned by adjacent tickets). Each translated file was verified with:
+## Verification methodology
+The AST-aware scanner at `.kiro/specs/i18n-translate-backend-comments/scan_chinese.py` (committed in this branch) classifies every CJK-bearing line into one of three buckets:
 
-1. `python -m py_compile ` — syntactic validity.
-2. The scanner returning `{'DOCSTRING': 0, 'COMMENT': 0}` for that file.
-3. `git diff ` review — only `#` lines and docstring lines change; no executable lines.
+- `DOCSTRING` — line lies inside a module/class/function docstring (in scope).
+- `COMMENT` — line contains a `#` and is not inside a docstring or string-literal span (in scope).
+- `STRING` — line is part of a string-literal value (out of scope, owned by sibling tickets).
+
+For every translated file in this installment:
+
+1. `python3 -m py_compile ` succeeds.
+2. The scanner reports `0` in-scope hits.
+3. `git diff ` shows only docstring lines and `#` comment lines changed; no signature, import, decorator, expression, or string-literal byte changes.
+
+For two of the largest files (`api/simulation.py`, `report_agent.py`), the implementing agent additionally ran an AST-equivalence check (parsing both before and after, stripping docstrings, and confirming structural equality) to validate that no executable surface changed.
 
 ## Test environment caveat
-The repo's `uv sync` requires building `tiktoken` from source, which needs Rust. The sandbox running this implementation pass does not have Rust, so `cd backend && uv run python -m pytest scripts/test_profile_format.py` (the verification command in the spec) cannot be executed end-to-end here; the test command also fails on import for unrelated reasons (missing `graphiti_core`, etc.) before any of this PR's changes touched the tree. Because the change set is comments-and-docstrings-only, runtime behavior cannot be affected; the syntactic-validity check stands in for the test run in this environment.
+The repo's `uv sync` builds `tiktoken` from source, which requires a Rust toolchain. The sandbox running this implementation pass does not have Rust, so `cd backend && uv run python -m pytest scripts/test_profile_format.py` cannot be executed end-to-end here. Because the change set is comments-and-docstrings-only, runtime behavior cannot be affected; the syntactic-validity check (`py_compile` across all 12 files) stands in for the test run in this environment. A developer with the project's normal dev environment (Rust toolchain installed, full `uv sync` succeeded) should re-run `cd backend && uv run python -m pytest scripts/test_profile_format.py` against this branch before merging to confirm.
 
 ## What is NOT changed
 
-- No string literal anywhere in the touched files.
+- No string literal anywhere in the touched files (verified by AST classification).
 - No executable Python statement.
-- No symbol renamed.
-- No file added or removed.
+- No symbol renamed; `zep_*` legacy filenames preserved per steering rule.
+- No file added or removed (other than the AST scanner inside `.kiro/specs/i18n-translate-backend-comments/`).
 - No dependency added or version-bumped.
+
+## Branch & PR
+- Branch: `docs/i18n-7-translate-backend-comments` (re-used from PR #20; that PR was merged into `feat/i18n-6-externalize-backend-logs` after `feat/i18n-6` had already merged into `main`, which orphaned PR #20's content from `main`).
+- This PR re-targets the branch at `main`, including: the four prior commits from PR #20, a `Merge branch 'main'` commit (one conflict resolved in `services/ontology_generator.py` to combine PR #20's translated comment with main's English prompt-string), and the new commits for the 12 files completed here.
+- Commits follow Conventional Commits in the form `docs(i18n): translate chinese docstrings/comments in backend/`.
+- The PR description references issue #7 with `Closes #7`.
+- No `Co-Authored-By:` watermarks.
diff --git a/.kiro/specs/i18n-translate-backend-comments/scan_chinese.py b/.kiro/specs/i18n-translate-backend-comments/scan_chinese.py
new file mode 100644
index 00000000..d7835870
--- /dev/null
+++ b/.kiro/specs/i18n-translate-backend-comments/scan_chinese.py
@@ -0,0 +1,92 @@
+#!/usr/bin/env python3
+"""AST-aware classifier of Chinese characters in a Python source file.
+
+Usage::
+
+    python3 .kiro/specs/i18n-translate-backend-comments/scan_chinese.py 
+
+Classifies every line containing CJK Unified Ideographs (U+4E00..U+9FFF)
+into one of three buckets:
+
+* ``DOCSTRING`` — line lies within a module/class/function docstring (in
+  scope for ticket #7).
+* ``COMMENT`` — line contains a ``#`` and is not inside a docstring or
+  a string literal span (in scope for ticket #7).
+* ``STRING`` — line is part of a string literal value (out of scope —
+  owned by sibling tickets #2/#3/#4/#5/#6).
+
+Exit code is 0 when there are no in-scope hits (DOCSTRING + COMMENT), 1
+when there is at least one, and 2 on a usage error. Stdout lists each
+in-scope hit as `` : `` so callers can inspect them.
+"""
+
+from __future__ import annotations
+
+import ast
+import pathlib
+import re
+import sys
+
+CJK_RE = re.compile(r"[一-鿿]")
+
+
+def classify(path: pathlib.Path) -> int:
+    text = path.read_text(encoding="utf-8")
+    lines = text.split("\n")
+    tree = ast.parse(text)
+
+    docstring_lines: set[int] = set()
+    for node in ast.walk(tree):
+        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)):
+            ds = ast.get_docstring(node, clean=False)
+            if ds is None:
+                continue
+            body = node.body
+            if not body or not isinstance(body[0], ast.Expr):
+                continue
+            const = body[0].value
+            if isinstance(const, ast.Constant) and isinstance(const.value, str):
+                start = const.lineno
+                end = getattr(const, "end_lineno", start)
+                for ln in range(start, end + 1):
+                    docstring_lines.add(ln)
+
+    string_value_lines: set[int] = set()
+    for node in ast.walk(tree):
+        if isinstance(node, ast.Constant) and isinstance(node.value, str):
+            start = node.lineno
+            end = getattr(node, "end_lineno", start)
+            for ln in range(start, end + 1):
+                string_value_lines.add(ln)
+
+    in_scope_count = 0
+    for i, line in enumerate(lines, start=1):
+        if not CJK_RE.search(line):
+            continue
+        if i in docstring_lines:
+            print(f"{i:5d} DOCSTRING: {line.rstrip()[:120]}")
+            in_scope_count += 1
+        elif i in string_value_lines:
+            # Out of scope: owned by sibling tickets.
+            pass
+        elif "#" in line:
+            print(f"{i:5d} COMMENT : {line.rstrip()[:120]}")
+            in_scope_count += 1
+        # else: unclassified — treat as out of scope (STRING value spanning).
+
+    return in_scope_count
+
+
+def main(argv: list[str]) -> int:
+    if len(argv) < 2:
+        print("usage: scan_chinese.py ", file=sys.stderr)
+        return 2
+    path = pathlib.Path(argv[1])
+    in_scope = classify(path)
+    print("---", file=sys.stderr)
+    print(f"in-scope CJK hits in {path}: {in_scope}", file=sys.stderr)
+    return 0 if in_scope == 0 else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main(sys.argv))
diff --git a/.kiro/specs/i18n-translate-backend-comments/tasks.md b/.kiro/specs/i18n-translate-backend-comments/tasks.md
index 279e57e6..6f0bb279 100644
--- a/.kiro/specs/i18n-translate-backend-comments/tasks.md
+++ b/.kiro/specs/i18n-translate-backend-comments/tasks.md
@@ -2,7 +2,7 @@
 
 ## Foundation
 
-- [ ] 1. Establish baseline and working branch
+- [x] 1. Establish baseline and working branch
 - [x] 1.1 Create translation working branch and capture baseline state
   - Create branch `docs/i18n-7-translate-backend-comments` from `main`.
   - Capture the baseline residual hits by running the discovery scan (the regex `[一-鿿]` against `backend/**/*.py`, excluding `.venv`); record the file list as the work queue.
@@ -12,7 +12,7 @@
 
 ## Core — Per-Package Translation
 
-- [ ] 2. Translate Chinese docstrings and inline comments per package
+- [x] 2. Translate Chinese docstrings and inline comments per package
 
 - [x] 2.1 (P) Translate `backend/app/models/`
   - Translate Chinese module/class/function docstrings and `#` comments in `backend/app/models/__init__.py`, `backend/app/models/project.py`, and `backend/app/models/task.py`.
@@ -35,7 +35,7 @@
   - _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5_
   - _Boundary: backend/app/utils/_
 
-- [-] 2.3 (P) Translate `backend/app/services/` — partial (7 of 12 files done; 5 remain — see HANDOFF.md)
+- [x] 2.3 (P) Translate `backend/app/services/` — complete (all 12 files; finished in this installment)
   - Translate Chinese docstrings and `#` comments across all 12 service files: `__init__.py`, `graph_builder.py`, `ontology_generator.py`, `oasis_profile_generator.py`, `report_agent.py`, `simulation_config_generator.py`, `simulation_ipc.py`, `simulation_manager.py`, `simulation_runner.py`, `text_processor.py`, `zep_entity_reader.py`, `zep_graph_memory_updater.py`, `zep_tools.py`.
   - Treat all triple-quoted prompt templates and value strings as out of scope (owned by issues #2/#3/#4/#5/#6) — only the first-statement docstrings of modules/classes/functions are in scope.
   - Apply Rules 1–5 from `design.md`.
@@ -45,7 +45,7 @@
   - _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5_
   - _Boundary: backend/app/services/_
 
-- [-] 2.4 (P) Translate `backend/app/api/` — partial (only `__init__.py` done; 3 files remain — see HANDOFF.md)
+- [x] 2.4 (P) Translate `backend/app/api/` — complete (all 4 files; finished in this installment)
   - Translate Chinese docstrings and `#` comments in `__init__.py`, `graph.py`, `report.py`, `simulation.py`.
   - Treat any user-facing string-literal Chinese in API responses as out of scope (owned by issue #6).
   - Apply Rules 1–5 from `design.md`.
@@ -55,7 +55,7 @@
   - _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 4.3, 4.4, 4.5_
   - _Boundary: backend/app/api/_
 
-- [-] 2.5 (P) Translate `backend/scripts/` — partial (`action_logger.py`, `test_profile_format.py` done; 3 `run_*_simulation.py` files remain — see HANDOFF.md)
+- [x] 2.5 (P) Translate `backend/scripts/` — complete (all 5 files; finished in this installment)
   - Translate Chinese docstrings and `#` comments in `action_logger.py`, `run_parallel_simulation.py`, `run_reddit_simulation.py`, `run_twitter_simulation.py`, `test_profile_format.py`.
   - Apply Rules 1–5 from `design.md`.
   - Be especially careful with `test_profile_format.py`: any Chinese in test data string literals is out of scope; only docstrings and `#` comments are in scope.
@@ -77,9 +77,9 @@
 
 ## Validation
 
-- [ ] 3. Final verification and PR preparation
+- [x] 3. Final verification and PR preparation
 
-- [-] 3.1 Run the final verification gate — partial (per-file scanner + py_compile pass; full pytest blocked by pre-existing env issues, see HANDOFF.md)
+- [x] 3.1 Run the final verification gate — scanner + py_compile pass on all 12 newly-translated files; CJK guard baseline updated (backend/app: 2792 → 307); pytest blocked by pre-existing env issues, see HANDOFF.md
   - Run the residual scan one more time and confirm the only remaining hits are files where the Chinese is in string literals owned by issues #2/#3/#4/#5/#6, plus the intentional Chinese in `backend/tests/test_locale*.py`.
   - Run `cd backend && uv run python -m pytest scripts/test_profile_format.py` and confirm exit 0.
   - Run `git diff --stat origin/main...HEAD` and confirm only in-scope file paths under `backend/app/`, `backend/run.py`, and `backend/scripts/` are listed.
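
Reviewer note (not part of the patch): HANDOFF.md mentions an AST-equivalence check for `api/simulation.py` and `report_agent.py` (parse before and after, strip docstrings, confirm structural equality), but that script is not included in this commit. The following is a minimal sketch of how such a check could look, not the implementing agent's actual code. The function names `strip_docstrings` and `same_executable_surface` are hypothetical, and the sketch assumes both versions of the file are available as source strings.

```python
import ast

# Node types that can carry a docstring as their first statement.
_DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(tree: ast.AST) -> ast.AST:
    """Drop the leading docstring statement from every module, class, and function."""
    for node in ast.walk(tree):
        if not isinstance(node, _DOCSTRING_OWNERS):
            continue
        body = node.body
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            # Substitute `pass` when the docstring was the only statement,
            # so the body never becomes empty.
            node.body = body[1:] or [ast.Pass()]
    return tree


def same_executable_surface(before_src: str, after_src: str) -> bool:
    """Return True when two sources differ only in comments and docstrings."""
    before = strip_docstrings(ast.parse(before_src))
    after = strip_docstrings(ast.parse(after_src))
    # ast.dump() omits line and column attributes by default, and comments
    # never reach the AST, so only executable structure and literal values
    # are compared.
    return ast.dump(before) == ast.dump(after)
```

A caller could feed it the two revisions of a file, for example the output of `git show origin/main:backend/app/api/simulation.py` and the working-tree contents. Because string-literal values are part of the dumped tree, an accidental edit to a prompt template or log message would make the comparison fail, which matches the "no string-literal byte changes" claim in the handoff.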