Implementation Tasks — i18n-ci-guard
Approved spec: see `requirements.md`, `design.md`, `research.md`, and `gap-analysis.md` in this directory.
Tasks
- 1. Foundation: scaffold the CI guard script with stable CLI surface and stdlib-only dependencies
- 1.1 Create the empty guard script and CLI skeleton
  - Place the new script at the path designated by the design (`scripts/ci/`).
  - Establish the module docstring, the canonical CJK regex constant, the scoped-paths constant tuple, and the `argparse` parser exposing default check mode plus an explicit `--update-baseline` flag and a `--baseline` path override.
  - Confirm the script exits 0 on a smoke `--help` invocation and rejects unknown flags with non-zero exit.
  - Observable: running `python scripts/ci/i18n_cjk_guard.py --help` from the repo root prints usage text containing every documented flag and exits 0; running with an unknown flag exits non-zero.
  - Requirements: 5.5, 6.2, 6.3
  - Boundary: i18n_cjk_guard.py
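
The CLI surface described in 1.1 could be sketched roughly as follows. This is a minimal illustration, not the designed implementation: the constant names, the help strings, and the scoped-path values (inferred from the `git checkout` example in task 4.1) are assumptions, and the default baseline location is taken from task 4.1.

```python
import argparse
import re

# Assumed names and values; the authoritative ones live in design.md.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
SCOPED_PATHS = ("backend/app", "frontend/src")
DEFAULT_BASELINE = ".kiro/specs/i18n-ci-guard/baseline.txt"


def build_parser() -> argparse.ArgumentParser:
    """Build the parser: default check mode plus two optional flags."""
    parser = argparse.ArgumentParser(
        prog="i18n_cjk_guard.py",
        description="Guard against CJK text in en.json and CJK regressions in scoped paths.",
    )
    parser.add_argument(
        "--update-baseline",
        action="store_true",
        help="recompute per-path counts and overwrite the baseline file",
    )
    parser.add_argument(
        "--baseline",
        default=DEFAULT_BASELINE,
        metavar="PATH",
        help="path to the baseline file (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With stock `argparse` behaviour, `--help` prints usage and exits 0, while an unrecognized flag prints an error to stderr and exits with status 2, which satisfies the non-zero requirement without extra code.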
- 2. Core: implement the two CJK checks
- 2.1 Implement the locale-catalogue scan
  - Recursively walk the parsed `locales/en.json` dict, applying the canonical regex to every string leaf to gather offending entries.
  - Compute the source line number by re-reading the file as text and matching the value's first textual occurrence; truncate snippets to the documented snippet length.
  - On a missing or unreadable catalogue file, emit a clear stderr message and exit non-zero.
  - Observable: against a synthetic clean catalogue, the function returns an empty list; against a synthetic catalogue with one CJK value, it returns exactly one finding tuple with the correct dotted key and line number.
  - Requirements: 1.1, 1.2, 1.3, 1.4, 3.1
  - Boundary: i18n_cjk_guard.py
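
One way the recursive walk and line lookup in 2.1 might look is sketched below. The function names, the `(dotted_key, line, snippet)` tuple layout, and the `SNIPPET_LEN` value are hypothetical; the sketch also assumes the catalogue stores CJK characters literally rather than as `\uXXXX` escapes, and falls back to line 0 when the value cannot be located in the text.

```python
import json
import re
from pathlib import Path

CJK_RE = re.compile(r"[\u4e00-\u9fff]")
SNIPPET_LEN = 40  # hypothetical; the spec only says "documented snippet length"


def iter_string_leaves(node, prefix=""):
    """Yield (dotted_key, value) for every string leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_string_leaves(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_string_leaves(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def scan_catalogue(path):
    """Return a list of (dotted_key, line_number, snippet) for CJK string leaves."""
    text = Path(path).read_text(encoding="utf-8")
    data = json.loads(text)
    lines = text.splitlines()
    findings = []
    for dotted_key, value in iter_string_leaves(data):
        if not CJK_RE.search(value):
            continue
        # Locate the first textual occurrence of the JSON-encoded value.
        needle = json.dumps(value, ensure_ascii=False)
        line_no = next((i for i, line in enumerate(lines, 1) if needle in line), 0)
        findings.append((dotted_key, line_no, value[:SNIPPET_LEN]))
    return findings
```

The sketch lets `OSError` and `json.JSONDecodeError` propagate; the caller would be the place to turn those into the stderr message and non-zero exit that the task requires.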
- 2.2 (P) Implement the per-path CJK count via `git grep`
  - Invoke `git grep -nIP '[\x{4e00}-\x{9fff}]' -- <scoped_path>` for each scoped path; treat exit codes 0 (matches found) and 1 (no matches) as success, any other exit code as a hard error reported on stderr.
  - Count lines of stdout; the result for a zero-match path must be the integer `0`, never an exception.
  - Reject working-tree states where `git` is not available or PCRE is not enabled, with a clear stderr message.
  - Observable: against a tmp git repository with N planted CJK lines under a scoped path, the function returns N; with zero CJK content, it returns 0; binary files and untracked files do not contribute.
  - Requirements: 2.1, 2.4, 2.5, 2.6
  - Boundary: i18n_cjk_guard.py
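
The exit-code handling in 2.2 can be illustrated with a small sketch that separates the subprocess call from the interpretation of its result, so the latter is testable without a repository. `GitGrepError`, `interpret_git_grep`, and `count_cjk_lines` are hypothetical names. A git build without PCRE support is expected to fail with a fatal error rather than exit 0 or 1, so it should land in the hard-error branch.

```python
import shutil
import subprocess

# Pattern copied from the task description (PCRE syntax, used with -P).
CJK_PATTERN = r"[\x{4e00}-\x{9fff}]"


class GitGrepError(RuntimeError):
    """Raised when git grep cannot produce a trustworthy count."""


def interpret_git_grep(returncode: int, stdout: str, stderr: str) -> int:
    """Map a git grep result to a line count: 0 = matches, 1 = no matches."""
    if returncode == 1:
        return 0
    if returncode != 0:
        raise GitGrepError(f"git grep failed (exit {returncode}): {stderr.strip()}")
    return len(stdout.splitlines())


def count_cjk_lines(scoped_path: str, cwd: str = ".") -> int:
    """Count tracked, non-binary lines containing CJK under scoped_path."""
    if shutil.which("git") is None:
        raise GitGrepError("git is not available on PATH")
    proc = subprocess.run(
        ["git", "grep", "-nIP", CJK_PATTERN, "--", scoped_path],
        cwd=cwd,
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    return interpret_git_grep(proc.returncode, proc.stdout, proc.stderr)
```

Because `git grep` searches tracked files by default and `-I` skips binary files, the untracked-file and binary-file exclusions in the observable come from git itself rather than from extra filtering in the script.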
- 2.3 Implement baseline file read/write with strict format
  - Parse the baseline file as `<path>\t<count>` lines, ignoring `#` comments and blank lines, raising a typed error on malformed input or missing file.
  - Write atomically (`tmp + os.replace`) with sorted entries, a single header comment block, and a single trailing newline.
  - Observable: a round-trip write/read of a deterministic counts dict yields the same dict; a baseline file containing a non-tab line is rejected with a clear error; the baseline file ends with exactly one `\n`.
  - Requirements: 4.2, 4.3
  - Boundary: i18n_cjk_guard.py
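
A possible shape for the strict reader and atomic writer in 2.3 is sketched here. `BaselineError`, the function names, and the header wording are assumptions; the sketch writes the temporary file in the same directory as the target so that `os.replace` stays on one filesystem.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical header text; the spec only requires a single comment block.
HEADER = "# CJK line counts per scoped path, as <path><TAB><count>.\n"


class BaselineError(ValueError):
    """Raised for a missing, unreadable, or malformed baseline file."""


def read_baseline(path):
    """Parse '<path>\\t<count>' lines, skipping '#' comments and blank lines."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError as exc:
        raise BaselineError(f"cannot read baseline {path}: {exc}") from exc
    counts = {}
    for line_no, raw in enumerate(text.splitlines(), 1):
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        parts = raw.split("\t")
        if len(parts) != 2 or not parts[0] or not (parts[1].isascii() and parts[1].isdigit()):
            raise BaselineError(f"{path}:{line_no}: expected '<path>\\t<count>', got {raw!r}")
        counts[parts[0]] = int(parts[1])
    return counts


def write_baseline(path, counts):
    """Write sorted entries to a temp file, then swap it in with os.replace."""
    body = HEADER + "".join(f"{key}\t{counts[key]}\n" for key in sorted(counts))
    target = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(body)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if the write or the swap failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Every emitted line ends in a single `\n` and nothing follows the last entry, so the file ends with exactly one newline by construction.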
- 3. Integration: wire the two checks into the default and refresh modes
- 3.1 Compose the default check mode
  - Run both checks under all conditions (do not short-circuit), so a single CI log shows every failure in one pass.
  - Print a one-line success summary per check on stdout when both pass.
  - On locale failure, print `<file>:<line>: <reason>: <snippet>` lines on stderr and a trailing `N issues` summary; on regression failure, print `<path>: cjk-regression: baseline=<b> current=<c> delta=+<d>` lines plus the exact verbatim refresh command.
  - Surface a non-zero exit when either check fails and exit 0 only when both pass.
  - Observable: against a working tree with the committed baseline at or above the current count and a CJK-clean en.json, exit code is 0 and stdout contains the success summary; planting one CJK char in en.json or planting enough new CJK lines to break the baseline yields exit 1 and the documented stderr text.
  - Requirements: 1.2, 1.3, 1.4, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.4, 4.5
  - Boundary: i18n_cjk_guard.py
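
The reporting rules in 3.1 might be composed along these lines. The sketch takes already-computed results as inputs so that neither check can short-circuit the other. The function name, the `cjk-in-locale-value` reason string, the success-summary wording, and the exact refresh command are assumptions; only the two line formats are taken from the task text.

```python
import sys

# Assumed refresh command; the spec requires it verbatim but does not quote it here.
REFRESH_COMMAND = "python scripts/ci/i18n_cjk_guard.py --update-baseline"


def report(locale_file, findings, baseline, current, out=None, err=None):
    """Print results for both checks and return the process exit code."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err

    # Locale check: one line per finding plus a trailing summary.
    for _key, line_no, snippet in findings:
        print(f"{locale_file}:{line_no}: cjk-in-locale-value: {snippet}", file=err)
    if findings:
        print(f"{len(findings)} issues", file=err)

    # Regression check: a path fails only when its count exceeds the baseline.
    regressions = []
    for path in sorted(current):
        base, now = baseline.get(path, 0), current[path]
        if now > base:
            regressions.append(path)
            print(f"{path}: cjk-regression: baseline={base} current={now} delta=+{now - base}", file=err)
    if regressions:
        print(f"To accept the new counts, run: {REFRESH_COMMAND}", file=err)

    if findings or regressions:
        return 1
    print(f"locale check passed: no CJK in {locale_file}", file=out)
    print(f"regression check passed: {len(current)} scoped paths at or below baseline", file=out)
    return 0
```

Treating a path that is absent from the baseline as having a baseline of 0 is a design choice of this sketch, not something the spec states.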
- 3.2 Compose the `--update-baseline` mode
  - When the flag is provided, recompute current per-path counts and overwrite the baseline file via the atomic writer; print the new counts on stdout; exit 0.
  - When the flag is absent, never write the baseline file under any code path.
  - Observable: invoking with `--update-baseline` rewrites the baseline file's contents to match current counts and exits 0; running the default mode immediately afterward exits 0.
  - Requirements: 4.3, 4.4
  - Boundary: i18n_cjk_guard.py
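
The invariant in 3.2 (the baseline is written only when the flag is present) is easiest to see when the writer is reachable from a single branch. The dispatcher below is a hypothetical sketch: `dispatch` and its injected callables are illustrative names, standing in for whatever the real script wires together.

```python
import sys


def dispatch(update_baseline, compute_counts, write_counts, run_checks, out=None):
    """Route to refresh mode or default check mode and return the exit code.

    compute_counts() -> dict of path to count
    write_counts(counts) -> None (the atomic baseline writer)
    run_checks() -> int exit code of the default check mode
    """
    out = sys.stdout if out is None else out
    if update_baseline:
        counts = compute_counts()
        write_counts(counts)  # the only call site of the writer
        for path in sorted(counts):
            print(f"{path}\t{counts[path]}", file=out)
        return 0
    # Default mode: read-only with respect to the baseline file.
    return run_checks()
```

Keeping the writer behind one `if` makes the "never write when the flag is absent" rule checkable by inspection and by a test that passes a writer which records its calls.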
- 4. Establish the committed baseline anchored to `main`
- 4.1 Capture initial baseline counts against `main`
  - Operate from a tree that reflects `origin/main`'s state for the scoped paths (e.g., a fresh checkout, a worktree at `origin/main`, or `git checkout origin/main -- backend/app frontend/src` followed by a clean revert) so the committed baseline does not over- or under-count relative to the merge target.
  - Run `--update-baseline` to materialize the counts; confirm the resulting file is exactly two non-comment data lines (one per scoped path) sorted lexicographically.
  - Observable: the baseline file is committed to `.kiro/specs/i18n-ci-guard/baseline.txt` and `python scripts/ci/i18n_cjk_guard.py` against the same `main`-aligned tree exits 0.
  - Requirements: 4.1, 4.2
  - Boundary: baseline.txt
- 5. Wire the guard into GitHub Actions on every PR to `main`
- 5.1 Add the PR-time workflow
  - Create the workflow file at the path designated by the design, triggered on `pull_request` whose base ref is `main`.
  - Set explicit minimal permissions (`contents: read`), a one-minute job timeout, `actions/checkout@v4` with `fetch-depth: 1`, and `actions/setup-python@v5` pinned to Python 3.11.
  - The single executable step invokes the guard script with no arguments; the workflow surfaces the script's stdout and stderr in the GitHub Actions log without filtering.
  - Observable: the workflow YAML parses cleanly; on a PR with no CJK regression, the job passes; on a PR that introduces a CJK regression or CJK in en.json, the job fails and the log shows the documented failure messages.
  - Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6
  - Boundary: i18n-cjk-guard.yml
- 6. Validation: tests and end-to-end checks
- 6.1 Add unit and integration tests for the guard script
  - Cover the locale scan against a synthetic clean catalogue and a synthetic CJK-tainted catalogue, asserting findings tuples match.
  - Cover the per-path counter against a tmp git repo with both N>0 and N=0 planted CJK lines, asserting the zero-match path exits cleanly with a count of 0.
  - Cover the baseline read/write round-trip and the malformed-input rejection path.
  - Cover the default mode end-to-end (pass and fail paths) with the expected exit codes and stderr fragments, including the verbatim refresh command on regression failure.
  - Observable: `python -m pytest scripts/ci/tests/test_i18n_cjk_guard.py` from the repo root passes locally with stdlib-only Python.
  - Requirements: 1.1, 1.2, 1.3, 1.4, 2.1, 2.4, 2.5, 2.6, 3.3, 4.3, 4.5, 6.1, 6.3
  - Boundary: scripts/ci/tests/
- 6.2 Run the guard locally to confirm reproducibility against the committed baseline
  - From a clean working tree at `main` (or a worktree at `origin/main` with this branch's new files merged on top), invoke the guard with no arguments and confirm exit code 0 and the success summary.
  - Confirm the same command is the documented developer entry point referenced from the failure-message refresh hint.
  - Observable: terminal session shows exit code 0 and the documented one-line per-check success summary; the same script path (`scripts/ci/i18n_cjk_guard.py`) appears verbatim in the regression-failure refresh hint.
  - Requirements: 6.1, 6.2, 6.3
  - Boundary: i18n_cjk_guard.py, baseline.txt