ci(i18n): add cjk regression guard for every pull request
Adds a stdlib-only Python script and a new GitHub Actions workflow that fail any pull request which reintroduces CJK characters into locales/en.json or which raises the total CJK match count under backend/app or frontend/src above a committed per-path baseline.

The guard captures the two highest-signal checks of the larger i18n-e2e-english-verification audit so it can run on every PR with a sub-second budget and without depending on that pipeline being on main.

The committed baseline lets the codebase ratchet down toward English-only without blocking unrelated PRs on pre-existing CJK content; refresh it intentionally via the documented flag.

Closes #26
parent 063b7fb17d
commit 081de636f1
@@ -0,0 +1,26 @@
name: i18n CJK Guard

on:
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  guard:
    runs-on: ubuntu-latest
    timeout-minutes: 1
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run i18n CJK guard
        run: python scripts/ci/i18n_cjk_guard.py
@@ -0,0 +1,5 @@
# Per-path CJK baseline for the i18n CI guard.
# Format: <path>\t<count>. Sorted lexicographically.
# Refresh via: python scripts/ci/i18n_cjk_guard.py --update-baseline
backend/app	2792
frontend/src	902
@@ -0,0 +1,544 @@
# Design — i18n-ci-guard

## Overview

This feature installs a permanent, PR-time CI guard that blocks
regressions of the project's English-by-default state. It performs two
checks: `locales/en.json` must contain zero CJK characters, and the
total CJK match count under `backend/app/` and `frontend/src/` must not
exceed a committed per-path baseline. The guard is a single Python
script invoked by a single GitHub Actions workflow.

**Purpose**: This feature delivers an automatic regression gate to the
i18n initiative so reviewers do not have to spot CJK reintroductions
by eye.
**Users**: Project maintainers and PR authors. Maintainers gain a
hard regression gate; PR authors gain a script they can run locally to
catch regressions before pushing.
**Impact**: Adds the project's first `pull_request`-triggered CI
workflow. No production source under `backend/app/`, `frontend/src/`,
or `locales/` is modified by this spec — only new files are added.

### Goals

- Fail any PR that introduces a CJK character into `locales/en.json`.
- Fail any PR whose CJK match count under `backend/app/` or
  `frontend/src/` exceeds the committed baseline.
- Print a single actionable failure message that includes the exact
  command a contributor must run if the regression is intentional.
- Run end-to-end under sixty seconds on `ubuntu-latest`.
- Be reproducible verbatim on a developer machine with Python ≥3.11
  and `git`.

### Non-Goals

- Re-implementing the full classification pipeline from
  `.kiro/specs/i18n-e2e-english-verification/` (that work belongs to
  PR #27).
- Auto-updating the baseline on `main`.
- Translating any production source to satisfy a higher baseline. The
  initial baseline is recorded against `main` and only ratchets down
  over time.
- Gating commits at pre-commit time. The guard is CI-only; a future
  spec may wrap it in a hook.

## Boundary Commitments

### This Spec Owns

- The guard script `scripts/ci/i18n_cjk_guard.py` and its CLI
  contract.
- The workflow `.github/workflows/i18n-cjk-guard.yml` and its
  trigger configuration.
- The baseline file `.kiro/specs/i18n-ci-guard/baseline.txt` and its
  format.
- The pass/fail semantics of both checks.

### Out of Boundary

- Any change to files under `backend/app/`, `frontend/src/`, or
  `locales/` — except `locales/en.json` if it is found to contain CJK
  during initial baseline calibration (a remediation translation would
  be a separate spec/PR).
- The classification heuristics in PR #27's `classify.py`.
- Pre-commit hooks; IDE integrations; alternative scoped paths beyond
  `backend/app/` and `frontend/src/`.

### Allowed Dependencies

- Python ≥3.11 standard library.
- `git` (for `git grep -nIP` invocation).
- `actions/checkout@v4` and `actions/setup-python@v5` from the
  GitHub Actions Marketplace.

### Revalidation Triggers

- Adding a third scoped path → baseline file format changes; consumers
  (none today) re-check.
- Changing the regex range → audit pipeline alignment must be
  re-confirmed.
- Switching from `pull_request` to `merge_group` or other event →
  required-status-check rules in branch protection must be re-checked.

## Architecture

### Existing Architecture Analysis

- **Repo layout**: monorepo split by runtime (`backend/`, `frontend/`)
  with shared `locales/` at root. The guard scopes its scan to
  `backend/app/`, `frontend/src/`, and `locales/en.json`, matching the
  audit pipeline's canonical scope.
- **Existing scripts pattern**: `scripts/<purpose>.py` for developer
  tools. The new `scripts/ci/` subdirectory introduces a clear,
  CI-only home without disturbing the existing developer scripts.
- **Existing CI**: `.github/workflows/docker-image.yml` is tag-only.
  No `pull_request` workflow exists. The new workflow is additive and
  does not affect the docker-image workflow.

### Architecture Pattern & Boundary Map

```mermaid
flowchart LR
    PR[Pull Request to main] -->|trigger| WF[.github/workflows/i18n-cjk-guard.yml]
    WF -->|setup-python + checkout| RUN[python scripts/ci/i18n_cjk_guard.py]
    RUN -->|read| EN[locales/en.json]
    RUN -->|git grep -nIP| BAPP[backend/app/]
    RUN -->|git grep -nIP| FSRC[frontend/src/]
    RUN -->|read| BL[.kiro/specs/i18n-ci-guard/baseline.txt]
    RUN -->|exit 0 or 1| WF
    WF -->|status| PR

    DEV[Developer terminal] -->|python scripts/ci/i18n_cjk_guard.py| RUN
    DEV -->|--update-baseline| RUN
    RUN -.->|writes| BL
```

**Architecture Integration**:

- **Selected pattern**: single-purpose script + thin workflow.
  Matches the project's existing `scripts/<purpose>.py` convention.
- **Domain boundaries**: the guard is a pure verification tool with no
  side effects on production code. Its only writeable surface is the
  baseline file, and only when explicitly invoked with
  `--update-baseline`.
- **Existing patterns preserved**: stdlib-only Python tooling
  (precedent: `scripts/check_i18n_logs.py`); single-file workflows in
  `.github/workflows/`.
- **New components rationale**: a new file rather than an extension of
  an existing script — the existing script is scoped to a fixed
  module list and is not a regression gate.
- **Steering compliance**: respects layer-based structure (script
  lives at repo root in `scripts/ci/`, not under `backend/` or
  `frontend/`), no new heavy dependencies, no `os.getenv` calls
  outside `backend/app/config.py`.

### Technology Stack

| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Frontend / CLI | Python 3.11 stdlib (`argparse`, `json`, `os`, `re`, `subprocess`, `pathlib`, `sys`) | Guard CLI | Stdlib only — Req 5.5 |
| Backend / Services | n/a | — | Guard does not touch backend services |
| Data / Storage | Plain-text baseline file under `.kiro/specs/` | Per-path count store | One line per path, `<path>\t<count>` |
| Messaging / Events | n/a | — | — |
| Infrastructure / Runtime | GitHub Actions `ubuntu-latest`, `actions/checkout@v4`, `actions/setup-python@v5` | PR-time runner | `fetch-depth: 1` is sufficient |

## File Structure Plan

### Directory Structure

```
scripts/
└── ci/
    └── i18n_cjk_guard.py        # Guard CLI (new)

.github/
└── workflows/
    └── i18n-cjk-guard.yml       # PR-time workflow (new)

.kiro/specs/i18n-ci-guard/
├── spec.json                    # (existing, updated)
├── requirements.md              # (existing)
├── gap-analysis.md              # (existing)
├── research.md                  # (existing)
├── design.md                    # (this file)
├── tasks.md                     # (created in next phase)
└── baseline.txt                 # Per-path CJK match counts (new)
```

### Modified Files

- `.kiro/specs/i18n-ci-guard/spec.json` — phase / approval fields
  updated by Kiro flow only.
- No production source files are modified by this spec.

## System Flows

### Guard execution (default mode)

```mermaid
sequenceDiagram
    participant CI as GitHub Actions
    participant Script as i18n_cjk_guard.py
    participant Repo as Working tree
    participant BL as baseline.txt

    CI->>Script: python scripts/ci/i18n_cjk_guard.py
    Script->>Repo: read locales/en.json
    Script->>Script: scan for CJK chars
    alt en.json has CJK
        Script-->>CI: exit 1 + per-key findings
    else en.json clean
        Script->>Repo: git grep -nIP backend/app/
        Script->>Repo: git grep -nIP frontend/src/
        Script->>BL: read baseline counts
        alt any current count > baseline
            Script-->>CI: exit 1 + per-path delta + refresh hint
        else within baseline
            Script-->>CI: exit 0 + summary
        end
    end
```

### Baseline refresh

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Script as i18n_cjk_guard.py
    participant Repo as Working tree
    participant BL as baseline.txt

    Dev->>Script: python scripts/ci/i18n_cjk_guard.py --update-baseline
    Script->>Repo: git grep -nIP backend/app/
    Script->>Repo: git grep -nIP frontend/src/
    Script->>BL: write per-path counts (sorted)
    Script-->>Dev: exit 0 + new counts
```

The two checks run in a fixed order: the `en.json` scan first (cheap,
decisive), then the per-path counts. Both run under all conditions. The
script does not short-circuit after the first failure, so the
contributor sees the complete diagnostic in one CI log; the `alt`
branches in the guard-execution diagram simplify which output each
outcome produces and do not imply an early exit.
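
The no-short-circuit rule can be sketched as follows. This is a minimal illustration, not the guard's actual code: `run_all_checks` and the two stand-in check functions are hypothetical names, and each check is assumed to return a list of failure messages (empty when it passes).

```python
from typing import Callable

# Assumption: a check is any zero-argument callable returning failure messages.
Check = Callable[[], list[str]]


def run_all_checks(checks: list[Check]) -> tuple[int, list[str]]:
    """Run every check in order and collect all failure messages.

    Nothing here stops after the first failing check, so the caller can
    print the complete diagnostic in a single pass.
    """
    messages: list[str] = []
    for check in checks:
        messages.extend(check())
    exit_code = 1 if messages else 0
    return exit_code, messages


def fake_locale_check() -> list[str]:
    # Stand-in for the en.json scan; pretends one key contains CJK.
    return ["locales/en.json:12: cjk-in-en: home.title = ..."]


def fake_path_check() -> list[str]:
    # Stand-in for the per-path comparison; pretends one path regressed.
    return ["backend/app: cjk-regression: baseline=10 current=12 delta=+2"]


if __name__ == "__main__":
    code, lines = run_all_checks([fake_locale_check, fake_path_check])
    for line in lines:
        print(line)
    print("exit code:", code)
```

Collecting messages instead of returning early is what lets both failure modes appear in the same CI log.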

## Requirements Traceability

| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1 | Scan en.json for CJK | `i18n_cjk_guard.py` | CLI default mode | Guard execution |
| 1.2 | Fail with key:line per offender | `i18n_cjk_guard.py` | CLI stderr output | Guard execution |
| 1.3 | Report clean state | `i18n_cjk_guard.py` | CLI stdout summary | Guard execution |
| 1.4 | Hard error if file missing | `i18n_cjk_guard.py` | CLI stderr + exit 1 | Guard execution |
| 2.1 | Count CJK matches per scoped path | `i18n_cjk_guard.py` | `git grep -nIP` invocation | Guard execution |
| 2.2 | Read baseline counts | `i18n_cjk_guard.py`, `baseline.txt` | File read | Guard execution |
| 2.3 | Fail on regression | `i18n_cjk_guard.py` | Exit 1 | Guard execution |
| 2.4 | Pass when within baseline | `i18n_cjk_guard.py` | Exit 0 | Guard execution |
| 2.5 | Skip binary files | `git grep -I` | — | Guard execution |
| 2.6 | Tracked-only scope | `git grep` default | — | Guard execution |
| 3.1 | Per-key locale failure detail | `i18n_cjk_guard.py` | CLI stderr lines | Guard execution |
| 3.2 | Per-path regression detail | `i18n_cjk_guard.py` | CLI stderr lines | Guard execution |
| 3.3 | Print refresh command | `i18n_cjk_guard.py` | CLI stderr footer | Guard execution |
| 3.4 | Success summary lines | `i18n_cjk_guard.py` | CLI stdout | Guard execution |
| 4.1 | Baseline under spec dir | `baseline.txt` | File path | — |
| 4.2 | Diff-friendly text format | `baseline.txt` | File format | — |
| 4.3 | Refresh via flag | `i18n_cjk_guard.py` | `--update-baseline` | Baseline refresh |
| 4.4 | No implicit baseline writes | `i18n_cjk_guard.py` | CLI default mode | Guard execution |
| 4.5 | Hard error if baseline missing | `i18n_cjk_guard.py` | Exit 1 + message | Guard execution |
| 5.1 | PR-only trigger to main | `i18n-cjk-guard.yml` | `on.pull_request.branches` | — |
| 5.2 | Checkout PR head | `i18n-cjk-guard.yml` | `actions/checkout@v4` | — |
| 5.3 | Surface output on failure | `i18n-cjk-guard.yml` | Default GH log | — |
| 5.4 | Pass on exit 0 | `i18n-cjk-guard.yml` | Default | — |
| 5.5 | Stdlib-only, no third-party | `i18n_cjk_guard.py`, `i18n-cjk-guard.yml` | — | — |
| 5.6 | ≤60s runtime | `i18n-cjk-guard.yml` | `timeout-minutes: 1` | — |
| 6.1 | Same result locally | `i18n_cjk_guard.py` | CLI | — |
| 6.2 | Single stable entry point | `scripts/ci/i18n_cjk_guard.py` | Path | — |
| 6.3 | No env vars / secrets | `i18n_cjk_guard.py` | CLI | — |

## Components and Interfaces

| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies | Contracts |
|-----------|--------------|--------|--------------|------------------|-----------|
| `i18n_cjk_guard.py` | CI script | Two-check guard CLI | 1.1–6.3 | `git`, Python stdlib | Service (CLI) |
| `i18n-cjk-guard.yml` | CI workflow | Run guard on every PR to main | 5.1–5.6 | `actions/checkout@v4`, `actions/setup-python@v5` | Batch / Job |
| `baseline.txt` | Data | Per-path baseline counts | 4.1, 4.2, 2.2 | — | State (file) |

### CI Script

#### `i18n_cjk_guard.py`

| Field | Detail |
|-------|--------|
| Intent | Run two CJK-regression checks; optionally refresh the baseline |
| Requirements | 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.1, 3.2, 3.3, 3.4, 4.1, 4.3, 4.4, 4.5, 5.5, 6.1, 6.2, 6.3 |
| Owner / Reviewers | i18n maintainers |

**Responsibilities & Constraints**

- Owns the canonical guard semantics: which paths are scoped, which
  regex is canonical, what counts as a regression.
- Runs in pure Python 3.11 stdlib + a single `git` subprocess per
  scoped path.
- Never modifies any file other than the baseline file, and only when
  invoked with `--update-baseline`.
- Always runs both checks (does not short-circuit), so a single CI log
  shows every failure mode at once.

**Dependencies**

- Inbound: `i18n-cjk-guard.yml` workflow; developers running locally.
- Outbound: `git` subprocess (`git grep`, `git rev-parse`).
- External: none.

**Contracts**: Service [x] / API [ ] / Event [ ] / Batch [ ] / State [x]

##### Service Interface (CLI)

```text
i18n_cjk_guard.py [--update-baseline] [--baseline PATH] [--repo-root PATH]
```

Type-annotated module signature (Python type hints, public functions
only):

```python
def main(argv: list[str]) -> int: ...

def run_check(repo_root: pathlib.Path, baseline_path: pathlib.Path) -> int:
    """Run both checks; return 0 on success, 1 on any failure."""

def update_baseline(repo_root: pathlib.Path, baseline_path: pathlib.Path) -> int:
    """Refresh the baseline file with current per-path counts; return 0."""

def scan_locale_cjk(en_json_path: pathlib.Path) -> list[LocaleFinding]:
    """Return a list of (key, line_number, snippet) tuples for every
    CJK occurrence in locales/en.json. Empty list when clean."""

def count_path_cjk(repo_root: pathlib.Path, scoped_path: str) -> int:
    """Return the number of CJK match lines under scoped_path,
    using `git grep -nIP '[\\x{4e00}-\\x{9fff}]' -- <scoped_path>`."""

def read_baseline(baseline_path: pathlib.Path) -> dict[str, int]:
    """Parse the baseline file. Each non-empty, non-comment line is
    '<path>\\t<count>'. Raise BaselineError on any malformed input
    or missing file."""

def write_baseline(baseline_path: pathlib.Path, counts: dict[str, int]) -> None:
    """Atomically overwrite the baseline file with sorted entries
    and a single trailing newline."""
```

Where:

```python
LocaleFinding = tuple[str, int, str]  # (dotted_key, line_number, snippet)
SCOPED_PATHS: tuple[str, ...] = ("backend/app", "frontend/src")
EN_JSON_REL_PATH: str = "locales/en.json"
CJK_PATTERN: str = "[\\x{4e00}-\\x{9fff}]"  # passed to git grep -P
CJK_RE: re.Pattern[str] = re.compile(r"[一-鿿]")
SNIPPET_MAX_LEN: int = 80
```

- **Preconditions**: invoked with CWD at the repo root or
  `--repo-root` set; `git` is on `$PATH`; the working tree is the
  intended scan target.
- **Postconditions** (default mode): exit 0 iff both checks pass;
  exit 1 otherwise. Stdout receives the success summary; stderr
  receives findings on failure. The baseline file is unchanged.
- **Postconditions** (`--update-baseline`): the baseline file is
  rewritten to current per-path counts and exit 0 is returned.
- **Invariants**: regex range, scoped paths, and baseline file path
  are constants — no env-var override.
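
The CLI contract above could be parsed with `argparse` roughly as shown below. This is a sketch under stated assumptions: `parse_args` and `DEFAULT_BASELINE_REL` are hypothetical names, and resolving the default baseline relative to `--repo-root` is an assumption rather than something the design pins down.

```python
import argparse
import pathlib

# Assumption: the default baseline location is resolved against the repo root.
DEFAULT_BASELINE_REL = ".kiro/specs/i18n-ci-guard/baseline.txt"


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the three documented options; no environment variables are read."""
    parser = argparse.ArgumentParser(prog="i18n_cjk_guard.py")
    parser.add_argument(
        "--update-baseline",
        action="store_true",
        help="rewrite the baseline with current per-path counts",
    )
    parser.add_argument(
        "--baseline",
        type=pathlib.Path,
        default=None,
        help="baseline file (defaults to the spec directory under the repo root)",
    )
    parser.add_argument(
        "--repo-root",
        type=pathlib.Path,
        default=pathlib.Path.cwd(),
        help="repository root (defaults to the current working directory)",
    )
    args = parser.parse_args(argv)
    if args.baseline is None:
        args.baseline = args.repo_root / DEFAULT_BASELINE_REL
    return args


if __name__ == "__main__":
    print(parse_args(["--repo-root", "/tmp/example-repo"]))
```

A `main(argv)` built on this would dispatch to `update_baseline` when `args.update_baseline` is set and to `run_check` otherwise.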

##### State Management

- **State model**: a dict `{<scoped_path>: <count>}` parsed from
  the baseline file.
- **Persistence**: plain-text file at
  `.kiro/specs/i18n-ci-guard/baseline.txt`. Atomic write via
  `tmp + os.replace`.
- **Concurrency**: single-writer (developer running
  `--update-baseline`); CI workers only read.
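
The "tmp + `os.replace`" persistence rule might look like the sketch below. It is illustrative only: the header text is copied from the committed baseline file, while the temp-file naming and the cleanup on failure are assumptions.

```python
import contextlib
import os
import pathlib
import tempfile

HEADER_LINES = (
    "# Per-path CJK baseline for the i18n CI guard.",
    "# Format: <path>\\t<count>. Sorted lexicographically.",
    "# Refresh via: python scripts/ci/i18n_cjk_guard.py --update-baseline",
)


def write_baseline(baseline_path: pathlib.Path, counts: dict[str, int]) -> None:
    """Write sorted '<path>\\t<count>' entries, then swap the file in atomically."""
    entries = [f"{path}\t{count}" for path, count in sorted(counts.items())]
    payload = "\n".join([*HEADER_LINES, *entries]) + "\n"
    baseline_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace stays on one
    # filesystem, which is what makes the final rename atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(baseline_path.parent), prefix=".baseline-", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(payload)
        os.replace(tmp_name, baseline_path)
    except BaseException:
        # Assumption: on failure, remove the temp file and leave the old baseline.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = pathlib.Path(tmp_dir) / "baseline.txt"
        write_baseline(target, {"frontend/src": 902, "backend/app": 2792})
        print(target.read_text(encoding="utf-8"), end="")
```

Readers either see the old file or the complete new one, never a half-written baseline.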

**Implementation Notes**

- Output format mirrors `scripts/check_i18n_logs.py`:
  `<file>:<line>: <reason>: <snippet>` on stderr, summary on stdout,
  trailing `OK` or `N issues`.
- The exact refresh command printed on regression failure is:
  `python scripts/ci/i18n_cjk_guard.py --update-baseline`.
- `count_path_cjk` invokes `git grep` via `subprocess.run` with
  `check=False`; `git grep` exits 1 when there are zero matches, so
  the function treats exit codes 0 and 1 as success and any other
  code as a hard error.
- Localised key extraction for `en.json` walks the parsed JSON dict;
  line numbers are obtained by re-reading the file as text and
  matching the value's first textual occurrence.
- Risks: see `research.md` § Risks & Mitigations.
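
The exit-code handling described for `count_path_cjk` can be sketched as below. The split into a pure `count_from_result` helper and the `GitGrepError` type are assumptions made so the logic can be exercised without a repository; the `git grep` arguments are the ones named in this design.

```python
import pathlib
import subprocess

CJK_PATTERN = r"[\x{4e00}-\x{9fff}]"  # PCRE syntax, passed to git grep -P


class GitGrepError(RuntimeError):
    """Hypothetical error type for a failed git grep invocation."""


def count_from_result(returncode: int, stdout: bytes, stderr: bytes) -> int:
    """Turn a finished git grep process into a match-line count."""
    if returncode == 1:
        return 0  # git grep exits 1 when nothing matched
    if returncode != 0:
        detail = stderr.decode("utf-8", errors="replace").strip()
        raise GitGrepError(f"git grep failed: {detail}")
    # One output line per matching source line; ignore the empty tail
    # left by the final newline.
    return sum(1 for line in stdout.split(b"\n") if line)


def count_path_cjk(repo_root: pathlib.Path, scoped_path: str) -> int:
    """Count CJK match lines in tracked text files under scoped_path."""
    proc = subprocess.run(
        ["git", "grep", "-nIP", CJK_PATTERN, "--", scoped_path],
        cwd=repo_root,
        capture_output=True,
        check=False,  # exit code 1 is a normal outcome, so do not raise on it
    )
    return count_from_result(proc.returncode, proc.stdout, proc.stderr)
```

Output is kept as bytes so that unusual characters inside matched lines cannot change the line count. A missing `git` binary would surface as `FileNotFoundError` from `subprocess.run`, which this sketch does not translate.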

### CI Workflow

#### `i18n-cjk-guard.yml`

| Field | Detail |
|-------|--------|
| Intent | Run the guard on every PR to `main` |
| Requirements | 5.1, 5.2, 5.3, 5.4, 5.5, 5.6 |
| Owner / Reviewers | i18n maintainers |

**Contracts**: Batch / Job [x]

##### Batch / Job Contract

- **Trigger**: `on: pull_request: branches: [main]`.
- **Input / validation**: PR head ref checkout via
  `actions/checkout@v4` with `fetch-depth: 1`. Python set up via
  `actions/setup-python@v5` with `python-version: '3.11'`.
- **Output / destination**: pass/fail status surfaced as a GitHub
  Actions check on the PR. Script stdout/stderr appears in the
  workflow log.
- **Idempotency & recovery**: re-running the workflow re-evaluates the
  same working tree; no persistent side effects on the runner.

##### Workflow shape (sketch)

```yaml
name: i18n CJK Guard
on:
  pull_request:
    branches: [main]
jobs:
  guard:
    runs-on: ubuntu-latest
    timeout-minutes: 1
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: python scripts/ci/i18n_cjk_guard.py
```

### Baseline Data File

#### `baseline.txt`

| Field | Detail |
|-------|--------|
| Intent | Persist the per-path CJK match-count baseline |
| Requirements | 2.2, 4.1, 4.2 |

**Contracts**: State [x]

##### Format

```text
# Per-path CJK baseline for the i18n CI guard.
# Format: <path>\t<count>. Sorted lexicographically.
# Refresh via: python scripts/ci/i18n_cjk_guard.py --update-baseline
backend/app	<int>
frontend/src	<int>
```

- One header block of `#`-prefixed comments (parser ignores).
- Blank lines ignored.
- Lines must match `^(?P<path>[^\t\n]+)\t(?P<count>\d+)$`.
- Trailing newline mandatory.
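
The format rules above translate into a small parser. The sketch below is one possible reading: `parse_baseline` is a hypothetical helper split out so it can be checked on strings, and treating a missing trailing newline as a hard error is an assumption about how strictly "mandatory" is enforced.

```python
import pathlib
import re

# Same shape as the documented line regex; fullmatch() supplies the anchors.
LINE_RE = re.compile(r"(?P<path>[^\t\n]+)\t(?P<count>\d+)")


class BaselineError(Exception):
    """Raised when the baseline file is missing or malformed."""


def parse_baseline(text: str) -> dict[str, int]:
    """Parse baseline text into {path: count}, skipping comments and blanks."""
    if not text.endswith("\n"):
        raise BaselineError("baseline must end with a trailing newline")
    counts: dict[str, int] = {}
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        match = LINE_RE.fullmatch(line)
        if match is None:
            raise BaselineError(f"line {lineno}: expected '<path><TAB><count>', got {line!r}")
        counts[match["path"]] = int(match["count"])
    return counts


def read_baseline(baseline_path: pathlib.Path) -> dict[str, int]:
    """Read and parse the baseline file; a missing file is a BaselineError."""
    try:
        text = baseline_path.read_text(encoding="utf-8")
    except OSError as exc:
        raise BaselineError(f"{baseline_path}: missing or unreadable") from exc
    return parse_baseline(text)


if __name__ == "__main__":
    sample = "# header\n\nbackend/app\t2792\nfrontend/src\t902\n"
    print(parse_baseline(sample))
```

Because the separator must be a tab, a line such as `backend/app 2792` (with a space) is rejected rather than silently accepted.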

## Data Models

### Domain Model

- `LocaleFinding` — value object
  `(dotted_key: str, line_number: int, snippet: str)`.
- `PathCount` — pair `(scoped_path: str, count: int)`. The full
  baseline is a `dict[str, int]` keyed by scoped path.

Invariants:

- `count` is a non-negative integer.
- `scoped_path` is one of `SCOPED_PATHS`.
- `LocaleFinding.snippet` is at most `SNIPPET_MAX_LEN` characters,
  truncated with an ellipsis when needed.
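
A `LocaleFinding` could be produced along the lines below. This is a sketch, not the shipped scanner: `walk`, `truncate`, and `scan_locale_text` are hypothetical helpers, list items are given an `[index]` suffix by assumption, and the line lookup assumes the catalogue stores CJK text unescaped (a value written as `\uXXXX` escapes would fall back to line 0 here).

```python
import json
import re
from collections.abc import Iterator

CJK_RE = re.compile(r"[\u4e00-\u9fff]")
SNIPPET_MAX_LEN = 80

LocaleFinding = tuple[str, int, str]  # (dotted_key, line_number, snippet)


def truncate(value: str, limit: int = SNIPPET_MAX_LEN) -> str:
    """Cap a snippet at `limit` characters, ending with an ellipsis if cut."""
    return value if len(value) <= limit else value[: limit - 1] + "…"


def walk(node: object, prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted_key, string_value) for every string leaf in the JSON tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def scan_locale_text(text: str) -> list[LocaleFinding]:
    """Return one finding per string value that contains a CJK character."""
    lines = text.split("\n")
    findings: list[LocaleFinding] = []
    for key, value in walk(json.loads(text)):
        if not CJK_RE.search(value):
            continue
        encoded = json.dumps(value, ensure_ascii=False)
        # First line containing the value as it would be written in the file.
        line_number = next(
            (number for number, line in enumerate(lines, start=1) if encoded in line), 0
        )
        findings.append((key, line_number, truncate(value)))
    return findings
```

Matching on the first textual occurrence means two keys with the same CJK value report the same line number, which is a limitation of this approach.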

## Error Handling

### Error Strategy

- All non-zero exits are accompanied by a stderr message identifying
  the failing check, the offending file or path, and (for regressions)
  the refresh command. The script never raises uncaught exceptions
  past `main()` in normal flow; unexpected I/O errors propagate as
  `OSError` with a full traceback so they remain visible in the CI log.

### Error Categories and Responses

- **Locale failure** (Req 1.2): one stderr line per offending key
  (`locales/en.json:<line>: cjk-in-en: <key> = <snippet>`), then a
  trailing `N issues` summary.
- **Regression failure** (Req 3.2): one stderr line per regressed
  path (`<path>: cjk-regression: baseline=<b> current=<c> delta=+<d>`)
  followed by a one-line refresh hint:
  `# refresh via: python scripts/ci/i18n_cjk_guard.py --update-baseline`.
- **Missing en.json** (Req 1.4): stderr
  `locales/en.json: missing catalogue file`, exit 1.
- **Missing or malformed baseline** (Req 4.5): stderr
  `<baseline-path>: missing or malformed; refresh via …`, exit 1.
- **`git grep` unavailable / non-PCRE**: stderr
  `git grep failed: <stderr>`, exit 1.
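
The two message shapes above can be generated with plain string formatting. In this sketch the function names are hypothetical, and treating a path that is absent from the baseline as a baseline of 0 is an assumption the design does not state.

```python
REFRESH_COMMAND = "python scripts/ci/i18n_cjk_guard.py --update-baseline"


def locale_line(key: str, line_number: int, snippet: str) -> str:
    """Format one locale-failure line for stderr."""
    return f"locales/en.json:{line_number}: cjk-in-en: {key} = {snippet}"


def regression_lines(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """Format one line per regressed path, plus the refresh hint if any regressed."""
    lines: list[str] = []
    for path in sorted(current):
        base = baseline.get(path, 0)  # assumption: unknown path means baseline 0
        count = current[path]
        if count > base:
            lines.append(
                f"{path}: cjk-regression: baseline={base} current={count} delta=+{count - base}"
            )
    if lines:
        lines.append(f"# refresh via: {REFRESH_COMMAND}")
    return lines


if __name__ == "__main__":
    committed = {"backend/app": 2792, "frontend/src": 902}
    measured = {"backend/app": 2795, "frontend/src": 900}
    for line in regression_lines(committed, measured):
        print(line)
```

A count that drops below the baseline produces no line, which matches the ratchet-down intent: improvements pass, and only increases fail.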

### Monitoring

- The guard is a single short-lived script. All observability is
  delegated to GitHub Actions logs (stdout/stderr, run duration).
  No external telemetry.

## Testing Strategy

### Unit Tests (Python)

Place tests under `scripts/ci/tests/test_i18n_cjk_guard.py` (or invoke
the script directly via subprocess in a tmp git repo). The project's
test runner is `pytest` (already used by `backend/`), but the new
tests must be runnable with `python -m pytest` from the repo root
without backend dependencies. Tests are scoped to:

1. `scan_locale_cjk` — clean catalogue returns empty list; planted CJK
   value returns a single `LocaleFinding` with the correct key and
   line number.
2. `count_path_cjk` — given a tmp git repo with N planted CJK lines,
   returns N; binary file matches are excluded; untracked file
   matches are excluded.
3. `read_baseline` / `write_baseline` round-trip — write counts,
   re-read, equal.
4. `read_baseline` malformed input — non-tab line → `BaselineError`.
5. `run_check` end-to-end — passing baseline → exit 0; regressed
   baseline → exit 1 and stderr contains the refresh command.

### Integration Tests

1. Workflow shape — `actionlint` (optional, if installed locally) on
   `i18n-cjk-guard.yml`. At minimum,
   `python -c "import yaml; yaml.safe_load(open('.github/workflows/i18n-cjk-guard.yml'))"`
   for YAML validity.
2. Local end-to-end — run
   `python scripts/ci/i18n_cjk_guard.py` from the repo root with the
   committed baseline; expect exit 0 on a clean checkout of `main`.
3. Refresh end-to-end — run with `--update-baseline`; verify
   baseline file is rewritten and a second default run is exit 0.

### Performance / Load

- Single-pass `git grep` over the scoped paths runs in <2 s on the
  current repo. The workflow's `timeout-minutes: 1` is a hard ceiling
  per Req 5.6.

## Optional Sections

### Security Considerations

- The guard reads only tracked text files; no secrets are accessed.
- The workflow uses `GITHUB_TOKEN` only implicitly via
  `actions/checkout`; it declares `permissions: contents: read`
  explicitly and requests nothing beyond that, which is sufficient
  for a read-only scan.
@@ -0,0 +1,169 @@
# Gap Analysis — i18n-ci-guard

Comparison of the approved requirements against the current MiroFish
codebase, focused on what already exists, what is missing, and what
options the design phase should choose between.

## 1. Current State Investigation

### Domain assets already in the repo

- **`scripts/check_i18n_logs.py`** — Python-stdlib-only, exit-code-based
  i18n verification script. Uses the same canonical CJK regex
  `[一-鿿]` (`U+4E00..U+9FFF`) the new guard needs, prints findings as
  `<file>:<line>: <reason>: <snippet>`, and was written for ticket #6.
  Strong precedent for the new guard's CLI surface and output format.
- **`scripts/_apply_translations.py`, `scripts/_codemod_i18n.py`,
  `scripts/_merge_locale_keys.py`** — i18n tooling sibling scripts.
  Convention is to keep auxiliary i18n scripts under `scripts/` at the
  repo root.
- **`.github/workflows/docker-image.yml`** — only existing GH Actions
  workflow; triggers on tag pushes and `workflow_dispatch`. No PR-time
  workflow exists yet, so the new guard introduces the project's first
  PR-blocking CI check.
- **PR #27 / branch `chore/i18n-10-e2e-english-verification`** — defines
  the audit methodology referenced by the ticket. Its `audit_cjk.sh` uses
  `git grep -nIP '[\x{4e00}-\x{9fff}]' -- backend/app frontend/src locales/en.json`
  — the canonical scoped scan command. PR #27 is open;
  the new guard must work with or without it merged.
- **`.kiro/specs/<feature>/`** — established home for spec artefacts.
  `i18n-externalize-backend-logs/` is the closest precedent for an
  i18n-flavoured spec.
- **`locales/en.json`, `locales/zh.json`, `locales/languages.json`** —
  shared i18n source consumed by both runtimes.

### Conventions extracted

- Auxiliary scripts: `scripts/<purpose>.py`, Python ≥3.11 stdlib only,
  shebang `#!/usr/bin/env python3`, double-quoted strings, snake_case,
  Google-style docstrings on the module and public functions.
- Output format: `<file>:<line>: <reason>: <snippet>`, summary line
  `OK` or `N issues`, exit `0`/`1`.
- Reuse the canonical regex `[一-鿿]` rather than re-deriving range
  literals.
- 4-space indent, ≤120 cols, no trailing whitespace, single trailing
  newline (`.claude/rules/dev-guidelines.md`).

### Integration surfaces

- **CI**: GitHub Actions, `.github/workflows/`. `ubuntu-latest` runner,
  Python 3.11+ via `actions/setup-python@v5` (reuse the version pin
  from the existing docker-image workflow, if it has one).
- **Repo layout boundaries** scoped by the audit: `backend/app/`,
  `frontend/src/`, `locales/en.json` — all live at repo root or two
  levels deep.
- **Git working tree**: the guard relies on `git grep -I` for tracked,
  text-only matches; this binds the guard to a runner that has `git`
  available (true on `ubuntu-latest` and on developer machines).

## 2. Requirement-to-Asset Map

| Req | Need | Existing asset | Gap |
| --- | --- | --- | --- |
| 1 | CJK scan of `locales/en.json` | `scripts/check_i18n_logs.py` already loads `locales/*.json` and runs the canonical regex. | Missing — new guard must scan en.json specifically and emit `key:line` per offender. |
| 2 | CJK count under `backend/app/` and `frontend/src/` against baseline | Audit `audit_cjk.sh` (PR #27) demonstrates `git grep -nIP` is the canonical scan; no baseline file exists yet on main. | Missing — no per-path counter, no baseline file. |
| 3 | Actionable failure messaging | `check_i18n_logs.py` output format reusable. | Missing — need refresh-baseline command in failure text. |
| 4 | Baseline file lifecycle | None. | Missing — file format and refresh subcommand to design. |
| 5 | GH Actions PR integration | `.github/workflows/` directory exists; one tag-only workflow. | Missing — new `pull_request` workflow. |
| 6 | Local reproducibility | Existing scripts run locally with stdlib; same pattern reusable. | None — covered by following the existing pattern. |

## 3. Implementation Approach Options

### Option A — Extend `scripts/check_i18n_logs.py`

Add a new `--cjk-guard` mode (catalogue scan + per-path baseline diff)
to the existing script, then call it from the new workflow.

- ✅ One file to maintain; reuses the regex constant and CLI.
- ❌ The existing script is tightly scoped to the in-scope backend
  modules and the parity check. Mixing a PR-gating regression check into
  it dilutes its intent and grows it past the SRP line that the
  surrounding scripts respect.
- ❌ The existing script targets a fixed list of backend modules; the
  new guard scans whole subtrees. The two scopes don't fit one CLI.

### Option B — New, focused script `scripts/ci/i18n_cjk_guard.py` + new workflow (recommended)

A new directory `scripts/ci/` holds CI-only scripts; the guard is a
single file that performs both checks and supports an `--update-baseline`
flag. New workflow `.github/workflows/i18n-cjk-guard.yml` runs it on
every PR to `main`.

- ✅ Clean separation: production-i18n script (`check_i18n_logs.py`)
  and CI-gating script (`i18n_cjk_guard.py`) live side by side without
  overlapping responsibilities.
- ✅ Mirrors the established convention of one script per
  responsibility under `scripts/`.
- ✅ The baseline file lives under the spec dir
  (`.kiro/specs/i18n-ci-guard/baseline.txt`), matching the ticket's
  "baseline must be committed and reviewable" requirement.
- ❌ One more file in the repo, but the file is small (~150 LoC).
|
||||
|
||||
### Option C — Hybrid: shared `cjk_scan.py` helper + thin guard script
|
||||
|
||||
Factor the regex + git-grep logic into a tiny shared helper consumed by
|
||||
both `check_i18n_logs.py` and the new guard.
|
||||
|
||||
- ✅ DRY for the regex constant.
|
||||
- ❌ Premature abstraction: today the only shared element is a
|
||||
one-line regex. The two scripts have different scopes, output
|
||||
formats, and consumers. Pulling a helper out now satisfies
|
||||
consistency without paying for itself; defer until a third caller
|
||||
appears.
|
||||
|
||||
### Recommendation
|
||||
|
||||
**Option B**. It matches the project's established "one focused script
|
||||
per responsibility" convention, isolates the new CI surface from
|
||||
existing i18n scripts, and keeps the baseline file collocated with
|
||||
spec metadata where reviewers expect to find it.
|
||||
|
||||
## 4. Research Items for Design Phase
|
||||
|
||||
- **Baseline file format**: prefer a stable, line-oriented text format
|
||||
over JSON to minimize diff churn (e.g., `path<TAB>count` per line,
|
||||
trailing newline). Confirm in design.
|
||||
- **`git grep` invocation portability**: `git grep -nIP` works on all
|
||||
modern git builds compiled with PCRE support (PCRE2 since git 2.14). `ubuntu-latest` ships ≥2.40.
|
||||
No portability concern; record the assumption explicitly.
|
||||
- **`fetch-depth`** for the `actions/checkout@v4` step: `git grep`
|
||||
scans the working tree, not history, so a shallow clone (`fetch-depth:
|
||||
1`) is sufficient.
|
||||
- **Workflow timeout budget**: capture the empirical runtime of the
|
||||
full scan locally (already measured: a single `git grep` over the
|
||||
scoped paths runs in <2 seconds with ~3.6k matches). The 60-second
|
||||
ceiling in Req 5 is comfortable.
|
||||
- **Failure-message refresh command** wording: the design should pin
|
||||
the exact command shown to contributors so it stays one stable
|
||||
string developers can copy.
|
||||
- **Initial baseline values**: with `git grep -nIP '[\x{4e00}-\x{9fff}]'`
|
||||
on the current branch — `backend/app` = 2707, `frontend/src` = 902,
|
||||
`locales/en.json` = 0. The committed baseline must be regenerated
|
||||
against `main` at implementation time so it reflects the merge target.
|
||||
|
||||
## 5. Effort & Risk
|
||||
|
||||
- **Effort**: **S** (1–3 days). Small, self-contained additions
|
||||
(one Python script, one workflow file, one baseline file, plus the
|
||||
spec). All patterns already exist in the repo.
|
||||
- **Risk**: **Low**. No production-source changes, no new dependencies,
|
||||
no architectural shifts. The only failure mode is a noisy guard
|
||||
blocking unrelated PRs — mitigated by the per-path baseline ratchet.
|
||||
|
||||
## 6. Recommendations for Design Phase
|
||||
|
||||
- Adopt **Option B** (new focused script + new workflow + baseline file
|
||||
under spec dir).
|
||||
- Lock in the canonical regex `[一-鿿]` and the canonical scan command
|
||||
`git grep -nIP '[\x{4e00}-\x{9fff}]' -- <path>` to keep this guard
|
||||
producing counts identical to the audit pipeline's.
|
||||
- Use a line-oriented baseline format keyed by scoped path; explicit
|
||||
`--update-baseline` flag updates it; no
|
||||
implicit overwrite.
|
||||
- Output: machine-friendly findings on stderr, summary on stdout,
|
||||
exit `0`/`1`.
|
||||
- The workflow should run only on `pull_request` to `main` (Req 5.1)
|
||||
with `fetch-depth: 1` and `actions/setup-python@v5`. No third-party
|
||||
packages.
|
||||
- Baseline counts must be recomputed against `main` before the PR
|
||||
ships; do not commit baselines from a feature branch's working tree.
|
||||
|
|
@ -0,0 +1,189 @@
|
|||
# Requirements Document
|
||||
|
||||
## Project Description (Input)
|
||||
Add a permanent CI guard that runs an i18n CJK audit on every pull request.
|
||||
|
||||
Linked GitHub issue: #26 (.ticket/26.md).
|
||||
|
||||
The guard must fail a PR build when:
|
||||
1. locales/en.json contains any CJK character (range U+4E00..U+9FFF), or
|
||||
2. The total count of CJK matches across backend/app/ and frontend/src/ regresses (i.e. exceeds) a committed baseline value.
|
||||
|
||||
## Introduction
|
||||
|
||||
The i18n initiative has driven the project toward English-by-default UI, logs,
|
||||
prompts, and documentation. Manual audits (see PR #27, the
|
||||
`i18n-e2e-english-verification` spec) have repeatedly surfaced regressions
|
||||
where Chinese strings re-enter the codebase. This spec installs a permanent,
|
||||
self-contained CI guard that runs on every pull request and fails the build
|
||||
when (a) `locales/en.json` is no longer CJK-clean, or (b) the total CJK match
|
||||
count under `backend/app/` and `frontend/src/` regresses against a committed
|
||||
baseline.
|
||||
|
||||
The guard is intentionally minimal: it captures the two highest-signal checks
|
||||
from the larger audit pipeline so it can run on every PR with a sub-minute
|
||||
budget and without depending on the (currently unmerged) verification spec.
|
||||
The committed baseline lets the project ratchet down gaps over time without
|
||||
blocking unrelated PRs on pre-existing CJK content.
|
||||
|
||||
## Boundary Context
|
||||
|
||||
- **In scope**:
|
||||
- A locally runnable Python script that performs both guard checks on the
|
||||
current working tree.
|
||||
- A baseline file committed under the spec directory recording the
|
||||
accepted CJK match counts per scoped path.
|
||||
- A GitHub Actions workflow that runs the script on every pull request
|
||||
targeting `main` and fails the build when either check fails.
|
||||
- A clear, actionable failure message (which path regressed, baseline
|
||||
value, current value, command to update the baseline).
|
||||
- **Out of scope**:
|
||||
- The full classification pipeline (`classify.py`, `render_report.py`,
|
||||
`post_comment.sh`) from the unmerged `i18n-e2e-english-verification`
|
||||
spec — those scripts perform deeper audit work and are not required
|
||||
for the PR-time guard.
|
||||
- Auto-updating the baseline on `main` (the baseline is a normal
|
||||
reviewable file).
|
||||
- Translation work itself; this spec only enforces a regression gate.
|
||||
- Any change to production source under `backend/app/`, `frontend/src/`,
|
||||
or `locales/` apart from translations needed to satisfy the guard
|
||||
against its own initial baseline.
|
||||
- **Adjacent expectations**:
|
||||
- PR #27 (`chore/i18n-10-e2e-english-verification`) provides the
|
||||
methodology referenced here. This spec must remain functional whether
|
||||
PR #27 has been merged or not.
|
||||
- The guard reuses the canonical CJK regex range
|
||||
`[一-鿿]` already established by that audit.
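
To make the range concrete, here is a minimal Python sketch of the same character class built from explicit code points. The names `CJK_RE` and `contains_cjk` are illustrative assumptions, not necessarily what the guard script uses. Note that the range covers CJK Unified Ideographs only, so kana and Hangul fall outside it.

```python
import re

# Assumed equivalent of the canonical range U+4E00..U+9FFF, written with
# escapes so this snippet stays ASCII-only.
CJK_RE = re.compile("[\u4e00-\u9fff]")


def contains_cjk(text: str) -> bool:
    """Return True when text has at least one character in U+4E00..U+9FFF."""
    return CJK_RE.search(text) is not None


if __name__ == "__main__":
    print(contains_cjk("Save changes"))        # ASCII only
    print(contains_cjk("Save \u4fdd\u5b58"))   # two CJK ideographs
    print(contains_cjk("caf\u00e9 \u3042"))    # accent and hiragana, outside the range
```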
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Locale-catalogue CJK cleanliness check
|
||||
|
||||
**Objective:** As a maintainer of the English locale catalogue, I want every
|
||||
PR to fail when `locales/en.json` reintroduces any CJK character, so that the
|
||||
English catalogue stays CJK-free.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. When the guard script is run from the repository root, the i18n CI Guard
|
||||
shall scan the contents of `locales/en.json` for any character in the
|
||||
range `U+4E00..U+9FFF`.
|
||||
2. If `locales/en.json` contains at least one such character, the i18n CI
|
||||
Guard shall exit with a non-zero status and report each offending
|
||||
`key:line` pair on standard error.
|
||||
3. While `locales/en.json` contains zero such characters, the i18n CI Guard
|
||||
shall report the catalogue as CJK-clean.
|
||||
4. If `locales/en.json` is missing or unreadable, the i18n CI Guard shall
|
||||
exit with a non-zero status and emit an explicit error message naming
|
||||
the missing file.
|
||||
|
||||
### Requirement 2: Backend/frontend CJK regression check against committed baseline
|
||||
|
||||
**Objective:** As a maintainer of English support across the codebase, I
|
||||
want every PR to fail when the total CJK match count under `backend/app/`
|
||||
or `frontend/src/` exceeds a committed baseline, so that the codebase
|
||||
ratchets monotonically toward English-only without blocking PRs on
|
||||
pre-existing CJK content.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. When the guard script is run, the i18n CI Guard shall count the total
|
||||
number of CJK matches (range `U+4E00..U+9FFF`, line-level, text files
|
||||
only) under each of the scoped paths `backend/app/` and `frontend/src/`.
|
||||
2. The i18n CI Guard shall read the baseline counts from a single
|
||||
committed baseline file under the spec directory.
|
||||
3. If the current count for any scoped path exceeds the baseline count for
|
||||
that path, the i18n CI Guard shall exit with a non-zero status.
|
||||
4. While the current count for every scoped path is less than or equal to
|
||||
the baseline, the i18n CI Guard shall exit with status zero for this
|
||||
check.
|
||||
5. The i18n CI Guard shall ignore matches inside binary files
|
||||
(image, font, archive, or other non-text formats) by relying
|
||||
on `git grep -I` semantics.
|
||||
6. The i18n CI Guard shall scope its scan to tracked files only (matches
|
||||
in untracked or ignored files shall not contribute to the count).
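
The counting criteria above could be sketched roughly as follows. This is a hedged illustration, not the guard's actual code: the function names `interpret_git_grep` and `count_cjk_lines` are assumptions, and the exit-code handling follows the convention that `git grep` returns 0 when matches exist and 1 when there are none.

```python
import subprocess

# PCRE syntax, as in the canonical scan command.
CJK_PATTERN = r"[\x{4e00}-\x{9fff}]"


def interpret_git_grep(returncode: int, stdout: str, stderr: str = "") -> int:
    """Map a finished `git grep` run to a match count.

    Exit 0 means matches were found and exit 1 means none; anything else
    is treated as a hard error (for example a git build without PCRE).
    """
    if returncode == 0:
        # One output line per matching source line; split on "\n" only so
        # unusual separators inside a matched line are not miscounted.
        return sum(1 for line in stdout.split("\n") if line)
    if returncode == 1:
        return 0
    raise RuntimeError(f"git grep failed (exit {returncode}): {stderr.strip()}")


def count_cjk_lines(path: str) -> int:
    """Count tracked, non-binary lines under path that contain CJK."""
    proc = subprocess.run(
        ["git", "grep", "-nIP", CJK_PATTERN, "--", path],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return interpret_git_grep(proc.returncode, proc.stdout, proc.stderr)
```

Keeping the exit-code interpretation in a pure function makes the zero-match case testable without a git repository.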
|
||||
|
||||
### Requirement 3: Actionable failure messaging
|
||||
|
||||
**Objective:** As a contributor whose PR was rejected by the guard, I want
|
||||
the failure message to tell me exactly what regressed and how to fix it,
|
||||
so that I can either translate the offending content or — when intentional —
|
||||
update the baseline through normal review.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. If the locale-catalogue check fails, the i18n CI Guard shall print, for
|
||||
each offending entry: the dotted catalogue key, the line number in
|
||||
`locales/en.json`, and a truncated snippet of the value.
|
||||
2. If the regression check fails, the i18n CI Guard shall print, for each
|
||||
regressed scoped path: the path name, the baseline count, the current
|
||||
count, and the delta.
|
||||
3. If the regression check fails, the i18n CI Guard shall print the exact
|
||||
shell command a contributor must run locally to refresh the baseline
|
||||
file so the PR can be re-reviewed against the new value.
|
||||
4. The i18n CI Guard shall print, on success, a one-line summary per check
|
||||
confirming the catalogue is CJK-clean and the per-path counts are at or
|
||||
below baseline.
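
As a rough illustration of criteria 2 and 3, the sketch below formats one line per regressed path and appends the refresh command. The line layout mirrors the `cjk-regression` format named in the implementation tasks; the function names, the hint wording, and the choice to compare a path missing from the baseline against 0 are assumptions made for this example.

```python
REFRESH_COMMAND = "python scripts/ci/i18n_cjk_guard.py --update-baseline"


def regression_lines(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """Return one line per scoped path whose current count exceeds its baseline."""
    lines = []
    for path in sorted(current):
        # Assumption: a path absent from the baseline is compared against 0.
        base = baseline.get(path, 0)
        now = current[path]
        if now > base:
            lines.append(
                f"{path}: cjk-regression: baseline={base} current={now} delta=+{now - base}"
            )
    return lines


def failure_report(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """Regression lines followed by the refresh hint, or an empty list on success."""
    lines = regression_lines(baseline, current)
    if lines:
        # Hypothetical hint wording; only the command itself comes from the spec.
        lines.append(f"To accept the new counts intentionally, run: {REFRESH_COMMAND}")
    return lines
```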
|
||||
|
||||
### Requirement 4: Baseline file lifecycle
|
||||
|
||||
**Objective:** As a reviewer enforcing English support, I want the baseline
|
||||
to live in the repository as a small, human-readable file that only changes
|
||||
through code review, so that downward ratcheting is intentional and
|
||||
auditable.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The i18n CI Guard shall store the baseline as a single committed file
|
||||
under `.kiro/specs/i18n-ci-guard/`.
|
||||
2. The baseline file shall record one count per scoped path, in a stable,
|
||||
diff-friendly text format (no JSON line shuffling, no trailing
|
||||
whitespace).
|
||||
3. When the guard script is invoked with an explicit "refresh baseline"
|
||||
subcommand or flag, the i18n CI Guard shall overwrite the baseline file
|
||||
with the current per-path counts and exit with status zero.
|
||||
4. While no refresh flag is supplied, the i18n CI Guard shall never modify
|
||||
the baseline file.
|
||||
5. If the baseline file is missing at check time, the i18n CI Guard shall
|
||||
exit with a non-zero status and instruct the contributor to refresh it.
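
A minimal sketch of a reader and writer that would satisfy these criteria, assuming the `<path>\t<count>` line format chosen in the research notes. `BaselineError`, `read_baseline`, `write_baseline`, and the header text are illustrative names and content, not confirmed details of the script.

```python
import os
import tempfile
from pathlib import Path

# Assumed header; the committed file may word its comments differently.
HEADER = "# Per-path CJK baseline for the i18n CI guard.\n"


class BaselineError(Exception):
    """Raised when the baseline file is missing or malformed."""


def read_baseline(path: Path) -> dict[str, int]:
    """Parse '<path>\\t<count>' lines, skipping blank lines and '#' comments."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError as exc:
        raise BaselineError(f"cannot read baseline {path}: {exc}") from exc
    counts: dict[str, int] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        key, sep, value = line.partition("\t")
        if not sep or not key or not (value.isascii() and value.isdigit()):
            raise BaselineError(f"{path}:{number}: expected '<path>\\t<count>'")
        counts[key] = int(value)
    return counts


def write_baseline(path: Path, counts: dict[str, int]) -> None:
    """Write sorted entries to a temp file, then swap it in with os.replace."""
    body = "".join(f"{key}\t{counts[key]}\n" for key in sorted(counts))
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(HEADER + body)
        os.replace(tmp_name, path)
    except BaseException:
        # Leave no stray temp file behind if the write or swap fails.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing to a sibling temp file and renaming means a crash mid-write cannot leave a half-written baseline in the tree.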
|
||||
|
||||
### Requirement 5: GitHub Actions PR integration
|
||||
|
||||
**Objective:** As a project maintainer, I want every pull request targeting
|
||||
`main` to be gated by the guard, so that no merge silently regresses the
|
||||
English-only state of the catalogue or codebase.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The i18n CI Guard workflow shall trigger on every `pull_request` event
|
||||
whose base ref is `main`.
|
||||
2. While the workflow runs, the i18n CI Guard shall check out the PR head
|
||||
commit at a depth sufficient for `git grep` to scan tracked
|
||||
files.
|
||||
3. When the guard script exits with non-zero status, the workflow shall
|
||||
fail and surface the script's standard output and standard error in the
|
||||
GitHub Actions log.
|
||||
4. When the guard script exits with status zero, the workflow shall pass.
|
||||
5. The workflow shall use only Python from the standard
|
||||
`actions/setup-python` distribution and tools already available on the
|
||||
GitHub-hosted `ubuntu-latest` runner (`bash`, `git`); it shall not
|
||||
install third-party Python packages.
|
||||
6. The workflow shall complete within sixty seconds of wall-clock time on
|
||||
a clean `ubuntu-latest` runner.
|
||||
|
||||
### Requirement 6: Local reproducibility
|
||||
|
||||
**Objective:** As a developer preparing a PR, I want to run the same guard
|
||||
locally before pushing, so that I can catch regressions before CI does.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. When the guard script is invoked from a developer machine that has
|
||||
Python 3.11 or newer and `git` available, the i18n CI Guard shall
|
||||
produce the same pass/fail result and the same per-path counts that
|
||||
it would produce in CI for the same working tree.
|
||||
2. The i18n CI Guard shall expose a single, stable invocation entry point
|
||||
(a script under `scripts/ci/`) documented in the spec's design and
|
||||
README touchpoints.
|
||||
3. The i18n CI Guard shall require zero environment variables or secrets
|
||||
to run locally.
|
||||
|
|
@ -0,0 +1,175 @@
|
|||
# Research & Design Decisions — i18n-ci-guard
|
||||
|
||||
## Summary
|
||||
- **Feature**: `i18n-ci-guard`
|
||||
- **Discovery Scope**: Simple Addition (one Python script + one GH Actions
|
||||
workflow + one baseline file). Extension-flavoured because it builds on
|
||||
established `scripts/` conventions and the canonical CJK regex used by
|
||||
the larger audit pipeline.
|
||||
- **Key Findings**:
|
||||
- The canonical CJK match command `git grep -nIP '[\x{4e00}-\x{9fff}]'
|
||||
-- <path>` is already used by the unmerged audit pipeline (PR #27)
|
||||
and works on any git build compiled with PCRE support (`ubuntu-latest` ships ≥2.40).
|
||||
- `scripts/check_i18n_logs.py` is a strong CLI/style precedent:
|
||||
Python-stdlib-only, exit `0`/`1`, output as `<file>:<line>:
|
||||
<reason>: <snippet>`, canonical regex `[一-鿿]`.
|
||||
- The repository has no existing `pull_request`-triggered GH Actions
|
||||
workflow; this guard introduces the first one. The only existing
|
||||
workflow (`.github/workflows/docker-image.yml`) runs on tag pushes
|
||||
only.
|
||||
- Current per-path counts on this branch:
|
||||
`backend/app=2707, frontend/src=902, locales/en.json=0`. These are
|
||||
sample counts; the committed baseline must be regenerated against
|
||||
`main` at implementation time.
|
||||
|
||||
## Research Log
|
||||
|
||||
### Canonical scan command
|
||||
- **Context**: Requirement 2 needs a stable per-path CJK count and
|
||||
Requirement 5.5 forbids third-party packages.
|
||||
- **Sources Consulted**:
|
||||
- `audit_cjk.sh` from PR #27 commit `3481408`.
|
||||
- `git grep` man page.
|
||||
- **Findings**:
|
||||
- `git grep -nIP '[\x{4e00}-\x{9fff}]' -- <path>` returns one match
|
||||
per matching line in tracked, text-only files. `-I` excludes binary
|
||||
files; `-P` enables PCRE2 so the `\x{...}` Unicode range works.
|
||||
- This matches the input format consumed by the existing audit
|
||||
classifier, so the guard's match counts are directly comparable
|
||||
across pipelines.
|
||||
- **Implications**:
|
||||
- The guard re-uses this exact command; no new dependencies.
|
||||
- Because `-I` skips binary files and tracked-only is the default,
|
||||
Requirements 2.5 and 2.6 are satisfied by the command itself
|
||||
rather than by additional script logic.
|
||||
|
||||
### Baseline file format
|
||||
- **Context**: Requirement 4 needs a diff-friendly committed baseline.
|
||||
- **Sources Consulted**:
|
||||
- Diff churn behaviour of JSON vs. line-oriented text in this repo's
|
||||
history (e.g. `locales/*.json` PR diffs frequently re-key, while
|
||||
plain-text `parity.txt` from PR #27 reads cleanly).
|
||||
- **Findings**:
|
||||
- Line-oriented `<path>\t<count>` files produce minimal diffs and
|
||||
require no JSON parser.
|
||||
- A two-line file (one per scoped path) is large enough to be
|
||||
self-explanatory and small enough to never line-shuffle.
|
||||
- **Implications**:
|
||||
- Use plain text, sorted by path, single trailing newline. Reject
|
||||
the file as malformed if the script cannot parse it (Req 4.5).
|
||||
|
||||
### Locale-catalogue scan path
|
||||
- **Context**: Requirement 1 wants `key:line` per CJK offender in
|
||||
`locales/en.json`.
|
||||
- **Sources Consulted**:
|
||||
- `scripts/check_i18n_logs.py` (`flatten_keys` reuse pattern).
|
||||
- `check_parity.py` from PR #27 (`flatten`, `[cjk-in-en]` block).
|
||||
- **Findings**:
|
||||
- Both precedents flatten the locale dict and run the canonical
|
||||
regex against each leaf string value. Line numbers are derivable
|
||||
by re-reading the file as text and matching the value's first
|
||||
occurrence (good enough for an actionable error message).
|
||||
- Empty-string values and non-string leaf values (booleans, null)
|
||||
are skipped.
|
||||
- **Implications**:
|
||||
- Implement a tiny flatten-then-scan helper inside the guard
|
||||
script; do not add a new shared utility module.
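
The flatten-then-scan approach described above might look like the sketch below. It is an illustration under assumptions: `flatten` and `scan_catalogue` are hypothetical names, list values are skipped along with other non-string leaves, the snippet length of 60 is arbitrary, and a line number of 0 means the value could not be located textually (for instance when the file stores it with `\u` escapes).

```python
import json
import re

CJK_RE = re.compile("[\u4e00-\u9fff]")


def flatten(node, prefix=""):
    """Yield (dotted_key, value) for every non-empty string leaf in nested dicts."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, str) and node:
        yield prefix, node


def scan_catalogue(text: str, snippet_len: int = 60):
    """Return (dotted_key, line_number, snippet) for each CJK-bearing leaf."""
    lines = text.splitlines()
    findings = []
    for key, value in flatten(json.loads(text)):
        if not CJK_RE.search(value):
            continue
        # Re-encode the value the way it is usually written in the file, then
        # take the first line containing it as the reported location.
        needle = json.dumps(value, ensure_ascii=False)
        line_no = next((i for i, line in enumerate(lines, 1) if needle in line), 0)
        findings.append((key, line_no, value[:snippet_len]))
    return findings
```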
|
||||
|
||||
### GH Actions trigger and budget
|
||||
- **Context**: Requirements 5.1, 5.5, 5.6.
|
||||
- **Sources Consulted**:
|
||||
- GitHub-hosted runners reference (`ubuntu-latest`).
|
||||
- `actions/setup-python@v5` README.
|
||||
- **Findings**:
|
||||
- `ubuntu-latest` has Python 3.10+ pre-installed; `actions/setup-python@v5`
|
||||
pins to 3.11 in <5 s.
|
||||
- A single `git grep` over the scoped paths runs in <2 s on this
|
||||
repo (~3.6k matches). End-to-end the workflow comfortably fits
|
||||
inside the 60 s ceiling.
|
||||
- **Implications**:
|
||||
- Use `actions/checkout@v4` with `fetch-depth: 1`,
|
||||
`actions/setup-python@v5` with `python-version: '3.11'`, and run
|
||||
the script directly. No caching layer needed.
|
||||
|
||||
## Architecture Pattern Evaluation
|
||||
|
||||
| Option | Description | Strengths | Risks / Limitations | Notes |
|
||||
|--------|-------------|-----------|---------------------|-------|
|
||||
| A. Extend `check_i18n_logs.py` | Add `--cjk-guard` mode to existing script | Reuses one file | Conflates two scopes; existing script is module-scoped, guard is subtree-scoped | Rejected |
|
||||
| B. New `scripts/ci/i18n_cjk_guard.py` + new workflow | Single-purpose script + workflow + baseline file | Clean SRP; matches "one script per responsibility" precedent | One additional file | **Selected** |
|
||||
| C. Shared `cjk_scan.py` helper + thin guard | Factor regex/git-grep into helper | DRY for regex constant | Premature abstraction; only one shared symbol today | Rejected |
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Decision: Single-purpose CI script + GH Actions workflow (Option B)
|
||||
- **Context**: Requirements 1–6 demand a small, self-contained guard.
|
||||
- **Alternatives Considered**: A (extend), C (shared helper).
|
||||
- **Selected Approach**: New script `scripts/ci/i18n_cjk_guard.py`,
|
||||
new workflow `.github/workflows/i18n-cjk-guard.yml`, baseline file
|
||||
`.kiro/specs/i18n-ci-guard/baseline.txt`.
|
||||
- **Rationale**: Matches the project's "one focused script per
|
||||
responsibility" convention; isolates a CI-blocking surface from the
|
||||
existing i18n developer scripts; keeps the baseline collocated with
|
||||
the spec for review traceability.
|
||||
- **Trade-offs**: One more file in `scripts/` vs. tighter cohesion.
|
||||
- **Follow-up**: When a third caller wants the canonical regex, factor
|
||||
it out then.
|
||||
|
||||
### Decision: Plain-text baseline format
|
||||
- **Context**: Requirement 4.2 demands stable, diff-friendly format.
|
||||
- **Alternatives Considered**: JSON, YAML.
|
||||
- **Selected Approach**: One line per scoped path: `<path>\t<count>`,
|
||||
sorted lexicographically by path, single trailing newline.
|
||||
- **Rationale**: Zero parser dependency; predictable diffs; trivial
|
||||
to refresh atomically.
|
||||
- **Trade-offs**: Less expressive than JSON (no nested structure), but
|
||||
the data model is two integers — nesting is unnecessary.
|
||||
|
||||
### Decision: Refresh via `--update-baseline` subcommand-style flag
|
||||
- **Context**: Requirement 4.3 needs an explicit refresh path.
|
||||
- **Alternatives Considered**: Separate `update_baseline.py` script;
|
||||
Makefile target.
|
||||
- **Selected Approach**: Single script with two modes: default (check
|
||||
+ exit 0/1) and `--update-baseline` (overwrite baseline + exit 0).
|
||||
- **Rationale**: One CLI surface to remember; the failure message
|
||||
prints the exact command to run.
|
||||
- **Trade-offs**: Slightly more conditional logic in one script;
|
||||
acceptable given the small total LoC.
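
A minimal `argparse` sketch of this two-mode CLI, assuming the `--baseline` path override mentioned in the implementation tasks. `build_parser`, `DEFAULT_BASELINE`, and the help strings are illustrative choices rather than the script's confirmed contents.

```python
import argparse

DEFAULT_BASELINE = ".kiro/specs/i18n-ci-guard/baseline.txt"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a default check mode and an explicit refresh flag."""
    parser = argparse.ArgumentParser(
        prog="i18n_cjk_guard.py",
        description="Fail when CJK content regresses against the committed baseline.",
    )
    parser.add_argument(
        "--update-baseline",
        action="store_true",
        help="overwrite the baseline file with the current per-path counts",
    )
    parser.add_argument(
        "--baseline",
        default=DEFAULT_BASELINE,
        metavar="PATH",
        help="baseline file to read or refresh (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    mode = "refresh" if args.update_baseline else "check"
    print(f"mode={mode} baseline={args.baseline}")
```

With no arguments the parser selects check mode; unknown flags make `argparse` exit with a non-zero status, which matches the behaviour the tasks ask for.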
|
||||
|
||||
### Decision: Workflow runs only on `pull_request` to `main`
|
||||
- **Context**: Requirement 5.1.
|
||||
- **Alternatives Considered**: Run on `push` to all branches as well;
|
||||
run on `pull_request` to any base branch.
|
||||
- **Selected Approach**: `on.pull_request.branches: [main]` only.
|
||||
- **Rationale**: Aligns with how the existing project uses `main` as
|
||||
the protected branch (see `gh pr list` history; every feature PR
|
||||
targets `main`). Avoids redundant runs on intra-branch chains.
|
||||
- **Trade-offs**: A direct push to `main` would not be guarded — but
|
||||
branch protection already discourages that path (per
|
||||
`dev-guidelines.md`).
|
||||
|
||||
## Risks & Mitigations
|
||||
|
||||
- **Risk**: Baseline drifts upward unintentionally during
|
||||
`--update-baseline` runs, hiding real regressions.
|
||||
- *Mitigation*: Failure message instructs contributors to refresh
|
||||
*only when intentional*; the baseline file is reviewed in the same
|
||||
PR diff. Acceptance Criteria 3.3 makes this explicit.
|
||||
- **Risk**: `git grep -P` not built with PCRE on a developer's local
|
||||
git build (rare on Linux/macOS, possible on minimal Windows builds).
|
||||
- *Mitigation*: The guard prints a clear error if `git grep` exits
|
||||
non-zero with PCRE mode; documents Python ≥3.11 + git ≥2.20 as
|
||||
prerequisites.
|
||||
- **Risk**: Baseline counts captured on a feature branch include
|
||||
changes not yet on `main`, mis-anchoring the ratchet.
|
||||
- *Mitigation*: The implementation task explicitly recomputes
|
||||
baseline against `origin/main` before committing; documented in
|
||||
`tasks.md`.
|
||||
|
||||
## References
|
||||
- PR #27 audit pipeline (`audit_cjk.sh`, `check_parity.py`,
|
||||
`classify.py`) — methodology source of truth.
|
||||
- `scripts/check_i18n_logs.py` — CLI/style precedent.
|
||||
- `git grep` man page — `-n`, `-I`, `-P` flag semantics.
|
||||
- GitHub Actions `actions/setup-python@v5` and `actions/checkout@v4`
|
||||
README pages.
|
||||
|
|
@ -0,0 +1,24 @@
|
|||
{
|
||||
"feature_name": "i18n-ci-guard",
|
||||
"created_at": "2026-05-08T00:25:37Z",
|
||||
"updated_at": "2026-05-08T00:40:00Z",
|
||||
"language": "en",
|
||||
"phase": "tasks-generated",
|
||||
"approvals": {
|
||||
"requirements": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
},
|
||||
"design": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
},
|
||||
"tasks": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
}
|
||||
},
|
||||
"ready_for_implementation": true,
|
||||
"ticket": "26",
|
||||
"ticket_url": "https://github.com/salestech-group/MiroFish/issues/26"
|
||||
}
|
||||
|
|
@ -0,0 +1,157 @@
|
|||
# Implementation Tasks — i18n-ci-guard
|
||||
|
||||
> Approved spec: see `requirements.md`, `design.md`, `research.md`,
|
||||
> `gap-analysis.md` in this directory.
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1. Foundation: scaffold the CI guard script with stable CLI surface and stdlib-only dependencies
|
||||
- [x] 1.1 Create the empty guard script and CLI skeleton
|
||||
- Place the new script at the path designated by the design (`scripts/ci/`).
|
||||
- Establish the module docstring, the canonical CJK regex constant, the
|
||||
scoped-paths constant tuple, and the `argparse` parser exposing default
|
||||
check mode plus an explicit `--update-baseline` flag and a
|
||||
`--baseline` path override.
|
||||
- Confirm the script exits 0 on a smoke `--help` invocation and rejects
|
||||
unknown flags with non-zero exit.
|
||||
- Observable: running `python scripts/ci/i18n_cjk_guard.py --help` from
|
||||
the repo root prints usage text containing every documented flag and
|
||||
exits 0; running with an unknown flag exits non-zero.
|
||||
- _Requirements: 5.5, 6.2, 6.3_
|
||||
- _Boundary: i18n_cjk_guard.py_
|
||||
|
||||
- [x] 2. Core: implement the two CJK checks
|
||||
- [x] 2.1 Implement the locale-catalogue scan
|
||||
- Recursively walk the parsed `locales/en.json` dict, applying the
|
||||
canonical regex to every string leaf to gather offending entries.
|
||||
- Compute the source line number by re-reading the file as text and
|
||||
matching the value's first textual occurrence; truncate snippets to
|
||||
the documented snippet length.
|
||||
- On a missing or unreadable catalogue file, emit a clear stderr
|
||||
message and exit non-zero.
|
||||
- Observable: against a synthetic clean catalogue, the function returns
|
||||
an empty list; against a synthetic catalogue with one CJK value, it
|
||||
returns exactly one finding tuple with the correct dotted key and
|
||||
line number.
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4, 3.1_
|
||||
- _Boundary: i18n_cjk_guard.py_
|
||||
|
||||
- [x] 2.2 (P) Implement the per-path CJK count via `git grep`
|
||||
- Invoke `git grep -nIP '[\x{4e00}-\x{9fff}]' -- <scoped_path>` for each
|
||||
scoped path; treat exit codes 0 (matches found) and 1 (no matches) as
|
||||
success, any other exit code as a hard error reported on stderr.
|
||||
- Count lines of stdout; the result for a zero-match path must be the
|
||||
integer `0`, never an exception.
|
||||
- Reject working-tree states where `git` is not available or PCRE is
|
||||
not enabled, with a clear stderr message.
|
||||
- Observable: against a tmp git repository with N planted CJK lines
|
||||
under a scoped path, the function returns N; with zero CJK content,
|
||||
it returns 0; binary files and untracked files do not contribute.
|
||||
- _Requirements: 2.1, 2.4, 2.5, 2.6_
|
||||
- _Boundary: i18n_cjk_guard.py_
|
||||
|
||||
- [x] 2.3 Implement baseline file read/write with strict format
|
||||
- Parse the baseline file as `<path>\t<count>` lines, ignoring `#`
|
||||
comments and blank lines, raising a typed error on malformed input
|
||||
or missing file.
|
||||
- Write atomically (`tmp + os.replace`) with sorted entries, a single
|
||||
header comment block, and a single trailing newline.
|
||||
- Observable: a round-trip write/read of a deterministic counts dict
|
||||
yields the same dict; a baseline file containing a non-tab line is
|
||||
rejected with a clear error; the baseline file ends with exactly one
|
||||
`\n`.
|
||||
- _Requirements: 4.2, 4.3_
|
||||
- _Boundary: i18n_cjk_guard.py_
|
||||
|
||||
- [x] 3. Integration: wire the two checks into the default and refresh modes
|
||||
- [x] 3.1 Compose the default check mode
|
||||
- Run both checks under all conditions (do not short-circuit), so a
|
||||
single CI log shows every failure in one pass.
|
||||
- Print a one-line success summary per check on stdout when both pass.
|
||||
- On locale failure, print `<file>:<line>: <reason>: <snippet>` lines
|
||||
on stderr and a trailing `N issues` summary; on regression failure,
|
||||
print `<path>: cjk-regression: baseline=<b> current=<c> delta=+<d>`
|
||||
lines plus the exact verbatim refresh command.
|
||||
- Surface a non-zero exit when either check fails and exit 0 only when
|
||||
both pass.
|
||||
- Observable: against a working tree with the committed baseline at or
|
||||
above the current count and a CJK-clean en.json, exit code is 0 and
|
||||
stdout contains the success summary; planting one CJK char in
|
||||
en.json or planting enough new CJK lines to break the baseline
|
||||
yields exit 1 and the documented stderr text.
|
||||
- _Requirements: 1.2, 1.3, 1.4, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3, 3.4, 4.4, 4.5_
|
||||
- _Boundary: i18n_cjk_guard.py_
|
||||
|
||||
- [x] 3.2 Compose the `--update-baseline` mode
|
||||
- When the flag is provided, recompute current per-path counts and
|
||||
overwrite the baseline file via the atomic writer; print the new
|
||||
counts on stdout; exit 0.
|
||||
- When the flag is absent, never write the baseline file under any
|
||||
code path.
|
||||
- Observable: invoking with `--update-baseline` rewrites the baseline
|
||||
file's contents to match current counts and exits 0; running the
|
||||
default mode immediately afterward exits 0.
|
||||
- _Requirements: 4.3, 4.4_
|
||||
- _Boundary: i18n_cjk_guard.py_
|
||||
|
||||
- [x] 4. Establish the committed baseline anchored to `main`
|
||||
- [x] 4.1 Capture initial baseline counts against `main`
|
||||
- Operate from a tree that reflects `origin/main`'s state for the
|
||||
scoped paths (e.g., a fresh checkout, a worktree at `origin/main`,
|
||||
or `git checkout origin/main -- backend/app frontend/src` followed
|
||||
by a clean revert) so the committed baseline does not over- or
|
||||
under-count relative to the merge target.
|
||||
- Run `--update-baseline` to materialize the counts; confirm the
|
||||
resulting file is exactly two non-comment data lines (one per
|
||||
scoped path) sorted lexicographically.
|
||||
- Observable: the baseline file is committed to
|
||||
`.kiro/specs/i18n-ci-guard/baseline.txt` and `python scripts/ci/i18n_cjk_guard.py`
|
||||
against the same `main`-aligned tree exits 0.
|
||||
- _Requirements: 4.1, 4.2_
|
||||
- _Boundary: baseline.txt_
|
||||
|
||||
- [x] 5. Wire the guard into GitHub Actions on every PR to `main`
|
||||
- [x] 5.1 Add the PR-time workflow
|
||||
- Create the workflow file at the path designated by the design,
|
||||
triggered on `pull_request` whose base ref is `main`.
|
||||
- Set explicit minimal permissions (`contents: read`), a one-minute
|
||||
job timeout, `actions/checkout@v4` with `fetch-depth: 1`, and
|
||||
`actions/setup-python@v5` pinned to Python 3.11.
|
||||
- The single executable step invokes the guard script with no
|
||||
arguments; the workflow surfaces the script's stdout and stderr in
|
||||
the GitHub Actions log without filtering.
|
||||
- Observable: the workflow YAML parses cleanly; on a PR with no CJK
|
||||
regression, the job passes; on a PR that introduces a CJK regression
|
||||
or CJK in en.json, the job fails and the log shows the documented
|
||||
failure messages.
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6_
|
||||
- _Boundary: i18n-cjk-guard.yml_
|
||||
|
||||
- [x] 6. Validation: tests and end-to-end checks
- [x] 6.1 Add unit and integration tests for the guard script
  - Cover the locale scan against a synthetic clean catalogue and a
    synthetic CJK-tainted catalogue, asserting findings tuples match.
  - Cover the per-path counter against a tmp git repo with both N>0
    and N=0 planted CJK lines, asserting the zero-match path exits
    cleanly with a count of 0.
  - Cover the baseline read/write round-trip and the malformed-input
    rejection path.
  - Cover the default mode end-to-end (pass and fail paths) with the
    expected exit codes and stderr fragments, including the verbatim
    refresh command on regression failure.
  - Observable: `python -m unittest scripts/ci/tests/test_i18n_cjk_guard.py`
    from the repo root passes locally with stdlib-only Python.
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 2.1, 2.4, 2.5, 2.6, 3.3, 4.3, 4.5, 6.1, 6.3_
  - _Boundary: scripts/ci/tests/_

- [x] 6.2 Run the guard locally to confirm reproducibility against the committed baseline
  - From a clean working tree at `main` (or a worktree at `origin/main`
    + this branch's new files merged on top), invoke the guard with no
    arguments and confirm exit code 0 and the success summary.
  - Confirm the same command is the documented developer entry point
    referenced from the failure-message refresh hint.
  - Observable: terminal session shows exit code 0 and the documented
    one-line per-check success summary; the same script path (`scripts/ci/i18n_cjk_guard.py`)
    appears verbatim in the regression-failure refresh hint.
  - _Requirements: 6.1, 6.2, 6.3_
  - _Boundary: i18n_cjk_guard.py, baseline.txt_

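The observable for the baseline task (exactly two non-comment data lines, one per scoped path, tab-separated and sorted lexicographically) can be checked mechanically. The following is a minimal sketch under stated assumptions, not part of the committed guard: `baseline_data_lines` and `check_baseline_shape` are hypothetical helper names, and the sample text simply mirrors the committed counts.

```python
"""Sketch: validate the shape of a baseline file (hypothetical helper, not in the repo)."""
from __future__ import annotations

# Scoped paths named in the tasks; assumed to be the only expected entries.
EXPECTED_PATHS = ("backend/app", "frontend/src")


def baseline_data_lines(text: str) -> list[str]:
    """Return the non-blank, non-comment lines of a baseline file."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def check_baseline_shape(text: str) -> list[str]:
    """Return a list of problems; an empty list means the shape is as documented."""
    problems: list[str] = []
    lines = baseline_data_lines(text)
    if len(lines) != len(EXPECTED_PATHS):
        problems.append(
            f"expected {len(EXPECTED_PATHS)} data lines, found {len(lines)}"
        )
    paths = [line.partition("\t")[0] for line in lines]
    if paths != sorted(paths):
        problems.append("data lines are not sorted lexicographically")
    for line in lines:
        path, sep, count = line.partition("\t")
        if not sep or not count.isdigit():
            problems.append(f"malformed line: {line!r}")
        elif path not in EXPECTED_PATHS:
            problems.append(f"unexpected path: {path!r}")
    return problems


if __name__ == "__main__":
    sample = (
        "# Per-path CJK baseline for the i18n CI guard.\n"
        "backend/app\t2792\n"
        "frontend/src\t902\n"
    )
    print(check_baseline_shape(sample) or "baseline shape OK")
```

Returning a list of problems rather than raising on the first one keeps the sketch usable as a review aid, since every deviation is reported in a single pass.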

@ -0,0 +1,393 @@
#!/usr/bin/env python3
"""i18n CJK guard for pull-request CI.

Run from the repository root::

    python scripts/ci/i18n_cjk_guard.py
    python scripts/ci/i18n_cjk_guard.py --update-baseline

Two checks always run (no short-circuit):

* ``locales/en.json`` must contain zero CJK characters
  (range ``U+4E00..U+9FFF``).
* CJK match counts under ``backend/app/`` and ``frontend/src/`` must not
  exceed the committed per-path baseline at
  ``.kiro/specs/i18n-ci-guard/baseline.txt``.

The per-path check relies on the canonical scan
``git grep -nIP '[\\x{4e00}-\\x{9fff}]' -- <scoped_path>`` so the guard
stays bytewise-aligned with the broader audit pipeline. The locale check
applies the same range to the parsed JSON string values via ``CJK_RE``.

Stdlib only. Exit code is 0 on success and 1 on any failure or hard
error.
"""
from __future__ import annotations

import argparse
import json
import os
import re
import subprocess
import sys
from pathlib import Path

# Python-side form of the range U+4E00..U+9FFF; CJK_PATTERN is the PCRE form
# of the same range, passed to ``git grep -P``.
CJK_RE: re.Pattern[str] = re.compile(r"[\u4e00-\u9fff]")
CJK_PATTERN: str = r"[\x{4e00}-\x{9fff}]"
SCOPED_PATHS: tuple[str, ...] = ("backend/app", "frontend/src")
EN_JSON_REL_PATH: str = "locales/en.json"
DEFAULT_BASELINE_REL_PATH: str = ".kiro/specs/i18n-ci-guard/baseline.txt"
SNIPPET_MAX_LEN: int = 80
REFRESH_COMMAND: str = "python scripts/ci/i18n_cjk_guard.py --update-baseline"
REFRESH_HINT: str = f"# refresh via: {REFRESH_COMMAND}"

LocaleFinding = tuple[str, int, str]


class BaselineError(Exception):
    """Raised when the baseline file is missing or malformed."""


def _truncate(text: str, limit: int = SNIPPET_MAX_LEN) -> str:
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def _flatten(prefix: str, value: object, out: list[tuple[str, object]]) -> None:
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            _flatten(child_prefix, child, out)
    else:
        out.append((prefix, value))


def _value_line_number(text_lines: list[str], value: str) -> int:
    """Best-effort line number for ``value`` in the original JSON text.

    Tries the raw value first (matches when the JSON file was written with
    ``ensure_ascii=False``), then the JSON-escaped form, then falls back to
    line 1 so callers always have a usable integer.
    """
    candidates: list[str] = [value]
    escaped = json.dumps(value)[1:-1]
    if escaped not in candidates:
        candidates.append(escaped)
    for candidate in candidates:
        if not candidate:
            continue
        for index, line in enumerate(text_lines, start=1):
            if candidate in line:
                return index
    return 1


def scan_locale_cjk(en_json_path: Path) -> list[LocaleFinding]:
    """Return ``(dotted_key, line_number, snippet)`` for every CJK leaf.

    Args:
        en_json_path: Path to ``locales/en.json``.

    Returns:
        A list of findings in document order. Empty when the catalogue is
        CJK-clean. Non-string leaves and empty strings are skipped.

    Raises:
        FileNotFoundError: If ``en_json_path`` does not exist.
        json.JSONDecodeError: If the file is not valid JSON.
    """
    raw = en_json_path.read_text(encoding="utf-8")
    data = json.loads(raw)
    flat: list[tuple[str, object]] = []
    _flatten("", data, flat)
    text_lines = raw.splitlines()
    findings: list[LocaleFinding] = []
    for key, value in flat:
        if not isinstance(value, str) or not value:
            continue
        if not CJK_RE.search(value):
            continue
        line_no = _value_line_number(text_lines, value)
        findings.append((key, line_no, _truncate(value)))
    return findings


def count_path_cjk(repo_root: Path, scoped_path: str) -> int:
    """Count CJK match lines under ``scoped_path`` via ``git grep -nIP``.

    Args:
        repo_root: Working-tree root used as ``git`` CWD.
        scoped_path: Repo-relative path to scan (e.g. ``backend/app``).

    Returns:
        The number of matching tracked-text lines. ``-I`` excludes binary
        files; untracked files are excluded by default.

    Raises:
        RuntimeError: If ``git grep`` fails for any reason other than
            "no matches" (exit code 1, which is treated as zero matches).
    """
    cmd = ["git", "grep", "-nIP", CJK_PATTERN, "--", scoped_path]
    proc = subprocess.run(
        cmd,
        cwd=repo_root,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        # Decode as UTF-8 regardless of the host locale: the output contains
        # CJK by construction, and only the line count is used.
        encoding="utf-8",
        errors="replace",
    )
    if proc.returncode not in (0, 1):
        raise RuntimeError(
            f"git grep failed (exit {proc.returncode}) for {scoped_path}: "
            f"{proc.stderr.strip()}"
        )
    if not proc.stdout:
        return 0
    return sum(1 for line in proc.stdout.splitlines() if line)


def read_baseline(baseline_path: Path) -> dict[str, int]:
    """Parse the baseline file and return ``{scoped_path: count}``.

    Args:
        baseline_path: Absolute path to the baseline file.

    Returns:
        A dict keyed by scoped path with non-negative integer counts.

    Raises:
        BaselineError: If the file is missing or contains a malformed line.
    """
    if not baseline_path.exists():
        raise BaselineError(
            f"{baseline_path}: missing or malformed; "
            f"refresh via: {REFRESH_COMMAND}"
        )
    counts: dict[str, int] = {}
    for raw_line in baseline_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.rstrip()
        if not line or line.startswith("#"):
            continue
        if "\t" not in line:
            raise BaselineError(
                f"{baseline_path}: malformed line {raw_line!r}; "
                f"expected '<path>\\t<count>'"
            )
        path, _, count_str = line.partition("\t")
        if not path or not count_str.isdigit():
            raise BaselineError(
                f"{baseline_path}: malformed line {raw_line!r}; "
                f"expected '<path>\\t<count>'"
            )
        counts[path] = int(count_str)
    return counts


def write_baseline(baseline_path: Path, counts: dict[str, int]) -> None:
    """Atomically write the baseline file with sorted entries.

    Args:
        baseline_path: Target file path.
        counts: Per-path baseline counts; keys are written in lexicographic
            order with a single trailing newline.
    """
    header = (
        "# Per-path CJK baseline for the i18n CI guard.\n"
        "# Format: <path>\\t<count>. Sorted lexicographically.\n"
        f"# Refresh via: {REFRESH_COMMAND}\n"
    )
    body_lines = [f"{path}\t{counts[path]}" for path in sorted(counts)]
    body = "\n".join(body_lines) + "\n"
    contents = header + body
    baseline_path.parent.mkdir(parents=True, exist_ok=True)
    tmp = baseline_path.with_suffix(baseline_path.suffix + ".tmp")
    tmp.write_text(contents, encoding="utf-8")
    os.replace(tmp, baseline_path)


def _format_locale_finding(key: str, line_no: int, snippet: str) -> str:
    return f"{EN_JSON_REL_PATH}:{line_no}: cjk-in-en: {key} = {snippet}"


def _format_regression_line(path: str, baseline: int, current: int) -> str:
    delta = current - baseline
    sign = "+" if delta > 0 else ""
    return (
        f"{path}: cjk-regression: baseline={baseline} "
        f"current={current} delta={sign}{delta}"
    )


def run_check(repo_root: Path, baseline_path: Path) -> int:
    """Run both guard checks and return the script exit code.

    Args:
        repo_root: Working-tree root passed to ``git grep``.
        baseline_path: Path to the baseline file.

    Returns:
        ``0`` when both checks pass, ``1`` otherwise.
    """
    failed = False
    success_summary: list[str] = []

    en_json_path = repo_root / EN_JSON_REL_PATH
    if not en_json_path.exists():
        print(f"{EN_JSON_REL_PATH}: missing catalogue file", file=sys.stderr)
        failed = True
    else:
        try:
            findings = scan_locale_cjk(en_json_path)
        except json.JSONDecodeError as exc:
            print(
                f"{EN_JSON_REL_PATH}: invalid JSON: {exc.msg}",
                file=sys.stderr,
            )
            findings = []
            failed = True
        if findings:
            for key, line_no, snippet in findings:
                print(
                    _format_locale_finding(key, line_no, snippet),
                    file=sys.stderr,
                )
            print(f"{len(findings)} issues", file=sys.stderr)
            failed = True
        elif not failed:
            success_summary.append("OK locales/en.json is CJK-clean")

    try:
        baseline = read_baseline(baseline_path)
    except BaselineError as exc:
        print(str(exc), file=sys.stderr)
        return 1

    current_counts: dict[str, int] = {}
    try:
        for path in SCOPED_PATHS:
            current_counts[path] = count_path_cjk(repo_root, path)
    except RuntimeError as exc:
        print(f"git grep failed: {exc}", file=sys.stderr)
        return 1

    regressions: list[str] = []
    for path in SCOPED_PATHS:
        baseline_value = baseline.get(path, 0)
        current_value = current_counts[path]
        if current_value > baseline_value:
            regressions.append(
                _format_regression_line(path, baseline_value, current_value)
            )

    if regressions:
        for line in regressions:
            print(line, file=sys.stderr)
        print(REFRESH_HINT, file=sys.stderr)
        failed = True
    else:
        per_path = ", ".join(
            f"{path}={current_counts[path]}<={baseline.get(path, 0)}"
            for path in SCOPED_PATHS
        )
        success_summary.append(
            f"OK per-path counts within baseline ({per_path})"
        )

    if not failed:
        for line in success_summary:
            print(line)

    return 1 if failed else 0


def update_baseline(repo_root: Path, baseline_path: Path) -> int:
    """Refresh ``baseline_path`` with current per-path counts.

    Args:
        repo_root: Working-tree root passed to ``git grep``.
        baseline_path: Target baseline file path; created if missing.

    Returns:
        ``0`` on success.
    """
    counts: dict[str, int] = {}
    for path in SCOPED_PATHS:
        counts[path] = count_path_cjk(repo_root, path)
    write_baseline(baseline_path, counts)
    print(f"baseline updated: {baseline_path}")
    for path in sorted(counts):
        print(f"  {path}\t{counts[path]}")
    return 0


def _build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="i18n_cjk_guard",
        description=(
            "PR-time guard: fail when locales/en.json contains CJK or when "
            "backend/app + frontend/src CJK match counts exceed the "
            "committed baseline."
        ),
    )
    parser.add_argument(
        "--update-baseline",
        action="store_true",
        help=(
            "overwrite the baseline file with current counts and exit 0"
        ),
    )
    parser.add_argument(
        "--baseline",
        type=Path,
        default=None,
        help=(
            f"path to the baseline file (default: {DEFAULT_BASELINE_REL_PATH})"
        ),
    )
    parser.add_argument(
        "--repo-root",
        type=Path,
        default=None,
        help=(
            "repository root (default: detected via "
            "`git rev-parse --show-toplevel`)"
        ),
    )
    return parser


def _detect_repo_root(explicit: Path | None) -> Path:
    if explicit is not None:
        return explicit.resolve()
    proc = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"unable to detect repository root: {proc.stderr.strip()}"
        )
    return Path(proc.stdout.strip())


def main(argv: list[str] | None = None) -> int:
    """CLI entry point. Returns the script exit code."""
    parser = _build_parser()
    args = parser.parse_args(argv)
    try:
        repo_root = _detect_repo_root(args.repo_root)
    except RuntimeError as exc:
        print(str(exc), file=sys.stderr)
        return 1
    if args.baseline is not None:
        baseline_path = args.baseline.resolve()
    else:
        baseline_path = (repo_root / DEFAULT_BASELINE_REL_PATH).resolve()
    if args.update_baseline:
        return update_baseline(repo_root, baseline_path)
    return run_check(repo_root, baseline_path)


if __name__ == "__main__":
    sys.exit(main())

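The guard's notion of "CJK" is the single range `U+4E00..U+9FFF` (the CJK Unified Ideographs block). As a hedged illustration of what that range does and does not cover, the sketch below counts matching lines with Python's `re` rather than `git grep`. `count_cjk_lines` and `CJK_RANGE` are hypothetical names, not part of the script, and the sketch does not reproduce `git grep`'s tracked-file or binary-file filtering.

```python
"""Sketch: which lines the U+4E00..U+9FFF range matches (hypothetical helper)."""
import re

# Same range the guard scans for, written with escapes so the bounds are explicit.
CJK_RANGE = re.compile("[\u4e00-\u9fff]")


def count_cjk_lines(text: str) -> int:
    """Count newline-separated lines containing a character in U+4E00..U+9FFF."""
    return sum(1 for line in text.split("\n") if CJK_RANGE.search(line))


if __name__ == "__main__":
    sample = "\n".join(
        [
            "x = '中文'",      # Han ideographs: inside the range
            "y = 'かな'",      # Hiragana (U+3040..U+309F): outside the range
            "z = '한글'",      # Hangul syllables (U+AC00..U+D7AF): outside the range
            "print('hello')",  # ASCII only
        ]
    )
    print(count_cjk_lines(sample))
```

Kana-only and Hangul-only lines fall outside the range, so a check built on it would not flag them; whether that matters depends on the project's scope, which the source does not state.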

@ -0,0 +1,358 @@
"""Unit and integration tests for ``scripts/ci/i18n_cjk_guard.py``.

Stdlib-only tests using ``unittest``. Run from the repository root with::

    python -m unittest scripts/ci/tests/test_i18n_cjk_guard.py

or as a script::

    python scripts/ci/tests/test_i18n_cjk_guard.py
"""
from __future__ import annotations

import json
import os
import subprocess
import sys
import tempfile
import unittest
from pathlib import Path

_HERE = Path(__file__).resolve().parent
_GUARD_DIR = _HERE.parent
sys.path.insert(0, str(_GUARD_DIR))

import i18n_cjk_guard as guard  # noqa: E402


def _git(repo: Path, *args: str) -> subprocess.CompletedProcess[str]:
    """Run a git command in ``repo`` and return the completed process."""
    return subprocess.run(
        ["git", *args],
        cwd=repo,
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


def _make_repo(tmp: Path) -> Path:
    """Initialize an isolated git repository at ``tmp`` and return the path."""
    _git(tmp, "init", "-q", "-b", "main")
    _git(tmp, "config", "user.email", "test@example.com")
    _git(tmp, "config", "user.name", "Test")
    return tmp


def _commit_file(repo: Path, rel: str, content: str | bytes) -> None:
    """Write a file under ``repo`` and commit it."""
    target = repo / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    if isinstance(content, str):
        target.write_text(content, encoding="utf-8")
    else:
        target.write_bytes(content)
    _git(repo, "add", "--", rel)
    _git(repo, "commit", "-q", "-m", f"add {rel}")


class ScanLocaleCjkTests(unittest.TestCase):
    """``scan_locale_cjk`` returns one ``LocaleFinding`` per CJK leaf string."""

    def test_clean_catalogue_returns_empty_list(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            en_path = Path(tmp) / "en.json"
            en_path.write_text(
                json.dumps(
                    {"common": {"confirm": "Confirm", "cancel": "Cancel"}},
                    indent=2,
                ),
                encoding="utf-8",
            )
            self.assertEqual(guard.scan_locale_cjk(en_path), [])

    def test_planted_cjk_returns_one_finding(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            en_path = Path(tmp) / "en.json"
            data = {
                "common": {
                    "confirm": "Confirm",
                    "cancel": "取消",
                }
            }
            en_path.write_text(
                json.dumps(data, indent=2, ensure_ascii=False),
                encoding="utf-8",
            )
            findings = guard.scan_locale_cjk(en_path)
            self.assertEqual(len(findings), 1)
            key, line_no, snippet = findings[0]
            self.assertEqual(key, "common.cancel")
            self.assertGreaterEqual(line_no, 1)
            self.assertIn("取消", snippet)

    def test_long_value_is_truncated(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            en_path = Path(tmp) / "en.json"
            value = "前置" + ("x" * 200)
            en_path.write_text(
                json.dumps({"k": value}, ensure_ascii=False),
                encoding="utf-8",
            )
            findings = guard.scan_locale_cjk(en_path)
            self.assertEqual(len(findings), 1)
            self.assertLessEqual(len(findings[0][2]), guard.SNIPPET_MAX_LEN)


class CountPathCjkTests(unittest.TestCase):
    """``count_path_cjk`` shells out to ``git grep -nIP``."""

    def test_returns_zero_for_empty_match(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo = _make_repo(Path(tmp))
            _commit_file(repo, "src/a.txt", "hello world\n")
            self.assertEqual(guard.count_path_cjk(repo, "src"), 0)

    def test_counts_planted_cjk_lines(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo = _make_repo(Path(tmp))
            _commit_file(
                repo,
                "src/a.py",
                "# 一\nprint('hi')\n# 二三\nx = '四'\n",
            )
            # Three lines contain CJK: # 一 ; # 二三 ; x = '四'.
            self.assertEqual(guard.count_path_cjk(repo, "src"), 3)

    def test_skips_binary_files(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo = _make_repo(Path(tmp))
            # A "binary" blob containing CJK bytes; -I should exclude it.
            _commit_file(
                repo,
                "src/blob.bin",
                b"\x00\x01\x02\xe4\xb8\x80\x00\xff",
            )
            self.assertEqual(guard.count_path_cjk(repo, "src"), 0)

    def test_skips_untracked_files(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo = _make_repo(Path(tmp))
            _commit_file(repo, "src/.gitkeep", "")
            (repo / "src" / "untracked.py").write_text(
                "x = '中'\n", encoding="utf-8"
            )
            self.assertEqual(guard.count_path_cjk(repo, "src"), 0)


class BaselineRoundTripTests(unittest.TestCase):
    """``read_baseline`` and ``write_baseline`` round-trip cleanly."""

    def test_round_trip(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "baseline.txt"
            counts = {"backend/app": 2792, "frontend/src": 902}
            guard.write_baseline(path, counts)
            self.assertTrue(path.read_text().endswith("\n"))
            self.assertEqual(guard.read_baseline(path), counts)

    def test_sorted_lexicographically_and_single_trailing_newline(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "baseline.txt"
            guard.write_baseline(path, {"frontend/src": 1, "backend/app": 2})
            text = path.read_text(encoding="utf-8")
            data_lines = [
                line for line in text.splitlines() if not line.startswith("#")
            ]
            self.assertEqual(
                data_lines,
                ["backend/app\t2", "frontend/src\t1"],
            )
            self.assertTrue(text.endswith("\n"))
            self.assertFalse(text.endswith("\n\n"))

    def test_missing_file_raises_baseline_error(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "missing.txt"
            with self.assertRaises(guard.BaselineError):
                guard.read_baseline(path)

    def test_malformed_line_raises_baseline_error(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "baseline.txt"
            path.write_text(
                "# header\nbackend/app 100\n", encoding="utf-8"
            )
            with self.assertRaises(guard.BaselineError):
                guard.read_baseline(path)


class RunCheckEndToEndTests(unittest.TestCase):
    """End-to-end test of ``run_check`` against a synthetic repo."""

    def _make_full_repo(
        self,
        tmp: Path,
        *,
        en_json: dict,
        backend_lines: int,
        frontend_lines: int,
    ) -> tuple[Path, Path]:
        repo = _make_repo(tmp)
        _commit_file(
            repo,
            "locales/en.json",
            json.dumps(en_json, indent=2, ensure_ascii=False),
        )
        if backend_lines:
            content = "\n".join(f"# 中{i}" for i in range(backend_lines)) + "\n"
            _commit_file(repo, "backend/app/x.py", content)
        else:
            _commit_file(repo, "backend/app/.gitkeep", "")
        if frontend_lines:
            content = "\n".join(f"// 中{i}" for i in range(frontend_lines)) + "\n"
            _commit_file(repo, "frontend/src/x.js", content)
        else:
            _commit_file(repo, "frontend/src/.gitkeep", "")
        baseline_path = repo / "baseline.txt"
        return repo, baseline_path

    def test_pass_within_baseline(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo, baseline_path = self._make_full_repo(
                Path(tmp),
                en_json={"k": "Confirm"},
                backend_lines=3,
                frontend_lines=2,
            )
            guard.write_baseline(
                baseline_path,
                {"backend/app": 5, "frontend/src": 5},
            )
            rc = guard.run_check(repo, baseline_path)
            self.assertEqual(rc, 0)

    def test_fail_on_locale_cjk(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo, baseline_path = self._make_full_repo(
                Path(tmp),
                en_json={"k": "中文"},
                backend_lines=0,
                frontend_lines=0,
            )
            guard.write_baseline(
                baseline_path,
                {"backend/app": 0, "frontend/src": 0},
            )
            rc = guard.run_check(repo, baseline_path)
            self.assertEqual(rc, 1)

    def test_fail_on_regression_with_refresh_hint(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo, baseline_path = self._make_full_repo(
                Path(tmp),
                en_json={"k": "Confirm"},
                backend_lines=10,
                frontend_lines=0,
            )
            guard.write_baseline(
                baseline_path,
                {"backend/app": 5, "frontend/src": 0},
            )
            # Capture stderr so the failure output can be asserted on.
            from contextlib import redirect_stderr
            from io import StringIO

            captured_err = StringIO()
            with redirect_stderr(captured_err):
                rc = guard.run_check(repo, baseline_path)
            self.assertEqual(rc, 1)
            err_text = captured_err.getvalue()
            self.assertIn("cjk-regression", err_text)
            self.assertIn(
                "python scripts/ci/i18n_cjk_guard.py --update-baseline",
                err_text,
            )

    def test_missing_en_json_fails(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo = _make_repo(Path(tmp))
            _commit_file(repo, "backend/app/.gitkeep", "")
            _commit_file(repo, "frontend/src/.gitkeep", "")
            baseline_path = repo / "baseline.txt"
            guard.write_baseline(
                baseline_path,
                {"backend/app": 0, "frontend/src": 0},
            )
            rc = guard.run_check(repo, baseline_path)
            self.assertEqual(rc, 1)

    def test_missing_baseline_fails(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo, baseline_path = self._make_full_repo(
                Path(tmp),
                en_json={"k": "Confirm"},
                backend_lines=0,
                frontend_lines=0,
            )
            # Do not write the baseline.
            self.assertFalse(baseline_path.exists())
            rc = guard.run_check(repo, baseline_path)
            self.assertEqual(rc, 1)


class UpdateBaselineTests(unittest.TestCase):
    """``update_baseline`` writes current counts and exits 0."""

    def test_update_then_check_passes(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            repo = _make_repo(Path(tmp))
            _commit_file(
                repo,
                "locales/en.json",
                json.dumps({"k": "Confirm"}, indent=2),
            )
            _commit_file(repo, "backend/app/x.py", "# 一\n# 二\n")
            _commit_file(repo, "frontend/src/.gitkeep", "")
            baseline_path = repo / "baseline.txt"
            self.assertEqual(
                guard.update_baseline(repo, baseline_path), 0
            )
            counts = guard.read_baseline(baseline_path)
            self.assertEqual(counts["backend/app"], 2)
            self.assertEqual(counts["frontend/src"], 0)
            self.assertEqual(guard.run_check(repo, baseline_path), 0)


class CliSmokeTests(unittest.TestCase):
    """``main`` exposes the documented CLI surface."""

    def test_help_flag_exits_zero(self) -> None:
        guard_script = _GUARD_DIR / "i18n_cjk_guard.py"
        proc = subprocess.run(
            [sys.executable, str(guard_script), "--help"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        self.assertEqual(proc.returncode, 0)
        for flag in ("--update-baseline", "--baseline", "--repo-root"):
            self.assertIn(flag, proc.stdout)

    def test_unknown_flag_exits_nonzero(self) -> None:
        guard_script = _GUARD_DIR / "i18n_cjk_guard.py"
        proc = subprocess.run(
            [sys.executable, str(guard_script), "--no-such-flag"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        self.assertNotEqual(proc.returncode, 0)


if __name__ == "__main__":
    unittest.main()
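The pass and fail cases exercised by the end-to-end tests reduce to one comparison per scoped path: a count at or below its baseline passes, and a count above it is a regression. A minimal standalone sketch of that ratchet rule follows. `find_regressions` is a hypothetical name rather than a function in the guard, and treating a path missing from the baseline as 0 is an assumption chosen to mirror `baseline.get(path, 0)` in `run_check`.

```python
"""Sketch of the per-path ratchet rule (hypothetical helper, not the committed code)."""
from __future__ import annotations


def find_regressions(
    baseline: dict[str, int], current: dict[str, int]
) -> dict[str, int]:
    """Return {path: delta} for every path whose current count exceeds its baseline.

    A path absent from the baseline is treated as having a baseline of 0.
    """
    return {
        path: count - baseline.get(path, 0)
        for path, count in sorted(current.items())
        if count > baseline.get(path, 0)
    }


if __name__ == "__main__":
    baseline = {"backend/app": 5, "frontend/src": 0}
    # Counts may fall below the baseline without failing; only increases fail.
    print(find_regressions(baseline, {"backend/app": 3, "frontend/src": 0}))
    print(find_regressions(baseline, {"backend/app": 10, "frontend/src": 0}))
```

Because a lower count never fails, the baseline only needs refreshing when someone wants to lock in an improvement, which is what lets the codebase ratchet down without blocking unrelated PRs.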