feat(i18n): enforce locale-key parity in pr-time ci guard
Extend scripts/ci/i18n_cjk_guard.py with a third check that fails any
PR introducing a key in only one of locales/en.json / locales/zh.json.
The new check runs alongside the existing CJK-clean and per-path
ratchet checks, with no short-circuit and the same single-exit-code
contract; the workflow file and CLI flags are untouched.
Live catalogues are already parity-clean (962 keys per side), so the
guard ships green. This addresses acceptance criterion 4 of the
English-support epic ("for every externalized log message, matching
log.* keys exist in both locales/en.json and locales/zh.json") with a
permanent automated guard, complementing the CJK-clean ratchet from #26.
Refs #11
This commit is contained in:
parent
e6f939592c
commit
a7eab1eea1
|
|
@ -0,0 +1,517 @@
|
|||
# Design — i18n-locale-parity-guard
|
||||
|
||||
## Overview
|
||||
|
||||
This feature extends the project's PR-time i18n CI guard so that any pull request which introduces a key in only one of `locales/en.json` / `locales/zh.json` fails. It satisfies acceptance criterion #4 of epic #11 (locale-key parity) with a permanent automated check.
|
||||
|
||||
**Purpose**: Lock in locale-catalogue key parity as a permanent CI invariant so that AC #4 of epic #11 cannot regress as new strings are added.
|
||||
**Users**: Project maintainers and PR authors. Maintainers gain a hard regression gate; PR authors gain a script they can run locally to confirm parity before pushing.
|
||||
**Impact**: Adds a third check to the existing PR-time guard `scripts/ci/i18n_cjk_guard.py`. No production source under `backend/app/`, `frontend/src/`, or `locales/` is modified by this spec.
|
||||
|
||||
### Goals
|
||||
|
||||
- Fail any PR whose flattened-key set in `locales/en.json` differs from that of `locales/zh.json`.
|
||||
- Print actionable failure lines (`<file>:<line>: parity-<en|zh>-only: <dotted-key>`) and a summary count.
|
||||
- Compose with the existing CJK-clean and per-path-ratchet checks in a single CLI invocation, with a single exit code, no short-circuit.
|
||||
- Run end-to-end in well under one second on the live catalogues; stdlib-only.
|
||||
- Pass on `main` at the moment this spec ships (live catalogues are already parity-clean).
|
||||
|
||||
### Non-Goals
|
||||
|
||||
- Re-implementing the manual audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/`. The new check is the CI extract; the audit retains its own copy of `check_parity.py`.
|
||||
- Cross-locale value-equality, identical-value heuristics, or ICU-placeholder-shape checks.
|
||||
- Auto-creating missing keys, suggesting translations, or reformatting the catalogues.
|
||||
- Modifying the `locales/` schema, the `vue-i18n` runtime, or `backend/app/utils/locale.py`.
|
||||
- Adding a new GitHub Actions workflow or workflow step.
|
||||
|
||||
## Boundary Commitments
|
||||
|
||||
### This Spec Owns
|
||||
|
||||
- The new parity-check helpers (`_flatten_keys`, `_locate_key_line`, `_format_parity_finding`, `run_parity_check`) and constants (`ZH_JSON_REL_PATH`) inside `scripts/ci/i18n_cjk_guard.py`.
|
||||
- The new third block of `run_check` that invokes `run_parity_check` and integrates its result into the existing `failed` accumulator and `success_summary` collector.
|
||||
- The pass/fail semantics of the locale-key parity check.
|
||||
- New unit / integration tests under `scripts/ci/tests/` covering the parity check and its composition.
|
||||
|
||||
### Out of Boundary
|
||||
|
||||
- The audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` (independent, manual-only).
|
||||
- The structure or format of the baseline file `.kiro/specs/i18n-ci-guard/baseline.txt` (parity is binary; no baseline needed).
|
||||
- The workflow file `.github/workflows/i18n-cjk-guard.yml` (unchanged; same `python scripts/ci/i18n_cjk_guard.py` invocation already covers the new check).
|
||||
- Any change to `locales/en.json` or `locales/zh.json` content.
|
||||
- Open follow-up issues #7, #23, #25 (out-of-scope translation work).
|
||||
|
||||
### Allowed Dependencies
|
||||
|
||||
- Python ≥3.11 standard library (`json`, `os`, `pathlib`, `re`, `subprocess`, `sys`, `argparse`, `unittest`).
|
||||
- The existing helpers `_flatten`, `_value_line_number`, `_truncate`, the `EN_JSON_REL_PATH` constant, and the `run_check`/`update_baseline` functions in `scripts/ci/i18n_cjk_guard.py`.
|
||||
- `git` (for the existing CJK-counting block, untouched here).
|
||||
|
||||
### Revalidation Triggers
|
||||
|
||||
- Adding a third locale catalogue → parity becomes pairwise; design must be revisited.
|
||||
- Changing the `flatten` contract (e.g. encoding non-dict containers like lists) → the parity check's "exact match with `check_parity.py`" clause must be re-asserted against the new contract.
|
||||
- Splitting the guard into multiple CLI scripts → Requirement 3 ("one invocation") must be re-anchored.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Existing Architecture Analysis
|
||||
|
||||
The guard is a single-file Python CLI: `scripts/ci/i18n_cjk_guard.py` (~393 lines, stdlib-only) invoked by one workflow step in `.github/workflows/i18n-cjk-guard.yml`. Its `run_check(repo_root, baseline_path) -> int` function is the orchestrator; today it composes two checks without short-circuit:
|
||||
|
||||
1. `scan_locale_cjk(en_json_path)` — fail when `locales/en.json` contains any CJK character.
|
||||
2. Per-path baseline ratchet — fail when `count_path_cjk(repo_root, p)` exceeds `read_baseline(...)[p]` for any `p` in `("backend/app", "frontend/src")`.
|
||||
|
||||
A `failed: bool` accumulator is set independently by each block; a `success_summary: list[str]` collects "OK …" lines that print only on full success. This design extends it with a third block.
|
||||
|
||||
The audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` already implements the algorithm we need (recursive `flatten` + symmetric difference). Its logic is the canonical reference for Requirement 1.1.
|
||||
|
||||
### Architecture Pattern & Boundary Map
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
Workflow[GitHub Actions step]
|
||||
Main[main entry]
|
||||
UpdateBaseline[update_baseline]
|
||||
RunCheck[run_check orchestrator]
|
||||
CjkClean[scan_locale_cjk]
|
||||
Ratchet[count_path_cjk + read_baseline]
|
||||
Parity[run_parity_check NEW]
|
||||
EnJson[locales en.json]
|
||||
ZhJson[locales zh.json]
|
||||
BaselineFile[baseline.txt]
|
||||
|
||||
Workflow --> Main
|
||||
Main -->|--update-baseline| UpdateBaseline
|
||||
Main --> RunCheck
|
||||
RunCheck --> CjkClean
|
||||
RunCheck --> Ratchet
|
||||
RunCheck --> Parity
|
||||
CjkClean --> EnJson
|
||||
Ratchet --> BaselineFile
|
||||
Parity --> EnJson
|
||||
Parity --> ZhJson
|
||||
```
|
||||
|
||||
**Architecture Integration**:
|
||||
|
||||
- **Selected pattern**: Composed checks inside a single orchestrator (`run_check`). Each check is an independent function that returns a pass/fail signal and a list of human-readable lines; the orchestrator accumulates them.
|
||||
- **Domain/feature boundaries**: Parity logic is internal to the guard module. It does not depend on the audit pipeline, the per-path ratchet, or the locale runtime.
|
||||
- **Existing patterns preserved**: No-short-circuit composition, stderr-for-failure / stdout-for-success, lexicographic ordering for determinism, atomic-write / tmp-rename for any new persistence (none added here).
|
||||
- **New components rationale**: `run_parity_check` is the only new orchestrator-level function; small private helpers (`_flatten_keys`, `_locate_key_line`, `_format_parity_finding`) keep `run_parity_check`'s body short and individually testable.
|
||||
- **Steering compliance**: Stdlib-only; explicit type hints (PEP 604 union syntax already in use in this module); single-responsibility helpers; module dependency direction unchanged (still no imports from `backend/`, `frontend/`, or `locales/` runtime code).
|
||||
|
||||
### Technology Stack
|
||||
|
||||
| Layer | Choice / Version | Role in Feature | Notes |
|
||||
|-------|------------------|-----------------|-------|
|
||||
| Backend / Services | n/a | n/a | This is a CI tool; no backend or service code is touched. |
|
||||
| Infrastructure / Runtime | Python 3.11 stdlib (`json`, `pathlib`, `re`, `subprocess`, `sys`, `argparse`); GitHub Actions `ubuntu-latest`; `actions/checkout@v4`; `actions/setup-python@v5` | Runtime for the guard script and its new parity check. | Versions match the existing guard. No new dependencies; `pyproject.toml` and CI image unchanged. |
|
||||
| Test Tooling | Python `unittest` (stdlib) | Drives parity check unit + integration tests. | Same framework as existing tests in `scripts/ci/tests/test_i18n_cjk_guard.py`. |
|
||||
|
||||
## File Structure Plan
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
|
||||
scripts/
|
||||
└── ci/
|
||||
├── i18n_cjk_guard.py # Extended: adds parity helpers + third block in run_check
|
||||
└── tests/
|
||||
└── test_i18n_cjk_guard.py # Extended: adds ParityCheckTests + composition test
|
||||
```
|
||||
|
||||
### Modified Files
|
||||
|
||||
- `scripts/ci/i18n_cjk_guard.py`
|
||||
- Add module-level constants: `ZH_JSON_REL_PATH = "locales/zh.json"`.
|
||||
- Add private helpers: `_flatten_keys`, `_locate_key_line`, `_format_parity_finding`.
|
||||
- Add public function: `run_parity_check(repo_root: Path) -> ParityResult`.
|
||||
- Add a new `NamedTuple` (or `@dataclass(frozen=True, slots=True)`) `ParityResult` with fields `(passed: bool, failure_lines: list[str], success_summary: str | None)`.
|
||||
- Edit `run_check`: insert the parity block after the per-path-ratchet block, before the final `if not failed: print(success_summary)` block. Match the existing accumulator idiom.
|
||||
- Update the module docstring to list three checks.
|
||||
- `scripts/ci/tests/test_i18n_cjk_guard.py`
|
||||
- Extend `_make_full_repo` (or add a sibling `_make_full_repo_with_zh`) to write a `locales/zh.json` alongside the existing `locales/en.json`. Keep the default ZH a parity-clean mirror of the EN fixture so existing tests do not need to change semantically.
|
||||
- Add new test class `ParityCheckTests` covering Requirements 1.1, 1.2, 1.3, 1.4, 1.5, 2.1, 2.2, 2.3, 2.5.
|
||||
- Add one composition test (Requirement 5.1.f) inside `RunCheckEndToEndTests` (or a new `RunCheckCompositionTests` class) that plants a CJK string and a parity divergence in the same repo and asserts both failure lines + exit 1.
|
||||
- Update existing `RunCheckEndToEndTests.test_*` to either commit a parity-clean `locales/zh.json` or assert the parity check now also runs but does not flip the test outcome.
|
||||
|
||||
### Files Not Created
|
||||
|
||||
- No new source file is created. Option C (separate `locale_parity.py` helper module) was rejected in `gap-analysis.md` and `research.md`.
|
||||
- No new workflow file. The existing `.github/workflows/i18n-cjk-guard.yml` is invoked unchanged.
|
||||
|
||||
## Requirements Traceability
|
||||
|
||||
| Requirement | Summary | Components | Interfaces | Flows |
|
||||
|-------------|---------|------------|------------|-------|
|
||||
| 1.1 | Flatten EN/ZH into matching dotted-key sets | `i18n_cjk_guard._flatten_keys` (new), reuses `_flatten` | `_flatten_keys(data: dict) -> set[str]` | n/a |
|
||||
| 1.2 | Pass on identical key sets, success line includes shared count | `run_parity_check`, `run_check` | `ParityResult.success_summary` | Run-Check Composition |
|
||||
| 1.3 / 1.4 | Fail on en-only or zh-only keys | `run_parity_check` | `ParityResult.passed`, `ParityResult.failure_lines` | Run-Check Composition |
|
||||
| 1.5 | Dict leaves are non-leaves; scalar leaves are leaves | `_flatten_keys` (no type narrowing) | n/a | n/a |
|
||||
| 2.1 | `<file>:<line>: parity-<side>-only: <key>` lines | `_format_parity_finding`, `_locate_key_line` | `_format_parity_finding(file, line, key, side) -> str` | n/a |
|
||||
| 2.2 | Line-1 fallback when key not located | `_locate_key_line` | `_locate_key_line(text_lines, key) -> int` (returns 1 on miss) | n/a |
|
||||
| 2.3 | Final `parity: en-only=N, zh-only=M` summary | `run_parity_check` | Last entry of `ParityResult.failure_lines` on failure | n/a |
|
||||
| 2.4 | All parity output to stderr | `run_check` integration block | `print(..., file=sys.stderr)` | Run-Check Composition |
|
||||
| 2.5 | Lexicographic ordering | `run_parity_check` | `sorted(...)` over symmetric difference | n/a |
|
||||
| 3.1 | All checks run, no short-circuit | `run_check` (existing accumulator pattern) | `failed: bool` accumulator | Run-Check Composition |
|
||||
| 3.2 / 3.3 | Single exit code: 1 on any fail, 0 otherwise | `run_check` | Returns `1 if failed else 0` | Run-Check Composition |
|
||||
| 3.4 / 3.5 | `--update-baseline`, `--baseline`, `--repo-root` flags unchanged | `main`, `_build_parser` | Existing argparse surface | n/a |
|
||||
| 3.6 | Workflow file unchanged | `.github/workflows/i18n-cjk-guard.yml` | n/a (no edit) | n/a |
|
||||
| 4.1 | Stdlib-only | `i18n_cjk_guard` imports | No new imports | n/a |
|
||||
| 4.2 | Sub-second runtime | `_flatten_keys` is O(keys); set-diff is O(keys) | n/a | n/a |
|
||||
| 4.3 | Deterministic output | All sorts lexicographic | n/a | n/a |
|
||||
| 5.1 (a–f) | Tests for success, en-only, zh-only, both, scalar-leaf, composition | `scripts/ci/tests/test_i18n_cjk_guard.py:ParityCheckTests` + composition test | n/a | n/a |
|
||||
| 5.2 / 5.3 / 5.4 | Match existing test style; isolated fixtures; clean run on parity-clean repo | Same test file | n/a | n/a |
|
||||
| 6.1 | Guard passes on live catalogues at HEAD | Manual run at implementation time | `python scripts/ci/i18n_cjk_guard.py` exit 0 | n/a |
|
||||
| 6.2 | If divergence found, document in tasks.md and fix | n/a (does not trigger; live parity holds) | n/a | n/a |
|
||||
|
||||
## System Flows
|
||||
|
||||
### Run-Check Composition
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant CLI as main
|
||||
participant Orch as run_check
|
||||
participant CjkChk as scan_locale_cjk
|
||||
participant RatChk as ratchet block
|
||||
participant ParChk as run_parity_check
|
||||
participant Out as stderr/stdout
|
||||
|
||||
CLI->>Orch: run_check repo baseline
|
||||
Orch->>CjkChk: scan en.json
|
||||
CjkChk-->>Orch: findings list
|
||||
alt findings non-empty
|
||||
Orch->>Out: stderr cjk-in-en lines
|
||||
Note over Orch: failed = True
|
||||
else
|
||||
Note over Orch: success summary append
|
||||
end
|
||||
Orch->>RatChk: count + read baseline
|
||||
RatChk-->>Orch: regressions list
|
||||
alt regressions non-empty
|
||||
Orch->>Out: stderr cjk-regression lines + refresh hint
|
||||
Note over Orch: failed = True
|
||||
else
|
||||
Note over Orch: success summary append
|
||||
end
|
||||
Orch->>ParChk: run parity check
|
||||
ParChk-->>Orch: ParityResult
|
||||
alt parity failed
|
||||
Orch->>Out: stderr parity lines + parity summary
|
||||
Note over Orch: failed = True
|
||||
else
|
||||
Note over Orch: success summary append
|
||||
end
|
||||
alt failed false
|
||||
Orch->>Out: stdout success lines
|
||||
end
|
||||
Orch-->>CLI: 1 if failed else 0
|
||||
```
|
||||
|
||||
**Key decisions**:
|
||||
|
||||
- The parity block is appended last so its (potentially long) failure list is contiguous in the failure stream.
|
||||
- The `failed` accumulator is shared with the prior two blocks; this is the only mechanism for cross-block signalling.
|
||||
- The summary line `parity: en-only=N, zh-only=M` is appended to `ParityResult.failure_lines` (last entry) so the orchestrator can print all failure lines uniformly without a special-case branch.
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|
||||
|-----------|--------------|--------|--------------|--------------------------|-----------|
|
||||
| `_flatten_keys` | Guard / helper | Return the dotted-key set of a parsed JSON catalogue, mirroring `check_parity.py.flatten`. | 1.1, 1.5 | `_flatten` (P0, existing) | Service |
|
||||
| `_locate_key_line` | Guard / helper | Best-effort line-number resolution for a dotted key in raw JSON text, with line-1 fallback. | 2.1, 2.2 | none | Service |
|
||||
| `_format_parity_finding` | Guard / helper | Format one failure line as `<file>:<line>: parity-<side>-only: <key>`. | 2.1 | none | Service |
|
||||
| `ParityResult` | Guard / DTO | Carry parity-check outcome (passed flag, failure lines, success-summary line). | 1.2, 2.3, 2.5 | none | State |
|
||||
| `run_parity_check` | Guard / orchestrator-leaf | Read both catalogues, compute symmetric difference, build `ParityResult`. | 1.1–1.5, 2.1–2.5 | `_flatten_keys` (P0), `_locate_key_line` (P0), `_format_parity_finding` (P0) | Service |
|
||||
| `run_check` (modified) | Guard / orchestrator | Compose the three checks with a single `failed` accumulator and exit code. | 3.1–3.3 | All three checks (P0) | Service |
|
||||
| `ParityCheckTests` (test) | Tests | Unit + integration coverage for parity. | 5.1 (a–f), 5.2–5.4 | `run_parity_check`, `run_check` (P0) | Service |
|
||||
|
||||
### Guard / helper layer
|
||||
|
||||
#### `_flatten_keys`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Return the set of dotted-key paths of a parsed JSON object, mirroring `check_parity.py.flatten`. |
|
||||
| Requirements | 1.1, 1.5 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Iterate via the existing `_flatten(prefix, value, out)` helper to guarantee identical path semantics.
|
||||
- Descend only into `dict`. Any non-dict (string, number, bool, null, list) at a leaf produces a key.
|
||||
- Return a `set[str]` so the parity caller can compute symmetric differences without re-deduplicating.
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Inbound: `run_parity_check` (P0).
|
||||
- Outbound: `_flatten` (P0, existing private helper in same module).
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
|
||||
def _flatten_keys(data: dict[str, object]) -> set[str]:
|
||||
...
|
||||
```
|
||||
|
||||
- Preconditions: `data` is the result of `json.loads` over a catalogue file (i.e., a `dict` at the top level).
|
||||
- Postconditions: every dotted path returned corresponds to a non-`dict` leaf in `data`. The set is unordered; callers must sort before formatting output (Requirement 2.5).
|
||||
- Invariants: `_flatten_keys({}) == set()`. For any catalogue `c`, `_flatten_keys(c)` is identical to the set of keys produced by `check_parity.py.flatten(c)`.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: One call site (`run_parity_check`).
|
||||
- Validation: Unit-test against a hand-rolled fixture with mixed leaf types (string, number, bool, null) and at least three nesting levels (Requirement 5.1.e).
|
||||
- Risks: None. Reuses the existing flatten primitive verbatim.
|
||||
|
||||
#### `_locate_key_line`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Best-effort line-number resolution for a dotted key in the raw JSON source text, with a deterministic line-1 fallback. |
|
||||
| Requirements | 2.1, 2.2 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Accept the splitlines view of a JSON file (`text_lines: list[str]`) and a dotted key (`dotted_key: str`).
|
||||
- Search for the leaf segment of the dotted key (after the last `.`) wrapped in JSON quotes, e.g. `"missingKey"`. Return the 1-based line number of the first match.
|
||||
- Fall back to `1` when no match is found (mirrors `_value_line_number`).
|
||||
- Performance must remain linear in the number of lines.
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Inbound: `run_parity_check` (P0).
|
||||
- Outbound: none.
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
|
||||
def _locate_key_line(text_lines: list[str], dotted_key: str) -> int:
|
||||
...
|
||||
```
|
||||
|
||||
- Preconditions: `dotted_key` non-empty; `text_lines` is the result of `Path.read_text(...).splitlines()`.
|
||||
- Postconditions: returns an integer ≥ 1.
|
||||
- Invariants: When the leaf segment appears in `text_lines` wrapped in `"..."`, the return is the (1-based) line number of the first occurrence. Otherwise the return is `1`.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: One call site (`run_parity_check`).
|
||||
- Validation: Unit-test the exact-match path, the multi-occurrence path (first match wins), and the not-found fallback.
|
||||
- Risks: A leaf segment that also appears as part of another (unrelated) key or in a value text could yield a slightly misleading line number. Acceptable: the dotted key in the failure message is the source of truth; the line is a navigation aid. Documented in the docstring.
|
||||
|
||||
#### `_format_parity_finding`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Format a single parity-failure line in the canonical layout used by the guard. |
|
||||
| Requirements | 2.1 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Produce strings of the exact form `<file>:<line>: parity-en-only: <dotted-key>` or `<file>:<line>: parity-zh-only: <dotted-key>`.
|
||||
- Mirror the existing `_format_locale_finding` style (`<file>:<line>: <category>: <payload>`).
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Inbound: `run_parity_check` (P0).
|
||||
- Outbound: none.
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
|
||||
def _format_parity_finding(file_rel_path: str, line_no: int, dotted_key: str, side: str) -> str:
|
||||
...
|
||||
```
|
||||
|
||||
- Preconditions: `side in {"en-only", "zh-only"}`; `file_rel_path` is one of `EN_JSON_REL_PATH` / `ZH_JSON_REL_PATH`; `line_no >= 1`.
|
||||
- Postconditions: returns a single line with no embedded newline.
|
||||
- Invariants: The category token in the line is exactly `parity-en-only` or `parity-zh-only` so log greps match deterministically.
|
||||
|
||||
### Guard / DTO layer
|
||||
|
||||
#### `ParityResult`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Immutable carrier for parity-check outcome consumed by `run_check`. |
|
||||
| Requirements | 1.2, 2.3, 2.5 |
|
||||
|
||||
**Contracts**: State [x]
|
||||
|
||||
##### State Management
|
||||
|
||||
- State model:
|
||||
|
||||
```python
|
||||
class ParityResult(NamedTuple):
|
||||
passed: bool
|
||||
failure_lines: list[str] # already-formatted lines, including the trailing "parity: en-only=N, zh-only=M" summary on failure
|
||||
success_summary: str | None # populated only when passed is True
|
||||
```
|
||||
|
||||
- Persistence & consistency: in-memory only; constructed by `run_parity_check` and consumed by `run_check`.
|
||||
- Concurrency strategy: n/a (single-process, single-call).
|
||||
|
||||
### Guard / orchestrator-leaf
|
||||
|
||||
#### `run_parity_check`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Compute the locale-key parity outcome and produce a `ParityResult`. |
|
||||
| Requirements | 1.1–1.5, 2.1–2.5 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Read both `locales/en.json` and `locales/zh.json` from `repo_root`.
|
||||
- Flatten each via `_flatten_keys` and compute the symmetric difference.
|
||||
- For each en-only key (sorted lexicographically): resolve its line via `_locate_key_line` over the EN catalogue's source-text lines, and emit a `parity-en-only` line via `_format_parity_finding`.
|
||||
- For each zh-only key (sorted lexicographically, after en-only): resolve its line via `_locate_key_line` over the ZH catalogue's source-text lines, and emit a `parity-zh-only` line.
|
||||
- On failure, append a final `parity: en-only=N, zh-only=M` summary line to `failure_lines`.
|
||||
- On success, build the success summary `OK locale-parity: <count> keys per side`.
|
||||
- If either catalogue file is missing, return a `ParityResult(passed=False, failure_lines=[<single error line>], success_summary=None)` and let `run_check` fold the error into the global `failed` flag.
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Inbound: `run_check` (P0).
|
||||
- Outbound: `_flatten_keys`, `_locate_key_line`, `_format_parity_finding` (all P0).
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
|
||||
def run_parity_check(repo_root: Path) -> ParityResult:
|
||||
...
|
||||
```
|
||||
|
||||
- Preconditions: `repo_root` is a valid working-tree directory; `locales/en.json` and `locales/zh.json` are expected at the relative paths defined by `EN_JSON_REL_PATH` and `ZH_JSON_REL_PATH`.
|
||||
- Postconditions: returns a `ParityResult`. When `passed`, `failure_lines == []` and `success_summary` is non-`None`. When not `passed`, `failure_lines` is non-empty and ends with a `parity: en-only=…` summary line; `success_summary` is `None`.
|
||||
- Invariants: Flattened-key-set computation matches `check_parity.py.flatten` byte-for-byte for any input. Output is deterministic across runs for identical inputs.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: Called once per `run_check` invocation. Skipped entirely in `--update-baseline` mode (covered by Requirement 3.4 — `update_baseline` is invoked from `main` instead of `run_check`).
|
||||
- Validation: Unit-test all required outcomes (Requirement 5.1 a–e); integration-test composition (5.1 f).
|
||||
- Risks: A malformed JSON catalogue raises `json.JSONDecodeError`. The function should treat this the same as a missing file (return `ParityResult(passed=False, …)`), so the guard reports a clean failure rather than crashing CI with a Python traceback.
|
||||
|
||||
### Guard / orchestrator (modified)
|
||||
|
||||
#### `run_check` (modification)
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Compose all three checks (CJK-clean, per-path ratchet, parity) into one exit code. |
|
||||
| Requirements | 3.1, 3.2, 3.3 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- After the existing per-path-ratchet block (existing line ~258–293) and before the final `if not failed` block (existing line ~295–298), call `run_parity_check(repo_root)`.
|
||||
- If the result is not passed, set `failed = True`, print every entry of `result.failure_lines` to `sys.stderr`, one line per `print(...)` call.
|
||||
- If passed, append `result.success_summary` to `success_summary`.
|
||||
- Return `1 if failed else 0` (unchanged).
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Inbound: `main` (P0, via either standalone CLI or test invocation).
|
||||
- Outbound: `scan_locale_cjk`, per-path ratchet helpers, `run_parity_check` (all P0).
|
||||
|
||||
**Contracts**: Service [x] / State [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
Unchanged signature: `def run_check(repo_root: Path, baseline_path: Path) -> int`.
|
||||
|
||||
- Preconditions: unchanged.
|
||||
- Postconditions: exit code reflects all three checks (was: two checks).
|
||||
- Invariants: still no short-circuit between checks.
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Integration: One inserted block of ~10 lines in the existing function.
|
||||
- Validation: Existing CLI smoke tests continue to pass; new `RunCheckEndToEndTests` cases assert correct fail/pass propagation when only the parity check fails, only an existing check fails, or both fail.
|
||||
- Risks: A future maintainer could accidentally short-circuit by inserting an early `return` between blocks. Mitigated by the composition test (Requirement 5.1.f) which fails if any block is skipped.
|
||||
|
||||
### Tests
|
||||
|
||||
#### `ParityCheckTests`
|
||||
|
||||
| Field | Detail |
|
||||
|-------|--------|
|
||||
| Intent | Unit + integration coverage for the parity check, matching the style of existing `RunCheckEndToEndTests`. |
|
||||
| Requirements | 5.1 (a–f), 5.2, 5.3, 5.4 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
|
||||
- Use `unittest`, `tempfile.TemporaryDirectory`, and the existing `_make_repo` / `_commit_file` test helpers.
|
||||
- Each test owns its own ephemeral repo. No reliance on the live `locales/` content for negative paths (Requirement 5.3).
|
||||
- Assertions check exit code AND substring presence of the failure category tokens (`parity-en-only`, `parity-zh-only`) AND that the summary line is the last failure line.
|
||||
|
||||
**Dependencies**
|
||||
|
||||
- Inbound: `unittest.main`.
|
||||
- Outbound: `i18n_cjk_guard.run_parity_check`, `i18n_cjk_guard.run_check` (both P0).
|
||||
|
||||
**Implementation Notes**
|
||||
|
||||
- Test cases (one per Requirement 5.1 sub-bullet):
|
||||
- (a) `test_passes_when_keys_match` — both catalogues identical → `run_parity_check` returns `passed=True`; `run_check` returns 0.
|
||||
- (b) `test_fails_on_en_only_key` — `en.json` has an extra key → `run_parity_check` returns `passed=False`, failure includes `parity-en-only`, summary is `parity: en-only=1, zh-only=0`.
|
||||
- (c) `test_fails_on_zh_only_key` — symmetric of (b).
|
||||
- (d) `test_fails_on_both_sided_divergence` — failure list contains both `parity-en-only` and `parity-zh-only` lines, ordered en-first then zh, each lex-sorted within its group.
|
||||
- (e) `test_passes_with_scalar_leaves_at_same_path` — both catalogues have a scalar (e.g. `null`, `42`, `false`) at the same dotted path → parity passes (Requirement 1.5).
|
||||
- (f) `test_run_check_no_short_circuit` — one repo plants both a CJK in `en.json` and a parity-divergent key. Expect: exit 1; stderr contains both `cjk-in-en` and `parity-en-only` (or `parity-zh-only`); the per-path-ratchet success summary is suppressed (since failed).
|
||||
- Risks: Test fixtures must use `ensure_ascii=False` JSON to match the live catalogue style.
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Error Strategy
|
||||
|
||||
- **Missing catalogue file** → `run_parity_check` returns `ParityResult(passed=False, failure_lines=[<missing-file-line>], success_summary=None)`. `run_check` flips `failed`, prints the line to stderr, returns 1.
|
||||
- **Malformed JSON** → same path as missing catalogue. `json.JSONDecodeError` is caught inside `run_parity_check`; the line printed names the offending file and the parser's `msg`.
|
||||
- **Parity divergence** (the expected unhappy path) → fail per Requirements 1.3 / 1.4 / 2.1–2.5.
|
||||
- **`_locate_key_line` cannot find the key** → fall back to line 1 (Requirement 2.2). Not an error; the caller proceeds.
|
||||
- **No-short-circuit invariant** → enforced by the orchestrator's accumulator pattern; covered by Requirement 5.1.f.
|
||||
|
||||
### Monitoring
|
||||
|
||||
CI workflow logs (GitHub Actions) are the sole observability surface. Failure lines are designed to be greppable: `parity-en-only`, `parity-zh-only`, `parity: en-only=`, `parity: zh-only=` are stable tokens.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests
|
||||
|
||||
- `_flatten_keys`: empty input, flat input, mixed-type leaves, three-level nesting, `null` and scalar leaves.
|
||||
- `_locate_key_line`: exact match, multi-occurrence (first wins), not found (line-1 fallback).
|
||||
- `_format_parity_finding`: en-only and zh-only sides, embedded special characters in key names (e.g. underscores, digits).
|
||||
- `ParityResult`: pass-shape and fail-shape construction.
|
||||
|
||||
### Integration Tests
|
||||
|
||||
- All six `ParityCheckTests` sub-cases listed above.
|
||||
- The composition case (Requirement 5.1.f) inside `RunCheckCompositionTests` (or appended to `RunCheckEndToEndTests`).
|
||||
- A regression of the existing `RunCheckEndToEndTests` cases after extending `_make_full_repo` to write a default parity-clean `locales/zh.json`.
|
||||
|
||||
### Performance / Load
|
||||
|
||||
- One sanity case: parity check on a synthetic 10 000-key catalogue completes in well under one second on the CI runner. Asserted by a `time.perf_counter()` budget of 1.0 s in the integration test.
|
||||
|
||||
## Performance & Scalability
|
||||
|
||||
- Catalogue size: ~1000 keys today; growth bounded by the number of UI strings + log keys. Even at 10× the current size, `_flatten` + set-diff remains negligible (<100 ms).
|
||||
- The CI workflow timeout is 1 minute (`.github/workflows/i18n-cjk-guard.yml:timeout-minutes: 1`); the new check adds at most tens of milliseconds.
|
||||
|
||||
## Supporting References
|
||||
|
||||
- `gap-analysis.md` (this spec) — implementation-approach options A/B/C with rationale.
|
||||
- `research.md` (this spec) — design decision records.
|
||||
- `.kiro/specs/i18n-ci-guard/design.md` — prior CI guard's design doc (style and boundary precedents).
|
||||
- `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` — reference parity algorithm.
|
||||
|
|
@ -0,0 +1,154 @@
|
|||
# Gap Analysis — i18n-locale-parity-guard
|
||||
|
||||
## Current State Investigation
|
||||
|
||||
### Domain assets
|
||||
|
||||
| Asset | Path | Role |
|
||||
|------|------|------|
|
||||
| Existing PR-time guard | `scripts/ci/i18n_cjk_guard.py` (393 lines) | Runs (a) zero-CJK-in-`en.json`, (b) per-path CJK ratchet on `backend/app` + `frontend/src`. CLI: `--update-baseline`, `--baseline`, `--repo-root`. Stdlib-only. |
|
||||
| Workflow | `.github/workflows/i18n-cjk-guard.yml` | `pull_request` trigger; single step `python scripts/ci/i18n_cjk_guard.py`. 1-minute timeout. Python 3.11. |
|
||||
| Existing tests | `scripts/ci/tests/test_i18n_cjk_guard.py` (358 lines) | `unittest`, stdlib-only. Per-function test classes (`ScanLocaleCjkTests`, `CountPathCjkTests`, `BaselineRoundTripTests`, `RunCheckEndToEndTests`, `UpdateBaselineTests`, `CliSmokeTests`). Synthetic git repos via `tempfile.TemporaryDirectory` + `git init`. |
|
||||
| Reference parity logic | `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` (128 lines) | Already implements `flatten()` (recursive dotted-key generator) and the EN/ZH symmetric-difference computation. Used only by the manual audit pipeline; not in CI. |
|
||||
| Locale catalogues | `locales/en.json`, `locales/zh.json` | Two-space-indented JSON, `ensure_ascii=False`. 962 keys per side at HEAD; symmetric difference 0. Multi-level nesting (e.g. `common.confirm`, `step1.upload.title`, `log.api.graph.startBuild`). |
|
||||
| Prior spec | `.kiro/specs/i18n-ci-guard/{design.md,baseline.txt}` | Documents the CJK-guard's design, format, and "scope ratchets only" rationale. The new check should compose, not replace. |
|
||||
|
||||
### Conventions extracted
|
||||
|
||||
- **Module layout**: One CLI script per check class; checks compose inside a `run_check(...)` orchestrator that returns 0/1.
|
||||
- **Output discipline**: Stderr for failures, stdout for success summaries. Each failure line is self-contained (`<file>:<line>: <category>: <key/payload>`). Refresh hints (when applicable) printed once at the end.
|
||||
- **No-short-circuit composition**: `run_check` evaluates every check before exiting (existing pattern at lines 230, 258, 271 in `i18n_cjk_guard.py`).
|
||||
- **Stdlib-only, deterministic**: existing module imports only `argparse`, `json`, `os`, `re`, `subprocess`, `sys`, `pathlib`. All sorts use lexicographic order.
|
||||
- **Test-fixture isolation**: Each test owns a `tempfile.TemporaryDirectory()` and writes its own JSON / source files. Negative-path tests never depend on the live `locales/`.
|
||||
- **Atomic writes**: `write_baseline` uses tmp-file + `os.replace`; if any new persistence is added, mirror that pattern.
|
||||
- **JSON line-resolution helper**: `_value_line_number(text_lines, value)` already implements the line-fallback semantics required by R2.2 (returns 1 when value not found). Reusable for parity reporting if we resolve by **key name** rather than by **value**.
|
||||
|
||||
### Integration surfaces
|
||||
|
||||
- The workflow file invokes the guard exactly once: `python scripts/ci/i18n_cjk_guard.py`. Anything done inside `run_check` is automatically picked up — **no workflow change needed** if we extend the existing script (R3.6).
|
||||
- `--update-baseline` short-circuits inside `main()` *before* `run_check` is called; the new parity check naturally won't run in that mode (R3.4).
|
||||
- The audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` is independent and stays untouched (R6's "spec for prior CI guard" boundary).
|
||||
- Baseline file format is single-purpose (CJK counts) and does not need to grow to accommodate parity (parity has no baseline — divergence is binary).
|
||||
|
||||
## Requirement-to-Asset Map
|
||||
|
||||
| # | Requirement | Existing asset(s) | Gap tag | Notes |
|
||||
|---|-------------|------------------|---------|-------|
|
||||
| 1.1 | Flatten EN/ZH into dotted keys matching `check_parity.py` | `audit/scripts/check_parity.py:flatten` (reference); existing `_flatten` in guard also flattens but only collects (key, value) pairs into a list | **Constraint** | Two `_flatten` flavours exist. Need ONE canonical function inside the guard module that mirrors `check_parity.py.flatten` (recursive, descends into dicts only, emits leaf scalars). The existing private `_flatten(prefix, value, out)` in the guard is already key-value-emitting and will work; the parity check just consumes its keys. |
|
||||
| 1.2 | Pass when key sets identical, emit success summary with key count | `success_summary` list in `run_check` | **Missing** | Add a parity success line in the same idiom: `"OK locale-parity: 962 keys per side"`. |
|
||||
| 1.3 / 1.4 | Fail on en-only or zh-only keys | None — no parity check exists | **Missing** | Compute symmetric difference. |
|
||||
| 1.5 | Treat dict leaves as non-leaves; treat scalar leaves the same as string leaves for parity | `_flatten` already descends only into dicts and emits any non-dict as a leaf; `scan_locale_cjk` then narrows to strings, but parity should NOT narrow | **Constraint** | Use `_flatten` directly (no narrowing). |
|
||||
| 2.1 | Print `<file>:<line>: <key>: en-only|zh-only` | `_value_line_number` resolves a value's line; needs adaptation for keys | **Missing** | Search for the JSON key token (e.g. `"missingKey"`) in the source-text lines using a substring scan; reuse the line-1 fallback from `_value_line_number`. |
|
||||
| 2.2 | Fall back to line 1 when location not found | `_value_line_number` already returns 1 in this case | **Reuse** | |
|
||||
| 2.3 | Final summary `parity: en-only=<n>, zh-only=<m>` | None | **Missing** | One line, stderr. |
|
||||
| 2.4 | All parity output to stderr | `print(..., file=sys.stderr)` pattern used everywhere | **Reuse** | |
|
||||
| 2.5 | Lexicographic sort | Existing patterns use `sorted(...)` | **Reuse** | |
|
||||
| 3.1 / 3.2 / 3.3 | Compose with existing checks; one exit code | `run_check` already composes (a) and (b) without short-circuit | **Constraint** | Insert (c) at the end of `run_check`, after the per-path block but before the final return. Each check toggles the same `failed` flag. |
|
||||
| 3.4 | `--update-baseline` does not run parity | `main()` short-circuits to `update_baseline()` and never enters `run_check` | **Reuse** | Untouched. |
|
||||
| 3.5 | `--baseline` and `--repo-root` semantics unchanged | `_build_parser` and `_detect_repo_root` | **Reuse** | Untouched. |
|
||||
| 3.6 | Workflow file unchanged | `.github/workflows/i18n-cjk-guard.yml` | **Reuse** | No edit needed. |
|
||||
| 4.1 | Stdlib-only | Existing module is stdlib-only | **Reuse** | `json` is the only library needed for ZH loading. |
|
||||
| 4.2 | Sub-second runtime | ~1k keys; flatten + set diff is O(n) | **Constraint** | Trivially holds. |
|
||||
| 4.3 | Deterministic output | All sorts lexicographic | **Reuse** | |
|
||||
| 5.1–5.4 | Tests under `scripts/ci/tests/` for success / en-only / zh-only / both / scalar-leaves / no-short-circuit | `test_i18n_cjk_guard.py:RunCheckEndToEndTests` is the integration class | **Missing** | Add either a new `ParityCheckTests` class or extend `RunCheckEndToEndTests`. Reuse `_make_full_repo` style; need a `zh_json` argument or a new helper that writes both locale files. |
|
||||
| 6.1 | Guard passes on live catalogues at merge target | EN/ZH parity verified manually (962/962, 0 diff) | **Reuse** | Manual run after implementation. |
|
||||
| 6.2 | Document any blocking divergence in tasks.md | n/a | **Conditional** | Only relevant if 6.1 fails — currently does not. |
|
||||
|
||||
### Complexity signal
|
||||
|
||||
- **Algorithmic logic** only: load two JSON files, recursive flatten, set diff, sort, format, print. No external integrations, no I/O contention, no perf concerns at the catalogue size.
|
||||
|
||||
## Implementation Approach Options
|
||||
|
||||
### Option A — Extend `scripts/ci/i18n_cjk_guard.py` *(recommended)*
|
||||
|
||||
**What changes**:
|
||||
|
||||
- Add private helpers to the existing module:
|
||||
- `_flatten_keys(data) -> set[str]` — wrapper over the existing `_flatten` that returns just the dotted-key set.
|
||||
- `_locate_key_line(text_lines, dotted_key) -> int` — substring scan for the leaf segment (after the last `.`) wrapped in JSON quotes; returns 1 on miss (mirrors `_value_line_number`'s fallback).
|
||||
- `_format_parity_finding(file_rel_path, line_no, dotted_key, side) -> str` — single-line formatter.
|
||||
- Add a function `run_parity_check(repo_root) -> tuple[bool, list[str], str]` returning `(passed, failure_lines, success_summary_line)`. Callable independently for tests.
|
||||
- In `run_check`, after the per-path baseline block and before the final return:
|
||||
- Call `run_parity_check(repo_root)`.
|
||||
- If failed, set `failed = True`, print all failure lines + the `parity: ...` summary to stderr.
|
||||
- If passed, append the success line to `success_summary`.
|
||||
- Add a `ZH_JSON_REL_PATH` constant alongside `EN_JSON_REL_PATH`.
|
||||
|
||||
**Compatibility assessment**:
|
||||
|
||||
- All existing CLI flags, exit codes, and stdout/stderr patterns preserved.
|
||||
- No new top-level dependencies. `json` already imported.
|
||||
- The module grows to ~470 lines, comparable to similar single-purpose CLI scripts in the repo (`oasis_profile_generator.py` is much larger). Single-responsibility is preserved: the responsibility is "PR-time i18n catalogue health," and parity is a sub-instance of that.
|
||||
- Existing tests continue to pass unmodified (none of the changed functions break their contract).
|
||||
|
||||
**Trade-offs**:
|
||||
- ✅ Zero workflow churn, single CI job, single CLI surface.
|
||||
- ✅ Reuses `_flatten`, line-resolution fallback, sort/print idioms.
|
||||
- ✅ All checks fail/pass together — easier to read in CI logs.
|
||||
- ❌ Module name (`i18n_cjk_guard`) is now slightly misleading: it also enforces parity, not just CJK presence. Mitigated by docstring update.
|
||||
|
||||
### Option B — New parallel script `scripts/ci/i18n_locale_parity_guard.py` + new workflow step
|
||||
|
||||
**What changes**:
|
||||
|
||||
- New script that implements the parity check standalone.
|
||||
- Either (i) add a second job to `.github/workflows/i18n-cjk-guard.yml`, or (ii) add a new workflow file `i18n-locale-parity-guard.yml`.
|
||||
- New test file `scripts/ci/tests/test_i18n_locale_parity_guard.py`.
|
||||
|
||||
**Compatibility assessment**:
|
||||
|
||||
- Both scripts duplicate `_flatten`, line-resolution helper, JSON loader, repo-root detection, argparse boilerplate.
|
||||
- Two CI runs (or two steps) to read and ack on every PR.
|
||||
|
||||
**Trade-offs**:
|
||||
- ✅ Single-responsibility script per file (matches one literal reading of project conventions).
|
||||
- ❌ Code duplication ~80 lines.
|
||||
- ❌ Two CI surfaces; PR review fatigue.
|
||||
- ❌ Violates the spirit of R3 ("compose with the existing checks") — composing across two scripts requires either `&&` or two-job aggregation.
|
||||
|
||||
### Option C — Hybrid: new helper module + extended guard
|
||||
|
||||
**What changes**:
|
||||
|
||||
- New module `scripts/ci/locale_parity.py` exposing `compute_parity_findings(en_path, zh_path) -> ParityResult`.
|
||||
- The existing `i18n_cjk_guard.py` imports from it and integrates the call into `run_check`, identical to Option A's runtime behaviour.
|
||||
- Tests split: `test_locale_parity.py` covers the helper in isolation; `test_i18n_cjk_guard.py` gains one composition test.
|
||||
|
||||
**Compatibility assessment**:
|
||||
|
||||
- Adds package-style imports inside `scripts/ci/` (currently flat — `scripts/ci/i18n_cjk_guard.py` adds `_GUARD_DIR` to `sys.path` via the test bootstrap, which works for sibling modules without further config).
|
||||
- No workflow change.
|
||||
|
||||
**Trade-offs**:
|
||||
- ✅ Clean separation, more reusable helper.
|
||||
- ✅ Possible to import the helper from the audit pipeline later (collapsing the duplicate `check_parity.py`).
|
||||
- ❌ More files for what is ~80 lines of new logic; over-engineering for current scope.
|
||||
- ❌ Risks scope creep into "deduplicate `check_parity.py`," which is explicitly out of scope.
|
||||
|
||||
## Effort & Risk
|
||||
|
||||
- **Effort**: **S** (1–2 days). Existing module patterns are mature; the algorithmic logic is small and proven (`check_parity.py`); test scaffolding is already in place.
|
||||
- **Risk**: **Low**. Stdlib-only; no external integrations; no shared mutable state; deterministic algorithm; existing CI workflow unchanged; live catalogues already pass.
|
||||
|
||||
## Recommendations for Design Phase
|
||||
|
||||
### Preferred approach: Option A (extend `scripts/ci/i18n_cjk_guard.py`)
|
||||
|
||||
Rationale:
|
||||
|
||||
1. The existing module's docstring already says "PR-time guard: fail when locales/en.json contains CJK or when backend/app + frontend/src CJK match counts exceed the committed baseline." Extending it to also fail on locale-key parity is the smallest possible delta that also reads naturally in the codebase.
|
||||
2. R3 ("composes with the existing CJK and per-path checks; one CLI; no workflow edit") is satisfied trivially.
|
||||
3. Reuses `_flatten`, line-fallback, sort/print idioms verbatim.
|
||||
4. The module name remains accurate — "CJK Guard" is the canonical name of the i18n PR-time gate; we'll add a docstring note that parity is the third covered check.
|
||||
|
||||
### Key design decisions to settle in `design.md`
|
||||
|
||||
- **Function boundary**: should `run_parity_check` live in the same module or in a small helper module? *Suggest: same module, as a private function alongside `count_path_cjk` / `scan_locale_cjk` for symmetry.*
|
||||
- **Failure line format**: exact string layout (file:line:key:side, ordering of the four pieces, separator characters). *Suggest mirroring `_format_locale_finding` exactly: `f"{file}:{line}: {category}: {key}"` where `category` is `parity-en-only` or `parity-zh-only`.*
|
||||
- **Test fixture for `RunCheckEndToEndTests`**: extend `_make_full_repo` to accept an optional `zh_json` parameter, or add a sibling helper. *Suggest extending — keeps the integration test in one place and lets the existing tests opt out by passing `zh_json=None` (the helper writes a parity-clean default).*
|
||||
- **Whether to expose a `--check=parity` selector**: *Out of scope per R3.1 (no short-circuit, all-or-nothing).*
|
||||
|
||||
### Research items to carry forward
|
||||
|
||||
None. All required information is in the existing repo and the cited reference scripts. No external dependencies, no new tech, no perf research, no security implications.
|
||||
|
|
@ -0,0 +1,96 @@
|
|||
# Requirements Document
|
||||
|
||||
## Introduction
|
||||
|
||||
Epic #11 ("complete english support across ui, agents, logs, and docs") states as acceptance criterion #4: *"For every externalized log message, matching `log.*` keys exist in both `locales/en.json` and `locales/zh.json`."* The wider intent is symmetric: any externalized string introduced into either locale catalogue must have a counterpart in the other, otherwise English users hit fallback keys at runtime (and the inverse for Chinese users).
|
||||
|
||||
Parity holds today (962 keys per side, symmetric difference 0), but no automated check enforces it. The existing CI guard at `scripts/ci/i18n_cjk_guard.py` (workflow `.github/workflows/i18n-cjk-guard.yml`, landed via #26) only enforces (1) zero CJK in `locales/en.json` and (2) a per-path CJK count ratchet for `backend/app` + `frontend/src`. The audit script at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` does compute the symmetric difference, but only as part of a manual audit — it never runs in CI.
|
||||
|
||||
This spec extends the existing PR-time CI guard to enforce locale-key parity permanently. Once shipped, any pull request that introduces a key on only one side will fail CI with a precise list of the offending keys, freezing AC #4 in place for the rest of the epic and beyond.
|
||||
|
||||
## Boundary Context
|
||||
|
||||
- **In scope**:
|
||||
- Symmetric-difference check between flattened dotted-key sets of `locales/en.json` and `locales/zh.json`.
|
||||
- Integration of the new check into the existing `scripts/ci/i18n_cjk_guard.py` so the existing workflow `.github/workflows/i18n-cjk-guard.yml` exercises it without any workflow edit beyond what's strictly necessary.
|
||||
- Test coverage under `scripts/ci/tests/` matching the style of the existing CJK-guard tests.
|
||||
- Failure output formatted so a developer can locate the offending key without further tooling.
|
||||
- **Out of scope**:
|
||||
- Translating any remaining hard-coded strings in `backend/app` or `frontend/src` (tracked under open assigned issues #7, #23, #25).
|
||||
- Value-equality, identical-value, or "review-needed" heuristics from the audit script's `[identical-values]` block — only key presence is asserted here.
|
||||
- Any change to the `locales/` directory layout, schemas, or to `vue-i18n` / `backend/app/utils/locale.py` consumers.
|
||||
- Cross-locale value-shape checks (e.g. matching ICU placeholders).
|
||||
- README, `.env.example`, or documentation updates beyond what's needed inside the spec / guard module itself.
|
||||
- **Adjacent expectations**:
|
||||
- The existing CJK-clean and per-path-ratchet checks in `scripts/ci/i18n_cjk_guard.py` continue to run unchanged and report independently of the new parity check.
|
||||
- The audit pipeline at `.kiro/specs/i18n-e2e-english-verification/audit/scripts/` keeps its own copy of `check_parity.py` for manual deep-dive use; the new CI check does not depend on the audit pipeline being invoked.
|
||||
- All four checks (CJK in en.json, per-path ratchet, en-only keys, zh-only keys) run in a single CI job and surface together; no short-circuit between them.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Locale-key parity check
|
||||
|
||||
**Objective:** As a maintainer of the i18n catalogues, I want a CI check that detects any key present on only one of `locales/en.json` / `locales/zh.json`, so that AC #4 of epic #11 stays satisfied as new strings are added.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The i18n CJK Guard shall load `locales/en.json` and `locales/zh.json` and flatten each into a set of dotted keys whose paths exactly match those produced by `flatten()` in `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py`.
|
||||
2. When the flattened EN and ZH key sets are identical, the i18n CJK Guard shall pass the parity check and emit a single success summary line that includes the shared key count.
|
||||
3. When the flattened EN key set contains any key that is absent from ZH, the i18n CJK Guard shall fail the parity check.
|
||||
4. When the flattened ZH key set contains any key that is absent from EN, the i18n CJK Guard shall fail the parity check.
|
||||
5. The i18n CJK Guard shall treat a leaf whose value is a nested object as a non-leaf (no key emitted) and shall treat a leaf whose value is a non-string scalar (number, boolean, null) the same way it treats a string leaf for parity purposes.
|
||||
|
||||
### Requirement 2: Actionable failure reporting
|
||||
|
||||
**Objective:** As a developer whose PR is failing on parity, I want the failure message to name every offending key and the side it is missing on, so that I can fix the divergence without re-running the audit pipeline.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. If the parity check fails, then the i18n CJK Guard shall print one line per missing key in the form `<locales/en.json|locales/zh.json>:<line>: <dotted-key>: en-only` or `... zh-only`, with `<line>` being the 1-based line number of that key in the source JSON file.
|
||||
2. If a missing key cannot be located in its source file (e.g. owing to JSON formatting), then the i18n CJK Guard shall fall back to line 1 and still print the offending key and side.
|
||||
3. If the parity check fails, then the i18n CJK Guard shall print a final summary line of the form `parity: en-only=<n>, zh-only=<m>` where `<n>` and `<m>` are the counts of en-only and zh-only keys.
|
||||
4. The i18n CJK Guard shall print all parity-related output to stderr.
|
||||
5. The i18n CJK Guard shall sort each side's missing-key list lexicographically so that the failure output is deterministic across environments.
|
||||
|
||||
### Requirement 3: Integration with the existing guard
|
||||
|
||||
**Objective:** As a maintainer extending the CI guard, I want the new parity check to compose with the existing CJK-clean and per-path-ratchet checks rather than replace them, so that all four checks are visible in a single CI run.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The i18n CJK Guard shall execute all of (a) the CJK-clean check on `locales/en.json`, (b) the per-path baseline ratchet on `backend/app` and `frontend/src`, and (c) the new parity check on every invocation of `python scripts/ci/i18n_cjk_guard.py` without short-circuiting between checks.
|
||||
2. When any of (a), (b), or (c) fail, the i18n CJK Guard shall exit with status code 1.
|
||||
3. When all of (a), (b), and (c) pass, the i18n CJK Guard shall exit with status code 0.
|
||||
4. The i18n CJK Guard shall continue to support the `--update-baseline` flag with its existing semantics (refresh per-path counts and exit 0); the parity check shall not run in `--update-baseline` mode.
|
||||
5. The i18n CJK Guard shall continue to support the `--baseline` and `--repo-root` flags with their existing semantics.
|
||||
6. The existing GitHub Actions workflow `.github/workflows/i18n-cjk-guard.yml` shall continue to invoke the guard via the same single command (`python scripts/ci/i18n_cjk_guard.py`), with no new workflow steps required.
|
||||
|
||||
### Requirement 4: Stdlib-only, deterministic, fast
|
||||
|
||||
**Objective:** As a CI operator, I want the parity check to run quickly and without new dependencies, so that the existing 1-minute job timeout still holds.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The i18n CJK Guard shall implement the parity check using only the Python standard library; no new package shall be added to `pyproject.toml`, `requirements*.txt`, or any other dependency manifest.
|
||||
2. The i18n CJK Guard shall complete the parity check in well under one second on the current catalogue size (~1000 keys per side) under normal CI conditions.
|
||||
3. The i18n CJK Guard shall produce identical output for identical inputs across runs (no timestamps, no run IDs, no nondeterministic ordering).
|
||||
|
||||
### Requirement 5: Test coverage
|
||||
|
||||
**Objective:** As a future contributor modifying the guard, I want automated tests for every parity behaviour, so that regressions in either check or in their composition are caught locally.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. The repository shall contain unit tests under `scripts/ci/tests/` that cover at minimum: (a) the success path where EN and ZH have identical key sets, (b) an en-only-key failure, (c) a zh-only-key failure, (d) a both-sides-divergent failure, (e) a leaf-value-type-mismatch case (string vs scalar/null) that does NOT count as a parity failure, and (f) the integration case where the parity check runs alongside the existing CJK-clean and per-path-ratchet checks without short-circuiting.
|
||||
2. The new tests shall use the same testing style and framework already used by the existing tests in `scripts/ci/tests/`.
|
||||
3. When a new test fixture is required for a JSON file, the fixture shall live under `scripts/ci/tests/` in a self-contained form (no reliance on `locales/` content for negative-path tests).
|
||||
4. When the test suite is run from the repository root, the i18n CJK Guard test module shall pass without warnings on a clean checkout where `locales/en.json` and `locales/zh.json` have full key parity.
|
||||
|
||||
### Requirement 6: Self-test against the live catalogues
|
||||
|
||||
**Objective:** As an epic-#11 closer, I want to know the moment this guard ships that it observes the live catalogues as parity-clean, so that the guard's first PR doesn't produce a false alarm.
|
||||
|
||||
#### Acceptance Criteria
|
||||
|
||||
1. While the live catalogues `locales/en.json` and `locales/zh.json` have a symmetric difference of zero on the merge target branch, the i18n CJK Guard shall pass the parity check on a manual run from the repository root.
|
||||
2. If the merge target branch is found to have a non-zero symmetric difference at the time this spec is implemented, then the implementer shall (a) document the divergence in the spec's `tasks.md` as a blocking finding and (b) fix the divergence before completing the implementation tasks, rather than weakening the parity check.
|
||||
|
|
@ -0,0 +1,104 @@
|
|||
# Research & Design Decisions — i18n-locale-parity-guard
|
||||
|
||||
## Summary
|
||||
|
||||
- **Feature**: `i18n-locale-parity-guard`
|
||||
- **Discovery Scope**: Extension (extends an existing single-script CI guard)
|
||||
- **Key Findings**:
|
||||
- The existing PR-time guard `scripts/ci/i18n_cjk_guard.py` already implements the no-short-circuit composition pattern, the JSON-flatten primitive, and the line-fallback line-resolution helper that the new parity check needs to reuse.
|
||||
- The audit pipeline's `check_parity.py` (in `.kiro/specs/i18n-e2e-english-verification/audit/scripts/`) already proves the algorithm: flatten both catalogues into dotted-key sets and compute their symmetric difference. It runs only in the manual audit path; promoting it to CI is a pure plumbing exercise.
|
||||
- The live catalogues at `HEAD` of `main` are parity-clean (962 keys per side, symmetric difference 0), so the new guard's first run will not produce a false alarm and Requirement 6.1 holds out of the gate.
|
||||
|
||||
## Research Log
|
||||
|
||||
### Composition with the existing guard
|
||||
|
||||
- **Context**: Requirement 3 mandates that all checks (CJK-clean, per-path ratchet, parity) run in a single invocation without short-circuit and surface a unified exit code.
|
||||
- **Sources Consulted**: `scripts/ci/i18n_cjk_guard.py:run_check` (lines 220–299).
|
||||
- **Findings**: `run_check` uses a `failed: bool` accumulator and a `success_summary: list[str]` collector, evaluating every block before deciding the exit code. The parity check fits trivially as a third block at the end of `run_check`, before the final `if not failed: print(success_summary)` block.
|
||||
- **Implications**: No structural refactor is needed. The extension is additive.
|
||||
|
||||
### Flatten and key resolution semantics
|
||||
|
||||
- **Context**: Requirement 1.1 anchors the flatten contract to `check_parity.py.flatten`. Requirement 1.5 specifies that scalar leaves and string leaves are treated identically for parity (only dict leaves are skipped).
|
||||
- **Sources Consulted**: `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py:flatten`; `scripts/ci/i18n_cjk_guard.py:_flatten`.
|
||||
- **Findings**: The two implementations are byte-equivalent in behaviour: both descend only into `dict`, both yield `(dotted-path, value)` for any non-dict leaf, both build dotted paths with `.` separators. The guard's existing `_flatten` is suitable; the parity check just consumes its keys (set comprehension over the flattened pairs).
|
||||
- **Implications**: No new flatten function is needed. Requirement 1.1's "exactly match" clause is satisfied by reusing `_flatten`. Add a thin `_flatten_keys(data) -> set[str]` wrapper to keep call sites readable.
|
||||
|
||||
### Line resolution for missing keys
|
||||
|
||||
- **Context**: Requirement 2.1 demands `<file>:<line>: <key>: <side>` output. Requirement 2.2 demands a line-1 fallback when location is unknown.
|
||||
- **Sources Consulted**: `scripts/ci/i18n_cjk_guard.py:_value_line_number` (lines 70–87).
|
||||
- **Findings**: `_value_line_number` resolves a value's line by substring scan with two candidates (raw + JSON-escaped), falling back to line 1. For parity we must resolve a key, not a value. The minimal adaptation is a `_locate_key_line(text_lines, dotted_key)` that searches for the leaf segment of the dotted key wrapped in JSON quotes (e.g. `"missingKey"`). Falling back to line 1 mirrors `_value_line_number`'s contract.
|
||||
- **Implications**: A small new helper is needed; it follows the same code idiom as `_value_line_number`. Edge cases: leaf segments that appear elsewhere in the file (other keys, value text) — accepting a coarse first-match is acceptable because the *primary* signal (the dotted key + side) is unambiguous; the line number is a navigation aid.
|
||||
|
||||
### Stdlib-only enforcement
|
||||
|
||||
- **Context**: Requirement 4.1 prohibits new dependencies.
|
||||
- **Sources Consulted**: `pyproject.toml`, `requirements*.txt` (none at repo root); existing guard imports.
|
||||
- **Findings**: The existing guard imports `argparse`, `json`, `os`, `re`, `subprocess`, `sys`, `pathlib`. Parity needs none beyond `json` and `pathlib` — both already in use.
|
||||
- **Implications**: No `pyproject.toml` change. CI runtime image needs no addition.
|
||||
|
||||
### Live catalogue parity at HEAD
|
||||
|
||||
- **Context**: Requirement 6.1 asserts the guard must pass on the merge target's current state.
|
||||
- **Sources Consulted**: `locales/en.json`, `locales/zh.json` flattened via stdlib `json.loads` + recursive descent.
|
||||
- **Findings**: 962 keys per side, symmetric difference 0. Pre-existing `log.*` namespace fully mirrored (373 keys per side).
|
||||
- **Implications**: No remediation translation work is needed. Requirement 6.2's conditional ("if divergence is found, fix it before completing") does not trigger.
|
||||
|
||||
## Architecture Pattern Evaluation
|
||||
|
||||
| Option | Description | Strengths | Risks / Limitations | Notes |
|
||||
|--------|-------------|-----------|---------------------|-------|
|
||||
| Extend existing guard (Option A — selected) | Add parity helpers + a third block in `run_check` inside `scripts/ci/i18n_cjk_guard.py`; no workflow edit. | Single CI surface; reuses `_flatten`, line-fallback, sort/print idioms; trivially satisfies Requirement 3.6. | Module grows ~80 lines; module name no longer narrowly "CJK" — mitigated by docstring update. | Recommended in `gap-analysis.md`. |
|
||||
| Parallel script + step (Option B) | New `scripts/ci/i18n_locale_parity_guard.py`; either second job in existing workflow or new workflow file. | Tightest single-responsibility per file. | Code duplication (~80 lines); two CI surfaces; violates the spirit of Requirement 3 ("compose with existing checks"). | Rejected. |
|
||||
| Helper module + thin import (Option C) | New `scripts/ci/locale_parity.py`; the existing guard imports it and integrates the call. | Cleaner unit-test isolation; possible future de-duplication of audit `check_parity.py`. | Adds package-style imports for ~80 lines of logic; risks scope creep into "deduplicate audit script" (out of scope). | Rejected. |
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Decision: Extend `scripts/ci/i18n_cjk_guard.py` rather than create a new script
|
||||
|
||||
- **Context**: Requirement 3 mandates a single CLI invocation that runs all i18n CI checks together with no short-circuit and one exit code.
|
||||
- **Alternatives Considered**:
|
||||
1. New parallel script + workflow step — duplicates ~80 lines of plumbing.
|
||||
2. New helper module imported by the guard — introduces package structure for trivial logic.
|
||||
- **Selected Approach**: Add `_flatten_keys`, `_locate_key_line`, `_format_parity_finding`, and `run_parity_check` to the existing module; insert a third block into `run_check` after the per-path baseline block.
|
||||
- **Rationale**: Smallest delta that fully satisfies Requirement 3; reuses the existing no-short-circuit accumulator pattern verbatim; no workflow edit (Requirement 3.6 holds for free); existing test scaffolding (`unittest`, synthetic git repos) extends naturally.
|
||||
- **Trade-offs**: The module name (`i18n_cjk_guard`) becomes slightly broader than literal — mitigated by an updated module docstring listing all three checks. Module length grows from ~393 to ~470 lines, still well below the project's de facto threshold for splitting (`oasis_profile_generator.py` exceeds 1000).
|
||||
- **Follow-up**: Update the module docstring; verify `--help` text and existing CLI smoke test still pass after the change.
|
||||
|
||||
### Decision: Treat scalar leaves identically to string leaves for parity
|
||||
|
||||
- **Context**: Requirement 1.5 — `_flatten` does not narrow by type; scalars (numbers, booleans, null) at a leaf must register as keys.
|
||||
- **Alternatives Considered**:
|
||||
1. Narrow to string leaves only (mirror `scan_locale_cjk`'s behaviour). Rejected because a numeric or null value on one side is still a string-on-the-other-side parity question, and the `log.*` namespace today is all strings — there's no payoff in narrowing.
|
||||
2. Skip dict leaves; emit everything else. Selected.
|
||||
- **Selected Approach**: `_flatten_keys(data) -> set[str]` returns every dotted path emitted by the existing `_flatten`, regardless of value type.
|
||||
- **Rationale**: Aligns with the audit script's `flatten` contract (which also does not type-narrow). Catches accidental type drift across catalogues as a side benefit (any divergence at a key surfaces as a missing key).
|
||||
- **Trade-offs**: None significant — the catalogues today are entirely string-typed at leaves; the choice is mostly future-proofing.
|
||||
- **Follow-up**: Add a unit test (Requirement 5.1.e) that plants a scalar-typed leaf on both sides at the same path and asserts the parity check passes.
|
||||
|
||||
### Decision: Failure category strings — `parity-en-only` / `parity-zh-only`
|
||||
|
||||
- **Context**: Requirement 2.1 specifies the format `<file>:<line>: <key>: en-only` (or `... zh-only`). The existing CJK-clean check formats failures as `<file>:<line>: cjk-in-en: <key> = <snippet>`.
|
||||
- **Alternatives Considered**:
|
||||
1. Use bare `en-only` / `zh-only` as the category. Inconsistent with the CJK check's namespaced category (`cjk-in-en`).
|
||||
2. Use namespaced categories `parity-en-only` / `parity-zh-only`. Selected.
|
||||
- **Selected Approach**: Format failure lines as `<en.json|zh.json>:<line>: parity-en-only: <key>` and `... parity-zh-only: <key>` (file is whichever catalogue the missing key would belong to).
|
||||
- **Rationale**: Mirrors the CJK check's `cjk-in-en` category naming, so a dev grepping CI logs for `parity-` finds all parity failures. The bare-side requirement of 2.1 is satisfied because the side appears verbatim after `parity-` (`parity-en-only` contains `en-only`).
|
||||
- **Trade-offs**: Minor verbosity vs. consistency — favour consistency.
|
||||
- **Follow-up**: Tests assert exact substring `parity-en-only` / `parity-zh-only` in failure lines.
|
||||
|
||||
## Risks & Mitigations
|
||||
|
||||
- **Risk**: A future maintainer renames the existing `_flatten` and the parity check silently breaks. **Mitigation**: A test in the new `ParityCheckTests` class asserts that flattening a known nested fixture produces the expected dotted-key set (locking in the contract).
|
||||
- **Risk**: The `_locate_key_line` helper produces a misleading line number when the leaf segment also appears in another (unrelated) key or in a value. **Mitigation**: First-match on the JSON-quoted leaf is "good enough" for navigation; the dotted key in the message is the source of truth. Document this in the helper's docstring.
|
||||
- **Risk**: Future test writers forget the no-short-circuit invariant when extending `run_check`. **Mitigation**: Requirement 5.1.f's composition test guards this — both the parity check and the existing CJK check fail in the same run, and the test asserts both failure lines appear together.
|
||||
|
||||
## References
|
||||
|
||||
- `scripts/ci/i18n_cjk_guard.py` — existing guard (extension target).
|
||||
- `.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py` — reference parity algorithm.
|
||||
- `.kiro/specs/i18n-ci-guard/design.md` — prior CI guard design (style and boundary precedents).
|
||||
- `scripts/ci/tests/test_i18n_cjk_guard.py` — existing test patterns (extension target).
|
||||
- `.github/workflows/i18n-cjk-guard.yml` — workflow that runs the guard (no edit required).
|
||||
|
|
@ -0,0 +1,23 @@
|
|||
{
|
||||
"feature_name": "i18n-locale-parity-guard",
|
||||
"created_at": "2026-05-09T00:29:21Z",
|
||||
"updated_at": "2026-05-09T00:46:00Z",
|
||||
"language": "en",
|
||||
"phase": "tasks-generated",
|
||||
"approvals": {
|
||||
"requirements": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
},
|
||||
"design": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
},
|
||||
"tasks": {
|
||||
"generated": true,
|
||||
"approved": true
|
||||
}
|
||||
},
|
||||
"ready_for_implementation": true,
|
||||
"ticket": 11
|
||||
}
|
||||
|
|
@ -0,0 +1,63 @@
|
|||
# Implementation Plan
|
||||
|
||||
- [x] 1. Add parity primitives to the i18n CJK Guard module
|
||||
- Introduce a constant naming the Chinese catalogue path alongside the existing English-catalogue constant.
|
||||
- Add a private helper that returns the dotted-key set of a parsed catalogue, mirroring the audit pipeline's `flatten` contract (descend into dicts only; treat scalar leaves and string leaves identically; type-narrow nothing).
|
||||
- Add a private helper that resolves the 1-based line number of a dotted key in raw JSON source text by searching for the leaf segment wrapped in JSON quotes, and falls back to line 1 on any miss.
|
||||
- Add a private helper that formats a single parity-failure line in the layout `<file>:<line>: parity-en-only: <key>` or `... parity-zh-only: <key>`, with the side parameter typed as a literal of the two allowed strings (improvement carried over from the design review).
|
||||
- Add an immutable result carrier (named tuple or frozen dataclass) holding the parity outcome (passed flag, formatted failure lines including the trailing summary, optional success-summary line).
|
||||
- All additions stay stdlib-only and import nothing new beyond what the existing module already imports.
|
||||
- Observable completion: the module exports the new constant, helpers, and result carrier; importing the module from a Python REPL or test stays warning-free, and the helpers can be exercised in isolation.
|
||||
- _Requirements: 1.1, 1.5, 2.1, 2.2, 4.1, 4.3_
|
||||
- _Boundary: i18n_cjk_guard module — helper layer_
|
||||
|
||||
- [x] 2. Implement the parity-check orchestrator
|
||||
- Read both locale catalogues from the working tree using the existing path constants.
|
||||
- Flatten each catalogue and compute the symmetric difference of the dotted-key sets.
|
||||
- On match, build the success-summary string of the form `OK locale-parity: <count> keys per side`.
|
||||
- On mismatch, sort en-only keys lexicographically and emit one formatted failure line per key with the EN catalogue path and a best-effort line number; then sort zh-only keys lexicographically and emit one line per key with the ZH catalogue path and a best-effort line number.
|
||||
- Append a final summary line of the form `parity: en-only=<n>, zh-only=<m>` to the failure list so the orchestrator can print all lines uniformly.
|
||||
- Treat a missing or malformed catalogue file as a parity failure that returns a single descriptive failure line; if the EN catalogue is the unreadable side, attribute the error to the parity check without re-stating the en-only error already produced by the existing CJK-clean block (refinement carried over from the design review).
|
||||
- All output strings are deterministic across runs for identical inputs.
|
||||
- Observable completion: calling the orchestrator function with synthetic parity-clean and parity-divergent catalogues returns a result carrier whose passed flag, failure list, and success summary match the documented contracts; running it against the live `locales/` directory returns `passed=True`.
|
||||
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 2.1, 2.3, 2.4, 2.5, 4.2, 4.3_
|
||||
- _Boundary: i18n_cjk_guard module — orchestrator-leaf layer_
|
||||
|
||||
- [x] 3. Compose the parity check into the existing run-check orchestrator
|
||||
- Insert a new block inside the existing `run_check` function, after the per-path-ratchet block and before the final all-success branch.
|
||||
- Invoke the parity-check orchestrator with the working-tree root.
|
||||
- When the result is not passed, set the existing `failed` accumulator to true and print every entry of the result's failure list to stderr, one per call, preserving order.
|
||||
- When the result is passed, append the result's success-summary line to the existing `success_summary` collector so it prints alongside the other success summaries on a fully-clean run.
|
||||
- Update the module docstring to list all three checks (CJK-clean, per-path ratchet, locale-parity).
|
||||
- Leave the CLI argument parser, `--update-baseline`, `--baseline`, `--repo-root`, the workflow file, and the baseline file format untouched. Confirm by visual diff that no other functions or files are modified.
|
||||
- Observable completion: invoking the guard script via its CLI produces a single exit code, and `--help` text plus the existing CLI smoke test continues to pass without modification.
|
||||
- _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6_
|
||||
- _Boundary: i18n_cjk_guard module — run_check orchestrator_
|
||||
- _Depends: 2_
|
||||
|
||||
- [x] 4. Add unit and integration tests for the parity check
|
||||
- Extend the existing test-fixture helper that builds synthetic git repositories so callers can supply a Chinese catalogue alongside the English one; default the Chinese catalogue to a parity-clean mirror of the English fixture so the existing test cases continue to pass without semantic change.
|
||||
- Add unit-level tests for the dotted-key flattener (empty input, flat input, mixed scalar/string/null leaves, three-level nesting), the line-number resolver (exact match, multi-occurrence first-wins, not-found line-1 fallback), and the failure-line formatter (both sides, special characters in key names).
|
||||
- Add integration tests against the parity-check orchestrator covering: identical key sets pass; an en-only divergence fails with the expected category token, summary, and line attributing the key to the EN catalogue; a zh-only divergence fails with the symmetric output; a both-sides divergence yields en-only lines first then zh-only lines, each lex-sorted within its group; same-path scalar leaves on both sides do not count as a parity failure; a missing or malformed catalogue file produces a single deterministic failure line.
|
||||
- All new tests use the standard-library testing framework already used in the existing test module; negative-path fixtures are self-contained and do not depend on the live catalogues.
|
||||
- Observable completion: running the test module from the repository root produces a passing run with at least the new test cases reported, and a manually-induced en-only or zh-only key reliably trips the relevant test.
|
||||
- _Requirements: 5.1, 5.2, 5.3, 5.4_
|
||||
- _Boundary: i18n_cjk_guard test module — parity unit + integration coverage_
|
||||
- _Depends: 3_
|
||||
|
||||
- [x] 5. Add a no-short-circuit composition test covering all three guard checks
|
||||
- Plant CJK content in a synthetic English catalogue AND a parity-divergent key (in either direction) inside the same synthetic repository.
|
||||
- Assert that running the full composed guard returns exit code 1, that stderr contains both the existing CJK-related category token and the new parity category token, and that the order of these blocks is preserved (CJK first, then ratchet, then parity) so failure logs remain greppable.
|
||||
- Assert that on a fully-clean repository (no CJK in EN, ratchet within baseline, parity holds) the composed guard prints all three success summaries on stdout and exits 0.
|
||||
- Observable completion: the new test case fails if any future change short-circuits the orchestrator after the first failure or before invoking the parity check.
|
||||
- _Requirements: 3.1, 3.2, 3.3, 5.1_
|
||||
- _Boundary: i18n_cjk_guard test module — composition coverage_
|
||||
- _Depends: 3, 4_
|
||||
|
||||
- [x] 6. Verify the guard against the live locale catalogues
|
||||
- Run the guard once from the repository root against the live `locales/en.json` and `locales/zh.json` and confirm it exits 0 with three success-summary lines (CJK-clean, per-path ratchet, locale-parity).
|
||||
- If the live catalogues turn out to have non-zero symmetric difference at the time of implementation, document the divergence in this `tasks.md` as a blocking finding and remediate the divergence before completing the task; do not weaken the parity check.
|
||||
- Observable completion: the guard's CLI invocation against the live tree prints `OK locale-parity: <count> keys per side` and exits 0, demonstrating that the new check is satisfied by the merge target without any source change.
|
||||
- _Requirements: 6.1, 6.2_
|
||||
- _Boundary: live `locales/` content (read-only verification)_
|
||||
- _Depends: 5_
|
||||
|
|
@ -6,15 +6,17 @@ Run from the repository root::
|
|||
python scripts/ci/i18n_cjk_guard.py
|
||||
python scripts/ci/i18n_cjk_guard.py --update-baseline
|
||||
|
||||
Two checks always run (no short-circuit):
|
||||
Three checks always run (no short-circuit):
|
||||
|
||||
* ``locales/en.json`` must contain zero CJK characters
|
||||
(range ``U+4E00..U+9FFF``).
|
||||
* CJK match counts under ``backend/app/`` and ``frontend/src/`` must not
|
||||
exceed the committed per-path baseline at
|
||||
``.kiro/specs/i18n-ci-guard/baseline.txt``.
|
||||
* Locale-key parity: every flattened dotted key in ``locales/en.json``
|
||||
must also appear in ``locales/zh.json`` and vice versa.
|
||||
|
||||
Both checks rely on the canonical scan
|
||||
The first two checks rely on the canonical scan
|
||||
``git grep -nIP '[\\x{4e00}-\\x{9fff}]' -- <scoped_path>`` so the guard
|
||||
stays bytewise-aligned with the broader audit pipeline.
|
||||
|
||||
|
|
@ -30,11 +32,13 @@ import re
|
|||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Literal, NamedTuple
|
||||
|
||||
CJK_RE: re.Pattern[str] = re.compile(r"[一-鿿]")
|
||||
CJK_PATTERN: str = r"[\x{4e00}-\x{9fff}]"
|
||||
SCOPED_PATHS: tuple[str, ...] = ("backend/app", "frontend/src")
|
||||
EN_JSON_REL_PATH: str = "locales/en.json"
|
||||
ZH_JSON_REL_PATH: str = "locales/zh.json"
|
||||
DEFAULT_BASELINE_REL_PATH: str = ".kiro/specs/i18n-ci-guard/baseline.txt"
|
||||
SNIPPET_MAX_LEN: int = 80
|
||||
REFRESH_COMMAND: str = "python scripts/ci/i18n_cjk_guard.py --update-baseline"
|
||||
|
|
@ -217,6 +221,168 @@ def _format_regression_line(path: str, baseline: int, current: int) -> str:
|
|||
)
|
||||
|
||||
|
||||
ParitySide = Literal["en-only", "zh-only"]
|
||||
|
||||
|
||||
class ParityResult(NamedTuple):
|
||||
"""Outcome of the locale-key parity check.
|
||||
|
||||
``failure_lines`` is non-empty only when ``passed`` is ``False`` and
|
||||
always ends with the trailing ``parity: en-only=N, zh-only=M``
|
||||
summary line in that case. ``success_summary`` is non-``None`` only
|
||||
when ``passed`` is ``True``.
|
||||
"""
|
||||
|
||||
passed: bool
|
||||
failure_lines: list[str]
|
||||
success_summary: str | None
|
||||
|
||||
|
||||
def _flatten_keys(data: dict[str, object]) -> set[str]:
|
||||
"""Return the set of dotted-key paths of a parsed JSON catalogue.
|
||||
|
||||
Path semantics match
|
||||
``.kiro/specs/i18n-e2e-english-verification/audit/scripts/check_parity.py:flatten``:
|
||||
descend into ``dict`` values only; treat any non-``dict`` value
|
||||
(string, number, bool, ``None``, list) as a leaf and emit its key.
|
||||
Dict-typed parents are not themselves emitted as keys.
|
||||
"""
|
||||
flat: list[tuple[str, object]] = []
|
||||
_flatten("", data, flat)
|
||||
return {key for key, _ in flat}
|
||||
|
||||
|
||||
def _locate_key_line(text_lines: list[str], dotted_key: str) -> int:
|
||||
"""Best-effort 1-based line number for ``dotted_key`` in raw JSON text.
|
||||
|
||||
Searches for the leaf segment of ``dotted_key`` (after the last dot)
|
||||
wrapped in JSON quotes, e.g. ``"missingKey"``. Returns the line of
|
||||
the first match, or ``1`` when no match is found. The dotted key
|
||||
itself remains the source of truth in the failure message; the line
|
||||
number is a navigation aid only.
|
||||
"""
|
||||
leaf = dotted_key.rsplit(".", 1)[-1]
|
||||
needle = f'"{leaf}"'
|
||||
for index, line in enumerate(text_lines, start=1):
|
||||
if needle in line:
|
||||
return index
|
||||
return 1
|
||||
|
||||
|
||||
def _format_parity_finding(
|
||||
file_rel_path: str,
|
||||
line_no: int,
|
||||
dotted_key: str,
|
||||
side: ParitySide,
|
||||
) -> str:
|
||||
"""Format one parity-failure line.
|
||||
|
||||
Layout: ``<file>:<line>: parity-<side>: <dotted-key>``. Side is
|
||||
constrained to ``"en-only"`` / ``"zh-only"`` to keep the failure
|
||||
category greppable across CI logs.
|
||||
"""
|
||||
return f"{file_rel_path}:{line_no}: parity-{side}: {dotted_key}"
|
||||
|
||||
|
||||
def _safe_load_catalogue(
|
||||
path: Path,
|
||||
rel_path: str,
|
||||
failure_lines: list[str],
|
||||
) -> dict[str, object] | None:
|
||||
"""Load a locale catalogue or append a parity-error line and return ``None``.
|
||||
|
||||
Catches missing-file and malformed-JSON errors so the guard reports
|
||||
a clean stderr line rather than crashing CI with a Python traceback.
|
||||
"""
|
||||
try:
|
||||
raw = path.read_text(encoding="utf-8")
|
||||
except (FileNotFoundError, OSError) as exc:
|
||||
failure_lines.append(
|
||||
f"{rel_path}: parity-error: cannot read ({exc.__class__.__name__})"
|
||||
)
|
||||
return None
|
||||
try:
|
||||
data = json.loads(raw)
|
||||
except json.JSONDecodeError as exc:
|
||||
failure_lines.append(
|
||||
f"{rel_path}: parity-error: invalid JSON: {exc.msg}"
|
||||
)
|
||||
return None
|
||||
if not isinstance(data, dict):
|
||||
failure_lines.append(
|
||||
f"{rel_path}: parity-error: top-level value is not an object"
|
||||
)
|
||||
return None
|
||||
return data
|
||||
|
||||
|
||||
def run_parity_check(repo_root: Path) -> ParityResult:
|
||||
"""Compute locale-key parity between ``en.json`` and ``zh.json``.
|
||||
|
||||
Reads both catalogues from ``repo_root``, flattens each into a
|
||||
dotted-key set, and computes the symmetric difference. On match
|
||||
the result carries an ``OK locale-parity: <count> keys per side``
|
||||
success summary. On mismatch the result carries one
|
||||
``parity-en-only`` line per en-only key (lex-sorted), then one
|
||||
``parity-zh-only`` line per zh-only key (lex-sorted), then a final
|
||||
``parity: en-only=<n>, zh-only=<m>`` summary line.
|
||||
|
||||
Missing or malformed catalogues are surfaced as a single
|
||||
``parity-error`` line per offending file and yield a non-passing
|
||||
result without raising.
|
||||
"""
|
||||
en_path = repo_root / EN_JSON_REL_PATH
|
||||
zh_path = repo_root / ZH_JSON_REL_PATH
|
||||
failure_lines: list[str] = []
|
||||
en_data = _safe_load_catalogue(en_path, EN_JSON_REL_PATH, failure_lines)
|
||||
zh_data = _safe_load_catalogue(zh_path, ZH_JSON_REL_PATH, failure_lines)
|
||||
if en_data is None or zh_data is None:
|
||||
return ParityResult(
|
||||
passed=False,
|
||||
failure_lines=failure_lines,
|
||||
success_summary=None,
|
||||
)
|
||||
|
||||
en_keys = _flatten_keys(en_data)
|
||||
zh_keys = _flatten_keys(zh_data)
|
||||
en_only = sorted(en_keys - zh_keys)
|
||||
zh_only = sorted(zh_keys - en_keys)
|
||||
|
||||
if not en_only and not zh_only:
|
||||
return ParityResult(
|
||||
passed=True,
|
||||
failure_lines=[],
|
||||
success_summary=(
|
||||
f"OK locale-parity: {len(en_keys)} keys per side"
|
||||
),
|
||||
)
|
||||
|
||||
en_text_lines = en_path.read_text(encoding="utf-8").splitlines()
|
||||
zh_text_lines = zh_path.read_text(encoding="utf-8").splitlines()
|
||||
for key in en_only:
|
||||
line_no = _locate_key_line(en_text_lines, key)
|
||||
failure_lines.append(
|
||||
_format_parity_finding(
|
||||
EN_JSON_REL_PATH, line_no, key, "en-only"
|
||||
)
|
||||
)
|
||||
for key in zh_only:
|
||||
line_no = _locate_key_line(zh_text_lines, key)
|
||||
failure_lines.append(
|
||||
_format_parity_finding(
|
||||
ZH_JSON_REL_PATH, line_no, key, "zh-only"
|
||||
)
|
||||
)
|
||||
failure_lines.append(
|
||||
f"parity: en-only={len(en_only)}, zh-only={len(zh_only)}"
|
||||
)
|
||||
return ParityResult(
|
||||
passed=False,
|
||||
failure_lines=failure_lines,
|
||||
success_summary=None,
|
||||
)
|
||||
|
||||
|
||||
def run_check(repo_root: Path, baseline_path: Path) -> int:
|
||||
"""Run both guard checks and return the script exit code.
|
||||
|
||||
|
|
@ -292,6 +458,14 @@ def run_check(repo_root: Path, baseline_path: Path) -> int:
|
|||
f"OK per-path counts within baseline ({per_path})"
|
||||
)
|
||||
|
||||
parity_result = run_parity_check(repo_root)
|
||||
if not parity_result.passed:
|
||||
for line in parity_result.failure_lines:
|
||||
print(line, file=sys.stderr)
|
||||
failed = True
|
||||
elif parity_result.success_summary is not None:
|
||||
success_summary.append(parity_result.success_summary)
|
||||
|
||||
if not failed:
|
||||
for line in success_summary:
|
||||
print(line)
|
||||
|
|
@ -323,9 +497,10 @@ def _build_parser() -> argparse.ArgumentParser:
|
|||
parser = argparse.ArgumentParser(
|
||||
prog="i18n_cjk_guard",
|
||||
description=(
|
||||
"PR-time guard: fail when locales/en.json contains CJK or when "
|
||||
"PR-time guard: fail when locales/en.json contains CJK, when "
|
||||
"backend/app + frontend/src CJK match counts exceed the "
|
||||
"committed baseline."
|
||||
"committed baseline, or when locales/en.json and "
|
||||
"locales/zh.json have non-equal flattened-key sets."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
|
|
|
|||
|
|
@ -198,6 +198,7 @@ class RunCheckEndToEndTests(unittest.TestCase):
|
|||
en_json: dict,
|
||||
backend_lines: int,
|
||||
frontend_lines: int,
|
||||
zh_json: dict | None = None,
|
||||
) -> tuple[Path, Path]:
|
||||
repo = _make_repo(tmp)
|
||||
_commit_file(
|
||||
|
|
@ -205,6 +206,12 @@ class RunCheckEndToEndTests(unittest.TestCase):
|
|||
"locales/en.json",
|
||||
json.dumps(en_json, indent=2, ensure_ascii=False),
|
||||
)
|
||||
zh_payload = zh_json if zh_json is not None else en_json
|
||||
_commit_file(
|
||||
repo,
|
||||
"locales/zh.json",
|
||||
json.dumps(zh_payload, indent=2, ensure_ascii=False),
|
||||
)
|
||||
if backend_lines:
|
||||
content = "\n".join(f"# 中{i}" for i in range(backend_lines)) + "\n"
|
||||
_commit_file(repo, "backend/app/x.py", content)
|
||||
|
|
@ -316,6 +323,11 @@ class UpdateBaselineTests(unittest.TestCase):
|
|||
"locales/en.json",
|
||||
json.dumps({"k": "Confirm"}, indent=2),
|
||||
)
|
||||
_commit_file(
|
||||
repo,
|
||||
"locales/zh.json",
|
||||
json.dumps({"k": "Confirm"}, indent=2),
|
||||
)
|
||||
_commit_file(repo, "backend/app/x.py", "# 一\n# 二\n")
|
||||
_commit_file(repo, "frontend/src/.gitkeep", "")
|
||||
baseline_path = repo / "baseline.txt"
|
||||
|
|
@ -354,5 +366,345 @@ class CliSmokeTests(unittest.TestCase):
|
|||
self.assertNotEqual(proc.returncode, 0)
|
||||
|
||||
|
||||
class FlattenKeysTests(unittest.TestCase):
|
||||
"""``_flatten_keys`` returns the dotted-key set of a parsed catalogue."""
|
||||
|
||||
def test_empty_dict_returns_empty_set(self) -> None:
|
||||
self.assertEqual(guard._flatten_keys({}), set())
|
||||
|
||||
def test_flat_dict_returns_top_level_keys(self) -> None:
|
||||
self.assertEqual(
|
||||
guard._flatten_keys({"a": "v", "b": "w"}),
|
||||
{"a", "b"},
|
||||
)
|
||||
|
||||
def test_nested_dict_uses_dot_separator(self) -> None:
|
||||
self.assertEqual(
|
||||
guard._flatten_keys({"a": {"b": {"c": "v"}}}),
|
||||
{"a.b.c"},
|
||||
)
|
||||
|
||||
def test_scalar_leaves_count_as_keys(self) -> None:
|
||||
# Requirement 1.5: scalar leaves (number, bool, null) and string
|
||||
# leaves are treated identically for parity purposes.
|
||||
self.assertEqual(
|
||||
guard._flatten_keys(
|
||||
{
|
||||
"n": 42,
|
||||
"b": True,
|
||||
"s": "x",
|
||||
"z": None,
|
||||
"f": 3.14,
|
||||
}
|
||||
),
|
||||
{"n", "b", "s", "z", "f"},
|
||||
)
|
||||
|
||||
def test_dict_leaf_does_not_become_a_key(self) -> None:
|
||||
# Only non-dict leaves emit keys; the parent path is NOT itself
|
||||
# emitted when it has children.
|
||||
keys = guard._flatten_keys({"parent": {"child": "v"}})
|
||||
self.assertNotIn("parent", keys)
|
||||
self.assertIn("parent.child", keys)
|
||||
|
||||
|
||||
class LocateKeyLineTests(unittest.TestCase):
|
||||
"""``_locate_key_line`` resolves the 1-based line of a dotted key."""
|
||||
|
||||
def test_returns_line_number_of_quoted_leaf_segment(self) -> None:
|
||||
text_lines = [
|
||||
"{",
|
||||
' "a": {',
|
||||
' "missingKey": "v"',
|
||||
" }",
|
||||
"}",
|
||||
]
|
||||
self.assertEqual(
|
||||
guard._locate_key_line(text_lines, "a.missingKey"),
|
||||
3,
|
||||
)
|
||||
|
||||
def test_first_match_wins(self) -> None:
|
||||
text_lines = [
|
||||
"{",
|
||||
' "k": "first"',
|
||||
' "k": "second"',
|
||||
"}",
|
||||
]
|
||||
self.assertEqual(guard._locate_key_line(text_lines, "k"), 2)
|
||||
|
||||
def test_missing_key_falls_back_to_line_one(self) -> None:
|
||||
text_lines = ["{", ' "other": "v"', "}"]
|
||||
self.assertEqual(guard._locate_key_line(text_lines, "absent"), 1)
|
||||
|
||||
|
||||
class FormatParityFindingTests(unittest.TestCase):
|
||||
"""``_format_parity_finding`` produces canonical parity-failure lines."""
|
||||
|
||||
def test_en_only_layout(self) -> None:
|
||||
line = guard._format_parity_finding(
|
||||
"locales/en.json", 17, "common.foo", "en-only"
|
||||
)
|
||||
self.assertEqual(
|
||||
line, "locales/en.json:17: parity-en-only: common.foo"
|
||||
)
|
||||
|
||||
def test_zh_only_layout(self) -> None:
|
||||
line = guard._format_parity_finding(
|
||||
"locales/zh.json", 5, "log.api.bar", "zh-only"
|
||||
)
|
||||
self.assertEqual(
|
||||
line, "locales/zh.json:5: parity-zh-only: log.api.bar"
|
||||
)
|
||||
|
||||
|
||||
class RunParityCheckTests(unittest.TestCase):
|
||||
"""``run_parity_check`` returns a ``ParityResult`` for the live tree."""
|
||||
|
||||
def _write_catalogues(
|
||||
self,
|
||||
repo: Path,
|
||||
en_payload: dict,
|
||||
zh_payload: dict,
|
||||
) -> None:
|
||||
(repo / "locales").mkdir(parents=True, exist_ok=True)
|
||||
(repo / "locales" / "en.json").write_text(
|
||||
json.dumps(en_payload, indent=2, ensure_ascii=False) + "\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
(repo / "locales" / "zh.json").write_text(
|
||||
json.dumps(zh_payload, indent=2, ensure_ascii=False) + "\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def test_passes_when_keys_match(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo = Path(tmp)
|
||||
payload = {"common": {"a": "A", "b": "B"}, "k": "v"}
|
||||
self._write_catalogues(repo, payload, payload)
|
||||
result = guard.run_parity_check(repo)
|
||||
self.assertTrue(result.passed)
|
||||
self.assertEqual(result.failure_lines, [])
|
||||
self.assertIsNotNone(result.success_summary)
|
||||
self.assertIn(
|
||||
"OK locale-parity:", result.success_summary or ""
|
||||
)
|
||||
# Three flattened keys: common.a, common.b, k.
|
||||
self.assertIn("3 keys per side", result.success_summary or "")
|
||||
|
||||
def test_fails_on_en_only_key(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo = Path(tmp)
|
||||
self._write_catalogues(
|
||||
repo,
|
||||
{"k": "v", "extra": "only-en"},
|
||||
{"k": "v"},
|
||||
)
|
||||
result = guard.run_parity_check(repo)
|
||||
self.assertFalse(result.passed)
|
||||
self.assertTrue(
|
||||
any(
|
||||
"parity-en-only: extra" in line
|
||||
for line in result.failure_lines
|
||||
),
|
||||
result.failure_lines,
|
||||
)
|
||||
self.assertEqual(
|
||||
result.failure_lines[-1],
|
||||
"parity: en-only=1, zh-only=0",
|
||||
)
|
||||
self.assertIsNone(result.success_summary)
|
||||
|
||||
def test_fails_on_zh_only_key(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo = Path(tmp)
|
||||
self._write_catalogues(
|
||||
repo,
|
||||
{"k": "v"},
|
||||
{"k": "v", "extra": "only-zh"},
|
||||
)
|
||||
result = guard.run_parity_check(repo)
|
||||
self.assertFalse(result.passed)
|
||||
self.assertTrue(
|
||||
any(
|
||||
"parity-zh-only: extra" in line
|
||||
for line in result.failure_lines
|
||||
),
|
||||
result.failure_lines,
|
||||
)
|
||||
self.assertEqual(
|
||||
result.failure_lines[-1],
|
||||
"parity: en-only=0, zh-only=1",
|
||||
)
|
||||
|
||||
def test_fails_on_two_sided_divergence_with_en_first(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo = Path(tmp)
|
||||
self._write_catalogues(
|
||||
repo,
|
||||
{"a": "v", "z": "v", "shared": "v"},
|
||||
{"b": "v", "y": "v", "shared": "v"},
|
||||
)
|
||||
result = guard.run_parity_check(repo)
|
||||
self.assertFalse(result.passed)
|
||||
categories = [
|
||||
"en-only" if "parity-en-only" in line else
|
||||
"zh-only" if "parity-zh-only" in line else
|
||||
"summary"
|
||||
for line in result.failure_lines
|
||||
]
|
||||
# All en-only lines come before all zh-only lines, and the
|
||||
# summary is last.
|
||||
self.assertEqual(
|
||||
categories,
|
||||
[
|
||||
"en-only", "en-only",
|
||||
"zh-only", "zh-only",
|
||||
"summary",
|
||||
],
|
||||
result.failure_lines,
|
||||
)
|
||||
# Within each side keys appear lexicographically.
|
||||
en_only_lines = [
|
||||
line for line in result.failure_lines
|
||||
if "parity-en-only" in line
|
||||
]
|
||||
zh_only_lines = [
|
||||
line for line in result.failure_lines
|
||||
if "parity-zh-only" in line
|
||||
]
|
||||
self.assertTrue(en_only_lines[0].endswith(": a"))
|
||||
self.assertTrue(en_only_lines[1].endswith(": z"))
|
||||
self.assertTrue(zh_only_lines[0].endswith(": b"))
|
||||
self.assertTrue(zh_only_lines[1].endswith(": y"))
|
||||
self.assertEqual(
|
||||
result.failure_lines[-1],
|
||||
"parity: en-only=2, zh-only=2",
|
||||
)
|
||||
|
||||
def test_passes_with_scalar_leaves_at_same_path(self) -> None:
|
||||
# Requirement 1.5: scalar leaves at the same dotted path on both
|
||||
# sides do not count as a parity divergence.
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo = Path(tmp)
|
||||
self._write_catalogues(
|
||||
repo,
|
||||
{"flag": True, "count": 42, "label": "x", "missing": None},
|
||||
{"flag": False, "count": 7, "label": "y", "missing": None},
|
||||
)
|
||||
result = guard.run_parity_check(repo)
|
||||
self.assertTrue(result.passed)
|
||||
|
||||
def test_missing_zh_catalogue_fails(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo = Path(tmp)
|
||||
(repo / "locales").mkdir(parents=True)
|
||||
(repo / "locales" / "en.json").write_text(
|
||||
'{"k": "v"}\n', encoding="utf-8"
|
||||
)
|
||||
# zh.json deliberately not written.
|
||||
result = guard.run_parity_check(repo)
|
||||
self.assertFalse(result.passed)
|
||||
self.assertTrue(
|
||||
any(
|
||||
"locales/zh.json" in line and "parity-error" in line
|
||||
for line in result.failure_lines
|
||||
),
|
||||
result.failure_lines,
|
||||
)
|
||||
|
||||
|
||||
class RunCheckParityCompositionTests(unittest.TestCase):
|
||||
"""End-to-end: ``run_check`` composes CJK, ratchet, and parity."""
|
||||
|
||||
def _make_repo(
|
||||
self,
|
||||
tmp: Path,
|
||||
*,
|
||||
en_json: dict,
|
||||
zh_json: dict | None = None,
|
||||
backend_lines: int = 0,
|
||||
) -> tuple[Path, Path]:
|
||||
repo = _make_repo(tmp)
|
||||
_commit_file(
|
||||
repo,
|
||||
"locales/en.json",
|
||||
json.dumps(en_json, indent=2, ensure_ascii=False),
|
||||
)
|
||||
zh_payload = zh_json if zh_json is not None else en_json
|
||||
_commit_file(
|
||||
repo,
|
||||
"locales/zh.json",
|
||||
json.dumps(zh_payload, indent=2, ensure_ascii=False),
|
||||
)
|
||||
if backend_lines:
|
||||
content = (
|
||||
"\n".join(f"# 中{i}" for i in range(backend_lines)) + "\n"
|
||||
)
|
||||
_commit_file(repo, "backend/app/x.py", content)
|
||||
else:
|
||||
_commit_file(repo, "backend/app/.gitkeep", "")
|
||||
_commit_file(repo, "frontend/src/.gitkeep", "")
|
||||
baseline_path = repo / "baseline.txt"
|
||||
return repo, baseline_path
|
||||
|
||||
def test_clean_repo_emits_three_success_summaries(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo, baseline_path = self._make_repo(
|
||||
Path(tmp),
|
||||
en_json={"k": "Confirm"},
|
||||
)
|
||||
guard.write_baseline(
|
||||
baseline_path,
|
||||
{"backend/app": 0, "frontend/src": 0},
|
||||
)
|
||||
from io import StringIO
|
||||
|
||||
captured_out = StringIO()
|
||||
old_out = sys.stdout
|
||||
sys.stdout = captured_out
|
||||
try:
|
||||
rc = guard.run_check(repo, baseline_path)
|
||||
finally:
|
||||
sys.stdout = old_out
|
||||
self.assertEqual(rc, 0)
|
||||
stdout = captured_out.getvalue()
|
||||
self.assertIn("OK locales/en.json is CJK-clean", stdout)
|
||||
self.assertIn("OK per-path counts within baseline", stdout)
|
||||
self.assertIn("OK locale-parity:", stdout)
|
||||
|
||||
def test_no_short_circuit_on_combined_failures(self) -> None:
|
||||
# Plant CJK in en.json AND a parity divergence so that BOTH
|
||||
# the existing CJK-clean check and the new parity check fail
|
||||
# in the same run. The orchestrator must run both blocks
|
||||
# without short-circuiting; both failure tokens must surface
|
||||
# in stderr together.
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
repo, baseline_path = self._make_repo(
|
||||
Path(tmp),
|
||||
en_json={"k": "Confirm", "extra": "中文"},
|
||||
zh_json={"k": "Confirm"},
|
||||
)
|
||||
guard.write_baseline(
|
||||
baseline_path,
|
||||
{"backend/app": 0, "frontend/src": 0},
|
||||
)
|
||||
from io import StringIO
|
||||
|
||||
captured_err = StringIO()
|
||||
old_err = sys.stderr
|
||||
sys.stderr = captured_err
|
||||
try:
|
||||
rc = guard.run_check(repo, baseline_path)
|
||||
finally:
|
||||
sys.stderr = old_err
|
||||
self.assertEqual(rc, 1)
|
||||
err = captured_err.getvalue()
|
||||
# Both check categories must surface.
|
||||
self.assertIn("cjk-in-en", err)
|
||||
self.assertIn("parity-en-only: extra", err)
|
||||
self.assertIn("parity: en-only=1, zh-only=0", err)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
|
|
|
|||
Loading…
Reference in New Issue