Requirements Document
Project Description (Input)
Add a permanent CI guard that runs an i18n CJK audit on every pull request.
Linked GitHub issue: #26 (.ticket/26.md).
The guard must fail a PR build when:
- locales/en.json contains any CJK character (range U+4E00..U+9FFF), or
- the total count of CJK matches across backend/app/ and frontend/src/ exceeds a committed baseline value.
Introduction
The i18n initiative has driven the project toward English-by-default UI, logs,
prompts, and documentation. Manual audits (see PR #27, the
i18n-e2e-english-verification spec) have repeatedly surfaced regressions
where Chinese strings re-enter the codebase. This spec installs a permanent,
self-contained CI guard that runs on every pull request and fails the build
when (a) locales/en.json is no longer CJK-clean, or (b) the total CJK match
count under backend/app/ and frontend/src/ regresses against a committed
baseline.
The guard is intentionally minimal: it captures the two highest-signal checks from the larger audit pipeline so it can run on every PR with a sub-minute budget and without depending on the (currently unmerged) verification spec. The committed baseline lets the project ratchet down gaps over time without blocking unrelated PRs on pre-existing CJK content.
Boundary Context
- In scope:
  - A locally runnable Python script that performs both guard checks on the current working tree.
  - A baseline file committed under the spec directory recording the accepted CJK match counts per scoped path.
  - A GitHub Actions workflow that runs the script on every pull request targeting main and fails the build when either check fails.
  - A clear, actionable failure message (which path regressed, baseline value, current value, command to update the baseline).
- Out of scope:
  - The full classification pipeline (classify.py, render_report.py, post_comment.sh) from the unmerged i18n-e2e-english-verification spec; those scripts perform deeper audit work and are not required for the PR-time guard.
  - Auto-updating the baseline on main (the baseline is a normal reviewable file).
  - Translation work itself; this spec only enforces a regression gate.
  - Any change to production source under backend/app/, frontend/src/, or locales/ apart from translations needed to satisfy the guard against its own initial baseline.
- Adjacent expectations:
  - PR #27 (chore/i18n-10-e2e-english-verification) provides the methodology referenced here. This spec must remain functional whether PR #27 has been merged or not.
  - The guard reuses the canonical CJK regex range [一-鿿] already established by that audit.
Requirements
Requirement 1: Locale-catalogue CJK cleanliness check
Objective: As a maintainer of the English locale catalogue, I want every
PR to fail when locales/en.json reintroduces any CJK character, so that the
English catalogue stays CJK-free.
Acceptance Criteria
- When the guard script is run from the repository root, the i18n CI Guard shall scan the contents of locales/en.json for any character in the range U+4E00..U+9FFF.
- If locales/en.json contains at least one such character, the i18n CI Guard shall exit with a non-zero status and report each offending key:line pair on standard output.
- While locales/en.json contains zero such characters, the i18n CI Guard shall report the catalogue as CJK-clean.
- If locales/en.json is missing or unreadable, the i18n CI Guard shall exit with a non-zero status and emit an explicit error message naming the missing file.
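The catalogue check above could be sketched roughly as follows. This is only an illustration under assumptions the spec does not state: the catalogue is taken to be nested JSON objects with string leaves, values are assumed to be stored as raw (unescaped) UTF-8 so a line lookup by substring works, and the names flatten and find_offenders are hypothetical.

```python
import json
import re

# Canonical range from this spec: U+4E00..U+9FFF.
CJK = re.compile(r"[\u4e00-\u9fff]")


def flatten(node, prefix=""):
    """Yield (dotted_key, value) for every string leaf in nested dicts."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, str):
        yield prefix, node


def find_offenders(text):
    """Return (dotted_key, line_number, snippet) for each CJK-bearing value."""
    lines = text.splitlines()
    offenders = []
    for key, value in flatten(json.loads(text)):
        if not CJK.search(value):
            continue
        # Heuristic line lookup: first line containing the raw JSON value.
        # Falls back to 0 when the value is escaped or spans several lines.
        needle = json.dumps(value, ensure_ascii=False)
        line_no = next(
            (i for i, line in enumerate(lines, start=1) if needle in line), 0
        )
        offenders.append((key, line_no, value[:40]))
    return offenders
```

A caller would read locales/en.json, treat an OSError as the "missing or unreadable" case with a non-zero exit, and exit non-zero whenever find_offenders returns a non-empty list.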
Requirement 2: Backend/frontend CJK regression check against committed baseline
Objective: As a maintainer of English support across the codebase, I
want every PR to fail when the total CJK match count under backend/app/
or frontend/src/ exceeds a committed baseline, so that the codebase
ratchets monotonically toward English-only without blocking PRs on
pre-existing CJK content.
Acceptance Criteria
- When the guard script is run, the i18n CI Guard shall count the total number of CJK matches (range U+4E00..U+9FFF, line-level, text files only) under each of the scoped paths backend/app/ and frontend/src/.
- The i18n CI Guard shall read the baseline counts from a single committed baseline file under the spec directory.
- If the current count for any scoped path exceeds the baseline count for that path, the i18n CI Guard shall exit with a non-zero status.
- While the current count for every scoped path is less than or equal to the baseline, the i18n CI Guard shall exit with status zero for this check.
- The i18n CI Guard shall ignore matches inside binary files (image, font, archive, lockfile, or other non-text formats) by relying on git grep -I semantics.
- The i18n CI Guard shall scope its scan to tracked files only (matches in untracked or ignored files shall not contribute to the count).
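One possible shape for the per-path count is sketched below. It is an assumption-laden illustration, not the spec's design: it uses git grep -I only to list tracked, non-binary files (the pattern "." matches any non-empty line) and then counts matching lines in Python, which sidesteps locale-dependent regex behaviour in git. The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

CJK = re.compile(r"[\u4e00-\u9fff]")
SCOPED_PATHS = ("backend/app/", "frontend/src/")


def list_tracked_text_files(scoped_path, repo_root="."):
    """List tracked, non-binary files under scoped_path using `git grep -I`."""
    result = subprocess.run(
        ["git", "grep", "-I", "-l", "-z", "-e", ".", "--", scoped_path],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=False,
    )
    # git grep exits with 1 when nothing matches; treat that as "no files".
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return [name for name in result.stdout.split("\0") if name]


def count_cjk_lines(text):
    """Count lines containing at least one character in U+4E00..U+9FFF."""
    return sum(1 for line in text.splitlines() if CJK.search(line))


def count_path(scoped_path, repo_root="."):
    """Sum line-level CJK matches over tracked text files under scoped_path."""
    total = 0
    for name in list_tracked_text_files(scoped_path, repo_root):
        data = (Path(repo_root) / name).read_text(encoding="utf-8", errors="replace")
        total += count_cjk_lines(data)
    return total
```

Because git grep without extra options searches only tracked files in the working tree, untracked and ignored files never reach the counter. A guard built this way would call count_path once per entry in SCOPED_PATHS and compare each result with the baseline.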
Requirement 3: Actionable failure messaging
Objective: As a contributor whose PR was rejected by the guard, I want the failure message to tell me exactly what regressed and how to fix it, so that I can either translate the offending content or — when intentional — update the baseline through normal review.
Acceptance Criteria
- If the locale-catalogue check fails, the i18n CI Guard shall print, for each offending entry: the dotted catalogue key, the line number in locales/en.json, and a truncated snippet of the value.
- If the regression check fails, the i18n CI Guard shall print, for each regressed scoped path: the path name, the baseline count, the current count, and the delta.
- If the regression check fails, the i18n CI Guard shall print the exact shell command a contributor must run locally to refresh the baseline file so the PR can be re-reviewed against the new value.
- The i18n CI Guard shall print, on success, a one-line summary per check confirming the catalogue is CJK-clean and the per-path counts are at or below baseline.
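As a rough illustration of the regression messaging, the sketch below builds the per-path lines and the refresh hint from two dictionaries of counts. The script name i18n_guard.py, the --refresh-baseline flag, and the exact wording of each line are hypothetical placeholders; the spec only fixes which fields must appear.

```python
# Hypothetical entry point and flag; the spec only requires "a script under
# scripts/ci/" and an explicit refresh subcommand or flag.
REFRESH_COMMAND = "python scripts/ci/i18n_guard.py --refresh-baseline"


def regression_report(baseline, current):
    """Return (ok, lines) comparing per-path counts against the baseline."""
    ok = True
    lines = []
    for path in sorted(baseline):
        base = baseline[path]
        now = current.get(path, 0)
        if now > base:
            ok = False
            lines.append(
                f"FAIL {path}: baseline={base} current={now} delta=+{now - base}"
            )
        else:
            lines.append(f"ok   {path}: {now} <= baseline {base}")
    if not ok:
        lines.append(f"To accept the new counts, run: {REFRESH_COMMAND}")
    return ok, lines
```

For example, a baseline of 10 for backend/app/ and a current count of 12 yields a FAIL line carrying all four required fields, followed by the refresh command.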
Requirement 4: Baseline file lifecycle
Objective: As a reviewer enforcing English support, I want the baseline to live in the repository as a small, human-readable file that only changes through code review, so that downward ratcheting is intentional and auditable.
Acceptance Criteria
- The i18n CI Guard shall store the baseline as a single committed file under .kiro/specs/i18n-ci-guard/.
- The baseline file shall record one count per scoped path, in a stable, diff-friendly text format (no JSON line shuffling, no trailing whitespace).
- When the guard script is invoked with an explicit "refresh baseline" subcommand or flag, the i18n CI Guard shall overwrite the baseline file with the current per-path counts and exit with status zero.
- While no refresh flag is supplied, the i18n CI Guard shall never modify the baseline file.
- If the baseline file is missing at check time, the i18n CI Guard shall exit with a non-zero status and instruct the contributor to refresh it.
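The spec leaves the concrete baseline format open. One diff-friendly option, offered purely as an assumption, is a plain text file with one "path count" pair per line, sorted by path. The helper names and the format itself are hypothetical.

```python
from pathlib import Path


def dump_baseline(counts):
    """Serialise counts as sorted 'path count' lines with no trailing spaces."""
    return "".join(f"{path} {counts[path]}\n" for path in sorted(counts))


def parse_baseline(text):
    """Parse 'path count' lines; blank lines and '#' comments are skipped."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        path, value = line.rsplit(" ", 1)
        counts[path] = int(value)
    return counts


def load_baseline(path):
    """Read the baseline file, exiting non-zero with a hint when it is missing."""
    try:
        return parse_baseline(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        raise SystemExit(
            f"error: baseline file {path} is missing; refresh it to create one"
        )
```

Sorting on write keeps diffs stable regardless of dictionary order, and only the refresh code path would ever call dump_baseline, so a normal check run never touches the file.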
Requirement 5: GitHub Actions PR integration
Objective: As a project maintainer, I want every pull request targeting
main to be gated by the guard, so that no merge silently regresses the
English-only state of the catalogue or codebase.
Acceptance Criteria
- The i18n CI Guard workflow shall trigger on every pull_request event whose base ref is main.
- While the workflow runs, the i18n CI Guard shall check out the PR head commit with full history sufficient for git grep to scan tracked files.
- When the guard script exits with non-zero status, the workflow shall fail and surface the script's standard output and standard error in the GitHub Actions log.
- When the guard script exits with status zero, the workflow shall pass.
- The workflow shall use only Python from the standard actions/setup-python distribution and tools already available on the GitHub-hosted ubuntu-latest runner (bash, git); it shall not install third-party Python packages.
- The workflow shall complete within sixty seconds of wall-clock time on a clean ubuntu-latest runner.
Requirement 6: Local reproducibility
Objective: As a developer preparing a PR, I want to run the same guard locally before pushing, so that I can catch regressions before CI does.
Acceptance Criteria
- When the guard script is invoked from a developer machine that has Python 3.11 or newer and git available, the i18n CI Guard shall produce the same pass/fail result and the same per-path counts that it would produce in CI for the same working tree.
- The i18n CI Guard shall expose a single, stable invocation entry point (a script under scripts/ci/) documented in the spec's design and README touchpoints.
- The i18n CI Guard shall require zero environment variables or secrets to run locally.