# Requirements Document

## Project Description (Input)

Add a permanent CI guard that runs an i18n CJK audit on every pull request.

Linked GitHub issue: #26 (`.ticket/26.md`).

The guard must fail a PR build when:

1. `locales/en.json` contains any CJK character (range U+4E00..U+9FFF), or
2. the total count of CJK matches across `backend/app/` and `frontend/src/` exceeds (regresses against) a committed baseline value.

## Introduction

The i18n initiative has driven the project toward English-by-default UI, logs,
prompts, and documentation. Manual audits (see PR #27, the
`i18n-e2e-english-verification` spec) have repeatedly surfaced regressions
where Chinese strings re-enter the codebase. This spec installs a permanent,
self-contained CI guard that runs on every pull request and fails the build
when (a) `locales/en.json` is no longer CJK-clean, or (b) the total CJK match
count under `backend/app/` or `frontend/src/` rises above its committed
baseline.

The guard is intentionally minimal: it captures the two highest-signal checks
from the larger audit pipeline so it can run on every PR within a sub-minute
budget and without depending on the (currently unmerged) verification spec.
The committed baseline lets the project ratchet the counts down over time
without blocking unrelated PRs on pre-existing CJK content.

## Boundary Context

- **In scope**:
  - A locally runnable Python script that performs both guard checks on the
    current working tree.
  - A baseline file committed under the spec directory recording the
    accepted CJK match counts per scoped path.
  - A GitHub Actions workflow that runs the script on every pull request
    targeting `main` and fails the build when either check fails.
  - A clear, actionable failure message (which path regressed, baseline
    value, current value, command to update the baseline).
- **Out of scope**:
  - The full classification pipeline (`classify.py`, `render_report.py`,
    `post_comment.sh`) from the unmerged `i18n-e2e-english-verification`
    spec; those scripts perform deeper audit work and are not required
    for the PR-time guard.
  - Auto-updating the baseline on `main` (the baseline is a normal
    reviewable file).
  - Translation work itself; this spec only enforces a regression gate.
  - Any change to production source under `backend/app/`, `frontend/src/`,
    or `locales/` apart from translations needed to satisfy the guard
    against its own initial baseline.
- **Adjacent expectations**:
  - PR #27 (`chore/i18n-10-e2e-english-verification`) provides the
    methodology referenced here. This spec must remain functional whether
    or not PR #27 has been merged.
  - The guard reuses the canonical CJK regex range `[一-鿿]`
    (U+4E00..U+9FFF) already established by that audit.
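
The literal class `[一-鿿]` is the same range written with escapes: 一 is U+4E00 and 鿿 is U+9FFF (the CJK Unified Ideographs block). The minimal Python sketch below (the `has_cjk` helper name is illustrative, not part of the spec) shows what the range does and does not cover. Japanese kana, Hangul, CJK punctuation such as `，` and `。`, and CJK Extension A (U+3400..U+4DBF) all fall outside it, so the guard will not flag them.

```python
import re

# Escaped equivalent of the audit's literal class [一-鿿]:
# U+4E00..U+9FFF, the CJK Unified Ideographs block.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def has_cjk(text: str) -> bool:
    """Return True if text contains at least one character in U+4E00..U+9FFF."""
    return CJK_RE.search(text) is not None


print(has_cjk("Save file"))   # False
print(has_cjk("保存文件"))     # True
print(has_cjk("こんにちは"))   # False: hiragana is outside the range
print(has_cjk("，。"))         # False: CJK punctuation is outside the range
```
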

## Requirements

### Requirement 1: Locale-catalogue CJK cleanliness check

**Objective:** As a maintainer of the English locale catalogue, I want every
PR to fail when `locales/en.json` reintroduces any CJK character, so that the
English catalogue stays CJK-free.

#### Acceptance Criteria

1. When the guard script is run from the repository root, the i18n CI Guard
   shall scan the contents of `locales/en.json` for any character in the
   range `U+4E00..U+9FFF`.
2. If `locales/en.json` contains at least one such character, the i18n CI
   Guard shall exit with a non-zero status and report each offending
   `key:line` pair on standard output.
3. While `locales/en.json` contains zero such characters, the i18n CI Guard
   shall report the catalogue as CJK-clean.
4. If `locales/en.json` is missing or unreadable, the i18n CI Guard shall
   exit with a non-zero status and emit an explicit error message naming
   the file.
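
One way to meet these criteria, sketched in Python under stated assumptions: the catalogue is parsed with the standard `json` module, nested keys are flattened into dotted keys, and each offending entry is located in the raw text by searching for its JSON-encoded value. Because the check runs on parsed keys and string values, CJK written as `\uXXXX` escapes is caught too. The function names, the 40-character snippet width, and treating malformed JSON as "unreadable" are illustrative choices, not requirements. The line lookup assumes values are serialized the way `json.dumps(..., ensure_ascii=False)` writes them and reports line 0 when it cannot locate the entry.

```python
import json
import re
import sys
from pathlib import Path
from typing import Iterator

CJK_RE = re.compile(r"[\u4e00-\u9fff]")
SNIPPET_WIDTH = 40  # illustrative truncation width for reported values


def flatten(node: object, prefix: str = "") -> Iterator[tuple[str, object]]:
    """Yield (dotted_key, leaf_value) pairs from a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}.{index}" if prefix else str(index))
    else:
        yield prefix, node


def find_offenders(path: Path) -> list[tuple[str, int, str]]:
    """Return (dotted_key, line_number, snippet) for each entry containing CJK.

    Raises OSError or ValueError (including json.JSONDecodeError and
    UnicodeDecodeError) if the file cannot be read or parsed.
    """
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines()
    offenders = []
    for key, value in flatten(json.loads(text)):
        value_text = value if isinstance(value, str) else ""
        if not (CJK_RE.search(key) or CJK_RE.search(value_text)):
            continue
        # Locate the entry by its encoded value, or by its last key segment
        # when only the key contains CJK. Line 0 means "not located".
        target = value_text if CJK_RE.search(value_text) else key.rsplit(".", 1)[-1]
        needle = json.dumps(target, ensure_ascii=False)
        line_no = next((n for n, line in enumerate(lines, start=1) if needle in line), 0)
        offenders.append((key, line_no, (value_text or key)[:SNIPPET_WIDTH]))
    return offenders


def check_catalogue(path: Path) -> int:
    """Print findings and return the exit status for the catalogue check."""
    try:
        offenders = find_offenders(path)
    except (OSError, ValueError) as exc:  # missing, unreadable, or malformed file
        print(f"error: cannot read {path}: {exc}", file=sys.stderr)
        return 1
    for key, line_no, snippet in offenders:
        print(f"{key}:{line_no}: {snippet}")
    if offenders:
        return 1
    print(f"OK: {path} is CJK-clean")
    return 0
```

Called as `sys.exit(check_catalogue(Path("locales/en.json")))` from the repository root, this prints one `key:line: snippet` row per offending entry and exits non-zero.
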

### Requirement 2: Backend/frontend CJK regression check against committed baseline

**Objective:** As a maintainer of English support across the codebase, I
want every PR to fail when the total CJK match count under `backend/app/`
or `frontend/src/` exceeds a committed baseline, so that the codebase
ratchets monotonically toward English-only without blocking PRs on
pre-existing CJK content.

#### Acceptance Criteria

1. When the guard script is run, the i18n CI Guard shall count the total
   number of CJK matches (range `U+4E00..U+9FFF`, counted per matching
   line, text files only) under each of the scoped paths `backend/app/`
   and `frontend/src/`.
2. The i18n CI Guard shall read the baseline counts from a single
   committed baseline file under the spec directory.
3. If the current count for any scoped path exceeds the baseline count for
   that path, the i18n CI Guard shall exit with a non-zero status.
4. While the current count for every scoped path is less than or equal to
   its baseline, the i18n CI Guard shall exit with status zero for this
   check.
5. The i18n CI Guard shall ignore matches inside binary files (images,
   fonts, archives, or other non-text formats) by relying on `git grep -I`
   semantics.
6. The i18n CI Guard shall scope its scan to tracked files only (matches
   in untracked or ignored files shall not contribute to the count).
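
A sketch of the counting step in Python. As a swapped-in technique, it does not shell out to `git grep -I`: it lists tracked files with `git ls-files -z` and reimplements git's content heuristic in Python (git treats content as binary when a NUL byte appears in its first 8000 bytes), which keeps the CJK range match inside Python's `re` instead of depending on how the installed git build and locale handle non-ASCII regex ranges. It only approximates `-I` semantics; for example, it ignores `.gitattributes`-based binary overrides. Each line containing at least one CJK character counts once, mirroring `git grep -c`. The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

CJK_RE = re.compile(r"[\u4e00-\u9fff]")
BINARY_SNIFF_BYTES = 8000  # same window git uses for its NUL-byte binary check


def count_matching_lines(data: bytes) -> int:
    """Count lines with at least one CJK character; binary content counts as 0."""
    if b"\0" in data[:BINARY_SNIFF_BYTES]:
        return 0
    text = data.decode("utf-8", errors="replace")
    # Split on "\n" only, as git does, rather than str.splitlines().
    return sum(1 for line in text.split("\n") if CJK_RE.search(line))


def tracked_files(scope: str, repo_root: Path) -> list[Path]:
    """List files tracked by git under scope; untracked and ignored files are excluded."""
    result = subprocess.run(
        ["git", "ls-files", "-z", "--", scope],
        cwd=repo_root, check=True, capture_output=True,
    )
    names = result.stdout.decode("utf-8", errors="surrogateescape").split("\0")
    return [repo_root / name for name in names if name]


def count_scope(scope: str, repo_root: Path = Path(".")) -> int:
    """Total line-level CJK matches across tracked text files under scope."""
    return sum(
        count_matching_lines(path.read_bytes())
        for path in tracked_files(scope, repo_root)
        if path.is_file()  # skip tracked files deleted from the working tree
    )
```

Run from the repository root, `{scope: count_scope(scope) for scope in ("backend/app/", "frontend/src/")}` would produce the per-path counts that the baseline comparison needs.
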

### Requirement 3: Actionable failure messaging

**Objective:** As a contributor whose PR was rejected by the guard, I want
the failure message to tell me exactly what regressed and how to fix it,
so that I can either translate the offending content or, when the increase
is intentional, update the baseline through normal review.

#### Acceptance Criteria

1. If the locale-catalogue check fails, the i18n CI Guard shall print, for
   each offending entry: the dotted catalogue key, the line number in
   `locales/en.json`, and a truncated snippet of the value.
2. If the regression check fails, the i18n CI Guard shall print, for each
   regressed scoped path: the path name, the baseline count, the current
   count, and the delta.
3. If the regression check fails, the i18n CI Guard shall print the exact
   shell command a contributor must run locally to refresh the baseline
   file so the PR can be re-reviewed against the new value.
4. The i18n CI Guard shall print, on success, a one-line summary per check
   confirming the catalogue is CJK-clean and the per-path counts are at or
   below baseline.
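
A minimal sketch of the comparison and its messages in Python. The refresh command string, the output wording, and treating a scoped path that is absent from the baseline as a baseline of 0 are assumptions for illustration; the spec fixes only that the script lives under `scripts/ci/`.

```python
from typing import Mapping

# Hypothetical invocation; the actual script name and flag are not fixed by the spec.
REFRESH_COMMAND = "python scripts/ci/i18n_guard.py --refresh-baseline"


def find_regressions(
    baseline: Mapping[str, int], current: Mapping[str, int]
) -> dict[str, tuple[int, int]]:
    """Map each scoped path whose count rose above its baseline to (baseline, current)."""
    return {
        path: (baseline.get(path, 0), count)
        for path, count in sorted(current.items())
        if count > baseline.get(path, 0)  # a path missing from the baseline counts as 0
    }


def regression_report(
    baseline: Mapping[str, int], current: Mapping[str, int]
) -> tuple[int, list[str]]:
    """Return (exit_status, output_lines) for the regression check."""
    regressions = find_regressions(baseline, current)
    if not regressions:
        summary = ", ".join(
            f"{path} {count}/{baseline.get(path, 0)}" for path, count in sorted(current.items())
        )
        return 0, [f"OK: CJK counts at or below baseline (current/baseline: {summary})"]
    lines = [
        f"REGRESSED {path}: baseline={base} current={now} delta=+{now - base}"
        for path, (base, now) in regressions.items()
    ]
    lines.append(f"If the increase is intentional, refresh the baseline and commit it: {REFRESH_COMMAND}")
    return 1, lines
```

Counts that drop below the baseline still pass; only an increase fails, which is what lets the baseline ratchet down through later refreshes.
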

### Requirement 4: Baseline file lifecycle

**Objective:** As a reviewer enforcing English support, I want the baseline
to live in the repository as a small, human-readable file that only changes
through code review, so that downward ratcheting is intentional and
auditable.

#### Acceptance Criteria

1. The i18n CI Guard shall store the baseline as a single committed file
   under `.kiro/specs/i18n-ci-guard/`.
2. The baseline file shall record one count per scoped path in a stable,
   diff-friendly plain-text format (entries in a fixed order so that
   regenerating the file never reorders lines, and no trailing whitespace).
3. When the guard script is invoked with an explicit "refresh baseline"
   subcommand or flag, the i18n CI Guard shall overwrite the baseline file
   with the current per-path counts and exit with status zero.
4. While no refresh flag is supplied, the i18n CI Guard shall never modify
   the baseline file.
5. If the baseline file is missing at check time, the i18n CI Guard shall
   exit with a non-zero status and instruct the contributor to refresh it.
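
A possible baseline format, sketched in Python: one `scope=count` line per scoped path, sorted by path and newline-terminated, so a refresh with unchanged counts rewrites the file byte-for-byte identically and a changed count shows up as a one-line diff. The file name `cjk-baseline.txt`, the comment syntax, and the `--refresh-baseline` flag named in the error message are hypothetical; the spec fixes only the directory.

```python
import sys
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical file name; the spec only fixes the directory.
BASELINE_PATH = Path(".kiro/specs/i18n-ci-guard/cjk-baseline.txt")


def write_baseline(counts: Mapping[str, int], path: Path = BASELINE_PATH) -> None:
    """Write one 'scope=count' line per scoped path, sorted for stable diffs."""
    body = "".join(f"{scope}={counts[scope]}\n" for scope in sorted(counts))
    path.write_text(body, encoding="utf-8")


def read_baseline(path: Path = BASELINE_PATH) -> dict[str, int]:
    """Parse the baseline; raises FileNotFoundError if missing, ValueError if malformed."""
    counts: dict[str, int] = {}
    for line_no, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # allow blank lines and comments
        scope, sep, value = line.partition("=")
        if not sep or not value.isdigit():
            raise ValueError(f"{path}:{line_no}: expected 'scope=count', got {line!r}")
        counts[scope] = int(value)
    return counts


def load_baseline_for_check(path: Path = BASELINE_PATH) -> Optional[dict[str, int]]:
    """Return the baseline, or None after telling the contributor how to create it."""
    try:
        return read_baseline(path)
    except FileNotFoundError:
        print(
            f"error: baseline file {path} is missing; "
            "run the guard with --refresh-baseline and commit the result",
            file=sys.stderr,
        )
        return None
```

Only `write_baseline` touches the file, so keeping it reachable solely from the refresh path is what enforces "never modify the baseline without a refresh flag".
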

### Requirement 5: GitHub Actions PR integration

**Objective:** As a project maintainer, I want every pull request targeting
`main` to be gated by the guard, so that no merge silently regresses the
English-only state of the catalogue or codebase.

#### Acceptance Criteria

1. The i18n CI Guard workflow shall trigger on every `pull_request` event
   whose base ref is `main`.
2. While the workflow runs, the i18n CI Guard shall check out the PR head
   commit so that `git grep` can scan its tracked files; a shallow,
   single-commit checkout is sufficient, because `git grep` without a
   revision argument searches the working tree and needs no history.
3. When the guard script exits with non-zero status, the workflow shall
   fail and surface the script's standard output and standard error in the
   GitHub Actions log.
4. When the guard script exits with status zero, the workflow shall pass.
5. The workflow shall use only the Python interpreter installed by
   `actions/setup-python` and tools already available on the GitHub-hosted
   `ubuntu-latest` runner (`bash`, `git`); it shall not install third-party
   Python packages.
6. The workflow shall complete within sixty seconds of wall-clock time on
   a clean `ubuntu-latest` runner.

### Requirement 6: Local reproducibility

**Objective:** As a developer preparing a PR, I want to run the same guard
locally before pushing, so that I can catch regressions before CI does.

#### Acceptance Criteria

1. When the guard script is invoked from a developer machine that has
   Python 3.11 or newer and `git` available, the i18n CI Guard shall
   produce the same pass/fail result and the same per-path counts that
   it would produce in CI for the same working tree.
2. The i18n CI Guard shall expose a single, stable invocation entry point
   (a script under `scripts/ci/`) documented in the spec's design document
   and in the relevant README sections.
3. The i18n CI Guard shall require zero environment variables or secrets
   to run locally.
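
A minimal sketch of a single entry point in Python, using only the standard library (`argparse`) and reading no environment variables. The `--refresh-baseline` flag, the program name, and injecting the checks and the refresh action as callables are assumptions made so the dispatch logic can be shown on its own; a real script under `scripts/ci/` would wire in its own catalogue and regression checks. Every check runs even after an earlier one fails, so a single CI log lists all failures.

```python
import argparse
from typing import Callable, Optional, Sequence

Check = Callable[[], bool]  # returns True when the check passes


def main(
    argv: Optional[Sequence[str]],
    checks: Sequence[Check],
    refresh: Callable[[], None],
) -> int:
    """Parse arguments, then either refresh the baseline or run every check."""
    parser = argparse.ArgumentParser(prog="i18n_guard", description="i18n CJK CI guard")
    parser.add_argument(
        "--refresh-baseline",
        action="store_true",
        help="rewrite the baseline file with the current per-path counts and exit 0",
    )
    args = parser.parse_args(argv)
    if args.refresh_baseline:
        refresh()  # the only code path allowed to modify the baseline file
        return 0
    # Build the full list first so a failing check does not skip later ones.
    results = [check() for check in checks]
    return 0 if all(results) else 1
```

A script would end with `sys.exit(main(None, checks=[...], refresh=...))`; passing `None` makes `argparse` read `sys.argv[1:]`, and the returned status is what the GitHub Actions step uses to pass or fail, so local and CI runs share one code path.
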