---
name: ios-qa
preamble-tier: 3
version: 1.0.0
description: |
  Live-device iOS QA for SwiftUI apps. Connects to a real iPhone via USB
  CoreDevice IPv6 tunnel, reads Swift source to understand every screen, then
  runs a vision-driven agent loop: screenshot → analyze → decide → act →
  verify → repeat. All interaction happens via HTTP to an embedded
  StateServer in the app under test. Optionally exposes the device over
  Tailscale so remote agents (OpenClaw, Codex, any HTTP-capable agent) can
  run iOS QA from anywhere without touching the hardware.
  Use when asked to "ios qa", "test my iPhone app", "find bugs on the device",
  or "qa the iOS app". (gstack)
voice-triggers:
- "iOS quality check"
- "test the iPhone app"
- "run iOS QA"
allowed-tools:
- Bash
- Read
- Write
- Edit
- Grep
- Glob
- AskUserQuestion
triggers:
- ios qa
- test the iphone app
- test my ios app
- find bugs on the device
- qa the ios app
---
{{PREAMBLE}}
# Live-device iOS QA
This skill drives a real iPhone via USB. The agent reads your Swift source,
generates typed state accessors, deploys a debug bridge, and runs a closed
find→fix→verify loop. No simulator, no XCTest, no WebDriverAgent.
## Architecture
```
┌──────────────────────┐   USB CoreDevice (IPv6)   ┌──────────────────┐
│ gstack-ios-qa daemon │ ────────────────────────▶ │ iOS app          │
│ (Mac, bun/TS)        │   bearer + X-Session-Id   │ StateServer      │
│                      │                           │ (loopback only)  │
│ - boot token rotate  │                           │ - /tap /swipe    │
│ - session minting    │                           │ - /type /state   │
│ - audit + redact     │                           │ - /snapshot      │
└──────────────────────┘                           └──────────────────┘
           │ Tailscale (optional, --tailnet)
┌──────────────────────┐
│ Remote agent         │
│ (OpenClaw, etc.)     │
└──────────────────────┘
```
The iOS app's `StateServer` binds loopback only (`::1` + `127.0.0.1`). Tailnet
ingress is exclusively the Mac daemon's job. The daemon validates Tailscale
identities via the local `tailscaled` socket and mints short-lived session
tokens (default 1h) for remote agents.
## Prerequisites
- macOS (the daemon uses `devicectl` from Xcode).
- iPhone connected via USB, paired and trusted.
- Xcode + Swift toolchain installed (`swift --version` reports >= 5.9).
- App source available on disk, with at least one `@Observable` class.
- For remote-control mode: Tailscale installed and the user logged in.
## Phase 0: Session warm-start (optional)
If `~/.gstack/ios-qa-session.json` exists and the device is still connected,
skip Phases 1-2 and jump to Phase 3. The session cache holds the rotated token,
UDID, tunnel address, and accessor hash. Invalidate the cache when:
- The user passes `--cold` to force a full bootstrap.
- An accessor hash mismatch is detected on the first state query.
- The daemon reports that the cached UDID is no longer connected.
```bash
SESSION="$HOME/.gstack/ios-qa-session.json"
# Read one field from the session cache. The path is passed as an argument
# so it is never interpolated into the Python source.
cache_get() { python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))[sys.argv[2]])' "$SESSION" "$1"; }
if [ -f "$SESSION" ] && [ "${COLD:-0}" != "1" ]; then
  CACHED_UDID=$(cache_get udid)
  CACHED_PORT=$(cache_get daemon_port)
  # /healthz only proves the daemon is up; it does not confirm the device.
  if curl -sf "http://127.0.0.1:$CACHED_PORT/healthz" > /dev/null; then
    echo "Warm start: daemon alive on port $CACHED_PORT (cached device $CACHED_UDID)"
  fi
fi
```
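The invalidation rules can also be folded into a single check. The following is a minimal sketch, not the skill's actual logic: it assumes the cache stores the hash under a key named `accessor_hash` (this document names only `udid` and `daemon_port`) and that the caller already has the current hash in hand. `cache_field` and `should_warm_start` are hypothetical helper names.

```bash
# Sketch: decide between warm and cold start from the session cache.
# ASSUMPTION: the cache stores the accessor hash under "accessor_hash";
# only "udid" and "daemon_port" are named in this document.
cache_field() {
  # $1 = cache file, $2 = key; prints the value, or fails if absent.
  python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))[sys.argv[2]])' "$1" "$2" 2>/dev/null
}

should_warm_start() {
  # $1 = cache file, $2 = current accessor hash, $3 = "1" to force cold.
  [ "$3" = "1" ] && return 1                 # --cold forces a full bootstrap
  [ -f "$1" ] || return 1                    # no cache yet
  cached_hash=$(cache_field "$1" accessor_hash) || return 1
  [ "$cached_hash" = "$2" ]                  # a mismatch invalidates the cache
}
```

The third rule (cached UDID no longer connected) needs a daemon query and is left out of this sketch.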
## Phase 1: Read source, plan codegen
1. Walk the app source (passed as `--source <dir>`) and identify all `@Observable`
classes. Note any property marked with the `@Snapshotable` wrapper — those
are the snapshot-eligible fields.
2. Run `swift run --package-path $GSTACK_HOME/ios-qa/scripts/gen-accessors-tool gen-accessors --input <source-dir>`.
First invocation builds the swift-syntax dependency tree (cold: 2-5 min).
Subsequent runs are content-hash-cached and finish in ~50ms.
3. Show the user the accessor list and ask whether to install the DebugBridge
SPM dependency into their `Package.swift` (one AskUserQuestion).
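The discovery and caching in steps 1-2 can be approximated in plain shell for a quick sanity check. This is a rough sketch only: the real tool parses Swift with swift-syntax and keeps its own cache key, so `grep` and SHA-256 here are stand-ins, and the function names are hypothetical.

```bash
# Sketch: list Swift files that mention @Observable and derive a content
# hash over them. ASSUMPTION: grep and SHA-256 stand in for the real
# swift-syntax parse and cache key, which this document does not specify.
observable_files() {
  # $1 = source dir; prints matching .swift paths in stable order.
  grep -rl --include='*.swift' '@Observable' "$1" | LC_ALL=C sort
}

sha256_stdin() {
  # macOS ships shasum; most Linux systems ship sha256sum.
  if command -v sha256sum >/dev/null 2>&1; then sha256sum; else shasum -a 256; fi | cut -d' ' -f1
}

source_hash() {
  # Hash file contents in sorted path order so the result is reproducible.
  observable_files "$1" | while IFS= read -r f; do cat "$f"; done | sha256_stdin
}
```

Comparing `source_hash <source-dir>` between runs shows whether any `@Observable` file changed, which is the condition under which a cached accessor set would need regenerating.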
## Phase 2: Bootstrap the device bridge
1. Add the `DebugBridge` SPM dependency to the app's `Package.swift`. The package
   ships three Debug-config-only library products:
   - `DebugBridgeCore` (Swift, cross-platform): StateServer + bridge protocols.
   - `DebugBridgeTouch` (Objective-C, iOS-only): KIF-derived in-process touch
     synthesis with iOS 18+ `_UIHitTestContext` SwiftUI hit-testing.
   - `DebugBridgeUI` (Swift, iOS-only): Screenshot / Elements / Mutation
     bridge implementations.

   The app target depends on `DebugBridgeUI` with `.when(configuration: .debug)`
   (transitively pulls in Core + Touch). Release builds refuse to link these
   targets.
2. Wire the bridges from the `@main` App init, gated on `#if DEBUG`:
```swift
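// Imports go at file scope; both modules exist only in Debug builds.
#if DEBUG
import DebugBridgeCore
#if canImport(UIKit)
import DebugBridgeUI
#endif
#endif

// Inside the @main App's init():
#if DEBUG
StateServer.shared.start()
#if canImport(UIKit)
DebugBridgeUIWiring.installAll()
#endif
#endif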
```
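3. Build for the device with `xcodebuild -scheme <SchemeName>
-destination 'platform=iOS,id=<UDID>' build`, then install the resulting
`.app` with `devicectl device install app --device <UDID> <path-to-.app>`.
(The `xcodebuild install` action copies into the build's install root; it
does not push the app to the device.)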
4. Launch via `devicectl device process launch --device <UDID> --console <bundle-id>`.
Capture the boot token printed to `os_log` on first run.
5. Spawn the Mac-side daemon, `gstack-ios-qa-daemon`, on demand. The daemon
acquires an exclusive flock on `~/.gstack/ios-qa-daemon.pid`. If another
daemon is already alive, the second invocation discovers its port and
connects to it instead.
6. The daemon immediately calls `POST /auth/rotate` on the iOS StateServer with
a fresh in-memory-only token. The boot token becomes useless ~5s later.
Anything scraping `os_log` past this point sees a dead credential.
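The rotation in step 6 can be pictured as follows. This is a hedged sketch rather than the daemon's implementation (the daemon is bun/TS): the JSON field name `token`, the `STATE_SERVER_URL` variable, and the use of the boot token as the bearer credential are all assumptions, since this document specifies only the `POST /auth/rotate` route.

```bash
# Sketch: rotate the boot token right after launch.
# ASSUMPTIONS: the JSON field "token", STATE_SERVER_URL, and sending the boot
# token as a bearer credential are illustrative; only POST /auth/rotate is
# specified in this document.
new_token() {
  # 32 random bytes as 64 lowercase hex characters; never written to disk.
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

rotate_token() {
  # $1 = boot token captured from the launch console.
  # On success, prints the fresh token for the caller to hold in a variable.
  fresh=$(new_token)
  curl -sf -X POST "$STATE_SERVER_URL/auth/rotate" \
    -H "Authorization: Bearer $1" \
    -H 'Content-Type: application/json' \
    -d "{\"token\":\"$fresh\"}" >/dev/null || return 1
  printf '%s\n' "$fresh"
}
```

Note that `curl` arguments are visible in the process list, which is one reason an in-process client is a better home for this step than a shell script.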
## Phase 3: Vision-driven agent loop
Each iteration:
1. `GET /screenshot` (via daemon) → save PNG.
2. `GET /elements` → accessibility tree.
3. `GET /state/snapshot` (only `@Snapshotable` fields) → current state.
4. Decide next action based on what's on the screen vs the test goal.
5. `POST /session/acquire` to grab the device lock.
6. Execute `POST /tap`, `/swipe`, `/type`, or `POST /state/<key>` write.
7. Re-screenshot; compare; record finding if buggy.
8. `POST /session/release` once the iteration is done.
When remote mode is active, every authenticated mutating request that arrives
through the tailnet listener writes an audit row to
`~/.gstack/security/ios-qa-audit.jsonl`.
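One pass through the loop, written as daemon HTTP calls, might look like the sketch below. `DAEMON_URL`, `TOKEN`, `SESSION_ID`, the JSON body passed to `/tap`, and the `QA_DRY_RUN` switch are assumptions made for illustration; the routes and the bearer + `X-Session-Id` headers come from this document.

```bash
# Sketch: one iteration of the loop as daemon HTTP calls.
# ASSUMPTIONS: DAEMON_URL, TOKEN, SESSION_ID, the /tap body shape, and the
# QA_DRY_RUN switch are illustrative, not part of the documented API.
qa() {
  # $1 = method, $2 = path, $3 = optional JSON body.
  if [ "${QA_DRY_RUN:-0}" = "1" ]; then
    printf '%s %s\n' "$1" "$2"   # print the call instead of sending it
    return 0
  fi
  if [ -n "${3:-}" ]; then
    curl -sf -X "$1" "$DAEMON_URL$2" -H "Authorization: Bearer $TOKEN" \
      -H "X-Session-Id: $SESSION_ID" -H 'Content-Type: application/json' -d "$3"
  else
    curl -sf -X "$1" "$DAEMON_URL$2" -H "Authorization: Bearer $TOKEN" \
      -H "X-Session-Id: $SESSION_ID"
  fi
}

qa_iteration() {
  # $1 = screenshot path ending in .png, $2 = JSON body for the tap.
  qa GET /screenshot > "$1"   || return 1    # 1. capture
  qa GET /elements            || return 1    # 2. accessibility tree
  qa GET /state/snapshot      || return 1    # 3. @Snapshotable fields
  # 4. choosing the action is the agent's job, outside this script
  qa POST /session/acquire    || return 1    # 5. take the device lock
  qa POST /tap "$2"; status=$?               # 6. act
  qa GET /screenshot > "${1%.png}-after.png" # 7. re-screenshot to compare
  qa POST /session/release                   # 8. release even if the tap failed
  return "$status"
}
```

Setting `QA_DRY_RUN=1` prints the call sequence without touching a device, which is handy for checking the ordering of acquire, act, and release.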
## Modes
**Local-USB mode (default).** Daemon binds loopback only; no Tailscale
required. The spawning skill gets full-surface access. Best for solo
development.
**Tailnet mode (`--tailnet`).** Daemon additionally binds the Tailscale
interface (never `0.0.0.0`). Requires `tailscaled` to be running locally and
the daemon to be able to read `/var/run/tailscale.sock`. Fails closed if the
socket is missing, permission-denied, or returns an unparseable WhoIs
response. Remote agents call `POST /auth/mint` over the tailnet; the daemon
canonicalizes the caller's identity via WhoIs, checks the allowlist file, and
mints a session token. See `ios-qa/docs/tailscale-acl-example.md`.
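The fail-closed rule for the socket can be illustrated with a small preflight. This is a sketch of the idea only: it covers the "missing or permission-denied" cases and not the WhoIs parsing, and `tailnet_preflight` is a hypothetical name.

```bash
# Sketch: fail-closed preflight for --tailnet.
# ASSUMPTION: the real daemon also validates the WhoIs response; this only
# covers the "socket missing or unreadable" half of the rule.
tailnet_preflight() {
  # $1 = socket path (defaults to the path named in this document).
  sock="${1:-/var/run/tailscale.sock}"
  if [ ! -S "$sock" ]; then
    echo "tailnet: $sock is not a socket; refusing to bind" >&2
    return 1
  fi
  if [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
    echo "tailnet: no permission on $sock; refusing to bind" >&2
    return 1
  fi
}
```

The point of the shape is that every failure path returns non-zero before any listener is opened, so a misconfigured host never ends up with an unauthenticated tailnet port.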
**Capability tiers (tailnet mode).** Minted tokens default to `interact`
(tap, swipe, type). Higher tiers require an explicit owner mint:
- **observe:** `/screenshot`, `/elements`, `GET /state/*`, `/healthz`,
`/session/heartbeat`.
- **interact:** observe + `/tap`, `/swipe`, `/type`.
- **mutate:** interact + `POST /state/<key>`.
- **restore:** mutate + `POST /state/restore`.
Owner mints via `gstack-ios-qa-mint --remote <identity> --capability <tier>`
on the Mac. Self-service mint over tailnet only succeeds for already-allowlisted
identities.
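Because each tier is a superset of the one before it, the check reduces to comparing ranks. A minimal sketch, assuming GET for the read routes and POST for the action routes as used in Phase 3; the function names are hypothetical and the route lists are copied from the tiers above.

```bash
# Sketch: map a capability tier to the routes it may call.
# ASSUMPTION: function names are illustrative; /healthz and
# /session/heartbeat are matched for any method because this document
# does not state theirs.
tier_rank() {
  case "$1" in
    observe) echo 1 ;; interact) echo 2 ;; mutate) echo 3 ;; restore) echo 4 ;;
    *) return 1 ;;
  esac
}

required_tier() {
  # $1 = HTTP method, $2 = path; prints the lowest tier that may call it.
  case "$1 $2" in
    "POST /state/restore")                           echo restore ;;
    "POST /state/"*)                                 echo mutate ;;
    "POST /tap"|"POST /swipe"|"POST /type")          echo interact ;;
    "GET /screenshot"|"GET /elements"|"GET /state/"*) echo observe ;;
    *" /healthz"|*" /session/heartbeat")             echo observe ;;
    *) return 1 ;;                                   # unknown route: deny
  esac
}

tier_allows() {
  # $1 = token tier, $2 = method, $3 = path.
  need=$(required_tier "$2" "$3") || return 1
  have=$(tier_rank "$1") || return 1
  [ "$have" -ge "$(tier_rank "$need")" ]
}
```

For example, `tier_allows interact POST /state/counter` fails while `tier_allows mutate POST /state/counter` succeeds. Routes the tier list does not mention, such as `/session/acquire`, are denied by this sketch; how the daemon classifies them is not stated here.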
**Recording mode (`--recording`).** DebugOverlay renders a small diagonal
"AGENT DEMO" watermark in a corner so screencasts are unambiguous about the
device being agent-driven.
## Demo mode
If the user says "demo", "demo mode", "show me", or "I want to see it
working", run in **DEMO MODE**. This changes how the agent interacts with
the app:
**DEMO MODE OVERRIDES ALL OTHER RULES.** When demo mode is active, the
agent MUST drive every action through visible UI (`/tap`, `/swipe`, `/type`)
and NEVER use `POST /state/*` writes to skip steps. Viewers see the agent
type every key, tap every button. The on-device DebugOverlay attribution
chip shows "Driven by Claude Code (demo)" or the remote agent identity.
In demo mode, the screenshot capture rate is raised to 4 fps so the recording
feels live.
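The demo-mode rule amounts to a guard in front of every outgoing call. A small sketch, assuming a hypothetical `DEMO_MODE` variable (the skill itself enters demo mode from the user's wording, not from an environment variable):

```bash
# Sketch: refuse state writes while demo mode is active.
# ASSUMPTION: DEMO_MODE and demo_allows are hypothetical names.
demo_allows() {
  # $1 = HTTP method, $2 = path. Succeeds if the call is permitted.
  if [ "${DEMO_MODE:-0}" = "1" ]; then
    case "$1 $2" in
      "POST /state/"*) return 1 ;;   # no shortcuts: drive the visible UI
    esac
  fi
  return 0
}
```

Reads such as `GET /state/snapshot` stay allowed, since the rule only forbids using writes to skip steps.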
## Failure modes + recovery
| Symptom | Likely cause | Action |
|---|---|---|
| `curl: connection refused` to daemon | daemon crashed | Re-run `/ios-qa`; the spawn-race lock fails closed |
| `403 identity_not_allowed` from `/auth/mint` | identity missing from allowlist | Run `gstack-ios-qa-mint --remote <identity>` on the Mac |
| `409 schema_mismatch` on `/state/restore` | snapshot from older app build | Discard the snapshot; re-capture |
| `503 device_disconnected` from proxy | USB tunnel dropped | Reconnect device; daemon auto-reconnects within 30s |
| `429 rate_limited` from `/auth/mint` | >10 mints/min from one identity | Wait 60s; check audit log for anomalies |
| `413 body_too_large` on `/state/restore` | snapshot >1MB | Increase `--max-body` or trim snapshot |
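The table can be turned into a lookup for scripted runs. This sketch assumes the daemon returns the error name (for example `rate_limited`) somewhere the caller can read it, which this document implies but does not spell out; `recovery_for` is a hypothetical helper.

```bash
# Sketch: turn a failed call into the recovery action from the table above.
# ASSUMPTION: the error name is available to the caller alongside the status;
# the function name and argument shape are illustrative.
recovery_for() {
  # $1 = HTTP status ("000" when curl could not connect), $2 = error name.
  case "$1:$2" in
    000:*)                     echo "re-run /ios-qa to respawn the daemon" ;;
    403:identity_not_allowed)  echo "run gstack-ios-qa-mint --remote <identity> on the Mac" ;;
    409:schema_mismatch)       echo "discard the snapshot and re-capture" ;;
    503:device_disconnected)   echo "reconnect the device; daemon retries within 30s" ;;
    429:rate_limited)          echo "wait 60s, then check the audit log" ;;
    413:body_too_large)        echo "raise --max-body or trim the snapshot" ;;
    *)                         echo "unrecognized failure: $1 $2"; return 1 ;;
  esac
}
```

The status can be captured with `curl -s -o /dev/null -w '%{http_code}' <url>`, which prints `000` when the connection itself fails.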
## Cleanup
Use `/ios-clean` to remove the DebugBridge SPM dependency and all `#if DEBUG`
wiring before a Release build. This is a convenience flow; the structural
Release-build guard (Package.swift `.when(configuration: .debug)` + CI
`swift build -c release` check) is the safety-critical path.