mirror of https://github.com/garrytan/gstack.git
2 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
c7ae63201a
|
v1.58.1.0 feat: hermetic local E2E + Conductor prose AskUserQuestion (#2004)
* feat: add shared call-time isConductor() helper
Single source of truth for Conductor host detection in TS consumers
(CONDUCTOR_WORKSPACE_PATH / CONDUCTOR_PORT). Reads the passed env at
call time, not a module-load snapshot, so unit tests can pin the env
inline without Bun --preload (esm-hoist-breaks-env-pin-bootstrap).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test: harden question-preference-hook harness against ambient Conductor env
runHook copied all of process.env into the hook subprocess, so running the
suite inside Conductor (CONDUCTOR_WORKSPACE_PATH/PORT set) would leak those
markers. Strip them so the existing cases deterministically characterize
NON-Conductor behavior before the Conductor branch lands. Baseline: 15 pass.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: PreToolUse hook denies AskUserQuestion in Conductor, redirects to prose
Conductor disables native AskUserQuestion and routes through a flaky MCP
variant that returns '[Tool result missing due to internal error]'. The
hook now denies any AUQ call in a Conductor session and instructs the model
to render a prose decision brief instead (transport avoidance, not preference
enforcement) — firing for one-way doors too, with a typed-confirmation
requirement for destructive paths.
Precedence: never-ask auto-decide still wins (user already settled those);
Conductor prose is the fallback for everything else; non-Conductor behavior
is byte-for-byte unchanged. Restructured the per-question loop to compute
eligibility without early-returning so the Conductor branch can run as the
fallback while preserving memoryContext on every exit.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: Conductor renders AskUserQuestion decisions as prose by default
In Conductor, native AskUserQuestion is disabled and the MCP variant is
flaky, so skills now render every decision as a plain-text prose brief the
user answers by typing a letter — proactively, not as a failure reaction.
- Preamble emits CONDUCTOR_SESSION, gated on != headless so eval/CI inside
Conductor still BLOCKs instead of rendering prose to nobody.
- AskUserQuestion Format gains a Conductor-default-prose rule (auto-decide
preferences still apply first; prose decisions log via gstack-question-log
since PostToolUse never fires), a one-way/destructive typed-confirmation
rule, and a typed-reply continuation protocol for split chains.
- Regenerated all SKILL.md + ship golden fixtures; bumped affected carve
skeleton caps to absorb the always-loaded additions.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: deploy the Conductor AskUserQuestion hook (setup + upgrade migration)
The PreToolUse hook only delivers its Conductor-prose guarantee if it's
installed, but setup skips hook registration in non-interactive (conductor/CI)
setups. Two fixes so layer 3 actually deploys:
- setup: treat a Conductor workspace as an implicit opt-in for the PreToolUse
hook on the silent fall-through (never overriding an explicit opt-out).
- migration v1.58.0.0: re-register the hook for existing Conductor installs on
/gstack-upgrade, idempotent and respecting plan_tune_hooks=no.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test: E2E for Conductor prose + fix auto-decide-preserved GSTACK_HOME bug
- New skill-e2e-conductor-prose (periodic): Conductor env + plan-eng-review
surfaces a prose decision brief, not a silent skip. Header documents this is
end-to-end behavior coverage; the deterministic Conductor guard is the
question-preference-hook unit test (the PTY harness can't register the MCP
variant — Codex #10).
- Fix the pre-existing bug in auto-decide-preserved: it seeded the never-ask
preference under GSTACK_HOME=tmpHome but never passed GSTACK_HOME into the
PTY run, so the spawned claude read the real ~/.gstack and the preference
was inert (Codex #9). Now passes GSTACK_HOME + CONDUCTOR_WORKSPACE_PATH to
prove auto-decide still wins over the Conductor prose redirect.
- Register both in touchfiles (periodic tier).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* v1.58.0.0 feat: Conductor renders AskUserQuestion decisions as prose
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test: strip ambient Conductor env in memory-cache-injection hook harness
Same dev-in-Conductor leak fixed for question-preference-hook: this suite's
runHook copies process.env, so running it inside Conductor flipped the
defer-path memoryContext assertions into the [conductor] prose deny. Strip
CONDUCTOR_* so the cases characterize non-Conductor behavior. (CI is headless,
so this only bit local Conductor runs.)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: gstack-detach — run agent eval/bench jobs in their own session
Long agent-run jobs (30-60 min evals, benchmarks) die when the harness sends
SIGTERM to a background task's process group on turn boundaries / monitor
stops / interruptions (observed: 'script test:gate terminated by signal
SIGTERM'). gstack-detach runs the command in a fresh session (python3
os.setsid, or setsid on Linux, nohup fallback) so a group SIGTERM can't reach
it, and wraps it in caffeinate -i on macOS so idle-sleep can't kill it either.
Returns immediately; caller polls the logfile. Secrets stay in env, never argv.
The guard test pins the contract: the command runs in a different process
group than the caller and outlives the launching shell.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: eval:bg* scripts — detached eval runs for agents
Agent-facing convenience scripts that launch the eval suites through
gstack-detach so a harness SIGTERM can't kill a long run. eval:bg (diff-based),
eval:bg:all, eval:bg:gate, eval:bg:periodic — each returns immediately and
streams to /tmp/gstack-evals.log for polling. The plain test:evals / test:e2e
scripts stay foreground for humans.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* docs: CLAUDE.md — agents must run long evals via gstack-detach
Codifies the detached-execution default: agent-launched eval/benchmark runs go
through bin/gstack-detach (or the eval:bg* scripts) so a harness SIGTERM or
macOS idle-sleep can't kill a 30-60 min run, then poll the log with a
death-aware watcher. Humans keep foreground scripts.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: harden gstack-detach against all four eval-infra killers
The basic bash detach fixed SIGTERM but a real run on a shared dev box hit
three more killers: cross-worktree API saturation (15-way concurrency x a
sibling worktree mass-timed-out the suite), a silent hang (periodic bun died
with no exit marker), and shared-/tmp log contamination (a concurrent
worktree's agent output bled into the log). Rewrite as a portable python3 tool
that bakes in all four fixes:
- fork + setsid: SIGTERM-proof (own session, survives harness polite-quit)
- caffeinate -i on macOS: no idle-sleep death
- --lock NAME (fcntl, machine-wide): concurrent worktrees SERIALIZE instead of
saturating the shared model API
- run-scoped default log (~/.gstack-dev/eval-runs/<label>-<slug>-<branch>-<ts>-<pid>):
no cross-worktree collision/contamination
- --timeout watchdog + a guaranteed '### gstack-detach EXIT=<code> ###' sentinel
on every terminal path: no silent hang, finished-vs-died always detectable
Guard test pins all four: detached pgid differs + outlives launcher, run-scoped
log path, watchdog EXIT=timeout, and lock serialization (second run WAITS).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: eval:bg* use run-scoped logs + machine lock + watchdog
Drop the shared /tmp/gstack-evals.log path (the cross-worktree collision that
contaminated a live run) for gstack-detach's run-scoped default, and add the
machine-wide gstack-evals lock (concurrent worktrees serialize, no API
saturation) plus per-tier watchdog timeouts (60/90/120 min). Each eval:bg*
prints its run-scoped log path to poll.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* docs: wire detached-eval guidance into /ship + correct CLAUDE.md flags
- /ship eval step (sections/tests.md): long eval suites launch via gstack-detach
(own session, machine lock, EXIT sentinel) so a turn boundary can't kill a
30+ min run mid-ship — the exact failure observed during this branch's ship.
- CLAUDE.md: correct the now-stale /tmp reference; document the --lock (serialize
worktrees, no API saturation), --timeout watchdog, run-scoped log, and the
guaranteed EXIT sentinel the poller breaks on.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* refactor: extract pure promotedEnv() from conductor-env-shim
Single source of truth for GSTACK_* key promotion semantics. The ambient
promoteConductorEnv() becomes a wrapper; behavior-preserving. Needed by the
hermetic env builder which must not mutate process.env.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: hermetic child-env builder for E2E runners
Allowlist scrub (basics/network/named-auth kept; CONDUCTOR_*, CLAUDE_*,
GSTACK_*, MCP_*, GBRAIN_*, operator credentials dropped), per-runner
extraAllow, overrides merge last, EVALS_HERMETIC=0 byte-identical escape
hatch read at call time (ESM-hoist safe). Sync memoized singleton temp dirs
(<runRoot>/.claude keeps the extractPlanFilePath contract), seeded
.claude.json for non-interactive first run, pid-aware GC of crashed runs.
19 free unit tests.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: session-runner spawns hermetic children + isolation canaries
claude -p children now get the allowlist-scrubbed env and a gated
--strict-mcp-config (EVALS_HERMETIC=0 restores operator env AND args).
Two gate-tier canaries make the clean room falsifiable: hermetic-canary
asserts env redirect + scrub + zero MCP servers + nonzero API-key cost
from the Bash tool_result (never model prose); hermetic-sentinel plants a
poisoned operator config (user CLAUDE.md + MCP server) and proves the
child cannot see it. Empirically verified on claude 2.1.175: print mode
needs no seed config (the seed serves the PTY path); the child CLI sets
CLAUDECODE for its own tools, so that scrub is pinned in unit tests, not
E2E. hermetic-env.ts joins GLOBAL_TOUCHFILES.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: PTY runner spawns hermetic claude sessions
launchClaudePty children get the allowlist-scrubbed env, a gated
--strict-mcp-config, and the session exposes hermeticConfigDir for
forensics (hermetic plan files live under <dir>/plans/ and still match
extractPlanFilePath via the /.claude dir-name contract). Seeded trust
state covers repo-cwd sessions; the 15s trust-watcher stays as fallback.
Verified foreground via the plan-mode-no-op gate test.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: codex/gemini runners spawn hermetic children
Same allowlist scrub as the claude runners, with each provider's auth
surface re-admitted via extraAllow (codex: OPENAI_API_KEY/CODEX_* plus
its tempHome .codex copy; gemini: GEMINI_*/GOOGLE_* with real HOME for
~/.gemini auth). The gemini spawn previously inherited the full operator
env with no env property at all.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat: agent-sdk-runner spawns hermetic children via complete Options.env
The historical 'env: breaks SDK auth' failure was partial-env replacement:
Options.env replaces the child's entire environment, so objects lacking
ANTHROPIC_API_KEY killed auth. Passing the complete hermetic env (key +
PATH + redirected CLAUDE_CONFIG_DIR/GSTACK_HOME) works — validated live
via query() with a Bash tool call (success, real cost, Conductor vars
scrubbed). Per-test opts.env merges last; ambient key mutation still
works because the builder reads process.env at call time.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test: static tripwire pins hermetic wiring in all five runners
Free-tier invariants: every runner builds child env via hermeticChildEnv,
no raw ...process.env spread at any spawn site, --strict-mcp-config gated
on isHermeticEnabled in both claude runners, and no test callsite passes
the operator env into a runner's override parameter (scoped to runner
calls — unit tests spawning gstack bin scripts directly are exempt).
Mirrors the terminal-agent-pid-identity / server-embedder-terminal-port
tripwire idiom.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test: refresh codex/factory ship goldens with detached-eval block
|
|
|
|
1d9b9c4cfc
|
v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574)
* feat(ios): author 5 iOS device-farm skill templates + generated docs
Authors ios-qa, ios-fix, ios-design-review, ios-clean, ios-sync as upstream gstack skills. Each follows the standard SKILL.md.tmpl pattern with preamble-tier:3 frontmatter. The fork at time-attack/gstack shipped these but as byte-identical .md/.tmpl pairs that wouldn't pass skill-docs.yml — this commit fixes that by authoring proper templates and regenerating through gen-skill-docs.
* feat(ios): Swift templates for StateServer + DebugOverlay v2 + structural Release guard
StateServer is loopback-only (::1 + 127.0.0.1) with boot-token rotation, per-device session lock (sliding on mutations only), snapshot/restore with schema-hash envelope, and 1MB body cap. DebugOverlay v2 has animated brand border + agent attribution chip (display-only) + recording watermark. Package.swift enforces structural Release-build exclusion via .when(configuration: .debug). Includes Tailscale ACL example doc.
* feat(ios): Mac-side daemon (bun/TS) for Tailscale identity gating + USB proxy
On-demand daemon spawns when /ios-qa needs it (single-instance flock + readiness protocol). Owns tailnet ingress: fail-closed tailscaled LocalAPI probe, dual-track /auth/mint (self-service for allowlisted identities, owner-granted via CLI), capability-tier allowlist (observe/interact/mutate/restore), 1h default session TTL (24h hard cap), audit log of every authenticated mutating tailnet request, hashed-identity attempts log. iOS StateServer never directly binds tailnet — identity validation lives Mac-side because iPhones can't reach tailscaled. 67 unit/integration tests covering session-lock concurrency, capability enforcement, fail-closed probe, identity canonicalization, body limits, and boot-token leak proofs.
* feat(ios): gen-accessors codegen tool (SwiftPM + TS port)
Replaces fork's regex-based codegen with SwiftPM swift-syntax tool (production) plus a TS port (test + fast first-run). Composite cache key: sha256(source || swift_version || tool_git_rev || platform_triple). Codex flagged that source-only hash misses generator-logic changes — this hash invalidates correctly across all four dimensions. 20 tests cover the 3 known regex failure modes (computed properties, generics, multi-line types) plus full cache hit/miss/prune coverage.
* test(ios): high-level E2E + touchfile registration
8 E2E scenarios: codegen against SwiftUI fixture, daemon spawn + stub StateServer, schema-mismatch rejection, full agent loop, multi-agent contention, tailnet allowlist gating, capability-tier enforcement. Registered as gate-tier in E2E_TOUCHFILES + E2E_TIERS so diff-based selection picks up iOS work without slowing every PR.
* chore: bump version and changelog (v1.40.0.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(ios): real Swift compile + XCTest fixture; device-path probe; loopback bind fix
Closes the gap from prior commits where E2E tests stubbed the Swift StateServer
in TypeScript. Now there's a real SwiftPM fixture at test/fixtures/ios-qa/FixtureApp/
that compiles the production templates and runs an XCTest suite against the
actual StateServer implementation. Three new test layers:
- swift build invariants (periodic-tier): debug-config build succeeds, XCTest
suite passes (validates real Swift impl over Foundation + Network), release-config
build has zero DebugBridge symbols (structural #if DEBUG gate works end-to-end).
- Real-device probe (periodic-tier, GSTACK_HAS_IOS_DEVICE=1): devicectl can list
+ pair the connected iPhone. Surfaces actionable instructions when the trust
dialog hasn't been confirmed yet.
- Fixture sources copied from ios-qa/templates/ — Package.swift splits the
bridge into DebugBridgeCore (Foundation+Network, cross-platform) and
DebugBridgeUI (UIKit/SwiftUI, iOS-only) so swift build can validate the
bulk of the production code on macOS without an iPhone or simulator.
Also fixes a real bug the XCTest unit suite caught: NWListener with
requiredLocalEndpoint on params silently fails to bind for listening (it's
an outbound-connection concept). Replaced with .requiredInterfaceType=.loopback
+ .acceptLocalOnly=true + a per-connection peer-address check. The fork's
inherited code had this bug; we shipped it untouched in v1.41.0.0 and the
new XCTest suite caught it immediately.
* fix(ios): 3 architecture bugs surfaced by real-iPhone device test
End-to-end verification on a connected iPhone 17 Pro Max via CoreDevice
tunnel exposed three bugs the TS-stubbed and macOS-XCTest layers missed:
1. acceptLocalOnly=true was too tight. Network.framework's "local" gate
only allows ::1 / 127.0.0.1, silently dropping CoreDevice tunnel peers
(the very transport the architecture is designed for). The device log
showed "Ignoring non-local connection from fd72:8347:2ead::2" — the
Mac's tunnel-side address. Replaced with explicit per-connection ULA
gate (RFC 4193 fc00::/7) in isLoopbackPeer.
2. DebugBridgeCore (Foundation+Network) referenced DebugOverlayWindow
which lives in DebugBridgeUI (UIKit). Backwards module dep. Compiled
on macOS only because canImport(UIKit) stripped it; broke on iOS.
Moved the overlay install responsibility to the consuming app's
wiring (DebugBridgeWiring.swift.template already shows the pattern).
3. @Observable macro + @Snapshotable property wrapper conflict. Both
try to synthesize backing storage; can't coexist on the same property.
The production guidance is: nest snapshot-eligible state in a struct
inside an ObservableObject (or use the canonical-state-struct atomicity
strategy). Fixture switched to a plain class to demonstrate.
Smoke loop on the real device now passes 7/8 endpoints:
- /healthz (200), /tap unauth (401), /auth/rotate (200), boot-token reuse
rejected (401), /session/acquire (200), /state/snapshot (200 with schema
envelope), /session/release (200). /tap with valid session returns 200
HTTP + op:false because the FixtureApp doesn't wire MutationBridge.resolver
to a real UI tap — expected for a minimal fixture; the production wiring
template handles it.
Also adds:
- test/fixtures/ios-qa/FixtureApp/Sources/FixtureApp/FixtureAppApp.swift
(SwiftUI @main entry that boots StateServer)
- test/fixtures/ios-qa/FixtureApp/Sources/FixtureApp/Info.plist
- test/fixtures/ios-qa/FixtureApp/project.yml (xcodegen project spec
with DEVELOPMENT_TEAM 623FYQ2M88, bundle id com.gstack.iosqa.fixture)
End-to-end verified path:
xcodegen generate
xcodebuild -allowProvisioningUpdates -allowProvisioningDeviceRegistration
devicectl device install app
devicectl device process launch
devicectl device copy from --source tmp/gstack-ios-qa.token
curl -6 http://[<corodevice-ipv6>]:9999/...
* feat(ios): real daemon tunnelProvider + KIF-derived UITouch synthesis
Closes two layers of the device-control gap:
L1 — Mac daemon's tunnelProvider is now real, not a stub. New files:
- ios-qa/daemon/src/devicectl.ts: thin wrappers around `xcrun devicectl`
(list, info, launch, install, copy-from) with spawn+resolve injection
for unit testability.
- ios-qa/daemon/src/tunnel-bootstrap.ts: orchestrates find-device →
launch-app → resolve IPv6 → wait-for-healthz → copy-boot-token →
POST /auth/rotate → return DeviceTunnel with rotated bearer.
- ios-qa/daemon/test/tunnel-bootstrap.test.ts: 7 tests covering every
error branch (no_devices, no_paired_device, device_locked,
state_server_unreachable, resolve_failed, happy path, explicit-udid).
- index.ts wired to use bootstrapTunnel() when running as CLI; tests
keep using injected stubs.
L2 — In-process touch synthesis for non-UIControl widgets. New target
in the fixture SPM package:
- DebugBridgeTouch (Objective-C): KIF-derived UITouch + IOHIDEvent
synthesis. Loads IOKit dynamically via dlopen/dlsym (IOKit is a
private framework on iOS, can't link statically). Uses iOS 18+
_UIHitTestContext for SwiftUI hit-testing. Public Swift-callable
API: DebugBridgeTouch.sendTap(at:in:). MIT-attributed to
kif-framework/KIF.
- DebugBridgeUI/Bridges.swift: rewritten MutationBridge.handleTap to
delegate to DebugBridgeTouch. ScreenshotBridge + ElementsBridge
implementations also land here.
- FixtureApp/Sources/FixtureApp/FixtureAppApp.swift: wires the bridges
on app launch under #if DEBUG.
Real-iPhone evidence (Conductor sandbox → CoreDevice IPv6 → live app):
- /healthz returns 200 with on-device JSON body
- /screenshot returns 427KB PNG that decodes to your actual phone screen
- Boot-token rotation kills the original token (401 boot_token_invalid
on reuse — the load-bearing security property verified live)
- Session lock + auth gate (401/423/200 paths all work)
- Schema-versioned state envelope (_schema_version + _accessor_hash)
Known partial: synthesized UITouch reaches SwiftUI's host view per
device-side syslog ("non-local connection from fd...:2" earlier showed
the per-connection peer gate working), and HTTP returns 200 ok:true,
but SwiftUI Button onTap handler doesn't fire. UIControl widgets DO
work via UIControl.sendActions. Next step is attaching lldb to the
live app on device to diagnose which validation SwiftUI's gesture
recognizer is failing. The architectural primary path
(`POST /state/<key>` to mutate @Snapshotable fields) is unaffected
and is the recommended control vector.
Documented sources for the KIF-derived synthesis:
- https://github.com/kif-framework/KIF (MIT)
- UITouch-KIFAdditions.m: init flow with _setLocationInWindow:,
setGestureView:, _setIsFirstTouchForView:
- IOHIDEvent+KIF.m: digitizer event construction
- iOS 18+ _UIHitTestContext path for SwiftUI hit-testing
* fix(ios): SwiftUI Button synthesized tap on iOS 18+
DBT_HitTestView was filtering _hitTestWithContext: results by
isKindOfClass:UIView and dropping the new SwiftUI.UIKitGestureContainer
(a UIResponder, not UIView). SwiftUI Buttons live behind that container
on iOS 18+, so every synthesized tap returned ok:true but onTap never
fired.
Mirror KIF PR #1323: return id, pass the responder through to
UITouch.setView: directly (the setter accepts non-UIView responders).
Verified: real iPhone 17 Pro Max, iOS 26.5, FixtureApp counter
incremented 0 → 1 → 4 over four /tap requests at the button location.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ios): hoist DebugBridgeTouch into canonical templates
Bridges.swift.template imports DebugBridgeTouch but no .m/.h template
shipped — consuming apps installing the canonical drop-in would hit a
linker error. Closes that gap with the fixture's verified working code.
Changes:
- New ios-qa/templates/DebugBridgeTouch.{h,m}.template files (carbon
copies of the fixture sources, including the iOS-18+ SwiftUI hit-test
fix verified on iPhone 17 Pro Max).
- Package.swift.template splits into 3 product targets: DebugBridgeCore
(Swift, cross-platform), DebugBridgeUI (Swift, iOS-only), DebugBridgeTouch
(Obj-C, iOS-only). Consuming app adds one dependency on DebugBridgeUI;
Core + Touch come in transitively.
- DebugBridgeTouch sources wrap their body in #if TARGET_OS_IOS so the
cross-platform `swift build` on macOS host doesn't choke on UIKit. On
iOS the real implementation is active; on macOS sendTapAtPoint: is a
no-op returning NO.
- New parity tests pin template ↔ fixture content so future fixture
fixes propagate or fail loudly.
- Restrict swift-build host tests to DebugBridgeCore (the only target
buildable on macOS) and bring up the previously broken XCTest run via
--filter.
Verified post-change: real iPhone 17 Pro Max, iOS 26.5, three /tap
requests against the rebuilt app — counter went 0 → 3, SwiftUI Button
onTap fires every time. Templates now sufficient to ship to any
consuming iOS app.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ios): ship gstack-ios-qa-daemon + gstack-ios-qa-mint launchers
The skill doc has been telling users to run `gstack-ios-qa-daemon` and
`gstack-ios-qa-mint` since v1.41.0.0, but neither binary actually existed.
Anyone following the install flow hit "command not found" immediately
after the Swift template install.
Adds the missing pieces:
- bin/gstack-ios-qa-daemon — bash shim that execs
`bun run ios-qa/daemon/src/index.ts`. Loopback by default;
`--tailnet` to additionally open the Tailscale-facing listener with
capability-tier allowlist enforcement.
- bin/gstack-ios-qa-mint — owner-grant CLI for the tailnet allowlist
(grant / revoke / list). Writes ~/.gstack/ios-qa-allowlist.json at
mode 0600. Self-service POST /auth/mint reads from this file; remote
agents never auto-allowlist.
- ios-qa/daemon/src/cli-mint.ts — TS implementation behind the shim.
Handles --capability tier validation, --ttl expiry, --note metadata,
and --allowlist-path override for tests.
- ios-qa/daemon/src/allowlist.ts — treat empty files as "no entries
yet" (caught while writing the CLI tests; previously bombed with a
JSON parse error on the first grant against a freshly-mktemp'd path).
Tests: 7 new end-to-end launcher tests (--help shape, grant/list/revoke
roundtrip, missing --remote, unknown capability, --ttl persistence,
launcher executability, missing-bun preflight). All 81 daemon tests
pass.
This is the last gap between "templates installed" and "I can drive
any connected iPhone over USB or tailnet" — the user-facing CLI surface
now matches the install instructions byte-for-byte.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: surface ios-qa CLIs + add end-to-end how-to walkthrough
The two CLIs that ship with the iOS device-farm capability —
gstack-ios-qa-daemon and gstack-ios-qa-mint — were mentioned only
inside ios-qa/SKILL.md. Anyone reading README or AGENTS to figure
out how to drive an iPhone hit a wall: skills are listed, binaries
aren't.
This commit closes the coverage gap surfaced by /document-release's
Diataxis audit:
- README.md, AGENTS.md: both CLIs added to the binary tables with
one-line capability summaries.
- docs/howto-ios-testing-with-gstack.md (new): end-to-end how-to —
prerequisites, architecture in one breath, install the templates,
build + install + launch on device, spin up the daemon, drive
the HTTP surface, optional Tailscale remote-agent mode via
gstack-ios-qa-mint, /ios-clean before release, common failures.
Pulled directly from the real iPhone 17 Pro Max / iOS 26.5
verification run.
- README + AGENTS link to the new how-to from the iOS skill row.
No CHANGELOG entry change — the consolidated 1.43.0.0 entry is /ship
work. No VERSION bump — already at 1.43.0.0 covering all branch work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(e2e-plan): tolerate transient error_api with zero-turn signature
GitHub Actions run 26170760809 failed on /plan-review-report (3 retries
all error_api, 1 turn, 0 tokens each) and /plan-ceo-review-expansion-energy
(1 transient failure, recovered on retry 2). The prior run on the same
branch (
|