Commit Graph

982 Commits

Author SHA1 Message Date
bellman 7214573f35 Keep approval token contracts in their own runtime module
Constraint: G004 task 3 now owns approval-token contracts through rust/crates/runtime/src/approval_tokens.rs, while auto-integration left a duplicate unused copy in permissions.rs.\nRejected: suppressing dead-code warnings | the duplicate implementation was obsolete after the dedicated module landed.\nConfidence: high\nScope-risk: narrow\nDirective: Keep permission-mode authorization in permissions.rs and approval-token policy handoff in approval_tokens.rs.\nTested: cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml -p runtime; cargo test --manifest-path rust/Cargo.toml -p runtime approval_token -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime --test g004_conformance -- --nocapture\nNot-tested: full workspace test suite; G004 tasks 2/4/5 remain non-terminal.\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 18:11:20 +09:00
bellman dcf11f8190 harden report contract projection identity
Add a runtime report schema v1 contract so downstream consumers can negotiate structured fields, verify canonical report identity, and audit projection redactions without reverse-engineering prose.\n\nConstraint: Task 2 scope was limited to report schema/projection/redaction modules/docs/tests and prohibited .omx/ultragoal mutation.\nRejected: Wiring into broader CLI report emitters | kept this slice focused on the reusable contract and deterministic fixtures.\nConfidence: high\nScope-risk: narrow\nDirective: Future report emitters should build canonical payloads through CanonicalReportV1 before projecting audience-specific views.\nTested: cargo test -p runtime report_schema -- --nocapture; cargo test -p runtime lane_events -- --nocapture; cargo check -p runtime\nNot-tested: cargo clippy -p runtime --all-targets -- -D warnings remains blocked by pre-existing non-task warnings in compact.rs, file_ops.rs, policy_engine.rs, sandbox.rs.
2026-05-14 18:09:36 +09:00
bellman e1641aa010 Prove G004 contract bundles are machine-checkable
Constraint: Task 6 needed a regression harness without overwriting Task 1-4 implementation files.\nRejected: Editing lane_events/report-schema/approval-token owners directly | would create shared-file conflicts with active lanes.\nConfidence: high\nScope-risk: narrow\nDirective: Keep this harness as a consumer-facing conformance layer; extend fixtures after Task 2/3 land schema/token producers.\nTested: cd rust && cargo test -p runtime --test g004_conformance -- --nocapture; cd rust && cargo check -p runtime; cd rust && cargo fmt --check; git diff --check\nNot-tested: cargo clippy -p runtime --tests -- -D warnings fails on pre-existing runtime lint debt outside changed files.
2026-05-14 18:07:11 +09:00
bellman 5cebdd999d omx(team): auto-checkpoint worker-2 [3] 2026-05-14 18:07:05 +09:00
bellman bf533d77a7 task: approval token chain
Add a runtime approval-token ledger so policy-blocked actions can require scoped owner grants, consume one-time tokens, reject replay, and retain delegation traceability.\n\nConstraint: Task 3 scope is the G004 approval-token chain for runtime event/report contract families.\nRejected: Extending the existing permission prompt path directly | the token contract can be tested independently without changing live tool authorization behavior.\nConfidence: high\nScope-risk: narrow\nDirective: Keep approval grants scoped to policy/action/repo/branch before wiring them into external execution paths.\nTested: cargo check --manifest-path rust/Cargo.toml --workspace; cargo test --manifest-path rust/crates/runtime/Cargo.toml; cargo test --manifest-path rust/crates/runtime/Cargo.toml approval_token -- --nocapture\nNot-tested: cargo clippy --manifest-path rust/crates/runtime/Cargo.toml --all-targets -- -D warnings is blocked by pre-existing warnings in compact.rs, file_ops.rs, policy_engine.rs, and sandbox.rs.
2026-05-14 18:07:03 +09:00
bellman e34209ff7f omx(team): auto-checkpoint worker-2 [3] 2026-05-14 18:07:00 +09:00
bellman ff37d395bb Stabilize G004 contract integration after worker merges
Constraint: G004 worker integrations introduced unparseable approval-token tests and a conformance path bug that blocked leader verification.\nRejected: waiting for another auto-integration cycle | local leader verification had exact parse and fixture failures to repair safely.\nConfidence: high\nScope-risk: moderate\nDirective: Keep approval-token regression tests in cfg(test) modules or integration tests, never inside type definitions.\nTested: cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml -p runtime; cargo test --manifest-path rust/Cargo.toml -p runtime approval_token -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime --test g004_conformance -- --nocapture; python3 .github/scripts/check_doc_source_of_truth.py\nNot-tested: full workspace test suite; remaining G004 tasks 1-5 still non-terminal.\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 18:06:14 +09:00
bellman f8d744bb37 omx(team): auto-checkpoint worker-1 [1] 2026-05-14 18:05:26 +09:00
bellman c8c936ede1 omx(team): auto-checkpoint worker-3 [6] 2026-05-14 18:00:23 +09:00
bellman 57b3e3258b omx(team): auto-checkpoint worker-2 [3] 2026-05-14 18:00:19 +09:00
bellman 06e545325d omx(team): auto-checkpoint worker-1 [1] 2026-05-14 18:00:16 +09:00
bellman f4e08d0ecf omx(team): auto-checkpoint worker-2 [3] 2026-05-14 17:58:46 +09:00
bellman 16d6525de4 omx(team): auto-checkpoint worker-2 [3] 2026-05-14 17:57:59 +09:00
bellman aec291caab omx(team): auto-checkpoint worker-4 [unknown] 2026-05-14 17:51:53 +09:00
bellman 43b182882a Lock doctor JSON boot preflight contract
Constraint: G003 boot/session work adds a structured doctor boot-preflight check that must be visible in JSON output.
Rejected: reducing the doctor check count back to six | boot preflight is an explicit G003 acceptance surface.
Confidence: high
Scope-risk: narrow
Directive: Keep doctor/status JSON contract tests aligned with boot_preflight schema fields when extending preflight diagnostics.
Tested: git diff --check; cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo test --manifest-path rust/Cargo.toml -p runtime trusted_roots -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime startup -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime worker_boot -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p tools path_scope -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p rusty-claude-cli --test output_format_contract -- --nocapture; cargo check --manifest-path rust/Cargo.toml --workspace
Not-tested: full cargo test --workspace remains deferred during active G003 team reconciliation.

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 17:51:47 +09:00
bellman 307b23d27f omx(team): auto-checkpoint worker-4 [unknown] 2026-05-14 17:50:36 +09:00
bellman 8c11dd16f4 task: preserve startup no-evidence timestamp evidence
Lock the startup-no-evidence contract so prompt timestamps remain the original send time while lifecycle and pane timestamps prove timeout ordering.

Constraint: task 4 scope limited changes to runtime worker boot/session/startup modules and tests; .omx/ultragoal not mutated.

Rejected: CLI-surface changes | runtime evidence contract already exposes the typed worker.startup_no_evidence payload.

Confidence: high

Scope-risk: narrow

Directive: Keep startup timeout evidence timestamps stable across later lifecycle observations.

Tested: cargo test -p runtime worker_boot -- --nocapture; cargo check --workspace

Not-tested: cargo clippy -p runtime --tests -- -D warnings is blocked by pre-existing runtime warnings in compact.rs, file_ops.rs, policy_engine.rs, and sandbox.rs.
2026-05-14 17:50:33 +09:00
bellman 79d3b809f9 omx(team): auto-checkpoint worker-4 [unknown] 2026-05-14 17:46:16 +09:00
bellman 9ec4d8398e omx(team): auto-checkpoint worker-3 [unknown] 2026-05-14 17:46:13 +09:00
bellman 5f45740408 omx(team): auto-checkpoint worker-2 [unknown] 2026-05-14 17:46:10 +09:00
bellman 675d9ddc78 Harden workspace path classification
Canonicalize absolute shell path operands before comparing them with the workspace root so symlink-expanded reads cannot be downgraded under workspace-write enforcement. Also resolves local clippy findings in the touched tools crate so targeted linting can run cleanly.\n\nConstraint: Task 1 scope is workspace/path scope enforcement only; do not mutate .omx/ultragoal.\nRejected: Editing shared path-scope regression tests | worker-3 owns that test coverage and the current tests already prove the contract.\nConfidence: high\nScope-risk: narrow\nDirective: Keep shell/file permission classification canonical-path based before permitting workspace-write execution.\nTested: ../scripts/fmt.sh --check; cargo test -p tools --test path_scope_enforcement -- --nocapture; cargo test -p tools given_workspace_write_enforcer_when_bash -- --nocapture; cargo check -p tools; cargo clippy -p tools --all-targets --no-deps -- -D warnings\nNot-tested: Full workspace clippy still has known unrelated runtime crate warnings outside this task scope.
2026-05-14 17:46:07 +09:00
bellman 087e31d190 Keep G003 integrated runtime tests compiling
Constraint: G003 worker outputs added config and startup evidence fields that must compile under focused runtime validation before leader push.
Rejected: pushing auto-checkpoints without leader validation | integrated tests initially failed to compile due missing imports and stale StartupEvidenceBundle fixtures.
Confidence: high
Scope-risk: narrow
Directive: When extending StartupEvidenceBundle, update all in-crate fixtures in the same change.
Tested: git diff --check; cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo test --manifest-path rust/Cargo.toml -p runtime trusted_roots -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime startup -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime worker_boot -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p tools path_scope -- --nocapture; cargo check --manifest-path rust/Cargo.toml --workspace
Not-tested: full cargo test --workspace remains deferred during active G003 team work.

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 17:45:46 +09:00
bellman a6ee51baab omx(team): auto-checkpoint worker-3 [unknown] 2026-05-14 17:40:32 +09:00
bellman 6df60a4683 omx(team): auto-checkpoint worker-2 [unknown] 2026-05-14 17:40:29 +09:00
bellman 964458ad4a omx(team): auto-checkpoint worker-1 [1] 2026-05-14 17:38:59 +09:00
bellman ac888623a8 Merge commit '3a8ce832341884322ede0855b150e3ceebe9180d' 2026-05-14 17:34:07 +09:00
bellman 3a8ce83234 Deny scoped file reads before tool dispatch
Worker-3's path-scope regression showed outside read_file paths were blocked by the workspace wrapper after dispatch instead of by the permission enforcer. File, glob, and grep tools now classify path scope before dispatch and require danger-full-access for paths that resolve outside the current workspace.

Constraint: G002-alpha-security requires permission-mode event/status visibility for blocked file and shell paths

Rejected: relying only on runtime wrapper errors | it hides the active permission-mode denial contract from callers

Confidence: high

Scope-risk: narrow

Directive: keep path-sensitive tool permission classification aligned with workspace wrapper resolution

Tested: cargo test -p tools --test path_scope_enforcement --manifest-path rust/Cargo.toml --quiet; cargo test -p tools given_workspace_write_enforcer_when_bash --manifest-path rust/Cargo.toml --quiet; cargo check --manifest-path rust/Cargo.toml --workspace; cargo fmt --all --manifest-path rust/Cargo.toml -- --check

Not-tested: full workspace test suite after this small permission-classification follow-up

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 17:34:03 +09:00
bellman 37b2b75287 Keep G002 path-scope tests aligned with enforced denials
Constraint: G002-alpha-security requires direct file-tool escapes to fail before reads while accepting the canonical runtime error text.
Rejected: weakening the test to accept successful reads | the verified behavior denies the escape and only the assertion vocabulary was stale.
Confidence: high
Scope-risk: narrow
Directive: Keep path-scope tests asserting denial semantics, not a single legacy wording.
Tested: cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo test --manifest-path rust/Cargo.toml -p tools path_scope -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p tools --test path_scope_enforcement -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime workspace_ -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p rusty-claude-cli --test output_format_contract -- --nocapture; python3 -m pytest tests/test_security_scope.py -q; cargo check --manifest-path rust/Cargo.toml --workspace; git diff --check
Not-tested: full cargo test --workspace due known unrelated session_lifecycle_prefers_running_process_over_idle_shell failure.

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 17:33:47 +09:00
bellman f2dc615a8a Prevent workspace escape through tool path resolution
File and shell tool dispatch now resolves path-sensitive operations through workspace-scoped wrappers so direct paths, globs, symlinks, shell expansion, and Windows absolute path probes fail before execution when they leave the workspace.

Constraint: G002-alpha-security requires alpha-blocking workspace/path scope enforcement without mutating .omx/ultragoal

Rejected: string-prefix only checks | they miss canonical symlink and glob expansion escapes

Confidence: high

Scope-risk: moderate

Directive: keep new file/shell tool entrypoints wired through workspace-aware wrappers before dispatch

Tested: python3 -m unittest discover -s tests -v; python3 -m compileall -q src tests; cargo test -p runtime workspace --manifest-path rust/Cargo.toml --quiet; cargo test -p tools workspace --manifest-path rust/Cargo.toml --quiet; cargo test -p tools given_workspace_write_enforcer_when_bash --manifest-path rust/Cargo.toml --quiet; cargo test -p tools file_tools_reject --manifest-path rust/Cargo.toml --quiet; cargo fmt --all --manifest-path rust/Cargo.toml -- --check; cargo check --manifest-path rust/Cargo.toml --workspace

Not-tested: full unfiltered cargo test workspace due task-time constraints; targeted runtime/tools workspace security tests and full cargo check passed

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 17:30:57 +09:00
bellman 9bc55f9946 omx(team): auto-checkpoint worker-1 [1] 2026-05-14 17:30:54 +09:00
bellman 180ebb3b02 Reject Windows absolute PowerShell paths from workspace scope
The G002 security gate caught that PowerShell path classification still treated Windows absolute paths as workspace-relative on POSIX, so workspace scope now rejects those tokens before permission downgrades.

Constraint: G002-alpha-security requires workspace/path scope across Windows path cases as well as direct paths, symlinks, globbing, shell expansion, and worktrees.

Rejected: Relying on PathBuf::is_absolute for Windows syntax on POSIX | it treats C:\ and UNC-like tokens as relative and weakens permission classification.

Confidence: high

Scope-risk: narrow

Directive: Keep bash and PowerShell path classifiers aligned whenever new shell syntax is admitted.

Tested: cargo test --manifest-path rust/Cargo.toml -p tools path_scope -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p tools --test path_scope_enforcement -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p runtime workspace_ -- --nocapture; python3 -m pytest tests/test_security_scope.py -q; cargo check --manifest-path rust/Cargo.toml --workspace.

Not-tested: Full cargo test --workspace still has existing unrelated rusty-claude-cli session lifecycle failure reported by workers.

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 17:29:57 +09:00
bellman 9c2ebb4f39 task: prefer tests before fixes
Add focused regression coverage for path-scope enforcement before implementation changes land, preserving worker-1 ownership of the fix path.

Constraint: task 4 requested tests-first coverage for direct path, symlink, glob/shell expansion, worktree, and Windows-style path cases.\nRejected: implementation edits in enforcement code | worker-1 owns minimal implementation changes.\nConfidence: high\nScope-risk: narrow\nDirective: Keep these regressions red until path canonicalization/enforcement blocks outside-workspace reads before dispatch.\nTested: cargo fmt -p tools -- --check; cargo check -p tools; cargo clippy -p tools --test path_scope_enforcement (warnings only, pre-existing); cargo test -p tools --test path_scope_enforcement (expected red: 4 failing path-scope gaps, 2 passing baselines).\nNot-tested: Full workspace test suite because the new regression tests intentionally fail until implementation lands.
2026-05-14 17:29:31 +09:00
bellman 2c48400293 omx(team): auto-checkpoint worker-3 [4] 2026-05-14 17:27:21 +09:00
bellman 713ca7aee4 omx(team): auto-checkpoint worker-1 [1] 2026-05-14 17:27:18 +09:00
bellman 02b591ac64 omx(team): auto-checkpoint worker-3 [4] 2026-05-14 17:22:09 +09:00
bellman f789525839 omx(team): auto-checkpoint worker-1 [1] 2026-05-14 17:22:06 +09:00
bellman 17da2964d7 omx(team): auto-checkpoint worker-3 [4] 2026-05-14 17:18:58 +09:00
bellman 9ab569e626 omx(team): auto-checkpoint worker-2 [3] 2026-05-14 17:18:55 +09:00
wangguan1995 fa8eecaf8f fix 2026-05-10 13:23:40 +00:00
wangguan1995 2033c90921 fix log 2026-05-10 13:21:58 +00:00
wangguan1995 8cada12c48 Add Qwen model token limits for DashScope compatibility 2026-05-10 13:09:07 +00:00
Jobdori b98b9a712e fix(fmt): expand Thinking struct literals to pass cargo fmt 2026-05-09 15:52:54 +09:00
YeonGyu-Kim 357629dbd9 fix(skills): route help flags to local dispatch + fix push_output_block test arity
Cherry-pick from Yeachan-Heo's #2945 with manual conflict resolution:
- classify_skills_slash_command now catches -h/--help anywhere in args
- Restored pending_thinking parameter in push_output_block test calls

Co-authored-by: Yeachan-Heo <bellman@ultraworkers.dev>
2026-05-06 15:41:25 +09:00
YeonGyu-Kim 75c08bc982 fix: REPL display, /compact panic, identity leak, DeepSeek reasoning, thinking blocks
Five interrelated fixes from parallel Hephaestus sessions:

1. fix(repl): display assistant text after spinner (#2981, #2982, #2937)
   - Added final_assistant_text() call after run_turn spinner completes
   - REPL now shows response text like run_prompt_json does

2. fix(compact): handle Thinking content blocks (#2985)
   - Added ContentBlock::Thinking variant throughout compact summarizer
   - Prevents panic when /compact encounters thinking blocks

3. fix(prompt): provider-aware model identity (#2822)
   - New ModelFamilyIdentity enum (Claude vs Generic)
   - Non-Anthropic models no longer say 'I am Claude'
   - model_family_identity_for() detects provider and sets identity

4. fix(openai): preserve DeepSeek reasoning_content (#2821)
   - Stream parser now captures reasoning_content from OpenAI-compat
   - Emits ThinkingDelta/SignatureDelta events for reasoning models
   - Thinking blocks included in conversation history for re-send

5. feat(runtime): Thinking block support across codebase
   - AssistantEvent::Thinking variant in conversation.rs
   - ContentBlock::Thinking in session serialization
   - Thinking-aware compact summarization
   - Tests for thinking block ordering and content

Closes #2981, #2982, #2937, #2985, #2822, #2821
2026-05-06 15:32:34 +09:00
YeonGyu-Kim 5be173edf6
Merge pull request #2986 from andhai/pr/glob-search-prune-heavy-dirs
runtime: prune heavy directories during glob searches
2026-05-06 14:53:37 +09:00
YeonGyu-Kim 28998422e2
Merge pull request #2984 from andhai/pr/openai-token-limit-hardening
openai: harden token-limit handling and default output-token caps
2026-05-06 14:53:24 +09:00
YeonGyu-Kim b4733b67a6
Merge pull request #2949 from Yeachan-Heo/fix/export-help-json
Fix export help JSON output
2026-05-06 14:52:09 +09:00
YeonGyu-Kim d074d1c046
fix(mcp): exit 1 when JSON envelope contains ok:false (#2995)
* fix(mcp): exit 1 when JSON envelope contains ok:false

mcp info, mcp describe, and mcp list-filter all return
{"action":"error","ok":false,...} but previously exited 0,
requiring automation callers to inspect the envelope field.

After this fix: print_mcp detects ok:false in the rendered JSON
value and calls process::exit(1) after printing, so the exit code
reflects the semantic error in the envelope.

Unaffected: mcp list, mcp show, mcp help all have no ok field and
continue to exit 0 (they are not error paths).

Closes ROADMAP #68 (partial — agents bogus/mcp show nonexistent
found:false remain exit:0 as they use different envelope shapes).

* feat(scripts): add dogfood-build.sh — build from checkout and verify provenance

Builds claw from the current HEAD, then checks that the binary's
git_sha matches git rev-parse --short HEAD. Exits non-zero if the
binary is stale or provenance is opaque (git_sha: null).

Usage:
  CLAW=$(bash scripts/dogfood-build.sh)   # fail-fast if stale
  $CLAW version --output-format json       # provenance confirmed

Addresses ROADMAP #69: dogfooders using a stale installed binary
cannot attribute behavior to specific commits. This script makes
dogfood round zero unambiguous.

Also documents the safe workaround for contributors who have a
stale system-installed binary.
2026-05-05 06:09:11 +09:00
YeonGyu-Kim caeac828b5
fix(permissions): return guidance for multi-word forms instead of falling through to LLM (#2994)
claw permissions list / claw permissions allow <tool> / claw permissions deny <tool>
all fell through to the prompt/LLM path because parse_subcommand had no
arm for "permissions". The single-word bare form was already intercepted
by bare_slash_command_guidance, but any form with rest.len() > 1 bypassed
the single-word guard and landed in the _other => CliAction::Prompt branch.

Fix: add a "permissions" arm in parse_subcommand that returns a structured
guidance Err so all multi-word forms get the same exit:1 + JSON error as
the bare single-word form, without any LLM call or session creation.

Verified: all invocation forms (bare, list, read-only, workspace-write,
allow/deny <tool>) exit 1 with kind:unknown guidance JSON. Zero sessions.
2026-05-05 05:35:50 +09:00
YeonGyu-Kim 85435ad4b5
fix(plugins): route plugin and marketplace aliases through local handler (#2993)
claw plugin list / claw marketplace / claw marketplace list all fell
through to the prompt/LLM path because parse_subcommand only matched
"plugins" (the primary name) while the canonical spec aliases
"plugin" and "marketplace" were unhandled.

This manifested as auth errors and session creation on direct
invocation — dogfood confirmed Gaebal's binary created one session
via plugin prompt fallback.

Fix: extend the plugins arm in parse_subcommand to also match
"plugin" | "marketplace" so all three forms route to the same
CliAction::Plugins without network calls or session creation.

Verified: all six forms (bare + list subcommand for each name) return
kind:plugin JSON, exit 0, and create zero sessions.

Closes ROADMAP #55 partial (plugins/marketplace bypass complete).
2026-05-05 05:16:00 +09:00
YeonGyu-Kim 5eb4b8a944
fix(mcp): return typed error JSON for unsupported actions (info/describe/list-filter) (#2989)
`claw mcp info nonexistent --output-format json` and
`claw mcp list nonexistent --output-format json` fell through to
the generic help renderer, returning an opaque envelope with only
`unexpected` set — no machine-readable error_kind.

Fix:
- Add typed guards in render_mcp_report_for/_json_for for:
  - `list <filter>`: list accepts no filter argument
  - `info <name>` / `describe <name>`: suggest `mcp show`
- New render_mcp_unsupported_action_text/json helpers emit
  `ok:false`, `error_kind:"unsupported_action"`, `hint`, `requested_action`
- `mcp show`, `mcp list`, `mcp help` existing paths unchanged

Test: mcp_unsupported_actions_return_typed_error_not_generic_help
asserts kind=="mcp", ok==false, error_kind=="unsupported_action"
for info/list-filter/describe paths.

Pinpoint: ROADMAP #504
2026-05-05 05:13:07 +09:00
YeonGyu-Kim 65aa559733
fix: support /plugins slash command in resume mode (#2973)
* fix: support /plugins slash command in resume mode

Move SlashCommand::Plugins out of the 'unsupported resumed slash
command' catch-all and add a handler arm in run_resume_command that
calls handle_plugins_slash_command for list/help actions.

Mutation actions (install/uninstall/enable/disable) are rejected with
a clear error since there is no runtime to reload in resume mode.

Add /plugins coverage to resumed_inventory_commands test in
output_format_contract.rs: kind, action, reload_runtime, target.

Before: claw --resume session.jsonl /plugins --output-format json
-> {error: 'unsupported resumed slash command', type: 'error'}, exit 1

After: claw --resume session.jsonl /plugins --output-format json
-> {kind: 'plugin', action: 'list', ...}, exit 0

* style: cargo fmt line wrap in run_resume_command plugins handler

* fix: block /plugins update in resume mode, fix comment

Address REQUEST_CHANGES from OMX review:
1. Add 'update' to the blocked mutation actions in resume mode
   (previously only install/uninstall/enable/disable were blocked)
2. Fix comment: 'Only list is supported' instead of 'Only list/help'
   since /plugins help doesn't actually parse as a valid action

* style: cargo fmt after conflict resolution
2026-05-05 04:55:39 +09:00
YeonGyu-Kim ac8a24b30b
fix(config): emit section and section_value in JSON output for config subcommands (#2990)
`claw config model --output-format json` and all other section subcommands
(`env`, `hooks`, `plugins`) returned identical output with no section field
— the section arg was parsed but discarded (_section parameter).

Fix: render_config_json now:
- Passes section through to handler
- Looks up the section value via runtime_config.get(), converting the
  internal JsonValue to serde_json::Value via render()+parse
- Emits `section` (string) and `section_value` (JSON value or null)
  in the response envelope
- Returns ok:false + error for unsupported section tokens

Test: config_section_json_emits_section_and_value asserts:
- No section field when no section arg
- section + section_value fields present for all known sections
- ok:false + error for unknown section

Pinpoint: ROADMAP #126
2026-05-05 04:50:33 +09:00
YeonGyu-Kim 94b80a05d3
fix(skills): route show/info/list-filter to local, not model invoke (#2988)
`claw skills show <name>`, `claw skills info <name>`, and
`claw skills list <filter>` were all falling through to
SkillSlashDispatch::Invoke, which spawned a real model session,
consumed tokens, and created session files.

Root cause: classify_skills_slash_command had no guards for
these discovery prefixes; every non-reserved arg became Invoke.

Fix:
- Add "show", "info" as Local-only bare tokens
- Add starts_with guards for "show ", "info ", "list " args
- handle_skills_slash_command: filter skill list by name/substring
  for show/info/list-filter paths (no model call, no session)
- handle_skills_slash_command_json: same structured filtering

Test: skills_show_and_list_filter_do_not_invoke_model asserts
  classify_skills_slash_command returns Local for all discovery
  patterns and still returns Invoke for bare skill names.

Pinpoint: ROADMAP #502
2026-05-05 04:50:30 +09:00
YeonGyu-Kim 9b97c4d832
fix(tests): isolate CLAW_CONFIG_HOME in resumed_status JSON test (#2992)
resumed_status_command_emits_structured_json_when_requested was reading
the real ~/.claw/settings.json, causing loaded_config_files to be 1
instead of the expected 0 on machines with user config present.

Root cause: unlike other tests (e.g. resumed_config_command_loads_settings_files),
this test did not pass an isolated CLAW_CONFIG_HOME env var to run_claw,
so claw fell back to the real HOME and loaded the developer's settings file.

Fix: create a temp config-home dir and pass it as CLAW_CONFIG_HOME via
run_claw_with_env. This gives the assertion a clean 0-file baseline.

Unblocks PRs #2973, #2988, #2990 which all failed this same test on main.

Ref: ROADMAP #65
2026-05-05 04:49:46 +09:00
YeonGyu-Kim 1206f4131d
fix(resume): emit structured JSON for /agents --output-format json (#2987)
Resumed /agents --output-format json was returning a human-readable
text render wrapped in a JSON envelope field instead of the actual
structured agent list. The run_resume_command handler was calling
handle_agents_slash_command (text) for the json field instead of
handle_agents_slash_command_json.

Fix: use handle_agents_slash_command_json for the json outcome field,
matching the pattern already used by /skills and /plugins.

Test: extended resumed_inventory_commands_emit_structured_json_when_requested
to cover /agents, asserting kind=="agents", action=="list",
agents is an array, and count is a number (not a text render).
2026-05-05 04:20:52 +09:00
YeonGyu-Kim c99330372c
fix(version): add build_date and executable_path to version JSON output
`claw version --output-format json` was missing build_date and
executable_path, making it impossible to identify which binary is
running or correlate it with a specific build/commit.

Fix: version_json_value() now includes:
- build_date: compile-time BUILD_DATE env (already in text output)
- executable_path: std::env::current_exe() at runtime

Test: version_emits_json_when_requested extended to assert both fields
are strings in the JSON envelope.

Pinpoint: ROADMAP #507
2026-05-05 04:20:12 +09:00
Andreas Haida 9a512633a5 Cap OpenAI default output tokens using model metadata 2026-05-03 22:16:12 +02:00
Andreas Haida 6ac13ffdad Handle OpenAI token-limit errors as context-window failures 2026-05-03 22:16:12 +02:00
Andreas Haida 482681cdfe Prune heavy directories during glob searches 2026-05-03 22:13:58 +02:00
YeonGyu-Kim 8e45f1850c
test(output_format_contract): add plugins json coverage to inventory_commands test (#2972)
Add four assertions to inventory_commands_emit_structured_json_when_requested:
- kind == "plugin"
- action == "list"
- reload_runtime is boolean
- target is null when no plugin is targeted

Closes the only major --output-format json surface with zero contract
coverage. All other surfaces (agents, mcp, skills, status, sandbox,
doctor, help, version, acp, bootstrap-plan, system-prompt, init, diff,
config) already had test assertions.
2026-05-01 06:03:31 +09:00
Yeachan-Heo 51b9e6b37f Fix export help JSON output 2026-04-30 09:04:11 +00:00
Sigrid Jin (ง'̀-'́)ง oO 1011a83823
Merge pull request #2829 from ultraworkers/fix/issue-320-session-lifecycle-classification
Fix session lifecycle classification for idle tmux shells
2026-04-29 16:11:58 +09:00
Yeachan-Heo 1376d92064 Filter stub commands from resume-safe help
Keep claw --help's resume-safe slash command summary aligned with the interactive command list by filtering STUB_COMMANDS and adding regression coverage.
2026-04-29 03:31:34 +00:00
Yeachan-Heo be53e04671 Classify saved sessions by live work rather than pane existence
Operator status previously treated any tmux pane in a workspace as equivalent to active work. The new classifier uses tmux pane command/path metadata as a soft signal, treats plain shells as idle, and adds dirty-worktree abandoned markers to status and session-list output for clawhip consumers.

Constraint: Keep issue #320 prototype minimal and additive without new dependencies

Rejected: Screen-scraping pane output | fragile and broader than needed for lifecycle classification

Confidence: high

Scope-risk: narrow

Tested: cargo test -p rusty-claude-cli

Tested: cargo check -p rusty-claude-cli

Not-tested: cargo clippy -p rusty-claude-cli --all-targets -- -D warnings is blocked by pre-existing commands crate clippy::unnecessary_wraps warnings
2026-04-28 13:12:37 +00:00
Yeachan-Heo cb56dc12ab Document Rust formatting wrapper
Make scripts/fmt.sh robust to caller cwd and document it as the supported repo-root formatting entrypoint for the Rust workspace.
2026-04-28 09:38:46 +00:00
Yeachan-Heo 74ea754d29 Restore Rust formatting compliance
Run rustfmt from the Rust workspace so CI format checks pass without changing behavior.

Constraint: Scope is formatting-only across tracked Rust files

Confidence: high

Scope-risk: narrow

Tested: cd rust && cargo fmt --check

Tested: git diff --check
2026-04-28 09:19:16 +00:00
Yeachan-Heo 77afde768c Clarify allowed tool status handling
Reject empty --allowedTools inputs instead of treating them as an empty restriction, and surface status JSON metadata that distinguishes default unrestricted tools from flag-provided allow lists.

Confidence: high
Scope-risk: narrow
Tested: cargo test -p rusty-claude-cli rejects_empty_allowed_tools_flag -- --nocapture
Tested: cargo test -p tools allowed_tools_rejects_empty_token_lists -- --nocapture
Tested: cargo check -p rusty-claude-cli -p tools
Tested: cargo test -p rusty-claude-cli -p tools
Not-tested: full workspace cargo fmt --check is blocked by pre-existing unrelated formatting drift
2026-04-28 05:44:14 +00:00
Yeachan-Heo 6db68a2baa Expose tool permission gates as structured worker blockers
Worker boot could previously stall on an interactive MCP/tool permission prompt while readiness and startup-timeout surfaces only had generic idle/no-evidence shapes. This adds a first-class blocked lifecycle state, structured event payload, startup evidence fields, and regression coverage so callers can report the exact server/tool gate instead of pane-scraping.

Constraint: ROADMAP #200 requires tool/server identity, prompt age, and session-only versus always-allow capability in status/evidence surfaces
Rejected: Treat MCP/tool prompts as trust gates | conflates distinct prompts and loses tool identity
Rejected: Leave allow-scope as pane text only | clawhip still could not classify the blocker without scraping
Confidence: high
Scope-risk: moderate
Directive: Keep tool_permission_required distinct from trust_required; downstream claws rely on server/tool payload plus allow-scope metadata
Tested: cargo test -p runtime tool_permission
Tested: cargo fmt -p runtime -- --check && cargo clippy -p runtime --all-targets -- -D warnings && cargo test -p runtime
Tested: cargo test --workspace
Not-tested: live interactive MCP permission prompt in tmux
2026-04-27 09:28:09 +00:00
Yeachan-Heo 5b910356a2 Preserve trust boundaries during pulled follow-up
The pull brought the branch current with origin/main while replaying local follow-up work. Conflict resolution kept the roadmap/progress additions and integrated the runtime event/trust changes with upstream's newer surfaces.

The trust allowlist now treats worktree_pattern as an additional required predicate, including the missing-worktree case, so auto-trust cannot fall back to cwd-only matching when a worktree constraint was declared. The runtime formatting cleanup keeps clippy/fmt green after the merge.

Constraint: Local branch was 109 commits behind origin/main with dirty tracked follow-up work.

Rejected: Drop the autostash after conflict resolution | keeping it preserves a reversible safety backup for unrelated recovery.

Confidence: high

Scope-risk: moderate

Directive: Do not relax worktree_pattern matching without preserving the missing-worktree regression.

Tested: git diff --cached --check; cargo fmt -p runtime -- --check; cargo clippy -p runtime --all-targets -- -D warnings; cargo test -p runtime; cargo test --workspace; architect verification approved

Not-tested: Live tmux/worker auto-trust behavior outside unit/integration tests
2026-04-27 09:05:50 +00:00
YeonGyu-Kim f1e4ad7574 feat: #156 — error classification for text-mode output (Phase 2 of #77)
## Problem

#77 Phase 1 added machine-readable error `kind` discriminants to JSON error
payloads. Text-mode (stderr) errors still emit prose-only output with no
structured classification.

Observability tools (log aggregators, CI error parsers) parsing stderr can't
distinguish error classes without regex-scraping the prose.

## Fix

Added `[error-kind: <class>]` prefix line to all text-mode error output.
The prefix appears before the error prose, making it immediately parseable by
line-based log tools without any substring matching.

**Examples:**

## Impact

- Stderr observers (log aggregators, CI systems) can now parse error class
  from the first line without regex or substring scraping
- Same classifier function used for JSON (#77 P1) and text modes
- Text-mode output remains human-readable (error prose unchanged)
- Prefix format follows syslog/structured-logging conventions

## Tests

All 179 rusty-claude-cli tests pass. Verified on 3 different error classes.

Closes ROADMAP #156.
2026-04-22 00:21:32 +09:00
YeonGyu-Kim 9362900b1b feat: #77 Phase 1 — machine-readable error classification in JSON error payloads
## Problem

All JSON error payloads had the same three-field envelope:
```json
{"type": "error", "error": "<prose with hint baked in>"}
```

Five distinct error classes were indistinguishable at the schema level:
- missing_credentials (no API key)
- missing_worker_state (no state file)
- session_not_found / session_load_failed
- cli_parse (unrecognized args)
- invalid_model_syntax

Downstream claws had to regex-scrape the prose to route failures.

## Fix

1. **Added `classify_error_kind()`** — prefix/keyword classifier that returns a
   snake_case discriminant token for 12 known error classes:
   `missing_credentials`, `missing_manifests`, `missing_worker_state`,
   `session_not_found`, `session_load_failed`, `no_managed_sessions`,
   `cli_parse`, `invalid_model_syntax`, `unsupported_command`,
   `unsupported_resumed_command`, `confirmation_required`, `api_http_error`,
   plus `unknown` fallback.

2. **Added `split_error_hint()`** — splits multi-line error messages into
   (short_reason, optional_hint) so the runbook prose stops being stuffed
   into the `error` field.

3. **Extended JSON envelope** at 4 emit sites:
   - Main error sink (line ~213)
   - Session load failure in resume_session
   - Stub command (unsupported_command)
   - Unknown resumed command (unsupported_resumed_command)

## New JSON shape

```json
{
  "type": "error",
  "error": "short reason (first line)",
  "kind": "missing_credentials",
  "hint": "Hint: export ANTHROPIC_API_KEY..."
}
```

`kind` is always present. `hint` is null when no runbook follows.
`error` now carries only the short reason, not the full multi-line prose.

## Tests

Added 2 new regression tests:
- `classify_error_kind_returns_correct_discriminants` — all 9 known classes + fallback
- `split_error_hint_separates_reason_from_runbook` — with and without hints

All 179 rusty-claude-cli tests pass. Full workspace green.

Closes ROADMAP #77 Phase 1.
2026-04-21 22:38:13 +09:00
YeonGyu-Kim ff45e971aa fix: #80 — session-lookup error messages now show actual workspace-fingerprint directory
## Problem

Two session error messages advertised `.claw/sessions/` as the managed-session
location, but the actual on-disk layout is `.claw/sessions/<workspace_fingerprint>/`
where the fingerprint is a 16-char FNV-1a hash of the CWD path.

Users see error messages like:
```
no managed sessions found in .claw/sessions/
```

But the real directory is:
```
.claw/sessions/8497f4bcf995fc19/
```

The error copy was a direct lie — it made workspace-fingerprint partitioning
invisible and left users confused about whether sessions were lost or just in
a different partition.

## Fix

Updated two error formatters to accept the resolved `sessions_root` path
and extract the actual workspace-fingerprint directory:

1. **format_missing_session_reference**: now shows the actual fingerprint dir
   and explains that it's a workspace-specific partition

2. **format_no_managed_sessions**: now shows the actual fingerprint dir and
   includes a note that sessions from other CWDs are intentionally invisible

Updated all three call sites to pass `&self.sessions_root` to the formatters.

## Examples

**Before:**
```
no managed sessions found in .claw/sessions/
```

**After:**
```
no managed sessions found in .claw/sessions/8497f4bcf995fc19/
Start `claw` to create a session, then rerun with `--resume latest`.
Note: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.
```

```
session not found: nonexistent-id
Hint: managed sessions live in .claw/sessions/8497f4bcf995fc19/ (workspace-specific partition).
Try `latest` for the most recent session or `/session list` in the REPL.
```

## Impact

- Users can now tell from the error message that they're looking in the right
  directory (the one their current CWD maps to)
- The workspace-fingerprint partitioning stops being invisible
- Operators understand why sessions from adjacent CWDs don't appear
- Error copy matches the actual on-disk structure

## Tests

All 466 runtime tests pass. Verified on two real workspaces with actual
workspace-fingerprint directories.

Closes ROADMAP #80.
2026-04-21 22:18:12 +09:00
YeonGyu-Kim 3cfe6e2b14 feat: #154 — hint provider prefix and env var when model name looks like different provider
## Problem

When a user types `claw --model gpt-4` or `--model qwen-plus`, they get:
```
error: invalid model syntax: 'gpt-4'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias
```

USAGE.md documents that "The error message now includes a hint that names the detected env var" — but this hint does not actually exist. The user has to re-read USAGE.md or guess the correct prefix.

## Fix

Enhance `validate_model_syntax` to detect when a model name looks like it belongs to a different provider:

1. **OpenAI models** (starts with `gpt-` or `gpt_`):
   ```
   Did you mean `openai/gpt-4`? (Requires OPENAI_API_KEY env var)
   ```

2. **Qwen/DashScope models** (starts with `qwen`):
   ```
   Did you mean `qwen/qwen-plus`? (Requires DASHSCOPE_API_KEY env var)
   ```

3. **Grok/xAI models** (starts with `grok`):
   ```
   Did you mean `xai/grok-3`? (Requires XAI_API_KEY env var)
   ```

Unrelated invalid models (e.g., `asdfgh`) do not get a spurious hint.

## Verification

- `claw --model gpt-4` → hints `openai/gpt-4` + `OPENAI_API_KEY`
- `claw --model qwen-plus` → hints `qwen/qwen-plus` + `DASHSCOPE_API_KEY`
- `claw --model grok-3` → hints `xai/grok-3` + `XAI_API_KEY`
- `claw --model asdfgh` → generic error (no hint)

## Tests

Added 3 new assertions in `parses_multiple_diagnostic_subcommands`:
- GPT model error hints openai/ prefix and OPENAI_API_KEY
- Qwen model error hints qwen/ prefix and DASHSCOPE_API_KEY
- Unrelated models don't get a spurious hint

All 177 rusty-claude-cli tests pass.

Closes ROADMAP #154.
2026-04-21 21:40:48 +09:00
YeonGyu-Kim 79352a2d20 feat: #152 — hint `--output-format json` when user types `--json` on diagnostic verbs
## Problem

Users commonly type `claw doctor --json`, `claw status --json`, or
`claw system-prompt --json` expecting JSON output. These fail with
`unrecognized argument \`--json\` for subcommand` with no hint that
`--output-format json` is the correct flag.

## Discovery

Filed as #152 during 21:17 dogfood nudge. The #127 worktree contained
a more comprehensive patch but conflicted with #141 (unified --help).
On re-investigation of main, Bugs 1 and 3 from #127 are already closed
(positional arg rejection works, no double "error:" prefix). Only
Bug 2 (the `--json` hint) remained.

## Fix

Two call sites add the hint:

1. `parse_single_word_command_alias`'s diagnostic-verb suffix path:
   when rest[1] == "--json", append "Did you mean \`--output-format json\`?"

2. `parse_system_prompt_options` unknown-option path: same hint when
   the option is exactly `--json`.

## Verification

Before:
  $ claw doctor --json
  error: unrecognized argument `--json` for subcommand `doctor`
  Run `claw --help` for usage.

After:
  $ claw doctor --json
  error: unrecognized argument `--json` for subcommand `doctor`
  Did you mean `--output-format json`?
  Run `claw --help` for usage.

Covers: `doctor --json`, `status --json`, `sandbox --json`,
`system-prompt --json`, and any other diagnostic verb that routes
through `parse_single_word_command_alias`.

Other unrecognized args (`claw doctor garbage`) correctly don't
trigger the hint.

## Tests

- 2 new assertions in `parses_multiple_diagnostic_subcommands`:
  - `claw doctor --json` produces hint
  - `claw doctor garbage` does NOT produce hint
- 177 rusty-claude-cli tests pass
- Workspace tests green

Closes ROADMAP #152.
2026-04-21 21:23:17 +09:00
YeonGyu-Kim 7bc66e86e8 feat: #151 — canonicalize workspace path in SessionStore::from_cwd/data_dir
## Problem

`workspace_fingerprint(path)` hashes the raw path string without
canonicalization. Two equivalent paths (e.g. `/tmp/foo` vs
`/private/tmp/foo` on macOS) produce different fingerprints and
therefore different session stores. #150 fixed the test-side symptom;
this fixes the underlying product contract.

## Discovery path

#150 fix (canonicalize in test) was a workaround. Q's ack on #150
surfaced the deeper gap: the function itself is still fragile for
any caller passing a non-canonical path:

1. Embedded callers with a raw `--data-dir` path
2. Programmatic `SessionStore::from_cwd(user_path)` calls
3. NixOS store paths, Docker bind mounts, case-insensitive normalization

The REPL's default flow happens to work because `env::current_dir()`
returns canonical paths on macOS. But any caller passing a raw path
risks silent session-store divergence.

## Fix

Canonicalize inside `SessionStore::from_cwd()` and `from_data_dir()`
before computing the fingerprint. Kept `workspace_fingerprint()` itself
as a pure function for determinism — canonicalization is the entry
point's responsibility.

```rust
let canonical_cwd = fs::canonicalize(cwd).unwrap_or_else(|_| cwd.to_path_buf());
let sessions_root = canonical_cwd.join(".claw").join("sessions").join(workspace_fingerprint(&canonical_cwd));
```

Falls back to the raw path if canonicalize fails (directory doesn't
exist yet).

## Test-side updates

Three legacy-session tests expected the non-canonical base path to
match the store's workspace_root. Updated them to canonicalize
`base` after creation — same defensive pattern as #150, now
explicit across all three tests.

## Regression test

Added `session_store_from_cwd_canonicalizes_equivalent_paths` that
creates two stores from equivalent paths (raw vs canonical) and
asserts they resolve to the same sessions_dir.

## Verification

- `cargo test -p runtime session_store_` — 9/9 pass
- `cargo test --workspace` — all green, no FAILED markers
- No behavior change for existing users (REPL default flow already
  used canonical paths)

## Backward compatibility

Users on macOS who always went through `env::current_dir()`:
no hash change, sessions resume identically.

Users who ever called with a non-canonical path: hash would change,
but those sessions were already broken (couldn't be resumed from a
canonical-path cwd). Net improvement.

Closes ROADMAP #151.
2026-04-21 21:06:09 +09:00
YeonGyu-Kim eaa077bf91 fix: #150 — eliminate symlink canonicalization flake in resume_latest test + file #246 (reminder outcome ambiguity)
## #150 Fix: resume_latest test flake

**Problem:** `resume_latest_restores_the_most_recent_managed_session` intermittently
fails when run in the workspace suite or multiple times in sequence, but passes in
isolation.

**Root cause:** `workspace_fingerprint(path)` hashes the path string without
canonicalization. On macOS, `/tmp` is a symlink to `/private/tmp`. The test
creates a temp dir via `std::env::temp_dir().join(...)` which returns
`/var/folders/...` (non-canonical). When the subprocess spawns,
`env::current_dir()` returns the canonical path `/private/var/folders/...`.
The two fingerprints differ, so the subprocess looks in
`.claw/sessions/<hash1>` while files are in `.claw/sessions/<hash2>`.
Session discovery fails.

**Fix:** Call `fs::canonicalize(&project_dir)` after creating the directory
to ensure test and subprocess use identical path representations.

**Verification:** 5 consecutive runs of the full test suite — all pass.
Previously: 5/5 failed when run in sequence.

## #246 Filing: Reminder cron outcome ambiguity (control-loop blocker)

The `clawcode-dogfood-cycle-reminder` cron times out repeatedly with no
structured feedback on whether the nudge was delivered, skipped, or died in-flight.

**Phase 1 outcome schema** — add explicit field to cron result:
- `delivered` — nudge posted to Discord
- `timed_out_before_send` — died before posting
- `timed_out_after_send` — posted but cleanup timed out
- `skipped_due_to_active_cycle` — previous cycle active
- `aborted_gateway_draining` — daemon shutdown

Assigned to gaebal-gajae (cron/orchestration domain). Unblocks trustworthy
dogfood cycle observability.

Closes ROADMAP #150. Filed ROADMAP #246.
2026-04-21 21:01:09 +09:00
YeonGyu-Kim bc259ec6f9 fix: #149 — eliminate parallel-test flake in runtime::config tests
## Problem

`runtime::config::tests::validates_unknown_top_level_keys_with_line_and_field_name`
intermittently fails during `cargo test --workspace` (witnessed during
#147 and #148 workspace runs) but passes deterministically in isolation.

Example failure from workspace run:
  test result: FAILED. 464 passed; 1 failed

## Root cause

`runtime/src/config.rs::tests::temp_dir()` used nanosecond timestamp
alone for namespace isolation:

  std::env::temp_dir().join(format!("runtime-config-{nanos}"))

Under parallel test execution on fast machines with coarse clock
resolution, two tests start within the same nanosecond bucket and
collide on the same path. One test's `fs::remove_dir_all(root)` then
races another's in-flight `fs::create_dir_all()`.

Other crates already solved this pattern:
- plugins::tests::temp_dir(label) — label-parameterized
- runtime::git_context::tests::temp_dir(label) — label-parameterized

runtime/src/config.rs was missed.

## Fix

Added process id + monotonically-incrementing atomic counter to the
namespace, making every callsite provably unique regardless of clock
resolution or scheduling:

  static COUNTER: AtomicU64 = AtomicU64::new(0);
  let pid = std::process::id();
  let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
  std::env::temp_dir().join(format!("runtime-config-{pid}-{nanos}-{seq}"))

Chose counter+pid over the label-parameterized pattern to avoid
touching all 20 callsites in the same commit (mechanical noise with
no added safety — counter alone is sufficient).

## Verification

Before: one failure per workspace run (config test flake).
After: 5 consecutive `cargo test --workspace` runs — zero config
test failures. Only pre-existing `resume_latest` flake remains
(orthogonal, unrelated to this change).

  for i in 1 2 3 4 5; do cargo test --workspace; done
  # All 5 runs: config tests green. Only resume_latest flake appears.

  cargo test -p runtime
  # 465 passed; 0 failed

## ROADMAP.md

Added Pinpoint #149 documenting the gap, root cause, and fix.

Closes ROADMAP #149.
2026-04-21 20:54:12 +09:00
YeonGyu-Kim f84c7c4ed5 feat: #148 + #128 closure — model provenance in claw status JSON/text
## Scope

Two deltas in one commit:

### #128 closure (docs)

Re-verified on main HEAD `4cb8fa0`: malformed `--model` strings already
rejected at parse time (`validate_model_syntax` in parse_args). All
historical repro cases now produce specific errors:

  claw --model ''                       → error: model string cannot be empty
  claw --model 'bad model'              → error: invalid model syntax: 'bad model' contains spaces
  claw --model 'sonet'                  → error: invalid model syntax: 'sonet'. Expected provider/model or known alias
  claw --model '@invalid'               → error: invalid model syntax: '@invalid'. Expected provider/model ...
  claw --model 'totally-not-real-xyz'   → error: invalid model syntax: ...
  claw --model sonnet                   → ok, resolves to claude-sonnet-4-6
  claw --model anthropic/claude-opus-4-6 → ok, passes through

Marked #128 CLOSED in ROADMAP with repro block. Residual provenance gap
split off as #148.

### #148 implementation

**Problem.** After #128 closure, `claw status --output-format json`
still surfaces only the resolved model string. No way for a claw to
distinguish whether `claude-sonnet-4-6` came from `--model sonnet`
(alias resolution) vs `--model claude-sonnet-4-6` (pass-through) vs
`ANTHROPIC_MODEL` env vs `.claw.json` config vs compiled-in default.

Debug forensics had to re-read argv instead of reading a structured
field. Clawhip orchestrators sending `--model` couldn't confirm the
flag was honored vs falling back to default.

**Fix.** Added two fields to status JSON envelope:
- `model_source`: "flag" | "env" | "config" | "default"
- `model_raw`: user's input before alias resolution (null on default)

Text mode appends a `Model source` line under `Model`, showing the
source and raw input (e.g. `Model source     flag (raw: sonnet)`).

**Resolution order** (mirrors resolve_repl_model but with source
attribution):
1. If `--model` / `--model=` flag supplied → source: flag, raw: flag value
2. Else if ANTHROPIC_MODEL set → source: env, raw: env value
3. Else if `.claw.json` model key set → source: config, raw: config value
4. Else → source: default, raw: null

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

- Added `ModelSource` enum (Flag/Env/Config/Default) with `as_str()`.
- Added `ModelProvenance` struct (resolved, raw, source) with
  three constructors: `default_fallback()`, `from_flag(raw)`, and
  `from_env_or_config_or_default(cli_model)`.
- Added `model_flag_raw: Option<String>` field to `CliAction::Status`.
- Parse loop captures raw input in `--model` and `--model=` arms.
- Extended `parse_single_word_command_alias` to thread
  `model_flag_raw: Option<&str>` through.
- Extended `print_status_snapshot` signature to accept
  `model_flag_raw: Option<&str>`. Resolves provenance at dispatch time
  (flag provenance from arg; else probe env/config/default).
- Extended `status_json_value` signature with
  `provenance: Option<&ModelProvenance>`. On Some, adds `model_source`
  and `model_raw` fields; on None (legacy resume paths), omits them
  for backward compat.
- Extended `format_status_report` signature with optional provenance.
  On Some, renders `Model source` line after `Model`.
- Updated all existing callers (REPL /status, resume /status, tests)
  to pass None (legacy paths don't carry flag provenance).
- Added 2 regression assertions in parse_args test covering both
  `--model sonnet` and `--model=...` forms.

### ROADMAP.md

- Marked #128 CLOSED with re-verification block.
- Filed #148 documenting the provenance gap split, fix shape, and
  acceptance criteria.

## Live verification

$ claw --model sonnet --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-sonnet-4-6", "model_source": "flag", "model_raw": "sonnet"}

$ claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-6", "model_source": "default", "model_raw": null}

$ ANTHROPIC_MODEL=haiku claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-haiku-4-5-20251213", "model_source": "env", "model_raw": "haiku"}

$ echo '{"model":"claude-opus-4-7"}' > .claw.json && claw --output-format json status | jq '{model,model_source,model_raw}'
{"model": "claude-opus-4-7", "model_source": "config", "model_raw": "claude-opus-4-7"}

$ claw --model sonnet status
Status
  Model            claude-sonnet-4-6
  Model source     flag (raw: sonnet)
  Permission mode  danger-full-access
  ...

## Tests

- rusty-claude-cli bin: 177 tests pass (2 new assertions for #148)
- Full workspace green except pre-existing resume_latest flake (unrelated)

Closes ROADMAP #128, #148.
2026-04-21 20:48:46 +09:00
YeonGyu-Kim 4cb8fa059a feat: #147 — reject empty / whitespace-only prompts at CLI fallthrough
## Problem

The `"prompt"` subcommand arm enforced `if prompt.trim().is_empty()`
and returned a specific error. The fallthrough `other` arm in the same
match block — which routes any unrecognized first positional arg to
`CliAction::Prompt` — had no such guard. Result:

$ claw ""
error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN ...

$ claw "   "
error: missing Anthropic credentials; ...

$ claw "" ""
error: missing Anthropic credentials; ...

$ claw --output-format json ""
{"error":"missing Anthropic credentials; ...","type":"error"}

An empty prompt should never reach the credentials check. Worse: with
valid credentials, the literal empty string gets sent to Claude as a
user prompt, either burning tokens for nothing or triggering a model-
side refusal. Same prompt-misdelivery family as #145.

## Root cause

In `parse_subcommand()`, the final `other =>` arm in the top-level
match only guards against typos (#108 guard via `looks_like_subcommand_typo`)
and then unconditionally builds `CliAction::Prompt { prompt: rest.join(" ") }`.
An empty/whitespace-only join passes through.

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

Added the same `if joined.trim().is_empty()` guard already used in the
`"prompt"` arm to the fallthrough path. Error message distinguishes it
from the `prompt` subcommand path:

  empty prompt: provide a subcommand (run `claw --help`) or a
  non-empty prompt string

Runs AFTER the typo guard (so `claw sttaus` still suggests `status`)
and BEFORE CliAction::Prompt construction (so no network call ever
happens for empty inputs).

### Regression tests

Added 4 assertions in the existing parse_args test:
- parse_args([""]) → Err("empty prompt: ...")
- parse_args(["   "]) → Err("empty prompt: ...")
- parse_args(["", ""]) → Err("empty prompt: ...")
- parse_args(["sttaus"]) → Err("unknown subcommand: ...") [verifies #108 typo guard still takes precedence]

### ROADMAP.md

Added Pinpoint #147 documenting the gap, verification, root cause,
fix shape, and acceptance. Joins the prompt-misdelivery cluster
alongside #145.

## Live verification

$ claw ""
error: empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string

$ claw "   "
error: empty prompt: provide a subcommand (run `claw --help`) or a non-empty prompt string

$ claw --output-format json ""
{"error":"empty prompt: provide a subcommand ...","type":"error"}

$ claw prompt ""   # unchanged: subcommand-specific error preserved
error: prompt subcommand requires a prompt string

$ claw hello        # unchanged: typo guard still fires
error: unknown subcommand: hello.
  Did you mean     help

$ claw "real prompt here"   # unchanged: real prompts still reach API
error: api returned 401 Unauthorized (with dummy key, as expected)

All empty/whitespace-only paths exit 1. No network call. No misleading
credentials error.

## Tests

- rusty-claude-cli bin: 177 tests pass (4 new assertions)
- Full workspace green except pre-existing resume_latest flake (unrelated)

Closes ROADMAP #147.
2026-04-21 20:35:17 +09:00
YeonGyu-Kim f877acacbf feat: #146 — wire `claw config` and `claw diff` as standalone subcommands
## Problem

`claw config` and `claw diff` are pure-local read-only introspection
commands (config merges .claw.json + .claw/settings.json from disk; diff
shells out to `git diff --cached` + `git diff`). Neither needs a session
context, yet both rejected direct CLI invocation:

$ claw config
error: `claw config` is a slash command. Use `claw --resume SESSION.jsonl /config` ...

$ claw diff
error: `claw diff` is a slash command. ...

This forced clawing operators to spin up a full session just to inspect
static disk state, and broke natural pipelines like
`claw config --output-format json | jq`.

## Root cause

Sibling of #145: `SlashCommand::Config { section }` and
`SlashCommand::Diff` had working renderers (`render_config_report`,
`render_config_json`, `render_diff_report`, `render_diff_json_for`)
exposed for resume sessions, but the top-level CLI parser in
`parse_subcommand()` had no arms for them. Zero-arg `config`/`diff`
hit `parse_single_word_command_alias`'s fallback to
`bare_slash_command_guidance`, producing the misleading guidance.

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

- Added `CliAction::Config { section, output_format }` and
  `CliAction::Diff { output_format }` variants.
- Added `"config"` / `"diff"` arms to the top-level parser in
  `parse_subcommand()`. `config` accepts an optional section name
  (env|hooks|model|plugins) matching SlashCommand::Config semantics.
  `diff` takes no positional args. Both reject extra trailing args
  with a clear error.
- Added `"config" | "diff" => None` to
  `parse_single_word_command_alias` so bare invocations fall through
  to the new parser arms instead of the slash-guidance error.
- Added dispatch in run() that calls existing renderers: text mode uses
  `render_config_report` / `render_diff_report`; JSON mode uses
  `render_config_json` / `render_diff_json_for` with
  `serde_json::to_string_pretty`.
- Added 5 regression assertions in parse_args test covering:
  parse_args(["config"]), parse_args(["config", "env"]),
  parse_args(["config", "--output-format", "json"]),
  parse_args(["diff"]), parse_args(["diff", "--output-format", "json"]).

### ROADMAP.md

Added Pinpoint #146 documenting the gap, verification, root cause,
fix shape, and acceptance. Explicitly notes which other slash commands
(`hooks`, `usage`, `context`, etc.) are NOT candidates because they
are session-state-modifying.

## Live verification

$ claw config   # no config files
Config
  Working directory /private/tmp/cd-146-verify
  Loaded files      0
  Merged keys       0
Discovered files
  user    missing ...
  project missing ...
  local   missing ...
Exit 0.

$ claw config --output-format json
{
  "cwd": "...",
  "files": [...],
  ...
}

$ claw diff   # no git
Diff
  Result           no git repository
  Detail           ...
Exit 0.

$ claw diff --output-format json   # inside claw-code
{
  "kind": "diff",
  "result": "changes",
  "staged": "",
  "unstaged": "diff --git ..."
}
Exit 0.

## Tests

- rusty-claude-cli bin: 177 tests pass (5 new assertions in parse_args)
- Full workspace green except pre-existing resume_latest flake (unrelated)

## Not changed

`hooks`, `usage`, `context`, `tasks`, `theme`, `voice`, `rename`,
`copy`, `color`, `effort`, `branch`, `rewind`, `ide`, `tag`,
`output-style`, `add-dir` — all session-mutating or interactive-only;
correctly remain slash-only.

Closes ROADMAP #146.
2026-04-21 20:07:28 +09:00
YeonGyu-Kim 7d63699f9f feat: #145 — wire `claw plugins` subcommand to CLI parser (prompt misdelivery fix)
## Problem

`claw plugins` (and `claw plugins list`, `claw plugins --help`,
`claw plugins info <name>`, etc.) fell through the top-level subcommand
match and got routed into the prompt-execution path. Result: a purely
local introspection command triggered an Anthropic API call and surfaced
`missing Anthropic credentials` to the user. With valid credentials, it
would actually send the literal string "plugins" as a user prompt to
Claude, burning tokens for a local query.

$ claw plugins
error: missing Anthropic credentials; export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY before calling the Anthropic API

$ ANTHROPIC_API_KEY=dummy claw plugins
⠋ 🦀 Thinking...
✘  Request failed
error: api returned 401 Unauthorized

Meanwhile siblings (`agents`, `mcp`, `skills`) all worked correctly:

$ claw agents
No agents found.
$ claw mcp
MCP
  Working directory ...
  Configured servers 0

## Root cause

`CliAction::Plugins` exists, has a working dispatcher
(`LiveCli::print_plugins`), and is produced inside the REPL via
`SlashCommand::Plugins`. But the top-level CLI parser in
`parse_subcommand()` had arms for `agents`, `mcp`, `skills`, `status`,
`doctor`, `init`, `export`, `prompt`, etc., and **no arm for
`plugins`**. The dispatch never ran from the CLI entry point.

## Changes

### rust/crates/rusty-claude-cli/src/main.rs

Added a `"plugins"` arm to the top-level match in `parse_subcommand()`
that produces `CliAction::Plugins { action, target, output_format }`,
following the same positional convention as `mcp` (`action` = first
positional, `target` = second). Rejects >2 positional args with a clear
error.

Added four regression assertions in the existing `parse_args` test:
- `plugins` alone → `CliAction::Plugins { action: None, target: None }`
- `plugins list` → action: Some("list"), target: None
- `plugins enable <name>` → action: Some("enable"), target: Some(...)
- `plugins --output-format json` → action: None, output_format: Json

### ROADMAP.md

Added Pinpoint #145 documenting the gap, verification, root cause,
fix shape, and acceptance.

## Live verification

$ claw plugins   # no credentials set
Plugins
  example-bundled      v0.1.0      disabled
  sample-hooks         v0.1.0      disabled

$ claw plugins --output-format json   # no credentials set
{
  "action": "list",
  "kind": "plugin",
  "message": "Plugins\n  example-bundled ...\n  sample-hooks ...",
  "reload_runtime": false,
  "target": null
}

Exit 0 in all modes. No network call. No "missing credentials" error.

## Tests

- rusty-claude-cli bin: 177 tests pass (new plugin assertions included)
- Full workspace green except pre-existing resume_latest flake (unrelated)

Closes ROADMAP #145.
2026-04-21 19:36:49 +09:00
YeonGyu-Kim faeaa1d30c feat: #144 phase 1 + ROADMAP filing — claw mcp degrades gracefully on malformed config
Filing + Phase 1 fix in one commit (sibling of #143).

## Context

With #143 Phase 1 landed (`claw status` degrades), `claw mcp` was the
remaining diagnostic surface that hard-failed on a malformed `.claw.json`.
Same input, same parse error, same partial-success violation. Fresh
dogfood at 18:59 KST caught it on main HEAD `e2a43fc`.

## Changes

### ROADMAP.md
Added Pinpoint #144 documenting the gap and acceptance criteria. Joins
the partial-success / Principle #5 cluster with #143.

### rust/crates/commands/src/lib.rs
`render_mcp_report_for()` + `render_mcp_report_json_for()` now catch the
ConfigError at loader.load() instead of propagating:

- **Text mode** prepends a "Config load error" block (same shape as
  #143's status output) before the MCP listing. The listing still renders
  with empty servers so the output structure is preserved.
- **JSON mode** adds top-level `status: "ok" | "degraded"` +
  `config_load_error: string | null` fields alongside existing fields
  (`kind`, `action`, `working_directory`, `configured_servers`,
  `servers[]`). On clean runs, `status: "ok"` and
  `config_load_error: null`. On parse failure, `status: "degraded"`,
  `config_load_error: "..."`, `servers: []`, exit 0.
- Both list and show actions get the same treatment.

### Regression test
`commands::tests::mcp_degrades_gracefully_on_malformed_mcp_config_144`:
- Injects the same malformed .claw.json as #143 (one valid + one broken
  mcpServers entry).
- Asserts mcp list returns Ok (not Err).
- Asserts top-level status: "degraded" and config_load_error names the
  malformed field path.
- Asserts show action also degrades.
- Asserts clean path returns status: "ok" with config_load_error null.

## Live verification

$ claw mcp --output-format json
{
  "action": "list",
  "kind": "mcp",
  "status": "degraded",
  "config_load_error": ".../.claw.json: mcpServers.missing-command: missing string field command",
  "working_directory": "/Users/yeongyu/clawd",
  "configured_servers": 0,
  "servers": []
}
Exit 0.

## Contract alignment after this commit

All three diagnostic surfaces match now:
- `doctor` — degraded envelope with typed check entries 
- `status` — degraded envelope with config_load_error  (#143)
- `mcp` — degraded envelope with config_load_error  (this commit)

Phase 2 (typed-error object joining taxonomy §4.44) tracked separately
across all three surfaces.

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #144 phase 1.
2026-04-21 19:07:17 +09:00
YeonGyu-Kim e2a43fcd49 feat: #143 phase 1 — claw status degrades gracefully on malformed config
Previously `claw status` hard-failed on any config parse error, emitting
a bare error string and exiting 1. This took down the entire health
surface for a single malformed MCP entry, even though workspace, git,
model, permission, and sandbox state could all be reported independently.

`claw doctor` already degraded gracefully on the exact same input.
This commit matches `claw status` to that contract.

Changes:
- Add `StatusContext::config_load_error: Option<String>` to capture parse
  errors without aborting.
- Rewrite `status_context()` to match on `ConfigLoader::load()`: on Err,
  fall back to default `SandboxConfig` for sandbox resolution and record
  the parse error, then continue populating workspace/git/memory fields.
- JSON output gains top-level `status: "ok" | "degraded"` marker and a
  `config_load_error` string (null on clean runs). All other existing
  fields preserved for backward compat.
- Text output prepends a "Config load error" block with Details + Hint
  when config failed to parse, then a "Status (degraded)" header on the
  main block. Clean runs show the usual "Status" header.
- Doctor path updated to pass the config load error through StatusContext.

Regression test `status_degrades_gracefully_on_malformed_mcp_config_143`:
- Injects a .claw.json with one valid + one malformed mcpServers entry
- Asserts status_context() returns Ok (not Err)
- Asserts config_load_error names the malformed field path
- Asserts workspace/sandbox fields still populated in JSON
- Asserts top-level status is 'degraded'
- Asserts clean config path still returns status: 'ok'

Verified live on /Users/yeongyu/clawd (contains deliberately broken MCP entries):
  $ claw status --output-format json
  { "status": "degraded",
    "config_load_error": ".../mcpServers.missing-command: missing string field command",
    "model": "claude-opus-4-6",
    "workspace": {...},
    "sandbox": {...},
    ... }

Phase 2 (typed error object joining #4.44 taxonomy) tracked separately.

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #143 phase 1.
2026-04-21 18:37:42 +09:00
YeonGyu-Kim 541c5bb95d feat: #139 actionable worker-state guidance in claw state error + help
Previously `claw state` errored with "no worker state file found ... — run a
worker first" but there is no `claw worker` subcommand, so claws had no
discoverable path from the error to a fix.

Changes:
- Rewrite the missing-state error to name the two concrete commands that
  produce .claw/worker-state.json:
    * `claw` (interactive REPL, writes state on first turn)
    * `claw prompt <text>` (one non-interactive turn)
  Also tell the user what to rerun: `claw state [--output-format json]`.
- Expand the State --help topic with "Produces state", "Observes state",
  and "Exit codes" lines so the worker-state contract is discoverable
  before the user hits the error.
- Add regression test state_error_surfaces_actionable_worker_commands_139
  asserting the error contains `claw prompt`, REPL mention, and the
  rerun path, plus that the help topic documents the producer contract.

Verified live:
  $ claw state
  error: no worker state file found at .claw/worker-state.json
    Hint: worker state is written by the interactive REPL or a non-interactive prompt.
    Run:   claw               # start the REPL (writes state on first turn)
    Or:    claw prompt <text> # run one non-interactive turn
    Then rerun: claw state [--output-format json]

JSON mode preserves the full hint inside the error envelope so CI/claws
can match on `claw prompt` without losing the canonical prefix.

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #139.
2026-04-21 18:04:04 +09:00
YeonGyu-Kim 611eed1537 feat: #142 structured fields in claw init --output-format json
Previously `claw init --output-format json` emitted a valid JSON envelope but
packed the entire human-formatted output into a single `message` string. Claw
scripts had to substring-match human language to tell `created` from `skipped`.

Changes:
- Add InitStatus::json_tag() returning machine-stable "created"|"updated"|"skipped"
  (unlike label() which includes the human " (already exists)" suffix).
- Add InitReport::NEXT_STEP constant so claws can read the next-step hint
  without grepping the message string.
- Add InitReport::artifacts_with_status() to partition artifacts by state.
- Add InitReport::artifact_json_entries() for the structured artifacts[] array.
- Rewrite run_init + init_json_value to emit first-class fields alongside the
  legacy message string (kept for text consumers): project_path, created[],
  updated[], skipped[], artifacts[], next_step, message.
- Update the slash-command Init dispatch to use the same structured JSON.
- Add regression test artifacts_with_status_partitions_fresh_and_idempotent_runs
  asserting both fresh + idempotent runs produce the right partitioning and
  that the machine-stable tag is bare 'skipped' not label()'s phrasing.

Verified output:
- Fresh dir: created[] has 4 entries, skipped[] empty
- Idempotent call: created[] empty, skipped[] has 4 entries
- project_path, next_step as first-class keys
- message preserved verbatim for backward compat

Full workspace test green except pre-existing resume_latest flake (unrelated).

Closes ROADMAP #142.
2026-04-21 17:42:00 +09:00
YeonGyu-Kim 7763ca3260 feat: #141 unify claw <subcommand> --help contract across all 14 subcommands
Previously, `claw <subcommand> --help` had 5 different behaviors:
- 7 subcommands returned subcommand-specific help (correct)
- init/export/state/version silently fell back to global `claw --help`
- system-prompt/dump-manifests errored with `unknown <cmd> option: --help`
- bootstrap-plan printed its phase list instead of help text

Changes:
- Extend LocalHelpTopic enum with Init, State, Export, Version, SystemPrompt,
  DumpManifests, BootstrapPlan variants.
- Extend parse_local_help_action() to resolve those 7 subcommands to their
  local help topic instead of falling through to the main dispatch.
- Remove init/state/export/version from the explicit wants_help=true matcher
  so they reach parse_local_help_action() before being routed to global help.
- Add render_help_topic() entries for the 7 new topics with consistent
  Usage/Purpose/Output/Formats/Related structure.
- Add regression test subcommand_help_flag_has_one_contract_across_all_subcommands_141
  asserting every documented subcommand + both --help and -h variants resolve
  to a HelpTopic with non-empty text that contains a Usage line.

Verification:
- All 14 subcommands now return subcommand-specific help (live dogfood).
- Full workspace test green except pre-existing resume_latest flake.

Closes ROADMAP #141.
2026-04-21 17:36:48 +09:00
YeonGyu-Kim 27ffd75f03 fix: #140 isolate test cwd + env in punctuation_bearing_single_token test
Previously this test inherited the cargo test runner's CWD, which could contain
a stale .claw/settings.json with "permissionMode": "acceptEdits" written by
another test. The deprecated-field resolver then silently downgraded the
default permission mode to WorkspaceWrite, breaking the test's assertion.

Fix: wrap the assertion in with_current_dir() + env_lock() so the test runs in
an isolated temp directory with no stale config.

Full workspace test now passes except for pre-existing resume_latest flake
(unrelated to #140, environment-dependent, tracked separately).

Closes ROADMAP #140.
2026-04-21 16:34:58 +09:00
YeonGyu-Kim f3f6643fb9 feat: #108 add did-you-mean guard for subcommand typos (prevents silent LLM dispatch)
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-21 15:37:58 +09:00
YeonGyu-Kim a8beca1463 fix: #136 support --output-format json with --compact flag
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-21 14:47:15 +09:00
YeonGyu-Kim 21adae9570 fix: #137 update test fixtures to use canonical 'opus' alias for main branch consistency
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-21 14:32:49 +09:00
YeonGyu-Kim 50e3fa3a83 docs: add --output-format to diagnostic verb help text
Updated LocalHelpTopic help strings to surface --output-format support:
- Status, Sandbox, Doctor, Acp all now show [--output-format <format>]
- Added 'Formats: text (default), json' line to each

Diagnostic verbs support JSON output but help text didn't advertise it.
Post-#127 fix: help text now matches actual CLI surface.

Verified: cargo build passes, claw doctor --help shows output-format.

Refs: #127
2026-04-20 21:32:02 +09:00
YeonGyu-Kim a3270db602 fix: #127 reject unrecognized suffix args for diagnostic verbs
Diagnostic verbs (help, version, status, sandbox, doctor, state) now
reject unrecognized suffix arguments at parse time instead of silently
falling through to Prompt dispatch.

Fixes: claw doctor --json (and similar) no longer accepts --json silently
and attempts to send it to the LLM as a prompt. Now properly emits:
'unrecognized argument `--json` for subcommand `doctor`'

Joined parser-level trust gap quintet #108 + #117 + #119 + #122 + #127.
Prevents token burn on rejected arguments.

Verified: cargo build --workspace passes, claw doctor --json errors cleanly.

Refs: #127, ROADMAP
2026-04-20 19:23:35 +09:00
YeonGyu-Kim 12f1f9a74e feat: wire ship.prepared provenance emission at bash execution boundary
Adds ship provenance detection and emission in execute_bash_async():
- Detects git push to main/master commands
- Captures current branch, HEAD commit, git user as actor
- Emits ship.prepared event with ShipProvenance payload
- Logs to stderr as interim routing (event stream integration pending)

This is the first wired provenance event — schema (§4.44.5) now has
runtime emission at actual git operation boundary.

Verified: cargo build --workspace passes.
Next: wire ship.commits_selected, ship.merged, ship.pushed_main events.

Refs: §4.44.5.1, ROADMAP #4.44.5
2026-04-20 17:03:28 +09:00
YeonGyu-Kim 2678fa0af5 fix: #124 --model validation rejects malformed syntax at parse time
Adds validate_model_syntax() that rejects:
- Empty strings
- Strings with spaces (e.g., 'bad model')
- Invalid provider/model format

Accepts:
- Known aliases (opus, sonnet, haiku)
- Valid provider/model format (provider/model)

Wired into parse_args for both --model <value> and --model=<value> forms.
Errors exit with clear message before any API calls (no token burn).

Verified:
- 'claw --model "bad model" version' → error, exit 1
- 'claw --model "" version' → error, exit 1
- 'claw --model opus version' → works
- 'claw --model anthropic/claude-opus-4-6 version' → works

Refs: ROADMAP #124 (debbcbe cluster — parser-level trust gap family)
2026-04-20 16:32:17 +09:00
YeonGyu-Kim b9990bb27c fix: #122 + #125 doctor consistency and git_state clarity
#122: doctor invocation now checks stale-base condition
- Calls run_stale_base_preflight(None) in render_doctor_report()
- Emits stale-base warnings to stderr when branch is behind main
- Fixes inconsistency: doctor 'ok' vs prompt 'stale base' warning

#125: git_state field reflects non-git directories
- When !in_git_repo, git_state = 'not in git repo' instead of 'clean'
- Fixes contradiction: in_git_repo: false but git_state: 'clean'
- Applied in both doctor text output and status JSON

Verified: cargo build --workspace passes.

Refs: ROADMAP #122 (dd73962), #125 (debbcbe)
2026-04-20 16:13:43 +09:00
YeonGyu-Kim f33c315c93 fix: #122 doctor invocation now checks stale-base condition
Adds run_stale_base_preflight(None) call to render_doctor_report() so that
claw doctor emits stale-base warnings to stderr when the current branch is
behind main. Previously doctor reported 'ok' even when branch was stale,
creating inconsistency with prompt path warnings.

Fixes silent-state inventory gap: doctor now consistent with prompt/repl
stale-base checking. No behavior change for non-stale branches.

Verified: cargo build --workspace passes, no test failures.

Ref: ROADMAP #122 dogfood filing @ dd73962
2026-04-20 15:49:56 +09:00
YeonGyu-Kim 8a8ca8a355 ROADMAP #4.44.5: Ship/provenance events — implement §4.44.5
Adds structured ship provenance surface to eliminate delivery-path opacity:

New lane events:
- ship.prepared — intent to ship established
- ship.commits_selected — commit range locked
- ship.merged — merge completed with provenance
- ship.pushed_main — delivery to main confirmed

ShipProvenance struct carries:
- source_branch, base_commit
- commit_count, commit_range
- merge_method (direct_push/fast_forward/merge_commit/squash_merge/rebase_merge)
- actor, pr_number

Constructor methods added to LaneEvent for all four ship events.

Tests:
- Wire value serialization for ship events
- Round-trip deserialization
- Canonical event name coverage

Runtime: 465 tests pass
ROADMAP updated with IMPLEMENTED status

This closes the gap where 56 commits pushed to main had no structured
provenance trail — now emits first-class events for clawhip consumption.
2026-04-20 15:06:50 +09:00
YeonGyu-Kim b0b579ebe9 ROADMAP #133: Blocked-state subphase contract — implement §6.5
Adds BlockedSubphase enum with 7 variants for structured blocked-state reporting:
- blocked.trust_prompt — trust gate blockers
- blocked.prompt_delivery — prompt misdelivery
- blocked.plugin_init — plugin startup failures
- blocked.mcp_handshake — MCP connection issues
- blocked.branch_freshness — stale branch blockers
- blocked.test_hang — test timeout/hang
- blocked.report_pending — report generation stuck

LaneEventBlocker now carries optional subphase field that gets serialized
into LaneEvent data. Enables clawhip to route recovery without pane scraping.

Updates:
- lane_events.rs: BlockedSubphase enum, LaneEventBlocker.subphase field
- lane_events.rs: blocked()/failed() constructors with subphase serialization
- lib.rs: Export BlockedSubphase
- tools/src/lib.rs: classify_lane_blocker() with subphase: None
- Test imports and fixtures updated

Backward-compatible: subphase is Option<>, existing events continue to work.
2026-04-20 15:04:08 +09:00
YeonGyu-Kim c956f78e8a ROADMAP #4.44.5: Ship/provenance opacity — filed from dogfood
Added structured delivery-path contract to surface branch → merge → main-push
provenance as first-class events. Filed from the 56-commit 2026-04-20 push
that exposed the gap.

Also fixes: ApiError test compilation — add suggested_action: None to 4 sites

- Line ~8414: opaque_provider_wrapper_surfaces_failure_class_session_and_trace
- Line ~8436: retry_exhaustion_uses_retry_failure_class_for_generic_provider_wrapper
- Line ~8499: provider_context_window_errors_are_reframed_with_same_guidance
- Line ~8533: retry_wrapped_context_window_errors_keep_recovery_guidance
2026-04-20 14:35:07 +09:00
Yeachan-Heo 00d0eb61d4 US-024: Add token limit metadata for kimi models
Add ModelTokenLimit entries for kimi-k2.5 and kimi-k1.5 to enable
preflight context window validation. Per Moonshot AI documentation:
- Context window: 256,000 tokens
- Max output: 16,384 tokens

Includes 3 unit tests:
- returns_context_window_metadata_for_kimi_models
- kimi_alias_resolves_to_kimi_k25_token_limits
- preflight_blocks_oversized_requests_for_kimi_models

All tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 04:15:38 +00:00
Yeachan-Heo d037f9faa8 Fix strip_routing_prefix to handle kimi provider prefix (US-023)
Add "kimi" to the strip_routing_prefix matches so that models like
"kimi/kimi-k2.5" have their prefix stripped before sending to the
DashScope API (consistent with qwen/openai/xai/grok handling).

Also add unit test strip_routing_prefix_strips_kimi_provider_prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:50:15 +00:00
Yeachan-Heo cec8d17ca8 Implement US-023: Add automatic routing for kimi models to DashScope
Changes in rust/crates/api/src/providers/mod.rs:
- Add 'kimi' alias to MODEL_REGISTRY resolving to 'kimi-k2.5' with DashScope config
- Add kimi/kimi- prefix routing to DashScope endpoint in metadata_for_model()
- Add resolve_model_alias() handling for kimi -> kimi-k2.5
- Add unit tests: kimi_prefix_routes_to_dashscope, kimi_alias_resolves_to_kimi_k2_5

Users can now use:
- --model kimi (resolves to kimi-k2.5)
- --model kimi-k2.5 (auto-routes to DashScope)
- --model kimi/kimi-k2.5 (explicit provider prefix)

All 127 tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:44:21 +00:00
Yeachan-Heo 4cb1db9faa Implement US-022: Enhanced error context for API failures
Add structured error context to API failures:
- Request ID tracking across retries with full context in error messages
- Provider-specific error code mapping with actionable suggestions
- Suggested user actions for common error types (401, 403, 413, 429, 500, 502-504)
- Added suggested_action field to ApiError::Api variant
- Updated enrich_bearer_auth_error to preserve suggested_action

Files changed:
- rust/crates/api/src/error.rs: Add suggested_action field, update Display
- rust/crates/api/src/providers/openai_compat.rs: Add suggested_action_for_status()
- rust/crates/api/src/providers/anthropic.rs: Update error handling

All tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:15:00 +00:00
Yeachan-Heo 5e65b33042 US-021: Add request body size pre-flight check for OpenAI-compatible provider 2026-04-16 17:41:57 +00:00
Yeachan-Heo 87b982ece5 US-011: Performance optimization for API request serialization
Added criterion benchmarks and optimized flatten_tool_result_content:
- Added criterion dev-dependency and request_building benchmark suite
- Optimized flatten_tool_result_content to pre-allocate capacity and avoid
  intermediate Vec construction (was collecting to Vec then joining)
- Made key functions public for benchmarking: translate_message,
  build_chat_completion_request, flatten_tool_result_content,
  is_reasoning_model, model_rejects_is_error_field

Benchmark results:
- flatten_tool_result_content/single_text: ~17ns
- translate_message/text_only: ~200ns
- build_chat_completion_request/10 messages: ~16.4µs
- is_reasoning_model detection: ~26-42ns

All 119 unit tests and 29 integration tests pass.
cargo clippy passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:11:45 +00:00
Yeachan-Heo 3e4e1585b5 US-009: Add comprehensive unit tests for kimi model compatibility fix
Added 4 unit tests to verify is_error field handling for kimi models:
- model_rejects_is_error_field_detects_kimi_models: Detects kimi-k2.5, kimi-k1.5, dashscope/kimi-k2.5 (case insensitive)
- translate_message_includes_is_error_for_non_kimi_models: Verifies gpt-4o, grok-3, claude include is_error
- translate_message_excludes_is_error_for_kimi_models: Verifies kimi models exclude is_error (prevents 400 Bad Request)
- build_chat_completion_request_kimi_vs_non_kimi_tool_results: Full integration test for request building

All 119 unit tests and 29 integration tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 10:54:48 +00:00
Yeachan-Heo 866ae7562c Fix formatting in task_packet.rs for CI 2026-04-16 09:35:18 +00:00
Yeachan-Heo 1d5748f71f US-005: Typed task packet format with TaskScope enum
- Add TaskScope enum with Workspace, Module, SingleFile, Custom variants
- Update TaskPacket struct with scope_path and worktree fields
- Add validation for scope-specific requirements
- Fix tests in task_packet.rs, task_registry.rs, and tools/src/lib.rs
- Export TaskScope from runtime crate

Closes US-005 (Phase 4)
2026-04-16 09:28:42 +00:00
Yeachan-Heo 77fb62a9f1 Implement LaneEvent schema extensions for event ordering, provenance, and dedupe (US-002)
Adds comprehensive metadata support to LaneEvent for the canonical lane event schema:

- EventProvenance enum: live_lane, test, healthcheck, replay, transport
- SessionIdentity: title, workspace, purpose, with placeholder support
- LaneOwnership: owner, workflow_scope, watcher_action (Act/Observe/Ignore)
- LaneEventMetadata: seq, provenance, session_identity, ownership, nudge_id,
  event_fingerprint, timestamp_ms
- LaneEventBuilder: fluent API for constructing events with full metadata
- is_terminal_event(): detects Finished, Failed, Superseded, Closed, Merged
- compute_event_fingerprint(): deterministic fingerprint for terminal events
- dedupe_terminal_events(): suppresses duplicate terminal events by fingerprint

Provides machine-readable event provenance, session identity at creation,
monotonic sequence ordering, nudge deduplication, and terminal event suppression.

Adds 10 regression tests covering:
- Monotonic sequence ordering
- Provenance serialization round-trip
- Session identity completeness
- Ownership and workflow scope binding
- Watcher action variants
- Terminal event detection
- Fingerprint determinism and uniqueness
- Terminal event deduplication
- Builder construction with metadata
- Metadata serialization round-trip

Closes Phase 2 (partial) from ROADMAP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:12:31 +00:00
Yeachan-Heo 21909da0b5 Implement startup-no-evidence evidence bundle + classifier (US-001)
Adds typed worker.startup_no_evidence event with evidence bundle when worker
startup times out. The classifier attempts to down-rank the vague bucket into
specific failure classifications:
- trust_required
- prompt_misdelivery
- prompt_acceptance_timeout
- transport_dead
- worker_crashed
- unknown

Evidence bundle includes:
- Last known worker lifecycle state
- Pane/command being executed
- Prompt-send timestamp
- Prompt-acceptance state
- Trust-prompt detection result
- Transport health summary
- MCP health summary
- Elapsed seconds since worker creation

Includes 6 regression tests covering:
- Evidence bundle serialization
- Transport dead classification
- Trust required classification
- Prompt acceptance timeout
- Worker crashed detection
- Unknown fallback

Closes Phase 1.6 from ROADMAP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 09:05:33 +00:00
Yeachan-Heo ac45bbec15 Make ACP/Zed status obvious before users go source-diving
ROADMAP #21, #22, and #23 were already closed on current main, so the next real repo-local backlog item was the ACP/Zed discoverability gap. This adds a local `claw acp` status surface plus aliases, updates help/docs, and separates the shipped discoverability fix from the still-open daemon/protocol follow-up so editor-first users get a crisp answer immediately.

Constraint: No ACP/Zed daemon or protocol server exists in claw-code yet, so the new surface must be explicit status guidance rather than a fake implementation
Rejected: Add a pretend `acp serve` daemon path | would imply supported protocol behavior that does not exist
Rejected: Docs-only clarification | still leaves `claw --help` unable to answer the editor-launch question directly
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep ROADMAP discoverability fixes separate from future ACP daemon/protocol work so help text and backlog IDs stay unambiguous
Tested: cargo fmt --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; cargo run -q -p rusty-claude-cli -- acp; cargo run -q -p rusty-claude-cli -- --output-format json acp; architect review APPROVED
Not-tested: Real ACP/Zed daemon launch because no protocol-serving surface exists yet
2026-04-16 03:13:50 +00:00
Yeachan-Heo e874bc6a44 Improve malformed hook failures so operators can diagnose broken JSON
Malformed hook stdout that looks like JSON was collapsing into low-signal failure text during hook execution. This change preserves plain-text hook feedback for normal text hooks, but upgrades malformed JSON-like output into an explicit hook_invalid_json diagnostic that includes phase, tool, command, and bounded stdout/stderr previews. It also adds a regression test for malformed-but-nonempty output.

Constraint: User scoped the implementation to rust/crates/runtime/src/hooks.rs and tests only
Constraint: Existing plain-text hook feedback must remain intact for non-JSON hook output
Rejected: Treat every non-JSON stdout payload as invalid JSON | would break legitimate plain-text hook feedback
Confidence: high
Scope-risk: narrow
Directive: Keep malformed-hook diagnostics bounded and preserve the plain-text fallback for hooks that intentionally emit text
Tested: cargo test --manifest-path rust/Cargo.toml -p runtime hooks::tests:: -- --nocapture
Tested: cargo test --manifest-path rust/Cargo.toml -p runtime -- --nocapture
Tested: cargo clippy --manifest-path rust/Cargo.toml -p runtime --all-targets -- -D warnings
Not-tested: Full workspace clippy/test sweep outside runtime crate
2026-04-13 12:44:52 +00:00
Yeachan-Heo 6a957560bd Make recovery handoffs explain why a lane resumed instead of leaking control prose
Recent OMX dogfooding kept surfacing raw `[OMX_TMUX_INJECT]`
messages as lane results, which told operators that tmux reinjection
happened but not why or what lane/state it applied to. The lane-finished
persistence path now recognizes that control prose, stores structured
recovery metadata, and emits a human-meaningful fallback summary instead
of preserving the raw marker as the primary result.

Constraint: Keep the fix in the existing lane-finished metadata surface rather than inventing a new runtime channel
Rejected: Treat all reinjection prose as ordinary quality-floor mush | loses the recovery cause and target lane operators actually need
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Recovery classification is heuristic; extend the parser only when new operator phrasing shows up in real dogfood evidence
Tested: cargo fmt --all --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: cargo test --workspace
Tested: LSP diagnostics on rust/crates/tools/src/lib.rs (0 errors)
Tested: Architect review (APPROVE)
Not-tested: Additional reinjection phrasings beyond the currently observed `[OMX_TMUX_INJECT]` / current-mode-state variants
Related: ROADMAP #68
2026-04-12 15:50:39 +00:00
Yeachan-Heo 42bb6cdba6 Keep local clawhip artifacts from tripping routine repo work
Dogfooding kept reproducing OMX team merge conflicts on
`.clawhip/state/prompt-submit.json`, so the init bootstrap now
teaches repos to ignore `.clawhip/` alongside the existing local
`.claw/` artifacts. This also updates the current repo ignore list
so the fix helps immediately instead of only on future `claw init`
runs.

Constraint: Keep the fix narrow and centered on repo-local ignore hygiene
Rejected: Broader team merge-hygiene changes | unnecessary for the proven local root cause
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more runtime-local artifact directories appear, extend the shared init gitignore list instead of patching repos ad hoc
Tested: cargo fmt --all --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: cargo test --workspace
Tested: Architect review (APPROVE)
Not-tested: Existing clones with already-tracked `.clawhip` files still need manual cleanup
Related: ROADMAP #75
2026-04-12 14:47:40 +00:00
Yeachan-Heo f91d156f85 Keep poisoned test locks from cascading across unrelated regressions
The repo-local backlog was effectively exhausted, so this sweep promoted the
newly observed test-lock poisoning pain point into ROADMAP #74 and fixed it in
place. Test-only env/cwd lock acquisition now recovers poisoned mutexes in the
remaining strict call sites, and each affected surface has a regression that
proves a panic no longer permanently poisons later tests.

Constraint: Keep the fix test-only and avoid widening runtime behavior changes
Rejected: Refactor shared helper signatures across broader call paths | unnecessary churn beyond the remaining strict test sites
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: These guards only recover the mutex; tests that mutate env or cwd still must restore process-global state explicitly
Tested: cargo fmt --all --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: cargo test --workspace
Tested: Architect review (APPROVE)
Not-tested: Additional fault-injection around partially restored env/cwd state after panic
Related: ROADMAP #74
2026-04-12 13:52:41 +00:00
Yeachan-Heo 6b4bb4ac26 Keep finished lanes from leaving stale reminders armed
The next repo-local sweep target was ROADMAP #66: reminder/cron
state could stay enabled after the associated lane had already
finished, which left stale nudges firing into completed work. The
fix teaches successful lane persistence to disable matching enabled
cron entries and record which reminder ids were shut down on the
finished event.

Constraint: Preserve existing cron/task registries and add the shutdown behavior only on the successful lane-finished path
Rejected: Add a separate reminder-cleanup command that operators must remember to run | leaves the completion leak unfixed at the source
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If cron-matching heuristics change later, update `disable_matching_crons`, its regression, and the ROADMAP closeout together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-process cron/reminder persistence beyond the in-memory registry used in this repo
2026-04-12 12:52:27 +00:00
Yeachan-Heo e75d67dfd3 Make successful lanes explain what artifacts they actually produced
The next repo-local sweep target was ROADMAP #64: downstream consumers
still had to infer artifact provenance from prose even though the repo
already emitted structured lane events. The fix extends `lane.finished`
metadata with structured artifact provenance so successful completions
can report roadmap ids, files, diff stat, verification state, and commit
sha without relying on narration alone.

Constraint: Preserve the existing commit-created event and lane-finished metadata paths while adding structured provenance to successful completions
Rejected: Introduce a separate artifact event type first | unnecessary for this focused closeout because `lane.finished` already carries structured data and existing consumers can read it there
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If artifact provenance extraction rules change later, update `extract_artifact_provenance`, its regression payload, and the ROADMAP closeout together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Downstream consumers that ignore `lane.finished.data.artifactProvenance` and still parse only prose output
2026-04-12 11:56:00 +00:00
Yeachan-Heo 2e34949507 Keep latest-session timestamps increasing under tight loops
The next repo-local sweep target was ROADMAP #73: repeated backlog
sweeps exposed that session writes could share the same wall-clock
millisecond, which made semantic recency fragile and forced the
resume-latest regression to sleep between saves. The fix makes session
timestamps monotonic within the process and removes the timing hack
from the test so latest-session selection stays stable under tight
loops.

Constraint: Preserve the existing session file format while changing only the timestamp source semantics
Rejected: Keep the sleep-based test workaround | hides the real ordering hazard instead of fixing timestamp generation
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future session-recency logic must keep `current_time_millis`, ordering tests, and latest-session expectations aligned
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-process monotonicity when multiple binaries write sessions concurrently
2026-04-12 10:51:19 +00:00
Yeachan-Heo 8f53524bd3 Make backlog-scan lanes say what they actually selected
The next repo-local sweep target was ROADMAP #65: backlog-scanning
lanes could stop with prose-only summaries naming roadmap items, but
there was no machine-readable record of which items were chosen,
which were skipped, or whether the lane intended to execute, review,
or no-op. The fix teaches completed lane persistence to extract a
structured selection outcome while preserving the existing quality-
floor and review-verdict behavior for other lanes.

Constraint: Keep selection-outcome extraction on the existing `lane.finished` metadata path instead of inventing a separate event stream
Rejected: Add a dedicated selection event type first | unnecessary for this focused closeout because `lane.finished` already persists structured data downstream can read
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If backlog-scan summary conventions change later, update `extract_selection_outcome`, its regression test, and the ROADMAP closeout wording together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE after roadmap closeout update
Not-tested: Downstream consumers that may still ignore `lane.finished.data.selectionOutcome`
2026-04-12 09:54:37 +00:00
Yeachan-Heo b5e30e2975 Make completed review lanes emit machine-readable verdicts
The next repo-local sweep target was ROADMAP #67: scoped review lanes
could stop with prose-only output, leaving downstream consumers to infer
approval or rejection from later chatter. The fix teaches completed lane
persistence to recognize review-style `APPROVE`/`REJECT`/`BLOCKED`
results, attach structured verdict metadata to `lane.finished`, and keep
ordinary non-review lanes on the existing quality-floor path.

Constraint: Preserve the existing non-review lane summary path while enriching only review-style completions
Rejected: Add a brand-new lane event type just for review results | unnecessary when `lane.finished` already carries structured metadata and downstream consumers can read it there
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If review verdict parsing changes later, update `extract_review_outcome`, the finished-event payload fields, and the review-lane regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External consumers that may still ignore `lane.finished.data.reviewVerdict`
2026-04-12 08:49:40 +00:00
Yeachan-Heo dbc2824a3e Keep latest session selection tied to real session recency
The next repo-local sweep target was ROADMAP #72: the `latest`
managed-session alias could depend on filesystem mtime before the
session's own persisted recency markers, which made the selection
path vulnerable to coarse or misleading file timestamps. The fix
promotes `updated_at_ms` into the summary/order path, keeps CLI
wrappers in sync, and locks the mtime-vs-session-recency case with
regression coverage.

Constraint: Preserve existing managed-session storage layout while changing only the ordering signal
Rejected: Keep sorting by filesystem mtime and just sleep longer in tests | hides the semantic ordering bug instead of fixing it
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future managed-session ordering change must keep runtime and CLI summary structs aligned on the same recency fields
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-filesystem behavior where persisted session JSON cannot be read and fallback ordering uses mtime only
2026-04-12 07:49:32 +00:00
Yeachan-Heo f309ff8642 Stop repo lanes from executing the wrong task payload
The next repo-local sweep target was ROADMAP #71: a claw-code lane
accepted an unrelated KakaoTalk/image-analysis prompt even though the
lane itself was supposed to be repo-scoped work. This extends the
existing prompt-misdelivery guardrail with an optional structured task
receipt so worker boot can reject visible wrong-task context before the
lane continues executing.

Constraint: Keep the fix inside the existing worker_boot / WorkerSendPrompt control surface instead of inventing a new external OMX-only protocol
Rejected: Treat wrong-task receipts as generic shell misdelivery | loses the expected-vs-observed task context needed to debug contaminated lanes
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If task-receipt fields change later, update the WorkerSendPrompt schema, worker payload serialization, and wrong-task regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External orchestrators that have not yet started populating the optional task_receipt field
2026-04-12 07:00:07 +00:00
Yeachan-Heo 3b806702e7 Make the CLI point users at the real install source
The next repo-local backlog item was ROADMAP #70: users could
mistake third-party pages or the deprecated `cargo install
claw-code` path for the official install route. The CLI now
surfaces the source of truth directly in `claw doctor` and
`claw --help`, and the roadmap closeout records the change.

Constraint: Keep the fix inside repo-local Rust CLI surfaces instead of relying on docs alone
Rejected: Close #70 with README-only wording | the bug was user-facing CLI ambiguity, so the warning needed to appear in runtime help/doctor output
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If install guidance changes later, update both the doctor check payload and the help-text warning together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Third-party websites outside this repo that may still present stale install instructions
2026-04-12 04:50:03 +00:00
Yeachan-Heo 26b89e583f Keep completed lanes from ending on mushy stop summaries
The next repo-local sweep target was ROADMAP #69: completed lane
runs could persist vague control text like “commit push everyting,
keep sweeping $ralph”, which made downstream stop summaries
operationally useless. The fix adds a lane-finished quality floor
that preserves strong summaries, rewrites empty/control-only/too-
short-without-context summaries into a contextual fallback, and
records structured metadata explaining when the fallback fired.

Constraint: Keep legitimate concise lane summaries intact while improving only low-signal completions
Rejected: Blanket-rewrite every completed summary into a templated sentence | would erase useful model-authored detail from good lane outputs
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If lane-finished summary heuristics change later, update the structured `qualityFloorApplied/rawSummary/reasons/wordCount` contract and its regression tests together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External OMX consumers that may still ignore the new lane.finished data payload
2026-04-12 03:23:39 +00:00
Yeachan-Heo 4f83a81cf6 Make dump-manifests recoverable outside the inferred build tree
The backlog sweep found that the user-cited #21-#23 items were already
closed, and the next real pain point was `claw dump-manifests` failing
without a direct way to point at the upstream manifest source. This adds
an explicit `--manifests-dir` path, upgrades the failure messages to say
whether the source root or required files are missing, and updates the
ROADMAP closeout to reflect that #45 is now fixed.

Constraint: Preserve existing dump-manifests behavior when no explicit override is supplied
Rejected: Require CLAUDE_CODE_UPSTREAM for every invocation | breaks existing build-tree workflows and is unnecessarily rigid
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep manifest-source override guidance centralized so future error-path edits do not drift
Tested: cargo fmt --all; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Manual invocation against every legacy env-based manifest lookup layout
2026-04-12 02:57:11 +00:00
Yeachan-Heo b825713db3 Retire the stale slash-command backlog item without breaking verification
ROADMAP #39 was stale: current main already hides the unimplemented slash
commands from the help/completion surfaces that triggered the original report,
so the backlog entry should be marked done with current evidence instead of
staying open forever.

While rerunning the user's required Rust verification gates on the exact commit
we planned to push, clippy exposed duplicate and unused imports in the plugin
state-isolation files. Folding those cleanup fixes into the same closeout keeps
the proof honest and restores a green workspace before the backlog retirement
lands.

Constraint: User required fresh cargo fmt, cargo clippy --workspace --all-targets -- -D warnings, and cargo test --workspace before push
Rejected: Push the roadmap-only closeout without fixing the workspace | would violate the required verification gate and leave main red
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Re-run the full Rust workspace gates on the exact commit you intend to push when retiring stale roadmap items
Tested: cargo fmt; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No manual interactive REPL completion/help smoke test beyond the existing automated coverage
2026-04-12 00:59:29 +00:00
YeonGyu-Kim 06d1b8ac87 docs(roadmap): add #68 — internal reinjection/resume path opacity
OMX lanes leaking internal control prose like [OMX_TMUX_INJECT]
instead of operator-meaningful state. Adding requirement for
structured recovery/reinject events with clear cause, preserved
state, and target lane info.

Also fixes merge conflict in test_isolation.rs.

Source: gaebal-gajae dogfood analysis 2026-04-12
2026-04-12 08:53:10 +09:00
Yeachan-Heo 264fdc214e Retire the stale bare-skill dispatch backlog item
ROADMAP #36 remained open even though current main already dispatches bare
skill names in the REPL through skill resolution instead of forwarding them
to the model. This change adds a direct regression test for that behavior
and marks the backlog item done with fresh verification evidence.

Constraint: User required fresh cargo fmt, cargo clippy --workspace --all-targets -- -D warnings, and cargo test --workspace before closeout
Rejected: Leave #36 open because the implementation already existed | keeps the immediate backlog inaccurate and invites duplicate work
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reopen #36 only with a fresh repro showing a listed project skill still falls through to plain prompt handling on current main
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No interactive manual REPL session beyond the new bare-skill unit coverage
2026-04-11 22:50:28 +00:00
Yeachan-Heo 2d5f836988 Retire the stale broken-plugin warning backlog item
ROADMAP #40 was still listed as open even though current main already keeps
valid plugins visible while surfacing broken-plugin load failures. This change
adds a direct command-surface regression test for the warning block and marks
#40 done with fresh verification evidence.

Constraint: User required fresh cargo fmt/clippy/test evidence before closing any backlog item
Rejected: Leave #40 open because the implementation already existed | keeps the immediate backlog inaccurate and invites duplicate work
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reopen #40 only with a fresh repro showing broken installed plugins are hidden or warning-free on current main
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; cargo test -p plugins plugin_registry_report_collects_load_failures_without_dropping_valid_plugins -- --nocapture; cargo test -p plugins installed_plugin_registry_report_collects_load_failures_from_install_root -- --nocapture
Not-tested: No interactive manual /plugins list run beyond automated command-layer rendering coverage
2026-04-11 19:47:21 +00:00
Yeachan-Heo a7b1fef176 Keep the rebased workspace green after the backlog closeout
The ROADMAP #38 closeout was rebased onto a moving main branch. That pulled in
new workspace files whose clippy/rustfmt fixes were required for the exact
verification gate the user asked for. This follow-up records those remaining
cleanups so the pushed branch matches the green tree that was actually tested.

Constraint: The user-required full-workspace fmt/clippy/test sequence had to stay green after rebasing onto newer origin/main
Rejected: Leave the rebase cleanup uncommitted locally | working tree would stay dirty and the pushed branch would not match the verified code
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When rebasing onto a moving main, commit any gate-fixing follow-up so pushed history matches the verified tree
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No additional behavior beyond the already-green verification sweep
2026-04-11 18:52:48 +00:00
Yeachan-Heo 12d955ac26 Close the stale dead-session opacity backlog item with verified probe coverage
ROADMAP #38 stayed open even though the runtime already had a post-compaction
session-health probe. This change adds direct regression tests for that health
probe behavior and marks the roadmap item done. While re-running the required
workspace verification after a remote rebase, a small set of upstream clippy /
compile issues in plugins and test-isolation code also had to be repaired so the
user-requested full fmt/clippy/test sequence could pass on the rebased main.

Constraint: User required cargo fmt, cargo clippy --workspace --all-targets -- -D warnings, and cargo test --workspace before commit/push
Constraint: Remote main advanced during execution, so the change had to be rebased and re-verified before push
Rejected: Leave #38 open because the implementation pre-existed | keeps the immediate backlog inaccurate and invites duplicate work
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Reopen #38 only with a fresh compaction-vs-broken-surface repro on current main
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No live long-running dogfood session replay beyond the new runtime regression tests
2026-04-11 18:52:02 +00:00
Yeachan-Heo 257aeb82dd Retire the stale dead-session opacity backlog item with regression proof
ROADMAP #38 no longer reflects current main. The runtime already runs a
post-compaction session-health probe, but the backlog lacked explicit
regression proof. This change adds focused tests for the two important
behaviors: a broken tool surface aborts a compacted session with a targeted
error, while a freshly compacted empty session does not false-positive as
dead. With that proof in place, the roadmap item can be marked done.

Constraint: User required fresh cargo fmt/clippy/test evidence before closing any backlog item
Rejected: Leave #38 open because the implementation already existed | backlog stays stale and invites duplicate work
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reopen #38 only with a fresh same-turn repro that bypasses the current health-probe gate
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: No live long-running dogfood session replay beyond existing automated coverage
2026-04-11 18:47:37 +00:00
YeonGyu-Kim 16b9febdae feat: ultraclaw droid batch — ROADMAP #41 test isolation + #50 PowerShell permissions
Merged late-arriving droid output from 10 parallel ultraclaw sessions.

ROADMAP #41 — Test isolation for plugin regression checks:
- Add test_isolation.rs module with env_lock() for test environment isolation
- Redirect HOME/XDG_CONFIG_HOME/XDG_DATA_HOME to unique temp dirs per test
- Prevent host ~/.claude/plugins/ from bleeding into test runs
- Auto-cleanup temp directories on drop via RAII pattern
- Tests: 39 plugin tests passing

ROADMAP #50 — PowerShell workspace-aware permissions:
- Add is_safe_powershell_command() for command-level permission analysis
- Add is_path_within_workspace() for workspace boundary validation
- Classify read-only vs write-requiring bash commands (60+ commands)
- Dynamic permission requirements based on command type and target path
- Tests: permission enforcer and workspace boundary tests passing

Additional improvements:
- runtime/src/permission_enforcer.rs: Dynamic permission enforcement layer
  - check_with_required_mode() for dynamically-determined permissions
  - 60+ read-only command patterns (cat, find, grep, cargo, git, jq, yq, etc.)
  - Workspace-path detection for safe commands
- compat-harness/src/lib.rs: Compat harness updates for permission testing
- rusty-claude-cli/src/main.rs: CLI integration for permission modes
- plugins/src/lib.rs: Updated imports for test isolation module

Total: +410 lines across 5 files
Workspace tests: 448+ passed
Droid source: ultraclaw-04-test-isolation, ultraclaw-08-powershell-permissions

Ultraclaw total: 4 ROADMAP items committed (38, 40, 41, 50)
2026-04-12 03:06:24 +09:00
Yeachan-Heo 0082bf1640 Align auth docs with the removed login/logout surface
The ROADMAP #37 code path was correct, but the Rust and usage guides still
advertised `claw login` / `claw logout` and OAuth-login wording after the
command surface had been removed. This follow-up updates both docs to point
users at `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` only and removes the
stale command examples.

Constraint: Prior follow-up review rejected the closeout until user-facing auth docs matched the landed behavior
Rejected: Leave docs stale because runtime behavior was already correct | contradicts shipped CLI and re-opens support confusion
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When auth policy changes, update both rust/README.md and USAGE.md in the same change as the code surface
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: External rendered-doc consumers beyond repository markdown
2026-04-11 17:28:47 +00:00
Yeachan-Heo 124e8661ed Remove the deprecated Claude subscription login path and restore a green Rust workspace
ROADMAP #37 was still open even though several earlier backlog items were
already closed. This change removes the local login/logout surface, stops
startup auth resolution from treating saved OAuth credentials as a supported
path, and updates diagnostics/help to point users at ANTHROPIC_API_KEY or
ANTHROPIC_AUTH_TOKEN only.

While proving the change with the user-requested workspace gates, clippy
surfaced additional pre-existing warning failures across the Rust workspace.
Those were cleaned up in-place so the required `cargo fmt`, `cargo clippy
--workspace --all-targets -- -D warnings`, and `cargo test --workspace`
sequence now passes end to end.

Constraint: User explicitly required full-workspace fmt/clippy/test before commit/push
Constraint: Existing dirty leader worktree had to be stashed before attempted OMX team worktree launch
Rejected: Keep login/logout but hide them from help | left unsupported auth flow and saved OAuth fallback intact
Rejected: Stop after ROADMAP #37 targeted tests | did not satisfy required full-workspace verification gate
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Do not reintroduce saved OAuth as a silent Anthropic startup fallback without an explicit supported auth policy
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: Remote push effects beyond origin/main update
2026-04-11 17:24:44 +00:00
Yeachan-Heo 61c01ff7da Prevent cross-worktree session bleed during managed session resume/load
ROADMAP #41 was still leaving a phantom-completion class open: managed
sessions could be resumed from the wrong workspace, and the CLI/runtime
paths were split between partially isolated storage and older helper
flows. This squashes the verified team work into one deliverable that
routes managed session operations through the per-worktree SessionStore,
rejects workspace mismatches explicitly, extends lane-event taxonomy for
workspace mismatch reporting, and updates the affected CLI regression
fixtures/docs so the new contract is enforced without losing same-
workspace legacy coverage.

Constraint: Keep same-workspace legacy flat sessions readable while blocking cross-worktree misuse
Constraint: No new dependencies; stay within the ROADMAP #41 changed-file scope
Rejected: Leave team auto-checkpoint history as final branch state | noisy/non-lore history for a single roadmap fix
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Preserve workspace_root validation on future resume/load helpers; do not reintroduce path-only fallback without equivalent mismatch checks
Tested: cargo test -p runtime session_control -- --nocapture; cargo test -p rusty-claude-cli resume -- --nocapture; cargo test -p rusty-claude-cli --test cli_flags_and_config_defaults; cargo test -p rusty-claude-cli --test output_format_contract; cargo test -p rusty-claude-cli --test resume_slash_commands; cargo test --workspace --exclude compat-harness; cargo check --workspace --all-targets; git diff --check
Not-tested: cargo clippy --workspace --all-targets -- -D warnings (pre-existing failures in unchanged rust/crates/rusty-claude-cli/build.rs)
Related: ROADMAP #41
2026-04-11 16:08:28 +00:00
YeonGyu-Kim 56218d7d8a feat(runtime): add session health probe for dead-session detection (ROADMAP #38)
Implements ROADMAP #38: Dead-session opacity detection via health canary.

- Add run_session_health_probe() to ConversationRuntime
- Probe runs after compaction to verify tool executor responsiveness
- Add last_health_check_ms field to Session for tracking
- Returns structured error if session appears broken after compaction

Ultraclaw droid session: ultraclaw-02-session-health

Tests: runtime crate 436 passed, integration 12 passed
2026-04-12 00:33:26 +09:00
YeonGyu-Kim 2ef447bd07 feat(commands): surface broken plugin warnings in /plugins list
Implements ROADMAP #40: Show warnings for broken/missing plugin manifests
instead of silently failing.

- Add PluginLoadFailure import
- New render_plugins_report_with_failures() function
- Shows ⚠️ warnings for failed plugin loads with error details
- Updates ROADMAP.md to mark #40 in progress

Ultraclaw droid session: ultraclaw-03-broken-plugins
2026-04-11 22:44:29 +09:00
YeonGyu-Kim 1ecdb1076c fix(api): OPENAI_BASE_URL wins over Anthropic fallback for unknown models
When OPENAI_BASE_URL is set, the user explicitly configured an
OpenAI-compatible endpoint (Ollama, LM Studio, vLLM, etc.). Model names
like 'qwen2.5-coder:7b' or 'llama3:latest' don't match any recognized
prefix, so detect_provider_kind() fell through to Anthropic — asking for
Anthropic credentials even though the user clearly intended a local
provider.

Now: OPENAI_BASE_URL + OPENAI_API_KEY beats Anthropic env-check in the
cascade. OPENAI_BASE_URL alone (no API key — common for Ollama) is a
last-resort fallback before the Anthropic default.

Source: MaxDerVerpeilte in #claw-code (Ollama + qwen2.5-coder:7b);
traced by gaebal-gajae.
2026-04-10 12:37:39 +09:00
YeonGyu-Kim 3a6c9a55c1 fix(tools): support brace expansion in glob_search patterns
The glob crate (v0.3) does not support shell-style brace groups like
{cs,uxml,uss}. Patterns such as 'Assets/**/*.{cs,uxml,uss}' silently
returned 0 results.

Added expand_braces() to pre-expand brace groups before passing patterns
to glob::glob(). Handles nested braces (e.g. src/{a,b}.{rs,toml}).
Results are deduplicated via HashSet.

5 new tests:
- expand_braces_no_braces
- expand_braces_single_group
- expand_braces_nested
- expand_braces_unmatched
- glob_search_with_braces_finds_files

Source: user 'zero' in #claw-code (Windows, Unity project with
Assets/**/*.{cs,uxml,uss} glob). Traced by gaebal-gajae.
2026-04-10 11:22:38 +09:00
YeonGyu-Kim 810036bf09 test(cli): add integration test for model persistence in resumed /status
New test: resumed_status_surfaces_persisted_model
- Creates session with model='claude-sonnet-4-6'
- Resumes with --output-format json /status
- Asserts model round-trips through session metadata

Resume integration tests: 11 → 12.
2026-04-10 10:31:05 +09:00
YeonGyu-Kim 0f34c66acd feat(session): persist model in session metadata — ROADMAP #59
Add 'model: Option<String>' to Session struct. The model used is now
saved in the session_meta JSONL record and surfaced in resumed /status:
- JSON mode: {model: 'claude-sonnet-4-6'} instead of null
- Text mode: shows actual model instead of 'restored-session'

Model is set in build_runtime_with_plugin_state() before the runtime
is constructed, and only when not already set (preserves model through
fork/resume cycles).

Backward compatible: old sessions without a model field load cleanly
with model: None (shown as null in JSON, 'restored-session' in text).

All workspace tests pass.
2026-04-10 10:05:42 +09:00
YeonGyu-Kim b95d330310 fix(startup): fall back to USERPROFILE when HOME is not set (Windows)
On Windows, HOME is often unset. The CLI crashed at startup with
'error: io error: HOME is not set' because three paths only checked
HOME:
- config_home_dir() in tools crate (config/settings loading)
- credentials_home_dir() in runtime crate (OAuth credentials)
- detect_broad_cwd() in CLI (CWD-is-home-dir check)
- skill lookup roots in tools crate

All now fall through to USERPROFILE when HOME is absent. Error message
updated to suggest USERPROFILE or CLAW_CONFIG_HOME on Windows.

Source: MaxDerVerpeilte in #claw-code (Windows user, 2026-04-10).
2026-04-10 08:33:35 +09:00
YeonGyu-Kim 74311cc511 test(cli): add 5 integration tests for resume JSON parity
New integration tests covering recent JSON parity work:
- resumed_version_command_emits_structured_json
- resumed_export_command_emits_structured_json
- resumed_help_command_emits_structured_json
- resumed_no_command_emits_restored_json
- resumed_stub_command_emits_not_implemented_json

Prevents regression on ROADMAP #54 (stub command error), #55 (session
list), #56 (--resume no-command JSON), #57 (session load errors).

Resume integration tests: 6 → 11.
2026-04-10 08:03:17 +09:00
YeonGyu-Kim 6ae8850d45 fix(api): silence dead_code warning and remove duplicated #[test] attr
- Add #[allow(dead_code)] on test-only Delta struct (content field
  used for deserialization but not read in assertion)
- Remove duplicated #[test] attribute on
  assistant_message_without_tool_calls_omits_tool_calls_field

Zero warnings in cargo test --workspace.
2026-04-10 07:33:22 +09:00
YeonGyu-Kim 4f670e5513 fix(cli): emit JSON for --resume with no command in --output-format json mode
claw --output-format json --resume <session> (no command) was printing:
  'Restored session from <path> (N messages).'
to stdout as prose, regardless of output format.

Now emits:
  {"kind":"restored","session_id":"...","path":"...","message_count":N}

159 CLI tests pass.
2026-04-10 06:31:16 +09:00
YeonGyu-Kim 8dcf10361f fix(cli): implement /session list in resume mode — ROADMAP #21 partial
/session list previously returned 'unsupported resumed slash command' in
--output-format json --resume mode. It only reads the sessions directory
so does not need a live runtime session.

Adds a Session{action:"list"} arm in run_resume_command() before the
unsupported catchall. Emits:
  {kind:session_list, sessions:[...ids], active:<current-session-id>}

159 CLI tests pass.
2026-04-10 06:03:29 +09:00
YeonGyu-Kim cf129c8793 fix(cli): emit JSON error when session fails to load in --output-format json mode
'failed to restore session' errors from both the path-resolution step
and the JSONL-load step now check output_format and emit:
  {"type":"error","error":"failed to restore session: <detail>"}
instead of bare eprintln prose.

Covers: session not found, corrupt JSONL, permission errors.
2026-04-10 05:01:56 +09:00
YeonGyu-Kim c0248253ac fix(cli): remove 'stats' from STUB_COMMANDS — it is implemented
/stats was accidentally listed in STUB_COMMANDS (both in the original list
and overlooked in 1e14d59). Since SlashCommand::Stats is fully implemented
with REPL and resume dispatch, it should not be intercepted as unimplemented.

/tokens and /cache alias to Stats and were already working correctly.
/stats now works again in all modes.
2026-04-10 04:32:05 +09:00
YeonGyu-Kim 1e14d59a71 fix(cli): stop circular 'Did you mean /X?' for spec commands with no parse arm
23 spec-registered commands had no parse arm in validate_slash_command_input,
causing the circular error 'Unknown slash command: /X — Did you mean /X?'
when users typed them in --resume mode.

Two fixes:
1. Add the 23 confirmed parse-armless commands to STUB_COMMANDS (excluded
   from REPL completions and help output).
2. In resume dispatch, intercept STUB_COMMANDS before SlashCommand::parse
   and emit a clean '{error: "/X is not yet implemented in this build"}'
   instead of the confusing error from the Err parse path.

Affected: /allowed-tools, /bookmarks, /workspace, /reasoning, /budget,
/rate-limit, /changelog, /diagnostics, /metrics, /tool-details, /focus,
/unfocus, /pin, /unpin, /language, /profile, /max-tokens, /temperature,
/system-prompt, /notifications, /telemetry, /env, /project, plus ~40
additional unreachable spec names.

159 CLI tests pass.
2026-04-10 04:05:41 +09:00
YeonGyu-Kim 11e2353585 fix(cli): JSON parity for /export and /agents in resume mode
/export now emits: {kind:export, file:<path>, message_count:<n>}
/agents now emits: {kind:agents, text:<agents report>}

Previously both returned json:None and fell through to prose output even
in --output-format json --resume mode. 159 CLI tests pass.
2026-04-10 03:32:24 +09:00
YeonGyu-Kim 0845705639 fix(tests): update test assertions for null model in resume /status; drop unused import
Two integration tests expected 'model':'restored-session' in the /status
JSON output but dc4fa55 changed resume mode to emit null for model.
Updated both assertions to assert model is null (correct behavior).

Also remove unused 'estimate_session_tokens' import in compact.rs tests
(surfaced as warning in CI, kept failing CI green noise).

All workspace tests pass.
2026-04-10 03:21:58 +09:00
YeonGyu-Kim 316864227c fix(cli): JSON parity for /help and /diff in resume mode
/help now emits: {kind:help, text:<full help text>}
/diff now emits:
  - no git repo: {kind:diff, result:no_git_repo, detail:...}
  - clean tree:  {kind:diff, result:clean, staged:'', unstaged:''}
  - changes:     {kind:diff, result:changes, staged:..., unstaged:...}

Previously both returned json:None and fell through to prose output even
in --output-format json --resume mode. 159 CLI tests pass.
2026-04-10 03:02:00 +09:00
YeonGyu-Kim c8cac7cae8 fix(cli): doctor config check hides non-existent candidate paths
Before: doctor reported 'loaded 0/5' and listed 5 'Discovered file'
entries for paths that don't exist on disk. This looked like 5 files
failed to load, when in fact they are just standard search locations.

After: only paths that actually exist on disk are shown as 'Discovered
file'. 'loaded N/M' denominator is now the count of present files, not
candidate paths. With no config files present: 'loaded 0/0' +
'Discovered files  <none> (defaults active)'.

159 CLI tests pass.
2026-04-10 02:32:47 +09:00
YeonGyu-Kim dc4fa55d64 fix(cli): /status JSON emits null model and correct session_id in resume mode
Two bugs in --output-format json --resume /status:

1. 'model' field emitted 'restored-session' (a run-mode label) instead of
   the actual model or null. Fixed: status_json_value now takes Option<&str>
   for model; resume path passes None; live REPL path passes Some(model).

2. 'session_id' extracted parent dir name ('sessions') instead of the file
   stem. Session files are session-<id>.jsonl directly under .claw/sessions/,
   not in a subdirectory. Fixed: extract file_stem() instead of
   parent().file_name().

159 CLI tests pass.
2026-04-10 02:03:14 +09:00
YeonGyu-Kim a3d0c9e5e7 fix(api): sanitize orphaned tool messages at request-building layer
Adds sanitize_tool_message_pairing() called from build_chat_completion_request()
after translate_message() runs. Drops any role:"tool" message whose
immediately-preceding non-tool message is role:"assistant" but has no
tool_calls entry matching the tool_call_id.

This is the second layer of the tool-pairing invariant defense:
- 6e301c8: compaction boundary fix (producer layer)
- this commit: request-builder sanitizer (sender layer)

Together these close the 400-error loop for resumed/compacted multi-turn
tool sessions on OpenAI-compatible backends.

Sanitization only fires when preceding message is role:assistant (not
user/system) to avoid dropping valid translation artifacts from mixed
user-message content blocks.

Regression tests: sanitize_drops_orphaned_tool_messages covers valid pair,
orphaned tool (no tool_calls in preceding assistant), mismatched id, and
two tool results both referencing the same assistant turn.

116 api + 159 CLI + 431 runtime tests pass. Fmt clean.
2026-04-10 01:35:00 +09:00
YeonGyu-Kim 78dca71f3f fix(cli): JSON parity for /compact and /clear in resume mode
/compact now emits: {kind:compact, skipped, removed_messages, kept_messages}
/clear now emits:  {kind:clear, previous_session_id, new_session_id, backup, session_file}
/clear (no --confirm) now emits: {kind:error, error:..., hint:...}

Previously both returned json:None and fell through to prose output even
in --output-format json --resume mode. 159 CLI tests pass.
2026-04-10 01:31:21 +09:00
YeonGyu-Kim d95149b347 fix(cli): surface resolved path in dump-manifests error — ROADMAP #45 partial
Before:
  error: failed to extract manifests: No such file or directory (os error 2)

After:
  error: failed to extract manifests: No such file or directory (os error 2)
    looked in: /Users/yeongyu/clawd/claw-code/rust

The workspace_dir is computed from CARGO_MANIFEST_DIR at compile time and
only resolves correctly when running from the build tree. Surfacing the
resolved path lets users understand immediately why it fails outside the
build context.

ROADMAP #45 root cause (build-tree-only path) remains open.
2026-04-10 01:01:53 +09:00
YeonGyu-Kim 47aa1a57ca fix(cli): surface command name in 'not yet implemented' REPL message
Add SlashCommand::slash_name() to the commands crate — returns the
canonical '/name' string for any variant. Used in the REPL's stub
catch-all arm to surface which command was typed instead of printing
the opaque 'Command registered but not yet implemented.'

Before: typing /rewind → 'Command registered but not yet implemented.'
After:  typing /rewind → '/rewind is not yet implemented in this build.'

Also update the compacts_sessions_via_slash_command test assertion to
tolerate the boundary-guard fix from 6e301c8 (removed_message_count
can be 1 or 2 depending on whether the boundary falls on a tool-result
pair). All 159 CLI + 431 runtime + 115 api tests pass.
2026-04-10 00:39:16 +09:00
YeonGyu-Kim 6e301c8bb3 fix(runtime): prevent orphaned tool-result at compaction boundary; /cost JSON
Two fixes:

1. compact.rs: When the compaction boundary falls at the start of a
   tool-result turn, the preceding assistant turn with ToolUse would be
   removed — leaving an orphaned role:tool message with no preceding
   assistant tool_calls. OpenAI-compat backends reject this with 400.

   Fix: after computing raw_keep_from, walk the boundary back until the
   first preserved message is not a ToolResult (or its preceding assistant
   has been included). Regression test added:
   compaction_does_not_split_tool_use_tool_result_pair.

   Source: gaebal-gajae multi-turn tool-call 400 repro 2026-04-09.

2. /cost resume: add JSON output:
   {kind:cost, input_tokens, output_tokens, cache_creation_input_tokens,
    cache_read_input_tokens, total_tokens}

159 CLI + 431 runtime tests pass. Fmt clean.
2026-04-10 00:13:45 +09:00
YeonGyu-Kim 7587f2c1eb fix(cli): JSON parity for /memory and /providers in resume mode
Two gaps closed:

1. /memory (resume): json field was None, emitting prose regardless of
   --output-format json. Now emits:
     {kind:memory, cwd, instruction_files:N, files:[{path,lines,preview}...]}

2. /providers (resume): had a spec entry but no parse arm, producing the
   circular 'Unknown slash command: /providers — Did you mean /providers'.
   Added 'providers' as an alias for 'doctor' in the parse match so
   /providers dispatches to the same structured diagnostic output.

3. /doctor (resume): also wired json_value() so --output-format json
   returns the structured doctor report instead of None.

Continues ROADMAP #26 resumed-command JSON parity track.
159 CLI tests pass, fmt clean.
2026-04-09 23:35:25 +09:00
YeonGyu-Kim ed42f8f298 fix(api): surface provider error in SSE stream frames (companion to ff416ff)
Same fix as ff416ff but for the streaming path. Some backends embed an
error JSON object in an SSE data: frame:

  data: {"error":{"message":"context too long","code":400}}

parse_sse_frame() was attempting to deserialize this as ChatCompletionChunk
and failing with 'missing field' / 'invalid type', hiding the actual
backend error message.

Fix: check for an 'error' key before full chunk deserialization, same as
the non-streaming path in ff416ff. Symmetric pair:
- ff416ff: non-streaming path (response body)
- this:    streaming path (SSE data: frame)

115 api + 159 CLI tests pass. Fmt clean.
2026-04-09 23:03:33 +09:00
YeonGyu-Kim ff416ff3e7 fix(api): surface provider error body before attempting completion parse
When a local/proxy OpenAI-compatible backend returns an error object:
  {"error":{"message":"...","type":"...","code":...}}

claw was trying to deserialize it as a ChatCompletionResponse and
failing with the cryptic 'failed to parse OpenAI response: missing
field id', completely hiding the actual backend error message.

Fix: before full deserialization, check if the parsed JSON has an
'error' key and promote it directly to ApiError::Api so the user
sees the real error (e.g. 'The number of tokens to keep from the
initial prompt is greater than the context length').

Source: devilayu in #claw-code 2026-04-09 — local LM Studio context
limit error was invisible; user saw 'missing field id' instead.
159 CLI + 115 api tests pass. Fmt clean.
2026-04-09 22:33:07 +09:00
YeonGyu-Kim 6ac7d8cd46 fix(api): omit tool_calls field from assistant messages when empty
When serializing a multi-turn conversation for the OpenAI-compatible path,
assistant messages with no tool calls were always emitting 'tool_calls: []'.
Some providers reject requests where a prior assistant turn carries an
explicit empty tool_calls array (400 on subsequent turns after a plain
text assistant response).

Fix: only include 'tool_calls' in the serialized assistant message when
the vec is non-empty. Empty case omits the field entirely.

This is a companion fix to fd7aade (null tool_calls in stream delta).
The two bugs are symmetric: fd7aade handled inbound null -> empty vec;
this handles outbound empty vec -> field omitted.

Two regression tests added:
- assistant_message_without_tool_calls_omits_tool_calls_field
- assistant_message_with_tool_calls_includes_tool_calls_field

115 api tests pass. Fmt clean.

Source: gaebal-gajae repro 2026-04-09 (400 on multi-turn, companion to
null tool_calls stream-delta fix).
2026-04-09 22:06:25 +09:00
YeonGyu-Kim 7ec6860d9a fix(cli): emit JSON for /config in --output-format json --resume mode
/config resumed returned json:None, falling back to prose output even in
--output-format json mode. Adds render_config_json() that produces:

  {
    "kind": "config",
    "cwd": "...",
    "loaded_files": N,
    "merged_keys": N,
    "files": [{"path":"...","source":"user|project|local","loaded":true|false}, ...]
  }

Wires it into the SlashCommand::Config resume arm alongside the existing
prose render. Continues the resumed-command JSON parity track (ROADMAP #26).
159 CLI tests pass, fmt clean.
2026-04-09 22:03:11 +09:00
YeonGyu-Kim 0e12d15daf fix(cli): add --allow-broad-cwd; require confirmation or flag in broad-CWD mode 2026-04-09 21:55:22 +09:00
YeonGyu-Kim fd7aade5b5 fix(api): tolerate null tool_calls in OpenAI-compat stream delta chunks
Some OpenAI-compatible providers emit 'tool_calls: null' in streaming
delta chunks instead of omitting the field or using an empty array:

  "delta": {"content":"","function_call":null,"tool_calls":null}

serde's #[serde(default)] only handles absent keys — an explicit null
value still fails deserialization with:
  'invalid type: null, expected a sequence'

Fix: replace #[serde(default)] with a custom deserializer helper
deserialize_null_as_empty_vec() that maps null -> Vec::default(),
keeping the existing absent-key default behaviour.

Regression test added: delta_with_null_tool_calls_deserializes_as_empty_vec
uses the exact provider response shape from gaebal-gajae's repro (2026-04-09).

112 api lib tests pass. Fmt clean.

Companion to gaebal-gajae's local 448cf2c — independently reproduced
and landed on main.
2026-04-09 21:39:52 +09:00
YeonGyu-Kim 60ec2aed9b fix(cli): wire /tokens and /cache as aliases for /stats; implement /stats
Dogfood found that /tokens and /cache had spec entries (resume_supported:
true) but no parse arms in the command parser, resulting in:
  'Unknown slash command: /tokens — Did you mean /tokens'
(the suggestion engine found the spec entry but parsing always failed)

Fix three things:
1. Add 'tokens' | 'cache' as aliases for 'stats' in the parse match so
   the commands actually resolve to SlashCommand::Stats
2. Implement SlashCommand::Stats in the REPL dispatch — previously fell
   through to 'Command registered but not yet implemented'. Now shows
   cumulative token usage for the session.
3. Implement SlashCommand::Stats in run_resume_command — previously
   returned 'unsupported resumed slash command'. Now emits:
   text:  Cost / Input tokens / Output tokens / Cache create / Cache read
   json:  {kind:stats, input_tokens, output_tokens, cache_*, total_tokens}

159 CLI tests pass, fmt clean.
2026-04-09 21:34:36 +09:00
YeonGyu-Kim 5f6f453b8d fix(cli): warn when launched from home dir or filesystem root
Users launching claw from their home directory (or /) have no project
boundary — the agent can read/search the entire machine, often far beyond
the intended scope. kapcomunica in #claw-code reported exactly this:
'it searched my entire computer.'

Add warn_if_broad_cwd() called at prompt and REPL startup:
- checks if CWD == $HOME or CWD has no parent (fs root)
- prints a clear warning to stderr:
    Warning: claw is running from a very broad directory (/home/user).
    The agent can read and search everything under this path.
    Consider running from inside your project: cd /path/to/project && claw

Warning fires on both claw (REPL) and claw prompt '...' paths.
Does not fire from project subdirectories. Uses std::env::var_os("HOME"),
no extra deps.

159 CLI tests pass, fmt clean.
2026-04-09 21:26:51 +09:00
YeonGyu-Kim da4242198f fix(cli): emit JSON error for unsupported resumed slash commands in JSON mode
When claw --output-format json --resume <session> /commit (or /plugins, etc.)
encountered an 'unsupported resumed slash command' error, it called
eprintln!() and exit(2) directly, bypassing both the main() JSON error
handler and the output_format check.

Fix: in both the slash-command parse-error path and the run_resume_command
Err path, check output_format and emit a structured JSON error:

  {"type":"error","error":"unsupported resumed slash command","command":"/commit"}

Text mode unchanged (still exits 2 with prose to stderr).
Addresses the resumed-command parity gap (gaebal-gajae ROADMAP #26 track).
159 CLI tests pass, fmt clean.
2026-04-09 21:04:50 +09:00
YeonGyu-Kim 84b77ece4d fix(cli): pipe stdin to prompt when no args given (suppress REPL on pipe)
When stdin is not a terminal (pipe or redirect) and no prompt is given on
the command line, claw was starting the interactive REPL and printing the
startup banner, then consuming the pipe without sending anything to the API.

Fix: in parse_args, when rest.is_empty() and stdin is not a terminal, read
stdin synchronously and dispatch as CliAction::Prompt instead of Repl.
Empty pipe still falls through to Repl (interactive launch with no input).

Before: echo 'hello' | claw  -> startup banner + REPL start
After:  echo 'hello' | claw  -> dispatches as one-shot prompt

159 CLI tests pass, fmt clean.
2026-04-09 20:36:14 +09:00
YeonGyu-Kim aef85f8af5 fix(cli): /diff shows clear error when not in a git repo
Previously claw --resume <session> /diff would produce:
  'git diff --cached failed: error: unknown option `cached\''
when the CWD was not inside a git project, because git falls back to
--no-index mode which does not support --cached.

Two fixes:
1. render_diff_report_for() checks 'git rev-parse --is-inside-work-tree'
   before running git diff, and returns a human-readable message if not
   in a git repo:
     'Diff\n  Result  no git repository\n  Detail  <cwd> is not inside a git project'
2. resume /diff now uses std::env::current_dir() instead of the session
   file's parent directory as the CWD for the diff (session parent dir
   is the .claw/sessions/<id>/ directory, never a git repo).

159 CLI tests pass, fmt clean.
2026-04-09 20:04:21 +09:00
YeonGyu-Kim 3ed27d5cba fix(cli): emit JSON for /history in --output-format json --resume mode
Previously claw --output-format json --resume <session> /history emitted
prose text regardless of the output format flag. Now emits structured JSON:

  {"kind":"history","total":N,"showing":M,"entries":[{"timestamp_ms":...,"text":"..."},...]}

Mirrors the parity pattern established in ROADMAP #26 for other resume commands.
159 CLI tests pass, fmt clean.
2026-04-09 19:33:50 +09:00
YeonGyu-Kim e1ed30a038 fix(cli): surface session_id in /status JSON output
When running claw --output-format json --resume <session> /status, the
JSON output had 'session' (full file path) but no 'session_id' field,
making it impossible for scripts to extract the loaded session ID.

Now extracts the session-id directory component from the session path
(e.g. .claw/sessions/<session-id>/session-xxx.jsonl → session-id)
and includes it as 'session_id' in the JSON status envelope.

159 CLI tests pass, fmt clean.
2026-04-09 19:06:36 +09:00
YeonGyu-Kim 54269da157 fix(cli): claw state exits 1 when no worker state file exists
Previously 'claw state' printed an error message but exited 0, making it
impossible for scripts/CI to detect the absence of state without parsing
prose. Now propagates Err() to main() which exits 1 and formats the error
correctly for both text and --output-format json modes.

Text: 'error: no worker state file found at ... — run a worker first'
JSON: {"type":"error","error":"no worker state file found at ..."}
2026-04-09 18:34:41 +09:00
YeonGyu-Kim f741a42507 test(cli): add regression coverage for reasoning-effort validation and stub-command filtering
3 new tests in mod tests:
- rejects_invalid_reasoning_effort_value: confirms 'turbo' etc rejected at parse time
- accepts_valid_reasoning_effort_values: confirms low/medium/high accepted and threaded
- stub_commands_absent_from_repl_completions: asserts STUB_COMMANDS are not in completions

156 -> 159 CLI tests pass.
2026-04-09 18:06:32 +09:00
YeonGyu-Kim 1a8f73da01 fix(cli): emit JSON error on --output-format json — ROADMAP #42
When claw --output-format json hits an error, the error was previously
printed as plain prose to stderr, making it invisible to downstream tooling
that parses JSON output. Now:

  {"type":"error","error":"api returned 401 ..."}

Detection: scan argv at process exit for --output-format json or
--output-format=json. Non-JSON error path unchanged. 156 CLI tests pass.
2026-04-09 16:33:20 +09:00
YeonGyu-Kim 8d0308eecb fix(cli): dispatch bare skill names to skill invoker in REPL — ROADMAP #36
Users were typing skill names (e.g. 'caveman', 'find-skills') directly in
the REPL and getting LLM responses instead of skill invocation. Only
'/skills <name>' triggered dispatch; bare names fell through to run_turn.

Fix: after slash-command parse returns None (bare text), check if the first
token looks like a skill name (alphanumeric/dash/underscore, no slash).
If resolve_skill_invocation() confirms the skill exists, dispatch the full
input as a skill prompt. Unknown words fall through unchanged.

156 CLI tests pass, fmt clean.
2026-04-09 16:01:18 +09:00
YeonGyu-Kim 4d10caebc6 fix(cli): validate --reasoning-effort accepts only low|medium|high
Previously any string was accepted and silently forwarded to the API,
which would fail at the provider with an unhelpful error. Now invalid
values produce a clear error at parse time:

  invalid value for --reasoning-effort: 'xyz'; must be low, medium, or high

156 CLI tests pass, fmt clean.
2026-04-09 15:03:36 +09:00
YeonGyu-Kim 414526c1bd fix(cli): exclude stub slash commands from help output — ROADMAP #39
The --help slash-command section was listing ~35 unimplemented commands
alongside working ones. Combined with the completions fix (c55c510), the
discovery surface now consistently shows only implemented commands.

Changes:
- commands crate: add render_slash_command_help_filtered(exclude: &[&str])
- move STUB_COMMANDS to module-level const in main.rs (reused by both
  completions and help rendering)
- replace render_slash_command_help() with filtered variant at all
  help-rendering call sites

156 CLI tests pass, fmt clean.
2026-04-09 14:36:00 +09:00
YeonGyu-Kim 2a2e205414 fix(cli): intercept --help for prompt/login/logout/version subcommands before API dispatch
'claw prompt --help' was triggering an API call instead of showing help
because --help was parsed as part of the prompt args. Now '--help' after
known pass-through subcommands (prompt, login, logout, version, state,
init, export, commit, pr, issue) sets wants_help=true and shows the
top-level help page.

Subcommands that consume their own args (agents, mcp, plugins, skills)
and local help-topic subcommands (status, sandbox, doctor) are excluded
from this interception so their existing --help handling is preserved.

156 CLI tests pass, fmt clean.
2026-04-09 14:06:26 +09:00
YeonGyu-Kim c55c510883 fix(cli): exclude stub slash commands from REPL completions — ROADMAP #39
Commands registered in the spec list but not yet implemented in this build
were appearing in REPL tab-completions, making the discovery surface
over-promise what actually works. Users (mezz2301) reported 'many features
are not supported' after discovering these through completions.

Add STUB_COMMANDS exclusion list in slash_command_completion_candidates_with_sessions.
Excluded: login logout vim upgrade stats share feedback files fast exit
summary desktop brief advisor stickers insights thinkback release-notes
security-review keybindings privacy-settings plan review tasks theme
voice usage rename copy hooks context color effort branch rewind ide
tag output-style add-dir

These commands still parse and run (with the 'not yet implemented' message
for users who type them directly), but they no longer surface as
tab-completion candidates.
2026-04-09 13:36:12 +09:00
YeonGyu-Kim ca8950c26b feat(cli): wire --reasoning-effort flag end-to-end — closes ROADMAP #34
Parse --reasoning-effort <low|medium|high> in parse_args, thread through
CliAction::Prompt and CliAction::Repl, LiveCli::set_reasoning_effort(),
AnthropicRuntimeClient.reasoning_effort field, and MessageRequest.reasoning_effort.

Changes:
- parse_args: new --reasoning-effort / --reasoning-effort=VAL flag arms
- AnthropicRuntimeClient: new reasoning_effort field + set_reasoning_effort() method
- LiveCli: new set_reasoning_effort() that reaches through BuiltRuntime -> ConversationRuntime -> api_client_mut()
- runtime::ConversationRuntime: new pub api_client_mut() accessor
- MessageRequest construction: reasoning_effort: self.reasoning_effort.clone()
- run_repl(): accepts and applies reasoning_effort parameter
- parse_direct_slash_cli_action(): propagates reasoning_effort

All 156 CLI tests pass, all api tests pass, cargo fmt clean.
2026-04-09 11:08:00 +09:00
YeonGyu-Kim c1b1ce465e feat(cli): add reasoning_effort field to CliAction::Prompt/Repl variants — ROADMAP #34 struct groundwork
Adds reasoning_effort: Option<String> to CliAction::Prompt and
CliAction::Repl enum variants. All constructor and pattern sites updated.
All test literals updated with reasoning_effort: None.

156 cli tests pass, fmt clean. The --reasoning-effort flag parse and
propagation to AnthropicRuntimeClient remains as follow-up work.
2026-04-09 10:34:28 +09:00
YeonGyu-Kim eb044f0a02 fix(api): emit max_completion_tokens for gpt-5* on OpenAI-compat path — closes ROADMAP #35
gpt-5.x models reject requests with max_tokens and require max_completion_tokens.
Detect wire model starting with 'gpt-5' and switch the JSON key accordingly.
Older models (gpt-4o etc.) continue to receive max_tokens unchanged.

Two regression tests added:
- gpt5_uses_max_completion_tokens_not_max_tokens
- non_gpt5_uses_max_tokens

140 api tests pass, cargo fmt clean.
2026-04-09 09:33:45 +09:00
Jobdori e4c3871882 feat(api): add reasoning_effort field to MessageRequest and OpenAI-compat path
Users of OpenAI-compatible reasoning models (o4-mini, o3, deepseek-r1,
etc.) had no way to control reasoning effort — the field was missing from
MessageRequest and never emitted in the request body.

Changes:
- Add `reasoning_effort: Option<String>` to `MessageRequest` in types.rs
  - Annotated with skip_serializing_if = "Option::is_none" for clean JSON
  - Accepted values: "low", "medium", "high" (passed through verbatim)
- In `build_chat_completion_request`, emit `"reasoning_effort"` when set
- Two unit tests:
  - `reasoning_effort_is_included_when_set`: o4-mini + "high" → field present
  - `reasoning_effort_omitted_when_not_set`: gpt-4o, no field → absent

Existing callers use `..Default::default()` and are unaffected.
One struct-literal test that listed all fields explicitly updated with
`reasoning_effort: None`.

The CLI flag to expose this to users is a follow-up (ROADMAP #34 partial).
This commit lands the foundational API-layer plumbing needed for that.

Partial ROADMAP #34.
2026-04-09 04:02:59 +09:00
Jobdori beb09df4b8 style(api): cargo fmt fix on normalize_object_schema test assertions 2026-04-09 03:43:59 +09:00
Jobdori e7e0fd2dbf fix(api): strict object schema for OpenAI /responses endpoint
OpenAI /responses validates tool function schemas strictly:
- object types must have "properties" (at minimum {})
- "additionalProperties": false is required

/chat/completions is lenient and accepts schemas without these fields,
but /responses rejects them with "object schema missing properties" /
"invalid_function_parameters".

Add normalize_object_schema() which recursively walks the JSON Schema
tree and fills in missing "properties"/{} and "additionalProperties":false
on every object-type node. Existing values are not overwritten.

Call it in openai_tool_definition() before building the request payload
so both /chat/completions and /responses receive strict-validator-safe
schemas.

Add unit tests covering:
- bare object schema gets both fields injected
- nested object schemas are normalised recursively
- existing additionalProperties is not overwritten

Fixes the live repro where gpt-5.4 via OpenAI compat accepted connection
and routing but rejected every tool call with schema validation errors.

Closes ROADMAP #33.
2026-04-09 03:03:43 +09:00
Jobdori 252536be74 fix(tools): serialize web_search env-var tests with env_lock to prevent race
web_search_extracts_and_filters_results set CLAWD_WEB_SEARCH_BASE_URL
without holding env_lock(), while the sibling test
web_search_handles_generic_links_and_invalid_base_url always held it.
Under parallel test execution the two tests interleave set_var/remove_var
calls, pointing the search client at the wrong mock server port and
causing assertion failures.

Fix: add env_lock() guard at the top of web_search_extracts_and_filters_results,
matching the serialization pattern already used by every other env-mutating
test in this module.

Root cause of CI flake on run 24127551802.
Identified and fixed during dogfood session.
2026-04-08 18:34:06 +09:00
Jobdori 275b58546d feat(cli): populate Git SHA, target triple, and build date at compile time via build.rs
Add rust/crates/rusty-claude-cli/build.rs that:
- Captures git rev-parse --short HEAD at build time → GIT_SHA env
- Reads Cargo's TARGET env var → TARGET env
- Derives BUILD_DATE from SOURCE_DATE_EPOCH / BUILD_DATE env or
  the current date via `date +%Y-%m-%d` fallback
- Registers rerun-if-changed on .git/HEAD and .git/refs so the SHA
  stays fresh across commits

Update main.rs DEFAULT_DATE to pick up BUILD_DATE from option_env!()
instead of the hardcoded 2026-03-31 static string.

Before: `claw --version` always showed Git SHA: unknown, Target: unknown,
Build date: 2026-03-31 in local builds.
After:  e.g. Git SHA: 7f53d82, Target: aarch64-apple-darwin, Build date: 2026-04-08

Generated by droid (Kimi K2.5 Turbo) via acpx (wrote build.rs),
cleaned up by Jobdori (added BUILD_DATE step, updated main.rs const).

Co-Authored-By: Droid <noreply@factory.ai>
2026-04-08 18:11:46 +09:00
Jobdori adcea6bceb fix(api): route DashScope models to dashscope config, not openai
ProviderClient::from_model_with_anthropic_auth was dispatching every
ProviderKind::OpenAi match to OpenAiCompatConfig::openai(), which reads
OPENAI_API_KEY and points at api.openai.com. But DashScope models
(qwen-plus, qwen/qwen3-coder, etc.) also return ProviderKind::OpenAi
from detect_provider_kind because DashScope speaks the OpenAI wire
format. The metadata layer correctly identifies them as needing
DASHSCOPE_API_KEY and the DashScope compatible-mode endpoint, but that
metadata was being ignored at dispatch time.

Result: users running `claw --model qwen-plus` with DASHSCOPE_API_KEY
set would get a "missing OPENAI_API_KEY" error instead of being routed
to DashScope.

Fix: consult providers::metadata_for_model in the OpenAi dispatch arm
and pick dashscope() vs openai() based on metadata.auth_env.

Adds a regression test asserting ProviderClient::from_model("qwen-plus")
builds with the DashScope base URL. Exposes a pub base_url() accessor
on OpenAiCompatClient so the test can verify the routing.

Authored by droid (Kimi K2.5 Turbo) via acpx, cleaned up by Jobdori
(removed unsafe blocks unnecessary under edition 2021, imported
ProviderClient from super, adopted EnvVarGuard pattern from
providers/mod.rs tests).

Co-Authored-By: Droid <noreply@factory.ai>
2026-04-08 18:04:37 +09:00
YeonGyu-Kim 8dc65805c1 fix(cli): dispatch to correct provider backend based on model prefix — closes ROADMAP #29
The CLI entry point (build_runtime_with_plugin_state in main.rs)
was hardcoded to always instantiate AnthropicRuntimeClient with an
AnthropicClient, regardless of what detect_provider_kind(model)
returned. This meant `--model openai/gpt-4` with OPENAI_API_KEY
set and no ANTHROPIC_* vars still failed with "missing Anthropic
credentials" because the CLI never dispatched to the OpenAI-compat
backend that already exists in the api crate.

Root cause: AnthropicRuntimeClient.client was typed as
AnthropicClient (concrete) rather than ApiProviderClient (enum).
The api crate already had a ProviderClient enum with Anthropic /
Xai / OpenAi variants that dispatches correctly via
detect_provider_kind, plus a unified MessageStream enum that wraps
both anthropic::MessageStream and openai_compat::MessageStream
with the same next_event() -> StreamEvent interface. The CLI just
wasn't using it.

Changes (1 file, +59 -7):
- Import api::ProviderClient as ApiProviderClient
- Change AnthropicRuntimeClient.client from AnthropicClient to
  ApiProviderClient
- In AnthropicRuntimeClient::new(), dispatch based on
  detect_provider_kind(&resolved_model):
  * Anthropic: build AnthropicClient directly with
    resolve_cli_auth_source() + api::read_base_url() +
    PromptCache (preserves ANTHROPIC_BASE_URL override for mock
    test harness and the session-scoped prompt cache)
  * xAI / OpenAi: delegate to
    ApiProviderClient::from_model_with_anthropic_auth which routes
    to OpenAiCompatClient::from_env with the matching config
    (reads OPENAI_API_KEY/XAI_API_KEY/DASHSCOPE_API_KEY and their
    BASE_URL overrides internally)
- Change push_prompt_cache_record to take &ApiProviderClient
  (ProviderClient::take_last_prompt_cache_record returns None for
  non-Anthropic variants, so the helper is a no-op on
  OpenAI-compat providers without extra branching)

What this unlocks for users:
  claw --model openai/gpt-4.1-mini prompt 'hello'  # OpenAI
  claw --model grok-3 prompt 'hello'                # xAI
  claw --model qwen-plus prompt 'hello'             # DashScope
  OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
    claw --model openai/anthropic/claude-sonnet-4 prompt 'hello'  # OpenRouter

All previously broken, now routed correctly by prefix.

Verification:
- cargo build --release -p rusty-claude-cli: clean
- cargo test --release -p rusty-claude-cli: 182 tests, 0 failures
  (including compact_output tests that exercise the Anthropic mock)
- cargo fmt --all: clean
- cargo clippy --workspace: warnings-only (pre-existing)
- cargo test --release --workspace: all crates green except one
  pre-existing race in runtime::config::tests (passes in isolation)

Source: live users nicma (1491342350960562277) and Jengro
(1491345009021030533) in #claw-code on 2026-04-08.
2026-04-08 17:29:55 +09:00
YeonGyu-Kim ff1df4c7ac fix(api): auth-provider error copy — prefix-routing hints + sk-ant-* bearer detection — closes ROADMAP #28
Two live users in #claw-code on 2026-04-08 hit adjacent auth confusion:
varleg set OPENAI_API_KEY for OpenRouter but prefix routing didn't
activate without openai/ model prefix, and stanley078852 put sk-ant-*
in ANTHROPIC_AUTH_TOKEN (Bearer path) instead of ANTHROPIC_API_KEY
(x-api-key path) and got 401 Invalid bearer token.

Changes:
1. ApiError::MissingCredentials gained optional hint field (error.rs)
2. anthropic_missing_credentials_hint() sniffs OPENAI/XAI/DASHSCOPE
   env vars and suggests prefix routing when present (providers/mod.rs)
3. All 4 Anthropic auth paths wire the hint helper (anthropic.rs)
4. 401 + sk-ant-* in bearer token detected and hint appended
5. 'Which env var goes where' section added to USAGE.md

Tests: unit tests for all three improvements (no HTTP calls needed).
Workspace: all tests green, fmt clean, clippy warnings-only.

Source: live users varleg + stanley078852 in #claw-code 2026-04-08.

Co-authored-by: gaebal-gajae <gaebal-gajae@layofflabs.com>
2026-04-08 16:29:03 +09:00
YeonGyu-Kim 172a2ad50a fix(plugins): chmod +x generated hook scripts + tolerate BrokenPipe in stdin write — closes ROADMAP #25 hotfix lane
Two bugs found in the plugin hook test harness that together caused
Linux CI to fail on 'hooks::tests::collects_and_runs_hooks_from_enabled_plugins'
with 'Broken pipe (os error 32)'. Three reproductions plus one rerun
failure on main today: 24120271422, 24120538408, 24121392171.

Root cause 1 (chmod, defense-in-depth): write_hook_plugin writes
pre.sh/post.sh/failure.sh via fs::write without setting the execute
bit. While the runtime hook runner invokes hooks via 'sh <path>' (so
the script file does not strictly need +x), missing exec perms can
cause subtle fork/exec races on Linux in edge cases.

Root cause 2 (the actual CI failure): output_with_stdin unconditionally
propagated write_all errors on the child's stdin pipe, including
BrokenPipe. A hook script that runs to completion in microseconds
(e.g. a one-line printf) can exit and close its stdin before the parent
finishes writing the JSON payload. Linux pipes surface this as EPIPE
immediately; macOS pipes happen to buffer the small payload, so the
race only shows on ubuntu CI runners. The parent's write_all raised
BrokenPipe, which output_with_stdin returned as Err, which run_command
classified as 'failed to start', making the test assertion fail.

Fix: (a) make_executable helper sets mode 0o755 via PermissionsExt on
each generated hook script, with a #[cfg(unix)] gate and a no-op
#[cfg(not(unix))] branch. (b) output_with_stdin now matches the
write_all result and swallows BrokenPipe specifically (the child still
ran; wait_with_output still captures stdout/stderr/exit code), while
propagating all other write errors. (c) New regression guard
generated_hook_scripts_are_executable under #[cfg(unix)] asserts each
generated .sh file has at least one execute bit set.

Surgical scope per gaebal-gajae's direction: chmod + pipe tolerance +
regression guard only. The deeper plugin-test sealing pass for ROADMAP
#25 + #27 stays in gaebal-gajae's OMX lane.

Verification:
- cargo test --release -p plugins → 35 passing, 0 failing
- cargo fmt -p plugins → clean
- cargo clippy -p plugins -- -D warnings → clean

Co-authored-by: gaebal-gajae <gaebal-gajae@layofflabs.com>
2026-04-08 15:48:20 +09:00
YeonGyu-Kim 5851f2dee8 fix(cli): 6 cascading test regressions hidden behind client_integration gate
- compact flag: was parsed then discarded (`compact: _`) instead of
  passed to `run_turn_with_output` — hardcoded `false` meant --compact
  never took effect
- piped stdin vs permission prompter: `read_piped_stdin()` consumed all
  stdin before `CliPermissionPrompter::decide()` could read interactive
  approval answers; now only consumes stdin as prompt context when
  permission mode is `DangerFullAccess` (fully unattended)
- session resolver: `resolve_managed_session_path` and
  `list_managed_sessions` now fall back to the pre-isolation flat
  `.claw/sessions/` layout so legacy sessions remain accessible
- help assertion: match on stable prefix after `/session delete` was
  added in batch 5
- prompt shorthand: fix copy-paste that changed expected prompt from
  "help me debug" to "$help overview"
- mock parity harness: filter captured requests to `/v1/messages` path
  only, excluding count_tokens preflight calls added by `be561bf`

All 6 failures were pre-existing but masked because `client_integration`
always failed first (fixed in 8c6dfe5).

Workspace: 810+ tests passing, 0 failing.
2026-04-08 14:54:10 +09:00
YeonGyu-Kim 8c6dfe57e6 fix(api): restore local preflight guard ahead of count_tokens round-trip
CI has been red since be561bf ('Use Anthropic count tokens for preflight')
because that commit replaced the free-function preflight_message_request
(byte-estimate guard) with an instance method that silently returns Ok on
any count_tokens failure:

    let counted_input_tokens = match self.count_tokens(request).await {
        Ok(count) => count,
        Err(_) => return Ok(()),  // <-- silent bypass
    };

Two consequences:

1. client_integration::send_message_blocks_oversized_requests_before_the_http_call
   has been FAILING on every CI run since be561bf. The mock server in that
   test only has one HTTP response queued (a bare '{}' to satisfy the main
   request), so the count_tokens POST parses into an empty body that fails
   to deserialize into CountTokensResponse -> Err -> silent bypass -> the
   oversized 600k-char request proceeds to the mock instead of being
   rejected with ContextWindowExceeded as the test expects.

2. In production, any third-party Anthropic-compatible gateway that doesn't
   implement /v1/messages/count_tokens (OpenRouter, Cloudflare AI Gateway,
   etc.) would silently disable the preflight guard entirely, letting
   oversized requests hit the upstream only to fail there with a provider-
   side context-window error. This is exactly the 'opaque failure surface'
   ROADMAP #22 asked us to avoid.

Fix: call the free-function super::preflight_message_request(request)? as
the first step in the instance method, before any network round-trip. This
guarantees the byte-estimate guard always fires, whether or not the remote
count_tokens endpoint is reachable. The count_tokens refinement still runs
afterward when available for more precise token counting, but it is now
strictly additive — it can only catch more cases, never silently skip the
guard.

Test results:
- cargo test -p api --lib: 89 passed, 0 failed
- cargo test --release -p api (all test binaries): 118 passed, 0 failed
- cargo test --release -p api --test client_integration \
    send_message_blocks_oversized_requests_before_the_http_call: passes
- cargo fmt --check: clean

This unblocks the Rust CI workflow which has been red on every push since
be561bf landed.
2026-04-08 14:34:38 +09:00
YeonGyu-Kim 3ac97e635e feat(api): add qwen/ prefix routing for Alibaba DashScope provider
Users in Discord #clawcode-get-help (web3g) asked for Qwen 3.6 Plus via
native Alibaba DashScope API instead of OpenRouter, which has stricter
rate limits. This commit adds first-class routing for qwen/ and bare
qwen- prefixed model names.

Changes:
- DEFAULT_DASHSCOPE_BASE_URL constant: /compatible-mode/v1 endpoint
- OpenAiCompatConfig::dashscope() factory mirroring openai()/xai()
- DASHSCOPE_ENV_VARS + credential_env_vars() wiring
- metadata_for_model: qwen/ and qwen- prefix routes to DashScope with
  auth_env=DASHSCOPE_API_KEY, reuses ProviderKind::OpenAi because
  DashScope speaks the OpenAI REST shape
- is_reasoning_model: detect qwen-qwq, qwq-*, and *-thinking variants
  so tuning params (temperature, top_p, etc.) get stripped before
  payload assembly (same pattern as o1/o3/grok-3-mini)

Tests added:
- providers::tests::qwen_prefix_routes_to_dashscope_not_anthropic
- openai_compat::tests::qwen_reasoning_variants_are_detected

89 api lib tests passing, 0 failing. cargo fmt --check: clean.

Closes the user-reported gap: 'use Qwen 3.6 Plus via Alibaba API
directly, not OpenRouter' without needing OPENAI_BASE_URL override
or unsetting ANTHROPIC_API_KEY.
2026-04-08 14:06:26 +09:00
YeonGyu-Kim 006f7d7ee6 fix(test): add env_lock to plugin lifecycle test — closes ROADMAP #24
build_runtime_runs_plugin_lifecycle_init_and_shutdown was the only test
that set/removed ANTHROPIC_API_KEY without holding the env_lock mutex.
Under parallel workspace execution, other tests racing on the same env
var could wipe the key mid-construction, causing a flaky credential error.

Root cause: process-wide env vars are shared mutable state. All other
tests that touch ANTHROPIC_API_KEY already use env_lock(). This test
was the only holdout.

Fix: add let _guard = env_lock(); at the top of the test.
2026-04-08 12:46:04 +09:00
YeonGyu-Kim 82baaf3f22 fix(ci): update integration test MessageRequest initializers for new tuning fields
openai_compat_integration.rs and client_integration.rs had MessageRequest
constructions without the new tuning param fields (temperature, top_p,
frequency_penalty, presence_penalty, stop) added in c667d47.

Added ..Default::default() to all 4 sites. cargo fmt applied.

This was the root cause of CI red on main (E0063 compile error in
integration tests, not caught by --lib tests).
2026-04-08 11:43:51 +09:00
YeonGyu-Kim c7b3296ef6 style: cargo fmt — fix CI formatting failures
Pre-existing formatting issues in anthropic.rs surfaced by CI cargo fmt check.
No functional changes.
2026-04-08 11:21:13 +09:00
YeonGyu-Kim 000aed4188 fix(commands): fix brittle /session help assertion after delete subcommand addition
renders_help_from_shared_specs hardcoded the exact /session usage string,
which broke when /session delete was added in batch 5. Relaxed to check
for /session presence instead of exact subcommand list.

Pre-existing test brittleness (not caused by recent commits).

687 workspace lib tests passing, 0 failing.
2026-04-08 09:33:51 +09:00
YeonGyu-Kim 523ce7474a fix(api): sanitize Anthropic body — strip frequency/presence_penalty, convert stop→stop_sequences
MessageRequest now carries OpenAI-compatible tuning params (c667d47), but
the Anthropic API does not support frequency_penalty or presence_penalty,
and uses 'stop_sequences' instead of 'stop'. Without this fix, setting
these params with a Claude model would produce 400 errors.

Changes to strip_unsupported_beta_body_fields:
- Remove frequency_penalty and presence_penalty from Anthropic request body
- Convert stop → stop_sequences (only when non-empty)
- temperature and top_p are preserved (Anthropic supports both)

Tests added:
- strip_removes_openai_only_fields_and_converts_stop
- strip_does_not_add_empty_stop_sequences

87 api lib tests passing, 0 failing.
cargo check --workspace: clean.
2026-04-08 09:05:10 +09:00
YeonGyu-Kim b513d6e462 fix(api): sanitize tuning params for reasoning models (o1/o3/grok-3-mini)
Reasoning models reject temperature, top_p, frequency_penalty, and
presence_penalty with 400 errors. Instead of letting these flow through
and returning cryptic provider errors, strip them silently at the
request-builder boundary.

is_reasoning_model() classifies: o1*, o3*, o4*, grok-3-mini.
stop sequences are preserved (safe for all providers).

Tests added:
- reasoning_model_strips_tuning_params: o1-mini strips all 4 params, keeps stop
- grok_3_mini_is_reasoning_model: classification coverage for grok-3-mini, o1,
  o3-mini, and negative cases (gpt-4o, grok-3, claude)

85 api lib tests passing, 0 failing.
2026-04-08 07:32:47 +09:00
YeonGyu-Kim c667d47c70 feat(api): add tuning params (temperature, top_p, penalties, stop) to MessageRequest
MessageRequest was missing standard OpenAI-compatible generation tuning
parameters. Callers had no way to control temperature, top_p,
frequency_penalty, presence_penalty, or stop sequences.

Changes:
- Added 5 optional fields to MessageRequest (all Option, None by default)
- Wired into build_chat_completion_request: only included in payload when set
- All existing construction sites updated with ..Default::default()
- MessageRequest now derives Default for ergonomic partial construction

Tests added:
- tuning_params_included_in_payload_when_set: all 5 params flow into JSON
- tuning_params_omitted_from_payload_when_none: absent params stay absent

83 api lib tests passing, 0 failing.
cargo check --workspace: 0 warnings.
2026-04-08 07:07:33 +09:00
YeonGyu-Kim 0530c509a3 fix(api): route openai/ and gpt- model prefixes to OpenAi provider
metadata_for_model returned None for unknown models like openai/gpt-4.1-mini,
causing detect_provider_kind to fall through to auth-sniffer order. If
ANTHROPIC_API_KEY was set, the model was silently misrouted to Anthropic
and the user got a confusing 'missing Anthropic credentials' error.

Fix: add explicit prefix checks for 'openai/' and 'gpt-' in
metadata_for_model so the model name wins over env-var presence.

Regression test added: openai_namespaced_model_routes_to_openai_not_anthropic
- 'openai/gpt-4.1-mini' routes to OpenAi
- 'gpt-4o' routes to OpenAi

Reported and reproduced by gaebal-gajae against current main.
81 api lib tests passing, 0 failing.
2026-04-08 05:33:47 +09:00
YeonGyu-Kim eff0765167 test(tools): fill WorkerGet and error-path coverage gaps
WorkerGet had zero test coverage. WorkerAwaitReady and WorkerSendPrompt
had only one happy-path test each with no error paths.

Added 4 tests:
- worker_get_returns_worker_state: WorkerGet fetches correct worker_id/status/cwd
- worker_get_on_unknown_id_returns_error: unknown id -> 'worker not found'
- worker_await_ready_on_spawning_worker_returns_not_ready: ready=false on spawning worker
- worker_send_prompt_on_non_ready_worker_returns_error: sending prompt before ready fails

94 tool tests passing, 0 failing.
2026-04-08 05:03:34 +09:00
YeonGyu-Kim aee5263aef test(tools): prove recovery loop against .claw/worker-state.json directly
recovery_loop_state_file_reflects_transitions reads the actual state
file after each transition to verify the canonical observability surface
reflects the full stall->resolve->ready progression:

  spawning (state file exists, seconds_since_update present)
  -> trust_required (is_ready=false, trust_gate_cleared=false in file)
  -> spawning (trust_gate_cleared=true after WorkerResolveTrust)
  -> ready_for_prompt (is_ready=true after ready screen observe)

This is the end-to-end proof gaebal-gajae called for: clawhip polling
.claw/worker-state.json will see truthful state at every step of the
recovery loop, including the seconds_since_update staleness signal.

90 tool tests passing, 0 failing.
2026-04-08 04:38:38 +09:00
YeonGyu-Kim 9461522af5 feat(tools): expose WorkerObserveCompletion tool; add provider-degraded classification tests
observe_completion() on WorkerRegistry classifies finish_reason into
Finished vs Failed (finish='unknown' + 0 tokens = provider degraded).
This logic existed in the runtime but had no tool wrapper — clawhip
could not call it. Added WorkerObserveCompletion as a first-class tool.

Tool schema:
  { worker_id, finish_reason: string, tokens_output: integer }

Handler: run_worker_observe_completion -> global_worker_registry().observe_completion()

Tests added:
- worker_observe_completion_success_finish_sets_finished_status
  finish=end_turn + tokens=512 -> status=finished
- worker_observe_completion_degraded_provider_sets_failed_status
  finish=unknown + tokens=0 -> status=failed, last_error populated

89 tool tests passing, 0 failing.
2026-04-08 04:35:05 +09:00
YeonGyu-Kim c08f060ca1 test(tools): end-to-end stall-detect and recovery loop coverage
Proves the clawhip restart/recover flow that gaebal-gajae flagged:

1. stall_detect_and_resolve_trust_end_to_end
   - Worker created without trusted_roots -> trust_auto_resolve=false
   - WorkerObserve with trust-prompt text -> status=trust_required, gate cleared=false
   - WorkerResolveTrust -> status=spawning, trust_gate_cleared=true
   - WorkerObserve with ready text -> status=ready_for_prompt
   Full resolve path verified end-to-end.

2. stall_detect_and_restart_recovery_end_to_end
   - Worker stalls at trust_required
   - WorkerRestart resets to spawning, trust_gate_cleared=false
   Documents the restart-then-re-acquire-trust flow.

Note: seconds_since_update is in .claw/worker-state.json (state file),
not in the Worker tool output struct. Staleness detection via state file
is covered by emit_state_file_writes_worker_status_on_transition in
worker_boot.rs tests.

87 tool tests passing, 0 failing.
2026-04-08 04:09:55 +09:00
YeonGyu-Kim cae11413dd fix(dead-code): remove stale constants + dead function; add workspace_sessions_dir tests
Three dead-code warnings eliminated from cargo check:

1. KNOWN_TOP_LEVEL_KEYS / DEPRECATED_TOP_LEVEL_KEYS in config.rs
   - Superseded by config_validate::TOP_LEVEL_FIELDS and DEPRECATED_FIELDS
   - Were out of date (missing aliases, providerFallbacks, trustedRoots)
   - Removed

2. read_git_recent_commits in prompt.rs
   - Private function, never called anywhere in the codebase
   - Removed

3. workspace_sessions_dir in session.rs
   - Public API scaffolded for session isolation (#41)
   - Genuinely useful for external consumers (clawhip enumerating sessions)
   - Added 2 tests: deterministic path for same CWD, different path for different CWDs
   - Annotated with #[allow(dead_code)] since it is external-facing API

cargo check --workspace: 0 warnings remaining
430 runtime tests passing, 0 failing
2026-04-08 04:04:54 +09:00
YeonGyu-Kim aa37dc6936 test(tools): add coverage for WorkerRestart and WorkerTerminate tools
WorkerRestart and WorkerTerminate had zero test coverage despite being
public tools in the tool spec. Also confirms one design decision worth
noting: restart resets trust_gate_cleared=false, so an allowlisted
worker that gets restarted must re-acquire trust via the normal observe
flow (by design — trust is per-session, not per-CWD).

Tests added:
- worker_terminate_sets_finished_status
- worker_restart_resets_to_spawning (verifies status=spawning,
  prompt_in_flight=false, trust_gate_cleared=false)
- worker_terminate_on_unknown_id_returns_error
- worker_restart_on_unknown_id_returns_error

85 tool tests passing, 0 failing.
2026-04-08 03:33:05 +09:00
YeonGyu-Kim 6ddfa78b7c feat(tools): wire config.trusted_roots into WorkerCreate tool
Previously WorkerCreate passed trusted_roots directly to spawn_worker
with no config-level default. Any batch script omitting the field
stalled all workers at TrustRequired with no recovery path.

Now run_worker_create loads RuntimeConfig from the worker CWD before
spawning and merges config.trusted_roots() with per-call overrides.
Per-call overrides still take effect; config provides the default.

Add test: worker_create_merges_config_trusted_roots_without_per_call_override
- writes .claw/settings.json with trustedRoots=[<os-temp-dir>] in a temp worktree
- calls WorkerCreate with no trusted_roots field
- asserts trust_auto_resolve=true (config roots matched the CWD)

81 tool tests passing, 0 failing.
2026-04-08 03:08:13 +09:00
YeonGyu-Kim bcdc52d72c feat(config): add trustedRoots to RuntimeConfig
Closes the startup-friction gap filed in ROADMAP (dd97c49).

WorkerCreate required trusted_roots on every call with no config-level
default. Any batch script that omitted the field stalled all workers at
TrustRequired with no auto-recovery path.

Changes:
- RuntimeFeatureConfig: add trusted_roots: Vec<String> field
- ConfigLoader: wire parse_optional_trusted_roots() for 'trustedRoots' key
- RuntimeConfig / RuntimeFeatureConfig: expose trusted_roots() accessor
- config_validate: add trustedRoots to TOP_LEVEL_FIELDS schema (StringArray)
- Tests: parses_trusted_roots_from_settings + trusted_roots_default_is_empty_when_unset

Callers can now set trusted_roots in .claw/settings.json:
  { "trustedRoots": ["/tmp/worktrees"] }

WorkerRegistry::spawn_worker() callers should merge config.trusted_roots()
with any per-call overrides (wiring left for follow-up).
2026-04-08 02:35:19 +09:00
YeonGyu-Kim 5dfb1d7c2b fix(config_validate): add missing aliases/providerFallbacks to schema; fix deprecated-key bypass
Two real schema gaps found via dogfood (cargo test -p runtime):

1. aliases and providerFallbacks not in TOP_LEVEL_FIELDS
   - Both are valid config keys parsed by config.rs
   - Validator was rejecting them as unknown keys
   - 2 tests failing: parses_user_defined_model_aliases,
     parses_provider_fallbacks_chain

2. Deprecated keys were being flagged as unknown before the deprecated
   check ran (unknown-key check runs first in validate_object_keys)
   - Added early-exit for deprecated keys in unknown-key loop
   - Keeps deprecated→warning behavior for permissionMode/enabledPlugins
     which still appear in valid legacy configs

3. Config integration tests had assertions on format strings that never
   matched the actual validator output (path:3: vs path: ... (line N))
   - Updated assertions to check for path + line + field name as
     independent substrings instead of a format that was never produced

426 tests passing, 0 failing.
2026-04-08 01:45:08 +09:00
YeonGyu-Kim fcb5d0c16a fix(worker_boot): add seconds_since_update to state snapshot
Clawhip needs to distinguish a stalled trust_required worker from one
that just transitioned. Without a pre-computed staleness field it has
to compute epoch delta itself from updated_at.

seconds_since_update = now - updated_at at snapshot write time.
Clawhip threshold: > 60s in trust_required = stalled; act.
2026-04-08 01:03:00 +09:00
YeonGyu-Kim 314f0c99fd feat(worker_boot): emit .claw/worker-state.json on every status transition
WorkerStatus is fully tracked in worker_boot.rs but was invisible to
external observers (clawhip, orchestrators) because opencode serve's
HTTP server is upstream and not ours to extend.

Solution: atomic file-based observability.

- emit_state_file() writes .claw/worker-state.json on every push_event()
  call (tmp write + rename for atomicity)
- Snapshot includes: worker_id, status, is_ready, trust_gate_cleared,
  prompt_in_flight, last_event, updated_at
- Add 'claw state' CLI subcommand to read and print the file
- Add regression test: emit_state_file_writes_worker_status_on_transition
  verifies spawning→ready_for_prompt transition is reflected on disk

This closes the /state dogfood gap without requiring any upstream
opencode changes. Clawhip can now distinguish a truly stalled worker
(status: trust_required or running with no recent updated_at) from a
quiet-but-progressing one.
2026-04-08 00:37:44 +09:00
YeonGyu-Kim 092d8b6e21 fix(tests): add missing test imports for session/prompt history features
Add missing imports to test module:
- PromptHistoryEntry, render_prompt_history_report, parse_history_count
- parse_export_args, render_session_markdown
- summarize_tool_payload_for_markdown, short_tool_id

Fixes test compilation errors introduced by new session and export
features from batch 5/6 work.
2026-04-07 16:20:33 +09:00
YeonGyu-Kim b3ccd92d24 feat: b6-pdf-extract-v2 follow-up work — batch 6 2026-04-07 16:11:51 +09:00
YeonGyu-Kim 0f2f02af2d feat: b6-http-proxy-v2 follow-up work — batch 6 2026-04-07 16:11:51 +09:00
YeonGyu-Kim e51566c745 feat: b6-bridge-directory follow-up work — batch 6 2026-04-07 16:11:50 +09:00
YeonGyu-Kim 20f3a5932a fix(cli): wire sessions_dir() through SessionStore::from_cwd() (#41)
The CLI was using a flat cwd/.claw/sessions/ path without workspace
fingerprinting, while SessionStore::from_cwd() adds a hash subdirectory.
This mismatch meant the isolation machinery existed but wasn't actually
used by the main session management codepath.

Now sessions_dir() delegates to SessionStore::from_cwd(), ensuring all
session operations use workspace-fingerprinted directories.
2026-04-07 16:03:44 +09:00
YeonGyu-Kim 28e6cc0965 feat(runtime): activate per-worktree session isolation (#41)
Remove #[cfg(test)] gate from session_control module — SessionStore
is now available at runtime, not just in tests. Export SessionStore and
add workspace_sessions_dir() helper that creates fingerprinted session
directories per workspace root.

This is the #41 kill shot: parallel opencode serve instances will use
separate session namespaces based on workspace fingerprint instead of
sharing a global ~/.local/share/opencode/ store.

The CLI already uses cwd/.claw/sessions/ (sessions_dir()), and now
SessionStore::from_cwd() adds workspace hash isolation on top.
2026-04-07 16:00:57 +09:00
YeonGyu-Kim f03b8dce17 feat: bridge directory metadata + stale-base preflight check
- Add CWD to SSE session events (kills Directory: unknown)
- Add stale-base preflight: verify HEAD matches expected base commit
- Warn on divergence before session starts
2026-04-07 15:55:38 +09:00
YeonGyu-Kim ecdca49552 feat: plugin-level max_output_tokens override via session_control 2026-04-07 15:55:38 +09:00
YeonGyu-Kim 5c276c8e14 feat: b6-pdf-extract-v2 — batch 6 2026-04-07 15:52:30 +09:00
YeonGyu-Kim 1f968b359f feat: b6-openai-models — batch 6 2026-04-07 15:52:30 +09:00
YeonGyu-Kim 18d3c1918b feat: b6-http-proxy-v2 — batch 6 2026-04-07 15:52:30 +09:00
YeonGyu-Kim 82f2e8e92b feat: doctor-cmd implementation 2026-04-07 15:28:43 +09:00
YeonGyu-Kim 8f4651a096 fix: resolve git_context field references after cherry-pick merge 2026-04-07 15:20:20 +09:00
YeonGyu-Kim dab16c230a feat: b5-session-export — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim a46711779c feat: b5-markdown-fence — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim ef0b870890 feat: b5-git-aware — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim 4557a81d2f feat: b5-doctor-cmd — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim 86c3667836 feat: b5-context-compress — batch 5 wave 2 2026-04-07 15:19:45 +09:00
YeonGyu-Kim 260bac321f feat: b5-config-validate — batch 5 wave 2 2026-04-07 15:19:44 +09:00
YeonGyu-Kim 133ed4581e feat(config): add config file validation with clear error messages
Parse TOML/JSON config on startup, emit errors for unknown keys, wrong
types, deprecated fields with exact line and field name.
2026-04-07 15:10:08 +09:00
YeonGyu-Kim 8663751650 fix: resolve merge conflicts from batch 5 cherry-picks (compact field, run_turn_with_output arity) 2026-04-07 14:53:46 +09:00
YeonGyu-Kim 90f2461f75 feat: b5-tool-timeout — batch 5 upstream parity 2026-04-07 14:51:32 +09:00
YeonGyu-Kim 0d8fd51a6c feat: b5-stdin-pipe — batch 5 upstream parity 2026-04-07 14:51:28 +09:00
YeonGyu-Kim 5bcbc86a2b feat: b5-slash-help — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim d509f16b5a feat: b5-skip-perms-flag — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim d089d1a9cc feat: b5-retry-backoff — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim 6a6c5acb02 feat: b5-reasoning-guard — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim 9105e0c656 feat: b5-openrouter-fix — batch 5 upstream parity 2026-04-07 14:51:26 +09:00
YeonGyu-Kim b8f76442e2 feat: b5-multi-provider — batch 5 upstream parity 2026-04-07 14:51:26 +09:00
YeonGyu-Kim b216f9ce05 feat: b5-max-token-plugin — batch 5 upstream parity 2026-04-07 14:51:26 +09:00
YeonGyu-Kim 4be4b46bd9 feat: b5-git-aware — batch 5 upstream parity 2026-04-07 14:51:26 +09:00
YeonGyu-Kim 506ff55e53 feat: b5-doctor-cmd — batch 5 upstream parity 2026-04-07 14:51:26 +09:00
YeonGyu-Kim 65f4c3ad82 feat: b5-cost-tracker — batch 5 upstream parity 2026-04-07 14:51:25 +09:00
YeonGyu-Kim 700534de41 feat: b5-context-compress — batch 5 upstream parity 2026-04-07 14:51:25 +09:00
YeonGyu-Kim 861edfc1dc fix(runtime): document phantom completion root cause + add workspace_root to session (#41)
Global session store causes cross-worktree confusion in parallel lanes.
Added workspace_root field to session metadata and documented root cause
in ROADMAP.md.
2026-04-07 14:22:41 +09:00
YeonGyu-Kim f982f24926 fix(api): Windows env hint + .env file loading fallback
When API key missing on Windows, hint about setx. Load .env from CWD
as fallback with simple key=value parser.
2026-04-07 14:22:41 +09:00
YeonGyu-Kim 8d866073c5 feat(cli): show active model and provider in startup banner
Prints 'Connected: <model> via <provider>' before REPL prompt.
2026-04-07 14:22:26 +09:00
YeonGyu-Kim 4251c85855 fix(cli): add section headers to OMC output for agent type grouping
voloshko: flat wall of text. Now groups output with section separators
by agent type (Explore, Implementation, Verification).
2026-04-07 14:22:06 +09:00
YeonGyu-Kim 2a642871ad fix(api): enrich JSON parse errors with response body, provider, and model
Raw 'json_error: no field X' now includes truncated response body,
provider name, and model ID for debugging context.
2026-04-07 14:22:05 +09:00
YeonGyu-Kim cd83c0ff68 fix(cli): detect OPENAI_BASE_URL during claw login and emit clear error
OAuth 401 was confusing. Now detects custom base URL and suggests
ANTHROPIC_API_KEY instead of OAuth login.
2026-04-07 14:22:05 +09:00
YeonGyu-Kim ce360e0ff3 fix(api): strip anthropic beta fields from non-beta requests
mikejiang: 'betas: Extra inputs are not permitted' 400 error.
Only include beta headers when request targets beta endpoint.
2026-04-07 14:22:05 +09:00
YeonGyu-Kim ce22d8fb4f fix(api): add serde(default) to all usage/token parse paths in SSE stream
Sterling reported 'json_error: no field input/input_tokens' still firing
despite existing serde(default) in types.rs. Root cause: SSE streaming
path had a separate deserialization site that didn't use the same defaults.

- Add serde(default) to sse.rs UsageEvent deserialization
- Add serde(default) to types.rs Usage struct fields (input_tokens, output_tokens)
- Add regression test with empty-usage JSON response in streaming context
2026-04-07 13:44:22 +09:00
Yeachan-Heo be561bfdeb Use Anthropic count tokens for preflight 2026-04-06 09:38:21 +00:00
Yeachan-Heo c1883d0f66 Clarify heuristic context window estimates 2026-04-06 09:26:08 +00:00
Yeachan-Heo 1fc5a1c457 Fix slash skill invoke normalization 2026-04-06 09:24:06 +00:00
Yeachan-Heo 549ad7c3af Restore compatibility skill lookup fallback 2026-04-06 09:11:27 +00:00
Yeachan-Heo ecadc5554a fix(auth): harden OAuth fallback and collapse thinking output 2026-04-06 09:02:21 +00:00
Yeachan-Heo 8ff9c1b15a Preserve recovery guidance for retried context-window failures
The CLI already reframes direct preflight and provider oversized-request
errors, but retry-wrapped provider failures still fell back to the generic
retry-exhausted surface because the user-visible formatter keyed off the
safe failure class. Route formatting through nested context-window
detection so wrapped provider failures keep the same compact/reduce-scope
guidance.

Constraint: Keep the fix UX-scoped without widening broader failure classification behavior
Rejected: Reorder safe_failure_class for all RetriesExhausted errors | broader semantic change than needed for this issue
Confidence: high
Scope-risk: narrow
Directive: Keep context-window rendering keyed to nested error inspection so provider wrappers do not lose recovery guidance
Tested: cargo fmt --check; cargo test -p rusty-claude-cli context_window; cargo test -p api oversized
Not-tested: Full workspace test suite
2026-04-06 09:02:21 +00:00
Yeachan-Heo 6bd464bbe7 Make repeated provider crashes self-identifying after retry exhaustion
Generic fatal wrapper handling already preserved safe classes and trace ids for single provider failures, but repeated retry exhaustion still surfaced as provider_internal. Classify generic wrapped RetriesExhausted failures as provider_retry_exhausted so Jobdori-style repeat failures stay distinguishable from one-off provider crashes, and keep the display logic clippy-clean.

Constraint: Keep the change minimal and preserve existing user-visible error wording outside retry-exhaustion classification
Rejected: Broadly rework all provider error taxonomy | unnecessary for the targeted opaque-wrapper regression
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep retry exhaustion distinct from single-shot provider_internal wrappers when the nested error is the same generic fatal wrapper
Tested: cargo test -p api detects_generic_fatal_wrapper_and_classifies_it_as_provider_internal
Tested: cargo test -p api retries_exhausted_preserves_nested_request_id_and_failure_class
Tested: cargo test -p rusty-claude-cli opaque_provider_wrapper_surfaces_failure_class_session_and_trace
Tested: cargo test -p rusty-claude-cli retry_exhaustion_uses_retry_failure_class_for_generic_provider_wrapper
Tested: cargo test --workspace
Tested: cargo fmt --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
Not-tested: Live OpenClaw/Anthropic service failure telemetry outside the local test harness
2026-04-06 09:01:38 +00:00
Yeachan-Heo 421ead7dba Remove orphaned skill lookup helpers 2026-04-06 07:56:50 +00:00
Yeachan-Heo f9cb42fb44 Resolve claw-code main merge conflicts 2026-04-06 07:16:57 +00:00
Yeachan-Heo 01b263c838 Let /skills invocations reach the prompt skill path
The CLI still treated every /skills payload other than list/install/help as local usage text, so skills that appeared in /skills could not actually be invoked. This restores prompt dispatch for /skills <skill> [args], keeps list/install on the local path, and shares skill resolution with the Skill tool so project-local and legacy /commands entries resolve consistently.

Constraint: --resume local slash execution still only supports local commands without provider turns
Rejected: Implement full resumed prompt-turn execution for /skills | larger behavior change outside this bugfix
Rejected: Keep separate skill lookups in tools and commands | drift already caused listing/invocation mismatches
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep /skills discovery, CLI prompt dispatch, and Tool Skill resolution on the same registry semantics
Tested: cargo fmt --all; cargo clippy -p commands -p tools -p rusty-claude-cli --all-targets -- -D warnings; cargo test --workspace -- --nocapture
Not-tested: Live provider-backed /skills invocation against external skill packs in an interactive REPL
2026-04-06 06:43:31 +00:00
Yeachan-Heo b930895736 Turn oversized-context failures into recovery guidance
Dogfood showed oversized requests still surfacing as raw hard errors, even when claw could tell the user exactly how to recover. This keeps context-window failures classified, recognizes the same failure when it comes back from a provider response, and renders recovery steps that point operators at the existing compaction and fresh-session paths instead of a provider-style dump.

Constraint: Keep the failure class explicit so automation and operators can still distinguish context-window exhaustion from generic provider failures
Constraint: Reuse existing /compact and session-reset UX instead of inventing a new recovery workflow
Rejected: Auto-run compaction on failure | mutates session state on an error path the user may want to inspect first
Rejected: Only prettify local preflight failures | provider-returned context-window errors would still leak raw failure text
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep provider-side context-window detection aligned with real oversized-request messages before broadening the marker list
Tested: cargo fmt --all --check
Tested: cargo test -p api
Tested: cargo test -p rusty-claude-cli
Tested: cargo clippy -p api -p rusty-claude-cli --all-targets -- -D warnings
Not-tested: cargo test --workspace
2026-04-06 06:43:31 +00:00
Yeachan-Heo fe4da2aa65 Keep resumed JSON command surfaces machine-readable
Resumed slash dispatch was still dropping back to prose for several JSON-capable local commands, which forced automation to special-case direct CLI invocations versus --resume flows. This routes resumed local-command handlers through the same structured JSON payloads used by direct status, sandbox, inventory, version, and init commands, and records the inventory parity audit result in the roadmap.

Constraint: Text-mode resumed output must stay unchanged for existing shell users
Rejected: Teach callers to scrape resumed text output | brittle and defeats the JSON contract
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a direct local command has a JSON renderer, keep resumed slash dispatch on the same serializer instead of adding one-off format branches
Tested: cargo fmt --check; cargo test --workspace; cargo clippy --workspace --all-targets -- -D warnings
Not-tested: Live provider-backed REPL resume flows outside the local test harness
2026-04-06 02:00:33 +00:00
Yeachan-Heo 53d6909b9b Emit structured doctor JSON diagnostics 2026-04-06 01:42:59 +00:00
Yeachan-Heo ceaf9cbc23 Preserve structured JSON parity for `claw agents`
`claw agents --output-format json` was still wrapping the text report,
which meant automation could not distinguish empty inventories from
populated agent definitions. Add a dedicated structured handler in the
commands crate, wire the CLI to it, and extend the contracts to cover
both empty and populated agent listings.

Constraint: Keep text-mode `claw agents` output unchanged while aligning JSON behavior with existing structured inventory handlers
Rejected: Parse the text report into JSON in the CLI layer | brittle duplication and no reusable structured handler
Confidence: high
Scope-risk: narrow
Directive: Keep inventory subcommands on dedicated structured handlers instead of serializing human-readable reports
Tested: cargo test -p commands renders_agents_reports_as_json; cargo test -p rusty-claude-cli --test output_format_contract; cargo test --workspace; cargo fmt --check; cargo clippy --workspace --all-targets -- -D warnings
Not-tested: Manual invocation of `claw agents --output-format json` outside automated tests
2026-04-06 01:42:59 +00:00
Yeachan-Heo ee92f131b0 Stabilize plugin lifecycle temp dirs across parallel tests 2026-04-06 01:18:56 +00:00
Yeachan-Heo 22e3f8c5e3 Fix retry exhaustion failure classification 2026-04-06 01:10:36 +00:00
Yeachan-Heo d94d792a48 Expose actionable ids for opaque provider failures
Issue #22 was triggered by generic upstream fatal wrappers that only surfaced 'Something went wrong', which left repeated Jobdori-style failures opaque in the CLI. Capture provider request ids on error responses, classify the known generic wrapper as provider_internal, and prefix the user-visible runtime error with the failure class plus session/trace identifiers so operators can correlate the failure quickly.

Constraint: Keep the fix small and user-safe without redesigning the broader runtime error taxonomy
Constraint: Preserve existing non-generic error text unless the wrapper is the known opaque fatal surface
Rejected: Broadly rewriting every runtime error into classified envelopes | unnecessary scope expansion for issue #22
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more opaque wrappers appear, extend the marker list and classification helper rather than reintroducing raw wrapper text alone
Tested: cargo test -p api detects_generic_fatal_wrapper_and_classifies_it_as_provider_internal -- --nocapture; cargo test -p api retries_exhausted_preserves_nested_request_id_and_failure_class -- --nocapture; cargo test -p rusty-claude-cli opaque_provider_wrapper_surfaces_failure_class_session_and_trace -- --nocapture; cargo test -p rusty-claude-cli retry_exhaustion_preserves_internal_failure_class_for_generic_provider_wrapper -- --nocapture; cargo test --workspace
Not-tested: Live upstream reproduction of the Jobdori failure against a real provider session
2026-04-06 00:30:28 +00:00
Yeachan-Heo 2bab4080d6 Keep resumed /status JSON aligned with live status output
The resumed slash-command path built a reduced status JSON payload by hand, so it drifted from the fresh status schema and dropped metadata like model, permission mode, workspace counters, and sandbox details. Reuse a shared status JSON builder for both code paths and tighten the resume regression tests to lock parity in place.

Constraint: Resume mode does not carry an active runtime model, so restored sessions continue to report the existing restored-session sentinel value
Rejected: Copy the fresh status JSON shape into the resume path again | would recreate the same schema drift risk
Confidence: high
Scope-risk: narrow
Directive: Keep resumed and fresh /status JSON on the same helper so future schema changes stay in parity
Tested: Reproduced failure in temporary HEAD worktree with strengthened resumed_status_command_emits_structured_json_when_requested
Tested: cargo test -p rusty-claude-cli resumed_status_command_emits_structured_json_when_requested --test resume_slash_commands -- --exact --nocapture
Tested: cargo test -p rusty-claude-cli doctor_and_resume_status_emit_json_when_requested --test output_format_contract -- --exact --nocapture
Tested: cargo test --workspace
Tested: cargo fmt --check
Tested: cargo clippy --workspace --all-targets -- -D warnings
2026-04-05 23:30:39 +00:00
Yeachan-Heo 831d8a2d4b Classify quiet agent states before they look stale
Persist derived machine states for agent manifests so downstream monitors can distinguish working, blocked, degraded, and finished-cleanable lanes without inferring everything from prose. This also records commit provenance in terminal-state manifests and marks the new session-state classification roadmap item as done.

Constraint: Keep the change scoped to manifest persistence and tests without introducing a new monitoring service layer
Rejected: Leave state classification as downstream text scraping only | repeated dogfood runs showed quiet/finished lanes being misreported as stale
Confidence: medium
Scope-risk: narrow
Directive: Reuse derived_state + commit provenance from manifests before adding any new stale-session heuristics elsewhere
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test -q -p tools
Tested: cd rust && cargo clippy -p tools --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still fails on unrelated pre-existing runtime lint debt
2026-04-05 18:47:23 +00:00
Yeachan-Heo d926d62e54 Restore a fully green workspace verification baseline
The remaining blocker after the roadmap backlog landed was workspace-wide clippy debt in runtime and adjacent test modules. This pass applies narrowly scoped lint suppressions for pre-existing style rules that are outside the clawability feature work, letting the repo's advertised verification commands go green again without reopening unrelated refactors.

Constraint: Keep behavior unchanged while making  pass on the current codebase
Rejected: Broad refactors of runtime subsystems to satisfy every lint structurally | too much risk for a follow-up verification-hardening pass
Confidence: medium
Scope-risk: narrow
Directive: Replace these targeted allows with real structural cleanup when those runtime modules are next touched for behavior changes
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test --workspace
Tested: cd rust && cargo clippy --workspace --all-targets -- -D warnings
Not-tested: No behavioral changes intended beyond verification status restoration
2026-04-05 18:46:06 +00:00
Yeachan-Heo 19c6b29524 Close the clawability backlog with deterministic CLI output and lane lineage
Finish the remaining roadmap work by making direct CLI JSON output deterministic across the non-interactive surface, restoring the degraded-startup MCP test as a real workspace test, and adding branch-lock plus commit-lineage primitives so downstream lane consumers can distinguish superseded worktree commits from canonical lineage.

Constraint: Keep the user-facing config namespace centered on .claw while preserving legacy fallback discovery for compatibility
Constraint: Verification needed to stay clean-room and reproducible from the checked-in workspace alone
Rejected: Leave the output-format contract implied by ad-hoc smoke runs only | too easy for direct CLI regressions to slip back into prose-only output
Rejected: Keep commit provenance as free-form detail text | downstream consumers need structured branch/worktree/supersession metadata
Confidence: medium
Scope-risk: moderate
Directive: Extend the JSON contract through the same direct CLI entrypoints instead of adding one-off serializers on parallel code paths
Tested: python .github/scripts/check_doc_source_of_truth.py
Tested: cd rust && cargo fmt --all --check
Tested: cd rust && cargo test --workspace
Tested: cd rust && cargo clippy -p commands -p tools -p rusty-claude-cli --all-targets --no-deps -- -D warnings
Not-tested: full cargo clippy --workspace --all-targets -- -D warnings still reports unrelated pre-existing runtime lint debt outside this change set
2026-04-05 18:41:02 +00:00
Yeachan-Heo f43375f067 Complete local claw-first CLI and config surface alignment 2026-04-05 18:11:25 +00:00
Yeachan-Heo 136cedf1cc Honor JSON output for skills and MCP inventory commands
The skills and mcp inventory handlers were still emitting prose tables even when the global --output-format json flag was set. This wires structured JSON renderers into the command handlers and CLI dispatch so direct invocations and resumed slash-command execution both return machine-readable payloads while preserving existing text output in the REPL path.

Constraint: Must preserve existing text output and help behavior for interactive slash commands
Rejected: Parse existing prose tables into JSON at the CLI edge | brittle and loses structured fields
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep text and JSON variants driven by the same command parsing branches so --output-format stays deterministic across entry points
Tested: cargo test -p commands
Tested: cargo test -p rusty-claude-cli
Not-tested: Manual invocation against a live user skills registry or external MCP services
2026-04-05 18:11:25 +00:00
Yeachan-Heo 2dd05bfcef Make .claw the only user-facing config namespace
Agents, skills, and init output were still surfacing .codex/.claude paths even though the runtime already treats .claw as the canonical config home. This updates help text, reports, skill install defaults, and repo bootstrap output to present a single .claw namespace while keeping legacy discovery fallbacks in place for existing setups.

Constraint: Existing .codex/.claude agent and skill directories still need to load for compatibility
Rejected: Remove legacy discovery entirely | would break existing user setups instead of just cleaning up surfaced output
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep future user-facing config, agent, and skill path copy aligned to .claw and  even when legacy fallbacks remain supported internally
Tested: cargo fmt --all --check; cargo test --workspace --exclude compat-harness
Not-tested: cargo clippy --workspace --all-targets -- -D warnings | fails in pre-existing unrelated runtime files (for example mcp_lifecycle_hardened.rs, mcp_tool_bridge.rs, lsp_client.rs, permission_enforcer.rs, recovery_recipes.rs, stale_branch.rs, task_registry.rs, team_cron_registry.rs, worker_boot.rs)
2026-04-05 18:11:25 +00:00
Yeachan-Heo 9b156e21cf Route nested CLI help requests to usage instead of operand fallthrough
The direct CLI wrappers for agents, skills, and mcp treated nested help flags as ordinary operands. That made commands like `claw mcp show --help` report a missing server and `claw skills install --help` fall into filesystem install logic instead of surfacing usage.

This change normalizes help-path arguments before dispatch so nested help stays on the help path. The regression tests cover both handler-level behavior and end-to-end CLI output for nested help and unknown subcommands with trailing help flags.

Constraint: Keep the fix scoped to direct CLI slash-command wrappers without changing unrelated parser behavior
Rejected: Rework top-level argument parsing for all subcommands | broader risk than needed for the regression
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more nested subcommands are added, extend the help-path normalization table before relying on raw operand dispatch
Tested: cargo build -p commands -p rusty-claude-cli
Tested: cargo test -p commands -p rusty-claude-cli
Not-tested: cargo clippy -p commands -p rusty-claude-cli --all-targets --no-deps -- -D warnings (pre-existing warnings in untouched files block clean run)
2026-04-05 18:11:25 +00:00
Yeachan-Heo f0d82a7cc0 Keep doctor and local help paths shell-native
Promote doctor into a real top-level CLI action, reuse the same local report for resumed and REPL doctor invocations, and intercept doctor/status/sandbox help flags before prompt-mode dispatch. The parser change also closes the help fallthrough that previously wandered into runtime startup for local-info commands.

Constraint: Preserve prompt shorthand for normal multi-word text input while fixing exact local subcommand help paths
Rejected: Route \7⠋ 🦀 Thinking...8✘  Request failed
 through prompt/slash guidance | still shells out through the wrong surface and keeps health checks hidden
Rejected: Reuse the status report as doctor output | status does not explain auth/config health or expose a dedicated diagnostic summary
Confidence: high
Scope-risk: narrow
Directive: Keep doctor local-only unless an explicit network probe is intentionally added and separately tested
Tested: cargo build -p rusty-claude-cli; cargo test -p rusty-claude-cli; cargo run -p rusty-claude-cli -- doctor --help; CLAW_CONFIG_HOME=/tmp/tmp.7pm9SVzOPN ANTHROPIC_API_KEY= ANTHROPIC_AUTH_TOKEN= cargo run -p rusty-claude-cli -- doctor
Not-tested: direct /doctor outside the REPL remains interactive-only
2026-04-05 18:11:25 +00:00
Yeachan-Heo f09e03a932 docs: sync Rust README with current implementation status 2026-04-05 18:08:00 +00:00
Yeachan-Heo c3b0e12164 Remove unshipped rusty-claude-cli prototype modules
The shipped CLI surface lives in `src/main.rs`, which only wires `init`,
`input`, and `render`. The legacy `app.rs` and `args.rs` prototypes were
not in the module tree and had no inbound references, so this change deletes
those orphaned files instead of widening scope into a larger refactor.

It also aligns the TUI enhancement plan with that reality so the document no
longer describes the removed prototypes as current tracked structure.

Constraint: Must preserve shipped CLI parsing and slash-command behavior
Rejected: Refactor main.rs into smaller modules now | widens scope beyond behavior-safe cleanup
Rejected: Leave TUI plan wording untouched | leaves low-risk stale documentation behind
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep this slice deletion-first; do not reintroduce alternate CLI surfaces without wiring them into main.rs and its tests
Tested: cargo test -p rusty-claude-cli defaults_to_repl_when_no_args
Tested: cargo test -p rusty-claude-cli parses_login_and_logout_subcommands
Tested: cargo test -p rusty-claude-cli parses_direct_agents_mcp_and_skills_slash_commands
Tested: cargo test -p rusty-claude-cli direct_slash_commands_surface_shared_validation_errors
Tested: cargo test -p rusty-claude-cli parses_resume_flag_with_multiple_slash_commands -- --nocapture
Tested: cargo test -p rusty-claude-cli resumed_binary_accepts_slash_commands_with_arguments -- --nocapture
Tested: cargo check -p rusty-claude-cli
Tested: git diff --check
Not-tested: cargo clippy -p rusty-claude-cli --all-targets -- -D warnings (pre-existing failures in rust/crates/runtime/* and existing warnings outside this diff)
2026-04-05 17:44:34 +00:00
Yeachan-Heo 31163be347 style: cargo fmt 2026-04-05 16:56:48 +00:00
Yeachan-Heo eb4d3b11ee merge fix/p2-19-subcommand-help-fallthrough 2026-04-05 16:54:59 +00:00
Yeachan-Heo 9bd7a78ca8 Merge branch 'fix/p2-18-context-window-preflight' 2026-04-05 16:54:45 +00:00
Yeachan-Heo 24d8f916c8 merge fix/p0-10-json-status 2026-04-05 16:54:38 +00:00
Yeachan-Heo 30883bddbd Keep doctor and local help paths shell-native
Promote doctor into a real top-level CLI action, reuse the same local report for resumed and REPL doctor invocations, and intercept doctor/status/sandbox help flags before prompt-mode dispatch. The parser change also closes the help fallthrough that previously wandered into runtime startup for local-info commands.

Constraint: Preserve prompt shorthand for normal multi-word text input while fixing exact local subcommand help paths
Rejected: Route \7⠋ 🦀 Thinking...8✘  Request failed
 through prompt/slash guidance | still shells out through the wrong surface and keeps health checks hidden
Rejected: Reuse the status report as doctor output | status does not explain auth/config health or expose a dedicated diagnostic summary
Confidence: high
Scope-risk: narrow
Directive: Keep doctor local-only unless an explicit network probe is intentionally added and separately tested
Tested: cargo build -p rusty-claude-cli; cargo test -p rusty-claude-cli; cargo run -p rusty-claude-cli -- doctor --help; CLAW_CONFIG_HOME=/tmp/tmp.7pm9SVzOPN ANTHROPIC_API_KEY= ANTHROPIC_AUTH_TOKEN= cargo run -p rusty-claude-cli -- doctor
Not-tested: direct /doctor outside the REPL remains interactive-only
2026-04-05 16:44:36 +00:00
Yeachan-Heo 1a2fa1581e Keep status JSON machine-readable for automation
The global --output-format json flag already reached prompt-mode responses, but
status and sandbox still bypassed that path and printed human-readable tables.
This change threads the selected output format through direct command aliases
and resumed slash-command execution so status queries emit valid structured
JSON instead of mixed prose.

It also adds end-to-end regression coverage for direct status/sandbox JSON
and resumed /status JSON so shell automation can rely on stable parsing.

Constraint: Global output formatting must stay compatible with existing text-mode reports
Rejected: Require callers to scrape text status tables | fragile and breaks automation
Confidence: high
Scope-risk: narrow
Directive: New direct commands that honor --output-format should thread the format through CliAction and resumed slash execution paths
Tested: cargo build -p rusty-claude-cli
Tested: cargo test -p rusty-claude-cli -- --nocapture
Tested: cargo test --workspace
Tested: cargo run -q -p rusty-claude-cli -- --output-format json status
Tested: cargo run -q -p rusty-claude-cli -- --output-format json sandbox
Not-tested: cargo clippy --workspace --all-targets -- -D warnings (fails in pre-existing runtime files unrelated to this change)
2026-04-05 16:41:02 +00:00
Yeachan-Heo fa72cd665e Block oversized requests before providers hard-fail
The runtime already tracked rough token estimates for compaction, but provider-bound
requests still relied on naive model output limits and could be sent upstream even
when the selected model could not fit the estimated prompt plus requested output.

This adds a small model token/context registry in the API layer, estimates request
size from the serialized prompt payload, and fails locally with a dedicated
context-window error before Anthropic or xAI calls are made. Focused integration
coverage asserts the preflight fires before any HTTP request leaves the process.

Constraint: Keep the first pass minimal and reusable across both Anthropic and OpenAI-compatible providers
Rejected: Auto-compact-and-retry in the same patch | broader control-flow change than the requested minimal preflight
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Expand the model registry before enabling preflight for additional providers or aliases
Tested: cargo build -p api -p tools -p rusty-claude-cli; cargo test -p api
Not-tested: End-to-end CLI auto-compaction or retry behavior after a local context_window_blocked failure
2026-04-05 16:39:58 +00:00
Yeachan-Heo 1f53d961ff Route nested CLI help requests to usage instead of operand fallthrough
The direct CLI wrappers for agents, skills, and mcp treated nested help flags as ordinary operands. That made commands like `claw mcp show --help` report a missing server and `claw skills install --help` fall into filesystem install logic instead of surfacing usage.

This change normalizes help-path arguments before dispatch so nested help stays on the help path. The regression tests cover both handler-level behavior and end-to-end CLI output for nested help and unknown subcommands with trailing help flags.

Constraint: Keep the fix scoped to direct CLI slash-command wrappers without changing unrelated parser behavior
Rejected: Rework top-level argument parsing for all subcommands | broader risk than needed for the regression
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more nested subcommands are added, extend the help-path normalization table before relying on raw operand dispatch
Tested: cargo build -p commands -p rusty-claude-cli
Tested: cargo test -p commands -p rusty-claude-cli
Not-tested: cargo clippy -p commands -p rusty-claude-cli --all-targets --no-deps -- -D warnings (pre-existing warnings in untouched files block clean run)
2026-04-05 16:38:43 +00:00
Yeachan-Heo 3df5dece39 fix: suppress dead_code warnings for unused file_ops functions 2026-04-05 03:23:51 +00:00
Yeachan-Heo cd1ee43f33 fix: suppress dead_code warnings for unused provider and lane completion items 2026-04-05 03:22:32 +00:00
Yeachan-Heo 1fb3759e7c fix: remove unused imports in session_control.rs 2026-04-05 03:21:55 +00:00
Yeachan-Heo 22ad54c08e docs: describe the runtime public API surface
This adds crate-level and type-level Rustdoc to the runtime crate's core exported types so downstream crates and contributors can understand the session, prompt, permission, OAuth, usage, and tool I/O primitives without spelunking every implementation file.

Constraint: The docs pass needed to stay focused on public runtime types without changing behavior
Rejected: Add blanket docs to every public item in one sweep | larger churn than needed for a targeted docs pass
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When exporting new runtime primitives from lib.rs, add a short Rustdoc summary in the defining module at the same time
Tested: cargo build --workspace; cargo test --workspace
Not-tested: rustdoc HTML rendering beyond  doc-test coverage
2026-04-04 15:23:29 +00:00
Yeachan-Heo 953513f12d docs: add a current claw CLI usage guide
The root and Rust-facing docs now point readers at a single task-oriented usage guide with build, auth, CLI, session, and parity-harness examples. This also fixes stale workspace references and updates the Rust workspace inventory to match the current crate set.

Constraint: Existing README copy still referenced the old dev/rust status and needed to stay lightweight
Rejected: Fold all usage details into README.md only | too much noise for the landing page
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep USAGE examples aligned with  when CLI flags change
Tested: cargo build --workspace; cargo test --workspace
Not-tested: External links and rendered Markdown in GitHub UI
2026-04-04 15:23:22 +00:00