Closes#624, #622, #601, #599, #577
## LLM JSON parsing (#624 / #622 / #601)
- New `_parse_llm_json()` in llm_client.py with 5-stage fallback:
1. Strip markdown fences (existing)
2. Strict json.loads (fast path)
3. json.JSONDecoder.raw_decode (handles trailing prose after JSON)
4. Balanced-brace extraction (leading prose + embedded JSON)
5. Strip control chars + retry
- Replaces strict json.loads in chat_json() that was failing on any LLM
appending text after the JSON (common with qwen-plus, ollama, gemma even
with response_format=json_object).
- Logs which fallback was used so problematic LLMs are visible.
- 8 unit-test cases covering each strategy.
## Report section tool_call leak (#599)
- New `_sanitize_section_content()` in report_agent.py detects when a
section's "final answer" is actually an unexecuted tool_call JSON
(e.g. `{"name":"quick_search","parameters":{...}}`) and replaces it
with a clear fallback message instead of writing the raw artifact to
the report.
- Applied at all 3 places where final_answer is returned in
write_section(): the Final Answer path, the no-prefix fallback, and
the force-finalize path.
## Chat history duplicate user message (#577)
- In report_agent.py chat(), defensively dedupe chat_history:
- Only keep {role, content} from history items
- Skip entries that match the current message exactly
- This prevents LLM from seeing a duplicate trailing user message and
echoing back the previous answer.
- Added debug log of constructed messages array for diagnostics.