Bug 1: chat_json() passed response_format={'type': 'json_object'} to
the LLM, which enforces JSON grammar from token 0. Reasoning models
(Qwen3, DeepSeek-R1, etc.) emit <think>...</think> blocks before the
JSON, so the constraint garbled their output. The fix removes the
response_format parameter: the system prompt already requests JSON
output, and the existing <think> cleanup strips any remaining tags.
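
For illustration only, a minimal sketch of the kind of <think> cleanup
this fix relies on. It is not the project's actual code: the helper
name parse_json_reply and the regex are assumptions made for this
example.

```python
import json
import re

# Hypothetical pattern: matches a closed <think>...</think> block,
# including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_json_reply(raw: str) -> dict:
    """Strip reasoning blocks from a model reply, then parse the JSON."""
    # Remove every closed reasoning block.
    text = _THINK_BLOCK.sub("", raw)
    # Remove any stray opening or closing tag left behind.
    text = text.replace("<think>", "").replace("</think>", "")
    # json.loads raises json.JSONDecodeError if the remainder is not JSON.
    return json.loads(text.strip())
```

With this approach the model is free to reason first; only the text
left after the cleanup has to be valid JSON.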

Bug 2: ontology_generator hardcoded max_tokens=4096, which truncated
longer responses. Increased to 16384 to accommodate reasoning model
outputs.

Fixes #642