fix: remove response_format=json_object from chat_json, increase ontology max_tokens

Bug 1: chat_json() passed response_format={'type': 'json_object'} to the LLM, which enforces JSON grammar from token 0. Reasoning models (Qwen3, DeepSeek-R1, etc.) emit a <think>...</think> block before the JSON, so the constraint produced garbled output. The fix removes the response_format parameter: the system prompt already requests JSON output, and the existing <think> cleanup handles any remaining tags.

Bug 2: ontology_generator hardcoded max_tokens=4096, which truncated output from models with larger context windows. The limit is now 16384 to accommodate reasoning model outputs.

Fixes #642
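
To illustrate why dropping response_format is safe, here is a minimal sketch of the kind of cleanup the message refers to: strip any <think>...</think> block and markdown fence markers, then parse what remains. The helper name parse_llm_json and both regular expressions are assumptions made for illustration; the cleanup code in llm_client.py is only partly visible in this diff and may differ.

```python
import json
import re

# Hypothetical helper, not the project's actual implementation.
# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Matches an opening ``` or ```json marker at the start, or a closing ``` at the end.
_FENCE_RE = re.compile(r"^```(?:json)?\s*\n?|\n?```\s*$", re.IGNORECASE)


def parse_llm_json(raw: str) -> dict:
    """Parse JSON from a raw completion that may contain reasoning text.

    Assumes the model returns a single top-level JSON object after any
    reasoning block, optionally wrapped in a markdown code fence.
    """
    text = _THINK_RE.sub("", raw).strip()   # drop the reasoning block
    text = _FENCE_RE.sub("", text).strip()  # drop markdown fence markers
    return json.loads(text)                 # raises json.JSONDecodeError on bad input
```

With this approach a completion such as `<think>plan</think>` followed by a fenced JSON object parses cleanly. A response truncated by max_tokens would still fail in json.loads, which is why the second change raises the limit.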
2026-05-27 02:36:58 +00:00 · 0a3272197b
parent 96096ea0ff
commit 0a3272197b
2 changed files with 1 addition and 2 deletions
--- a/backend/app/services/ontology_generator.py
+++ b/backend/app/services/ontology_generator.py
@@ -217,7 +217,7 @@ class OntologyGenerator:
        result = self.llm_client.chat_json(
            messages=messages,
            temperature=0.3,
-            max_tokens=4096
+            max_tokens=16384
        )
        # Validate and post-process
--- a/backend/app/utils/llm_client.py
+++ b/backend/app/utils/llm_client.py
@@ -88,7 +88,6 @@ class LLMClient:
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
-            response_format={"type": "json_object"}
        )
        # Strip markdown code-block markers
        cleaned_response = response.strip()