11 KiB
11 KiB
Implementation Gap Analysis — graphiti-ollama-reranker
1. Current State Investigation
Domain Assets
| Asset | Location | Current behavior |
|---|---|---|
_PassthroughReranker |
backend/app/services/graphiti_adapter.py:38-51 |
Subclass of graphiti_core.cross_encoder.client.CrossEncoderClient. rank(query, passages) returns (passage, 1.0 - 0.01 * i) tuples in input order — no model call. |
| Graphiti factory | backend/app/services/graphiti_adapter.py:142-162 (_get_graphiti) |
Double-checked-locking singleton. Branches on Config.GRAPHITI_LLM_PROVIDER (openai / gemini). Always injects _PassthroughReranker() as cross_encoder. Runs g.build_indices_and_constraints() on the persistent event loop. |
| LLM/embedder builder | backend/app/services/graphiti_adapter.py:92-139 (_build_llm_and_embedder) |
Lazy-imports provider-specific Graphiti classes. Reads Config.LLM_* and Config.EMBEDDING_*. |
| Config surface | backend/app/config.py:33-53 |
Single class with class attrs; each is os.environ.get('KEY', 'default'). Has EMBEDDING_MODEL, EMBEDDING_BASE_URL, EMBEDDING_API_KEY defaults aligned with local Ollama. |
| Graph-search callers | _GraphNamespace.search at graphiti_adapter.py:488-517; consumed by zep_tools.py:491 (ZepToolsService.search_graph) and oasis_profile_generator.py:313, 337. |
All call sites already dropped the misleading reranker= kwarg in graphiti-neo4j-finalize. They invoke client.graph.search(graph_id, query, limit, scope) only. |
| Existing LLM wrapper | backend/app/utils/llm_client.py |
Uses synchronous OpenAI() client. Includes reasoning-model <think> stripping and a JSON-mode retry. Not directly relevant to the reranker but documents the in-house OpenAI-SDK pattern. |
| Async-loop helper | graphiti_adapter.py:54-79 (_get_loop, _run) |
Persistent dedicated event-loop thread used for all Graphiti async calls. The reranker's rank is already awaited by Graphiti itself, not by _run, so the new client can use plain await on openai.AsyncOpenAI. |
Conventions Observed
- 4-space indent, snake_case, double quotes; English + Chinese mixed in comments — preserve both styles.
- New env vars go into
backend/app/config.pyas class attrs reading fromos.environ.getwith a sensible default. Validation is centralized inConfig.validate(). - New backend modules live under
backend/app/services/with module-levellogger = get_logger('mirofish.<topic>'). - The OpenAI SDK is the only LLM client. New providers do not add a second SDK — they add a base-URL + model knob.
- No tests for graph code beyond
scripts/test_profile_format.py; the project explicitly discourages adding a heavy test harness.
Integration Surfaces
- Upstream contract:
CrossEncoderClientis consumed bygraphiti_coreduringGraphiti.search()execution; the framework callsawait reranker.rank(query, passages)on whatever event loop the caller is using. - Inbound integration: only one wire point — the
cross_encoder=kwarg onGraphiti(...)in_get_graphiti()(graphiti_adapter.py:156). - Outbound integration: the reranker calls Ollama via
http://localhost:11434/v1/chat/completions(OpenAI-compatible). Already proven byEMBEDDING_BASE_URLfor embeddings; Ollama's chat endpoint follows the same surface.
2. Requirements Feasibility Analysis
Requirement-to-Asset Map
| Requirement | Existing assets | New assets needed | Gap tag |
|---|---|---|---|
| R1: Default is Ollama, not OpenAI default | _get_graphiti() already injects an explicit reranker (no default fallthrough). |
Switch the injected client class based on RERANKER_PROVIDER. |
Missing (selection logic). |
R2: Real CrossEncoderClient calling Ollama via OpenAI SDK |
Pattern proven in llm_client.py; openai already in pyproject.toml. |
New OllamaReranker class — subclass of CrossEncoderClient, uses openai.AsyncOpenAI for rank(). |
Missing. |
R3: Env knobs (RERANKER_PROVIDER/MODEL/BASE_URL/API_KEY) |
Config pattern is established (EMBEDDING_* etc.). |
Four new Config attrs, with defaults falling back to embedding settings where stated. |
Missing. |
R4: none provider preserves passthrough |
_PassthroughReranker already exists. |
Branch in _get_graphiti() to pick passthrough when provider == none. |
Missing (small). |
| R5: Graceful degradation when Ollama is down | _GraphNamespace.search (lines 515-517) already catches all exceptions and returns empty results with a warning log. |
Reranker rank must catch its own network/parse errors, log them, and return the original passages with synthetic scores so search still returns something. |
Missing (within new class). |
R6: Docs (.env.example, CLAUDE.md, README) |
Existing docs already document EMBEDDING_* in three places — pattern is clear. |
Add 4 new env lines + Ollama pull note. | Missing (text). |
| R7: Report tools get reranked output transparently | _GraphNamespace.search is the single chokepoint already used by all 4 tools (SearchResult, InsightForge, Panorama, Interview). |
None — wiring change in factory propagates automatically. | None (verification only). |
Constraints
- Async contract:
CrossEncoderClient.rankisasync def. The new client must be async. The OpenAI SDK providesopenai.AsyncOpenAIfor this. - Ollama model output shape: A small chat model (
qwen2.5:3b,llama3.2:3b) can be prompted to emit a numeric score; we cannot rely onlogprobsbecause Ollama's OpenAI-compatible surface does not always exposelogprobs/logit_biasconsistently. Therefore the scoring strategy is "ask the model for a 0–10 (or 0–1) relevance score per passage and parse it from the text response." - No new dependency allowed. Reranker must reuse
openaiSDK (already installed) — confirmed inbackend/.venv/lib/python3.13/site-packages/openai-2.35.1.dist-info/. - Boot must not fail when Ollama is unreachable (R5.4). Construction is cheap (build an
AsyncOpenAIclient; no network call). The model availability check happens lazily on firstrank().
Complexity Signals
- Mostly a single file plus config plus docs change. Algorithmic logic is local to the new class (prompt + parse). No data model changes, no API surface changes, no UI changes.
Research Needed (Carry into Design)
- Model choice: pick a small Ollama chat model that (a) is widely pulled, (b) reliably emits a numeric score in a 1–2 token answer, (c) is small enough to run on a typical dev machine. Candidates:
qwen2.5:3b,llama3.2:3b,phi3:3.8b. Design phase will fix the default. - Scoring strategy: per-passage call (N calls per query, simple to parse) vs. batched single-call (one prompt with all passages, harder to align). The per-passage approach is simpler and parallelizable via
asyncio.gather; latency is bounded by the slowest passage. Design will fix the strategy. - Output parsing: prefer JSON output (
{"score": 0.83}) with markdown-fence stripping (project convention fromllm_client.chat_json); fall back to regex-extract first float on parse failure.
3. Implementation Approach Options
Option A — Extend graphiti_adapter.py In Place
Add the OllamaReranker class directly to graphiti_adapter.py next to _PassthroughReranker, and branch in _get_graphiti().
- Trade-offs:
- ✅ Same module owns all reranker wiring and the singleton; one file to read.
- ✅ Smallest diff; matches the file's existing role as "everything Graphiti".
- ❌ Adds prompt/parse logic to an already long (≈545-line) adapter module.
- ❌ Harder to reuse the reranker outside Graphiti (unlikely, but precludes it).
Option B — Separate Module backend/app/services/ollama_reranker.py
New module owns the class and its prompt/parse helpers; graphiti_adapter.py imports it and selects it in _get_graphiti().
- Trade-offs:
- ✅ Clear single-responsibility module; mirrors the structure suggested in the source ticket #39.
- ✅ Adapter file stays focused on wiring; reranker can be unit-tested in isolation if testing is later added.
- ❌ Slightly more navigation; one extra file in
services/. - ❌ The provider-selection branch still lives in the adapter, so two files must agree on the provider string.
Option C — Hybrid: Provider Registry
Introduce a small _RERANKER_PROVIDERS map ("ollama" -> _build_ollama_reranker, "none" -> _PassthroughReranker) inside graphiti_adapter.py, with the actual class still living in a separate ollama_reranker.py.
- Trade-offs:
- ✅ Adding a future provider (e.g.
sentence_transformers) is a one-line registry change. - ✅ Keeps reranker class out of the adapter.
- ❌ Slight over-engineering for two providers (
ollama+none); ticket #39 explicitly scopes only the Ollama path.
- ✅ Adding a future provider (e.g.
4. Implementation Complexity & Risk
- Effort: S (1–3 days)
- One new class (~80–120 lines), four new config attrs (~10 lines), one factory branch (~10 lines), three doc updates (~30 lines). No schema or API changes.
- Risk: Low
- Established patterns (config, OpenAI SDK, logger).
_PassthroughRerankeris preserved exactly for thenonefallback, so the worst-case behavior is identical to today.- The graceful-failure path (R5) requires care, but the existing
_GraphNamespace.searchexception handling already insulates HTTP callers from reranker errors.
5. Recommendations for Design Phase
- Preferred approach: Option B (separate
ollama_reranker.pymodule). Best alignment with #39's "implement inbackend/app/services/", keepsgraphiti_adapter.pyfocused on Graphiti wiring, and matches the project's "one concern per module" pattern inservices/. - Key decisions to lock in design:
- Default
RERANKER_MODELvalue (recommendqwen2.5:3b— small, broadly available on Ollama, reliable at structured short outputs). - Per-passage scoring strategy with
asyncio.gatherparallelism (simpler, deterministic). - Prompt + parse format: ask for JSON
{"score": <0.0..1.0>}, strip fences, regex-fallback to first float. - Failure mode for a single passage: assign deterministic low score (e.g.
0.0 - 0.001 * i) so passage still appears once. - Failure mode for whole
rank()call: log warning, return original-order tuples with passthrough scores (no exception bubbles up). - Update
.kiro/specs/graphiti-neo4j-finalize/research.md"follow-up" note to point at this spec (R6.4).
- Default
- Research items carried forward:
- Confirm
qwen2.5:3bproduces stable JSON scores in benchmark prompts (or pick alternative). - Decide whether to expose
RERANKER_MAX_PARALLELfor concurrency limit (defaultlen(passages)— likely small, ≤10).
- Confirm