MicroFish/.kiro/specs/graphiti-ollama-reranker/tasks.md

Implementation Plan

Foundation tasks introduce the four RERANKER_* configuration settings. Core tasks add the new OllamaReranker. Integration tasks wire it into the Graphiti factory behind a provider selection branch and bring the operator-facing documentation to parity. Validation closes the loop with a structural verification sweep.

Foundation

  • 1. Add reranker configuration surface
  • 1.1 Introduce four RERANKER_* settings on the Config class
    • Add RERANKER_PROVIDER with default ollama, read via os.environ.get('RERANKER_PROVIDER', 'ollama').
    • Add RERANKER_MODEL with default qwen2.5:3b, read via os.environ.get('RERANKER_MODEL', 'qwen2.5:3b').
    • Add RERANKER_BASE_URL with default that chains to the embedding host: os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1')). Do not reference Config.EMBEDDING_BASE_URL directly; use the env-lookup form so behaviour stays consistent under reload patterns.
    • Add RERANKER_API_KEY with default that chains to the embedding key the same way (os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))).
    • Do not add the reranker to Config.validate(); the provider has no mandatory credentials.
    • Observable completion: a Python REPL that imports Config shows the four attributes with the documented defaults, and overriding EMBEDDING_BASE_URL in the environment is visible on Config.RERANKER_BASE_URL too.
    • Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6
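
The chained defaults in task 1.1 can be sketched as follows. This is a minimal illustration, not the project's actual Config class: the helper name reranker_settings is hypothetical and takes the environment mapping as a parameter only so the fallback chain can be exercised without mutating os.environ. The real Config presumably carries many other attributes; only the four new ones appear here.

```python
import os
from typing import Dict, Mapping


def reranker_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve the four RERANKER_* values (hypothetical helper, for illustration)."""
    return {
        "RERANKER_PROVIDER": environ.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": environ.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Chain to the embedding host via a nested env lookup rather than
        # referencing Config.EMBEDDING_BASE_URL directly.
        "RERANKER_BASE_URL": environ.get(
            "RERANKER_BASE_URL",
            environ.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1"),
        ),
        # Same chaining for the API key.
        "RERANKER_API_KEY": environ.get(
            "RERANKER_API_KEY",
            environ.get("EMBEDDING_API_KEY", "ollama"),
        ),
    }


_resolved = reranker_settings()


class Config:
    """Stand-in for the real Config class; only the new attributes are shown."""

    RERANKER_PROVIDER = _resolved["RERANKER_PROVIDER"]
    RERANKER_MODEL = _resolved["RERANKER_MODEL"]
    RERANKER_BASE_URL = _resolved["RERANKER_BASE_URL"]
    RERANKER_API_KEY = _resolved["RERANKER_API_KEY"]
```

Calling reranker_settings({"EMBEDDING_BASE_URL": "http://gpu-box:11434/v1"}) resolves RERANKER_BASE_URL to the embedding host, which is the behaviour the observable-completion check in 1.1 looks for. An explicit RERANKER_BASE_URL still takes precedence.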

Core

  • 2. Implement the Ollama-backed reranker
  • 2.1 Create the new reranker module with the CrossEncoderClient subclass
    • Define a new module under backend/app/services/ that hosts the reranker class. The class subclasses graphiti_core.cross_encoder.client.CrossEncoderClient and implements only the async rank method.
    • Constructor accepts model, base_url, api_key as keyword arguments; it instantiates openai.AsyncOpenAI(base_url=..., api_key=...) but performs no network I/O so the Flask app can boot when Ollama is unreachable.
    • rank(query, passages) short-circuits on empty passages and returns [] without any model call.
    • For each passage, send a single chat-completion request with temperature=0.0 and a deterministic system prompt asking for a JSON object {"score": <0.0..1.0>} describing the passage's relevance to the query. Use asyncio.gather to run all per-passage requests concurrently.
    • Parse each model response defensively: strip any <think>...</think> block, strip markdown code fences, attempt json.loads, fall back to regex-extract the first floating-point number, clip the value to [0.0, 1.0]. On any per-passage failure, assign a deterministic fallback score of -0.001 * passage_index and log at DEBUG once per failure naming the model and error class. The passage string is echoed byte-for-byte regardless of parse outcome.
    • Wrap the whole call in a try/except. On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)] so search remains functional. The method must not raise.
    • Sort the returned list by score descending before returning.
    • Observable completion: instantiating the new class with a deliberately bad base_url does not raise; an async call to rank("q", []) returns []; an async call with two non-empty passages against a reachable Ollama returns two (passage, float) tuples in descending-score order, with every input passage byte-identical in the output.
    • Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1
    • Boundary: OllamaReranker module
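
The parsing and fallback rules in task 2.1 can be sketched without any network dependency. The names parse_score and ask_model are hypothetical, and rank is written as a free function for brevity. In the real module rank would be a method on the CrossEncoderClient subclass and ask_model would be the per-passage chat-completion request. One assumption is made about failure classification, which the task text leaves open: an unparseable reply counts as a per-passage failure, while an exception raised by the request itself counts as a whole-call failure.

```python
import asyncio
import json
import logging
import re
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger(__name__)

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
_FENCE_RE = re.compile(r"`{3}[a-zA-Z]*")  # opening or closing markdown fence
_FLOAT_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(raw: str) -> float:
    """Extract a score clipped to [0.0, 1.0]; raise ValueError if none is found."""
    text = _FENCE_RE.sub("", _THINK_RE.sub("", raw)).strip()
    try:
        value = float(json.loads(text)["score"])
    except (ValueError, KeyError, TypeError):
        # Not the expected JSON object: fall back to the first number in the text.
        match = _FLOAT_RE.search(text)
        if match is None:
            raise ValueError("no numeric score in model reply")
        value = float(match.group())
    return min(1.0, max(0.0, value))


async def rank(
    query: str,
    passages: List[str],
    ask_model: Callable[[str, str], Awaitable[str]],  # hypothetical request hook
    model: str = "qwen2.5:3b",
) -> List[Tuple[str, float]]:
    if not passages:
        return []  # short-circuit: no model call for empty input
    try:
        # One request per passage, all in flight concurrently.
        replies = await asyncio.gather(*(ask_model(query, p) for p in passages))
    except Exception as exc:  # assumed to mean a whole-call failure
        logger.warning("reranker model %s failed: %s", model, type(exc).__name__)
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
    scored = []
    for i, (passage, reply) in enumerate(zip(passages, replies)):
        try:
            score = parse_score(reply)
        except Exception as exc:  # per-passage failure keeps a deterministic score
            logger.debug("reranker model %s reply unusable: %s", model, type(exc).__name__)
            score = -0.001 * i
        scored.append((passage, score))  # passage is echoed unchanged
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Because the fallback scores are negative and parsed scores are clipped to be non-negative, passages whose replies could not be parsed always sort below successfully scored ones, in their original relative order.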

Integration

  • 3. Wire the new reranker into the Graphiti factory

  • 3.1 Select the reranker inside _get_graphiti() based on Config.RERANKER_PROVIDER

    • Introduce a small allow-list constant alongside _ALLOWED_GRAPHITI_PROVIDERS enumerating ("ollama", "none").
    • Read Config.RERANKER_PROVIDER, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise ValueError with a message that names the offending value and lists the accepted values — same shape as the existing GRAPHITI_LLM_PROVIDER validation.
    • For ollama, construct the new OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY) and pass it as the cross_encoder= argument to Graphiti(...).
    • For none, continue to pass _PassthroughReranker() as today; do not change the passthrough class.
    • Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log).
    • Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime.
    • Observable completion: with RERANKER_PROVIDER unset, app startup logs Initializing Graphiti reranker (provider=ollama)... and Graphiti is constructed with the OllamaReranker. With RERANKER_PROVIDER=none, the log reports none and Graphiti uses _PassthroughReranker. With RERANKER_PROVIDER=banana, _get_graphiti() raises ValueError listing ('ollama', 'none').
    • Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3
    • Depends: 1.1, 2.1
  • 4. Update operator-facing documentation

  • 4.1 (P) Add the new env knobs to .env.example (deferred — sandbox hook blocks all .env* access; see HANDOFF.md)

    • Insert a four-line RERANKER_* block adjacent to the existing EMBEDDING_* block, mirroring the comment style (default, accepted values, and a one-line note that RERANKER_PROVIDER=none disables reranking).
    • Observable completion: opening .env.example shows the four new variables with documented defaults, positioned next to the embedding block.
    • Requirements: 6.1
    • Boundary: .env.example
    • Depends: 1.1
  • 4.2 (P) Extend the Required Environment Variables snippet in CLAUDE.md

    • Add the four RERANKER_* variables to the existing fenced code block under "Required Environment Variables" in CLAUDE.md, keeping the same comment style used for the EMBEDDING_* block.
    • Observable completion: CLAUDE.md documents the four reranker variables next to the embedding block and includes a note that RERANKER_PROVIDER=none keeps the previous passthrough behaviour.
    • Requirements: 6.2
    • Boundary: CLAUDE.md
    • Depends: 1.1
  • 4.3 (P) Document the Ollama pull prerequisite and env block in README.md

    • In the existing "Install Ollama and pull the default embedding model" section, add a parallel ollama pull qwen2.5:3b step (or note that the model used for reranking must be pulled, using the documented default).
    • In the .env snippet under "Configure Environment Variables", add the four RERANKER_* lines with brief comments mirroring the embedding-block style.
    • Treat README-EN.md and README-ZH.md translations as out of scope for this ticket — translation belongs to the active i18n workstream and would otherwise drift.
    • Observable completion: README.md shows the ollama pull qwen2.5:3b step and the four reranker env lines in the .env snippet.
    • Requirements: 6.3
    • Boundary: README.md
    • Depends: 1.1
  • 4.4 (P) Update the stale follow-up claim in the prior spec

    • In .kiro/specs/graphiti-neo4j-finalize/research.md, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that the follow-up has shipped under graphiti-ollama-reranker. No remaining documentation may claim that the reranker is still a deferred passthrough.
    • Observable completion: a grep for "real per-provider reranker is a follow-up" across .kiro/specs/ returns either zero hits or a pointer note to graphiti-ollama-reranker.
    • Requirements: 6.4
    • Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md

Validation

  • 5. Structural verification sweep
  • 5.1 Grep for legacy reranker references and verify the new wiring is reachable
    • Grep backend/app/services/ for gpt-4.1-nano and OpenAIRerankerClient; both must return zero hits in code paths owned by this spec.
    • Grep backend/app/services/graphiti_adapter.py for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the _get_graphiti() branch).
    • Confirm the four ReportAgent tools (SearchResult, InsightForge, Panorama, Interview) require no source changes by grepping for client.graph.search( call sites and verifying the kwarg shape is unchanged.
    • Confirm _GraphNamespace.search still filters by group_id (no regression to project isolation).
    • Observable completion: a short verification summary captured during implementation lists each grep outcome against its expected hit count (zero for the legacy references, one import site and one use site for the new class), and the report-tool call sites are unchanged.
    • Requirements: 1.4, 7.1, 7.2, 7.3
    • Depends: 3.1
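
The grep checks in task 5.1 could also be scripted, which makes the verification summary reproducible. The sketch below is one possible approach using only the standard library; count_hits and total_hits are hypothetical helper names, and matching is by plain substring per line, which is an assumption about how strict the sweep needs to be.

```python
from pathlib import Path
from typing import Dict


def count_hits(root: str, needle: str, pattern: str = "*.py") -> Dict[str, int]:
    """Map each file under root to the number of its lines containing needle."""
    base = Path(root)
    # Accept either a single file or a directory to search recursively.
    files = [base] if base.is_file() else sorted(base.rglob(pattern))
    hits: Dict[str, int] = {}
    for path in files:
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        count = sum(1 for line in lines if needle in line)
        if count:
            hits[str(path)] = count
    return hits


def total_hits(root: str, needle: str) -> int:
    """Total number of matching lines across all files under root."""
    return sum(count_hits(root, needle).values())
```

Under the expectations stated in 5.1, total_hits("backend/app/services", "gpt-4.1-nano") and total_hits("backend/app/services", "OpenAIRerankerClient") should both be 0, and the count for the new class symbol in backend/app/services/graphiti_adapter.py should be 2, assuming the import and the use each sit on their own line.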