MicroFish/.kiro/specs/graphiti-ollama-reranker/tasks.md

# Implementation Plan

> Foundation tasks introduce the four `RERANKER_*` configuration knobs.
> Core tasks add the new `OllamaReranker`.
> Integration tasks wire it into the Graphiti factory and bring the documentation to parity.
> Validation closes the loop with a structural sweep.

## Foundation

- [x] 1. Add reranker configuration surface
- [x] 1.1 Introduce four `RERANKER_*` settings on the `Config` class
  - Add `RERANKER_PROVIDER` with default `ollama`, read via `os.environ.get('RERANKER_PROVIDER', 'ollama')`.
  - Add `RERANKER_MODEL` with default `qwen2.5:3b`, read via `os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')`.
  - Add `RERANKER_BASE_URL` with default that chains to the embedding host: `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`. Do not reference `Config.EMBEDDING_BASE_URL` directly; use the env-lookup form so behaviour stays consistent under reload patterns.
  - Add `RERANKER_API_KEY` with default that chains to the embedding key the same way (`os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))`).
  - Do not add the reranker to `Config.validate()`; the provider has no mandatory credentials.
  - Observable completion: a Python REPL that imports `Config` shows the four attributes with the documented defaults, and, when `RERANKER_BASE_URL` is unset, overriding `EMBEDDING_BASE_URL` in the environment is reflected on `Config.RERANKER_BASE_URL` too.
  - _Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6_
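
The chained defaults in task 1.1 can be sketched as follows. This is a minimal illustration, not the actual `Config` class: the helper name `load_reranker_settings` and the injectable `env` mapping are assumptions made so the lookup order can be exercised without touching the real process environment.

```python
import os


def load_reranker_settings(env=None):
    """Resolve the four RERANKER_* values (hypothetical helper, not Config itself).

    `env` defaults to os.environ; passing a plain dict makes the fallback
    chain easy to check in isolation.
    """
    env = os.environ if env is None else env
    return {
        "RERANKER_PROVIDER": env.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Falls back to the embedding host, then to the local Ollama default.
        "RERANKER_BASE_URL": env.get(
            "RERANKER_BASE_URL",
            env.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1"),
        ),
        # Falls back to the embedding key, then to the placeholder "ollama".
        "RERANKER_API_KEY": env.get(
            "RERANKER_API_KEY",
            env.get("EMBEDDING_API_KEY", "ollama"),
        ),
    }
```

With an empty mapping every value takes its documented default. Setting only `EMBEDDING_BASE_URL` changes `RERANKER_BASE_URL` as well, and an explicit `RERANKER_BASE_URL` takes precedence over both.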

## Core

- [x] 2. Implement the Ollama-backed reranker
- [x] 2.1 Create the new reranker module with the `CrossEncoderClient` subclass
  - Define a new module under `backend/app/services/` that hosts the reranker class. The class subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements only the async `rank` method.
  - Constructor accepts `model`, `base_url`, `api_key` as keyword arguments; it instantiates `openai.AsyncOpenAI(base_url=..., api_key=...)` but performs no network I/O so the Flask app can boot when Ollama is unreachable.
  - `rank(query, passages)` short-circuits on empty `passages` and returns `[]` without any model call.
  - For each passage, send a single chat-completion request with `temperature=0.0` and a deterministic system prompt asking for a JSON object `{"score": <0.0..1.0>}` describing the passage's relevance to the query. Use `asyncio.gather` to run all per-passage requests concurrently.
  - Parse each model response defensively: strip any `<think>...</think>` block, strip markdown code fences, attempt `json.loads`, fall back to a regex that extracts the first floating-point number, then clip the value to `[0.0, 1.0]`. On any per-passage failure, assign a deterministic fallback score of `-0.001 * passage_index` and log at DEBUG once per failure, naming the model and error class. Because the fallback lies below the clipped range, failed passages sort after every successfully scored passage and keep their original relative order. The passage string is echoed byte-for-byte regardless of parse outcome.
  - Wrap the whole call in a `try/except`. On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` so search remains functional. The method must not raise.
  - Sort the returned list by score descending before returning.
  - Observable completion: instantiating the new class with a deliberately bad `base_url` does not raise; an async call to `rank("q", [])` returns `[]`; an async call with two non-empty passages against a reachable Ollama returns two `(passage, float)` tuples in descending-score order, with every input passage byte-identical in the output.
  - _Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1_
  - _Boundary: OllamaReranker module_
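
The parsing and fallback rules in task 2.1 can be sketched without a live model. This is a sketch under stated assumptions, not the shipped `OllamaReranker`: `parse_score`, `rank_passages`, and the injected `score_one` coroutine (which stands in for the per-passage chat-completion request) are hypothetical names. It also assumes one reading of the failure split: an unparseable reply is a per-passage failure, while an exception from the request itself is a whole-call failure.

```python
import asyncio
import json
import logging
import re

logger = logging.getLogger(__name__)

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
_FENCE_RE = re.compile(r"`{3}\w*")  # opening or closing markdown fence
_FLOAT_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)")


def parse_score(raw):
    """Return a score clipped to [0.0, 1.0]; raise ValueError if none is found."""
    text = _FENCE_RE.sub("", _THINK_RE.sub("", raw)).strip()
    try:
        value = float(json.loads(text)["score"])
    except (ValueError, KeyError, TypeError):
        # Not a usable JSON object: take the first number in the text instead.
        match = _FLOAT_RE.search(text)
        if match is None:
            raise ValueError("no numeric score in model output")
        value = float(match.group())
    return min(1.0, max(0.0, value))


async def _score(score_one, query, passage, index, model):
    raw = await score_one(query, passage)  # request errors propagate to the caller
    try:
        return parse_score(raw)
    except ValueError as exc:
        logger.debug("rerank parse failed (model=%s, error=%s)", model, type(exc).__name__)
        return -0.001 * index  # deterministic per-passage fallback


async def rank_passages(score_one, query, passages, model="qwen2.5:3b"):
    """Rank passages by score, descending; never raises."""
    if not passages:
        return []  # short-circuit: no model call at all
    try:
        scores = await asyncio.gather(
            *(_score(score_one, query, p, i, model) for i, p in enumerate(passages))
        )
    except Exception as exc:
        logger.warning("rerank failed (model=%s, error=%s)", model, type(exc).__name__)
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
```

Injecting `score_one` keeps the ordering and fallback logic testable with a stub coroutine. In the real class that role would be played by the `openai.AsyncOpenAI` chat-completion call described above.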

## Integration

- [x] 3. Wire the new reranker into the Graphiti factory
- [x] 3.1 Select the reranker inside `_get_graphiti()` based on `Config.RERANKER_PROVIDER`
  - Introduce a small allow-list constant alongside `_ALLOWED_GRAPHITI_PROVIDERS` enumerating `("ollama", "none")`.
  - Read `Config.RERANKER_PROVIDER`, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise `ValueError` with a message that names the offending value and lists the accepted values — same shape as the existing `GRAPHITI_LLM_PROVIDER` validation.
  - For `ollama`, construct the new `OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY)` and pass it as the `cross_encoder=` argument to `Graphiti(...)`.
  - For `none`, continue to pass `_PassthroughReranker()` as today; do not change the passthrough class.
  - Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log).
  - Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime.
  - Observable completion: with `RERANKER_PROVIDER` unset, app startup logs `Initializing Graphiti reranker (provider=ollama)...` and Graphiti is constructed with the `OllamaReranker`. With `RERANKER_PROVIDER=none`, the log reports `none` and Graphiti uses `_PassthroughReranker`. With `RERANKER_PROVIDER=banana`, `_get_graphiti()` raises `ValueError` listing `('ollama', 'none')`.
  - _Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3_
  - _Depends: 1.1, 2.1_
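
The allow-list check in task 3.1 could look roughly like this. The constant name `_ALLOWED_RERANKER_PROVIDERS`, the helper `resolve_reranker_provider`, and the exact error wording are assumptions for illustration. The real message should follow whatever the existing `GRAPHITI_LLM_PROVIDER` validation emits.

```python
# Hypothetical constant name; the plan only says it sits next to
# _ALLOWED_GRAPHITI_PROVIDERS and enumerates these two values.
_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


def resolve_reranker_provider(raw):
    """Lowercase the configured provider and validate it against the allow-list."""
    provider = raw.lower()
    if provider not in _ALLOWED_RERANKER_PROVIDERS:
        # Name the offending value and list the accepted ones.
        raise ValueError(
            f"Unsupported RERANKER_PROVIDER={raw!r}; "
            f"expected one of {_ALLOWED_RERANKER_PROVIDERS}"
        )
    return provider
```

Inside `_get_graphiti()` the returned value would then choose between constructing `OllamaReranker(...)` and `_PassthroughReranker()`, once, at first construction.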

- [ ] 4. Update operator-facing documentation
- [ ] 4.1 (P) Add the new env knobs to `.env.example`  *(deferred — sandbox hook blocks all `.env*` access; see HANDOFF.md)*
  - Insert a four-line `RERANKER_*` block adjacent to the existing `EMBEDDING_*` block, mirroring the comment style (default, accepted values, and a one-line note that `RERANKER_PROVIDER=none` disables reranking).
  - Observable completion: opening `.env.example` shows the four new variables with documented defaults, positioned next to the embedding block.
  - _Requirements: 6.1_
  - _Boundary: .env.example_
  - _Depends: 1.1_

- [x] 4.2 (P) Extend the `Required Environment Variables` snippet in `CLAUDE.md`
  - Add the four `RERANKER_*` variables to the existing fenced code block under "Required Environment Variables" in `CLAUDE.md`, keeping the same comment style used for the `EMBEDDING_*` block.
  - Observable completion: `CLAUDE.md` documents the four reranker variables next to the embedding block and includes a note that `RERANKER_PROVIDER=none` keeps the previous passthrough behaviour.
  - _Requirements: 6.2_
  - _Boundary: CLAUDE.md_
  - _Depends: 1.1_

- [x] 4.3 (P) Document the Ollama pull prerequisite and env block in `README.md`
  - In the existing "Install Ollama and pull the default embedding model" section, add a parallel `ollama pull qwen2.5:3b` step (or note that the model used for reranking must be pulled, using the documented default).
  - In the `.env` snippet under "Configure Environment Variables", add the four `RERANKER_*` lines with brief comments mirroring the embedding-block style.
  - Treat the `README-EN.md` and `README-ZH.md` translations as out of scope for this ticket: translation belongs to the active i18n workstream, and editing the translated files here would let them drift from it.
  - Observable completion: `README.md` shows the `ollama pull qwen2.5:3b` step and the four reranker env lines in the `.env` snippet.
  - _Requirements: 6.3_
  - _Boundary: README.md_
  - _Depends: 1.1_

- [x] 4.4 (P) Update the stale follow-up claim in the prior spec
  - In `.kiro/specs/graphiti-neo4j-finalize/research.md`, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that the follow-up has shipped under `graphiti-ollama-reranker`. The constraint is that no documentation still describes the reranker as a deferred passthrough.
  - Observable completion: a grep for "real per-provider reranker is a follow-up" across `.kiro/specs/` returns either zero hits or a pointer note to `graphiti-ollama-reranker`.
  - _Requirements: 6.4_
  - _Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md_

## Validation

- [x] 5. Structural verification sweep
- [x] 5.1 Grep for legacy reranker references and verify the new wiring is reachable
  - Grep `backend/app/services/` for `gpt-4.1-nano` and `OpenAIRerankerClient`; both must return zero hits in code paths owned by this spec.
  - Grep `backend/app/services/graphiti_adapter.py` for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the `_get_graphiti()` branch).
  - Confirm the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) require no source changes by grepping for `client.graph.search(` call sites and verifying the kwarg shape is unchanged.
  - Confirm `_GraphNamespace.search` still filters by `group_id` (no regression to project isolation).
  - Observable completion: a short verification summary captured during implementation lists each grep outcome with the expected zero / single hit, and the report-tool call sites are unchanged.
  - _Requirements: 1.4, 7.1, 7.2, 7.3_
  - _Depends: 3.1_
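
The grep checks in task 5.1 can also be scripted so the verification summary is reproducible. The sketch below is one possible approach, not part of the plan itself: `count_hits` is a hypothetical helper, and it does plain substring matching over `*.py` files rather than invoking `grep`.

```python
from pathlib import Path


def count_hits(root, needle, pattern="*.py"):
    """Return {file_path: occurrence_count} for files under root containing needle."""
    hits = {}
    for path in sorted(Path(root).rglob(pattern)):
        count = path.read_text(encoding="utf-8", errors="ignore").count(needle)
        if count:
            hits[str(path)] = count
    return hits
```

For example, `count_hits("backend/app/services", "gpt-4.1-nano")` and `count_hits("backend/app/services", "OpenAIRerankerClient")` would both be expected to return an empty dict once the sweep passes.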