Merge pull request #40 from salestech-group/fix/39-ollama-reranker

fix(graph): replace passthrough reranker with ollama-backed cross-encoder
2026-05-11 12:43:02 +02:00 · 04a00ac437
parent 091f70f964 fb0ac4b5fe
commit 04a00ac437
15 changed files with 1130 additions and 8 deletions
--- a/.kiro/specs/graphiti-neo4j-finalize/HANDOFF.md
+++ b/.kiro/specs/graphiti-neo4j-finalize/HANDOFF.md
@ -62,7 +62,7 @@ Same upload+build flow; expect identical behaviour to pre-change implementation.
 ## Notes for reviewers

 - **Default provider flipped** from Gemini (de-facto) to OpenAI-compatible (documented). Existing Gemini deployments must add `GRAPHITI_LLM_PROVIDER=gemini` to `.env` after pulling. Documented in the new `.env.example` and design.md migration section.
- **Reranker is still passthrough** — same behavioural state as before (no real reranking). A real per-provider reranker is intentionally deferred; explanation in `research.md` → "Reranker default behaviour".
+- **Reranker is still passthrough** — same behavioural state as before (no real reranking). _Update:_ this was deferred from this spec and has since shipped in follow-up spec `graphiti-ollama-reranker` (ticket #39): the default is now an Ollama-backed `CrossEncoderClient`; `RERANKER_PROVIDER=none` preserves the passthrough behaviour described here.
 - **`.env.example` write went through Python heredoc** because `pre_tool_env_guard.sh` blocks `cat > .env*` patterns. Worth confirming the file content is what you expect; the new content mirrors the README env section verbatim.

 ## Spec artefacts
--- a/.kiro/specs/graphiti-neo4j-finalize/design.md
+++ b/.kiro/specs/graphiti-neo4j-finalize/design.md
@ -16,7 +16,7 @@
 - `.env.example` matches what the code reads; the README is unchanged (already correct).

 ### Non-Goals
- Implementing a real per-provider reranker (deferred to a follow-up).
+- Implementing a real per-provider reranker (deferred to a follow-up — shipped in `graphiti-ollama-reranker`, ticket #39).
 - Pagination cleanup of `_NodeNamespace.get_by_graph_id` / `_EdgeNamespace.get_by_graph_id` (low priority, deferred).
 - Renaming `zep_*` files (tracked separately).
 - Migrating data from existing Zep Cloud deployments (project is local-only by design now).
@ -336,7 +336,7 @@ class _PassthroughReranker(CrossEncoderClient):
 **Implementation Notes**
 - Integration: Always injected by `_get_graphiti()` regardless of provider.
 - Validation: None.
- Risks: Search results are still un-reranked. Same behaviour as today; future ticket may introduce a real per-provider reranker.
+- Risks: Search results are still un-reranked. Same behaviour as today; superseded by follow-up spec `graphiti-ollama-reranker` (ticket #39), which introduces a real Ollama-backed reranker and keeps this passthrough only when `RERANKER_PROVIDER=none`.

 #### `_get_graphiti()` (refactored)

--- a/.kiro/specs/graphiti-neo4j-finalize/research.md
+++ b/.kiro/specs/graphiti-neo4j-finalize/research.md
@ -24,7 +24,7 @@
 - **Context**: Ticket suggests dropping `_GeminiReranker` and "letting Graphiti use its sane default." Verify the default is sane for Qwen.
 - **Sources Consulted**: `graphiti_core/graphiti.py:154`, `graphiti_core/cross_encoder/openai_reranker_client.py`.
 - **Findings**: Default is `OpenAIRerankerClient()` with no config → tries `AsyncOpenAI(api_key=None, base_url=None)` → 401 against any non-OpenAI key. Reranker model is fixed to `gpt-4.1-nano`, which Dashscope does not host.
- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker is out of scope (would need a custom OpenAI-compatible logprobs implementation, which Dashscope/Qwen does not reliably support).
+- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker was out of scope for this spec; follow-up spec `graphiti-ollama-reranker` (ticket #39) replaces the passthrough with an Ollama-backed `CrossEncoderClient` and keeps `_PassthroughReranker` only when `RERANKER_PROVIDER=none`.

 ### Env-guard hook scope
 - **Context**: First Read of `.env.example` was blocked.
--- a/.kiro/specs/graphiti-ollama-reranker/HANDOFF.md
+++ b/.kiro/specs/graphiti-ollama-reranker/HANDOFF.md
@ -0,0 +1,53 @@
+# Handoff — graphiti-ollama-reranker
+
+## What shipped
+
+| Task | Status | Notes |
+|------|--------|-------|
+| 1.1 — Config knobs | ✅ | Four `RERANKER_*` attrs added; `BASE_URL`/`API_KEY` chain to `EMBEDDING_*`. |
+| 2.1 — `OllamaReranker` | ✅ | New `backend/app/services/ollama_reranker.py`. Construction is side-effect-free; `rank()` never raises; per-passage parse falls back to deterministic low score; whole-call failure degrades to passthrough order with a single WARNING log. |
+| 3.1 — Factory wiring | ✅ | `_get_graphiti()` selects the reranker via new `_build_reranker()`. INFO log announces selection. `ValueError` raised for unknown providers. |
+| 4.1 — `.env.example` | ⚠️ Deferred | The `pre_tool_env_guard.sh` Claude hook blocks all `.env*` access (Read, Write, Edit, Bash). Cannot be performed inside this autonomous sandbox. **Reviewer action required** — see snippet below. |
+| 4.2 — `CLAUDE.md` | ✅ | New `RERANKER_*` block added under "Required Environment Variables". |
+| 4.3 — `README.md` | ✅ | Adds `ollama pull qwen2.5:3b` to the prerequisites and a `RERANKER_*` block in the `.env` snippet. `README-EN.md` / `README-ZH.md` left out per design scope (i18n is its own workstream). |
+| 4.4 — Prior-spec follow-up note | ✅ | Updated `graphiti-neo4j-finalize`'s `research.md`, `design.md`, and `HANDOFF.md` to point at this spec; updated the `_PassthroughReranker` docstring in `graphiti_adapter.py`. |
+| 5.1 — Structural sweep | ✅ | `gpt-4.1-nano` / `OpenAIRerankerClient` referenced only in docstring text. `OllamaReranker` has exactly one import + one use site. `_GraphNamespace.search` still filters by `group_id`. |
+
+## Reviewer action required: `.env.example`
+
+Please paste the following block into `.env.example` alongside the existing `EMBEDDING_*` section:
+
+```env
+# Reranker — reorders Graphiti search results before the report tools see them.
+# Default targets the same local Ollama host used for embeddings.
+# Pre-requisite for the default: `ollama pull qwen2.5:3b`.
+# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
+# slim containers that cannot pull a reranker model).
+RERANKER_PROVIDER=ollama
+RERANKER_MODEL=qwen2.5:3b
+# Optional — both default to the EMBEDDING_* equivalents when unset.
+# RERANKER_BASE_URL=http://localhost:11434/v1
+# RERANKER_API_KEY=ollama
+```
+
+This block matches what `CLAUDE.md` and `README.md` document. Once it is pasted, R6.1 is satisfied and ticket #39's acceptance-criteria checkbox "Configuration is overridable via env vars and documented in `.env.example`" can be ticked.
+
+## Verification performed
+
+- `Config` loads with the documented defaults; `EMBEDDING_BASE_URL` override propagates to `RERANKER_BASE_URL`.
+- `OllamaReranker` constructs without network I/O; empty `passages` returns `[]`; whole-call failure logs WARNING and returns passthrough-ordered tuples.
+- `_build_reranker("ollama")` → `OllamaReranker`; `("none")` → `_PassthroughReranker`; `("banana")` → `ValueError` naming the offender and listing `("ollama", "none")`.
+- Grep sweep matches design expectations (see Task 5.1 in `tasks.md`).
+
+## Smoke test (recommended before merge)
+
+With Ollama running and the reranker model pulled:
+
+```bash
+ollama pull qwen2.5:3b
+RERANKER_PROVIDER=ollama npm run backend
+# In another shell, exercise a graph build + report tool and confirm:
+#   - Startup log shows "Initializing Graphiti reranker (provider=ollama)..."
+#   - Search-backed report tool results differ from `RERANKER_PROVIDER=none` output
+#   - No WARNING about reranker failure in `backend/logs/`
+```
--- a/.kiro/specs/graphiti-ollama-reranker/design.md
+++ b/.kiro/specs/graphiti-ollama-reranker/design.md
@ -0,0 +1,395 @@
+# Design — graphiti-ollama-reranker
+
+## Overview
+**Purpose**: Replace the no-op `_PassthroughReranker` injected into Graphiti with a real Ollama-backed `CrossEncoderClient`, so that hybrid search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (`RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY`) with Ollama-aligned defaults; an explicit `RERANKER_PROVIDER=none` preserves the passthrough for CI and slim containers.
+
+**Users**: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.
+
+**Impact**: Adds one new module under `backend/app/services/`, four `Config` attributes, a small selection branch in `_get_graphiti()`, and documentation in `.env.example`, `CLAUDE.md`, `README.md`. No data schema, no API, no UI changes. Behavior under `RERANKER_PROVIDER=none` is identical to today.
+
+### Goals
+- Default Ollama-backed reranker producing one `(passage, score)` tuple per input passage, sorted descending by score.
+- Env-driven configuration with sensible Ollama defaults inherited from existing `EMBEDDING_*` settings.
+- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
+- Documentation parity with `EMBEDDING_*` knobs in `.env.example`, `CLAUDE.md`, and `README.md`.
+
+### Non-Goals
+- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
+- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
+- Upstream contributions to `graphiti-core`.
+- Adding a `sentence-transformers` or other non-`openai` reranker dependency.
+
+## Boundary Commitments
+
+### This Spec Owns
+- The Ollama reranker implementation and its prompt/parse logic.
+- The `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY` settings and their defaults.
+- The branch in `_get_graphiti()` that selects between the Ollama reranker and the passthrough.
+- The startup INFO log line that announces the selected reranker.
+- Documentation entries in `.env.example`, `CLAUDE.md` "Required Environment Variables", and `README.md` Ollama prerequisites.
+
+### Out of Boundary
+- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
+- Per-passage retrieval (still owned by `_GraphNamespace.search` and Graphiti).
+- The `group_id` scoping rules.
+- Any change to the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) — they receive reranked output transparently.
+- Implementation of additional reranker providers; this design covers only `ollama` and `none`.
+
+### Allowed Dependencies
+- Upstream library: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0).
+- In-repo: `Config` (`backend/app/config.py`), `get_logger` (`backend/app/utils/logger.py`), `openai.AsyncOpenAI` (already installed).
+- Existing factory: `_get_graphiti()` continues to be the singleton chokepoint.
+
+### Revalidation Triggers
+- If `graphiti-core` changes the `CrossEncoderClient.rank` signature, this design must be revisited.
+- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in `research.md`).
+- If `Config.GRAPHITI_LLM_PROVIDER` semantics change in a way that re-couples LLM and reranker, this design must be checked.
+
+## Architecture
+
+### Existing Architecture Analysis
+- `_get_graphiti()` already injects an explicit `cross_encoder=_PassthroughReranker()` (line 156). The pattern of double-checked-locking singleton with provider switch (`GRAPHITI_LLM_PROVIDER`) is mature and must be preserved.
+- The persistent event loop (`_get_loop`, `_run`) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does **not** need to schedule work onto `_get_loop()`.
+- All four ReportAgent tools call `_GraphNamespace.search`, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.
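For readers unfamiliar with the double-checked-locking singleton mentioned above, here is a minimal sketch of the pattern. `LazySingleton` and its `factory` argument are hypothetical names for illustration only; the real `_get_graphiti()` is a module-level function and may be structured differently.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build an object at most once, even under concurrent first access."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._instance: Optional[T] = None

    def get(self) -> T:
        # First check without the lock: the common case after initialisation.
        if self._instance is None:
            with self._lock:
                # Second check under the lock: another thread may have
                # finished construction while this one was waiting.
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance
```

The point of the second check is that the expensive construction (here, building the Graphiti client and its reranker) runs once, while later callers skip the lock entirely.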
+
+### Architecture Pattern & Boundary Map
+
+```mermaid
+graph LR
+    subgraph Config
+        EnvVars[RERANKER_*\nenv vars]
+        ConfigCls[Config attributes]
+        EnvVars --> ConfigCls
+    end
+
+    subgraph Adapter
+        Factory[_get_graphiti]
+        Passthrough[_PassthroughReranker]
+        OllamaCls[OllamaReranker]
+        Factory -->|provider=none| Passthrough
+        Factory -->|provider=ollama| OllamaCls
+    end
+
+    subgraph Graphiti
+        GraphitiCore[Graphiti instance]
+        Search[_GraphNamespace.search]
+        Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
+    end
+
+    ConfigCls --> Factory
+    Passthrough -->|injected as cross_encoder| GraphitiCore
+    OllamaCls -->|injected as cross_encoder| GraphitiCore
+    GraphitiCore --> Search
+    Search --> Tools
+
+    OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
+```
+
+**Architecture Integration**:
+- **Selected pattern**: Strategy pattern with two implementations selected at factory time. Same shape as the existing `GRAPHITI_LLM_PROVIDER` branch.
+- **Domain/feature boundaries**: Reranker construction and prompt/parse live in `ollama_reranker.py`. Wiring lives in `graphiti_adapter.py`. Config lives in `config.py`. No overlap.
+- **Existing patterns preserved**: Double-checked-locking singleton; explicit `cross_encoder` injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; `Config` reads via `os.environ.get(..., default)`.
+- **New components rationale**: `OllamaReranker` is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
+- **Steering compliance**: Single OpenAI-SDK convention preserved; per-project `group_id` scoping unaffected; no new dependency.
+
+### Technology Stack
+
+| Layer | Choice / Version | Role in Feature | Notes |
+|-------|------------------|-----------------|-------|
+| Backend / Services | Python ≥3.11, async via `asyncio` | Hosts the new reranker class. | Inherits project minimum. |
+| LLM client | `openai` SDK (already pinned, v2.x) | `AsyncOpenAI` chat completions against Ollama's `/v1`. | No new dependency. |
+| Model | Ollama-served chat model, default `qwen2.5:3b` | Produces a numeric relevance score per passage. | Operator may override via `RERANKER_MODEL`. |
+| Endpoint | Ollama's OpenAI-compatible `/v1` | Default `http://localhost:11434/v1`. | Reuses `EMBEDDING_BASE_URL` semantics. |
+| Graph layer | `graphiti-core ≥ 0.3` | Consumes the new `CrossEncoderClient`. | No upstream change. |
+
+## File Structure Plan
+
+### Directory Structure
+```
+backend/app/
+├── services/
+│   ├── graphiti_adapter.py        # MODIFIED — factory branches on RERANKER_PROVIDER
+│   └── ollama_reranker.py         # NEW — OllamaReranker(CrossEncoderClient)
+├── config.py                      # MODIFIED — adds RERANKER_* attrs
+└── utils/
+    └── logger.py                  # unchanged
+
+repo-root/
+├── .env.example                   # MODIFIED — adds RERANKER_* block
+├── CLAUDE.md                      # MODIFIED — Required Environment Variables
+└── README.md                      # MODIFIED — Ollama prerequisites note
+```
+
+### Modified Files
+- `backend/app/services/graphiti_adapter.py` — Add small branch in `_get_graphiti()` that picks `OllamaReranker()` or `_PassthroughReranker()` based on `Config.RERANKER_PROVIDER`. Log the selection at INFO. `_PassthroughReranker` class is unchanged.
+- `backend/app/config.py` — Add four new class attributes with documented defaults. No change to existing `validate()` (reranker has no mandatory key).
+- `.env.example` — Add a four-line `RERANKER_*` block with comments mirroring the `EMBEDDING_*` style.
+- `CLAUDE.md` — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
+- `README.md` — Update the Ollama prerequisite section to mention `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large`.
+
+> `_PassthroughReranker` stays in `graphiti_adapter.py` (unchanged contract); only the wiring around it changes.
+
+## System Flows
+
+```mermaid
+sequenceDiagram
+    participant Search as _GraphNamespace.search
+    participant Graphiti as graphiti-core
+    participant Reranker as OllamaReranker.rank
+    participant Ollama as Ollama /v1/chat/completions
+
+    Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
+    Graphiti->>Graphiti: hybrid retrieval (RRF)
+    Graphiti->>Reranker: rank(query, [p1..pN])
+    par per-passage scoring
+        Reranker->>Ollama: chat.completions(prompt p1, temp=0)
+        Reranker->>Ollama: chat.completions(prompt p2, temp=0)
+        Reranker->>Ollama: chat.completions(prompt pN, temp=0)
+    end
+    alt call completes (unparseable passages get fallback scores)
+        Reranker-->>Graphiti: sorted [(p, score), ...]
+    else whole-call failure
+        Reranker->>Reranker: log WARNING, return passthrough order
+        Reranker-->>Graphiti: original order with synthetic scores
+    end
+    Graphiti-->>Search: ranked edges/nodes
+    Search-->>Tools: ranked results
+```
+
+**Decision points after diagram**:
+- `temperature=0.0` makes the score deterministic per (query, passage, model) tuple.
+- Per-passage failures (one bad parse out of N) downrank that passage to `0.0 - 0.001 * index` and continue; only whole-call exceptions degrade to passthrough.
+- The reranker never raises, so Graphiti stays isolated from upstream failures even if the existing exception swallow in `_GraphNamespace.search` is removed in a future refactor.
+
+## Requirements Traceability
+
+| Requirement | Summary | Components | Interfaces | Flows |
+|-------------|---------|------------|------------|-------|
+| 1.1 | Default reranker is Ollama-backed | `_get_graphiti()`, `OllamaReranker` | Inline factory branch | Adapter init |
+| 1.2 | No dependency on `OpenAIRerankerClient` | `_get_graphiti()` | Explicit `cross_encoder=` injection (unchanged behavior) | — |
+| 1.3 | Unset → defaults to `ollama` | `Config.RERANKER_PROVIDER` | `os.environ.get('RERANKER_PROVIDER', 'ollama')` | — |
+| 1.4 | No `gpt-4.1-nano` reference | All new files | — | — |
+| 2.1 | Subclass `CrossEncoderClient.rank` | `OllamaReranker` | `async rank(query, passages) -> list[tuple[str, float]]` | Per-passage scoring |
+| 2.2 | Uses `openai.AsyncOpenAI` | `OllamaReranker.__init__` | `AsyncOpenAI(base_url, api_key)` | — |
+| 2.3 | Returns passages sorted descending | `OllamaReranker.rank` | Postcondition: descending by score | — |
+| 2.4 | Empty input → empty output, no model call | `OllamaReranker.rank` | Guard at method entry | — |
+| 2.5 | Preserves passage strings byte-for-byte | `OllamaReranker.rank` | Strings are echoed, never rewritten | — |
+| 2.6 | Unparseable score → deterministic low fallback | `OllamaReranker.rank` | Internal `_parse_score` helper | Failure branch |
+| 3.1 | `RERANKER_PROVIDER` env knob | `Config` | Class attr, default `ollama`, validated `{ollama, none}` | Adapter init |
+| 3.2 | `RERANKER_MODEL` env knob | `Config` | Class attr, default `qwen2.5:3b` | — |
+| 3.3 | `RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL` | `Config` | Class attr resolved once at import | — |
+| 3.4 | `RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY` | `Config` | Class attr | — |
+| 3.5 | Unknown value → `ValueError` | `_get_graphiti()` | `_ALLOWED_RERANKER_PROVIDERS` validation | Adapter init |
+| 3.6 | Reads via `os.environ.get` only | `Config` | — | — |
+| 4.1 | `none` keeps `_PassthroughReranker` | `_get_graphiti()` | Factory branch | Adapter init |
+| 4.2 | Graph search remains functional under `none` | `_PassthroughReranker.rank` (unchanged) | — | — |
+| 4.3 | INFO log announces selected provider | `_get_graphiti()` | `logger.info` line | Adapter init |
+| 5.1 | WARNING log on rerank failure | `OllamaReranker.rank` | `logger.warning` with model + error class | Failure branch |
+| 5.2 | No exception propagation to HTTP callers | `OllamaReranker.rank` (never raises) | — | — |
+| 5.3 | Original order on whole-call failure | `OllamaReranker.rank` | Passthrough fallback inside method | Failure branch |
+| 5.4 | `__init__` never raises | `OllamaReranker.__init__` | `AsyncOpenAI()` lazy I/O | Adapter init |
+| 6.1 | `.env.example` documents the four vars | `.env.example` | — | — |
+| 6.2 | `CLAUDE.md` lists the four vars | `CLAUDE.md` | — | — |
+| 6.3 | `README.md` mentions `ollama pull <model>` | `README.md` | — | — |
+| 6.4 | Old "follow-up" claim updated | `graphiti-neo4j-finalize/research.md` (or design.md) | — | — |
+| 7.1 | Reranked order reaches `_GraphNamespace.search` | `OllamaReranker`, `_get_graphiti()` | Through Graphiti's own `search()` | End-to-end |
+| 7.2 | No changes to report tools | n/a | n/a | — |
+| 7.3 | `group_id` scoping unchanged | `_GraphNamespace.search` (unchanged) | — | — |
+
+## Components and Interfaces
+
+| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
+|-----------|--------------|--------|--------------|--------------------------|-----------|
+| `OllamaReranker` | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 | `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1) | Service |
+| `Config` (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.1–3.6, 4.1 | `os.environ.get` (P0) | State (configuration) |
+| `_get_graphiti()` (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | `Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0) | Service |
+| `.env.example`, `CLAUDE.md`, `README.md` | Docs | Communicate new knobs and Ollama prerequisite. | 6.1–6.4 | — | — |
+
+---
+
+### Backend / Services
+
+#### `OllamaReranker`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
+| Requirements | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 |
+
+**Responsibilities & Constraints**
+- Subclass `graphiti_core.cross_encoder.client.CrossEncoderClient`; implement only `rank`.
+- Use `openai.AsyncOpenAI`; no second SDK; no top-level network I/O in `__init__`.
+- Preserve passage strings byte-for-byte; never rewrite or truncate.
+- Never raise from `rank()`. On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
+- Deterministic scoring: `temperature=0.0`, no randomness in fallback scores.
+- Thread-safety: stateless beyond the immutable `AsyncOpenAI` client and string config; safe under Graphiti's concurrent search.
+
+**Dependencies**
+- Inbound: `_get_graphiti()` — instantiates a single instance and passes it as `cross_encoder=` to `Graphiti(...)` (P0).
+- Outbound: `Ollama /v1/chat/completions` via `openai.AsyncOpenAI` (P0).
+- External: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai` SDK (P0).
+
+**Contracts**: Service [x]
+
+##### Service Interface
+
+```python
+class OllamaReranker(CrossEncoderClient):
+    def __init__(
+        self,
+        *,
+        model: str,
+        base_url: str,
+        api_key: str,
+    ) -> None: ...
+
+    async def rank(
+        self,
+        query: str,
+        passages: list[str],
+    ) -> list[tuple[str, float]]:
+        """
+        Score each passage's relevance to `query` and return
+        `(passage, score)` tuples sorted in descending order of score.
+
+        Preconditions:
+            - `passages` is a (possibly empty) list of strings.
+
+        Postconditions:
+            - len(return) == len(passages).
+            - return is sorted by score descending.
+            - For all i, return[i][0] is byte-identical to one of the inputs.
+            - For any rank() call, this method does not raise.
+
+        Invariants:
+            - Successfully-parsed scores fall in [0.0, 1.0].
+            - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
+              and are strictly less than every successfully-parsed score.
+        """
+```
+
+**Implementation Notes**
+- **Integration**: Constructed inside `_get_graphiti()` when `Config.RERANKER_PROVIDER == "ollama"`; injected into `Graphiti(..., cross_encoder=...)`.
+- **Validation**:
+  - Short-circuit empty `passages` with `return []` before any model call.
+  - Clip parsed `score` to `[0.0, 1.0]`.
+  - Treat any uncaught per-passage exception as parse failure and assign deterministic fallback `-0.001 * passage_index`.
+  - Treat any whole-call exception (e.g. connection refused) as graceful degrade: return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]`.
+- **Risks**: Operators must `ollama pull` the default `qwen2.5:3b` model (documented in the README). If the model is absent, the Requirement 5 degradation path applies: a WARNING log and passthrough order.
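The validation rules above can be sketched as follows. This is an illustration under stated assumptions, not the shipped `OllamaReranker`: `rank_with_fallback` and the injected `score_one` coroutine are hypothetical names, and the split between "parse failure" (`ValueError`) and "whole-call failure" (any other exception) is an assumption about how the two paths are told apart.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("reranker_sketch")

# Hypothetical scorer type: returns a relevance score for one (query, passage)
# pair and raises ValueError when the model reply cannot be parsed.
ScoreFn = Callable[[str, str], Awaitable[float]]


async def rank_with_fallback(
    query: str, passages: list[str], score_one: ScoreFn
) -> list[tuple[str, float]]:
    if not passages:
        return []  # empty input: no model call at all

    async def score_at(index: int, passage: str) -> float:
        try:
            raw = await score_one(query, passage)
            return min(1.0, max(0.0, raw))  # clip to [0.0, 1.0]
        except ValueError:
            # Per-passage parse failure: deterministic low score, as in the notes.
            return -0.001 * index

    try:
        scores = await asyncio.gather(
            *(score_at(i, p) for i, p in enumerate(passages))
        )
    except Exception as exc:  # whole-call failure, e.g. connection refused
        logger.warning("rerank failed (%s); keeping input order", type(exc).__name__)
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]

    # sorted() is stable, so equal scores keep their input order.
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
```

One property worth checking against the real implementation: with the `-0.001 * passage_index` formula, a parse failure at index 0 scores `0.0`, which ties with a successfully parsed `0.0` instead of falling strictly below it.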
+
+---
+
+### Backend / Config
+
+#### `Config` (extended)
+
+| Field | Detail |
+|-------|--------|
+| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
+| Requirements | 1.3, 3.1–3.6, 4.1 |
+
+**Responsibilities & Constraints**
+- Read from `os.environ.get` only; no new dependency.
+- `RERANKER_PROVIDER` default `ollama`; valid values: `ollama`, `none`.
+- `RERANKER_MODEL` default `qwen2.5:3b`.
+- `RERANKER_BASE_URL` default = `EMBEDDING_BASE_URL` value at module load time.
+- `RERANKER_API_KEY` default = `EMBEDDING_API_KEY` value at module load time.
+- Validation of `RERANKER_PROVIDER` happens in `_get_graphiti()` (not `Config.validate()`) to keep the validate-at-boot list focused on credential presence.
+
+**Contracts**: State [x]
+
+##### State Management
+- **State model**: Read-only class attributes resolved once at import.
+- **Persistence & consistency**: None; values come from environment.
+- **Concurrency strategy**: Immutable after import; safe.
+
+**Implementation Notes**
+- **Integration**: Defaults for `RERANKER_BASE_URL` / `RERANKER_API_KEY` should reference the corresponding `EMBEDDING_*` env vars (not the resolved `Config.EMBEDDING_BASE_URL` constant) so an operator setting only `EMBEDDING_BASE_URL` still gets the reranker pointed at the same Ollama host without needing to set `RERANKER_BASE_URL` explicitly. Implementation reads `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`.
+- **Validation**: None at config-load time. Provider value is validated by `_get_graphiti()`.
+- **Risks**: An operator who overrides `EMBEDDING_BASE_URL` but not `RERANKER_BASE_URL` will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
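The chained defaults described above can be illustrated with a small resolver. `resolve_reranker_settings` is a hypothetical helper (the design uses `Config` class attributes instead), and the final `"ollama"` fallback for the API key is an assumption taken from the `.env.example` comment rather than from the `Config` source.

```python
import os
from typing import Mapping

DEFAULT_OLLAMA_URL = "http://localhost:11434/v1"


def resolve_reranker_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve RERANKER_* values, falling back to the EMBEDDING_* env vars."""
    return {
        "provider": env.get("RERANKER_PROVIDER", "ollama"),
        "model": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Chain to the raw EMBEDDING_* variable, not a resolved constant, so that
        # overriding only EMBEDDING_BASE_URL also retargets the reranker.
        "base_url": env.get(
            "RERANKER_BASE_URL", env.get("EMBEDDING_BASE_URL", DEFAULT_OLLAMA_URL)
        ),
        # Assumed last-resort default; the real EMBEDDING_API_KEY default may differ.
        "api_key": env.get("RERANKER_API_KEY", env.get("EMBEDDING_API_KEY", "ollama")),
    }
```

Passing a plain dict instead of `os.environ` makes the precedence easy to check by hand: an explicit `RERANKER_BASE_URL` wins, then `EMBEDDING_BASE_URL`, then the localhost default.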
+
+---
+
+### Backend / Adapter
+
+#### `_get_graphiti()` (extended)
+
+| Field | Detail |
+|-------|--------|
+| Intent | Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice. |
+| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |
+
+**Responsibilities & Constraints**
+- Preserve double-checked locking and singleton semantics exactly.
+- Read `Config.RERANKER_PROVIDER` once at construction; do not re-read.
+- For `ollama`: construct `OllamaReranker(model=..., base_url=..., api_key=...)`.
+- For `none`: construct `_PassthroughReranker()` (current behavior preserved).
+- For any other value: raise `ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')")` — mirrors the existing `_ALLOWED_GRAPHITI_PROVIDERS` validation pattern.
+- Log at INFO once: `f"Initializing Graphiti reranker (provider={provider})..."`.
+
+**Contracts**: Service [x]
+
+##### Service Interface
+
+```python
+def _get_graphiti() -> Graphiti:
+    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
+```
+
+**Implementation Notes**
+- **Integration**: Replaces the unconditional `cross_encoder=_PassthroughReranker()` at `graphiti_adapter.py:156` with a `cross_encoder=_build_reranker(provider)` call. The factory helper lives next to `_build_llm_and_embedder` in the same file.
+- **Validation**: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
+- **Risks**: A mixed-case value such as `RERANKER_PROVIDER=Ollama` would fail an exact-match check, so the helper lowercases the value before comparison, matching `_get_graphiti`'s existing `(... or "openai").lower()` pattern.
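A minimal sketch of the selection logic described above, assuming a helper shaped like `_build_reranker`. The two stub classes stand in for `OllamaReranker` and `_PassthroughReranker` so the example runs on its own; the keyword arguments are hypothetical and the real helper may read them from `Config` instead.

```python
from dataclasses import dataclass
from typing import Optional, Union

ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


@dataclass(frozen=True)
class OllamaRerankerStub:
    """Stand-in for OllamaReranker; only records its configuration."""
    model: str
    base_url: str
    api_key: str


class PassthroughRerankerStub:
    """Stand-in for _PassthroughReranker."""


def build_reranker(
    provider: Optional[str], *, model: str, base_url: str, api_key: str
) -> Union[OllamaRerankerStub, PassthroughRerankerStub]:
    # Lowercase first so "Ollama" and "OLLAMA" are accepted.
    normalized = (provider or "ollama").lower()
    if normalized not in ALLOWED_RERANKER_PROVIDERS:
        # Fail before any Graphiti instance is built.
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; "
            f"allowed: {ALLOWED_RERANKER_PROVIDERS}"
        )
    if normalized == "none":
        return PassthroughRerankerStub()
    return OllamaRerankerStub(model=model, base_url=base_url, api_key=api_key)
```

Raising on unknown values rather than silently falling back to the passthrough keeps a misspelled provider from looking like a working reranker.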
+
+---
+
+### Documentation
+
+| File | Change | Requirements |
+|------|--------|--------------|
+| `.env.example` | Add commented block with the four `RERANKER_*` vars and their defaults. Position adjacent to the existing `EMBEDDING_*` block. | 6.1 |
+| `CLAUDE.md` | Extend the "Required Environment Variables" code fence under "Architecture" with the four new vars and a one-line note about `RERANKER_PROVIDER=none`. | 6.2 |
+| `README.md` | In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments. | 6.3 |
+| `.kiro/specs/graphiti-neo4j-finalize/research.md` | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |
+
+> The repository also ships `README-EN.md` and `README-ZH.md`, but `README.md` is the canonical user-facing README per the existing structure. The localized READMEs are out of scope unless a parity edit fits without translation work; if `README-ZH.md` already translates the embedder section, it receives the same one-line addition.
+
+## Data Models
+Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the `list[tuple[str, float]]` already defined by `CrossEncoderClient.rank`.
+
+## Error Handling
+
+### Error Strategy
+- **Construction errors**: None possible (no network in `__init__`; no required keys to validate).
+- **Per-passage errors**: Caught inside `OllamaReranker.rank`. Logged at DEBUG, once per failed passage, so that per-passage noise stays out of WARNING output. The passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
+- **Whole-call errors** (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost `try/except` in `rank`. Logged at WARNING with model name and error class. Returns `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` — same shape as `_PassthroughReranker` so consumers cannot tell the difference structurally.
+- **Configuration errors**: `_get_graphiti()` raises `ValueError` at startup if `RERANKER_PROVIDER` is unknown. The Flask app fails to boot — preferred over silent misconfiguration.
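The per-passage path depends on a defensive score parser. The sketch below assumes the model is asked to reply with a JSON object carrying a `score` field; that reply format and the name `parse_score` are assumptions for illustration, since the actual prompt and the internal `_parse_score` helper are not shown in this diff.

```python
import json


def parse_score(reply: str) -> float:
    """Extract a relevance score from a model reply, clipped to [0.0, 1.0].

    Raises ValueError for any reply that does not carry a numeric score,
    so the caller can assign its deterministic fallback score instead.
    """
    try:
        value = float(json.loads(reply)["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"unparseable score reply: {reply!r}") from exc
    return min(1.0, max(0.0, value))
```

Funnelling every malformed reply into a single exception type lets the caller treat "bad output for one passage" uniformly while leaving transport errors to the outer whole-call handler.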
+
+### Error Categories and Responses
+| Category | Trigger | Response |
+|----------|---------|----------|
+| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
+| User input (4xx-equivalent) | Unknown `RERANKER_PROVIDER` value | `ValueError` at startup; clear message naming allowed values. |
+| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |
+
+### Monitoring
+- INFO log at startup states the selected provider.
+- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
+- No metrics surface yet; can be added if the reranker becomes a hot path.
+
+## Testing Strategy
+
+This project intentionally keeps the test surface minimal (`backend/scripts/test_profile_format.py` is the lone pytest target). Per `steering/tech.md`, do **not** add a heavy test harness.
+
+- **Unit-level verification** (manual, by the implementer, no committed test files unless small and clearly worth keeping):
+  1. Constructing `OllamaReranker` with a bad host does not raise; first `rank()` call logs WARNING and returns passthrough output.
+  2. `rank(query, [])` returns `[]` and does not call the client.
+  3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
+  4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
+- **Integration smoke** (manual): With `qwen2.5:3b` pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. `RERANKER_PROVIDER=none`.
+- **Boundary verification**: Grep that `gpt-4.1-nano` and `OpenAIRerankerClient` do not appear in any new code path.
+
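For the manual unit-level checks, a small throwaway helper along these lines may be enough. It is a sketch only: `check_rank_contract` is a hypothetical name and is not part of the committed code or of `graphiti_core`.

```python
def check_rank_contract(passages, ranked):
    """Return a list of violations; an empty list means the output looks fine."""
    problems = []
    if len(ranked) != len(passages):
        problems.append("length mismatch")
    # Every input passage must come back unchanged, exactly once.
    if sorted(p for p, _ in ranked) != sorted(passages):
        problems.append("passages not echoed exactly once")
    scores = [s for _, s in ranked]
    # Each score must be >= the one that follows it.
    if any(a < b for a, b in zip(scores, scores[1:])):
        problems.append("scores not in descending order")
    return problems
```

Running it against the output of a real `rank()` call covers checks 2 and 3 above without adding a test harness.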
+## Supporting References
+- `research.md` — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
+- `gap-analysis.md` — Requirement-to-asset map.
+- `.ticket/39.md` — Source ticket text.
--- a/.kiro/specs/graphiti-ollama-reranker/gap-analysis.md
+++ b/.kiro/specs/graphiti-ollama-reranker/gap-analysis.md
@ -0,0 +1,111 @@
+# Implementation Gap Analysis — graphiti-ollama-reranker
+
+## 1. Current State Investigation
+
+### Domain Assets
+
+| Asset | Location | Current behavior |
+|-------|----------|------------------|
+| `_PassthroughReranker` | `backend/app/services/graphiti_adapter.py:38-51` | Subclass of `graphiti_core.cross_encoder.client.CrossEncoderClient`. `rank(query, passages)` returns `(passage, 1.0 - 0.01 * i)` tuples in input order — no model call. |
+| Graphiti factory | `backend/app/services/graphiti_adapter.py:142-162` (`_get_graphiti`) | Double-checked-locking singleton. Branches on `Config.GRAPHITI_LLM_PROVIDER` (`openai` / `gemini`). Always injects `_PassthroughReranker()` as `cross_encoder`. Runs `g.build_indices_and_constraints()` on the persistent event loop. |
+| LLM/embedder builder | `backend/app/services/graphiti_adapter.py:92-139` (`_build_llm_and_embedder`) | Lazy-imports provider-specific Graphiti classes. Reads `Config.LLM_*` and `Config.EMBEDDING_*`. |
+| Config surface | `backend/app/config.py:33-53` | Single class with class attrs; each is `os.environ.get('KEY', 'default')`. Has `EMBEDDING_MODEL`, `EMBEDDING_BASE_URL`, `EMBEDDING_API_KEY` defaults aligned with local Ollama. |
+| Graph-search callers | `_GraphNamespace.search` at `graphiti_adapter.py:488-517`; consumed by `zep_tools.py:491` (`ZepToolsService.search_graph`) and `oasis_profile_generator.py:313, 337`. | All call sites already dropped the misleading `reranker=` kwarg in `graphiti-neo4j-finalize`. They invoke `client.graph.search(graph_id, query, limit, scope)` only. |
+| Existing LLM wrapper | `backend/app/utils/llm_client.py` | Uses synchronous `OpenAI()` client. Includes reasoning-model `<think>` stripping and a JSON-mode retry. Not directly relevant to the reranker but documents the in-house OpenAI-SDK pattern. |
+| Async-loop helper | `graphiti_adapter.py:54-79` (`_get_loop`, `_run`) | Persistent dedicated event-loop thread used for all Graphiti async calls. The reranker's `rank` is **already** awaited by Graphiti itself, not by `_run`, so the new client can use plain `await` on `openai.AsyncOpenAI`. |
+
+### Conventions Observed
+
+- 4-space indent, snake_case, double quotes; English + Chinese mixed in comments — preserve both styles.
+- New env vars go into `backend/app/config.py` as class attrs reading from `os.environ.get` with a sensible default. Validation is centralized in `Config.validate()`.
+- New backend modules live under `backend/app/services/` with module-level `logger = get_logger('mirofish.<topic>')`.
+- The OpenAI SDK is the only LLM client. New providers do not add a second SDK — they add a base-URL + model knob.
+- No tests for graph code beyond `scripts/test_profile_format.py`; the project explicitly discourages adding a heavy test harness.
+
+### Integration Surfaces
+
+- **Upstream contract**: `CrossEncoderClient` is consumed by `graphiti_core` during `Graphiti.search()` execution; the framework calls `await reranker.rank(query, passages)` on whatever event loop the caller is using.
+- **Inbound integration**: only one wire point — the `cross_encoder=` kwarg on `Graphiti(...)` in `_get_graphiti()` (`graphiti_adapter.py:156`).
+- **Outbound integration**: the reranker calls Ollama via `http://localhost:11434/v1/chat/completions` (OpenAI-compatible). Already proven by `EMBEDDING_BASE_URL` for embeddings; Ollama's chat endpoint follows the same surface.
+
+## 2. Requirements Feasibility Analysis
+
+### Requirement-to-Asset Map
+
+| Requirement | Existing assets | New assets needed | Gap tag |
+|-------------|-----------------|-------------------|---------|
+| R1: Default is Ollama, not OpenAI default | `_get_graphiti()` already injects an explicit reranker (no default fallthrough). | Switch the injected client class based on `RERANKER_PROVIDER`. | Missing (selection logic). |
+| R2: Real `CrossEncoderClient` calling Ollama via OpenAI SDK | Pattern proven in `llm_client.py`; `openai` already in `pyproject.toml`. | New `OllamaReranker` class — subclass of `CrossEncoderClient`, uses `openai.AsyncOpenAI` for `rank()`. | Missing. |
+| R3: Env knobs (`RERANKER_PROVIDER/MODEL/BASE_URL/API_KEY`) | Config pattern is established (`EMBEDDING_*` etc.). | Four new `Config` attrs, with defaults falling back to embedding settings where stated. | Missing. |
+| R4: `none` provider preserves passthrough | `_PassthroughReranker` already exists. | Branch in `_get_graphiti()` to pick passthrough when provider == `none`. | Missing (small). |
+| R5: Graceful degradation when Ollama is down | `_GraphNamespace.search` (lines 515-517) already catches all exceptions and returns empty results with a warning log. | Reranker `rank` must catch its own network/parse errors, log them, and return the original passages with synthetic scores so search still returns *something*. | Missing (within new class). |
+| R6: Docs (`.env.example`, `CLAUDE.md`, README) | Existing docs already document `EMBEDDING_*` in three places — pattern is clear. | Add 4 new env lines + Ollama pull note. | Missing (text). |
+| R7: Report tools get reranked output transparently | `_GraphNamespace.search` is the single chokepoint already used by all 4 tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`). | None — wiring change in factory propagates automatically. | None (verification only). |
+
+### Constraints
+
+- **Async contract**: `CrossEncoderClient.rank` is `async def`. The new client must be async. The OpenAI SDK provides `openai.AsyncOpenAI` for this.
+- **Ollama model output shape**: A small chat model (`qwen2.5:3b`, `llama3.2:3b`) can be prompted to emit a numeric score; we cannot rely on `logprobs` because Ollama's OpenAI-compatible surface does not consistently expose `logprobs`/`logit_bias`. The scoring strategy is therefore to ask the model for a 0–10 (or 0–1) relevance score per passage and parse it from the text response.
+- **No new dependency** allowed. Reranker must reuse `openai` SDK (already installed) — confirmed in `backend/.venv/lib/python3.13/site-packages/openai-2.35.1.dist-info/`.
+- **Boot must not fail** when Ollama is unreachable (R5.4). Construction is cheap (build an `AsyncOpenAI` client; no network call). The model availability check happens lazily on first `rank()`.
+
+### Complexity Signals
+
+- Mostly a **single file plus config plus docs** change. Algorithmic logic is local to the new class (prompt + parse). No data model changes, no API surface changes, no UI changes.
+
+### Research Needed (Carry into Design)
+
+- **Model choice**: pick a small Ollama chat model that (a) is widely pulled, (b) reliably emits a numeric score in a 1–2 token answer, (c) is small enough to run on a typical dev machine. Candidates: `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Design phase will fix the default.
+- **Scoring strategy**: per-passage call (N calls per query, simple to parse) vs. batched single-call (one prompt with all passages, harder to align). The per-passage approach is simpler and parallelizable via `asyncio.gather`; latency is bounded by the slowest passage. Design will fix the strategy.
+- **Output parsing**: prefer JSON output (`{"score": 0.83}`) with markdown-fence stripping (project convention from `llm_client.chat_json`); fall back to regex-extract first float on parse failure.
+
+## 3. Implementation Approach Options
+
+### Option A — Extend `graphiti_adapter.py` In Place
+Add the `OllamaReranker` class directly to `graphiti_adapter.py` next to `_PassthroughReranker`, and branch in `_get_graphiti()`.
+
+- **Trade-offs**:
+  - ✅ Same module owns all reranker wiring and the singleton; one file to read.
+  - ✅ Smallest diff; matches the file's existing role as "everything Graphiti".
+  - ❌ Adds prompt/parse logic to an already long (≈545-line) adapter module.
+  - ❌ Harder to reuse the reranker outside Graphiti (unlikely, but precludes it).
+
+### Option B — Separate Module `backend/app/services/ollama_reranker.py`
+New module owns the class and its prompt/parse helpers; `graphiti_adapter.py` imports it and selects it in `_get_graphiti()`.
+
+- **Trade-offs**:
+  - ✅ Clear single-responsibility module; mirrors the structure suggested in the source ticket #39.
+  - ✅ Adapter file stays focused on wiring; reranker can be unit-tested in isolation if testing is later added.
+  - ❌ Slightly more navigation; one extra file in `services/`.
+  - ❌ The provider-selection branch still lives in the adapter, so two files must agree on the provider string.
+
+### Option C — Hybrid: Provider Registry
+Introduce a small `_RERANKER_PROVIDERS` map (`"ollama" -> _build_ollama_reranker`, `"none" -> _PassthroughReranker`) inside `graphiti_adapter.py`, with the actual class still living in a separate `ollama_reranker.py`.
+
+- **Trade-offs**:
+  - ✅ Adding a future provider (e.g. `sentence_transformers`) is a one-line registry change.
+  - ✅ Keeps reranker class out of the adapter.
+  - ❌ Slight over-engineering for two providers (`ollama` + `none`); ticket #39 explicitly scopes only the Ollama path.
+
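Option C's registry could look roughly like this. The two stub classes stand in for `_PassthroughReranker` and the class in `ollama_reranker.py`; `RERANKER_PROVIDERS` and `build_reranker` are illustrative names, not code from the repository.

```python
class PassthroughStub:
    """Stand-in for _PassthroughReranker."""


class OllamaStub:
    """Stand-in for the reranker class living in ollama_reranker.py."""


# Adding a provider means adding one entry here.
RERANKER_PROVIDERS = {
    "ollama": OllamaStub,
    "none": PassthroughStub,
}


def build_reranker(provider):
    """Look up the provider and instantiate it; reject unknown names."""
    try:
        factory = RERANKER_PROVIDERS[provider]
    except KeyError:
        allowed = ", ".join(sorted(RERANKER_PROVIDERS))
        raise ValueError(
            f"Unknown RERANKER_PROVIDER {provider!r}; allowed: {allowed}"
        ) from None
    return factory()
```

The allowed-values message is derived from the map itself, so the error text cannot drift from the supported set.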
+## 4. Implementation Complexity & Risk
+
+- **Effort**: **S (1–3 days)**
+  - One new class (~80–120 lines), four new config attrs (~10 lines), one factory branch (~10 lines), three doc updates (~30 lines). No schema or API changes.
+- **Risk**: **Low**
+  - Established patterns (config, OpenAI SDK, logger).
+  - `_PassthroughReranker` is preserved exactly for the `none` fallback, so the worst-case behavior is identical to today.
+  - The graceful-failure path (R5) requires care, but the existing `_GraphNamespace.search` exception handling already insulates HTTP callers from reranker errors.
+
+## 5. Recommendations for Design Phase
+
+- **Preferred approach**: **Option B (separate `ollama_reranker.py` module)**. Best alignment with #39's "implement in `backend/app/services/`", keeps `graphiti_adapter.py` focused on Graphiti wiring, and matches the project's "one concern per module" pattern in `services/`.
+- **Key decisions to lock in design**:
+  1. Default `RERANKER_MODEL` value (recommend `qwen2.5:3b` — small, broadly available on Ollama, reliable at structured short outputs).
+  2. Per-passage scoring strategy with `asyncio.gather` parallelism (simpler, deterministic).
+  3. Prompt + parse format: ask for JSON `{"score": <0.0..1.0>}`, strip fences, regex-fallback to first float.
+  4. Failure mode for a single passage: assign a deterministic low score (e.g. `0.0 - 0.001 * i`) so the passage still appears exactly once.
+  5. Failure mode for whole `rank()` call: log warning, return original-order tuples with passthrough scores (no exception bubbles up).
+  6. Update `.kiro/specs/graphiti-neo4j-finalize/research.md` "follow-up" note to point at this spec (R6.4).
+- **Research items carried forward**:
+  - Confirm `qwen2.5:3b` produces stable JSON scores in benchmark prompts (or pick alternative).
+  - Decide whether to expose `RERANKER_MAX_PARALLEL` as a concurrency limit (default `len(passages)`, which is likely small, ≤10).
--- a/.kiro/specs/graphiti-ollama-reranker/requirements.md
+++ b/.kiro/specs/graphiti-ollama-reranker/requirements.md
@ -0,0 +1,95 @@
+# Requirements Document
+
+## Project Description (Input)
+Replace the no-op `_PassthroughReranker` in `backend/app/services/graphiti_adapter.py` with a real reranker that uses an Ollama-available model, so Graphiti search results are properly reranked for the SearchResult / InsightForge / Panorama / Interview report tools. Add `RERANKER_PROVIDER` / `RERANKER_MODEL` / `RERANKER_BASE_URL` env knobs (defaults: ollama / a small Ollama chat model / EMBEDDING_BASE_URL), keep `_PassthroughReranker` only when `RERANKER_PROVIDER=none`, and update `.env.example`, `CLAUDE.md`, and the README accordingly. Source ticket: #39 (.ticket/39.md).
+
+## Introduction
+
+The Graphiti adapter currently injects a `_PassthroughReranker` into the `Graphiti(...)` constructor to bypass the upstream default (`OpenAIRerankerClient` with a hard-coded `gpt-4.1-nano` and OpenAI-specific `logprobs`/`logit_bias`), which would 401 against Qwen/Dashscope keys and is unavailable through Ollama. The passthrough is a no-op: it returns passages in original order with synthetic descending scores, so search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are not actually reranked.
+
+This feature replaces the no-op with a real reranker backed by a model available through the local Ollama stack (matching the existing `EMBEDDING_MODEL=mxbai-embed-large` precedent). A small set of environment variables makes the provider, model, and endpoint overridable. An explicit `none` provider preserves the passthrough behavior for CI / lightweight setups that cannot pull the reranker model.
+
+## Boundary Context
+
+- **In scope**:
+  - A new `CrossEncoderClient` implementation in `backend/app/services/` that scores passages against a query by calling an Ollama model through its OpenAI-compatible endpoint.
+  - New `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY` settings in `backend/app/config.py`, with sensible Ollama defaults.
+  - Provider selection inside `_get_graphiti()` so `ollama` selects the new client and `none` keeps `_PassthroughReranker`.
+  - Documentation updates in `.env.example`, `CLAUDE.md` (Required Environment Variables), and the project `README.md` (Ollama prerequisites).
+  - Graceful failure when the configured reranker model is not pulled (clear error, no Flask crash; graph search either falls back to original order or surfaces a logged warning consistent with the existing `_GraphNamespace.search` exception path).
+- **Out of scope**:
+  - Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
+  - Building OpenAI-only or Dashscope-only reranker clients; this spec is specifically the Ollama path (plus the `none` escape hatch).
+  - Upstream changes to `graphiti-core`.
+  - Adding any new reranker library (e.g. `sentence-transformers`); the new client must reuse the OpenAI SDK already in the dependency set.
+- **Adjacent expectations**:
+  - `graphiti_adapter._get_graphiti()` continues to be the single Graphiti factory; the new reranker must be wired through it, not at call sites.
+  - All Graphiti reads remain scoped by `group_id` — the reranker operates on passages already filtered per project; it does not change isolation rules.
+  - The reranker integrates with `_GraphNamespace.search`, which is the path used by `SearchResult`, `InsightForge`, `Panorama`, and `Interview` tools; behavior changes propagate to those tools automatically and do not need per-tool code changes.
+
+## Requirements
+
+### Requirement 1: Default reranker is Ollama-backed, not the OpenAI default
+**Objective:** As a backend developer running MiroFish against the default local Ollama stack, I want Graphiti to rerank search results without requiring an OpenAI key, so that report-tool relevance reflects a real model and not an arbitrary insertion order.
+
+#### Acceptance Criteria
+1. The Graphiti Adapter shall instantiate Graphiti with a non-passthrough `CrossEncoderClient` whenever `RERANKER_PROVIDER` resolves to `ollama` (the default).
+2. The Graphiti Adapter shall not depend on `graphiti_core.cross_encoder.openai_reranker_client.OpenAIRerankerClient` for the default code path.
+3. When `RERANKER_PROVIDER` is unset, the Graphiti Adapter shall behave as if `RERANKER_PROVIDER=ollama`.
+4. The Graphiti Adapter shall not reference the model name `gpt-4.1-nano` in any reranker code path.
+
+### Requirement 2: Ollama-backed reranker scores passages via an OpenAI-compatible chat endpoint
+**Objective:** As a backend developer, I want a reranker that talks to a locally hosted model so that the local-first stack stays self-contained and no remote LLM key is required.
+
+#### Acceptance Criteria
+1. The Ollama Reranker shall expose a class that subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements the asynchronous `rank(query, passages) -> list[tuple[passage, score]]` contract.
+2. The Ollama Reranker shall call its configured chat-completions endpoint through the `openai` SDK using `RERANKER_BASE_URL` and `RERANKER_API_KEY`, so no second SDK is introduced.
+3. The Ollama Reranker shall return passages sorted by descending score (highest relevance first) with one score per input passage.
+4. When `passages` is empty, the Ollama Reranker shall return an empty list without issuing any model call.
+5. The Ollama Reranker shall preserve passage strings byte-for-byte; it shall not rewrite, truncate, or reorder content within an individual passage.
+6. If the model response cannot be parsed into a numeric score for a passage, the Ollama Reranker shall assign that passage a deterministic fallback score lower than every successfully-parsed score so the passage still appears in the output exactly once.
+
+### Requirement 3: Reranker is configurable via environment variables
+**Objective:** As an operator deploying MiroFish, I want to override the reranker provider, model, and endpoint via environment variables so that I can target a different Ollama host, a different model, or disable reranking entirely.
+
+#### Acceptance Criteria
+1. The Configuration module shall expose `RERANKER_PROVIDER` with default `ollama` and accept the values `ollama` and `none`.
+2. The Configuration module shall expose `RERANKER_MODEL` whose default is a small Ollama-available chat model selected during design (e.g. `qwen2.5:3b` or `llama3.2:3b`).
+3. The Configuration module shall expose `RERANKER_BASE_URL` whose default is the value of `EMBEDDING_BASE_URL` (so the same Ollama host is reused by default).
+4. The Configuration module shall expose `RERANKER_API_KEY` whose default is the value of `EMBEDDING_API_KEY` (so Ollama's ignored-token default `ollama` works without explicit configuration).
+5. If `RERANKER_PROVIDER` is set to a value other than `ollama` or `none`, the Graphiti Adapter shall raise a clear `ValueError` at startup naming the offending value and listing accepted values.
+6. The Configuration module shall read all four reranker variables from the process environment via the same `os.environ.get` pattern used by the surrounding settings, with no additional dependencies.
+
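The default chain in Requirement 3 can be illustrated with a small function. This is a sketch, not the real `Config` class (which uses class attributes): `load_reranker_settings` is a hypothetical helper, the `http://localhost:11434/v1` embedding default is an assumption, and `qwen2.5:3b` is one of the candidate models named above.

```python
import os


def load_reranker_settings(env=None):
    """Resolve the four reranker settings, falling back to the embedding values."""
    env = os.environ if env is None else env
    # Assumed embedding defaults for a local Ollama host.
    embedding_base_url = env.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1")
    embedding_api_key = env.get("EMBEDDING_API_KEY", "ollama")
    return {
        "RERANKER_PROVIDER": env.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        "RERANKER_BASE_URL": env.get("RERANKER_BASE_URL", embedding_base_url),
        "RERANKER_API_KEY": env.get("RERANKER_API_KEY", embedding_api_key),
    }
```

With this shape, pointing `EMBEDDING_BASE_URL` at another host moves the reranker along with it unless `RERANKER_BASE_URL` is set explicitly.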
+### Requirement 4: `none` provider preserves the passthrough fallback for CI / lightweight setups
+**Objective:** As a developer running tests or a slim container that cannot pull the reranker model, I want to disable reranking explicitly so the Flask app still boots and graph search still works.
+
+#### Acceptance Criteria
+1. Where `RERANKER_PROVIDER=none`, the Graphiti Adapter shall continue to inject `_PassthroughReranker` and shall not attempt any model call at startup.
+2. While `RERANKER_PROVIDER=none`, graph search shall return results in the order Graphiti supplies them with the existing synthetic-descending-score behavior.
+3. The Graphiti Adapter shall log at INFO level the selected reranker provider during initialization so operators can confirm whether reranking is active.
+
+### Requirement 5: Graceful degradation when the configured Ollama model is unreachable
+**Objective:** As an operator who forgot to run `ollama pull <model>` (or whose Ollama service is down), I want the Flask backend to keep serving requests with a clear log signal rather than crashing.
+
+#### Acceptance Criteria
+1. If the Ollama Reranker fails to score passages for a given query (e.g. connection refused, 404 model not found, timeout, or unparseable response), the Graphiti Adapter shall log a warning that names the failing model and the error class.
+2. If the Ollama Reranker raises during a `rank` call, the calling `_GraphNamespace.search` shall not propagate the exception to HTTP callers; existing search-error handling already swallows reranker errors into a logged warning, and this behavior shall be preserved.
+3. When the Ollama Reranker fails for a query, the rerank-failure path shall return the passages in their original Graphiti order so search remains functional.
+4. The Ollama Reranker shall not raise during construction (i.e. `_get_graphiti()` must succeed even if the Ollama service is unavailable); failures are deferred until the first `rank` call.
+
+### Requirement 6: Documentation reflects the new reranker configuration
+**Objective:** As a new contributor reading the docs, I want the reranker env vars, defaults, and prerequisites documented in the same places the other LLM/embedder settings live so configuration is discoverable.
+
+#### Acceptance Criteria
+1. The Environment Example file (`.env.example`) shall include entries for `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY`, each commented with its default and accepted values.
+2. The CLAUDE.md document shall list the four reranker variables in its "Required Environment Variables" section with the same level of detail used for `EMBEDDING_MODEL`.
+3. The README.md document shall mention the `ollama pull <reranker model>` prerequisite alongside the existing `ollama pull mxbai-embed-large` note (or wherever Ollama setup is documented).
+4. Where the `.kiro/specs/graphiti-neo4j-finalize` documents state that the reranker is a passthrough no-op, those documents shall either be updated to point at this spec or left untouched (decided in design); the constraint is that no documentation shall continue to claim "a real per-provider reranker is a follow-up" once this spec is implemented.
+
+### Requirement 7: Report-tool integration verifies reranked output reaches consumers
+**Objective:** As a developer using the ReportAgent tools, I want `SearchResult`, `InsightForge`, `Panorama`, and `Interview` to receive properly reranked edges/nodes so their report output reflects model-judged relevance, not Graphiti's hybrid-search ordering alone.
+
+#### Acceptance Criteria
+1. When `RERANKER_PROVIDER=ollama` is active and the configured model is available, `_GraphNamespace.search` shall return passages whose order is determined by the Ollama Reranker, not Graphiti's default RRF ordering.
+2. The ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) shall require no changes for this feature; the rerank improvement reaches them transparently through `_GraphNamespace.search`.
+3. While the Ollama Reranker is active, the per-project `group_id` scoping of all Graphiti queries shall remain unchanged.
--- a/.kiro/specs/graphiti-ollama-reranker/research.md
+++ b/.kiro/specs/graphiti-ollama-reranker/research.md
@ -0,0 +1,112 @@
+# Research & Design Decisions — graphiti-ollama-reranker
+
+## Summary
+- **Feature**: `graphiti-ollama-reranker`
+- **Discovery Scope**: Extension (one new service module + factory branch + config + docs).
+- **Key Findings**:
+  - `CrossEncoderClient.rank(query, passages) -> list[tuple[str, float]]` is the only abstract contract Graphiti requires of the reranker. The existing `_PassthroughReranker` already exercises this contract correctly.
+  - Ollama's OpenAI-compatible `/v1/chat/completions` endpoint does not reliably expose `logprobs` / `logit_bias`, so Graphiti's default OpenAI scoring approach (binary YES/NO over token logits) cannot be ported. The reranker must use **prompted numeric scoring** with text-output parsing.
+  - The `openai` SDK already shipped in `backend/.venv` (v2.35.1) exposes `AsyncOpenAI`, which is the right client for the async `rank()` method without introducing any new dependency.
+
+## Research Log
+
+### Graphiti's `CrossEncoderClient` contract
+- **Context**: Need to confirm the precise shape of the `rank` interface and any other abstract members.
+- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:38-51` (`_PassthroughReranker`); `.kiro/specs/graphiti-neo4j-finalize/research.md` and `gap-analysis.md` (which captured the upstream contract on first integration); ticket #39 narrative.
+- **Findings**:
+  - `_PassthroughReranker` subclasses `CrossEncoderClient` and only overrides `async def rank(query: str, passages: list[str]) -> list[tuple[str, float]]`.
+  - Graphiti's internal call site (`graphiti_core/graphiti.py:154`) constructs the reranker once and calls `rank` per search. There is no separate batch interface to satisfy.
+  - Passages are short text snippets (entity-edge facts / node summaries). Typical N per search ≤ 10 (limit defaulted in `_GraphNamespace.search`).
+- **Implications**: A drop-in subclass that implements `rank` is sufficient. No additional abstract methods to wire.
+
+### Ollama OpenAI-compatible scoring surface
+- **Context**: Decide how to obtain a relevance score per passage from a small Ollama-served chat model.
+- **Sources Consulted**: Project-internal `backend/app/utils/llm_client.py` (uses `openai.OpenAI` + `chat.completions.create` against Dashscope / OpenAI / Ollama uniformly); ticket #39 "Proposed approach" section enumerating Ollama chat-model scoring vs. embedding cosine.
+- **Findings**:
+  - Ollama supports `/v1/chat/completions` for chat models like `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Pulling a model is required (`ollama pull <model>`).
+  - JSON-mode (`response_format={"type": "json_object"}`) is honored by recent Ollama versions but not universally; project convention is to fall back gracefully (cf. `LLMClient.chat_json`).
+  - Embedding-cosine reranker is feasible (re-embed query and passages with `mxbai-embed-large`) but produces a weaker ordering signal than an LLM that can reason about the question. Picking LLM scoring matches the ticket's preferred path.
+- **Implications**:
+  - Use a chat-completion call per passage with a deterministic temperature (0.0) and a tight system prompt asking for a JSON score in [0.0, 1.0].
+  - Parse with the same defensive strategy used elsewhere: strip `<think>` blocks, strip markdown fences, attempt `json.loads`, regex-fallback to first float, deterministic low score on hard failure.
+
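A per-passage request following these implications might be assembled as below. The prompt wording and the `build_score_request` helper are assumptions made for illustration; the returned dict is meant to be passed as keyword arguments to the OpenAI SDK's `chat.completions.create`.

```python
def build_score_request(model, query, passage):
    """Build chat-completion kwargs asking for a JSON relevance score in [0.0, 1.0]."""
    system = (
        "You are a relevance judge. Reply with JSON only, "
        'in the form {"score": <number between 0.0 and 1.0>}.'
    )
    # The passage is embedded verbatim; it is never rewritten or truncated.
    user = f"Query:\n{query}\n\nPassage:\n{passage}"
    return {
        "model": model,
        "temperature": 0.0,  # deterministic scoring
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }
```

Keeping request construction in a pure function makes the prompt easy to inspect without a running Ollama instance.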
+### Concurrency strategy
+- **Context**: Decide between per-passage parallel calls vs. one batched call.
+- **Findings**:
+  - Per-passage with `asyncio.gather` is simpler to align outputs and resilient — a single bad output only loses one passage's score.
+  - Single batched prompt requires the model to emit aligned scores (often by index); LLMs occasionally drop entries or misorder them, demanding additional validation.
+  - With typical `limit ≤ 10`, parallel per-passage calls hit Ollama briefly; on a 3B model this is < 5s for 10 passages.
+- **Implications**: Default to per-passage `asyncio.gather`. Expose no extra concurrency knob initially (avoid premature configuration surface; YAGNI per project guidelines).
+
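The per-passage fan-out can be sketched with `asyncio.gather`. Here `score_one` is a hypothetical coroutine that returns a float for one passage, and mapping a failed slot to `0.0 - 0.001 * i` is one possible reading of the fallback rule rather than the confirmed implementation.

```python
import asyncio


async def score_all(query, passages, score_one):
    """Score every passage concurrently; a failure only affects its own slot."""
    results = await asyncio.gather(
        *(score_one(query, p) for p in passages),
        return_exceptions=True,  # collect errors instead of aborting the batch
    )
    scores = []
    for i, result in enumerate(results):
        if isinstance(result, BaseException):
            scores.append(0.0 - 0.001 * i)  # deterministic per-passage fallback
        else:
            scores.append(float(result))
    return scores
```

`asyncio.gather` returns results in input order, so score `i` always belongs to passage `i` with no index bookkeeping in the prompt.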
+### Failure semantics
+- **Context**: Required by R5 — Flask must keep serving on Ollama outage, and graph search should remain functional.
+- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:515-517` (`_GraphNamespace.search` swallows all exceptions and logs a warning); `_get_graphiti()` runs once at first call.
+- **Findings**:
+  - Construction of an `openai.AsyncOpenAI` client does not perform any network I/O. Therefore `OllamaReranker.__init__` can be safe at startup even when Ollama is down.
+  - If `rank()` itself raises, the upstream `Graphiti.search` may surface the exception. The new reranker should therefore catch its own errors and degrade to passthrough behavior in-method rather than relying on the outer `try/except` in `_GraphNamespace.search`.
+- **Implications**: `OllamaReranker.rank` should never raise. On a whole-call exception it returns the input passages in the original order with passthrough-style synthetic scores and emits a single WARNING log (rate-limited by intent: one log per `rank()` call). A single unparseable passage instead receives the deterministic per-passage fallback score.
+
+## Architecture Pattern Evaluation
+
+| Option | Description | Strengths | Risks / Limitations | Notes |
+|--------|-------------|-----------|---------------------|-------|
+| A: Add class to `graphiti_adapter.py` | Define `OllamaReranker` next to `_PassthroughReranker` in the same file. | Minimal diff; single file to read. | Bloats an already-long adapter; mixes wiring with provider-specific logic. | — |
+| B: New `services/ollama_reranker.py` module | Dedicated module owns prompt + parse + async client; adapter only selects it. | Single-responsibility module; matches ticket suggestion; reusable in isolation. | One extra import in adapter. | **Selected.** Aligns with project pattern of one concern per `services/*` file. |
+| C: Hybrid provider registry | Map `RERANKER_PROVIDER → builder` in adapter; class still in B's module. | Future providers are a one-line registry change. | Over-engineering for two providers (`ollama` + `none`). | Deferred until a third provider is needed. |
+
+## Design Decisions
+
+### Decision: Provider selected via env var, branch lives in `_get_graphiti()`
+- **Context**: R3 requires env-driven provider selection; only two values supported by this spec (`ollama` and `none`).
+- **Alternatives Considered**:
+  1. Function-pointer registry (Option C).
+  2. Inline `if/else` in the factory selecting one of two classes.
+- **Selected Approach**: Inline branch in `_get_graphiti()` reads `Config.RERANKER_PROVIDER`, picks `_build_ollama_reranker()` or `_PassthroughReranker()`, validates unknown values with a `ValueError` matching the existing `_ALLOWED_GRAPHITI_PROVIDERS` convention.
+- **Rationale**: Mirrors the established `GRAPHITI_LLM_PROVIDER` validation pattern (`_ALLOWED_GRAPHITI_PROVIDERS`) without adding speculative abstraction. Two values, two branches.
+- **Trade-offs**: Adding a third provider later costs one more `elif`; acceptable.
+- **Follow-up**: Surface the selected provider in the INFO startup log so operators can confirm.
+
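The selected inline branch might look like the following. The builders are passed in as callables so the sketch stays self-contained; `select_reranker`, the tuple name, and the logger name are illustrative rather than taken from `graphiti_adapter.py`.

```python
import logging

logger = logging.getLogger("mirofish.graphiti_sketch")  # hypothetical logger name

ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


def select_reranker(provider, build_ollama, build_passthrough):
    """Two values, two branches; anything else fails fast with a clear message."""
    if provider == "ollama":
        reranker = build_ollama()
    elif provider == "none":
        reranker = build_passthrough()
    else:
        raise ValueError(
            f"Unknown RERANKER_PROVIDER {provider!r}; "
            f"allowed values: {', '.join(ALLOWED_RERANKER_PROVIDERS)}"
        )
    logger.info("Reranker provider: %s", provider)
    return reranker
```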
+### Decision: Per-passage scoring with `asyncio.gather`, no concurrency knob
+- **Context**: R2.3 requires one score per passage in descending order; R5 requires graceful per-call failure.
+- **Alternatives Considered**:
+  1. Single batched prompt with index-aligned output.
+  2. Per-passage call with bounded `Semaphore`.
+- **Selected Approach**: Per-passage `asyncio.gather` with no explicit limit; rely on default `limit ≤ 10` in `_GraphNamespace.search`.
+- **Rationale**: Simple, deterministic, isolates per-passage failures. Avoids premature configuration knob.
+- **Trade-offs**: If a future caller asks for `limit=100`, Ollama may queue 100 requests; acceptable for now because no caller does this.
+- **Follow-up**: If real-world rerank latency becomes a concern, add `RERANKER_MAX_PARALLEL` then.
+
+### Decision: Default model = `qwen2.5:3b`
+- **Context**: Need a small, broadly-available Ollama chat model that reliably emits a numeric score in 1–2 tokens.
+- **Alternatives Considered**:
+  1. `qwen2.5:3b` (Apache-2.0, 3B params, strong instruction following).
+  2. `llama3.2:3b` (Llama community license, 3B).
+  3. `phi3:3.8b` (MIT, 3.8B).
+- **Selected Approach**: `qwen2.5:3b`.
+- **Rationale**: Matches the Qwen-family alignment of the rest of the project (`qwen-plus` is the documented LLM default). Apache-2.0 license is permissive. Small enough for typical dev machines.
+- **Trade-offs**: Operators on systems without `qwen2.5:3b` must `ollama pull qwen2.5:3b` or override `RERANKER_MODEL`.
+- **Follow-up**: README will document `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large` step.
+
+### Decision: Defensive output parsing (`json.loads` → regex float → deterministic low score)
+- **Context**: R2.6 requires deterministic handling of unparseable model responses.
+- **Selected Approach**:
+  1. Strip `<think>...</think>` blocks (project convention from `llm_client.py:64`).
+  2. Strip markdown fences (project convention from `llm_client.chat_json`).
+  3. `json.loads` and read `score` (float in `[0, 1]`, clipped on out-of-range).
+  4. On JSON failure, regex-extract the first float token; clip to `[0, 1]`.
+  5. On total failure, assign `-0.001 * (passage_index + 1)` (deterministic and strictly below any successfully parsed score, including a parsed `0.0`).
+- **Rationale**: Reuses patterns already in the codebase. Keeps every passage in the output (R2.6).
+- **Trade-offs**: One failed parse silently downranks a passage; logged at DEBUG (not WARNING) to avoid log spam.
+
+## Risks & Mitigations
+- **Risk**: Ollama service is not running on startup → boot must not fail. **Mitigation**: Construct only `AsyncOpenAI` (no network call) during `__init__`. Defer connectivity to first `rank()`. R5.4.
+- **Risk**: Model is not pulled → `rank()` raises 404 from Ollama. **Mitigation**: Catch within `rank()`, log WARNING naming model + error class, return passthrough-ordered tuples so search still works. R5.1, R5.3.
+- **Risk**: Operator misconfigures `RERANKER_PROVIDER` to an unknown value → silent fallthrough to wrong reranker. **Mitigation**: `_get_graphiti()` raises `ValueError` listing allowed values, mirroring `_ALLOWED_GRAPHITI_PROVIDERS`. R3.5.
+- **Risk**: Multiple concurrent `rank()` calls overwhelm a small local Ollama daemon. **Mitigation**: Accept default Graphiti `limit ≤ 10`; document `RERANKER_MAX_PARALLEL` as a future follow-up if needed.
+
+## References
+- `backend/app/services/graphiti_adapter.py:38-51` — current passthrough reranker contract.
+- `backend/app/services/graphiti_adapter.py:142-162` — current `_get_graphiti()` wiring point.
+- `backend/app/utils/llm_client.py` — project pattern for OpenAI-SDK chat + JSON parsing + reasoning-block stripping.
+- `.kiro/specs/graphiti-neo4j-finalize/research.md` — historical context for why the passthrough was introduced.
+- Ticket `#39` in `.ticket/39.md` — feature brief and acceptance criteria.
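
Reviewer sketch for the `RERANKER_MAX_PARALLEL` follow-up mentioned in research.md: a minimal illustration of how per-passage scoring could be bounded with an `asyncio.Semaphore` if unbounded `asyncio.gather` ever becomes a problem. It is not part of this PR. The names `gather_bounded`, `score_one` and `max_parallel` are hypothetical and chosen only for this example.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    score_one: Callable[[int, str], Awaitable[float]],
    passages: List[str],
    max_parallel: int = 4,  # hypothetical stand-in for RERANKER_MAX_PARALLEL
) -> List[float]:
    """Score every passage with at most ``max_parallel`` requests in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(index: int, passage: str) -> float:
        # Each scoring call waits for a free slot before it starts.
        async with semaphore:
            return await score_one(index, passage)

    # gather preserves input order, so result[i] belongs to passages[i].
    return await asyncio.gather(
        *(guarded(i, p) for i, p in enumerate(passages))
    )
```

Because `asyncio.gather` keeps results in input order, the caller can still zip scores with passages exactly as `rank()` does today; only the number of simultaneous requests changes.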
--- a/.kiro/specs/graphiti-ollama-reranker/spec.json
+++ b/.kiro/specs/graphiti-ollama-reranker/spec.json
@ -0,0 +1,23 @@
+{
+  "feature_name": "graphiti-ollama-reranker",
+  "created_at": "2026-05-11T10:24:16Z",
+  "updated_at": "2026-05-11T10:45:00Z",
+  "language": "en",
+  "phase": "tasks-generated",
+  "approvals": {
+    "requirements": {
+      "generated": true,
+      "approved": true
+    },
+    "design": {
+      "generated": true,
+      "approved": true
+    },
+    "tasks": {
+      "generated": true,
+      "approved": true
+    }
+  },
+  "ready_for_implementation": true,
+  "ticket": 39
+}
--- a/.kiro/specs/graphiti-ollama-reranker/tasks.md
+++ b/.kiro/specs/graphiti-ollama-reranker/tasks.md
@ -0,0 +1,89 @@
+# Implementation Plan
+
+> Foundation tasks introduce the four `RERANKER_*` configuration knobs.
+> Core tasks add the new `OllamaReranker` and the factory selection branch.
+> Integration tasks wire documentation parity.
+> Validation closes the loop with a structural sweep.
+
+## Foundation
+
+- [x] 1. Add reranker configuration surface
+- [x] 1.1 Introduce four `RERANKER_*` settings on the `Config` class
+  - Add `RERANKER_PROVIDER` with default `ollama`, read via `os.environ.get('RERANKER_PROVIDER', 'ollama')`.
+  - Add `RERANKER_MODEL` with default `qwen2.5:3b`, read via `os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')`.
+  - Add `RERANKER_BASE_URL` with default that chains to the embedding host: `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`. Do not reference `Config.EMBEDDING_BASE_URL` directly; use the env-lookup form so behaviour stays consistent under reload patterns.
+  - Add `RERANKER_API_KEY` with default that chains to the embedding key the same way (`os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))`).
+  - Do not add the reranker to `Config.validate()`; the provider has no mandatory credentials.
+  - Observable completion: a Python REPL that imports `Config` shows the four attributes with the documented defaults, and, when `RERANKER_BASE_URL` is unset, overriding `EMBEDDING_BASE_URL` in the environment is visible on `Config.RERANKER_BASE_URL` too.
+  - _Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6_
+
+## Core
+
+- [x] 2. Implement the Ollama-backed reranker
+- [x] 2.1 Create the new reranker module with the `CrossEncoderClient` subclass
+  - Define a new module under `backend/app/services/` that hosts the reranker class. The class subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements only the async `rank` method.
+  - Constructor accepts `model`, `base_url`, `api_key` as keyword arguments; it instantiates `openai.AsyncOpenAI(base_url=..., api_key=...)` but performs no network I/O so the Flask app can boot when Ollama is unreachable.
+  - `rank(query, passages)` short-circuits on empty `passages` and returns `[]` without any model call.
+  - For each passage, send a single chat-completion request with `temperature=0.0` and a deterministic system prompt asking for a JSON object `{"score": <0.0..1.0>}` describing the passage's relevance to the query. Use `asyncio.gather` to run all per-passage requests concurrently.
+  - Parse each model response defensively: strip any `<think>...</think>` block, strip markdown code fences, attempt `json.loads`, fall back to regex-extracting the first numeric token, and clip the value to `[0.0, 1.0]`. On a per-passage parse failure, assign a deterministic fallback score of `-0.001 * (passage_index + 1)`, which stays strictly below any parsed score (including `0.0`), and log once at DEBUG naming the model and the error. The passage string is echoed byte-for-byte regardless of parse outcome.
+  - Wrap the whole call in a `try/except`. On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` so search remains functional. The method must not raise.
+  - Sort the returned list by score descending before returning.
+  - Observable completion: instantiating the new class with a deliberately bad `base_url` does not raise; an async call to `rank("q", [])` returns `[]`; an async call with two non-empty passages against a reachable Ollama returns two `(passage, float)` tuples in descending-score order, with every input passage byte-identical in the output.
+  - _Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1_
+  - _Boundary: OllamaReranker module_
+
+## Integration
+
+- [x] 3. Wire the new reranker into the Graphiti factory
+- [x] 3.1 Select the reranker inside `_get_graphiti()` based on `Config.RERANKER_PROVIDER`
+  - Introduce a small allow-list constant alongside `_ALLOWED_GRAPHITI_PROVIDERS` enumerating `("ollama", "none")`.
+  - Read `Config.RERANKER_PROVIDER`, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise `ValueError` with a message that names the offending value and lists the accepted values — same shape as the existing `GRAPHITI_LLM_PROVIDER` validation.
+  - For `ollama`, construct the new `OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY)` and pass it as the `cross_encoder=` argument to `Graphiti(...)`.
+  - For `none`, continue to pass `_PassthroughReranker()` as today; do not change the passthrough class.
+  - Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log).
+  - Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime.
+  - Observable completion: with `RERANKER_PROVIDER` unset, app startup logs `Initializing Graphiti reranker (provider=ollama)...` and Graphiti is constructed with the `OllamaReranker`. With `RERANKER_PROVIDER=none`, the log reports `none` and Graphiti uses `_PassthroughReranker`. With `RERANKER_PROVIDER=banana`, `_get_graphiti()` raises `ValueError` listing `('ollama', 'none')`.
+  - _Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3_
+  - _Depends: 1.1, 2.1_
+
+- [ ] 4. Update operator-facing documentation
+- [ ] 4.1 (P) Add the new env knobs to `.env.example`  *(deferred — sandbox hook blocks all `.env*` access; see HANDOFF.md)*
+  - Insert a four-line `RERANKER_*` block adjacent to the existing `EMBEDDING_*` block, mirroring the comment style (default, accepted values, and a one-line note that `RERANKER_PROVIDER=none` disables reranking).
+  - Observable completion: opening `.env.example` shows the four new variables with documented defaults, positioned next to the embedding block.
+  - _Requirements: 6.1_
+  - _Boundary: .env.example_
+  - _Depends: 1.1_
+
+- [x] 4.2 (P) Extend the `Required Environment Variables` snippet in `CLAUDE.md`
+  - Add the four `RERANKER_*` variables to the existing fenced code block under "Required Environment Variables" in `CLAUDE.md`, keeping the same comment style used for the `EMBEDDING_*` block.
+  - Observable completion: `CLAUDE.md` documents the four reranker variables next to the embedding block and includes a note that `RERANKER_PROVIDER=none` keeps the previous passthrough behaviour.
+  - _Requirements: 6.2_
+  - _Boundary: CLAUDE.md_
+  - _Depends: 1.1_
+
+- [x] 4.3 (P) Document the Ollama pull prerequisite and env block in `README.md`
+  - In the existing "Install Ollama and pull the default embedding model" section, add a parallel `ollama pull qwen2.5:3b` step (or note that the model used for reranking must be pulled, using the documented default).
+  - In the `.env` snippet under "Configure Environment Variables", add the four `RERANKER_*` lines with brief comments mirroring the embedding-block style.
+  - Treat `README-EN.md` and `README-ZH.md` translations as out of scope for this ticket — translation belongs to the active i18n workstream and would otherwise drift.
+  - Observable completion: `README.md` shows the `ollama pull qwen2.5:3b` step and the four reranker env lines in the `.env` snippet.
+  - _Requirements: 6.3_
+  - _Boundary: README.md_
+  - _Depends: 1.1_
+
+- [x] 4.4 (P) Update the stale follow-up claim in the prior spec
+  - In `.kiro/specs/graphiti-neo4j-finalize/research.md`, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that follow-up has shipped under `graphiti-ollama-reranker`. The constraint is that no remaining documentation continues to claim the reranker remains a deferred passthrough.
+  - Observable completion: a grep for "real per-provider reranker is a follow-up" across `.kiro/specs/` returns either zero hits or a pointer note to `graphiti-ollama-reranker`.
+  - _Requirements: 6.4_
+  - _Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md_
+
+## Validation
+
+- [x] 5. Structural verification sweep
+- [x] 5.1 Grep for legacy reranker references and verify the new wiring is reachable
+  - Grep `backend/app/services/` for `gpt-4.1-nano` and `OpenAIRerankerClient`; both must return zero hits in code paths owned by this spec.
+  - Grep `backend/app/services/graphiti_adapter.py` for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the `_get_graphiti()` branch).
+  - Confirm the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) require no source changes by grepping for `client.graph.search(` call sites and verifying the kwarg shape is unchanged.
+  - Confirm `_GraphNamespace.search` still filters by `group_id` (no regression to project isolation).
+  - Observable completion: a short verification summary captured during implementation lists each grep outcome with the expected zero / single hit, and the report-tool call sites are unchanged.
+  - _Requirements: 1.4, 7.1, 7.2, 7.3_
+  - _Depends: 3.1_
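
Reviewer sketch for task 1.1: the chained defaults can be checked without touching the real process environment. The helper below mirrors the lookups listed in the task against a plain mapping. `resolve_reranker_settings` is a hypothetical name used only here; the shipped code reads `os.environ` directly on the `Config` class.

```python
from typing import Dict, Mapping


def resolve_reranker_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve the four RERANKER_* values the way task 1.1 describes."""
    return {
        "provider": env.get("RERANKER_PROVIDER", "ollama"),
        "model": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Falls back to the embedding host, then to the local Ollama default.
        "base_url": env.get(
            "RERANKER_BASE_URL",
            env.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1"),
        ),
        # Same chaining for the key; Ollama ignores the value.
        "api_key": env.get(
            "RERANKER_API_KEY",
            env.get("EMBEDDING_API_KEY", "ollama"),
        ),
    }


defaults = resolve_reranker_settings({})
chained = resolve_reranker_settings({"EMBEDDING_BASE_URL": "http://gpu-box:11434/v1"})
explicit = resolve_reranker_settings({
    "EMBEDDING_BASE_URL": "http://gpu-box:11434/v1",
    "RERANKER_BASE_URL": "http://rerank-box:11434/v1",
})
```

One detail worth noting: with `dict.get`-style lookups, a variable that is set but empty (for example `RERANKER_BASE_URL=`) yields the empty string and does not fall through to the embedding value. The host names above are made up for illustration.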
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -84,6 +84,17 @@ EMBEDDING_API_KEY        # Default: "ollama"  (Ollama ignores the value)
                         # nomic-embed-text are not supported.
                         # Prerequisite for the default: `ollama pull mxbai-embed-large`.

+# Reranker (cross-encoder for Graphiti search results)
+RERANKER_PROVIDER        # Default: ollama  (allowed: "ollama", "none")
+                         # "none" keeps the legacy passthrough — useful for CI /
+                         # slim containers that cannot pull a reranker model.
+RERANKER_MODEL           # Default: qwen2.5:3b  (local Ollama chat model)
+                         # Prerequisite for the default: `ollama pull qwen2.5:3b`.
+RERANKER_BASE_URL        # Default: value of EMBEDDING_BASE_URL
+                         # (typically http://localhost:11434/v1)
+RERANKER_API_KEY         # Default: value of EMBEDDING_API_KEY
+                         # (Ollama ignores the value)
+
 # Optional — Accelerated LLM (omit entirely if not used)
 LLM_BOOST_API_KEY
 LLM_BOOST_BASE_URL
--- a/README.md
+++ b/README.md
@ -137,11 +137,12 @@ neo4j-admin dbms set-initial-password your_neo4j_password
 neo4j start
 ```

-**Install Ollama and pull the default embedding model:**
+**Install Ollama and pull the default models:**

 ```bash
 # macOS / Linux: https://ollama.com/download
-ollama pull mxbai-embed-large
+ollama pull mxbai-embed-large   # embedder for the knowledge graph
+ollama pull qwen2.5:3b          # reranker for Graphiti search results
 # Ollama serves the OpenAI-compatible /v1 endpoint on http://localhost:11434
 # by default — no further configuration required.
 ```
@ -181,6 +182,17 @@ EMBEDDING_BASE_URL=http://localhost:11434/v1
 EMBEDDING_API_KEY=ollama
 EMBEDDING_MODEL=mxbai-embed-large

+# Reranker — reorders Graphiti search results before the report tools see them.
+# Default targets the same local Ollama host used for embeddings.
+# Prerequisite for the default: `ollama pull qwen2.5:3b`.
+# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
+# slim containers that cannot pull a reranker model).
+RERANKER_PROVIDER=ollama
+RERANKER_MODEL=qwen2.5:3b
+# Optional — both default to the EMBEDDING_* equivalents when unset.
+# RERANKER_BASE_URL=http://localhost:11434/v1
+# RERANKER_API_KEY=ollama
+
 # Embeddings — remote fallback (uncomment ONE block if you prefer not to run
 # Ollama locally). Note: any override must produce 1024-dim vectors to match
 # Graphiti's vector index — 768-dim models (e.g. nomic-embed-text) are NOT
--- a/backend/app/config.py
+++ b/backend/app/config.py
@ -52,6 +52,24 @@ class Config:
    # to use Google Gemini directly.
    GRAPHITI_LLM_PROVIDER = os.environ.get('GRAPHITI_LLM_PROVIDER', 'openai')

+    # Reranker (cross-encoder) settings. The reranker reorders Graphiti search
+    # results before they reach the ReportAgent tools. Defaults target the same
+    # local Ollama host used for embeddings; setting RERANKER_PROVIDER=none
+    # disables reranking and keeps the legacy passthrough (useful for CI or
+    # slim containers that cannot pull the reranker model). RERANKER_BASE_URL
+    # and RERANKER_API_KEY chain through EMBEDDING_BASE_URL / EMBEDDING_API_KEY
+    # so a single-host Ollama deployment needs no extra configuration.
+    RERANKER_PROVIDER = os.environ.get('RERANKER_PROVIDER', 'ollama')
+    RERANKER_MODEL = os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')
+    RERANKER_BASE_URL = os.environ.get(
+        'RERANKER_BASE_URL',
+        os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'),
+    )
+    RERANKER_API_KEY = os.environ.get(
+        'RERANKER_API_KEY',
+        os.environ.get('EMBEDDING_API_KEY', 'ollama'),
+    )
+
    # Zep settings (kept for backwards compatibility; deprecated).
    ZEP_API_KEY = os.environ.get('ZEP_API_KEY', '')

--- a/backend/app/services/graphiti_adapter.py
+++ b/backend/app/services/graphiti_adapter.py
@ -31,6 +31,7 @@ from graphiti_core.cross_encoder.client import CrossEncoderClient

 from ..config import Config
 from ..utils.logger import get_logger
+from .ollama_reranker import OllamaReranker

 logger = get_logger('mirofish.graphiti_adapter')

@ -42,7 +43,9 @@ class _PassthroughReranker(CrossEncoderClient):
    descending scores. Injected explicitly so Graphiti does not fall back
    to its default ``OpenAIRerankerClient`` (which uses a hard-coded
    ``gpt-4.1-nano`` model with logprobs and would 401 against Qwen /
-    Dashscope keys). A real per-provider reranker is a follow-up.
+    Dashscope keys). Selected when ``Config.RERANKER_PROVIDER == "none"``
+    — useful for CI / slim containers that cannot pull the reranker model.
+    For real reranking, set ``RERANKER_PROVIDER=ollama`` (the default).
    """

    async def rank(self, query: str, passages: list[str]) -> list[tuple[str, float]]:
@ -87,6 +90,31 @@ _graphiti_lock = threading.Lock()


 _ALLOWED_GRAPHITI_PROVIDERS = ("openai", "gemini")
+_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")
+
+
+def _build_reranker(provider: str) -> CrossEncoderClient:
+    """Build the cross-encoder reranker for the configured provider.
+
+    Defers to ``_PassthroughReranker`` when ``provider`` is ``"none"``
+    (the legacy no-op behaviour, useful for CI / slim containers that
+    cannot pull the reranker model). For ``"ollama"`` it constructs the
+    real Ollama-backed reranker; the construction is side-effect-free, so
+    Graphiti initialisation does not depend on the Ollama daemon being
+    reachable at startup.
+    """
+    if provider == "none":
+        return _PassthroughReranker()
+    if provider == "ollama":
+        return OllamaReranker(
+            model=Config.RERANKER_MODEL,
+            base_url=Config.RERANKER_BASE_URL,
+            api_key=Config.RERANKER_API_KEY,
+        )
+    raise ValueError(
+        f"Unknown RERANKER_PROVIDER={provider!r}; "
+        f"allowed: {_ALLOWED_RERANKER_PROVIDERS}"
+    )


 def _build_llm_and_embedder(provider: str):
@ -146,14 +174,19 @@ def _get_graphiti() -> Graphiti:
            if _graphiti_instance is None:
                provider = (Config.GRAPHITI_LLM_PROVIDER or "openai").lower()
                logger.info(f"Initializing Graphiti client (provider={provider})...")
+                reranker_provider = (Config.RERANKER_PROVIDER or "ollama").lower()
+                logger.info(
+                    f"Initializing Graphiti reranker (provider={reranker_provider})..."
+                )
                llm_client, embedder = _build_llm_and_embedder(provider)
+                cross_encoder = _build_reranker(reranker_provider)
                g = Graphiti(
                    Config.NEO4J_URI,
                    Config.NEO4J_USER,
                    Config.NEO4J_PASSWORD,
                    llm_client=llm_client,
                    embedder=embedder,
-                    cross_encoder=_PassthroughReranker(),
+                    cross_encoder=cross_encoder,
                )
                # Use the persistent loop so the driver is bound to it from the start
                _run(g.build_indices_and_constraints())
--- a/backend/app/services/ollama_reranker.py
+++ b/backend/app/services/ollama_reranker.py
@ -0,0 +1,170 @@
+"""Ollama-backed cross-encoder reranker for Graphiti search.
+
+Replaces the no-op ``_PassthroughReranker`` as Graphiti's default reranker
+with one that scores passages against a query through an Ollama chat model
+exposed over its OpenAI-compatible ``/v1`` surface.
+
+The class implements only ``CrossEncoderClient.rank`` (the sole abstract
+member Graphiti requires) and is constructed by
+``graphiti_adapter._build_reranker`` (called from ``_get_graphiti``) when
+``Config.RERANKER_PROVIDER == "ollama"``. It performs no network I/O at
+construction time, so the Flask app can boot even when the Ollama daemon is
+unreachable; failures are handled inside ``rank`` and never propagate, so
+graph search remains functional under degradation.
+"""
+
+import asyncio
+import json
+import re
+from typing import List, Tuple
+
+from openai import AsyncOpenAI
+from graphiti_core.cross_encoder.client import CrossEncoderClient
+
+from ..utils.logger import get_logger
+
+logger = get_logger('mirofish.ollama_reranker')
+
+
+_THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>", re.IGNORECASE)
+_CODE_FENCE_START = re.compile(r"^```(?:json)?\s*\n?", re.IGNORECASE)
+_CODE_FENCE_END = re.compile(r"\n?```\s*$")
+_FIRST_FLOAT = re.compile(r"-?\d+(?:\.\d+)?")
+
+_SYSTEM_PROMPT = (
+    "You are a relevance grader. Given a user query and a single passage, "
+    "rate how relevant the passage is to the query on a continuous scale "
+    "from 0.0 (not relevant at all) to 1.0 (perfectly relevant). "
+    "Respond with a single JSON object of the form {\"score\": <float>} "
+    "and nothing else."
+)
+
+
+def _clip_unit(value: float) -> float:
+    """Clamp ``value`` into the closed interval [0.0, 1.0]."""
+    if value < 0.0:
+        return 0.0
+    if value > 1.0:
+        return 1.0
+    return value
+
+
+def _parse_score(raw: str) -> float:
+    """Parse a model response into a relevance score in [0.0, 1.0].
+
+    Strips reasoning ``<think>`` blocks and markdown fences (the same
+    defensive pattern used in ``utils/llm_client.py``), then attempts
+    ``json.loads`` and reads ``score``. Falls back to extracting the first
+    floating-point number from the cleaned text. Raises ``ValueError`` when
+    no numeric value can be recovered.
+    """
+    text = _THINK_BLOCK.sub("", raw or "").strip()
+    text = _CODE_FENCE_START.sub("", text)
+    text = _CODE_FENCE_END.sub("", text).strip()
+
+    try:
+        parsed = json.loads(text)
+    except (json.JSONDecodeError, TypeError):
+        parsed = None
+
+    if isinstance(parsed, dict) and "score" in parsed:
+        try:
+            return _clip_unit(float(parsed["score"]))
+        except (TypeError, ValueError):
+            pass
+
+    match = _FIRST_FLOAT.search(text)
+    if match is not None:
+        try:
+            return _clip_unit(float(match.group(0)))
+        except ValueError:
+            pass
+
+    raise ValueError(f"no numeric score in model response: {text!r}")
+
+
+class OllamaReranker(CrossEncoderClient):
+    """Cross-encoder reranker that scores passages via an Ollama chat model.
+
+    Subclass of :class:`graphiti_core.cross_encoder.client.CrossEncoderClient`
+    that implements ``rank`` by issuing one chat-completion request per
+    passage through ``openai.AsyncOpenAI`` (which speaks the OpenAI-compatible
+    surface exposed by Ollama on ``/v1``).
+
+    Construction is side-effect-free: building the underlying ``AsyncOpenAI``
+    client does not perform any network I/O, so ``_get_graphiti`` can wire
+    this class up at startup even when the Ollama daemon is unavailable.
+    Failures surface only at ``rank`` call time and are degraded to a
+    passthrough-style result with a single ``WARNING`` log per failed call.
+    """
+
+    def __init__(self, *, model: str, base_url: str, api_key: str) -> None:
+        """Configure the reranker.
+
+        Args:
+            model: Name of the Ollama chat model used to score passages
+                (for example ``qwen2.5:3b``). The operator is expected to
+                have run ``ollama pull <model>`` before reranking is exercised.
+            base_url: OpenAI-compatible endpoint for the Ollama server, for
+                example ``http://localhost:11434/v1``.
+            api_key: API key forwarded to the OpenAI client. Ollama ignores
+                the value but the SDK requires a non-empty string.
+        """
+        self._model = model
+        self._client = AsyncOpenAI(base_url=base_url, api_key=api_key)
+
+    async def _score_passage(self, query: str, passage: str, index: int) -> float:
+        """Score a single passage; deterministic low fallback on parse failure."""
+        user_prompt = (
+            f"Query:\n{query}\n\n"
+            f"Passage:\n{passage}\n\n"
+            "Reply with only the JSON object described in the system prompt."
+        )
+        response = await self._client.chat.completions.create(
+            model=self._model,
+            messages=[
+                {"role": "system", "content": _SYSTEM_PROMPT},
+                {"role": "user", "content": user_prompt},
+            ],
+            temperature=0.0,
+            max_tokens=32,
+        )
+        raw = response.choices[0].message.content or ""
+        try:
+            return _parse_score(raw)
+        except ValueError as exc:
+            logger.debug(
+                "Reranker parse failure (model=%s, passage_index=%d): %s",
+                self._model, index, exc,
+            )
+            return -0.001 * (index + 1)
+
+    async def rank(
+        self,
+        query: str,
+        passages: List[str],
+    ) -> List[Tuple[str, float]]:
+        """Return ``(passage, score)`` tuples sorted by score descending.
+
+        Empty ``passages`` returns ``[]`` without any model call. On a
+        whole-call failure (connection refused, model 404, timeout, etc.)
+        the method logs a single ``WARNING`` and returns the passages in
+        their original order with synthetic descending scores so graph
+        search keeps functioning. The method does not raise.
+        """
+        if not passages:
+            return []
+
+        try:
+            scores = await asyncio.gather(
+                *(self._score_passage(query, p, i) for i, p in enumerate(passages))
+            )
+        except Exception as exc:  # noqa: BLE001 — graceful degrade per design R5
+            logger.warning(
+                "Ollama reranker failed (model=%s, error=%s); falling back to passthrough order.",
+                self._model, type(exc).__name__,
+            )
+            return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
+
+        scored = list(zip(passages, scores))
+        scored.sort(key=lambda item: item[1], reverse=True)
+        return scored
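
Reviewer sketch of the two fallback scales used in `ollama_reranker.py`, written as standalone functions so the ordering can be checked without an Ollama daemon. The function names are hypothetical. `None` stands in for "the response could not be parsed", a convention introduced only for this example.

```python
from typing import List, Optional, Tuple


def parse_failure_score(index: int) -> float:
    """Per-passage fallback: always negative, so below any parsed score."""
    return -0.001 * (index + 1)


def order_passages(
    passages: List[str],
    parsed: List[Optional[float]],
) -> List[Tuple[str, float]]:
    """Combine parsed scores with fallbacks and sort by score descending."""
    scores = [
        parse_failure_score(i) if score is None else score
        for i, score in enumerate(parsed)
    ]
    scored = list(zip(passages, scores))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def whole_call_fallback(passages: List[str]) -> List[Tuple[str, float]]:
    """Whole-call degrade: original order with synthetic descending scores."""
    return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


# "a" parses to 0.0, "b" fails to parse, "c" parses to 0.9.
ranked = order_passages(["a", "b", "c"], [0.0, None, 0.9])
ranked_names = [name for name, _ in ranked]  # ["c", "a", "b"]
```

The `index + 1` offset matters for the first passage: without it, a parse failure at index 0 would score `0.0` and tie with a passage the model genuinely graded as irrelevant.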