Merge pull request #40 from salestech-group/fix/39-ollama-reranker

fix(graph): replace passthrough reranker with ollama-backed cross-encoder
This commit is contained in:
Dominik Seemann 2026-05-11 12:43:02 +02:00 committed by GitHub
commit 04a00ac437
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
15 changed files with 1130 additions and 8 deletions

View File

@ -62,7 +62,7 @@ Same upload+build flow; expect identical behaviour to pre-change implementation.
## Notes for reviewers
- **Default provider flipped** from Gemini (de-facto) to OpenAI-compatible (documented). Existing Gemini deployments must add `GRAPHITI_LLM_PROVIDER=gemini` to `.env` after pulling. Documented in the new `.env.example` and design.md migration section.
- **Reranker is still passthrough** — same behavioural state as before (no real reranking). A real per-provider reranker is intentionally deferred; explanation in `research.md` → "Reranker default behaviour".
- **Reranker is still passthrough** — same behavioural state as before (no real reranking). _Update:_ this was deferred from this spec and has since shipped in follow-up spec `graphiti-ollama-reranker` (ticket #39): the default is now an Ollama-backed `CrossEncoderClient`; `RERANKER_PROVIDER=none` preserves the passthrough behaviour described here.
- **`.env.example` write went through Python heredoc** because `pre_tool_env_guard.sh` blocks `cat > .env*` patterns. Worth confirming the file content is what you expect; the new content mirrors the README env section verbatim.
## Spec artefacts

View File

@ -16,7 +16,7 @@
- `.env.example` matches what the code reads; the README is unchanged (already correct).
### Non-Goals
- Implementing a real per-provider reranker (deferred to a follow-up).
- Implementing a real per-provider reranker (deferred to a follow-up — shipped in `graphiti-ollama-reranker`, ticket #39).
- Pagination cleanup of `_NodeNamespace.get_by_graph_id` / `_EdgeNamespace.get_by_graph_id` (low priority, deferred).
- Renaming `zep_*` files (tracked separately).
- Migrating data from existing Zep Cloud deployments (project is local-only by design now).
@ -336,7 +336,7 @@ class _PassthroughReranker(CrossEncoderClient):
**Implementation Notes**
- Integration: Always injected by `_get_graphiti()` regardless of provider.
- Validation: None.
- Risks: Search results are still un-reranked. Same behaviour as today; future ticket may introduce a real per-provider reranker.
- Risks: Search results are still un-reranked. Same behaviour as today; superseded by follow-up spec `graphiti-ollama-reranker` (ticket #39), which introduces a real Ollama-backed reranker and keeps this passthrough only when `RERANKER_PROVIDER=none`.
#### `_get_graphiti()` (refactored)

View File

@ -24,7 +24,7 @@
- **Context**: Ticket suggests dropping `_GeminiReranker` and "letting Graphiti use its sane default." Verify the default is sane for Qwen.
- **Sources Consulted**: `graphiti_core/graphiti.py:154`, `graphiti_core/cross_encoder/openai_reranker_client.py`.
- **Findings**: Default is `OpenAIRerankerClient()` with no config → tries `AsyncOpenAI(api_key=None, base_url=None)` → 401 against any non-OpenAI key. Reranker model is fixed to `gpt-4.1-nano`, which Dashscope does not host.
- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker is out of scope (would need a custom OpenAI-compatible logprobs implementation, which Dashscope/Qwen does not reliably support).
- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker was out of scope for this spec; follow-up spec `graphiti-ollama-reranker` (ticket #39) replaces the passthrough with an Ollama-backed `CrossEncoderClient` and keeps `_PassthroughReranker` only when `RERANKER_PROVIDER=none`.
### Env-guard hook scope
- **Context**: First Read of `.env.example` was blocked.

View File

@ -0,0 +1,53 @@
# Handoff — graphiti-ollama-reranker
## What shipped
| Task | Status | Notes |
|------|--------|-------|
| 1.1 — Config knobs | ✅ | Four `RERANKER_*` attrs added; `BASE_URL`/`API_KEY` chain to `EMBEDDING_*`. |
| 2.1 — `OllamaReranker` | ✅ | New `backend/app/services/ollama_reranker.py`. Construction is side-effect-free; `rank()` never raises; per-passage parse falls back to deterministic low score; whole-call failure degrades to passthrough order with a single WARNING log. |
| 3.1 — Factory wiring | ✅ | `_get_graphiti()` selects the reranker via new `_build_reranker()`. INFO log announces selection. `ValueError` raised for unknown providers. |
| 4.1 — `.env.example` | ⚠️ Deferred | The `pre_tool_env_guard.sh` Claude hook blocks all `.env*` access (Read, Write, Edit, Bash). Cannot be performed inside this autonomous sandbox. **Reviewer action required** — see snippet below. |
| 4.2 — `CLAUDE.md` | ✅ | New `RERANKER_*` block added under "Required Environment Variables". |
| 4.3 — `README.md` | ✅ | Adds `ollama pull qwen2.5:3b` to the prerequisites and a `RERANKER_*` block in the `.env` snippet. `README-EN.md` / `README-ZH.md` left out per design scope (i18n is its own workstream). |
| 4.4 — Prior-spec follow-up note | ✅ | Updated `graphiti-neo4j-finalize`'s `research.md`, `design.md`, and `HANDOFF.md` to point at this spec; updated the `_PassthroughReranker` docstring in `graphiti_adapter.py`. |
| 5.1 — Structural sweep | ✅ | `gpt-4.1-nano` / `OpenAIRerankerClient` referenced only in docstring text. `OllamaReranker` has exactly one import + one use site. `_GraphNamespace.search` still filters by `group_id`. |
## Reviewer action required: `.env.example`
Please paste the following block into `.env.example` alongside the existing `EMBEDDING_*` section:
```env
# Reranker — reorders Graphiti search results before the report tools see them.
# Default targets the same local Ollama host used for embeddings.
# Pre-requisite for the default: `ollama pull qwen2.5:3b`.
# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
# slim containers that cannot pull a reranker model).
RERANKER_PROVIDER=ollama
RERANKER_MODEL=qwen2.5:3b
# Optional — both default to the EMBEDDING_* equivalents when unset.
# RERANKER_BASE_URL=http://localhost:11434/v1
# RERANKER_API_KEY=ollama
```
This block matches what `CLAUDE.md` and `README.md` document. After paste, R6.1 is satisfied and ticket #39's acceptance-criteria checkbox "Configuration is overridable via env vars and documented in `.env.example`" becomes green.
## Verification performed
- `Config` loads with the documented defaults; `EMBEDDING_BASE_URL` override propagates to `RERANKER_BASE_URL`.
- `OllamaReranker` constructs without network I/O; empty `passages` returns `[]`; whole-call failure logs WARNING and returns passthrough-ordered tuples.
- `_build_reranker("ollama")``OllamaReranker`; `("none")``_PassthroughReranker`; `("banana")``ValueError` naming the offender and listing `("ollama", "none")`.
- Grep sweep matches design expectations (see Tasks 5.1 in `tasks.md`).
## Smoke test (recommended before merge)
With Ollama running and the reranker model pulled:
```bash
ollama pull qwen2.5:3b
RERANKER_PROVIDER=ollama npm run backend
# In another shell, exercise a graph build + report tool and confirm:
# - Startup log shows "Initializing Graphiti reranker (provider=ollama)..."
# - Search-backed report tool results differ from `RERANKER_PROVIDER=none` output
# - No WARNING about reranker failure in `backend/logs/`
```

View File

@ -0,0 +1,395 @@
# Design — graphiti-ollama-reranker
## Overview
**Purpose**: Replace the no-op `_PassthroughReranker` injected into Graphiti with a real Ollama-backed `CrossEncoderClient`, so that hybrid search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (`RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY`) with Ollama-aligned defaults; an explicit `RERANKER_PROVIDER=none` preserves the passthrough for CI and slim containers.
**Users**: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.
**Impact**: Adds one new module under `backend/app/services/`, four `Config` attributes, a small selection branch in `_get_graphiti()`, and documentation in `.env.example`, `CLAUDE.md`, `README.md`. No data schema, no API, no UI changes. Behavior under `RERANKER_PROVIDER=none` is identical to today.
### Goals
- Default Ollama-backed reranker producing one `(passage, score)` tuple per input passage, sorted descending by score.
- Env-driven configuration with sensible Ollama defaults inherited from existing `EMBEDDING_*` settings.
- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
- Documentation parity with `EMBEDDING_*` knobs in `.env.example`, `CLAUDE.md`, and `README.md`.
### Non-Goals
- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
- Upstream contributions to `graphiti-core`.
- Adding a `sentence-transformers` or other non-`openai` reranker dependency.
## Boundary Commitments
### This Spec Owns
- The Ollama reranker implementation and its prompt/parse logic.
- The `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY` settings and their defaults.
- The branch in `_get_graphiti()` that selects between the Ollama reranker and the passthrough.
- The startup INFO log line that announces the selected reranker.
- Documentation entries in `.env.example`, `CLAUDE.md` "Required Environment Variables", and `README.md` Ollama prerequisites.
### Out of Boundary
- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
- Per-passage retrieval (still owned by `_GraphNamespace.search` and Graphiti).
- The `group_id` scoping rules.
- Any change to the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) — they receive reranked output transparently.
- Implementation of additional reranker providers; this design covers only `ollama` and `none`.
### Allowed Dependencies
- Upstream library: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0).
- In-repo: `Config` (`backend/app/config.py`), `get_logger` (`backend/app/utils/logger.py`), `openai.AsyncOpenAI` (already installed).
- Existing factory: `_get_graphiti()` continues to be the singleton chokepoint.
### Revalidation Triggers
- If `graphiti-core` changes the `CrossEncoderClient.rank` signature, this design must be revisited.
- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in `research.md`).
- If `Config.GRAPHITI_LLM_PROVIDER` semantics change in a way that re-couples LLM and reranker, this design must be checked.
## Architecture
### Existing Architecture Analysis
- `_get_graphiti()` already injects an explicit `cross_encoder=_PassthroughReranker()` (line 156). The pattern of double-checked-locking singleton with provider switch (`GRAPHITI_LLM_PROVIDER`) is mature and must be preserved.
- The persistent event loop (`_get_loop`, `_run`) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does **not** need to schedule work onto `_get_loop()`.
- All four ReportAgent tools call `_GraphNamespace.search`, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.
### Architecture Pattern & Boundary Map
```mermaid
graph LR
subgraph Config
EnvVars[RERANKER_*\nenv vars]
ConfigCls[Config attributes]
EnvVars --> ConfigCls
end
subgraph Adapter
Factory[_get_graphiti]
Passthrough[_PassthroughReranker]
OllamaCls[OllamaReranker]
Factory -->|provider=none| Passthrough
Factory -->|provider=ollama| OllamaCls
end
subgraph Graphiti
GraphitiCore[Graphiti instance]
Search[_GraphNamespace.search]
Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
end
ConfigCls --> Factory
Passthrough -->|injected as cross_encoder| GraphitiCore
OllamaCls -->|injected as cross_encoder| GraphitiCore
GraphitiCore --> Search
Search --> Tools
OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
```
**Architecture Integration**:
- **Selected pattern**: Strategy pattern with two implementations selected at factory time. Same shape as the existing `GRAPHITI_LLM_PROVIDER` branch.
- **Domain/feature boundaries**: Reranker construction and prompt/parse live in `ollama_reranker.py`. Wiring lives in `graphiti_adapter.py`. Config lives in `config.py`. No overlap.
- **Existing patterns preserved**: Double-checked-locking singleton; explicit `cross_encoder` injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; `Config` reads via `os.environ.get(..., default)`.
- **New components rationale**: `OllamaReranker` is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
- **Steering compliance**: Single OpenAI-SDK convention preserved; per-project `group_id` scoping unaffected; no new dependency.
### Technology Stack
| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Backend / Services | Python ≥3.11, async via `asyncio` | Hosts the new reranker class. | Inherits project minimum. |
| LLM client | `openai` SDK (already pinned, v2.x) | `AsyncOpenAI` chat completions against Ollama's `/v1`. | No new dependency. |
| Model | Ollama-served chat model, default `qwen2.5:3b` | Produces a numeric relevance score per passage. | Operator may override via `RERANKER_MODEL`. |
| Endpoint | Ollama's OpenAI-compatible `/v1` | Default `http://localhost:11434/v1`. | Reuses `EMBEDDING_BASE_URL` semantics. |
| Graph layer | `graphiti-core ≥ 0.3` | Consumes the new `CrossEncoderClient`. | No upstream change. |
## File Structure Plan
### Directory Structure
```
backend/app/
├── services/
│ ├── graphiti_adapter.py # MODIFIED — factory branches on RERANKER_PROVIDER
│ └── ollama_reranker.py # NEW — OllamaReranker(CrossEncoderClient)
├── config.py # MODIFIED — adds RERANKER_* attrs
└── utils/
└── logger.py # unchanged
repo-root/
├── .env.example # MODIFIED — adds RERANKER_* block
├── CLAUDE.md # MODIFIED — Required Environment Variables
└── README.md # MODIFIED — Ollama prerequisites note
```
### Modified Files
- `backend/app/services/graphiti_adapter.py` — Add small branch in `_get_graphiti()` that picks `OllamaReranker()` or `_PassthroughReranker()` based on `Config.RERANKER_PROVIDER`. Log the selection at INFO. `_PassthroughReranker` class is unchanged.
- `backend/app/config.py` — Add four new class attributes with documented defaults. No change to existing `validate()` (reranker has no mandatory key).
- `.env.example` — Add a four-line `RERANKER_*` block with comments mirroring the `EMBEDDING_*` style.
- `CLAUDE.md` — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
- `README.md` — Update the Ollama prerequisite section to mention `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large`.
> `_PassthroughReranker` stays in `graphiti_adapter.py` (unchanged contract); only the wiring around it changes.
## System Flows
```mermaid
sequenceDiagram
participant Search as _GraphNamespace.search
participant Graphiti as graphiti-core
participant Reranker as OllamaReranker.rank
participant Ollama as Ollama /v1/chat/completions
Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
Graphiti->>Graphiti: hybrid retrieval (RRF)
Graphiti->>Reranker: rank(query, [p1..pN])
par per-passage scoring
Reranker->>Ollama: chat.completions(prompt p1, temp=0)
Reranker->>Ollama: chat.completions(prompt p2, temp=0)
Reranker->>Ollama: chat.completions(prompt pN, temp=0)
end
alt all scores parsed
Reranker-->>Graphiti: sorted [(p, score), ...]
else any failure
Reranker->>Reranker: log WARNING, return passthrough order
Reranker-->>Graphiti: original order with synthetic scores
end
Graphiti-->>Search: ranked edges/nodes
Search-->>Tools: ranked results
```
**Decision points after diagram**:
- `temperature=0.0` makes the score deterministic per (query, passage, model) tuple.
- Per-passage failures (one bad parse out of N) downrank that passage to `0.0 - 0.001 * index` and continue; only whole-call exceptions degrade to passthrough.
- The reranker never raises; this isolates Graphiti from upstream noise even when `_GraphNamespace.search`'s existing exception swallow is removed in a future refactor.
## Requirements Traceability
| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1 | Default reranker is Ollama-backed | `_get_graphiti()`, `OllamaReranker` | Inline factory branch | Adapter init |
| 1.2 | No dependency on `OpenAIRerankerClient` | `_get_graphiti()` | Explicit `cross_encoder=` injection (unchanged behavior) | — |
| 1.3 | Unset → defaults to `ollama` | `Config.RERANKER_PROVIDER` | `os.environ.get('RERANKER_PROVIDER', 'ollama')` | — |
| 1.4 | No `gpt-4.1-nano` reference | All new files | — | — |
| 2.1 | Subclass `CrossEncoderClient.rank` | `OllamaReranker` | `async rank(query, passages) -> list[tuple[str, float]]` | Per-passage scoring |
| 2.2 | Uses `openai.AsyncOpenAI` | `OllamaReranker.__init__` | `AsyncOpenAI(base_url, api_key)` | — |
| 2.3 | Returns passages sorted descending | `OllamaReranker.rank` | Postcondition: descending by score | — |
| 2.4 | Empty input → empty output, no model call | `OllamaReranker.rank` | Guard at method entry | — |
| 2.5 | Preserves passage strings byte-for-byte | `OllamaReranker.rank` | Strings are echoed, never rewritten | — |
| 2.6 | Unparseable score → deterministic low fallback | `OllamaReranker.rank` | Internal `_parse_score` helper | Failure branch |
| 3.1 | `RERANKER_PROVIDER` env knob | `Config` | Class attr, default `ollama`, validated `{ollama, none}` | Adapter init |
| 3.2 | `RERANKER_MODEL` env knob | `Config` | Class attr, default `qwen2.5:3b` | — |
| 3.3 | `RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL` | `Config` | Class attr resolves at read time | — |
| 3.4 | `RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY` | `Config` | Class attr | — |
| 3.5 | Unknown value → `ValueError` | `_get_graphiti()` | `_ALLOWED_RERANKER_PROVIDERS` validation | Adapter init |
| 3.6 | Reads via `os.environ.get` only | `Config` | — | — |
| 4.1 | `none` keeps `_PassthroughReranker` | `_get_graphiti()` | Factory branch | Adapter init |
| 4.2 | Graph search remains functional under `none` | `_PassthroughReranker.rank` (unchanged) | — | — |
| 4.3 | INFO log announces selected provider | `_get_graphiti()` | `logger.info` line | Adapter init |
| 5.1 | WARNING log on rerank failure | `OllamaReranker.rank` | `logger.warning` with model + error class | Failure branch |
| 5.2 | No exception propagation to HTTP callers | `OllamaReranker.rank` (never raises) | — | — |
| 5.3 | Original order on whole-call failure | `OllamaReranker.rank` | Passthrough fallback inside method | Failure branch |
| 5.4 | `__init__` never raises | `OllamaReranker.__init__` | `AsyncOpenAI()` lazy I/O | Adapter init |
| 6.1 | `.env.example` documents the four vars | `.env.example` | — | — |
| 6.2 | `CLAUDE.md` lists the four vars | `CLAUDE.md` | — | — |
| 6.3 | `README.md` mentions `ollama pull <model>` | `README.md` | — | — |
| 6.4 | Old "follow-up" claim updated | `graphiti-neo4j-finalize/research.md` (or design.md) | — | — |
| 7.1 | Reranked order reaches `_GraphNamespace.search` | `OllamaReranker`, `_get_graphiti()` | Through Graphiti's own `search()` | End-to-end |
| 7.2 | No changes to report tools | n/a | n/a | — |
| 7.3 | `group_id` scoping unchanged | `_GraphNamespace.search` (unchanged) | — | — |
## Components and Interfaces
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------------|--------|--------------|--------------------------|-----------|
| `OllamaReranker` | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.12.6, 5.15.4, 7.1 | `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1) | Service |
| `Config` (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.13.6, 4.1 | `os.environ.get` (P0) | State (configuration) |
| `_get_graphiti()` (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | `Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0) | Service |
| `.env.example`, `CLAUDE.md`, `README.md` | Docs | Communicate new knobs and Ollama prerequisite. | 6.16.4 | — | — |
---
### Backend / Services
#### `OllamaReranker`
| Field | Detail |
|-------|--------|
| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
| Requirements | 1.1, 1.4, 2.12.6, 5.15.4, 7.1 |
**Responsibilities & Constraints**
- Subclass `graphiti_core.cross_encoder.client.CrossEncoderClient`; implement only `rank`.
- Use `openai.AsyncOpenAI`; no second SDK; no top-level network I/O in `__init__`.
- Preserve passage strings byte-for-byte; never rewrite or truncate.
- Never raise from `rank()`. On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
- Deterministic scoring: `temperature=0.0`, no randomness in fallback scores.
- Thread-safety: stateless beyond the immutable `AsyncOpenAI` client and string config; safe under Graphiti's concurrent search.
**Dependencies**
- Inbound: `_get_graphiti()` — instantiates a single instance and passes it as `cross_encoder=` to `Graphiti(...)` (P0).
- Outbound: `Ollama /v1/chat/completions` via `openai.AsyncOpenAI` (P0).
- External: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai` SDK (P0).
**Contracts**: Service [x]
##### Service Interface
```python
class OllamaReranker(CrossEncoderClient):
def __init__(
self,
*,
model: str,
base_url: str,
api_key: str,
) -> None: ...
async def rank(
self,
query: str,
passages: list[str],
) -> list[tuple[str, float]]:
"""
Score each passage's relevance to `query` and return
`(passage, score)` tuples sorted in descending order of score.
Preconditions:
- `passages` is a (possibly empty) list of strings.
Postconditions:
- len(return) == len(passages).
- return is sorted by score descending.
- For all i, return[i][0] is byte-identical to one of the inputs.
- For any rank() call, this method does not raise.
Invariants:
- Successfully-parsed scores fall in [0.0, 1.0].
- Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
and are strictly less than every successfully-parsed score.
"""
```
**Implementation Notes**
- **Integration**: Constructed inside `_get_graphiti()` when `Config.RERANKER_PROVIDER == "ollama"`; injected into `Graphiti(..., cross_encoder=...)`.
- **Validation**:
- Reject empty `passages` immediately with `return []`.
- Clip parsed `score` to `[0.0, 1.0]`.
- Treat any uncaught per-passage exception as parse failure and assign deterministic fallback `-0.001 * passage_index`.
- Treat any whole-call exception (e.g. connection refused) as graceful degrade: return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]`.
- **Risks**: Default `qwen2.5:3b` must be `ollama pull`-ed by operators; documented in README. If absent, R5 path kicks in.
---
### Backend / Config
#### `Config` (extended)
| Field | Detail |
|-------|--------|
| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
| Requirements | 1.3, 3.13.6, 4.1 |
**Responsibilities & Constraints**
- Read from `os.environ.get` only; no new dependency.
- `RERANKER_PROVIDER` default `ollama`; valid values: `ollama`, `none`.
- `RERANKER_MODEL` default `qwen2.5:3b`.
- `RERANKER_BASE_URL` default = `EMBEDDING_BASE_URL` value at module load time.
- `RERANKER_API_KEY` default = `EMBEDDING_API_KEY` value at module load time.
- Validation of `RERANKER_PROVIDER` happens in `_get_graphiti()` (not `Config.validate()`) to keep the validate-at-boot list focused on credential presence.
**Contracts**: State [x]
##### State Management
- **State model**: Read-only class attributes resolved once at import.
- **Persistence & consistency**: None; values come from environment.
- **Concurrency strategy**: Immutable after import; safe.
**Implementation Notes**
- **Integration**: Defaults for `RERANKER_BASE_URL` / `RERANKER_API_KEY` should reference the corresponding `EMBEDDING_*` env vars (not the resolved `Config.EMBEDDING_BASE_URL` constant) so an operator setting only `EMBEDDING_BASE_URL` still gets the reranker pointed at the same Ollama host without needing to set `RERANKER_BASE_URL` explicitly. Implementation reads `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`.
- **Validation**: None at config-load time. Provider value is validated by `_get_graphiti()`.
- **Risks**: An operator who overrides `EMBEDDING_BASE_URL` but not `RERANKER_BASE_URL` will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
---
### Backend / Adapter
#### `_get_graphiti()` (extended)
| Field | Detail |
|-------|--------|
| Intent | Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice. |
| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |
**Responsibilities & Constraints**
- Preserve double-checked locking and singleton semantics exactly.
- Read `Config.RERANKER_PROVIDER` once at construction; do not re-read.
- For `ollama`: construct `OllamaReranker(model=..., base_url=..., api_key=...)`.
- For `none`: construct `_PassthroughReranker()` (current behavior preserved).
- For any other value: raise `ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')")` — mirrors the existing `_ALLOWED_GRAPHITI_PROVIDERS` validation pattern.
- Log at INFO once: `f"Initializing Graphiti reranker (provider={provider})..."`.
**Contracts**: Service [x]
##### Service Interface
```python
def _get_graphiti() -> Graphiti:
"""Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
```
**Implementation Notes**
- **Integration**: Replaces the unconditional `cross_encoder=_PassthroughReranker()` at `graphiti_adapter.py:156` with a `cross_encoder=_build_reranker(provider)` call. The factory helper lives next to `_build_llm_and_embedder` in the same file.
- **Validation**: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
- **Risks**: A typo such as `RERANKER_PROVIDER=Ollama` (capitalized) would raise; the helper lowercases the value before comparison, matching `_get_graphiti`'s existing `(... or "openai").lower()` pattern.
---
### Documentation
| File | Change | Requirements |
|------|--------|--------------|
| `.env.example` | Add commented block with the four `RERANKER_*` vars and their defaults. Position adjacent to the existing `EMBEDDING_*` block. | 6.1 |
| `CLAUDE.md` | Extend the "Required Environment Variables" code fence under "Architecture" → "Required Environment Variables" with the four new vars and a one-line note about `RERANKER_PROVIDER=none`. | 6.2 |
| `README.md` | In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments. | 6.3 |
| `.kiro/specs/graphiti-neo4j-finalize/research.md` | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |
> README also has `README-EN.md` and `README-ZH.md` — the canonical user-facing README is `README.md` per the existing structure. Other localized READMEs are out of scope unless a quick parity edit fits without translation work; if a Chinese translation already exists for the embedder section, the Chinese README receives the same one-line addition.
## Data Models
Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the `list[tuple[str, float]]` already defined by `CrossEncoderClient.rank`.
## Error Handling
### Error Strategy
- **Construction errors**: None possible (no network in `__init__`; no required keys to validate).
- **Per-passage errors**: Caught inside `OllamaReranker.rank`. Logged at DEBUG once per failed passage (suppress spam). Passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
- **Whole-call errors** (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost `try/except` in `rank`. Logged at WARNING with model name and error class. Returns `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` — same shape as `_PassthroughReranker` so consumers cannot tell the difference structurally.
- **Configuration errors**: `_get_graphiti()` raises `ValueError` at startup if `RERANKER_PROVIDER` is unknown. The Flask app fails to boot — preferred over silent misconfiguration.
### Error Categories and Responses
| Category | Trigger | Response |
|----------|---------|----------|
| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
| User input (4xx-equivalent) | Unknown `RERANKER_PROVIDER` value | `ValueError` at startup; clear message naming allowed values. |
| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |
### Monitoring
- INFO log at startup states the selected provider.
- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
- No metrics surface yet; can be added if the reranker becomes a hot path.
## Testing Strategy
This project intentionally keeps the test surface minimal (`backend/scripts/test_profile_format.py` is the lone pytest target). Per `steering/tech.md`, do **not** add a heavy test harness.
- **Unit-level verification** (manual, by the implementer, no committed test files unless small and clearly worth keeping):
1. Constructing `OllamaReranker` with a bad host does not raise; first `rank()` call logs WARNING and returns passthrough output.
2. `rank(query, [])` returns `[]` and does not call the client.
3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
- **Integration smoke** (manual): With `qwen2.5:3b` pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. `RERANKER_PROVIDER=none`.
- **Boundary verification**: Grep that `gpt-4.1-nano` and `OpenAIRerankerClient` do not appear in any new code path.
## Supporting References
- `research.md` — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
- `gap-analysis.md` — Requirement-to-asset map.
- `.ticket/39.md` — Source ticket text.

View File

@ -0,0 +1,111 @@
# Implementation Gap Analysis — graphiti-ollama-reranker
## 1. Current State Investigation
### Domain Assets
| Asset | Location | Current behavior |
|-------|----------|------------------|
| `_PassthroughReranker` | `backend/app/services/graphiti_adapter.py:38-51` | Subclass of `graphiti_core.cross_encoder.client.CrossEncoderClient`. `rank(query, passages)` returns `(passage, 1.0 - 0.01 * i)` tuples in input order — no model call. |
| Graphiti factory | `backend/app/services/graphiti_adapter.py:142-162` (`_get_graphiti`) | Double-checked-locking singleton. Branches on `Config.GRAPHITI_LLM_PROVIDER` (`openai` / `gemini`). Always injects `_PassthroughReranker()` as `cross_encoder`. Runs `g.build_indices_and_constraints()` on the persistent event loop. |
| LLM/embedder builder | `backend/app/services/graphiti_adapter.py:92-139` (`_build_llm_and_embedder`) | Lazy-imports provider-specific Graphiti classes. Reads `Config.LLM_*` and `Config.EMBEDDING_*`. |
| Config surface | `backend/app/config.py:33-53` | Single class with class attrs; each is `os.environ.get('KEY', 'default')`. Has `EMBEDDING_MODEL`, `EMBEDDING_BASE_URL`, `EMBEDDING_API_KEY` defaults aligned with local Ollama. |
| Graph-search callers | `_GraphNamespace.search` at `graphiti_adapter.py:488-517`; consumed by `zep_tools.py:491` (`ZepToolsService.search_graph`) and `oasis_profile_generator.py:313, 337`. | All call sites already dropped the misleading `reranker=` kwarg in `graphiti-neo4j-finalize`. They invoke `client.graph.search(graph_id, query, limit, scope)` only. |
| Existing LLM wrapper | `backend/app/utils/llm_client.py` | Uses synchronous `OpenAI()` client. Includes reasoning-model `<think>` stripping and a JSON-mode retry. Not directly relevant to the reranker but documents the in-house OpenAI-SDK pattern. |
| Async-loop helper | `graphiti_adapter.py:54-79` (`_get_loop`, `_run`) | Persistent dedicated event-loop thread used for all Graphiti async calls. The reranker's `rank` is **already** awaited by Graphiti itself, not by `_run`, so the new client can use plain `await` on `openai.AsyncOpenAI`. |
### Conventions Observed
- 4-space indent, snake_case, double quotes; English + Chinese mixed in comments — preserve both styles.
- New env vars go into `backend/app/config.py` as class attrs reading from `os.environ.get` with a sensible default. Validation is centralized in `Config.validate()`.
- New backend modules live under `backend/app/services/` with module-level `logger = get_logger('mirofish.<topic>')`.
- The OpenAI SDK is the only LLM client. New providers do not add a second SDK — they add a base-URL + model knob.
- No tests for graph code beyond `scripts/test_profile_format.py`; the project explicitly discourages adding a heavy test harness.
### Integration Surfaces
- **Upstream contract**: `CrossEncoderClient` is consumed by `graphiti_core` during `Graphiti.search()` execution; the framework calls `await reranker.rank(query, passages)` on whatever event loop the caller is using.
- **Inbound integration**: only one wire point — the `cross_encoder=` kwarg on `Graphiti(...)` in `_get_graphiti()` (`graphiti_adapter.py:156`).
- **Outbound integration**: the reranker calls Ollama via `http://localhost:11434/v1/chat/completions` (OpenAI-compatible). Already proven by `EMBEDDING_BASE_URL` for embeddings; Ollama's chat endpoint follows the same surface.
## 2. Requirements Feasibility Analysis
### Requirement-to-Asset Map
| Requirement | Existing assets | New assets needed | Gap tag |
|-------------|-----------------|-------------------|---------|
| R1: Default is Ollama, not OpenAI default | `_get_graphiti()` already injects an explicit reranker (no default fallthrough). | Switch the injected client class based on `RERANKER_PROVIDER`. | Missing (selection logic). |
| R2: Real `CrossEncoderClient` calling Ollama via OpenAI SDK | Pattern proven in `llm_client.py`; `openai` already in `pyproject.toml`. | New `OllamaReranker` class — subclass of `CrossEncoderClient`, uses `openai.AsyncOpenAI` for `rank()`. | Missing. |
| R3: Env knobs (`RERANKER_PROVIDER/MODEL/BASE_URL/API_KEY`) | Config pattern is established (`EMBEDDING_*` etc.). | Four new `Config` attrs, with defaults falling back to embedding settings where stated. | Missing. |
| R4: `none` provider preserves passthrough | `_PassthroughReranker` already exists. | Branch in `_get_graphiti()` to pick passthrough when provider == `none`. | Missing (small). |
| R5: Graceful degradation when Ollama is down | `_GraphNamespace.search` (lines 515-517) already catches all exceptions and returns empty results with a warning log. | Reranker `rank` must catch its own network/parse errors, log them, and return the original passages with synthetic scores so search still returns *something*. | Missing (within new class). |
| R6: Docs (`.env.example`, `CLAUDE.md`, README) | Existing docs already document `EMBEDDING_*` in three places — pattern is clear. | Add 4 new env lines + Ollama pull note. | Missing (text). |
| R7: Report tools get reranked output transparently | `_GraphNamespace.search` is the single chokepoint already used by all 4 tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`). | None — wiring change in factory propagates automatically. | None (verification only). |
### Constraints
- **Async contract**: `CrossEncoderClient.rank` is `async def`. The new client must be async. The OpenAI SDK provides `openai.AsyncOpenAI` for this.
- **Ollama model output shape**: A small chat model (`qwen2.5:3b`, `llama3.2:3b`) can be prompted to emit a numeric score; we cannot rely on `logprobs` because Ollama's OpenAI-compatible surface does not always expose `logprobs`/`logit_bias` consistently. Therefore the scoring strategy is "ask the model for a 010 (or 01) relevance score per passage and parse it from the text response."
- **No new dependency** allowed. Reranker must reuse `openai` SDK (already installed) — confirmed in `backend/.venv/lib/python3.13/site-packages/openai-2.35.1.dist-info/`.
- **Boot must not fail** when Ollama is unreachable (R5.4). Construction is cheap (build an `AsyncOpenAI` client; no network call). The model availability check happens lazily on first `rank()`.
### Complexity Signals
- Mostly a **single file plus config plus docs** change. Algorithmic logic is local to the new class (prompt + parse). No data model changes, no API surface changes, no UI changes.
### Research Needed (Carry into Design)
- **Model choice**: pick a small Ollama chat model that (a) is widely pulled, (b) reliably emits a numeric score in a 12 token answer, (c) is small enough to run on a typical dev machine. Candidates: `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Design phase will fix the default.
- **Scoring strategy**: per-passage call (N calls per query, simple to parse) vs. batched single-call (one prompt with all passages, harder to align). The per-passage approach is simpler and parallelizable via `asyncio.gather`; latency is bounded by the slowest passage. Design will fix the strategy.
- **Output parsing**: prefer JSON output (`{"score": 0.83}`) with markdown-fence stripping (project convention from `llm_client.chat_json`); fall back to regex-extract first float on parse failure.
## 3. Implementation Approach Options
### Option A — Extend `graphiti_adapter.py` In Place
Add the `OllamaReranker` class directly to `graphiti_adapter.py` next to `_PassthroughReranker`, and branch in `_get_graphiti()`.
- **Trade-offs**:
- ✅ Same module owns all reranker wiring and the singleton; one file to read.
- ✅ Smallest diff; matches the file's existing role as "everything Graphiti".
- ❌ Adds prompt/parse logic to an already long (≈545-line) adapter module.
- ❌ Harder to reuse the reranker outside Graphiti (unlikely, but precludes it).
### Option B — Separate Module `backend/app/services/ollama_reranker.py`
New module owns the class and its prompt/parse helpers; `graphiti_adapter.py` imports it and selects it in `_get_graphiti()`.
- **Trade-offs**:
- ✅ Clear single-responsibility module; mirrors the structure suggested in the source ticket #39.
- ✅ Adapter file stays focused on wiring; reranker can be unit-tested in isolation if testing is later added.
- ❌ Slightly more navigation; one extra file in `services/`.
- ❌ The provider-selection branch still lives in the adapter, so two files must agree on the provider string.
### Option C — Hybrid: Provider Registry
Introduce a small `_RERANKER_PROVIDERS` map (`"ollama" -> _build_ollama_reranker`, `"none" -> _PassthroughReranker`) inside `graphiti_adapter.py`, with the actual class still living in a separate `ollama_reranker.py`.
- **Trade-offs**:
- ✅ Adding a future provider (e.g. `sentence_transformers`) is a one-line registry change.
- ✅ Keeps reranker class out of the adapter.
- ❌ Slight over-engineering for two providers (`ollama` + `none`); ticket #39 explicitly scopes only the Ollama path.
## 4. Implementation Complexity & Risk
- **Effort**: **S (13 days)**
- One new class (~80120 lines), four new config attrs (~10 lines), one factory branch (~10 lines), three doc updates (~30 lines). No schema or API changes.
- **Risk**: **Low**
- Established patterns (config, OpenAI SDK, logger).
- `_PassthroughReranker` is preserved exactly for the `none` fallback, so the worst-case behavior is identical to today.
- The graceful-failure path (R5) requires care, but the existing `_GraphNamespace.search` exception handling already insulates HTTP callers from reranker errors.
## 5. Recommendations for Design Phase
- **Preferred approach**: **Option B (separate `ollama_reranker.py` module)**. Best alignment with #39's "implement in `backend/app/services/`", keeps `graphiti_adapter.py` focused on Graphiti wiring, and matches the project's "one concern per module" pattern in `services/`.
- **Key decisions to lock in design**:
1. Default `RERANKER_MODEL` value (recommend `qwen2.5:3b` — small, broadly available on Ollama, reliable at structured short outputs).
2. Per-passage scoring strategy with `asyncio.gather` parallelism (simpler, deterministic).
3. Prompt + parse format: ask for JSON `{"score": <0.0..1.0>}`, strip fences, regex-fallback to first float.
4. Failure mode for a single passage: assign deterministic low score (e.g. `0.0 - 0.001 * i`) so passage still appears once.
5. Failure mode for whole `rank()` call: log warning, return original-order tuples with passthrough scores (no exception bubbles up).
6. Update `.kiro/specs/graphiti-neo4j-finalize/research.md` "follow-up" note to point at this spec (R6.4).
- **Research items carried forward**:
- Confirm `qwen2.5:3b` produces stable JSON scores in benchmark prompts (or pick alternative).
- Decide whether to expose `RERANKER_MAX_PARALLEL` for concurrency limit (default `len(passages)` — likely small, ≤10).

View File

@ -0,0 +1,95 @@
# Requirements Document
## Project Description (Input)
Replace the no-op `_PassthroughReranker` in `backend/app/services/graphiti_adapter.py` with a real reranker that uses an Ollama-available model, so Graphiti search results are properly reranked for the SearchResult / InsightForge / Panorama / Interview report tools. Add `RERANKER_PROVIDER` / `RERANKER_MODEL` / `RERANKER_BASE_URL` env knobs (defaults: ollama / a small Ollama chat model / EMBEDDING_BASE_URL), keep `_PassthroughReranker` only when `RERANKER_PROVIDER=none`, and update `.env.example`, `CLAUDE.md`, and the README accordingly. Source ticket: #39 (.ticket/39.md).
## Introduction
The Graphiti adapter currently injects a `_PassthroughReranker` into the `Graphiti(...)` constructor to bypass the upstream default (`OpenAIRerankerClient` with a hard-coded `gpt-4.1-nano` and OpenAI-specific `logprobs`/`logit_bias`), which would 401 against Qwen/Dashscope keys and is unavailable through Ollama. The passthrough is a no-op: it returns passages in original order with synthetic descending scores, so search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are not actually reranked.
This feature replaces the no-op with a real reranker backed by a model available through the local Ollama stack (matching the existing `EMBEDDING_MODEL=mxbai-embed-large` precedent). A small set of environment variables makes the provider, model, and endpoint overridable. An explicit `none` provider preserves the passthrough behavior for CI / lightweight setups that cannot pull the reranker model.
## Boundary Context
- **In scope**:
- A new `CrossEncoderClient` implementation in `backend/app/services/` that scores passages against a query by calling an Ollama model through its OpenAI-compatible endpoint.
- New `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY` settings in `backend/app/config.py`, with sensible Ollama defaults.
- Provider selection inside `_get_graphiti()` so `ollama` selects the new client and `none` keeps `_PassthroughReranker`.
- Documentation updates in `.env.example`, `CLAUDE.md` (Required Environment Variables), and the project `README.md` (Ollama prerequisites).
- Graceful failure when the configured reranker model is not pulled (clear error, no Flask crash; graph search either falls back to original order or surfaces a logged warning consistent with the existing `_GraphNamespace.search` exception path).
- **Out of scope**:
- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
- Building OpenAI-only or Dashscope-only reranker clients; this spec is specifically the Ollama path (plus the `none` escape hatch).
- Upstream changes to `graphiti-core`.
- Adding any non-Python reranker library (e.g. `sentence-transformers`); the new client must reuse the OpenAI SDK already in the dependency set.
- **Adjacent expectations**:
- `graphiti_adapter._get_graphiti()` continues to be the single Graphiti factory; the new reranker must be wired through it, not at call sites.
- All Graphiti reads remain scoped by `group_id` — the reranker operates on passages already filtered per project; it does not change isolation rules.
- The reranker integrates with `_GraphNamespace.search`, which is the path used by `SearchResult`, `InsightForge`, `Panorama`, and `Interview` tools; behavior changes propagate to those tools automatically and do not need per-tool code changes.
## Requirements
### Requirement 1: Default reranker is Ollama-backed, not the OpenAI default
**Objective:** As a backend developer running MiroFish against the default local Ollama stack, I want Graphiti to rerank search results without requiring an OpenAI key, so that report-tool relevance reflects a real model and not an arbitrary insertion order.
#### Acceptance Criteria
1. The Graphiti Adapter shall instantiate Graphiti with a non-passthrough `CrossEncoderClient` whenever `RERANKER_PROVIDER` resolves to `ollama` (the default).
2. The Graphiti Adapter shall not depend on `graphiti_core.cross_encoder.openai_reranker_client.OpenAIRerankerClient` for the default code path.
3. When `RERANKER_PROVIDER` is unset, the Graphiti Adapter shall behave as if `RERANKER_PROVIDER=ollama`.
4. The Graphiti Adapter shall not reference the model name `gpt-4.1-nano` in any reranker code path.
### Requirement 2: Ollama-backed reranker scores passages via an OpenAI-compatible chat endpoint
**Objective:** As a backend developer, I want a reranker that talks to a locally hosted model so that the local-first stack stays self-contained and no remote LLM key is required.
#### Acceptance Criteria
1. The Ollama Reranker shall expose a class that subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements the asynchronous `rank(query, passages) -> list[tuple[passage, score]]` contract.
2. The Ollama Reranker shall call its configured chat-completions endpoint through the `openai` SDK using `RERANKER_BASE_URL` and `RERANKER_API_KEY`, so no second SDK is introduced.
3. The Ollama Reranker shall return passages sorted by descending score (highest relevance first) with one score per input passage.
4. When `passages` is empty, the Ollama Reranker shall return an empty list without issuing any model call.
5. The Ollama Reranker shall preserve passage strings byte-for-byte; it shall not rewrite, truncate, or reorder content within an individual passage.
6. If the model response cannot be parsed into a numeric score for a passage, the Ollama Reranker shall assign that passage a deterministic fallback score lower than every successfully-parsed score so the passage still appears in the output exactly once.
### Requirement 3: Reranker is configurable via environment variables
**Objective:** As an operator deploying MiroFish, I want to override the reranker provider, model, and endpoint via environment variables so that I can target a different Ollama host, a different model, or disable reranking entirely.
#### Acceptance Criteria
1. The Configuration module shall expose `RERANKER_PROVIDER` with default `ollama` and accept the values `ollama` and `none`.
2. The Configuration module shall expose `RERANKER_MODEL` whose default is a small Ollama-available chat model selected during design (e.g. `qwen2.5:3b` or `llama3.2:3b`).
3. The Configuration module shall expose `RERANKER_BASE_URL` whose default is the value of `EMBEDDING_BASE_URL` (so the same Ollama host is reused by default).
4. The Configuration module shall expose `RERANKER_API_KEY` whose default is the value of `EMBEDDING_API_KEY` (so Ollama's ignored-token default `ollama` works without explicit configuration).
5. If `RERANKER_PROVIDER` is set to a value other than `ollama` or `none`, the Graphiti Adapter shall raise a clear `ValueError` at startup naming the offending value and listing accepted values.
6. The Configuration module shall read all four reranker variables from the process environment via the same `os.environ.get` pattern used by the surrounding settings, with no additional dependencies.
### Requirement 4: `none` provider preserves the passthrough fallback for CI / lightweight setups
**Objective:** As a developer running tests or a slim container that cannot pull the reranker model, I want to disable reranking explicitly so the Flask app still boots and graph search still works.
#### Acceptance Criteria
1. Where `RERANKER_PROVIDER=none`, the Graphiti Adapter shall continue to inject `_PassthroughReranker` and shall not attempt any model call at startup.
2. While `RERANKER_PROVIDER=none`, graph search shall return results in the order Graphiti supplies them with the existing synthetic-descending-score behavior.
3. The Graphiti Adapter shall log at INFO level the selected reranker provider during initialization so operators can confirm whether reranking is active.
### Requirement 5: Graceful degradation when the configured Ollama model is unreachable
**Objective:** As an operator who forgot to run `ollama pull <model>` (or whose Ollama service is down), I want the Flask backend to keep serving requests with a clear log signal rather than crashing.
#### Acceptance Criteria
1. If the Ollama Reranker fails to score passages for a given query (e.g. connection refused, 404 model not found, timeout, or unparseable response), the Graphiti Adapter shall log a warning that names the failing model and the error class.
2. If the Ollama Reranker raises during a `rank` call, the calling `_GraphNamespace.search` shall not propagate the exception to HTTP callers; existing search-error handling already swallows reranker errors into a logged warning, and this behavior shall be preserved.
3. When the Ollama Reranker fails for a query, the rerank-failure path shall return the passages in their original Graphiti order so search remains functional.
4. The Ollama Reranker shall not raise during construction (i.e. `_get_graphiti()` must succeed even if the Ollama service is unavailable); failures are deferred until the first `rank` call.
### Requirement 6: Documentation reflects the new reranker configuration
**Objective:** As a new contributor reading the docs, I want the reranker env vars, defaults, and prerequisites documented in the same places the other LLM/embedder settings live so configuration is discoverable.
#### Acceptance Criteria
1. The Environment Example file (`.env.example`) shall include entries for `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY`, each commented with its default and accepted values.
2. The CLAUDE.md document shall list the four reranker variables in its "Required Environment Variables" section with the same level of detail used for `EMBEDDING_MODEL`.
3. The README.md document shall mention the `ollama pull <reranker model>` prerequisite alongside the existing `ollama pull mxbai-embed-large` note (or wherever Ollama setup is documented).
4. Where the `.kiro/specs/graphiti-neo4j-finalize` documents state that the reranker is a passthrough no-op, those documents shall either be updated to point at this spec or left untouched (decided in design); the constraint is that no documentation shall continue to claim "a real per-provider reranker is a follow-up" once this spec is implemented.
### Requirement 7: Report-tool integration verifies reranked output reaches consumers
**Objective:** As a developer using the ReportAgent tools, I want `SearchResult`, `InsightForge`, `Panorama`, and `Interview` to receive properly reranked edges/nodes so their report output reflects model-judged relevance, not Graphiti's hybrid-search ordering alone.
#### Acceptance Criteria
1. When `RERANKER_PROVIDER=ollama` is active and the configured model is available, the `_GraphNamespace.search` shall return passages whose order is determined by the Ollama Reranker, not Graphiti's default RRF ordering.
2. The ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) shall require no changes for this feature; the rerank improvement reaches them transparently through `_GraphNamespace.search`.
3. While the Ollama Reranker is active, the per-project `group_id` scoping of all Graphiti queries shall remain unchanged.

View File

@ -0,0 +1,112 @@
# Research & Design Decisions — graphiti-ollama-reranker
## Summary
- **Feature**: `graphiti-ollama-reranker`
- **Discovery Scope**: Extension (one new service module + factory branch + config + docs).
- **Key Findings**:
- `CrossEncoderClient.rank(query, passages) -> list[tuple[str, float]]` is the only abstract contract Graphiti requires of the reranker. The existing `_PassthroughReranker` already exercises this contract correctly.
- Ollama's OpenAI-compatible `/v1/chat/completions` endpoint does not reliably expose `logprobs` / `logit_bias`, so Graphiti's default OpenAI scoring approach (binary YES/NO over token logits) cannot be ported. The reranker must use **prompted numeric scoring** with text-output parsing.
- The `openai` SDK already shipped in `backend/.venv` (v2.35.1) exposes `AsyncOpenAI`, which is the right client for the async `rank()` method without introducing any new dependency.
## Research Log
### Graphiti's `CrossEncoderClient` contract
- **Context**: Need to confirm the precise shape of the `rank` interface and any other abstract members.
- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:38-51` (`_PassthroughReranker`); `.kiro/specs/graphiti-neo4j-finalize/research.md` and `gap-analysis.md` (which captured the upstream contract on first integration); ticket #39 narrative.
- **Findings**:
- `_PassthroughReranker` subclasses `CrossEncoderClient` and only overrides `async def rank(query: str, passages: list[str]) -> list[tuple[str, float]]`.
- Graphiti's internal call site (`graphiti_core/graphiti.py:154`) constructs the reranker once and calls `rank` per search. There is no separate batch interface to satisfy.
- Passages are short text snippets (entity-edge facts / node summaries). Typical N per search ≤ 10 (limit defaulted in `_GraphNamespace.search`).
- **Implications**: A drop-in subclass that implements `rank` is sufficient. No additional abstract methods to wire.
### Ollama OpenAI-compatible scoring surface
- **Context**: Decide how to obtain a relevance score per passage from a small Ollama-served chat model.
- **Sources Consulted**: Project-internal `backend/app/utils/llm_client.py` (uses `openai.OpenAI` + `chat.completions.create` against Dashscope / OpenAI / Ollama uniformly); ticket #39 "Proposed approach" section enumerating Ollama chat-model scoring vs. embedding cosine.
- **Findings**:
- Ollama supports `/v1/chat/completions` for chat models like `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Pulling a model is required (`ollama pull <model>`).
- JSON-mode (`response_format={"type": "json_object"}`) is honored by recent Ollama versions but not universally; project convention is to fall back gracefully (cf. `LLMClient.chat_json`).
- Embedding-cosine reranker is feasible (re-embed query and passages with `mxbai-embed-large`) but produces a weaker ordering signal than an LLM that can reason about the question. Picking LLM scoring matches the ticket's preferred path.
- **Implications**:
- Use a chat-completion call per passage with a deterministic temperature (0.0) and a tight system prompt asking for a JSON score in [0.0, 1.0].
- Parse with the same defensive strategy used elsewhere: strip `<think>` blocks, strip markdown fences, attempt `json.loads`, regex-fallback to first float, deterministic low score on hard failure.
### Concurrency strategy
- **Context**: Decide between per-passage parallel calls vs. one batched call.
- **Findings**:
- Per-passage with `asyncio.gather` is simpler to align outputs and resilient — a single bad output only loses one passage's score.
- Single batched prompt requires the model to emit aligned scores (often by index); LLMs occasionally drop entries or misorder them, demanding additional validation.
- With typical `limit ≤ 10`, parallel per-passage calls hit Ollama briefly; on a 3B model this is < 5s for 10 passages.
- **Implications**: Default to per-passage `asyncio.gather`. Expose no extra concurrency knob initially (avoid premature configuration surface; YAGNI per project guidelines).
### Failure semantics
- **Context**: Required by R5 — Flask must keep serving on Ollama outage, and graph search should remain functional.
- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:515-517` (`_GraphNamespace.search` swallows all exceptions and logs a warning); `_get_graphiti()` runs once at first call.
- **Findings**:
- Construction of an `openai.AsyncOpenAI` client does not perform any network I/O. Therefore `OllamaReranker.__init__` can be safe at startup even when Ollama is down.
- If `rank()` itself raises, the upstream `Graphiti.search` may surface the exception. The new reranker should therefore catch its own errors and degrade to passthrough behavior in-method rather than relying on the outer `try/except` in `_GraphNamespace.search`.
- **Implications**: `OllamaReranker.rank` should never raise. On exception or unparseable output it returns the input passages in the original order with passthrough-style synthetic scores and emits a single WARNING log per failure (rate-limited by intent: one log per rank() call).
## Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| A: Add class to `graphiti_adapter.py` | Define `OllamaReranker` next to `_PassthroughReranker` in the same file. | Minimal diff; single file to read. | Bloats an already-long adapter; mixes wiring with provider-specific logic. | — |
| B: New `services/ollama_reranker.py` module | Dedicated module owns prompt + parse + async client; adapter only selects it. | Single-responsibility module; matches ticket suggestion; reusable in isolation. | One extra import in adapter. | **Selected.** Aligns with project pattern of one concern per `services/*` file. |
| C: Hybrid provider registry | Map `RERANKER_PROVIDER → builder` in adapter; class still in B's module. | Future providers are a one-line registry change. | Over-engineering for two providers (`ollama` + `none`). | Deferred until a third provider is needed. |
## Design Decisions
### Decision: Provider selected via env var, branch lives in `_get_graphiti()`
- **Context**: R3 requires env-driven provider selection; only two values supported by this spec (`ollama` and `none`).
- **Alternatives Considered**:
1. Function-pointer registry (Option C).
2. Inline `if/else` in the factory selecting one of two classes.
- **Selected Approach**: Inline branch in `_get_graphiti()` reads `Config.RERANKER_PROVIDER`, picks `_build_ollama_reranker()` or `_PassthroughReranker()`, validates unknown values with a `ValueError` matching the existing `_ALLOWED_GRAPHITI_PROVIDERS` convention.
- **Rationale**: Mirrors the established `GRAPHITI_LLM_PROVIDER` validation pattern (`_ALLOWED_GRAPHITI_PROVIDERS`) without adding speculative abstraction. Two values, two branches.
- **Trade-offs**: Adding a third provider later costs one more `elif`; acceptable.
- **Follow-up**: Surface the selected provider in the INFO startup log so operators can confirm.
### Decision: Per-passage scoring with `asyncio.gather`, no concurrency knob
- **Context**: R2.3 requires one score per passage in descending order; R5 requires graceful per-call failure.
- **Alternatives Considered**:
1. Single batched prompt with index-aligned output.
2. Per-passage call with bounded `Semaphore`.
- **Selected Approach**: Per-passage `asyncio.gather` with no explicit limit; rely on default `limit ≤ 10` in `_GraphNamespace.search`.
- **Rationale**: Simple, deterministic, isolates per-passage failures. Avoids premature configuration knob.
- **Trade-offs**: If a future caller asks for `limit=100`, Ollama may queue 100 requests; acceptable for now because no caller does this.
- **Follow-up**: If real-world rerank latency becomes a concern, add `RERANKER_MAX_PARALLEL` then.
### Decision: Default model = `qwen2.5:3b`
- **Context**: Need a small, broadly-available Ollama chat model that reliably emits a numeric score in 12 tokens.
- **Alternatives Considered**:
1. `qwen2.5:3b` (Apache-2.0, 3B params, strong instruction following).
2. `llama3.2:3b` (Llama community license, 3B).
3. `phi3:3.8b` (MIT, 3.8B).
- **Selected Approach**: `qwen2.5:3b`.
- **Rationale**: Matches the Qwen-family alignment of the rest of the project (`qwen-plus` is the documented LLM default). Apache-2.0 license is permissive. Small enough for typical dev machines.
- **Trade-offs**: Operators on systems without `qwen2.5:3b` must `ollama pull qwen2.5:3b` or override `RERANKER_MODEL`.
- **Follow-up**: README will document `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large` step.
### Decision: Defensive output parsing (`json.loads` → regex float → deterministic low score)
- **Context**: R2.6 requires deterministic handling of unparseable model responses.
- **Selected Approach**:
1. Strip `<think>...</think>` blocks (project convention from `llm_client.py:64`).
2. Strip markdown fences (project convention from `llm_client.chat_json`).
3. `json.loads` and read `score` (float in `[0, 1]`, clipped on out-of-range).
4. On JSON failure, regex-extract the first float token; clip to `[0, 1]`.
5. On total failure, assign `0.0 - 0.001 * passage_index` (deterministic and below any successfully-parsed score).
- **Rationale**: Reuses patterns already in the codebase. Keeps every passage in the output (R2.6).
- **Trade-offs**: One failed parse silently downranks a passage; logged at DEBUG (not WARNING) to avoid log spam.
## Risks & Mitigations
- **Risk**: Ollama service is not running on startup → boot must not fail. **Mitigation**: Construct only `AsyncOpenAI` (no network call) during `__init__`. Defer connectivity to first `rank()`. R5.4.
- **Risk**: Model is not pulled → `rank()` raises 404 from Ollama. **Mitigation**: Catch within `rank()`, log WARNING naming model + error class, return passthrough-ordered tuples so search still works. R5.1, R5.3.
- **Risk**: Operator misconfigures `RERANKER_PROVIDER` to an unknown value → silent fallthrough to wrong reranker. **Mitigation**: `_get_graphiti()` raises `ValueError` listing allowed values, mirroring `_ALLOWED_GRAPHITI_PROVIDERS`. R3.5.
- **Risk**: Multiple concurrent `rank()` calls overwhelm a small local Ollama daemon. **Mitigation**: Accept default Graphiti `limit ≤ 10`; document `RERANKER_MAX_PARALLEL` as a future follow-up if needed.
## References
- `backend/app/services/graphiti_adapter.py:38-51` — current passthrough reranker contract.
- `backend/app/services/graphiti_adapter.py:142-162` — current `_get_graphiti()` wiring point.
- `backend/app/utils/llm_client.py` — project pattern for OpenAI-SDK chat + JSON parsing + reasoning-block stripping.
- `.kiro/specs/graphiti-neo4j-finalize/research.md` — historical context for why the passthrough was introduced.
- Ticket `#39` in `.ticket/39.md` — feature brief and acceptance criteria.

View File

@ -0,0 +1,23 @@
{
"feature_name": "graphiti-ollama-reranker",
"created_at": "2026-05-11T10:24:16Z",
"updated_at": "2026-05-11T10:45:00Z",
"language": "en",
"phase": "tasks-generated",
"approvals": {
"requirements": {
"generated": true,
"approved": true
},
"design": {
"generated": true,
"approved": true
},
"tasks": {
"generated": true,
"approved": true
}
},
"ready_for_implementation": true,
"ticket": 39
}

View File

@ -0,0 +1,89 @@
# Implementation Plan
> Foundation tasks introduce the four `RERANKER_*` configuration knobs.
> Core tasks add the new `OllamaReranker` and the factory selection branch.
> Integration tasks wire documentation parity.
> Validation closes the loop with a structural sweep.
## Foundation
- [x] 1. Add reranker configuration surface
- [x] 1.1 Introduce four `RERANKER_*` settings on the `Config` class
- Add `RERANKER_PROVIDER` with default `ollama`, read via `os.environ.get('RERANKER_PROVIDER', 'ollama')`.
- Add `RERANKER_MODEL` with default `qwen2.5:3b`, read via `os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')`.
- Add `RERANKER_BASE_URL` with default that chains to the embedding host: `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`. Do not reference `Config.EMBEDDING_BASE_URL` directly; use the env-lookup form so behaviour stays consistent under reload patterns.
- Add `RERANKER_API_KEY` with default that chains to the embedding key the same way (`os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))`).
- Do not add the reranker to `Config.validate()`; the provider has no mandatory credentials.
- Observable completion: a Python REPL that imports `Config` shows the four attributes with the documented defaults, and overriding `EMBEDDING_BASE_URL` in the environment is visible on `Config.RERANKER_BASE_URL` too.
- _Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6_
## Core
- [x] 2. Implement the Ollama-backed reranker
- [x] 2.1 Create the new reranker module with the `CrossEncoderClient` subclass
- Define a new module under `backend/app/services/` that hosts the reranker class. The class subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements only the async `rank` method.
- Constructor accepts `model`, `base_url`, `api_key` as keyword arguments; it instantiates `openai.AsyncOpenAI(base_url=..., api_key=...)` but performs no network I/O so the Flask app can boot when Ollama is unreachable.
- `rank(query, passages)` short-circuits on empty `passages` and returns `[]` without any model call.
- For each passage, send a single chat-completion request with `temperature=0.0` and a deterministic system prompt asking for a JSON object `{"score": <0.0..1.0>}` describing the passage's relevance to the query. Use `asyncio.gather` to run all per-passage requests concurrently.
- Parse each model response defensively: strip any `<think>...</think>` block, strip markdown code fences, attempt `json.loads`, fall back to regex-extract the first floating-point number, clip the value to `[0.0, 1.0]`. On any per-passage failure, assign a deterministic fallback score of `-0.001 * passage_index` and log at DEBUG once per failure naming the model and error class. The passage string is echoed byte-for-byte regardless of parse outcome.
- Wrap the whole call in a `try/except`. On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` so search remains functional. The method must not raise.
- Sort the returned list by score descending before returning.
- Observable completion: instantiating the new class with a deliberately bad `base_url` does not raise; an async call to `rank("q", [])` returns `[]`; an async call with two non-empty passages against a reachable Ollama returns two `(passage, float)` tuples in descending-score order, with every input passage byte-identical in the output.
- _Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1_
- _Boundary: OllamaReranker module_
## Integration
- [x] 3. Wire the new reranker into the Graphiti factory
- [x] 3.1 Select the reranker inside `_get_graphiti()` based on `Config.RERANKER_PROVIDER`
- Introduce a small allow-list constant alongside `_ALLOWED_GRAPHITI_PROVIDERS` enumerating `("ollama", "none")`.
- Read `Config.RERANKER_PROVIDER`, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise `ValueError` with a message that names the offending value and lists the accepted values — same shape as the existing `GRAPHITI_LLM_PROVIDER` validation.
- For `ollama`, construct the new `OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY)` and pass it as the `cross_encoder=` argument to `Graphiti(...)`.
- For `none`, continue to pass `_PassthroughReranker()` as today; do not change the passthrough class.
- Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log).
- Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime.
- Observable completion: with `RERANKER_PROVIDER` unset, app startup logs `Initializing Graphiti reranker (provider=ollama)...` and Graphiti is constructed with the `OllamaReranker`. With `RERANKER_PROVIDER=none`, the log reports `none` and Graphiti uses `_PassthroughReranker`. With `RERANKER_PROVIDER=banana`, `_get_graphiti()` raises `ValueError` listing `('ollama', 'none')`.
- _Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3_
- _Depends: 1.1, 2.1_
- [ ] 4. Update operator-facing documentation
- [ ] 4.1 (P) Add the new env knobs to `.env.example` *(deferred — sandbox hook blocks all `.env*` access; see HANDOFF.md)*
- Insert a four-line `RERANKER_*` block adjacent to the existing `EMBEDDING_*` block, mirroring the comment style (default, accepted values, and a one-line note that `RERANKER_PROVIDER=none` disables reranking).
- Observable completion: opening `.env.example` shows the four new variables with documented defaults, positioned next to the embedding block.
- _Requirements: 6.1_
- _Boundary: .env.example_
- _Depends: 1.1_
- [x] 4.2 (P) Extend the `Required Environment Variables` snippet in `CLAUDE.md`
- Add the four `RERANKER_*` variables to the existing fenced code block under "Required Environment Variables" in `CLAUDE.md`, keeping the same comment style used for the `EMBEDDING_*` block.
- Observable completion: `CLAUDE.md` documents the four reranker variables next to the embedding block and includes a note that `RERANKER_PROVIDER=none` keeps the previous passthrough behaviour.
- _Requirements: 6.2_
- _Boundary: CLAUDE.md_
- _Depends: 1.1_
- [x] 4.3 (P) Document the Ollama pull prerequisite and env block in `README.md`
- In the existing "Install Ollama and pull the default embedding model" section, add a parallel `ollama pull qwen2.5:3b` step (or note that the model used for reranking must be pulled, using the documented default).
- In the `.env` snippet under "Configure Environment Variables", add the four `RERANKER_*` lines with brief comments mirroring the embedding-block style.
- Treat `README-EN.md` and `README-ZH.md` translations as out of scope for this ticket — translation belongs to the active i18n workstream and would otherwise drift.
- Observable completion: `README.md` shows the `ollama pull qwen2.5:3b` step and the four reranker env lines in the `.env` snippet.
- _Requirements: 6.3_
- _Boundary: README.md_
- _Depends: 1.1_
- [x] 4.4 (P) Update the stale follow-up claim in the prior spec
- In `.kiro/specs/graphiti-neo4j-finalize/research.md`, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that follow-up has shipped under `graphiti-ollama-reranker`. The constraint is that no remaining documentation continues to claim the reranker remains a deferred passthrough.
- Observable completion: a grep for "real per-provider reranker is a follow-up" across `.kiro/specs/` returns either zero hits or a pointer note to `graphiti-ollama-reranker`.
- _Requirements: 6.4_
- _Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md_
## Validation
- [x] 5. Structural verification sweep
- [x] 5.1 Grep for legacy reranker references and verify the new wiring is reachable
- Grep `backend/app/services/` for `gpt-4.1-nano` and `OpenAIRerankerClient`; both must return zero hits in code paths owned by this spec.
- Grep `backend/app/services/graphiti_adapter.py` for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the `_get_graphiti()` branch).
- Confirm the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) require no source changes by grepping for `client.graph.search(` call sites and verifying the kwarg shape is unchanged.
- Confirm `_GraphNamespace.search` still filters by `group_id` (no regression to project isolation).
- Observable completion: a short verification summary captured during implementation lists each grep outcome with the expected zero / single hit, and the report-tool call sites are unchanged.
- _Requirements: 1.4, 7.1, 7.2, 7.3_
- _Depends: 3.1_

View File

@ -84,6 +84,17 @@ EMBEDDING_API_KEY # Default: "ollama" (Ollama ignores the value)
# nomic-embed-text are not supported.
# Prerequisite for the default: `ollama pull mxbai-embed-large`.
# Reranker (cross-encoder for Graphiti search results)
RERANKER_PROVIDER # Default: ollama (allowed: "ollama", "none")
# "none" keeps the legacy passthrough — useful for CI /
# slim containers that cannot pull a reranker model.
RERANKER_MODEL # Default: qwen2.5:3b (local Ollama chat model)
# Prerequisite for the default: `ollama pull qwen2.5:3b`.
RERANKER_BASE_URL # Default: value of EMBEDDING_BASE_URL
# (typically http://localhost:11434/v1)
RERANKER_API_KEY # Default: value of EMBEDDING_API_KEY
# (Ollama ignores the value)
# Optional — Accelerated LLM (omit entirely if not used)
LLM_BOOST_API_KEY
LLM_BOOST_BASE_URL

View File

@ -137,11 +137,12 @@ neo4j-admin dbms set-initial-password your_neo4j_password
neo4j start
```
**Install Ollama and pull the default embedding model:**
**Install Ollama and pull the default models:**
```bash
# macOS / Linux: https://ollama.com/download
ollama pull mxbai-embed-large
ollama pull mxbai-embed-large # embedder for the knowledge graph
ollama pull qwen2.5:3b # reranker for Graphiti search results
# Ollama serves the OpenAI-compatible /v1 endpoint on http://localhost:11434
# by default — no further configuration required.
```
@ -181,6 +182,17 @@ EMBEDDING_BASE_URL=http://localhost:11434/v1
EMBEDDING_API_KEY=ollama
EMBEDDING_MODEL=mxbai-embed-large
# Reranker — reorders Graphiti search results before the report tools see them.
# Default targets the same local Ollama host used for embeddings.
# Pre-requisite for the default: `ollama pull qwen2.5:3b`.
# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
# slim containers that cannot pull a reranker model).
RERANKER_PROVIDER=ollama
RERANKER_MODEL=qwen2.5:3b
# Optional — both default to the EMBEDDING_* equivalents when unset.
# RERANKER_BASE_URL=http://localhost:11434/v1
# RERANKER_API_KEY=ollama
# Embeddings — remote fallback (uncomment ONE block if you prefer not to run
# Ollama locally). Note: any override must produce 1024-dim vectors to match
# Graphiti's vector index — 768-dim models (e.g. nomic-embed-text) are NOT

View File

@ -52,6 +52,24 @@ class Config:
# to use Google Gemini directly.
GRAPHITI_LLM_PROVIDER = os.environ.get('GRAPHITI_LLM_PROVIDER', 'openai')
# Reranker (cross-encoder) settings. The reranker reorders Graphiti search
# results before they reach the ReportAgent tools. Defaults target the same
# local Ollama host used for embeddings; setting RERANKER_PROVIDER=none
# disables reranking and keeps the legacy passthrough (useful for CI or
# slim containers that cannot pull the reranker model). RERANKER_BASE_URL
# and RERANKER_API_KEY chain through EMBEDDING_BASE_URL / EMBEDDING_API_KEY
# so a single-host Ollama deployment needs no extra configuration.
RERANKER_PROVIDER = os.environ.get('RERANKER_PROVIDER', 'ollama')
RERANKER_MODEL = os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')
RERANKER_BASE_URL = os.environ.get(
'RERANKER_BASE_URL',
os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'),
)
RERANKER_API_KEY = os.environ.get(
'RERANKER_API_KEY',
os.environ.get('EMBEDDING_API_KEY', 'ollama'),
)
# Zep settings (kept for backwards compatibility; deprecated).
ZEP_API_KEY = os.environ.get('ZEP_API_KEY', '')

View File

@ -31,6 +31,7 @@ from graphiti_core.cross_encoder.client import CrossEncoderClient
from ..config import Config
from ..utils.logger import get_logger
from .ollama_reranker import OllamaReranker
logger = get_logger('mirofish.graphiti_adapter')
@ -42,7 +43,9 @@ class _PassthroughReranker(CrossEncoderClient):
descending scores. Injected explicitly so Graphiti does not fall back
to its default ``OpenAIRerankerClient`` (which uses a hard-coded
``gpt-4.1-nano`` model with logprobs and would 401 against Qwen /
Dashscope keys). A real per-provider reranker is a follow-up.
Dashscope keys). Selected when ``Config.RERANKER_PROVIDER == "none"``
useful for CI / slim containers that cannot pull the reranker model.
For real reranking, set ``RERANKER_PROVIDER=ollama`` (the default).
"""
async def rank(self, query: str, passages: list[str]) -> list[tuple[str, float]]:
@ -87,6 +90,31 @@ _graphiti_lock = threading.Lock()
_ALLOWED_GRAPHITI_PROVIDERS = ("openai", "gemini")
_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")
def _build_reranker(provider: str) -> CrossEncoderClient:
"""Build the cross-encoder reranker for the configured provider.
Defers to ``_PassthroughReranker`` when ``provider`` is ``"none"``
(the legacy no-op behaviour, useful for CI / slim containers that
cannot pull the reranker model). For ``"ollama"`` it constructs the
real Ollama-backed reranker; the construction is side-effect-free, so
Graphiti initialisation does not depend on the Ollama daemon being
reachable at startup.
"""
if provider == "none":
return _PassthroughReranker()
if provider == "ollama":
return OllamaReranker(
model=Config.RERANKER_MODEL,
base_url=Config.RERANKER_BASE_URL,
api_key=Config.RERANKER_API_KEY,
)
raise ValueError(
f"Unknown RERANKER_PROVIDER={provider!r}; "
f"allowed: {_ALLOWED_RERANKER_PROVIDERS}"
)
def _build_llm_and_embedder(provider: str):
@ -146,14 +174,19 @@ def _get_graphiti() -> Graphiti:
if _graphiti_instance is None:
provider = (Config.GRAPHITI_LLM_PROVIDER or "openai").lower()
logger.info(f"Initializing Graphiti client (provider={provider})...")
reranker_provider = (Config.RERANKER_PROVIDER or "ollama").lower()
logger.info(
f"Initializing Graphiti reranker (provider={reranker_provider})..."
)
llm_client, embedder = _build_llm_and_embedder(provider)
cross_encoder = _build_reranker(reranker_provider)
g = Graphiti(
Config.NEO4J_URI,
Config.NEO4J_USER,
Config.NEO4J_PASSWORD,
llm_client=llm_client,
embedder=embedder,
cross_encoder=_PassthroughReranker(),
cross_encoder=cross_encoder,
)
# Use the persistent loop so the driver is bound to it from the start
_run(g.build_indices_and_constraints())

View File

@ -0,0 +1,170 @@
"""Ollama-backed cross-encoder reranker for Graphiti search.
Replaces the no-op ``_PassthroughReranker`` injected into Graphiti by default
with a real reranker that scores passages against a query through an Ollama
chat model exposed over its OpenAI-compatible ``/v1`` surface.
The class implements only ``CrossEncoderClient.rank`` (the sole abstract
member Graphiti requires) and is constructed by ``graphiti_adapter._get_graphiti``
when ``Config.RERANKER_PROVIDER == "ollama"``. It does not perform any
network I/O at construction time so the Flask app can boot even when the
Ollama daemon is unreachable; failures are handled inside ``rank`` and never
propagate, so graph search remains functional under degradation.
"""
import asyncio
import json
import re
from typing import List, Tuple
from openai import AsyncOpenAI
from graphiti_core.cross_encoder.client import CrossEncoderClient
from ..utils.logger import get_logger
logger = get_logger('mirofish.ollama_reranker')
_THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>", re.IGNORECASE)
_CODE_FENCE_START = re.compile(r"^```(?:json)?\s*\n?", re.IGNORECASE)
_CODE_FENCE_END = re.compile(r"\n?```\s*$")
_FIRST_FLOAT = re.compile(r"-?\d+(?:\.\d+)?")
_SYSTEM_PROMPT = (
"You are a relevance grader. Given a user query and a single passage, "
"rate how relevant the passage is to the query on a continuous scale "
"from 0.0 (not relevant at all) to 1.0 (perfectly relevant). "
"Respond with a single JSON object of the form {\"score\": <float>} "
"and nothing else."
)
def _clip_unit(value: float) -> float:
"""Clamp ``value`` into the closed interval [0.0, 1.0]."""
if value < 0.0:
return 0.0
if value > 1.0:
return 1.0
return value
def _parse_score(raw: str) -> float:
"""Parse a model response into a relevance score in [0.0, 1.0].
Strips reasoning ``<think>`` blocks and markdown fences (the same
defensive pattern used in ``utils/llm_client.py``), then attempts
``json.loads`` and reads ``score``. Falls back to extracting the first
floating-point number from the cleaned text. Raises ``ValueError`` when
no numeric value can be recovered.
"""
text = _THINK_BLOCK.sub("", raw or "").strip()
text = _CODE_FENCE_START.sub("", text)
text = _CODE_FENCE_END.sub("", text).strip()
try:
parsed = json.loads(text)
except (json.JSONDecodeError, TypeError):
parsed = None
if isinstance(parsed, dict) and "score" in parsed:
try:
return _clip_unit(float(parsed["score"]))
except (TypeError, ValueError):
pass
match = _FIRST_FLOAT.search(text)
if match is not None:
try:
return _clip_unit(float(match.group(0)))
except ValueError:
pass
raise ValueError(f"no numeric score in model response: {text!r}")
class OllamaReranker(CrossEncoderClient):
"""Cross-encoder reranker that scores passages via an Ollama chat model.
Subclass of :class:`graphiti_core.cross_encoder.client.CrossEncoderClient`
that implements ``rank`` by issuing one chat-completion request per
passage through ``openai.AsyncOpenAI`` (which speaks the OpenAI-compatible
surface exposed by Ollama on ``/v1``).
Construction is side-effect-free: building the underlying ``AsyncOpenAI``
client does not perform any network I/O, so ``_get_graphiti`` can wire
this class up at startup even when the Ollama daemon is unavailable.
Failures surface only at ``rank`` call time and are degraded to a
passthrough-style result with a single ``WARNING`` log per failed call.
"""
def __init__(self, *, model: str, base_url: str, api_key: str) -> None:
"""Configure the reranker.
Args:
model: Name of the Ollama chat model used to score passages
(for example ``qwen2.5:3b``). The operator is expected to
have run ``ollama pull <model>`` before reranking is exercised.
base_url: OpenAI-compatible endpoint for the Ollama server, for
example ``http://localhost:11434/v1``.
api_key: API key forwarded to the OpenAI client. Ollama ignores
the value but the SDK requires a non-empty string.
"""
self._model = model
self._client = AsyncOpenAI(base_url=base_url, api_key=api_key)
async def _score_passage(self, query: str, passage: str, index: int) -> float:
"""Score a single passage; deterministic low fallback on parse failure."""
user_prompt = (
f"Query:\n{query}\n\n"
f"Passage:\n{passage}\n\n"
"Reply with only the JSON object described in the system prompt."
)
response = await self._client.chat.completions.create(
model=self._model,
messages=[
{"role": "system", "content": _SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
],
temperature=0.0,
max_tokens=32,
)
raw = response.choices[0].message.content or ""
try:
return _parse_score(raw)
except ValueError as exc:
logger.debug(
"Reranker parse failure (model=%s, passage_index=%d): %s",
self._model, index, exc,
)
return -0.001 * (index + 1)
async def rank(
self,
query: str,
passages: List[str],
) -> List[Tuple[str, float]]:
"""Return ``(passage, score)`` tuples sorted by score descending.
Empty ``passages`` returns ``[]`` without any model call. On a
whole-call failure (connection refused, model 404, timeout, etc.)
the method logs a single ``WARNING`` and returns the passages in
their original order with synthetic descending scores so graph
search keeps functioning. The method does not raise.
"""
if not passages:
return []
try:
scores = await asyncio.gather(
*(self._score_passage(query, p, i) for i, p in enumerate(passages))
)
except Exception as exc: # noqa: BLE001 — graceful degrade per design R5
logger.warning(
"Ollama reranker failed (model=%s, error=%s); falling back to passthrough order.",
self._model, type(exc).__name__,
)
return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
scored = list(zip(passages, scores))
scored.sort(key=lambda item: item[1], reverse=True)
return scored