diff --git a/.kiro/specs/graphiti-neo4j-finalize/HANDOFF.md b/.kiro/specs/graphiti-neo4j-finalize/HANDOFF.md
index eb0219c3..964ff1a2 100644
--- a/.kiro/specs/graphiti-neo4j-finalize/HANDOFF.md
+++ b/.kiro/specs/graphiti-neo4j-finalize/HANDOFF.md
@@ -62,7 +62,7 @@ Same upload+build flow; expect identical behaviour to pre-change implementation.

 ## Notes for reviewers
 - **Default provider flipped** from Gemini (de-facto) to OpenAI-compatible (documented). Existing Gemini deployments must add `GRAPHITI_LLM_PROVIDER=gemini` to `.env` after pulling. Documented in the new `.env.example` and design.md migration section.
-- **Reranker is still passthrough** — same behavioural state as before (no real reranking). A real per-provider reranker is intentionally deferred; explanation in `research.md` → "Reranker default behaviour".
+- **Reranker is still passthrough** — same behavioural state as before (no real reranking). _Update:_ this was deferred from this spec and has since shipped in follow-up spec `graphiti-ollama-reranker` (ticket #39): the default is now an Ollama-backed `CrossEncoderClient`; `RERANKER_PROVIDER=none` preserves the passthrough behaviour described here.
 - **`.env.example` write went through Python heredoc** because `pre_tool_env_guard.sh` blocks `cat > .env*` patterns. Worth confirming the file content is what you expect; the new content mirrors the README env section verbatim.

 ## Spec artefacts
diff --git a/.kiro/specs/graphiti-neo4j-finalize/design.md b/.kiro/specs/graphiti-neo4j-finalize/design.md
index f2a1900c..f6a23673 100644
--- a/.kiro/specs/graphiti-neo4j-finalize/design.md
+++ b/.kiro/specs/graphiti-neo4j-finalize/design.md
@@ -16,7 +16,7 @@
 - `.env.example` matches what the code reads; the README is unchanged (already correct).

 ### Non-Goals
-- Implementing a real per-provider reranker (deferred to a follow-up).
+- Implementing a real per-provider reranker (deferred to a follow-up — shipped in `graphiti-ollama-reranker`, ticket #39).
 - Pagination cleanup of `_NodeNamespace.get_by_graph_id` / `_EdgeNamespace.get_by_graph_id` (low priority, deferred).
 - Renaming `zep_*` files (tracked separately).
 - Migrating data from existing Zep Cloud deployments (project is local-only by design now).
@@ -336,7 +336,7 @@ class _PassthroughReranker(CrossEncoderClient):
 **Implementation Notes**
 - Integration: Always injected by `_get_graphiti()` regardless of provider.
 - Validation: None.
-- Risks: Search results are still un-reranked. Same behaviour as today; future ticket may introduce a real per-provider reranker.
+- Risks: Search results are still un-reranked. Same behaviour as today; superseded by follow-up spec `graphiti-ollama-reranker` (ticket #39), which introduces a real Ollama-backed reranker and keeps this passthrough only when `RERANKER_PROVIDER=none`.

 #### `_get_graphiti()` (refactored)

diff --git a/.kiro/specs/graphiti-neo4j-finalize/research.md b/.kiro/specs/graphiti-neo4j-finalize/research.md
index 0826f360..4f08c5e1 100644
--- a/.kiro/specs/graphiti-neo4j-finalize/research.md
+++ b/.kiro/specs/graphiti-neo4j-finalize/research.md
@@ -24,7 +24,7 @@
 - **Context**: Ticket suggests dropping `_GeminiReranker` and "letting Graphiti use its sane default." Verify the default is sane for Qwen.
 - **Sources Consulted**: `graphiti_core/graphiti.py:154`, `graphiti_core/cross_encoder/openai_reranker_client.py`.
 - **Findings**: Default is `OpenAIRerankerClient()` with no config → tries `AsyncOpenAI(api_key=None, base_url=None)` → 401 against any non-OpenAI key. Reranker model is fixed to `gpt-4.1-nano`, which Dashscope does not host.
-- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker is out of scope (would need a custom OpenAI-compatible logprobs implementation, which Dashscope/Qwen does not reliably support).
+- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker was out of scope for this spec; follow-up spec `graphiti-ollama-reranker` (ticket #39) replaces the passthrough with an Ollama-backed `CrossEncoderClient` and keeps `_PassthroughReranker` only when `RERANKER_PROVIDER=none`.

 ### Env-guard hook scope
 - **Context**: First Read of `.env.example` was blocked.
diff --git a/.kiro/specs/graphiti-ollama-reranker/HANDOFF.md b/.kiro/specs/graphiti-ollama-reranker/HANDOFF.md
new file mode 100644
index 00000000..b2789cca
--- /dev/null
+++ b/.kiro/specs/graphiti-ollama-reranker/HANDOFF.md
@@ -0,0 +1,53 @@
+# Handoff — graphiti-ollama-reranker
+
+## What shipped
+
+| Task | Status | Notes |
+|------|--------|-------|
+| 1.1 — Config knobs | ✅ | Four `RERANKER_*` attrs added; `BASE_URL`/`API_KEY` chain to `EMBEDDING_*`. |
+| 2.1 — `OllamaReranker` | ✅ | New `backend/app/services/ollama_reranker.py`. Construction is side-effect-free; `rank()` never raises; per-passage parse falls back to deterministic low score; whole-call failure degrades to passthrough order with a single WARNING log. |
+| 3.1 — Factory wiring | ✅ | `_get_graphiti()` selects the reranker via new `_build_reranker()`. INFO log announces selection. `ValueError` raised for unknown providers. |
+| 4.1 — `.env.example` | ⚠️ Deferred | The `pre_tool_env_guard.sh` Claude hook blocks all `.env*` access (Read, Write, Edit, Bash). Cannot be performed inside this autonomous sandbox. **Reviewer action required** — see snippet below. |
+| 4.2 — `CLAUDE.md` | ✅ | New `RERANKER_*` block added under "Required Environment Variables". |
+| 4.3 — `README.md` | ✅ | Adds `ollama pull qwen2.5:3b` to the prerequisites and a `RERANKER_*` block in the `.env` snippet. `README-EN.md` / `README-ZH.md` left out per design scope (i18n is its own workstream). |
+| 4.4 — Prior-spec follow-up note | ✅ | Updated `graphiti-neo4j-finalize`'s `research.md`, `design.md`, and `HANDOFF.md` to point at this spec; updated the `_PassthroughReranker` docstring in `graphiti_adapter.py`. |
+| 5.1 — Structural sweep | ✅ | `gpt-4.1-nano` / `OpenAIRerankerClient` referenced only in docstring text. `OllamaReranker` has exactly one import + one use site. `_GraphNamespace.search` still filters by `group_id`. |
+
+## Reviewer action required: `.env.example`
+
+Please paste the following block into `.env.example` alongside the existing `EMBEDDING_*` section:
+
+```env
+# Reranker — reorders Graphiti search results before the report tools see them.
+# Default targets the same local Ollama host used for embeddings.
+# Pre-requisite for the default: `ollama pull qwen2.5:3b`.
+# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
+# slim containers that cannot pull a reranker model).
+RERANKER_PROVIDER=ollama
+RERANKER_MODEL=qwen2.5:3b
+# Optional — both default to the EMBEDDING_* equivalents when unset.
+# RERANKER_BASE_URL=http://localhost:11434/v1
+# RERANKER_API_KEY=ollama
+```
+
+This block matches what `CLAUDE.md` and `README.md` document. After paste, R6.1 is satisfied and ticket #39's acceptance-criteria checkbox "Configuration is overridable via env vars and documented in `.env.example`" becomes green.
+
+## Verification performed
+
+- `Config` loads with the documented defaults; `EMBEDDING_BASE_URL` override propagates to `RERANKER_BASE_URL`.
+- `OllamaReranker` constructs without network I/O; empty `passages` returns `[]`; whole-call failure logs WARNING and returns passthrough-ordered tuples.
+- `_build_reranker("ollama")` → `OllamaReranker`; `("none")` → `_PassthroughReranker`; `("banana")` → `ValueError` naming the offender and listing `("ollama", "none")`.
+- Grep sweep matches design expectations (see Task 5.1 in `tasks.md`).
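The selection rule exercised by the last verification bullets can be sketched as follows. This is a minimal illustration, not the shipped `_build_reranker`: `build_reranker`, `PassthroughRerankerStub` and `OllamaRerankerStub` are hypothetical stand-ins, and the default model, URL and key values are taken from the `.env.example` snippet above.

```python
from typing import Optional

ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


class PassthroughRerankerStub:
    """Hypothetical stand-in for _PassthroughReranker (no model call)."""


class OllamaRerankerStub:
    """Hypothetical stand-in for OllamaReranker; stores config, does no I/O."""

    def __init__(self, *, model: str, base_url: str, api_key: str) -> None:
        self.model = model
        self.base_url = base_url
        self.api_key = api_key


def build_reranker(
    provider: Optional[str],
    *,
    model: str = "qwen2.5:3b",
    base_url: str = "http://localhost:11434/v1",
    api_key: str = "ollama",
):
    """Pick a reranker for a RERANKER_PROVIDER value, or raise ValueError."""
    # Unset falls back to "ollama"; case and surrounding spaces are ignored.
    normalized = (provider or "ollama").strip().lower()
    if normalized not in ALLOWED_RERANKER_PROVIDERS:
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; "
            f"allowed: {ALLOWED_RERANKER_PROVIDERS}"
        )
    if normalized == "none":
        return PassthroughRerankerStub()
    # Construction only stores settings, so a down Ollama host cannot fail boot here.
    return OllamaRerankerStub(model=model, base_url=base_url, api_key=api_key)
```

Lowercasing before the membership check is what lets a value such as `Ollama` through, in line with the `(... or "openai").lower()` convention the design cites for `GRAPHITI_LLM_PROVIDER`.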
+
+## Smoke test (recommended before merge)
+
+With Ollama running and the reranker model pulled:
+
+```bash
+ollama pull qwen2.5:3b
+RERANKER_PROVIDER=ollama npm run backend
+# In another shell, exercise a graph build + report tool and confirm:
+#   - Startup log shows "Initializing Graphiti reranker (provider=ollama)..."
+#   - Search-backed report tool results differ from `RERANKER_PROVIDER=none` output
+#   - No WARNING about reranker failure in `backend/logs/`
+```
diff --git a/.kiro/specs/graphiti-ollama-reranker/design.md b/.kiro/specs/graphiti-ollama-reranker/design.md
new file mode 100644
index 00000000..874f82b6
--- /dev/null
+++ b/.kiro/specs/graphiti-ollama-reranker/design.md
@@ -0,0 +1,395 @@
+# Design — graphiti-ollama-reranker
+
+## Overview
+**Purpose**: Replace the no-op `_PassthroughReranker` injected into Graphiti with a real Ollama-backed `CrossEncoderClient`, so that hybrid search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (`RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY`) with Ollama-aligned defaults; an explicit `RERANKER_PROVIDER=none` preserves the passthrough for CI and slim containers.
+
+**Users**: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.
+
+**Impact**: Adds one new module under `backend/app/services/`, four `Config` attributes, a small selection branch in `_get_graphiti()`, and documentation in `.env.example`, `CLAUDE.md`, `README.md`. No data schema, no API, no UI changes. Behavior under `RERANKER_PROVIDER=none` is identical to today.
+
+### Goals
+- Default Ollama-backed reranker producing one `(passage, score)` tuple per input passage, sorted descending by score.
+- Env-driven configuration with sensible Ollama defaults inherited from existing `EMBEDDING_*` settings.
+- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
+- Documentation parity with `EMBEDDING_*` knobs in `.env.example`, `CLAUDE.md`, and `README.md`.
+
+### Non-Goals
+- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
+- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
+- Upstream contributions to `graphiti-core`.
+- Adding a `sentence-transformers` or other non-`openai` reranker dependency.
+
+## Boundary Commitments
+
+### This Spec Owns
+- The Ollama reranker implementation and its prompt/parse logic.
+- The `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY` settings and their defaults.
+- The branch in `_get_graphiti()` that selects between the Ollama reranker and the passthrough.
+- The startup INFO log line that announces the selected reranker.
+- Documentation entries in `.env.example`, `CLAUDE.md` "Required Environment Variables", and `README.md` Ollama prerequisites.
+
+### Out of Boundary
+- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
+- Per-passage retrieval (still owned by `_GraphNamespace.search` and Graphiti).
+- The `group_id` scoping rules.
+- Any change to the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) — they receive reranked output transparently.
+- Implementation of additional reranker providers; this design covers only `ollama` and `none`.
+
+### Allowed Dependencies
+- Upstream library: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0).
+- In-repo: `Config` (`backend/app/config.py`), `get_logger` (`backend/app/utils/logger.py`), `openai.AsyncOpenAI` (already installed).
+- Existing factory: `_get_graphiti()` continues to be the singleton chokepoint.
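Staying inside the allowed dependencies means one (query, passage) score is a single `chat.completions.create` call on an `openai.AsyncOpenAI` client pointed at Ollama's `/v1`. The sketch below assumes a hypothetical prompt wording and hypothetical helper names (`build_messages`, `score_once`); the spec does not fix either. In production the client would be `AsyncOpenAI(base_url=..., api_key=...)`. Here a `FakeAsyncClient` with the same call shape stands in so the example runs offline.

```python
import asyncio
from types import SimpleNamespace

# Hypothetical prompt wording; the design does not pin the exact text.
SYSTEM_PROMPT = (
    "You rate search relevance. Reply with one number between 0 and 1 "
    "saying how relevant the passage is to the query. Output nothing else."
)


def build_messages(query: str, passage: str) -> list:
    """Chat messages for scoring a single (query, passage) pair."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Query: {query}\n\nPassage: {passage}"},
    ]


async def score_once(client, model: str, query: str, passage: str) -> str:
    """Return the raw reply text for one passage.

    `client` only needs the openai.AsyncOpenAI call shape:
    `await client.chat.completions.create(...)`.
    """
    response = await client.chat.completions.create(
        model=model,
        messages=build_messages(query, passage),
        temperature=0.0,  # temperature 0 for repeatable scores
    )
    return response.choices[0].message.content or ""


class _FakeCompletions:
    """Records calls and returns a canned reply in the SDK's response shape."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = []

    async def create(self, **kwargs):
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class FakeAsyncClient:
    """Offline stand-in exposing client.chat.completions.create like AsyncOpenAI."""

    def __init__(self, reply: str) -> None:
        self.chat = SimpleNamespace(completions=_FakeCompletions(reply))


if __name__ == "__main__":
    fake = FakeAsyncClient("0.82")
    print(asyncio.run(score_once(fake, "qwen2.5:3b", "neo4j", "Graphiti stores edges in Neo4j")))
```

Because the function only depends on the call shape, the real reranker can be exercised in isolation by swapping the client, with no network needed.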
+
+### Revalidation Triggers
+- If `graphiti-core` changes the `CrossEncoderClient.rank` signature, this design must be revisited.
+- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in `research.md`).
+- If `Config.GRAPHITI_LLM_PROVIDER` semantics change in a way that re-couples LLM and reranker, this design must be checked.
+
+## Architecture
+
+### Existing Architecture Analysis
+- `_get_graphiti()` already injects an explicit `cross_encoder=_PassthroughReranker()` (line 156). The pattern of double-checked-locking singleton with provider switch (`GRAPHITI_LLM_PROVIDER`) is mature and must be preserved.
+- The persistent event loop (`_get_loop`, `_run`) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does **not** need to schedule work onto `_get_loop()`.
+- All four ReportAgent tools call `_GraphNamespace.search`, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.
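The second bullet above can be illustrated with a minimal sketch of the persistent-loop pattern. `get_loop`, `run`, `RecordingReranker` and `search` are hypothetical names that mirror the roles of `_get_loop`, `_run`, the reranker and Graphiti's search. They are not the adapter's actual code. Synchronous code submits one coroutine to the dedicated loop. Anything that coroutine `await`s, including `rank()`, runs on that same loop without any extra scheduling.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional

_loop: Optional[asyncio.AbstractEventLoop] = None
_loop_lock = threading.Lock()


def get_loop() -> asyncio.AbstractEventLoop:
    """Create, once, an event loop that runs forever on a daemon thread."""
    global _loop
    with _loop_lock:
        if _loop is None:
            loop = asyncio.new_event_loop()
            threading.Thread(target=loop.run_forever, daemon=True).start()
            _loop = loop
    return _loop


def run(coro: Coroutine[Any, Any, Any]) -> Any:
    """Submit a coroutine from synchronous (Flask) code and block for the result."""
    return asyncio.run_coroutine_threadsafe(coro, get_loop()).result()


class RecordingReranker:
    """Toy reranker that records which loop awaited it."""

    def __init__(self) -> None:
        self.loop_seen: Optional[asyncio.AbstractEventLoop] = None

    async def rank(self, query: str, passages: list) -> list:
        # No scheduling of its own: it simply runs on the caller's loop.
        self.loop_seen = asyncio.get_running_loop()
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


async def search(reranker: RecordingReranker, query: str, passages: list) -> list:
    """Stand-in for a Graphiti search that awaits the cross-encoder inline."""
    return await reranker.rank(query, passages)
```

Calling `run(search(reranker, query, passages))` from a Flask handler leaves `reranker.loop_seen` pointing at the persistent loop. That is why the new reranker can use plain `await` on `AsyncOpenAI`.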
+
+### Architecture Pattern & Boundary Map
+
+```mermaid
+graph LR
+    subgraph Config
+        EnvVars[RERANKER_*\nenv vars]
+        ConfigCls[Config attributes]
+        EnvVars --> ConfigCls
+    end
+
+    subgraph Adapter
+        Factory[_get_graphiti]
+        Passthrough[_PassthroughReranker]
+        OllamaCls[OllamaReranker]
+        Factory -->|provider=none| Passthrough
+        Factory -->|provider=ollama| OllamaCls
+    end
+
+    subgraph Graphiti
+        GraphitiCore[Graphiti instance]
+        Search[_GraphNamespace.search]
+        Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
+    end
+
+    ConfigCls --> Factory
+    Passthrough -->|injected as cross_encoder| GraphitiCore
+    OllamaCls -->|injected as cross_encoder| GraphitiCore
+    GraphitiCore --> Search
+    Search --> Tools
+
+    OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
+```
+
+**Architecture Integration**:
+- **Selected pattern**: Strategy pattern with two implementations selected at factory time. Same shape as the existing `GRAPHITI_LLM_PROVIDER` branch.
+- **Domain/feature boundaries**: Reranker construction and prompt/parse live in `ollama_reranker.py`. Wiring lives in `graphiti_adapter.py`. Config lives in `config.py`. No overlap.
+- **Existing patterns preserved**: Double-checked-locking singleton; explicit `cross_encoder` injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; `Config` reads via `os.environ.get(..., default)`.
+- **New components rationale**: `OllamaReranker` is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
+- **Steering compliance**: Single OpenAI-SDK convention preserved; per-project `group_id` scoping unaffected; no new dependency.
+
+### Technology Stack
+
+| Layer | Choice / Version | Role in Feature | Notes |
+|-------|------------------|-----------------|-------|
+| Backend / Services | Python ≥3.11, async via `asyncio` | Hosts the new reranker class. | Inherits project minimum. |
+| LLM client | `openai` SDK (already pinned, v2.x) | `AsyncOpenAI` chat completions against Ollama's `/v1`. | No new dependency. |
+| Model | Ollama-served chat model, default `qwen2.5:3b` | Produces a numeric relevance score per passage. | Operator may override via `RERANKER_MODEL`. |
+| Endpoint | Ollama's OpenAI-compatible `/v1` | Default `http://localhost:11434/v1`. | Reuses `EMBEDDING_BASE_URL` semantics. |
+| Graph layer | `graphiti-core ≥ 0.3` | Consumes the new `CrossEncoderClient`. | No upstream change. |
+
+## File Structure Plan
+
+### Directory Structure
+```
+backend/app/
+├── services/
+│   ├── graphiti_adapter.py        # MODIFIED — factory branches on RERANKER_PROVIDER
+│   └── ollama_reranker.py         # NEW — OllamaReranker(CrossEncoderClient)
+├── config.py                      # MODIFIED — adds RERANKER_* attrs
+└── utils/
+    └── logger.py                  # unchanged
+
+repo-root/
+├── .env.example                   # MODIFIED — adds RERANKER_* block
+├── CLAUDE.md                      # MODIFIED — Required Environment Variables
+└── README.md                      # MODIFIED — Ollama prerequisites note
+```
+
+### Modified Files
+- `backend/app/services/graphiti_adapter.py` — Add a small branch in `_get_graphiti()` that picks `OllamaReranker()` or `_PassthroughReranker()` based on `Config.RERANKER_PROVIDER`. Log the selection at INFO. The `_PassthroughReranker` class is unchanged.
+- `backend/app/config.py` — Add four new class attributes with documented defaults. No change to the existing `validate()` (the reranker has no mandatory key).
+- `.env.example` — Add a four-line `RERANKER_*` block with comments mirroring the `EMBEDDING_*` style.
+- `CLAUDE.md` — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
+- `README.md` — Update the Ollama prerequisite section to mention `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large`.
+
+> `_PassthroughReranker` stays in `graphiti_adapter.py` (unchanged contract); only the wiring around it changes.
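The `config.py` change listed above could look like the sketch below. It assumes a hypothetical `reranker_settings(env)` helper, so the `EMBEDDING_*` fallback chain can be checked without mutating the process environment. The real `Config` may simply inline the `os.environ.get` calls. The `"ollama"` fallback for the API key is an assumption taken from the `.env.example` snippet.

```python
import os
from typing import Mapping

DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434/v1"


def reranker_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the four RERANKER_* values, chaining to EMBEDDING_* when unset."""
    return {
        "RERANKER_PROVIDER": env.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Read the EMBEDDING_* env var, not a resolved constant, so overriding
        # only EMBEDDING_BASE_URL moves both clients to the same Ollama host.
        "RERANKER_BASE_URL": env.get(
            "RERANKER_BASE_URL",
            env.get("EMBEDDING_BASE_URL", DEFAULT_OLLAMA_BASE_URL),
        ),
        # Assumed fallback "ollama": Ollama ignores the key, but the SDK wants one.
        "RERANKER_API_KEY": env.get(
            "RERANKER_API_KEY", env.get("EMBEDDING_API_KEY", "ollama")
        ),
    }


class Config:
    """Excerpt: class attributes are evaluated once, at import time."""

    _reranker = reranker_settings()
    RERANKER_PROVIDER = _reranker["RERANKER_PROVIDER"]
    RERANKER_MODEL = _reranker["RERANKER_MODEL"]
    RERANKER_BASE_URL = _reranker["RERANKER_BASE_URL"]
    RERANKER_API_KEY = _reranker["RERANKER_API_KEY"]
```

An explicit `RERANKER_BASE_URL` always wins. Otherwise the reranker follows `EMBEDDING_BASE_URL`, which is the single-host behaviour the Config section describes as intentional.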
+
+## System Flows
+
+```mermaid
+sequenceDiagram
+    participant Search as _GraphNamespace.search
+    participant Graphiti as graphiti-core
+    participant Reranker as OllamaReranker.rank
+    participant Ollama as Ollama /v1/chat/completions
+
+    Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
+    Graphiti->>Graphiti: hybrid retrieval (RRF)
+    Graphiti->>Reranker: rank(query, [p1..pN])
+    par per-passage scoring
+        Reranker->>Ollama: chat.completions(prompt p1, temp=0)
+        Reranker->>Ollama: chat.completions(prompt p2, temp=0)
+        Reranker->>Ollama: chat.completions(prompt pN, temp=0)
+    end
+    alt scoring completed (unparseable passages get fallback scores)
+        Reranker-->>Graphiti: sorted [(p, score), ...]
+    else whole-call failure (e.g. Ollama unreachable)
+        Reranker->>Reranker: log WARNING, return passthrough order
+        Reranker-->>Graphiti: original order with synthetic scores
+    end
+    Graphiti-->>Search: ranked edges/nodes
+    Search-->>Tools: ranked results
+```
+
+**Decision points**:
+- `temperature=0.0` makes the score effectively deterministic per (query, passage, model) tuple.
+- Per-passage failures (one bad parse out of N) downrank that passage to `-0.001 * (index + 1)`, strictly below every parsed score, and scoring continues; only whole-call exceptions degrade to passthrough.
+- The reranker never raises; this isolates Graphiti from upstream noise even if `_GraphNamespace.search`'s existing exception swallow is removed in a future refactor.
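The flow above, together with the fallback rules in the decision points, can be condensed into a minimal sketch. The sketch rests on several assumptions. It uses a hypothetical `score_fn(query, passage)` coroutine in place of the real `AsyncOpenAI` call. Its parse rule is a simple first-number regex, since the spec leaves the exact prompt and parse format open. It also treats "every per-passage call failed" as a whole-call failure, which is one reasonable reading of the spec rather than its stated rule. Fallback offsets use `i + 1`, so even the first unparseable passage lands strictly below `0.0`, as the `rank` docstring invariants require.

```python
import asyncio
import logging
import re
from typing import Awaitable, Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("reranker_sketch")

# Hypothetical scorer: returns the model's raw reply for one (query, passage).
Scorer = Callable[[str, str], Awaitable[str]]

_FIRST_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(raw: object) -> Optional[float]:
    """Take the first number in the reply and clip it to [0.0, 1.0]."""
    if not isinstance(raw, str):
        return None
    match = _FIRST_NUMBER.search(raw)
    if match is None:
        return None
    return min(1.0, max(0.0, float(match.group())))


def passthrough(passages: Sequence[str]) -> List[Tuple[str, float]]:
    """Input order with the same synthetic scores as _PassthroughReranker."""
    return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


async def rank(
    query: str, passages: Sequence[str], score_fn: Scorer, model: str = "qwen2.5:3b"
) -> List[Tuple[str, float]]:
    if not passages:
        return []  # empty input: no model call at all
    try:
        # Score all passages concurrently; exceptions come back as values.
        raws = await asyncio.gather(
            *(score_fn(query, p) for p in passages), return_exceptions=True
        )
        failures = [r for r in raws if isinstance(r, BaseException)]
        if len(failures) == len(passages):
            # Every call failed (e.g. Ollama down): treat as whole-call failure.
            logger.warning("Reranker model=%s failed (%s); using passthrough order",
                           model, type(failures[0]).__name__)
            return passthrough(passages)
        scored = []
        for i, (passage, raw) in enumerate(zip(passages, raws)):
            score = parse_score(raw)  # exceptions are not str, so they yield None
            if score is None:
                logger.debug("Unparseable score for passage %d", i)
                score = -0.001 * (i + 1)  # i + 1 keeps passage 0 below 0.0
            scored.append((passage, score))
        # sorted() is stable: ties keep their retrieval order.
        return sorted(scored, key=lambda item: item[1], reverse=True)
    except Exception as exc:  # never let a reranker error escape
        logger.warning("Reranker model=%s failed (%s); using passthrough order",
                       model, type(exc).__name__)
        return passthrough(passages)


def table_scorer(replies: dict) -> Scorer:
    """Offline scorer for demos: raises the mapped value if it is an exception."""
    async def score(query: str, passage: str) -> str:
        reply = replies[passage]
        if isinstance(reply, BaseException):
            raise reply
        return reply
    return score
```

Passage strings are only ever copied into the output tuples, never rebuilt, so the byte-for-byte requirement (2.5) holds by construction.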
+
+## Requirements Traceability
+
+| Requirement | Summary | Components | Interfaces | Flows |
+|-------------|---------|------------|------------|-------|
+| 1.1 | Default reranker is Ollama-backed | `_get_graphiti()`, `OllamaReranker` | Inline factory branch | Adapter init |
+| 1.2 | No dependency on `OpenAIRerankerClient` | `_get_graphiti()` | Explicit `cross_encoder=` injection (unchanged behavior) | — |
+| 1.3 | Unset → defaults to `ollama` | `Config.RERANKER_PROVIDER` | `os.environ.get('RERANKER_PROVIDER', 'ollama')` | — |
+| 1.4 | No `gpt-4.1-nano` reference | All new files | — | — |
+| 2.1 | Subclass `CrossEncoderClient.rank` | `OllamaReranker` | `async rank(query, passages) -> list[tuple[str, float]]` | Per-passage scoring |
+| 2.2 | Uses `openai.AsyncOpenAI` | `OllamaReranker.__init__` | `AsyncOpenAI(base_url, api_key)` | — |
+| 2.3 | Returns passages sorted descending | `OllamaReranker.rank` | Postcondition: descending by score | — |
+| 2.4 | Empty input → empty output, no model call | `OllamaReranker.rank` | Guard at method entry | — |
+| 2.5 | Preserves passage strings byte-for-byte | `OllamaReranker.rank` | Strings are echoed, never rewritten | — |
+| 2.6 | Unparseable score → deterministic low fallback | `OllamaReranker.rank` | Internal `_parse_score` helper | Failure branch |
+| 3.1 | `RERANKER_PROVIDER` env knob | `Config` | Class attr, default `ollama`, validated `{ollama, none}` | Adapter init |
+| 3.2 | `RERANKER_MODEL` env knob | `Config` | Class attr, default `qwen2.5:3b` | — |
+| 3.3 | `RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL` | `Config` | Class attr; falls back to the `EMBEDDING_BASE_URL` env var at import time | — |
+| 3.4 | `RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY` | `Config` | Class attr; falls back to the `EMBEDDING_API_KEY` env var at import time | — |
+| 3.5 | Unknown value → `ValueError` | `_get_graphiti()` | `_ALLOWED_RERANKER_PROVIDERS` validation | Adapter init |
+| 3.6 | Reads via `os.environ.get` only | `Config` | — | — |
+| 4.1 | `none` keeps `_PassthroughReranker` | `_get_graphiti()` | Factory branch | Adapter init |
+| 4.2 | Graph search remains functional under `none` | `_PassthroughReranker.rank` (unchanged) | — | — |
+| 4.3 | INFO log announces selected provider | `_get_graphiti()` | `logger.info` line | Adapter init |
+| 5.1 | WARNING log on rerank failure | `OllamaReranker.rank` | `logger.warning` with model + error class | Failure branch |
+| 5.2 | No exception propagation to HTTP callers | `OllamaReranker.rank` (never raises) | — | — |
+| 5.3 | Original order on whole-call failure | `OllamaReranker.rank` | Passthrough fallback inside method | Failure branch |
+| 5.4 | `__init__` never raises | `OllamaReranker.__init__` | `AsyncOpenAI()` lazy I/O | Adapter init |
+| 6.1 | `.env.example` documents the four vars | `.env.example` | — | — |
+| 6.2 | `CLAUDE.md` lists the four vars | `CLAUDE.md` | — | — |
+| 6.3 | `README.md` mentions `ollama pull ` | `README.md` | — | — |
+| 6.4 | Old "follow-up" claim updated | `graphiti-neo4j-finalize/research.md` (or design.md) | — | — |
+| 7.1 | Reranked order reaches `_GraphNamespace.search` | `OllamaReranker`, `_get_graphiti()` | Through Graphiti's own `search()` | End-to-end |
+| 7.2 | No changes to report tools | n/a | n/a | — |
+| 7.3 | `group_id` scoping unchanged | `_GraphNamespace.search` (unchanged) | — | — |
+
+## Components and Interfaces
+
+| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
+|-----------|--------------|--------|--------------|--------------------------|-----------|
+| `OllamaReranker` | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 | `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1) | Service |
+| `Config` (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.1–3.6, 4.1 | `os.environ.get` (P0) | State (configuration) |
+| `_get_graphiti()` (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | `Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0) | Service |
+| `.env.example`, `CLAUDE.md`, `README.md` | Docs | Communicate new knobs and Ollama prerequisite. | 6.1–6.4 | — | — |
+
+---
+
+### Backend / Services
+
+#### `OllamaReranker`
+
+| Field | Detail |
+|-------|--------|
+| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
+| Requirements | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 |
+
+**Responsibilities & Constraints**
+- Subclass `graphiti_core.cross_encoder.client.CrossEncoderClient`; implement only `rank`.
+- Use `openai.AsyncOpenAI`; no second SDK; no top-level network I/O in `__init__`.
+- Preserve passage strings byte-for-byte; never rewrite or truncate.
+- Never raise from `rank()`. On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
+- Deterministic scoring: `temperature=0.0`, no randomness in fallback scores.
+- Thread-safety: stateless beyond the immutable `AsyncOpenAI` client and string config; safe under Graphiti's concurrent search.
+
+**Dependencies**
+- Inbound: `_get_graphiti()` — instantiates a single instance and passes it as `cross_encoder=` to `Graphiti(...)` (P0).
+- Outbound: `Ollama /v1/chat/completions` via `openai.AsyncOpenAI` (P0).
+- External: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai` SDK (P0).
+
+**Contracts**: Service [x]
+
+##### Service Interface
+
+```python
+class OllamaReranker(CrossEncoderClient):
+    def __init__(
+        self,
+        *,
+        model: str,
+        base_url: str,
+        api_key: str,
+    ) -> None: ...
+
+    async def rank(
+        self,
+        query: str,
+        passages: list[str],
+    ) -> list[tuple[str, float]]:
+        """
+        Score each passage's relevance to `query` and return
+        `(passage, score)` tuples sorted in descending order of score.
+
+        Preconditions:
+          - `passages` is a (possibly empty) list of strings.
+
+        Postconditions:
+          - len(return) == len(passages).
+          - return is sorted by score descending.
+          - Each input passage appears exactly once in return, byte-identical.
+          - rank() never raises; failures degrade as described below.
+
+        Invariants:
+          - Successfully-parsed scores fall in [0.0, 1.0].
+          - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
+            and are strictly less than every successfully-parsed score.
+        """
+```
+
+**Implementation Notes**
+- **Integration**: Constructed inside `_get_graphiti()` when `Config.RERANKER_PROVIDER == "ollama"`; injected into `Graphiti(..., cross_encoder=...)`.
+- **Validation**:
+  - Reject empty `passages` immediately with `return []`.
+  - Clip parsed `score` to `[0.0, 1.0]`.
+  - Treat any uncaught per-passage exception as a parse failure and assign the deterministic fallback `-0.001 * (passage_index + 1)`, which keeps even the first failed passage strictly below 0.0.
+  - Treat any whole-call exception (e.g. connection refused) as graceful degrade: return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]`.
+- **Risks**: Default `qwen2.5:3b` must be `ollama pull`-ed by operators; documented in README. If absent, R5 path kicks in.
+
+---
+
+### Backend / Config
+
+#### `Config` (extended)
+
+| Field | Detail |
+|-------|--------|
+| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
+| Requirements | 1.3, 3.1–3.6, 4.1 |
+
+**Responsibilities & Constraints**
+- Read from `os.environ.get` only; no new dependency.
+- `RERANKER_PROVIDER` default `ollama`; valid values: `ollama`, `none`.
+- `RERANKER_MODEL` default `qwen2.5:3b`.
+- `RERANKER_BASE_URL` default = `EMBEDDING_BASE_URL` value at module load time.
+- `RERANKER_API_KEY` default = `EMBEDDING_API_KEY` value at module load time.
+- Validation of `RERANKER_PROVIDER` happens in `_get_graphiti()` (not `Config.validate()`) to keep the validate-at-boot list focused on credential presence.
+
+**Contracts**: State [x]
+
+##### State Management
+- **State model**: Read-only class attributes resolved once at import.
+- **Persistence & consistency**: None; values come from environment.
+- **Concurrency strategy**: Immutable after import; safe.
+
+**Implementation Notes**
+- **Integration**: Defaults for `RERANKER_BASE_URL` / `RERANKER_API_KEY` should reference the corresponding `EMBEDDING_*` env vars (not the resolved `Config.EMBEDDING_BASE_URL` constant) so an operator setting only `EMBEDDING_BASE_URL` still gets the reranker pointed at the same Ollama host without needing to set `RERANKER_BASE_URL` explicitly. Implementation reads `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`.
+- **Validation**: None at config-load time. Provider value is validated by `_get_graphiti()`.
+- **Risks**: An operator who overrides `EMBEDDING_BASE_URL` but not `RERANKER_BASE_URL` will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
+
+---
+
+### Backend / Adapter
+
+#### `_get_graphiti()` (extended)
+
+| Field | Detail |
+|-------|--------|
+| Intent | Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice. |
+| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |
+
+**Responsibilities & Constraints**
+- Preserve double-checked locking and singleton semantics exactly.
+- Read `Config.RERANKER_PROVIDER` once at construction; do not re-read.
+- For `ollama`: construct `OllamaReranker(model=..., base_url=..., api_key=...)`.
+- For `none`: construct `_PassthroughReranker()` (current behavior preserved).
+- For any other value: raise `ValueError(f"Unknown RERANKER_PROVIDER={provider!r}; allowed: ('ollama', 'none')")` — mirrors the existing `_ALLOWED_GRAPHITI_PROVIDERS` validation pattern.
+- Log at INFO once: `f"Initializing Graphiti reranker (provider={provider})..."`.
+
+**Contracts**: Service [x]
+
+##### Service Interface
+
+```python
+def _get_graphiti() -> Graphiti:
+    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
+```
+
+**Implementation Notes**
+- **Integration**: Replaces the unconditional `cross_encoder=_PassthroughReranker()` at `graphiti_adapter.py:156` with a `cross_encoder=_build_reranker(provider)` call. The factory helper lives next to `_build_llm_and_embedder` in the same file.
+- **Validation**: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
+- **Risks**: A capitalized value such as `RERANKER_PROVIDER=Ollama` would be rejected by a verbatim comparison; the helper therefore lowercases the value before comparison, matching `_get_graphiti`'s existing `(... or "openai").lower()` pattern.
+
+---
+
+### Documentation
+
+| File | Change | Requirements |
+|------|--------|--------------|
+| `.env.example` | Add commented block with the four `RERANKER_*` vars and their defaults. Position adjacent to the existing `EMBEDDING_*` block. | 6.1 |
+| `CLAUDE.md` | Extend the code fence under "Architecture" → "Required Environment Variables" with the four new vars and a one-line note about `RERANKER_PROVIDER=none`. | 6.2 |
+| `README.md` | In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments. | 6.3 |
+| `.kiro/specs/graphiti-neo4j-finalize/research.md` | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |
+
+> The repo also has `README-EN.md` and `README-ZH.md` — the canonical user-facing README is `README.md` per the existing structure. Other localized READMEs are out of scope unless a quick parity edit fits without translation work; if a Chinese translation already exists for the embedder section, the Chinese README receives the same one-line addition.
+
+## Data Models
+Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the `list[tuple[str, float]]` already defined by `CrossEncoderClient.rank`.
+
+## Error Handling
+
+### Error Strategy
+- **Construction errors**: None possible (no network in `__init__`; no required keys to validate).
+- **Per-passage errors**: Caught inside `OllamaReranker.rank`. Logged at DEBUG, once per failed passage (DEBUG rather than WARNING to avoid log spam). Passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
+- **Whole-call errors** (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost `try/except` in `rank`. Logged at WARNING with model name and error class. Returns `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` — same shape as `_PassthroughReranker` so consumers cannot tell the difference structurally.
+- **Configuration errors**: `_get_graphiti()` raises `ValueError` at startup if `RERANKER_PROVIDER` is unknown. The Flask app fails to boot — preferred over silent misconfiguration.
+
+### Error Categories and Responses
+| Category | Trigger | Response |
+|----------|---------|----------|
+| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
+| User input (4xx-equivalent) | Unknown `RERANKER_PROVIDER` value | `ValueError` at startup; clear message naming allowed values. |
+| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |
+
+### Monitoring
+- INFO log at startup states the selected provider.
+- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
+- No metrics surface yet; can be added if the reranker becomes a hot path.
+
+## Testing Strategy
+
+This project intentionally keeps the test surface minimal (`backend/scripts/test_profile_format.py` is the lone pytest target). Per `steering/tech.md`, do **not** add a heavy test harness.
+
+- **Unit-level verification** (manual, by the implementer, no committed test files unless small and clearly worth keeping):
+  1. Constructing `OllamaReranker` with a bad host does not raise; first `rank()` call logs WARNING and returns passthrough output.
+  2. `rank(query, [])` returns `[]` and does not call the client.
+  3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
+  4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
+- **Integration smoke** (manual): With `qwen2.5:3b` pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. `RERANKER_PROVIDER=none`.
+- **Boundary verification**: Grep that `gpt-4.1-nano` and `OpenAIRerankerClient` do not appear in any new code path.
+
+## Supporting References
+- `research.md` — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
+- `gap-analysis.md` — Requirement-to-asset map.
+- `.ticket/39.md` — Source ticket text.
diff --git a/.kiro/specs/graphiti-ollama-reranker/gap-analysis.md b/.kiro/specs/graphiti-ollama-reranker/gap-analysis.md
new file mode 100644
index 00000000..0c3302a0
--- /dev/null
+++ b/.kiro/specs/graphiti-ollama-reranker/gap-analysis.md
@@ -0,0 +1,111 @@
+# Implementation Gap Analysis — graphiti-ollama-reranker
+
+## 1. Current State Investigation
+
+### Domain Assets
+
+| Asset | Location | Current behavior |
+|-------|----------|------------------|
+| `_PassthroughReranker` | `backend/app/services/graphiti_adapter.py:38-51` | Subclass of `graphiti_core.cross_encoder.client.CrossEncoderClient`. `rank(query, passages)` returns `(passage, 1.0 - 0.01 * i)` tuples in input order — no model call. |
+| Graphiti factory | `backend/app/services/graphiti_adapter.py:142-162` (`_get_graphiti`) | Double-checked-locking singleton. Branches on `Config.GRAPHITI_LLM_PROVIDER` (`openai` / `gemini`). Always injects `_PassthroughReranker()` as `cross_encoder`. Runs `g.build_indices_and_constraints()` on the persistent event loop. |
+| LLM/embedder builder | `backend/app/services/graphiti_adapter.py:92-139` (`_build_llm_and_embedder`) | Lazy-imports provider-specific Graphiti classes. Reads `Config.LLM_*` and `Config.EMBEDDING_*`. |
+| Config surface | `backend/app/config.py:33-53` | Single class with class attrs; each is `os.environ.get('KEY', 'default')`. Has `EMBEDDING_MODEL`, `EMBEDDING_BASE_URL`, `EMBEDDING_API_KEY` defaults aligned with local Ollama. |
+| Graph-search callers | `_GraphNamespace.search` at `graphiti_adapter.py:488-517`; consumed by `zep_tools.py:491` (`ZepToolsService.search_graph`) and `oasis_profile_generator.py:313, 337`. | All call sites already dropped the misleading `reranker=` kwarg in `graphiti-neo4j-finalize`. They invoke `client.graph.search(graph_id, query, limit, scope)` only. |
+| Existing LLM wrapper | `backend/app/utils/llm_client.py` | Uses synchronous `OpenAI()` client. Includes reasoning-model `` stripping and a JSON-mode retry. Not directly relevant to the reranker but documents the in-house OpenAI-SDK pattern. |
+| Async-loop helper | `graphiti_adapter.py:54-79` (`_get_loop`, `_run`) | Persistent dedicated event-loop thread used for all Graphiti async calls. The reranker's `rank` is **already** awaited by Graphiti itself, not by `_run`, so the new client can use plain `await` on `openai.AsyncOpenAI`. |
+
+### Conventions Observed
+
+- 4-space indent, snake_case, double quotes; English + Chinese mixed in comments — preserve both styles.
+- New env vars go into `backend/app/config.py` as class attrs reading from `os.environ.get` with a sensible default. Validation is centralized in `Config.validate()`.
+- New backend modules live under `backend/app/services/` with module-level `logger = get_logger('mirofish.')`.
+- The OpenAI SDK is the only LLM client. New providers do not add a second SDK — they add a base-URL + model knob.
+- No tests for graph code beyond `scripts/test_profile_format.py`; the project explicitly discourages adding a heavy test harness.
+
+### Integration Surfaces
+
+- **Upstream contract**: `CrossEncoderClient` is consumed by `graphiti_core` during `Graphiti.search()` execution; the framework calls `await reranker.rank(query, passages)` on whatever event loop the caller is using.
+- **Inbound integration**: only one wire point — the `cross_encoder=` kwarg on `Graphiti(...)` in `_get_graphiti()` (`graphiti_adapter.py:156`).
+- **Outbound integration**: the reranker calls Ollama via `http://localhost:11434/v1/chat/completions` (OpenAI-compatible). Already proven by `EMBEDDING_BASE_URL` for embeddings; Ollama's chat endpoint follows the same surface.
+
+## 2. Requirements Feasibility Analysis
+
+### Requirement-to-Asset Map
+
+| Requirement | Existing assets | New assets needed | Gap tag |
+|-------------|-----------------|-------------------|---------|
+| R1: Default is Ollama, not OpenAI default | `_get_graphiti()` already injects an explicit reranker (no default fallthrough). | Switch the injected client class based on `RERANKER_PROVIDER`. | Missing (selection logic).
| +| R2: Real `CrossEncoderClient` calling Ollama via OpenAI SDK | Pattern proven in `llm_client.py`; `openai` already in `pyproject.toml`. | New `OllamaReranker` class — subclass of `CrossEncoderClient`, uses `openai.AsyncOpenAI` for `rank()`. | Missing. | +| R3: Env knobs (`RERANKER_PROVIDER/MODEL/BASE_URL/API_KEY`) | Config pattern is established (`EMBEDDING_*` etc.). | Four new `Config` attrs, with defaults falling back to embedding settings where stated. | Missing. | +| R4: `none` provider preserves passthrough | `_PassthroughReranker` already exists. | Branch in `_get_graphiti()` to pick passthrough when provider == `none`. | Missing (small). | +| R5: Graceful degradation when Ollama is down | `_GraphNamespace.search` (lines 515-517) already catches all exceptions and returns empty results with a warning log. | Reranker `rank` must catch its own network/parse errors, log them, and return the original passages with synthetic scores so search still returns *something*. | Missing (within new class). | +| R6: Docs (`.env.example`, `CLAUDE.md`, README) | Existing docs already document `EMBEDDING_*` in three places — pattern is clear. | Add 4 new env lines + Ollama pull note. | Missing (text). | +| R7: Report tools get reranked output transparently | `_GraphNamespace.search` is the single chokepoint already used by all 4 tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`). | None — wiring change in factory propagates automatically. | None (verification only). | + +### Constraints + +- **Async contract**: `CrossEncoderClient.rank` is `async def`. The new client must be async. The OpenAI SDK provides `openai.AsyncOpenAI` for this. +- **Ollama model output shape**: A small chat model (`qwen2.5:3b`, `llama3.2:3b`) can be prompted to emit a numeric score; we cannot rely on `logprobs` because Ollama's OpenAI-compatible surface does not always expose `logprobs`/`logit_bias` consistently. 
Therefore the scoring strategy is "ask the model for a 0–10 (or 0–1) relevance score per passage and parse it from the text response." +- **No new dependency** allowed. Reranker must reuse `openai` SDK (already installed) — confirmed in `backend/.venv/lib/python3.13/site-packages/openai-2.35.1.dist-info/`. +- **Boot must not fail** when Ollama is unreachable (R5.4). Construction is cheap (build an `AsyncOpenAI` client; no network call). The model availability check happens lazily on first `rank()`. + +### Complexity Signals + +- Mostly a **single file plus config plus docs** change. Algorithmic logic is local to the new class (prompt + parse). No data model changes, no API surface changes, no UI changes. + +### Research Needed (Carry into Design) + +- **Model choice**: pick a small Ollama chat model that (a) is widely pulled, (b) reliably emits a numeric score in a 1–2 token answer, (c) is small enough to run on a typical dev machine. Candidates: `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Design phase will fix the default. +- **Scoring strategy**: per-passage call (N calls per query, simple to parse) vs. batched single-call (one prompt with all passages, harder to align). The per-passage approach is simpler and parallelizable via `asyncio.gather`; latency is bounded by the slowest passage. Design will fix the strategy. +- **Output parsing**: prefer JSON output (`{"score": 0.83}`) with markdown-fence stripping (project convention from `llm_client.chat_json`); fall back to regex-extract first float on parse failure. + +## 3. Implementation Approach Options + +### Option A — Extend `graphiti_adapter.py` In Place +Add the `OllamaReranker` class directly to `graphiti_adapter.py` next to `_PassthroughReranker`, and branch in `_get_graphiti()`. + +- **Trade-offs**: + - ✅ Same module owns all reranker wiring and the singleton; one file to read. + - ✅ Smallest diff; matches the file's existing role as "everything Graphiti". 
+ - ❌ Adds prompt/parse logic to an already long (≈545-line) adapter module. + - ❌ Harder to reuse the reranker outside Graphiti (unlikely, but precludes it). + +### Option B — Separate Module `backend/app/services/ollama_reranker.py` +New module owns the class and its prompt/parse helpers; `graphiti_adapter.py` imports it and selects it in `_get_graphiti()`. + +- **Trade-offs**: + - ✅ Clear single-responsibility module; mirrors the structure suggested in the source ticket #39. + - ✅ Adapter file stays focused on wiring; reranker can be unit-tested in isolation if testing is later added. + - ❌ Slightly more navigation; one extra file in `services/`. + - ❌ The provider-selection branch still lives in the adapter, so two files must agree on the provider string. + +### Option C — Hybrid: Provider Registry +Introduce a small `_RERANKER_PROVIDERS` map (`"ollama" -> _build_ollama_reranker`, `"none" -> _PassthroughReranker`) inside `graphiti_adapter.py`, with the actual class still living in a separate `ollama_reranker.py`. + +- **Trade-offs**: + - ✅ Adding a future provider (e.g. `sentence_transformers`) is a one-line registry change. + - ✅ Keeps reranker class out of the adapter. + - ❌ Slight over-engineering for two providers (`ollama` + `none`); ticket #39 explicitly scopes only the Ollama path. + +## 4. Implementation Complexity & Risk + +- **Effort**: **S (1–3 days)** + - One new class (~80–120 lines), four new config attrs (~10 lines), one factory branch (~10 lines), three doc updates (~30 lines). No schema or API changes. +- **Risk**: **Low** + - Established patterns (config, OpenAI SDK, logger). + - `_PassthroughReranker` is preserved exactly for the `none` fallback, so the worst-case behavior is identical to today. + - The graceful-failure path (R5) requires care, but the existing `_GraphNamespace.search` exception handling already insulates HTTP callers from reranker errors. + +## 5. 
Recommendations for Design Phase + +- **Preferred approach**: **Option B (separate `ollama_reranker.py` module)**. Best alignment with #39's "implement in `backend/app/services/`", keeps `graphiti_adapter.py` focused on Graphiti wiring, and matches the project's "one concern per module" pattern in `services/`. +- **Key decisions to lock in design**: + 1. Default `RERANKER_MODEL` value (recommend `qwen2.5:3b` — small, broadly available on Ollama, reliable at structured short outputs). + 2. Per-passage scoring strategy with `asyncio.gather` parallelism (simpler, deterministic). + 3. Prompt + parse format: ask for JSON `{"score": <0.0..1.0>}`, strip fences, regex-fallback to first float. + 4. Failure mode for a single passage: assign a deterministic low score strictly below the clipped `[0, 1]` range (e.g. `-0.001 * (i + 1)`; `0.0 - 0.001 * i` would tie with a parsed score of `0.0` at `i = 0`) so the passage still appears exactly once. + 5. Failure mode for whole `rank()` call: log warning, return original-order tuples with passthrough scores (no exception bubbles up). + 6. Update `.kiro/specs/graphiti-neo4j-finalize/research.md` "follow-up" note to point at this spec (R6.4). +- **Research items carried forward**: + - Confirm `qwen2.5:3b` produces stable JSON scores in benchmark prompts (or pick alternative). + - Decide whether to expose `RERANKER_MAX_PARALLEL` for concurrency limit (default `len(passages)` — likely small, ≤10). diff --git a/.kiro/specs/graphiti-ollama-reranker/requirements.md b/.kiro/specs/graphiti-ollama-reranker/requirements.md new file mode 100644 index 00000000..b15ce0ec --- /dev/null +++ b/.kiro/specs/graphiti-ollama-reranker/requirements.md @@ -0,0 +1,95 @@ +# Requirements Document + +## Project Description (Input) +Replace the no-op `_PassthroughReranker` in `backend/app/services/graphiti_adapter.py` with a real reranker that uses an Ollama-available model, so Graphiti search results are properly reranked for the SearchResult / InsightForge / Panorama / Interview report tools.
Add `RERANKER_PROVIDER` / `RERANKER_MODEL` / `RERANKER_BASE_URL` env knobs (defaults: ollama / a small Ollama chat model / EMBEDDING_BASE_URL), keep `_PassthroughReranker` only when `RERANKER_PROVIDER=none`, and update `.env.example`, `CLAUDE.md`, and the README accordingly. Source ticket: #39 (.ticket/39.md). + +## Introduction + +The Graphiti adapter currently injects a `_PassthroughReranker` into the `Graphiti(...)` constructor to bypass the upstream default (`OpenAIRerankerClient` with a hard-coded `gpt-4.1-nano` and OpenAI-specific `logprobs`/`logit_bias`), which would 401 against Qwen/Dashscope keys and is unavailable through Ollama. The passthrough is a no-op: it returns passages in original order with synthetic descending scores, so search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are not actually reranked. + +This feature replaces the no-op with a real reranker backed by a model available through the local Ollama stack (matching the existing `EMBEDDING_MODEL=mxbai-embed-large` precedent). A small set of environment variables makes the provider, model, and endpoint overridable. An explicit `none` provider preserves the passthrough behavior for CI / lightweight setups that cannot pull the reranker model. + +## Boundary Context + +- **In scope**: + - A new `CrossEncoderClient` implementation in `backend/app/services/` that scores passages against a query by calling an Ollama model through its OpenAI-compatible endpoint. + - New `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY` settings in `backend/app/config.py`, with sensible Ollama defaults. + - Provider selection inside `_get_graphiti()` so `ollama` selects the new client and `none` keeps `_PassthroughReranker`. + - Documentation updates in `.env.example`, `CLAUDE.md` (Required Environment Variables), and the project `README.md` (Ollama prerequisites). 
+ - Graceful failure when the configured reranker model is not pulled (clear error, no Flask crash; graph search either falls back to original order or surfaces a logged warning consistent with the existing `_GraphNamespace.search` exception path). +- **Out of scope**: + - Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults. + - Building OpenAI-only or Dashscope-only reranker clients; this spec is specifically the Ollama path (plus the `none` escape hatch). + - Upstream changes to `graphiti-core`. + - Adding any new reranker library (e.g. `sentence-transformers`); the new client must reuse the OpenAI SDK already in the dependency set. +- **Adjacent expectations**: + - `graphiti_adapter._get_graphiti()` continues to be the single Graphiti factory; the new reranker must be wired through it, not at call sites. + - All Graphiti reads remain scoped by `group_id` — the reranker operates on passages already filtered per project; it does not change isolation rules. + - The reranker integrates with `_GraphNamespace.search`, which is the path used by `SearchResult`, `InsightForge`, `Panorama`, and `Interview` tools; behavior changes propagate to those tools automatically and do not need per-tool code changes. + +## Requirements + +### Requirement 1: Default reranker is Ollama-backed, not the OpenAI default +**Objective:** As a backend developer running MiroFish against the default local Ollama stack, I want Graphiti to rerank search results without requiring an OpenAI key, so that report-tool relevance reflects a real model and not an arbitrary insertion order. + +#### Acceptance Criteria +1. The Graphiti Adapter shall instantiate Graphiti with a non-passthrough `CrossEncoderClient` whenever `RERANKER_PROVIDER` resolves to `ollama` (the default). +2. The Graphiti Adapter shall not depend on `graphiti_core.cross_encoder.openai_reranker_client.OpenAIRerankerClient` for the default code path. +3.
When `RERANKER_PROVIDER` is unset, the Graphiti Adapter shall behave as if `RERANKER_PROVIDER=ollama`. +4. The Graphiti Adapter shall not reference the model name `gpt-4.1-nano` in any reranker code path. + +### Requirement 2: Ollama-backed reranker scores passages via an OpenAI-compatible chat endpoint +**Objective:** As a backend developer, I want a reranker that talks to a locally hosted model so that the local-first stack stays self-contained and no remote LLM key is required. + +#### Acceptance Criteria +1. The Ollama Reranker shall expose a class that subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements the asynchronous `rank(query, passages) -> list[tuple[passage, score]]` contract. +2. The Ollama Reranker shall call its configured chat-completions endpoint through the `openai` SDK using `RERANKER_BASE_URL` and `RERANKER_API_KEY`, so no second SDK is introduced. +3. The Ollama Reranker shall return passages sorted by descending score (highest relevance first) with one score per input passage. +4. When `passages` is empty, the Ollama Reranker shall return an empty list without issuing any model call. +5. The Ollama Reranker shall preserve passage strings byte-for-byte; it shall not rewrite, truncate, or reorder content within an individual passage. +6. If the model response cannot be parsed into a numeric score for a passage, the Ollama Reranker shall assign that passage a deterministic fallback score lower than every successfully-parsed score so the passage still appears in the output exactly once. + +### Requirement 3: Reranker is configurable via environment variables +**Objective:** As an operator deploying MiroFish, I want to override the reranker provider, model, and endpoint via environment variables so that I can target a different Ollama host, a different model, or disable reranking entirely. + +#### Acceptance Criteria +1. 
The Configuration module shall expose `RERANKER_PROVIDER` with default `ollama` and accept the values `ollama` and `none`. +2. The Configuration module shall expose `RERANKER_MODEL` whose default is a small Ollama-available chat model selected during design (e.g. `qwen2.5:3b` or `llama3.2:3b`). +3. The Configuration module shall expose `RERANKER_BASE_URL` whose default is the value of `EMBEDDING_BASE_URL` (so the same Ollama host is reused by default). +4. The Configuration module shall expose `RERANKER_API_KEY` whose default is the value of `EMBEDDING_API_KEY` (so Ollama's ignored-token default `ollama` works without explicit configuration). +5. If `RERANKER_PROVIDER` is set to a value other than `ollama` or `none`, the Graphiti Adapter shall raise a clear `ValueError` at startup naming the offending value and listing accepted values. +6. The Configuration module shall read all four reranker variables from the process environment via the same `os.environ.get` pattern used by the surrounding settings, with no additional dependencies. + +### Requirement 4: `none` provider preserves the passthrough fallback for CI / lightweight setups +**Objective:** As a developer running tests or a slim container that cannot pull the reranker model, I want to disable reranking explicitly so the Flask app still boots and graph search still works. + +#### Acceptance Criteria +1. Where `RERANKER_PROVIDER=none`, the Graphiti Adapter shall continue to inject `_PassthroughReranker` and shall not attempt any model call at startup. +2. While `RERANKER_PROVIDER=none`, graph search shall return results in the order Graphiti supplies them with the existing synthetic-descending-score behavior. +3. The Graphiti Adapter shall log at INFO level the selected reranker provider during initialization so operators can confirm whether reranking is active. 
+ +### Requirement 5: Graceful degradation when the configured Ollama model is unreachable +**Objective:** As an operator who forgot to run `ollama pull ` (or whose Ollama service is down), I want the Flask backend to keep serving requests with a clear log signal rather than crashing. + +#### Acceptance Criteria +1. If the Ollama Reranker fails to score passages for a given query (e.g. connection refused, 404 model not found, timeout, or unparseable response), the Graphiti Adapter shall log a warning that names the failing model and the error class. +2. If the Ollama Reranker raises during a `rank` call, the calling `_GraphNamespace.search` shall not propagate the exception to HTTP callers; existing search-error handling already swallows reranker errors into a logged warning, and this behavior shall be preserved. +3. When the Ollama Reranker fails for a query, the rerank-failure path shall return the passages in their original Graphiti order so search remains functional. +4. The Ollama Reranker shall not raise during construction (i.e. `_get_graphiti()` must succeed even if the Ollama service is unavailable); failures are deferred until the first `rank` call. + +### Requirement 6: Documentation reflects the new reranker configuration +**Objective:** As a new contributor reading the docs, I want the reranker env vars, defaults, and prerequisites documented in the same places the other LLM/embedder settings live so configuration is discoverable. + +#### Acceptance Criteria +1. The Environment Example file (`.env.example`) shall include entries for `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY`, each commented with its default and accepted values. +2. The CLAUDE.md document shall list the four reranker variables in its "Required Environment Variables" section with the same level of detail used for `EMBEDDING_MODEL`. +3. 
The README.md document shall mention the `ollama pull ` prerequisite alongside the existing `ollama pull mxbai-embed-large` note (or wherever Ollama setup is documented). +4. Where the `.kiro/specs/graphiti-neo4j-finalize` documents state that the reranker is a passthrough no-op, those documents shall either be updated to point at this spec or left untouched (decided in design); the constraint is that no documentation shall continue to claim "a real per-provider reranker is a follow-up" once this spec is implemented. + +### Requirement 7: Report-tool integration verifies reranked output reaches consumers +**Objective:** As a developer using the ReportAgent tools, I want `SearchResult`, `InsightForge`, `Panorama`, and `Interview` to receive properly reranked edges/nodes so their report output reflects model-judged relevance, not Graphiti's hybrid-search ordering alone. + +#### Acceptance Criteria +1. When `RERANKER_PROVIDER=ollama` is active and the configured model is available, the `_GraphNamespace.search` shall return passages whose order is determined by the Ollama Reranker, not Graphiti's default RRF ordering. +2. The ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) shall require no changes for this feature; the rerank improvement reaches them transparently through `_GraphNamespace.search`. +3. While the Ollama Reranker is active, the per-project `group_id` scoping of all Graphiti queries shall remain unchanged. diff --git a/.kiro/specs/graphiti-ollama-reranker/research.md b/.kiro/specs/graphiti-ollama-reranker/research.md new file mode 100644 index 00000000..f4355f26 --- /dev/null +++ b/.kiro/specs/graphiti-ollama-reranker/research.md @@ -0,0 +1,112 @@ +# Research & Design Decisions — graphiti-ollama-reranker + +## Summary +- **Feature**: `graphiti-ollama-reranker` +- **Discovery Scope**: Extension (one new service module + factory branch + config + docs). 
+- **Key Findings**: + - `CrossEncoderClient.rank(query, passages) -> list[tuple[str, float]]` is the only abstract contract Graphiti requires of the reranker. The existing `_PassthroughReranker` already exercises this contract correctly. + - Ollama's OpenAI-compatible `/v1/chat/completions` endpoint does not reliably expose `logprobs` / `logit_bias`, so Graphiti's default OpenAI scoring approach (binary YES/NO over token logits) cannot be ported. The reranker must use **prompted numeric scoring** with text-output parsing. + - The `openai` SDK already shipped in `backend/.venv` (v2.35.1) exposes `AsyncOpenAI`, which is the right client for the async `rank()` method without introducing any new dependency. + +## Research Log + +### Graphiti's `CrossEncoderClient` contract +- **Context**: Need to confirm the precise shape of the `rank` interface and any other abstract members. +- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:38-51` (`_PassthroughReranker`); `.kiro/specs/graphiti-neo4j-finalize/research.md` and `gap-analysis.md` (which captured the upstream contract on first integration); ticket #39 narrative. +- **Findings**: + - `_PassthroughReranker` subclasses `CrossEncoderClient` and only overrides `async def rank(query: str, passages: list[str]) -> list[tuple[str, float]]`. + - Graphiti's internal call site (`graphiti_core/graphiti.py:154`) constructs the reranker once and calls `rank` per search. There is no separate batch interface to satisfy. + - Passages are short text snippets (entity-edge facts / node summaries). Typical N per search ≤ 10 (limit defaulted in `_GraphNamespace.search`). +- **Implications**: A drop-in subclass that implements `rank` is sufficient. No additional abstract methods to wire. + +### Ollama OpenAI-compatible scoring surface +- **Context**: Decide how to obtain a relevance score per passage from a small Ollama-served chat model. 
+- **Sources Consulted**: Project-internal `backend/app/utils/llm_client.py` (uses `openai.OpenAI` + `chat.completions.create` against Dashscope / OpenAI / Ollama uniformly); ticket #39 "Proposed approach" section enumerating Ollama chat-model scoring vs. embedding cosine. +- **Findings**: + - Ollama supports `/v1/chat/completions` for chat models like `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Pulling a model is required (`ollama pull `). + - JSON-mode (`response_format={"type": "json_object"}`) is honored by recent Ollama versions but not universally; project convention is to fall back gracefully (cf. `LLMClient.chat_json`). + - Embedding-cosine reranker is feasible (re-embed query and passages with `mxbai-embed-large`) but produces a weaker ordering signal than an LLM that can reason about the question. Picking LLM scoring matches the ticket's preferred path. +- **Implications**: + - Use a chat-completion call per passage with a deterministic temperature (0.0) and a tight system prompt asking for a JSON score in [0.0, 1.0]. + - Parse with the same defensive strategy used elsewhere: strip `` blocks, strip markdown fences, attempt `json.loads`, regex-fallback to first float, deterministic low score on hard failure. + +### Concurrency strategy +- **Context**: Decide between per-passage parallel calls vs. one batched call. +- **Findings**: + - Per-passage with `asyncio.gather` is simpler to align outputs and resilient — a single bad output only loses one passage's score. + - Single batched prompt requires the model to emit aligned scores (often by index); LLMs occasionally drop entries or misorder them, demanding additional validation. + - With typical `limit ≤ 10`, parallel per-passage calls hit Ollama briefly; on a 3B model this is < 5s for 10 passages. +- **Implications**: Default to per-passage `asyncio.gather`. Expose no extra concurrency knob initially (avoid premature configuration surface; YAGNI per project guidelines). 
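The scoring-surface implications and the concurrency choice above combine into roughly the following flow. This is a minimal sketch, not the planned `OllamaReranker`: it assumes a caller-supplied coroutine `complete(query, passage)` that performs one chat call and returns the model's raw text, assumes reasoning blocks are `<think>...</think>` tags, and uses `-0.001 * (i + 1)` as the per-passage fallback so an unparseable passage stays strictly below any score clipped to `[0, 1]`. `parse_score`, `fallback_score`, `rank_per_passage`, and `complete` are hypothetical names.

```python
import asyncio
import json
import math
import re
from typing import Awaitable, Callable, List, Optional, Tuple

# Three backticks: the markdown code-fence marker models sometimes wrap JSON in.
_TICKS = "`" * 3
# Assumption: reasoning models wrap hidden reasoning in <think>...</think>.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Leading fence (optionally tagged "json") and trailing fence.
_FENCE_RE = re.compile("^" + _TICKS + r"(?:json)?\s*|\s*" + _TICKS + "$", re.IGNORECASE)
# First float-looking token, e.g. "0.83", ".5", "-1", "1e-2".
_FLOAT_RE = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")

# Hypothetical: one chat-completions call per (query, passage) -> raw model text.
Complete = Callable[[str, str], Awaitable[str]]


def _clip(value: float) -> Optional[float]:
    """Clip to [0, 1]; reject NaN/inf so they can never outrank real scores."""
    if not math.isfinite(value):
        return None
    return max(0.0, min(1.0, value))


def parse_score(text: str) -> Optional[float]:
    """json.loads -> regex float -> None, after stripping think blocks and fences."""
    cleaned = _THINK_RE.sub("", text).strip()
    cleaned = _FENCE_RE.sub("", cleaned).strip()
    try:
        payload = json.loads(cleaned)
        if isinstance(payload, dict) and "score" in payload:
            score = _clip(float(payload["score"]))
            if score is not None:
                return score
    except (ValueError, TypeError):
        pass  # json.JSONDecodeError is a ValueError; float() may raise either
    match = _FLOAT_RE.search(cleaned)
    if match:
        return _clip(float(match.group()))
    return None


def fallback_score(index: int) -> float:
    """Deterministic, distinct per passage, and strictly below any clipped score."""
    return -0.001 * (index + 1)


async def rank_per_passage(
    query: str, passages: List[str], complete: Complete
) -> List[Tuple[str, float]]:
    if not passages:
        return []  # R2.4: no model call for an empty input
    # One request per passage, run concurrently. Network errors are not caught
    # here; they propagate so a whole-call handler can degrade to passthrough.
    texts = await asyncio.gather(*(complete(query, p) for p in passages))
    scored = []
    for i, (passage, text) in enumerate(zip(passages, texts)):
        score = parse_score(text)
        # Passage strings are returned as-is (R2.5); unparseable ones sink (R2.6).
        scored.append((passage, score if score is not None else fallback_score(i)))
    # sorted() is stable even with reverse=True, so equal scores keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Only unparseable text takes the per-passage fallback; exceptions from `complete` (connection refused, 404, timeout) are left uncaught on purpose, matching the split between per-passage and whole-call errors in the design.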
+ +### Failure semantics +- **Context**: Required by R5 — Flask must keep serving on Ollama outage, and graph search should remain functional. +- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:515-517` (`_GraphNamespace.search` swallows all exceptions and logs a warning); `_get_graphiti()` runs once at first call. +- **Findings**: + - Construction of an `openai.AsyncOpenAI` client does not perform any network I/O. Therefore `OllamaReranker.__init__` can be safe at startup even when Ollama is down. + - If `rank()` itself raises, the upstream `Graphiti.search` may surface the exception. The new reranker should therefore catch its own errors and degrade to passthrough behavior in-method rather than relying on the outer `try/except` in `_GraphNamespace.search`. +- **Implications**: `OllamaReranker.rank` should never raise. On exception or unparseable output it returns the input passages in the original order with passthrough-style synthetic scores and emits a single WARNING log per failure (rate-limited by intent: one log per rank() call). + +## Architecture Pattern Evaluation + +| Option | Description | Strengths | Risks / Limitations | Notes | +|--------|-------------|-----------|---------------------|-------| +| A: Add class to `graphiti_adapter.py` | Define `OllamaReranker` next to `_PassthroughReranker` in the same file. | Minimal diff; single file to read. | Bloats an already-long adapter; mixes wiring with provider-specific logic. | — | +| B: New `services/ollama_reranker.py` module | Dedicated module owns prompt + parse + async client; adapter only selects it. | Single-responsibility module; matches ticket suggestion; reusable in isolation. | One extra import in adapter. | **Selected.** Aligns with project pattern of one concern per `services/*` file. | +| C: Hybrid provider registry | Map `RERANKER_PROVIDER → builder` in adapter; class still in B's module. | Future providers are a one-line registry change. 
| Over-engineering for two providers (`ollama` + `none`). | Deferred until a third provider is needed. | + +## Design Decisions + +### Decision: Provider selected via env var, branch lives in `_get_graphiti()` +- **Context**: R3 requires env-driven provider selection; only two values supported by this spec (`ollama` and `none`). +- **Alternatives Considered**: + 1. Function-pointer registry (Option C). + 2. Inline `if/else` in the factory selecting one of two classes. +- **Selected Approach**: Inline branch in `_get_graphiti()` reads `Config.RERANKER_PROVIDER`, picks `_build_ollama_reranker()` or `_PassthroughReranker()`, validates unknown values with a `ValueError` matching the existing `_ALLOWED_GRAPHITI_PROVIDERS` convention. +- **Rationale**: Mirrors the established `GRAPHITI_LLM_PROVIDER` validation pattern (`_ALLOWED_GRAPHITI_PROVIDERS`) without adding speculative abstraction. Two values, two branches. +- **Trade-offs**: Adding a third provider later costs one more `elif`; acceptable. +- **Follow-up**: Surface the selected provider in the INFO startup log so operators can confirm. + +### Decision: Per-passage scoring with `asyncio.gather`, no concurrency knob +- **Context**: R2.3 requires one score per passage in descending order; R5 requires graceful per-call failure. +- **Alternatives Considered**: + 1. Single batched prompt with index-aligned output. + 2. Per-passage call with bounded `Semaphore`. +- **Selected Approach**: Per-passage `asyncio.gather` with no explicit limit; rely on default `limit ≤ 10` in `_GraphNamespace.search`. +- **Rationale**: Simple, deterministic, isolates per-passage failures. Avoids premature configuration knob. +- **Trade-offs**: If a future caller asks for `limit=100`, Ollama may queue 100 requests; acceptable for now because no caller does this. +- **Follow-up**: If real-world rerank latency becomes a concern, add `RERANKER_MAX_PARALLEL` then. 
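The provider decision above (two accepted values, an inline branch, a `ValueError` for anything else, and an INFO log of the selection) can be sketched together with the R3 env defaults. Assumptions: settings are read from a plain mapping so they can be exercised without the real environment, `EMBEDDING_API_KEY` itself defaults to `ollama`, the Ollama client comes from a caller-supplied factory, and `PassthroughReranker` is a stand-in that does not subclass `CrossEncoderClient`. `load_reranker_settings`, `select_reranker`, and `ALLOWED_RERANKER_PROVIDERS` are hypothetical names.

```python
import logging
import os
from typing import Callable, Dict, List, Mapping, Optional, Tuple

logger = logging.getLogger(__name__)

# Accepted values per R3.1; anything else must fail at startup (R3.5).
ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


def load_reranker_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the four RERANKER_* knobs with the defaults described in R3."""
    env = os.environ if env is None else env
    return {
        "provider": env.get("RERANKER_PROVIDER", "ollama"),
        "model": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Chain to the embedding host so one Ollama daemon serves both by default.
        "base_url": env.get(
            "RERANKER_BASE_URL",
            env.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1"),
        ),
        # Assumption: EMBEDDING_API_KEY itself defaults to Ollama's ignored token.
        "api_key": env.get("RERANKER_API_KEY", env.get("EMBEDDING_API_KEY", "ollama")),
    }


class PassthroughReranker:
    """Stand-in for _PassthroughReranker: input order, synthetic descending scores."""

    async def rank(self, query: str, passages: List[str]) -> List[Tuple[str, float]]:
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


def select_reranker(provider: str, build_ollama: Callable[[], object]) -> object:
    """Two values, two branches; unknown values raise before anything is built."""
    if provider not in ALLOWED_RERANKER_PROVIDERS:
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; "
            f"allowed values: {', '.join(ALLOWED_RERANKER_PROVIDERS)}"
        )
    # "none" never touches the network (R4.1); the Ollama factory should only
    # build an AsyncOpenAI client and defer all I/O to the first rank() (R5.4).
    reranker = build_ollama() if provider == "ollama" else PassthroughReranker()
    logger.info("Graphiti reranker provider: %s", provider)  # R4.3
    return reranker
```

Validating before building keeps an unknown value from constructing any client, and adding a third provider later stays a one-line `elif` as the trade-off notes.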
+ +### Decision: Default model = `qwen2.5:3b` +- **Context**: Need a small, broadly-available Ollama chat model that reliably emits a numeric score in 1–2 tokens. +- **Alternatives Considered**: + 1. `qwen2.5:3b` (Qwen Research License, 3B params, strong instruction following; unlike most Qwen2.5 sizes, the 3B variant is not Apache-2.0). + 2. `llama3.2:3b` (Llama community license, 3B). + 3. `phi3:3.8b` (MIT, 3.8B). +- **Selected Approach**: `qwen2.5:3b`. +- **Rationale**: Matches the Qwen-family alignment of the rest of the project (`qwen-plus` is the documented LLM default). Small enough for typical dev machines. Its license is the more restrictive Qwen Research License rather than Apache-2.0, so operators should confirm it fits their deployment. +- **Trade-offs**: Operators on systems without `qwen2.5:3b` must `ollama pull qwen2.5:3b` or override `RERANKER_MODEL`. +- **Follow-up**: README will document `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large` step. + +### Decision: Defensive output parsing (`json.loads` → regex float → deterministic low score) +- **Context**: R2.6 requires deterministic handling of unparseable model responses. +- **Selected Approach**: + 1. Strip `...` blocks (project convention from `llm_client.py:64`). + 2. Strip markdown fences (project convention from `llm_client.chat_json`). + 3. `json.loads` and read `score` (float in `[0, 1]`, clipped on out-of-range). + 4. On JSON failure, regex-extract the first float token; clip to `[0, 1]`. + 5. On total failure, assign `-0.001 * (passage_index + 1)` (deterministic, distinct per passage, and strictly below any successfully-parsed score, since parsed scores are clipped to `[0, 1]`; the form `0.0 - 0.001 * passage_index` would tie with a parsed `0.0` at index 0). +- **Rationale**: Reuses patterns already in the codebase. Keeps every passage in the output (R2.6). +- **Trade-offs**: One failed parse silently downranks a passage; logged at DEBUG (not WARNING) to avoid log spam. + +## Risks & Mitigations +- **Risk**: Ollama service is not running on startup → boot must not fail. **Mitigation**: Construct only `AsyncOpenAI` (no network call) during `__init__`. Defer connectivity to first `rank()`. R5.4. +- **Risk**: Model is not pulled → `rank()` raises 404 from Ollama.
**Mitigation**: Catch within `rank()`, log WARNING naming model + error class, return passthrough-ordered tuples so search still works. R5.1, R5.3. +- **Risk**: Operator misconfigures `RERANKER_PROVIDER` to an unknown value → silent fallthrough to wrong reranker. **Mitigation**: `_get_graphiti()` raises `ValueError` listing allowed values, mirroring `_ALLOWED_GRAPHITI_PROVIDERS`. R3.5. +- **Risk**: Multiple concurrent `rank()` calls overwhelm a small local Ollama daemon. **Mitigation**: Accept default Graphiti `limit ≤ 10`; document `RERANKER_MAX_PARALLEL` as a future follow-up if needed. + +## References +- `backend/app/services/graphiti_adapter.py:38-51` — current passthrough reranker contract. +- `backend/app/services/graphiti_adapter.py:142-162` — current `_get_graphiti()` wiring point. +- `backend/app/utils/llm_client.py` — project pattern for OpenAI-SDK chat + JSON parsing + reasoning-block stripping. +- `.kiro/specs/graphiti-neo4j-finalize/research.md` — historical context for why the passthrough was introduced. +- Ticket `#39` in `.ticket/39.md` — feature brief and acceptance criteria. 
diff --git a/.kiro/specs/graphiti-ollama-reranker/spec.json b/.kiro/specs/graphiti-ollama-reranker/spec.json new file mode 100644 index 00000000..8523933c --- /dev/null +++ b/.kiro/specs/graphiti-ollama-reranker/spec.json @@ -0,0 +1,23 @@ +{ + "feature_name": "graphiti-ollama-reranker", + "created_at": "2026-05-11T10:24:16Z", + "updated_at": "2026-05-11T10:45:00Z", + "language": "en", + "phase": "tasks-generated", + "approvals": { + "requirements": { + "generated": true, + "approved": true + }, + "design": { + "generated": true, + "approved": true + }, + "tasks": { + "generated": true, + "approved": true + } + }, + "ready_for_implementation": true, + "ticket": 39 +} diff --git a/.kiro/specs/graphiti-ollama-reranker/tasks.md b/.kiro/specs/graphiti-ollama-reranker/tasks.md new file mode 100644 index 00000000..03be9834 --- /dev/null +++ b/.kiro/specs/graphiti-ollama-reranker/tasks.md @@ -0,0 +1,89 @@ +# Implementation Plan + +> Foundation tasks introduce the four `RERANKER_*` configuration knobs. +> Core tasks add the new `OllamaReranker` and the factory selection branch. +> Integration tasks wire documentation parity. +> Validation closes the loop with a structural sweep. + +## Foundation + +- [x] 1. Add reranker configuration surface +- [x] 1.1 Introduce four `RERANKER_*` settings on the `Config` class + - Add `RERANKER_PROVIDER` with default `ollama`, read via `os.environ.get('RERANKER_PROVIDER', 'ollama')`. + - Add `RERANKER_MODEL` with default `qwen2.5:3b`, read via `os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')`. + - Add `RERANKER_BASE_URL` with default that chains to the embedding host: `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`. Do not reference `Config.EMBEDDING_BASE_URL` directly; use the env-lookup form so behaviour stays consistent under reload patterns. 
+ - Add `RERANKER_API_KEY` with default that chains to the embedding key the same way (`os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))`). + - Do not add the reranker to `Config.validate()`; the provider has no mandatory credentials. + - Observable completion: a Python REPL that imports `Config` shows the four attributes with the documented defaults, and overriding `EMBEDDING_BASE_URL` in the environment is visible on `Config.RERANKER_BASE_URL` too. + - _Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6_ + +## Core + +- [x] 2. Implement the Ollama-backed reranker +- [x] 2.1 Create the new reranker module with the `CrossEncoderClient` subclass + - Define a new module under `backend/app/services/` that hosts the reranker class. The class subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements only the async `rank` method. + - Constructor accepts `model`, `base_url`, `api_key` as keyword arguments; it instantiates `openai.AsyncOpenAI(base_url=..., api_key=...)` but performs no network I/O so the Flask app can boot when Ollama is unreachable. + - `rank(query, passages)` short-circuits on empty `passages` and returns `[]` without any model call. + - For each passage, send a single chat-completion request with `temperature=0.0` and a deterministic system prompt asking for a JSON object `{"score": <0.0..1.0>}` describing the passage's relevance to the query. Use `asyncio.gather` to run all per-passage requests concurrently. + - Parse each model response defensively: strip any `<think>...</think>` block, strip markdown code fences, attempt `json.loads`, fall back to regex-extracting the first floating-point number, clip the value to `[0.0, 1.0]`. On a per-passage parse failure, assign a deterministic fallback score of `-0.001 * (passage_index + 1)` (strictly below any parsed score) and log at DEBUG once per failure naming the model and passage index. The passage string is echoed byte-for-byte regardless of parse outcome. + - Wrap the whole call in a `try/except`.
On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` so search remains functional. The method must not raise. + - Sort the returned list by score descending before returning. + - Observable completion: instantiating the new class with a deliberately bad `base_url` does not raise; an async call to `rank("q", [])` returns `[]`; an async call with two non-empty passages against a reachable Ollama returns two `(passage, float)` tuples in descending-score order, with every input passage byte-identical in the output. + - _Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1_ + - _Boundary: OllamaReranker module_ + +## Integration + +- [x] 3. Wire the new reranker into the Graphiti factory +- [x] 3.1 Select the reranker inside `_get_graphiti()` based on `Config.RERANKER_PROVIDER` + - Introduce a small allow-list constant alongside `_ALLOWED_GRAPHITI_PROVIDERS` enumerating `("ollama", "none")`. + - Read `Config.RERANKER_PROVIDER`, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise `ValueError` with a message that names the offending value and lists the accepted values — same shape as the existing `GRAPHITI_LLM_PROVIDER` validation. + - For `ollama`, construct the new `OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY)` and pass it as the `cross_encoder=` argument to `Graphiti(...)`. + - For `none`, continue to pass `_PassthroughReranker()` as today; do not change the passthrough class. + - Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log). + - Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime. 
+ - Observable completion: with `RERANKER_PROVIDER` unset, app startup logs `Initializing Graphiti reranker (provider=ollama)...` and Graphiti is constructed with the `OllamaReranker`. With `RERANKER_PROVIDER=none`, the log reports `none` and Graphiti uses `_PassthroughReranker`. With `RERANKER_PROVIDER=banana`, `_get_graphiti()` raises `ValueError` listing `('ollama', 'none')`. + - _Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3_ + - _Depends: 1.1, 2.1_ + +- [ ] 4. Update operator-facing documentation +- [ ] 4.1 (P) Add the new env knobs to `.env.example` *(deferred — sandbox hook blocks all `.env*` access; see HANDOFF.md)* + - Insert a four-line `RERANKER_*` block adjacent to the existing `EMBEDDING_*` block, mirroring the comment style (default, accepted values, and a one-line note that `RERANKER_PROVIDER=none` disables reranking). + - Observable completion: opening `.env.example` shows the four new variables with documented defaults, positioned next to the embedding block. + - _Requirements: 6.1_ + - _Boundary: .env.example_ + - _Depends: 1.1_ + +- [x] 4.2 (P) Extend the `Required Environment Variables` snippet in `CLAUDE.md` + - Add the four `RERANKER_*` variables to the existing fenced code block under "Required Environment Variables" in `CLAUDE.md`, keeping the same comment style used for the `EMBEDDING_*` block. + - Observable completion: `CLAUDE.md` documents the four reranker variables next to the embedding block and includes a note that `RERANKER_PROVIDER=none` keeps the previous passthrough behaviour. + - _Requirements: 6.2_ + - _Boundary: CLAUDE.md_ + - _Depends: 1.1_ + +- [x] 4.3 (P) Document the Ollama pull prerequisite and env block in `README.md` + - In the existing "Install Ollama and pull the default embedding model" section, add a parallel `ollama pull qwen2.5:3b` step (or note that the model used for reranking must be pulled, using the documented default). 
+ - In the `.env` snippet under "Configure Environment Variables", add the four `RERANKER_*` lines with brief comments mirroring the embedding-block style. + - Treat `README-EN.md` and `README-ZH.md` translations as out of scope for this ticket — translation belongs to the active i18n workstream and would otherwise drift. + - Observable completion: `README.md` shows the `ollama pull qwen2.5:3b` step and the four reranker env lines in the `.env` snippet. + - _Requirements: 6.3_ + - _Boundary: README.md_ + - _Depends: 1.1_ + +- [x] 4.4 (P) Update the stale follow-up claim in the prior spec + - In `.kiro/specs/graphiti-neo4j-finalize/research.md`, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that follow-up has shipped under `graphiti-ollama-reranker`. The constraint is that no remaining documentation continues to claim the reranker remains a deferred passthrough. + - Observable completion: a grep for "real per-provider reranker is a follow-up" across `.kiro/specs/` returns either zero hits or a pointer note to `graphiti-ollama-reranker`. + - _Requirements: 6.4_ + - _Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md_ + +## Validation + +- [x] 5. Structural verification sweep +- [x] 5.1 Grep for legacy reranker references and verify the new wiring is reachable + - Grep `backend/app/services/` for `gpt-4.1-nano` and `OpenAIRerankerClient`; both must return zero hits in code paths owned by this spec. + - Grep `backend/app/services/graphiti_adapter.py` for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the `_get_graphiti()` branch). + - Confirm the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) require no source changes by grepping for `client.graph.search(` call sites and verifying the kwarg shape is unchanged. 
+ - Confirm `_GraphNamespace.search` still filters by `group_id` (no regression to project isolation). + - Observable completion: a short verification summary captured during implementation lists each grep outcome with the expected zero / single hit, and the report-tool call sites are unchanged. + - _Requirements: 1.4, 7.1, 7.2, 7.3_ + - _Depends: 3.1_ diff --git a/CLAUDE.md b/CLAUDE.md index 99240fb8..ccb99029 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -84,6 +84,17 @@ EMBEDDING_API_KEY # Default: "ollama" (Ollama ignores the value) # nomic-embed-text are not supported. # Prerequisite for the default: `ollama pull mxbai-embed-large`. +# Reranker (cross-encoder for Graphiti search results) +RERANKER_PROVIDER # Default: ollama (allowed: "ollama", "none") + # "none" keeps the legacy passthrough — useful for CI / + # slim containers that cannot pull a reranker model. +RERANKER_MODEL # Default: qwen2.5:3b (local Ollama chat model) + # Prerequisite for the default: `ollama pull qwen2.5:3b`. +RERANKER_BASE_URL # Default: value of EMBEDDING_BASE_URL + # (typically http://localhost:11434/v1) +RERANKER_API_KEY # Default: value of EMBEDDING_API_KEY + # (Ollama ignores the value) + # Optional — Accelerated LLM (omit entirely if not used) LLM_BOOST_API_KEY LLM_BOOST_BASE_URL diff --git a/README.md b/README.md index 05be734b..79563a7d 100644 --- a/README.md +++ b/README.md @@ -137,11 +137,12 @@ neo4j-admin dbms set-initial-password your_neo4j_password neo4j start ``` -**Install Ollama and pull the default embedding model:** +**Install Ollama and pull the default models:** ```bash # macOS / Linux: https://ollama.com/download -ollama pull mxbai-embed-large +ollama pull mxbai-embed-large # embedder for the knowledge graph +ollama pull qwen2.5:3b # reranker for Graphiti search results # Ollama serves the OpenAI-compatible /v1 endpoint on http://localhost:11434 # by default — no further configuration required. 
``` @@ -181,6 +182,17 @@ EMBEDDING_BASE_URL=http://localhost:11434/v1 EMBEDDING_API_KEY=ollama EMBEDDING_MODEL=mxbai-embed-large +# Reranker — reorders Graphiti search results before the report tools see them. +# Default targets the same local Ollama host used for embeddings. +# Prerequisite for the default: `ollama pull qwen2.5:3b`. +# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI / +# slim containers that cannot pull a reranker model). +RERANKER_PROVIDER=ollama +RERANKER_MODEL=qwen2.5:3b +# Optional — both default to the EMBEDDING_* equivalents when unset. +# RERANKER_BASE_URL=http://localhost:11434/v1 +# RERANKER_API_KEY=ollama + # Embeddings — remote fallback (uncomment ONE block if you prefer not to run # Ollama locally). Note: any override must produce 1024-dim vectors to match # Graphiti's vector index — 768-dim models (e.g. nomic-embed-text) are NOT diff --git a/backend/app/config.py b/backend/app/config.py index 06ba9097..8477b23a 100644 --- a/backend/app/config.py +++ b/backend/app/config.py @@ -52,6 +52,24 @@ class Config: # to use Google Gemini directly. GRAPHITI_LLM_PROVIDER = os.environ.get('GRAPHITI_LLM_PROVIDER', 'openai') + # Reranker (cross-encoder) settings. The reranker reorders Graphiti search + # results before they reach the ReportAgent tools. Defaults target the same + # local Ollama host used for embeddings; setting RERANKER_PROVIDER=none + # disables reranking and keeps the legacy passthrough (useful for CI or + # slim containers that cannot pull the reranker model). RERANKER_BASE_URL + # and RERANKER_API_KEY chain through EMBEDDING_BASE_URL / EMBEDDING_API_KEY + # so a single-host Ollama deployment needs no extra configuration.
+ RERANKER_PROVIDER = os.environ.get('RERANKER_PROVIDER', 'ollama') + RERANKER_MODEL = os.environ.get('RERANKER_MODEL', 'qwen2.5:3b') + RERANKER_BASE_URL = os.environ.get( + 'RERANKER_BASE_URL', + os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'), + ) + RERANKER_API_KEY = os.environ.get( + 'RERANKER_API_KEY', + os.environ.get('EMBEDDING_API_KEY', 'ollama'), + ) + # Zep settings (kept for backwards compatibility; deprecated). ZEP_API_KEY = os.environ.get('ZEP_API_KEY', '') diff --git a/backend/app/services/graphiti_adapter.py b/backend/app/services/graphiti_adapter.py index b16f0e11..13f4899e 100644 --- a/backend/app/services/graphiti_adapter.py +++ b/backend/app/services/graphiti_adapter.py @@ -31,6 +31,7 @@ from graphiti_core.cross_encoder.client import CrossEncoderClient from ..config import Config from ..utils.logger import get_logger +from .ollama_reranker import OllamaReranker logger = get_logger('mirofish.graphiti_adapter') @@ -42,7 +43,9 @@ class _PassthroughReranker(CrossEncoderClient): descending scores. Injected explicitly so Graphiti does not fall back to its default ``OpenAIRerankerClient`` (which uses a hard-coded ``gpt-4.1-nano`` model with logprobs and would 401 against Qwen / - Dashscope keys). A real per-provider reranker is a follow-up. + Dashscope keys). Selected when ``Config.RERANKER_PROVIDER == "none"`` + — useful for CI / slim containers that cannot pull the reranker model. + For real reranking, set ``RERANKER_PROVIDER=ollama`` (the default). """ async def rank(self, query: str, passages: list[str]) -> list[tuple[str, float]]: @@ -87,6 +90,31 @@ _graphiti_lock = threading.Lock() _ALLOWED_GRAPHITI_PROVIDERS = ("openai", "gemini") +_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none") + + +def _build_reranker(provider: str) -> CrossEncoderClient: + """Build the cross-encoder reranker for the configured provider. 
+ + Returns ``_PassthroughReranker`` when ``provider`` is ``"none"`` + (the legacy no-op behaviour, useful for CI / slim containers that + cannot pull the reranker model). For ``"ollama"`` it constructs the + real Ollama-backed reranker; the construction is side-effect-free, so + Graphiti initialisation does not depend on the Ollama daemon being + reachable at startup. + """ + if provider == "none": + return _PassthroughReranker() + if provider == "ollama": + return OllamaReranker( + model=Config.RERANKER_MODEL, + base_url=Config.RERANKER_BASE_URL, + api_key=Config.RERANKER_API_KEY, + ) + raise ValueError( + f"Unknown RERANKER_PROVIDER={provider!r}; " + f"allowed: {_ALLOWED_RERANKER_PROVIDERS}" + ) def _build_llm_and_embedder(provider: str): @@ -146,14 +174,19 @@ def _get_graphiti() -> Graphiti: if _graphiti_instance is None: provider = (Config.GRAPHITI_LLM_PROVIDER or "openai").lower() logger.info(f"Initializing Graphiti client (provider={provider})...") + reranker_provider = (Config.RERANKER_PROVIDER or "ollama").lower() + logger.info( + f"Initializing Graphiti reranker (provider={reranker_provider})..." + ) llm_client, embedder = _build_llm_and_embedder(provider) + cross_encoder = _build_reranker(reranker_provider) g = Graphiti( Config.NEO4J_URI, Config.NEO4J_USER, Config.NEO4J_PASSWORD, llm_client=llm_client, embedder=embedder, - cross_encoder=_PassthroughReranker(), + cross_encoder=cross_encoder, ) # Use the persistent loop so the driver is bound to it from the start _run(g.build_indices_and_constraints()) diff --git a/backend/app/services/ollama_reranker.py b/backend/app/services/ollama_reranker.py new file mode 100644 index 00000000..57a455b5 --- /dev/null +++ b/backend/app/services/ollama_reranker.py @@ -0,0 +1,170 @@ +"""Ollama-backed cross-encoder reranker for Graphiti search.
+ +Replaces the no-op ``_PassthroughReranker`` previously injected into Graphiti +with a real reranker that scores passages against a query through an Ollama +chat model exposed over its OpenAI-compatible ``/v1`` surface. + +The class implements only ``CrossEncoderClient.rank`` (the sole abstract +member Graphiti requires) and is constructed by ``graphiti_adapter._get_graphiti`` +when ``Config.RERANKER_PROVIDER == "ollama"``. It does not perform any +network I/O at construction time so the Flask app can boot even when the +Ollama daemon is unreachable; failures are handled inside ``rank`` and never +propagate, so graph search remains functional under degradation. +""" + +import asyncio +import json +import re +from typing import List, Tuple + +from openai import AsyncOpenAI +from graphiti_core.cross_encoder.client import CrossEncoderClient + +from ..utils.logger import get_logger + +logger = get_logger('mirofish.ollama_reranker') + + +_THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>", re.IGNORECASE) +_CODE_FENCE_START = re.compile(r"^```(?:json)?\s*\n?", re.IGNORECASE) +_CODE_FENCE_END = re.compile(r"\n?```\s*$") +_FIRST_FLOAT = re.compile(r"-?\d+(?:\.\d+)?") + +_SYSTEM_PROMPT = ( + "You are a relevance grader. Given a user query and a single passage, " + "rate how relevant the passage is to the query on a continuous scale " + "from 0.0 (not relevant at all) to 1.0 (perfectly relevant). " + "Respond with a single JSON object of the form {\"score\": <0.0..1.0>} " + "and nothing else." +) + + +def _clip_unit(value: float) -> float: + """Clamp ``value`` into the closed interval [0.0, 1.0].""" + if not value >= 0.0: # NaN compares False, so it also maps to 0.0 + return 0.0 + if value > 1.0: + return 1.0 + return value + + +def _parse_score(raw: str) -> float: + """Parse a model response into a relevance score in [0.0, 1.0]. 
+ + Strips reasoning ``<think>`` blocks and markdown fences (the same + defensive pattern used in ``utils/llm_client.py``), then attempts + ``json.loads`` and reads ``score``.
Falls back to extracting the first + floating-point number from the cleaned text. Raises ``ValueError`` when + no numeric value can be recovered. + """ + text = _THINK_BLOCK.sub("", raw or "").strip() + text = _CODE_FENCE_START.sub("", text) + text = _CODE_FENCE_END.sub("", text).strip() + + try: + parsed = json.loads(text) + except (json.JSONDecodeError, TypeError): + parsed = None + + if isinstance(parsed, dict) and "score" in parsed: + try: + return _clip_unit(float(parsed["score"])) + except (TypeError, ValueError): + pass + + match = _FIRST_FLOAT.search(text) + if match is not None: + try: + return _clip_unit(float(match.group(0))) + except ValueError: + pass + + raise ValueError(f"no numeric score in model response: {text!r}") + + +class OllamaReranker(CrossEncoderClient): + """Cross-encoder reranker that scores passages via an Ollama chat model. + + Subclass of :class:`graphiti_core.cross_encoder.client.CrossEncoderClient` + that implements ``rank`` by issuing one chat-completion request per + passage through ``openai.AsyncOpenAI`` (which speaks the OpenAI-compatible + surface exposed by Ollama on ``/v1``). + + Construction is side-effect-free: building the underlying ``AsyncOpenAI`` + client does not perform any network I/O, so ``_get_graphiti`` can wire + this class up at startup even when the Ollama daemon is unavailable. + Failures surface only at ``rank`` call time and are degraded to a + passthrough-style result with a single ``WARNING`` log per failed call. + """ + + def __init__(self, *, model: str, base_url: str, api_key: str) -> None: + """Configure the reranker. + + Args: + model: Name of the Ollama chat model used to score passages + (for example ``qwen2.5:3b``). The operator is expected to + have run ``ollama pull `` before reranking is exercised. + base_url: OpenAI-compatible endpoint for the Ollama server, for + example ``http://localhost:11434/v1``. + api_key: API key forwarded to the OpenAI client. 
Ollama ignores + the value but the SDK requires a non-empty string. + """ + self._model = model + self._client = AsyncOpenAI(base_url=base_url, api_key=api_key) + + async def _score_passage(self, query: str, passage: str, index: int) -> float: + """Score a single passage; deterministic low fallback on parse failure.""" + user_prompt = ( + f"Query:\n{query}\n\n" + f"Passage:\n{passage}\n\n" + "Reply with only the JSON object described in the system prompt." + ) + response = await self._client.chat.completions.create( + model=self._model, + messages=[ + {"role": "system", "content": _SYSTEM_PROMPT}, + {"role": "user", "content": user_prompt}, + ], + temperature=0.0, + max_tokens=32, + ) + raw = response.choices[0].message.content or "" + try: + return _parse_score(raw) + except ValueError as exc: + logger.debug( + "Reranker parse failure (model=%s, passage_index=%d): %s", + self._model, index, exc, + ) + return -0.001 * (index + 1) + + async def rank( + self, + query: str, + passages: List[str], + ) -> List[Tuple[str, float]]: + """Return ``(passage, score)`` tuples sorted by score descending. + + Empty ``passages`` returns ``[]`` without any model call. On a + whole-call failure (connection refused, model 404, timeout, etc.) + the method logs a single ``WARNING`` and returns the passages in + their original order with synthetic descending scores so graph + search keeps functioning. The method does not raise. + """ + if not passages: + return [] + + try: + scores = await asyncio.gather( + *(self._score_passage(query, p, i) for i, p in enumerate(passages)) + ) + except Exception as exc: # noqa: BLE001 — graceful degrade per design R5 + logger.warning( + "Ollama reranker failed (model=%s, error=%s); falling back to passthrough order.", + self._model, type(exc).__name__, + ) + return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)] + + scored = list(zip(passages, scores)) + scored.sort(key=lambda item: item[1], reverse=True) + return scored