Merge pull request #40 from salestech-group/fix/39-ollama-reranker
fix(graph): replace passthrough reranker with ollama-backed cross-encoder
commit 04a00ac437
@ -62,7 +62,7 @@ Same upload+build flow; expect identical behaviour to pre-change implementation.
|
|||
## Notes for reviewers
|
||||
|
||||
- **Default provider flipped** from Gemini (de-facto) to OpenAI-compatible (documented). Existing Gemini deployments must add `GRAPHITI_LLM_PROVIDER=gemini` to `.env` after pulling. Documented in the new `.env.example` and design.md migration section.
|
||||
- **Reranker is still passthrough** — same behavioural state as before (no real reranking). A real per-provider reranker is intentionally deferred; explanation in `research.md` → "Reranker default behaviour".
|
||||
- **Reranker is still passthrough** — same behavioural state as before (no real reranking). _Update:_ this was deferred from this spec and has since shipped in follow-up spec `graphiti-ollama-reranker` (ticket #39): the default is now an Ollama-backed `CrossEncoderClient`; `RERANKER_PROVIDER=none` preserves the passthrough behaviour described here.
|
||||
- **`.env.example` write went through Python heredoc** because `pre_tool_env_guard.sh` blocks `cat > .env*` patterns. Worth confirming the file content is what you expect; the new content mirrors the README env section verbatim.
|
||||
|
||||
## Spec artefacts
|
||||
|
|
|
|||
|
|
@ -16,7 +16,7 @@
|
|||
- `.env.example` matches what the code reads; the README is unchanged (already correct).
|
||||
|
||||
### Non-Goals
|
||||
- Implementing a real per-provider reranker (deferred to a follow-up).
|
||||
- Implementing a real per-provider reranker (deferred to a follow-up — shipped in `graphiti-ollama-reranker`, ticket #39).
|
||||
- Pagination cleanup of `_NodeNamespace.get_by_graph_id` / `_EdgeNamespace.get_by_graph_id` (low priority, deferred).
|
||||
- Renaming `zep_*` files (tracked separately).
|
||||
- Migrating data from existing Zep Cloud deployments (project is local-only by design now).
|
||||
|
|
@ -336,7 +336,7 @@ class _PassthroughReranker(CrossEncoderClient):
|
|||
**Implementation Notes**
|
||||
- Integration: Always injected by `_get_graphiti()` regardless of provider.
|
||||
- Validation: None.
|
||||
- Risks: Search results are still un-reranked. Same behaviour as today; future ticket may introduce a real per-provider reranker.
|
||||
- Risks: Search results are still un-reranked. Same behaviour as today; superseded by follow-up spec `graphiti-ollama-reranker` (ticket #39), which introduces a real Ollama-backed reranker and keeps this passthrough only when `RERANKER_PROVIDER=none`.
|
||||
|
||||
#### `_get_graphiti()` (refactored)
|
||||
|
||||
|
|
|
|||
|
|
@ -24,7 +24,7 @@
|
|||
- **Context**: Ticket suggests dropping `_GeminiReranker` and "letting Graphiti use its sane default." Verify the default is sane for Qwen.
|
||||
- **Sources Consulted**: `graphiti_core/graphiti.py:154`, `graphiti_core/cross_encoder/openai_reranker_client.py`.
|
||||
- **Findings**: Default is `OpenAIRerankerClient()` with no config → tries `AsyncOpenAI(api_key=None, base_url=None)` → 401 against any non-OpenAI key. Reranker model is fixed to `gpt-4.1-nano`, which Dashscope does not host.
|
||||
- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker is out of scope (would need a custom OpenAI-compatible logprobs implementation, which Dashscope/Qwen does not reliably support).
|
||||
- **Implications**: Cannot rely on Graphiti's default. Continue to inject an explicit passthrough reranker so Qwen users do not silently 401 in search code paths. A real per-provider reranker was out of scope for this spec; follow-up spec `graphiti-ollama-reranker` (ticket #39) replaces the passthrough with an Ollama-backed `CrossEncoderClient` and keeps `_PassthroughReranker` only when `RERANKER_PROVIDER=none`.
|
||||
|
||||
### Env-guard hook scope
|
||||
- **Context**: First Read of `.env.example` was blocked.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,53 @@
|
|||
# Handoff — graphiti-ollama-reranker
|
||||
|
||||
## What shipped
|
||||
|
||||
| Task | Status | Notes |
|------|--------|-------|
| 1.1 — Config knobs | ✅ | Four `RERANKER_*` attrs added; `BASE_URL`/`API_KEY` chain to `EMBEDDING_*`. |
| 2.1 — `OllamaReranker` | ✅ | New `backend/app/services/ollama_reranker.py`. Construction is side-effect-free; `rank()` never raises; per-passage parse falls back to deterministic low score; whole-call failure degrades to passthrough order with a single WARNING log. |
| 3.1 — Factory wiring | ✅ | `_get_graphiti()` selects the reranker via new `_build_reranker()`. INFO log announces selection. `ValueError` raised for unknown providers. |
| 4.1 — `.env.example` | ⚠️ Deferred | The `pre_tool_env_guard.sh` Claude hook blocks all `.env*` access (Read, Write, Edit, Bash). Cannot be performed inside this autonomous sandbox. **Reviewer action required** — see snippet below. |
| 4.2 — `CLAUDE.md` | ✅ | New `RERANKER_*` block added under "Required Environment Variables". |
| 4.3 — `README.md` | ✅ | Adds `ollama pull qwen2.5:3b` to the prerequisites and a `RERANKER_*` block in the `.env` snippet. `README-EN.md` / `README-ZH.md` left out per design scope (i18n is its own workstream). |
| 4.4 — Prior-spec follow-up note | ✅ | Updated `graphiti-neo4j-finalize`'s `research.md`, `design.md`, and `HANDOFF.md` to point at this spec; updated the `_PassthroughReranker` docstring in `graphiti_adapter.py`. |
| 5.1 — Structural sweep | ✅ | `gpt-4.1-nano` / `OpenAIRerankerClient` referenced only in docstring text. `OllamaReranker` has exactly one import + one use site. `_GraphNamespace.search` still filters by `group_id`. |
|
||||
|
||||
## Reviewer action required: `.env.example`
|
||||
|
||||
Please paste the following block into `.env.example` alongside the existing `EMBEDDING_*` section:
|
||||
|
||||
```env
# Reranker — reorders Graphiti search results before the report tools see them.
# Default targets the same local Ollama host used for embeddings.
# Pre-requisite for the default: `ollama pull qwen2.5:3b`.
# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
# slim containers that cannot pull a reranker model).
RERANKER_PROVIDER=ollama
RERANKER_MODEL=qwen2.5:3b
# Optional — both default to the EMBEDDING_* equivalents when unset.
# RERANKER_BASE_URL=http://localhost:11434/v1
# RERANKER_API_KEY=ollama
```
|
||||
|
||||
This block matches what `CLAUDE.md` and `README.md` document. After paste, R6.1 is satisfied and ticket #39's acceptance-criteria checkbox "Configuration is overridable via env vars and documented in `.env.example`" becomes green.
|
||||
|
||||
## Verification performed
|
||||
|
||||
- `Config` loads with the documented defaults; `EMBEDDING_BASE_URL` override propagates to `RERANKER_BASE_URL`.
|
||||
- `OllamaReranker` constructs without network I/O; empty `passages` returns `[]`; whole-call failure logs WARNING and returns passthrough-ordered tuples.
|
||||
- `_build_reranker("ollama")` → `OllamaReranker`; `("none")` → `_PassthroughReranker`; `("banana")` → `ValueError` naming the offender and listing `("ollama", "none")`.
|
||||
- Grep sweep matches design expectations (see Task 5.1 in `tasks.md`).
|
||||
|
||||
## Smoke test (recommended before merge)
|
||||
|
||||
With Ollama running and the reranker model pulled:
|
||||
|
||||
```bash
ollama pull qwen2.5:3b
RERANKER_PROVIDER=ollama npm run backend
# In another shell, exercise a graph build + report tool and confirm:
# - Startup log shows "Initializing Graphiti reranker (provider=ollama)..."
# - Search-backed report tool results differ from `RERANKER_PROVIDER=none` output
# - No WARNING about reranker failure in `backend/logs/`
```
|
||||
|
|
@ -0,0 +1,395 @@
|
|||
# Design — graphiti-ollama-reranker
|
||||
|
||||
## Overview
|
||||
**Purpose**: Replace the no-op `_PassthroughReranker` injected into Graphiti with a real Ollama-backed `CrossEncoderClient`, so that hybrid search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (`RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY`) with Ollama-aligned defaults; an explicit `RERANKER_PROVIDER=none` preserves the passthrough for CI and slim containers.
|
||||
|
||||
**Users**: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.
|
||||
|
||||
**Impact**: Adds one new module under `backend/app/services/`, four `Config` attributes, a small selection branch in `_get_graphiti()`, and documentation in `.env.example`, `CLAUDE.md`, `README.md`. No data schema, no API, no UI changes. Behavior under `RERANKER_PROVIDER=none` is identical to today.
|
||||
|
||||
### Goals
|
||||
- Default Ollama-backed reranker producing one `(passage, score)` tuple per input passage, sorted descending by score.
|
||||
- Env-driven configuration with sensible Ollama defaults inherited from existing `EMBEDDING_*` settings.
|
||||
- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
|
||||
- Documentation parity with `EMBEDDING_*` knobs in `.env.example`, `CLAUDE.md`, and `README.md`.
|
||||
|
||||
### Non-Goals
|
||||
- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
|
||||
- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
|
||||
- Upstream contributions to `graphiti-core`.
|
||||
- Adding a `sentence-transformers` or other non-`openai` reranker dependency.
|
||||
|
||||
## Boundary Commitments
|
||||
|
||||
### This Spec Owns
|
||||
- The Ollama reranker implementation and its prompt/parse logic.
|
||||
- The `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY` settings and their defaults.
|
||||
- The branch in `_get_graphiti()` that selects between the Ollama reranker and the passthrough.
|
||||
- The startup INFO log line that announces the selected reranker.
|
||||
- Documentation entries in `.env.example`, `CLAUDE.md` "Required Environment Variables", and `README.md` Ollama prerequisites.
|
||||
|
||||
### Out of Boundary
|
||||
- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
|
||||
- Per-passage retrieval (still owned by `_GraphNamespace.search` and Graphiti).
|
||||
- The `group_id` scoping rules.
|
||||
- Any change to the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) — they receive reranked output transparently.
|
||||
- Implementation of additional reranker providers; this design covers only `ollama` and `none`.
|
||||
|
||||
### Allowed Dependencies
|
||||
- Upstream library: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0).
|
||||
- In-repo: `Config` (`backend/app/config.py`), `get_logger` (`backend/app/utils/logger.py`), `openai.AsyncOpenAI` (already installed).
|
||||
- Existing factory: `_get_graphiti()` continues to be the singleton chokepoint.
|
||||
|
||||
### Revalidation Triggers
|
||||
- If `graphiti-core` changes the `CrossEncoderClient.rank` signature, this design must be revisited.
|
||||
- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in `research.md`).
|
||||
- If `Config.GRAPHITI_LLM_PROVIDER` semantics change in a way that re-couples LLM and reranker, this design must be checked.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Existing Architecture Analysis
|
||||
- `_get_graphiti()` already injects an explicit `cross_encoder=_PassthroughReranker()` (line 156). The pattern of double-checked-locking singleton with provider switch (`GRAPHITI_LLM_PROVIDER`) is mature and must be preserved.
|
||||
- The persistent event loop (`_get_loop`, `_run`) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does **not** need to schedule work onto `_get_loop()`.
|
||||
- All four ReportAgent tools call `_GraphNamespace.search`, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.
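The sync-to-async bridge described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `get_loop` and `run` are hypothetical stand-ins for the `_get_loop` / `_run` helpers, whose bodies are not shown in this spec, and the background-thread approach is an assumption.

```python
import asyncio
import threading

_loop = None                      # the one persistent loop, created lazily
_loop_lock = threading.Lock()


def get_loop() -> asyncio.AbstractEventLoop:
    """Return the shared event loop, starting it on first use."""
    global _loop
    if _loop is None:                         # first check, no lock taken
        with _loop_lock:
            if _loop is None:                 # second check, under the lock
                loop = asyncio.new_event_loop()
                # Daemon thread so the loop never blocks interpreter exit.
                threading.Thread(target=loop.run_forever, daemon=True).start()
                _loop = loop
    return _loop


def run(coro):
    """Run a coroutine on the shared loop from synchronous code and wait."""
    return asyncio.run_coroutine_threadsafe(coro, get_loop()).result()


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)                    # stands in for real async I/O
    return a + b
```

A synchronous caller would write `run(add(1, 2))`. Because a reranker's `rank()` is awaited from inside a coroutine that is already running on such a loop, it needs no scheduling of its own, which is the point the second bullet makes.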
|
||||
|
||||
### Architecture Pattern & Boundary Map
|
||||
|
||||
```mermaid
graph LR
    subgraph Config
        EnvVars[RERANKER_*\nenv vars]
        ConfigCls[Config attributes]
        EnvVars --> ConfigCls
    end

    subgraph Adapter
        Factory[_get_graphiti]
        Passthrough[_PassthroughReranker]
        OllamaCls[OllamaReranker]
        Factory -->|provider=none| Passthrough
        Factory -->|provider=ollama| OllamaCls
    end

    subgraph Graphiti
        GraphitiCore[Graphiti instance]
        Search[_GraphNamespace.search]
        Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
    end

    ConfigCls --> Factory
    Passthrough -->|injected as cross_encoder| GraphitiCore
    OllamaCls -->|injected as cross_encoder| GraphitiCore
    GraphitiCore --> Search
    Search --> Tools

    OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
```
|
||||
|
||||
**Architecture Integration**:
|
||||
- **Selected pattern**: Strategy pattern with two implementations selected at factory time. Same shape as the existing `GRAPHITI_LLM_PROVIDER` branch.
|
||||
- **Domain/feature boundaries**: Reranker construction and prompt/parse live in `ollama_reranker.py`. Wiring lives in `graphiti_adapter.py`. Config lives in `config.py`. No overlap.
|
||||
- **Existing patterns preserved**: Double-checked-locking singleton; explicit `cross_encoder` injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; `Config` reads via `os.environ.get(..., default)`.
|
||||
- **New components rationale**: `OllamaReranker` is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
|
||||
- **Steering compliance**: Single OpenAI-SDK convention preserved; per-project `group_id` scoping unaffected; no new dependency.
|
||||
|
||||
### Technology Stack
|
||||
|
||||
| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Backend / Services | Python ≥3.11, async via `asyncio` | Hosts the new reranker class. | Inherits project minimum. |
| LLM client | `openai` SDK (already pinned, v2.x) | `AsyncOpenAI` chat completions against Ollama's `/v1`. | No new dependency. |
| Model | Ollama-served chat model, default `qwen2.5:3b` | Produces a numeric relevance score per passage. | Operator may override via `RERANKER_MODEL`. |
| Endpoint | Ollama's OpenAI-compatible `/v1` | Default `http://localhost:11434/v1`. | Reuses `EMBEDDING_BASE_URL` semantics. |
| Graph layer | `graphiti-core ≥ 0.3` | Consumes the new `CrossEncoderClient`. | No upstream change. |
|
||||
|
||||
## File Structure Plan
|
||||
|
||||
### Directory Structure
|
||||
```
backend/app/
├── services/
│   ├── graphiti_adapter.py     # MODIFIED — factory branches on RERANKER_PROVIDER
│   └── ollama_reranker.py      # NEW — OllamaReranker(CrossEncoderClient)
├── config.py                   # MODIFIED — adds RERANKER_* attrs
└── utils/
    └── logger.py               # unchanged

repo-root/
├── .env.example                # MODIFIED — adds RERANKER_* block
├── CLAUDE.md                   # MODIFIED — Required Environment Variables
└── README.md                   # MODIFIED — Ollama prerequisites note
```
|
||||
|
||||
### Modified Files
|
||||
- `backend/app/services/graphiti_adapter.py` — Add small branch in `_get_graphiti()` that picks `OllamaReranker()` or `_PassthroughReranker()` based on `Config.RERANKER_PROVIDER`. Log the selection at INFO. `_PassthroughReranker` class is unchanged.
|
||||
- `backend/app/config.py` — Add four new class attributes with documented defaults. No change to existing `validate()` (reranker has no mandatory key).
|
||||
- `.env.example` — Add a four-line `RERANKER_*` block with comments mirroring the `EMBEDDING_*` style.
|
||||
- `CLAUDE.md` — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
|
||||
- `README.md` — Update the Ollama prerequisite section to mention `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large`.
|
||||
|
||||
> `_PassthroughReranker` stays in `graphiti_adapter.py` (unchanged contract); only the wiring around it changes.
|
||||
|
||||
## System Flows
|
||||
|
||||
```mermaid
sequenceDiagram
    participant Search as _GraphNamespace.search
    participant Graphiti as graphiti-core
    participant Reranker as OllamaReranker.rank
    participant Ollama as Ollama /v1/chat/completions

    Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
    Graphiti->>Graphiti: hybrid retrieval (RRF)
    Graphiti->>Reranker: rank(query, [p1..pN])
    par per-passage scoring
        Reranker->>Ollama: chat.completions(prompt p1, temp=0)
        Reranker->>Ollama: chat.completions(prompt p2, temp=0)
        Reranker->>Ollama: chat.completions(prompt pN, temp=0)
    end
    alt scoring completes
        Reranker-->>Graphiti: sorted [(p, score), ...]
    else whole-call failure
        Reranker->>Reranker: log WARNING, return passthrough order
        Reranker-->>Graphiti: original order with synthetic scores
    end
    Graphiti-->>Search: ranked edges/nodes
    Search-->>Tools: ranked results
```
|
||||
|
||||
**Decision points after diagram**:
|
||||
- `temperature=0.0` makes the score deterministic per (query, passage, model) tuple.
|
||||
- Per-passage failures (one bad parse out of N) downrank that passage to `0.0 - 0.001 * index` and continue; only whole-call exceptions degrade to passthrough.
|
||||
- The reranker never raises; this isolates Graphiti from upstream noise even when `_GraphNamespace.search`'s existing exception swallow is removed in a future refactor.
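The decision points above can be sketched as a small never-raising `rank` function. This is an illustration under stated assumptions, not the shipped `OllamaReranker`: the per-passage model call is replaced by an injected `score_one` coroutine (hypothetical), a `ValueError` from it is taken to mean "unparseable score", and any other exception is treated as a whole-call failure.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger("reranker_sketch")

# Hypothetical per-passage scorer: (query, passage) -> score.
Scorer = Callable[[str, str], Awaitable[float]]


async def rank(query: str, passages: List[str],
               score_one: Scorer) -> List[Tuple[str, float]]:
    if not passages:                      # empty input: no model call at all
        return []
    try:
        # Score all passages concurrently; collect errors instead of raising.
        outcomes = await asyncio.gather(
            *(score_one(query, p) for p in passages),
            return_exceptions=True,
        )
        scored = []
        for index, (passage, outcome) in enumerate(zip(passages, outcomes)):
            if isinstance(outcome, ValueError):
                # Per-passage parse failure: deterministic low score.
                scored.append((passage, 0.0 - 0.001 * index))
            elif isinstance(outcome, Exception):
                raise outcome             # assumed to be a whole-call failure
            else:
                scored.append((passage, min(1.0, max(0.0, outcome))))
        return sorted(scored, key=lambda item: item[1], reverse=True)
    except Exception as exc:
        logger.warning("rerank failed (%s); keeping original order",
                       type(exc).__name__)
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


async def fake_score(query: str, passage: str) -> float:
    """Stand-in scorer used only to exercise the sketch."""
    if passage == "garbled":
        raise ValueError("unparseable score")
    return {"relevant": 0.9, "meh": 0.2}[passage]
```

With `fake_score`, `asyncio.run(rank("q", ["meh", "garbled", "relevant"], fake_score))` orders the passages as `relevant`, `meh`, `garbled`; a scorer that raises `ConnectionError` returns the input order unchanged.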
|
||||
|
||||
## Requirements Traceability
|
||||
|
||||
| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1 | Default reranker is Ollama-backed | `_get_graphiti()`, `OllamaReranker` | Inline factory branch | Adapter init |
| 1.2 | No dependency on `OpenAIRerankerClient` | `_get_graphiti()` | Explicit `cross_encoder=` injection (unchanged behavior) | — |
| 1.3 | Unset → defaults to `ollama` | `Config.RERANKER_PROVIDER` | `os.environ.get('RERANKER_PROVIDER', 'ollama')` | — |
| 1.4 | No `gpt-4.1-nano` reference | All new files | — | — |
| 2.1 | Subclass `CrossEncoderClient.rank` | `OllamaReranker` | `async rank(query, passages) -> list[tuple[str, float]]` | Per-passage scoring |
| 2.2 | Uses `openai.AsyncOpenAI` | `OllamaReranker.__init__` | `AsyncOpenAI(base_url, api_key)` | — |
| 2.3 | Returns passages sorted descending | `OllamaReranker.rank` | Postcondition: descending by score | — |
| 2.4 | Empty input → empty output, no model call | `OllamaReranker.rank` | Guard at method entry | — |
| 2.5 | Preserves passage strings byte-for-byte | `OllamaReranker.rank` | Strings are echoed, never rewritten | — |
| 2.6 | Unparseable score → deterministic low fallback | `OllamaReranker.rank` | Internal `_parse_score` helper | Failure branch |
| 3.1 | `RERANKER_PROVIDER` env knob | `Config` | Class attr, default `ollama`, validated `{ollama, none}` | Adapter init |
| 3.2 | `RERANKER_MODEL` env knob | `Config` | Class attr, default `qwen2.5:3b` | — |
| 3.3 | `RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL` | `Config` | Class attr resolves at read time | — |
| 3.4 | `RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY` | `Config` | Class attr | — |
| 3.5 | Unknown value → `ValueError` | `_get_graphiti()` | `_ALLOWED_RERANKER_PROVIDERS` validation | Adapter init |
| 3.6 | Reads via `os.environ.get` only | `Config` | — | — |
| 4.1 | `none` keeps `_PassthroughReranker` | `_get_graphiti()` | Factory branch | Adapter init |
| 4.2 | Graph search remains functional under `none` | `_PassthroughReranker.rank` (unchanged) | — | — |
| 4.3 | INFO log announces selected provider | `_get_graphiti()` | `logger.info` line | Adapter init |
| 5.1 | WARNING log on rerank failure | `OllamaReranker.rank` | `logger.warning` with model + error class | Failure branch |
| 5.2 | No exception propagation to HTTP callers | `OllamaReranker.rank` (never raises) | — | — |
| 5.3 | Original order on whole-call failure | `OllamaReranker.rank` | Passthrough fallback inside method | Failure branch |
| 5.4 | `__init__` never raises | `OllamaReranker.__init__` | `AsyncOpenAI()` lazy I/O | Adapter init |
| 6.1 | `.env.example` documents the four vars | `.env.example` | — | — |
| 6.2 | `CLAUDE.md` lists the four vars | `CLAUDE.md` | — | — |
| 6.3 | `README.md` mentions `ollama pull <model>` | `README.md` | — | — |
| 6.4 | Old "follow-up" claim updated | `graphiti-neo4j-finalize/research.md` (or design.md) | — | — |
| 7.1 | Reranked order reaches `_GraphNamespace.search` | `OllamaReranker`, `_get_graphiti()` | Through Graphiti's own `search()` | End-to-end |
| 7.2 | No changes to report tools | n/a | n/a | — |
| 7.3 | `group_id` scoping unchanged | `_GraphNamespace.search` (unchanged) | — | — |
|
||||
|
||||
## Components and Interfaces
|
||||
|
||||
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------------|--------|--------------|--------------------------|-----------|
| `OllamaReranker` | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 | `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1) | Service |
| `Config` (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.1–3.6, 4.1 | `os.environ.get` (P0) | State (configuration) |
| `_get_graphiti()` (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | `Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0) | Service |
| `.env.example`, `CLAUDE.md`, `README.md` | Docs | Communicate new knobs and Ollama prerequisite. | 6.1–6.4 | — | — |
|
||||
|
||||
---
|
||||
|
||||
### Backend / Services
|
||||
|
||||
#### `OllamaReranker`
|
||||
|
||||
| Field | Detail |
|-------|--------|
| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
| Requirements | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
- Subclass `graphiti_core.cross_encoder.client.CrossEncoderClient`; implement only `rank`.
|
||||
- Use `openai.AsyncOpenAI`; no second SDK; no top-level network I/O in `__init__`.
|
||||
- Preserve passage strings byte-for-byte; never rewrite or truncate.
|
||||
- Never raise from `rank()`. On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
|
||||
- Deterministic scoring: `temperature=0.0`, no randomness in fallback scores.
|
||||
- Thread-safety: stateless beyond the immutable `AsyncOpenAI` client and string config; safe under Graphiti's concurrent search.
|
||||
|
||||
**Dependencies**
|
||||
- Inbound: `_get_graphiti()` — instantiates a single instance and passes it as `cross_encoder=` to `Graphiti(...)` (P0).
|
||||
- Outbound: `Ollama /v1/chat/completions` via `openai.AsyncOpenAI` (P0).
|
||||
- External: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai` SDK (P0).
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
class OllamaReranker(CrossEncoderClient):
    def __init__(
        self,
        *,
        model: str,
        base_url: str,
        api_key: str,
    ) -> None: ...

    async def rank(
        self,
        query: str,
        passages: list[str],
    ) -> list[tuple[str, float]]:
        """
        Score each passage's relevance to `query` and return
        `(passage, score)` tuples sorted in descending order of score.

        Preconditions:
        - `passages` is a (possibly empty) list of strings.

        Postconditions:
        - len(return) == len(passages).
        - return is sorted by score descending.
        - For all i, return[i][0] is byte-identical to one of the inputs.
        - For any rank() call, this method does not raise.

        Invariants:
        - Successfully-parsed scores fall in [0.0, 1.0].
        - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
          and are strictly less than every successfully-parsed score.
        """
```
|
||||
|
||||
**Implementation Notes**
|
||||
- **Integration**: Constructed inside `_get_graphiti()` when `Config.RERANKER_PROVIDER == "ollama"`; injected into `Graphiti(..., cross_encoder=...)`.
|
||||
- **Validation**:
  - Reject empty `passages` immediately with `return []`.
  - Clip parsed `score` to `[0.0, 1.0]`.
  - Treat any uncaught per-passage exception as parse failure and assign deterministic fallback `-0.001 * passage_index`.
  - Treat any whole-call exception (e.g. connection refused) as graceful degrade: return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]`.
|
||||
- **Risks**: Default `qwen2.5:3b` must be `ollama pull`-ed by operators; documented in README. If absent, R5 path kicks in.
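The validation rules above could look roughly like the helpers below. This is a sketch only: the spec names an internal `_parse_score` helper but does not show it, so the regex-based extraction and the function names `parse_score`, `fallback_score`, and `passthrough_order` are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# First signed decimal number anywhere in the model's reply (assumed format).
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def parse_score(raw: str) -> Optional[float]:
    """Return the first number in `raw`, clipped to [0.0, 1.0], else None."""
    match = _NUMBER.search(raw)
    if match is None:
        return None                       # caller applies the fallback score
    return min(1.0, max(0.0, float(match.group())))


def fallback_score(passage_index: int) -> float:
    """Deterministic low score for an unparseable passage (formula from the spec)."""
    return -0.001 * passage_index


def passthrough_order(passages: List[str]) -> List[Tuple[str, float]]:
    """Whole-call degrade: keep input order, attach descending synthetic scores."""
    return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
```

One detail worth checking against the real implementation: with a zero-based `passage_index`, `fallback_score(0)` evaluates to zero, which ties with a parsed score of `0.0` and falls outside the `[-1.0, 0.0)` range stated in the docstring invariants. Whether the shipped code uses a one-based index or an offset is not stated in this spec.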
|
||||
|
||||
---
|
||||
|
||||
### Backend / Config
|
||||
|
||||
#### `Config` (extended)
|
||||
|
||||
| Field | Detail |
|-------|--------|
| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
| Requirements | 1.3, 3.1–3.6, 4.1 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
- Read from `os.environ.get` only; no new dependency.
|
||||
- `RERANKER_PROVIDER` default `ollama`; valid values: `ollama`, `none`.
|
||||
- `RERANKER_MODEL` default `qwen2.5:3b`.
|
||||
- `RERANKER_BASE_URL` default = `EMBEDDING_BASE_URL` value at module load time.
|
||||
- `RERANKER_API_KEY` default = `EMBEDDING_API_KEY` value at module load time.
|
||||
- Validation of `RERANKER_PROVIDER` happens in `_get_graphiti()` (not `Config.validate()`) to keep the validate-at-boot list focused on credential presence.
|
||||
|
||||
**Contracts**: State [x]
|
||||
|
||||
##### State Management
|
||||
- **State model**: Read-only class attributes resolved once at import.
|
||||
- **Persistence & consistency**: None; values come from environment.
|
||||
- **Concurrency strategy**: Immutable after import; safe.
|
||||
|
||||
**Implementation Notes**
|
||||
- **Integration**: Defaults for `RERANKER_BASE_URL` / `RERANKER_API_KEY` should reference the corresponding `EMBEDDING_*` env vars (not the resolved `Config.EMBEDDING_BASE_URL` constant) so an operator setting only `EMBEDDING_BASE_URL` still gets the reranker pointed at the same Ollama host without needing to set `RERANKER_BASE_URL` explicitly. Implementation reads `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`.
|
||||
- **Validation**: None at config-load time. Provider value is validated by `_get_graphiti()`.
|
||||
- **Risks**: An operator who overrides `EMBEDDING_BASE_URL` but not `RERANKER_BASE_URL` will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
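The chained default described in the Integration note can be sketched as below. The `RERANKER_BASE_URL` lookup mirrors the expression quoted above; everything else is an assumption for illustration: the spec uses `Config` class attributes rather than a function, the `resolve_reranker_settings` name and `RerankerSettings` type are hypothetical, and the final `"ollama"` default for the API key is inferred from the `.env.example` comment rather than confirmed.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RerankerSettings(NamedTuple):
    provider: str
    model: str
    base_url: str
    api_key: str


def resolve_reranker_settings(
    env: Optional[Mapping[str, str]] = None,
) -> RerankerSettings:
    """Resolve RERANKER_* values, falling back to the EMBEDDING_* env vars."""
    env = os.environ if env is None else env
    return RerankerSettings(
        provider=env.get("RERANKER_PROVIDER", "ollama"),
        model=env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Fall back to the EMBEDDING_* env var itself, not a resolved constant,
        # so overriding only EMBEDDING_BASE_URL retargets the reranker too.
        base_url=env.get(
            "RERANKER_BASE_URL",
            env.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1"),
        ),
        # Final default is an assumption, not confirmed by the spec.
        api_key=env.get("RERANKER_API_KEY", env.get("EMBEDDING_API_KEY", "ollama")),
    )
```

Passing a plain dict makes the chain easy to check: `resolve_reranker_settings({"EMBEDDING_BASE_URL": "http://gpu-box:11434/v1"}).base_url` follows the embedding host, while an explicit `RERANKER_BASE_URL` takes precedence.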
|
||||
|
||||
---
|
||||
|
||||
### Backend / Adapter
|
||||
|
||||
#### `_get_graphiti()` (extended)
|
||||
|
||||
| Field | Detail |
|-------|--------|
| Intent | Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice. |
| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |
|
||||
|
||||
**Responsibilities & Constraints**
|
||||
- Preserve double-checked locking and singleton semantics exactly.
|
||||
- Read `Config.RERANKER_PROVIDER` once at construction; do not re-read.
|
||||
- For `ollama`: construct `OllamaReranker(model=..., base_url=..., api_key=...)`.
|
||||
- For `none`: construct `_PassthroughReranker()` (current behavior preserved).
|
||||
- For any other value: raise `ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')")` — mirrors the existing `_ALLOWED_GRAPHITI_PROVIDERS` validation pattern.
|
||||
- Log at INFO once: `f"Initializing Graphiti reranker (provider={provider})..."`.
|
||||
|
||||
**Contracts**: Service [x]
|
||||
|
||||
##### Service Interface
|
||||
|
||||
```python
def _get_graphiti() -> Graphiti:
    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
```
|
||||
|
||||
**Implementation Notes**
|
||||
- **Integration**: Replaces the unconditional `cross_encoder=_PassthroughReranker()` at `graphiti_adapter.py:156` with a `cross_encoder=_build_reranker(provider)` call. The factory helper lives next to `_build_llm_and_embedder` in the same file.
|
||||
- **Validation**: Provider validation raises before the Graphiti instance is constructed, so misconfiguration fails fast and visibly.
|
||||
- **Risks**: A capitalization slip such as `RERANKER_PROVIDER=Ollama` would otherwise raise; to avoid that, the helper lowercases the value before comparison, matching `_get_graphiti`'s existing `(... or "openai").lower()` pattern.
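A minimal sketch of the selection helper described in this section, assuming stand-in classes. `OllamaRerankerStub` and `PassthroughRerankerStub` are hypothetical placeholders for `OllamaReranker` and `_PassthroughReranker`, and the keyword defaults are illustrative; only the allowed-values tuple, the lowercasing, the error message shape, and the INFO log line come from the spec.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("graphiti_adapter_sketch")

_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


@dataclass(frozen=True)
class OllamaRerankerStub:          # placeholder for OllamaReranker
    model: str
    base_url: str
    api_key: str


class PassthroughRerankerStub:     # placeholder for _PassthroughReranker
    pass


def build_reranker(
    provider: Optional[str],
    *,
    model: str = "qwen2.5:3b",
    base_url: str = "http://localhost:11434/v1",
    api_key: str = "ollama",
):
    """Validate the provider name, log the choice, and build the reranker."""
    normalized = (provider or "ollama").lower()   # tolerate "Ollama", "NONE"
    if normalized not in _ALLOWED_RERANKER_PROVIDERS:
        # Fail before any Graphiti instance is constructed.
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; "
            f"allowed: {_ALLOWED_RERANKER_PROVIDERS}"
        )
    logger.info("Initializing Graphiti reranker (provider=%s)...", normalized)
    if normalized == "none":
        return PassthroughRerankerStub()
    return OllamaRerankerStub(model=model, base_url=base_url, api_key=api_key)
```

Keeping validation inside the helper means the factory only has to call it once, inside the existing lock, and pass the result as `cross_encoder=`.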
|
||||
|
||||
---
|
||||
|
||||
### Documentation
|
||||
|
||||
| File | Change | Requirements |
|------|--------|--------------|
| `.env.example` | Add commented block with the four `RERANKER_*` vars and their defaults. Position adjacent to the existing `EMBEDDING_*` block. | 6.1 |
| `CLAUDE.md` | Extend the "Required Environment Variables" code fence under "Architecture" → "Required Environment Variables" with the four new vars and a one-line note about `RERANKER_PROVIDER=none`. | 6.2 |
| `README.md` | In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments. | 6.3 |
| `.kiro/specs/graphiti-neo4j-finalize/research.md` | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |
|
||||
|
||||
> The repository also ships `README-EN.md` and `README-ZH.md`, but `README.md` is the canonical user-facing README per the existing structure. The localized READMEs are out of scope unless a quick parity edit fits without translation work; if the Chinese README already translates the embedder section, it receives the same one-line addition.
|
||||
|
||||
## Data Models
|
||||
Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the `list[tuple[str, float]]` already defined by `CrossEncoderClient.rank`.
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Error Strategy
|
||||
- **Construction errors**: None possible (no network in `__init__`; no required keys to validate).
|
||||
- **Per-passage errors**: Caught inside `OllamaReranker.rank`. Logged at DEBUG, once per failed passage, to avoid log spam. The passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
|
||||
- **Whole-call errors** (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost `try/except` in `rank`. Logged at WARNING with model name and error class. Returns `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` — same shape as `_PassthroughReranker` so consumers cannot tell the difference structurally.
|
||||
- **Configuration errors**: `_get_graphiti()` raises `ValueError` at startup if `RERANKER_PROVIDER` is unknown. The Flask app fails to boot — preferred over silent misconfiguration.
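The whole-call fallback in the strategy above can be sketched as a wrapper that never raises. This is an illustration only: `rank_or_passthrough`, `passthrough_scores`, and the logger name are hypothetical, and the scoring coroutine is injected so the sketch stays independent of any SDK. The fallback score formula is the one quoted in the list above.

```python
import asyncio
import logging
from typing import Awaitable, Callable

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("mirofish.reranker_sketch")

Ranked = list[tuple[str, float]]


def passthrough_scores(passages: list[str]) -> Ranked:
    """Original order with synthetic descending scores (passthrough shape)."""
    return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


async def rank_or_passthrough(
    score_all: Callable[[str, list[str]], Awaitable[Ranked]],
    query: str,
    passages: list[str],
    model: str,
) -> Ranked:
    """Run the real scorer; on any failure, degrade to passthrough order."""
    if not passages:
        return []  # no model call for an empty input
    try:
        return await score_all(query, passages)
    except Exception as exc:  # connection refused, 404, timeout, SDK error
        logger.warning(
            "reranker failed (model=%s, error=%s); using passthrough order",
            model,
            type(exc).__name__,
        )
        return passthrough_scores(passages)
```

Because the fallback has the same shape as the passthrough output, consumers see a valid ranking either way; only the WARNING log reveals the degradation.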
|
||||
|
||||
### Error Categories and Responses
|
||||
| Category | Trigger | Response |
|----------|---------|----------|
| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
| User input (4xx-equivalent) | Unknown `RERANKER_PROVIDER` value | `ValueError` at startup; clear message naming allowed values. |
| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |
|
||||
|
||||
### Monitoring
|
||||
- INFO log at startup states the selected provider.
|
||||
- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
|
||||
- No metrics surface yet; can be added if the reranker becomes a hot path.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
This project intentionally keeps the test surface minimal (`backend/scripts/test_profile_format.py` is the lone pytest target). Per `steering/tech.md`, do **not** add a heavy test harness.
|
||||
|
||||
- **Unit-level verification** (manual, by the implementer, no committed test files unless small and clearly worth keeping):
  1. Constructing `OllamaReranker` with a bad host does not raise; first `rank()` call logs WARNING and returns passthrough output.
  2. `rank(query, [])` returns `[]` and does not call the client.
  3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
  4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
- **Integration smoke** (manual): With `qwen2.5:3b` pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. `RERANKER_PROVIDER=none`.
- **Boundary verification**: Grep that `gpt-4.1-nano` and `OpenAIRerankerClient` do not appear in any new code path.
|
||||
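The boundary check can be done with plain `grep`; for anyone who prefers to script it, the following is a small sketch of the same scan in Python. The function name `find_forbidden` is hypothetical, and which directory to scan (for example, only the new module's directory) is left to the implementer.

```python
from pathlib import Path

# Strings that must not appear in new reranker code paths, per the spec.
FORBIDDEN = ("gpt-4.1-nano", "OpenAIRerankerClient")


def find_forbidden(
    root: Path, needles: tuple[str, ...] = FORBIDDEN
) -> list[tuple[str, int, str]]:
    """Return (path, line number, needle) for each hit in *.py files under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for needle in needles:
                if needle in line:
                    hits.append((str(path), lineno, needle))
    return hits
```

An empty result means the scanned tree is clean. Hits in comments that explain why the upstream default is avoided would still be reported, so results need a quick manual read.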
|
||||
## Supporting References
|
||||
- `research.md` — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
|
||||
- `gap-analysis.md` — Requirement-to-asset map.
|
||||
- `.ticket/39.md` — Source ticket text.
|
||||
|
|
@ -0,0 +1,111 @@
|
|||
# Implementation Gap Analysis — graphiti-ollama-reranker
|
||||
|
||||
## 1. Current State Investigation
|
||||
|
||||
### Domain Assets
|
||||
|
||||
| Asset | Location | Current behavior |
|-------|----------|------------------|
| `_PassthroughReranker` | `backend/app/services/graphiti_adapter.py:38-51` | Subclass of `graphiti_core.cross_encoder.client.CrossEncoderClient`. `rank(query, passages)` returns `(passage, 1.0 - 0.01 * i)` tuples in input order, with no model call. |
| Graphiti factory | `backend/app/services/graphiti_adapter.py:142-162` (`_get_graphiti`) | Double-checked-locking singleton. Branches on `Config.GRAPHITI_LLM_PROVIDER` (`openai` / `gemini`). Always injects `_PassthroughReranker()` as `cross_encoder`. Runs `g.build_indices_and_constraints()` on the persistent event loop. |
| LLM/embedder builder | `backend/app/services/graphiti_adapter.py:92-139` (`_build_llm_and_embedder`) | Lazy-imports provider-specific Graphiti classes. Reads `Config.LLM_*` and `Config.EMBEDDING_*`. |
| Config surface | `backend/app/config.py:33-53` | Single class with class attrs; each is `os.environ.get('KEY', 'default')`. Has `EMBEDDING_MODEL`, `EMBEDDING_BASE_URL`, `EMBEDDING_API_KEY` defaults aligned with local Ollama. |
| Graph-search callers | `_GraphNamespace.search` at `graphiti_adapter.py:488-517`; consumed by `zep_tools.py:491` (`ZepToolsService.search_graph`) and `oasis_profile_generator.py:313, 337`. | All call sites already dropped the misleading `reranker=` kwarg in `graphiti-neo4j-finalize`. They invoke `client.graph.search(graph_id, query, limit, scope)` only. |
| Existing LLM wrapper | `backend/app/utils/llm_client.py` | Uses synchronous `OpenAI()` client. Includes reasoning-model `<think>` stripping and a JSON-mode retry. Not directly relevant to the reranker but documents the in-house OpenAI-SDK pattern. |
| Async-loop helper | `graphiti_adapter.py:54-79` (`_get_loop`, `_run`) | Persistent dedicated event-loop thread used for all Graphiti async calls. The reranker's `rank` is **already** awaited by Graphiti itself, not by `_run`, so the new client can use plain `await` on `openai.AsyncOpenAI`. |
|
||||
|
||||
### Conventions Observed
|
||||
|
||||
- 4-space indent, snake_case, double quotes; English + Chinese mixed in comments — preserve both styles.
|
||||
- New env vars go into `backend/app/config.py` as class attrs reading from `os.environ.get` with a sensible default. Validation is centralized in `Config.validate()`.
|
||||
- New backend modules live under `backend/app/services/` with module-level `logger = get_logger('mirofish.<topic>')`.
|
||||
- The OpenAI SDK is the only LLM client. New providers do not add a second SDK — they add a base-URL + model knob.
|
||||
- No tests for graph code beyond `scripts/test_profile_format.py`; the project explicitly discourages adding a heavy test harness.
|
||||
|
||||
### Integration Surfaces
|
||||
|
||||
- **Upstream contract**: `CrossEncoderClient` is consumed by `graphiti_core` during `Graphiti.search()` execution; the framework calls `await reranker.rank(query, passages)` on whatever event loop the caller is using.
|
||||
- **Inbound integration**: only one wire point — the `cross_encoder=` kwarg on `Graphiti(...)` in `_get_graphiti()` (`graphiti_adapter.py:156`).
|
||||
- **Outbound integration**: the reranker calls Ollama via `http://localhost:11434/v1/chat/completions` (OpenAI-compatible). Already proven by `EMBEDDING_BASE_URL` for embeddings; Ollama's chat endpoint follows the same surface.
|
||||
|
||||
## 2. Requirements Feasibility Analysis
|
||||
|
||||
### Requirement-to-Asset Map
|
||||
|
||||
| Requirement | Existing assets | New assets needed | Gap tag |
|-------------|-----------------|-------------------|---------|
| R1: Default is Ollama, not OpenAI default | `_get_graphiti()` already injects an explicit reranker (no default fallthrough). | Switch the injected client class based on `RERANKER_PROVIDER`. | Missing (selection logic). |
| R2: Real `CrossEncoderClient` calling Ollama via OpenAI SDK | Pattern proven in `llm_client.py`; `openai` already in `pyproject.toml`. | New `OllamaReranker` class: subclass of `CrossEncoderClient`, uses `openai.AsyncOpenAI` for `rank()`. | Missing. |
| R3: Env knobs (`RERANKER_PROVIDER/MODEL/BASE_URL/API_KEY`) | Config pattern is established (`EMBEDDING_*` etc.). | Four new `Config` attrs, with defaults falling back to embedding settings where stated. | Missing. |
| R4: `none` provider preserves passthrough | `_PassthroughReranker` already exists. | Branch in `_get_graphiti()` to pick passthrough when provider == `none`. | Missing (small). |
| R5: Graceful degradation when Ollama is down | `_GraphNamespace.search` (lines 515-517) already catches all exceptions and returns empty results with a warning log. | Reranker `rank` must catch its own network/parse errors, log them, and return the original passages with synthetic scores so search still returns *something*. | Missing (within new class). |
| R6: Docs (`.env.example`, `CLAUDE.md`, README) | Existing docs already document `EMBEDDING_*` in three places, so the pattern is clear. | Add 4 new env lines + Ollama pull note. | Missing (text). |
| R7: Report tools get reranked output transparently | `_GraphNamespace.search` is the single chokepoint already used by all 4 tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`). | None; the wiring change in the factory propagates automatically. | None (verification only). |
|
||||
|
||||
### Constraints
|
||||
|
||||
- **Async contract**: `CrossEncoderClient.rank` is `async def`. The new client must be async. The OpenAI SDK provides `openai.AsyncOpenAI` for this.
|
||||
- **Ollama model output shape**: A small chat model (`qwen2.5:3b`, `llama3.2:3b`) can be prompted to emit a numeric score; we cannot rely on `logprobs` because Ollama's OpenAI-compatible surface does not always expose `logprobs`/`logit_bias` consistently. Therefore the scoring strategy is "ask the model for a 0–10 (or 0–1) relevance score per passage and parse it from the text response."
|
||||
- **No new dependency** allowed. Reranker must reuse `openai` SDK (already installed) — confirmed in `backend/.venv/lib/python3.13/site-packages/openai-2.35.1.dist-info/`.
|
||||
- **Boot must not fail** when Ollama is unreachable (R5.4). Construction is cheap (build an `AsyncOpenAI` client; no network call). The model availability check happens lazily on first `rank()`.
|
||||
|
||||
### Complexity Signals
|
||||
|
||||
- Mostly a **single file plus config plus docs** change. Algorithmic logic is local to the new class (prompt + parse). No data model changes, no API surface changes, no UI changes.
|
||||
|
||||
### Research Needed (Carry into Design)
|
||||
|
||||
- **Model choice**: pick a small Ollama chat model that (a) is widely pulled, (b) reliably emits a numeric score in a 1–2 token answer, (c) is small enough to run on a typical dev machine. Candidates: `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Design phase will fix the default.
|
||||
- **Scoring strategy**: per-passage call (N calls per query, simple to parse) vs. batched single-call (one prompt with all passages, harder to align). The per-passage approach is simpler and parallelizable via `asyncio.gather`; latency is bounded by the slowest passage. Design will fix the strategy.
|
||||
- **Output parsing**: prefer JSON output (`{"score": 0.83}`) with markdown-fence stripping (project convention from `llm_client.chat_json`); fall back to regex-extract first float on parse failure.
|
||||
|
||||
## 3. Implementation Approach Options
|
||||
|
||||
### Option A — Extend `graphiti_adapter.py` In Place
|
||||
Add the `OllamaReranker` class directly to `graphiti_adapter.py` next to `_PassthroughReranker`, and branch in `_get_graphiti()`.
|
||||
|
||||
- **Trade-offs**:
|
||||
- ✅ Same module owns all reranker wiring and the singleton; one file to read.
|
||||
- ✅ Smallest diff; matches the file's existing role as "everything Graphiti".
|
||||
- ❌ Adds prompt/parse logic to an already long (≈545-line) adapter module.
|
||||
- ❌ Harder to reuse the reranker outside Graphiti (unlikely, but precludes it).
|
||||
|
||||
### Option B — Separate Module `backend/app/services/ollama_reranker.py`
|
||||
New module owns the class and its prompt/parse helpers; `graphiti_adapter.py` imports it and selects it in `_get_graphiti()`.
|
||||
|
||||
- **Trade-offs**:
|
||||
- ✅ Clear single-responsibility module; mirrors the structure suggested in the source ticket #39.
|
||||
- ✅ Adapter file stays focused on wiring; reranker can be unit-tested in isolation if testing is later added.
|
||||
- ❌ Slightly more navigation; one extra file in `services/`.
|
||||
- ❌ The provider-selection branch still lives in the adapter, so two files must agree on the provider string.
|
||||
|
||||
### Option C — Hybrid: Provider Registry
|
||||
Introduce a small `_RERANKER_PROVIDERS` map (`"ollama" -> _build_ollama_reranker`, `"none" -> _PassthroughReranker`) inside `graphiti_adapter.py`, with the actual class still living in a separate `ollama_reranker.py`.
|
||||
|
||||
- **Trade-offs**:
|
||||
- ✅ Adding a future provider (e.g. `sentence_transformers`) is a one-line registry change.
|
||||
- ✅ Keeps reranker class out of the adapter.
|
||||
- ❌ Slight over-engineering for two providers (`ollama` + `none`); ticket #39 explicitly scopes only the Ollama path.
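For comparison with the inline branch, the registry in Option C might look like the sketch below. Everything here is illustrative: the stub classes stand in for `_PassthroughReranker` and the Ollama-backed client, `build_reranker` is a hypothetical name, and the model and base URL passed to the stub are assumed defaults rather than values read from `Config`.

```python
from typing import Callable


class PassthroughStub:
    """Stand-in for `_PassthroughReranker` (the real class lives in the adapter)."""


class OllamaRerankerStub:
    """Stand-in for the Ollama-backed reranker; construction does no network I/O."""

    def __init__(self, model: str, base_url: str) -> None:
        self.model = model
        self.base_url = base_url


def _build_ollama_reranker() -> OllamaRerankerStub:
    # Assumed defaults for illustration; a real builder would read Config.
    return OllamaRerankerStub(
        model="qwen2.5:3b", base_url="http://localhost:11434/v1"
    )


# Provider name -> zero-argument builder. A new provider is one more entry.
_RERANKER_PROVIDERS: dict[str, Callable[[], object]] = {
    "ollama": _build_ollama_reranker,
    "none": PassthroughStub,
}


def build_reranker(provider: str) -> object:
    """Look up the builder for a provider and construct the reranker."""
    key = provider.lower()
    if key not in _RERANKER_PROVIDERS:
        allowed = ", ".join(sorted(_RERANKER_PROVIDERS))
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; accepted values: {allowed}"
        )
    return _RERANKER_PROVIDERS[key]()
```

With only two providers the map saves nothing over an `if/else`, which is the over-engineering concern noted in the trade-offs.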
|
||||
|
||||
## 4. Implementation Complexity & Risk
|
||||
|
||||
- **Effort**: **S (1–3 days)**
|
||||
- One new class (~80–120 lines), four new config attrs (~10 lines), one factory branch (~10 lines), three doc updates (~30 lines). No schema or API changes.
|
||||
- **Risk**: **Low**
|
||||
- Established patterns (config, OpenAI SDK, logger).
|
||||
- `_PassthroughReranker` is preserved exactly for the `none` fallback, so the worst-case behavior is identical to today.
|
||||
- The graceful-failure path (R5) requires care, but the existing `_GraphNamespace.search` exception handling already insulates HTTP callers from reranker errors.
|
||||
|
||||
## 5. Recommendations for Design Phase
|
||||
|
||||
- **Preferred approach**: **Option B (separate `ollama_reranker.py` module)**. Best alignment with #39's "implement in `backend/app/services/`", keeps `graphiti_adapter.py` focused on Graphiti wiring, and matches the project's "one concern per module" pattern in `services/`.
|
||||
- **Key decisions to lock in design**:
|
||||
1. Default `RERANKER_MODEL` value (recommend `qwen2.5:3b` — small, broadly available on Ollama, reliable at structured short outputs).
|
||||
2. Per-passage scoring strategy with `asyncio.gather` parallelism (simpler, deterministic).
|
||||
3. Prompt + parse format: ask for JSON `{"score": <0.0..1.0>}`, strip fences, regex-fallback to first float.
|
||||
4. Failure mode for a single passage: assign deterministic low score (e.g. `0.0 - 0.001 * i`) so passage still appears once.
|
||||
5. Failure mode for whole `rank()` call: log warning, return original-order tuples with passthrough scores (no exception bubbles up).
|
||||
6. Update `.kiro/specs/graphiti-neo4j-finalize/research.md` "follow-up" note to point at this spec (R6.4).
|
||||
- **Research items carried forward**:
|
||||
- Confirm `qwen2.5:3b` produces stable JSON scores in benchmark prompts (or pick alternative).
|
||||
- Decide whether to expose `RERANKER_MAX_PARALLEL` for concurrency limit (default `len(passages)` — likely small, ≤10).
|
||||
|
|
@ -0,0 +1,95 @@
|
|||
# Requirements Document
|
||||
|
||||
## Project Description (Input)
|
||||
Replace the no-op `_PassthroughReranker` in `backend/app/services/graphiti_adapter.py` with a real reranker that uses an Ollama-available model, so Graphiti search results are properly reranked for the SearchResult / InsightForge / Panorama / Interview report tools. Add `RERANKER_PROVIDER` / `RERANKER_MODEL` / `RERANKER_BASE_URL` env knobs (defaults: ollama / a small Ollama chat model / EMBEDDING_BASE_URL), keep `_PassthroughReranker` only when `RERANKER_PROVIDER=none`, and update `.env.example`, `CLAUDE.md`, and the README accordingly. Source ticket: #39 (.ticket/39.md).
|
||||
|
||||
## Introduction
|
||||
|
||||
The Graphiti adapter currently injects a `_PassthroughReranker` into the `Graphiti(...)` constructor to bypass the upstream default (`OpenAIRerankerClient` with a hard-coded `gpt-4.1-nano` and OpenAI-specific `logprobs`/`logit_bias`), which would 401 against Qwen/Dashscope keys and is unavailable through Ollama. The passthrough is a no-op: it returns passages in original order with synthetic descending scores, so search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are not actually reranked.
|
||||
|
||||
This feature replaces the no-op with a real reranker backed by a model available through the local Ollama stack (matching the existing `EMBEDDING_MODEL=mxbai-embed-large` precedent). A small set of environment variables makes the provider, model, and endpoint overridable. An explicit `none` provider preserves the passthrough behavior for CI / lightweight setups that cannot pull the reranker model.
|
||||
|
||||
## Boundary Context
|
||||
|
||||
- **In scope**:
|
||||
- A new `CrossEncoderClient` implementation in `backend/app/services/` that scores passages against a query by calling an Ollama model through its OpenAI-compatible endpoint.
|
||||
- New `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY` settings in `backend/app/config.py`, with sensible Ollama defaults.
|
||||
- Provider selection inside `_get_graphiti()` so `ollama` selects the new client and `none` keeps `_PassthroughReranker`.
|
||||
- Documentation updates in `.env.example`, `CLAUDE.md` (Required Environment Variables), and the project `README.md` (Ollama prerequisites).
|
||||
- Graceful failure when the configured reranker model is not pulled (clear error, no Flask crash; graph search either falls back to original order or surfaces a logged warning consistent with the existing `_GraphNamespace.search` exception path).
|
||||
- **Out of scope**:
|
||||
- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
|
||||
- Building OpenAI-only or Dashscope-only reranker clients; this spec is specifically the Ollama path (plus the `none` escape hatch).
|
||||
- Upstream changes to `graphiti-core`.
|
||||
- Adding any new reranker dependency (e.g. `sentence-transformers`); the new client must reuse the OpenAI SDK already in the dependency set.
|
||||
- **Adjacent expectations**:
|
||||
- `graphiti_adapter._get_graphiti()` continues to be the single Graphiti factory; the new reranker must be wired through it, not at call sites.
|
||||
- All Graphiti reads remain scoped by `group_id` — the reranker operates on passages already filtered per project; it does not change isolation rules.
|
||||
- The reranker integrates with `_GraphNamespace.search`, which is the path used by `SearchResult`, `InsightForge`, `Panorama`, and `Interview` tools; behavior changes propagate to those tools automatically and do not need per-tool code changes.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Requirement 1: Default reranker is Ollama-backed, not the OpenAI default
|
||||
**Objective:** As a backend developer running MiroFish against the default local Ollama stack, I want Graphiti to rerank search results without requiring an OpenAI key, so that report-tool relevance reflects a real model and not an arbitrary insertion order.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. The Graphiti Adapter shall instantiate Graphiti with a non-passthrough `CrossEncoderClient` whenever `RERANKER_PROVIDER` resolves to `ollama` (the default).
|
||||
2. The Graphiti Adapter shall not depend on `graphiti_core.cross_encoder.openai_reranker_client.OpenAIRerankerClient` for the default code path.
|
||||
3. When `RERANKER_PROVIDER` is unset, the Graphiti Adapter shall behave as if `RERANKER_PROVIDER=ollama`.
|
||||
4. The Graphiti Adapter shall not reference the model name `gpt-4.1-nano` in any reranker code path.
|
||||
|
||||
### Requirement 2: Ollama-backed reranker scores passages via an OpenAI-compatible chat endpoint
|
||||
**Objective:** As a backend developer, I want a reranker that talks to a locally hosted model so that the local-first stack stays self-contained and no remote LLM key is required.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. The Ollama Reranker shall expose a class that subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements the asynchronous `rank(query, passages) -> list[tuple[passage, score]]` contract.
|
||||
2. The Ollama Reranker shall call its configured chat-completions endpoint through the `openai` SDK using `RERANKER_BASE_URL` and `RERANKER_API_KEY`, so no second SDK is introduced.
|
||||
3. The Ollama Reranker shall return passages sorted by descending score (highest relevance first) with one score per input passage.
|
||||
4. When `passages` is empty, the Ollama Reranker shall return an empty list without issuing any model call.
|
||||
5. The Ollama Reranker shall preserve passage strings byte-for-byte; it shall not rewrite, truncate, or reorder content within an individual passage.
|
||||
6. If the model response cannot be parsed into a numeric score for a passage, the Ollama Reranker shall assign that passage a deterministic fallback score lower than every successfully-parsed score so the passage still appears in the output exactly once.
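Criteria 3, 5, and 6 together describe how the final list is assembled. A minimal sketch, assuming per-passage scores arrive as `None` when parsing failed; `assemble_ranking` is a hypothetical name, and the fallback formula (one full point below the lowest parsed score, minus a small index-based offset) is an assumption chosen to guarantee that failures sort strictly last.

```python
from typing import Optional


def assemble_ranking(
    passages: list[str], scores: list[Optional[float]]
) -> list[tuple[str, float]]:
    """Pair passages with scores; unparsed passages sort after all parsed ones."""
    parsed = [s for s in scores if s is not None]
    # Strictly below every parsed score (assumed formula, see lead-in).
    floor = min(parsed, default=0.0) - 1.0
    ranked = []
    for i, (passage, score) in enumerate(zip(passages, scores)):
        if score is None:
            # Deterministic: depends only on the passage's input position.
            score = floor - 0.001 * i
        ranked.append((passage, score))  # passage string is passed through untouched
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Each input passage appears exactly once in the output, and failed passages keep their relative input order among themselves because the offset grows with the index.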
|
||||
|
||||
### Requirement 3: Reranker is configurable via environment variables
|
||||
**Objective:** As an operator deploying MiroFish, I want to override the reranker provider, model, and endpoint via environment variables so that I can target a different Ollama host, a different model, or disable reranking entirely.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. The Configuration module shall expose `RERANKER_PROVIDER` with default `ollama` and accept the values `ollama` and `none`.
|
||||
2. The Configuration module shall expose `RERANKER_MODEL` whose default is a small Ollama-available chat model selected during design (e.g. `qwen2.5:3b` or `llama3.2:3b`).
|
||||
3. The Configuration module shall expose `RERANKER_BASE_URL` whose default is the value of `EMBEDDING_BASE_URL` (so the same Ollama host is reused by default).
|
||||
4. The Configuration module shall expose `RERANKER_API_KEY` whose default is the value of `EMBEDDING_API_KEY` (so Ollama's ignored-token default `ollama` works without explicit configuration).
|
||||
5. If `RERANKER_PROVIDER` is set to a value other than `ollama` or `none`, the Graphiti Adapter shall raise a clear `ValueError` at startup naming the offending value and listing accepted values.
|
||||
6. The Configuration module shall read all four reranker variables from the process environment via the same `os.environ.get` pattern used by the surrounding settings, with no additional dependencies.
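The four settings and their fallback chain could be declared as in the sketch below. This is a simplified stand-in for `backend/app/config.py`, not its actual contents: the `EMBEDDING_*` defaults shown are assumptions consistent with a local Ollama host, and `qwen2.5:3b` is one of the candidate models named in criterion 2.

```python
import os


class Config:
    """Simplified stand-in for the project's Config class."""

    # Existing embedding settings (defaults assumed for illustration).
    EMBEDDING_BASE_URL = os.environ.get(
        "EMBEDDING_BASE_URL", "http://localhost:11434/v1"
    )
    EMBEDDING_API_KEY = os.environ.get("EMBEDDING_API_KEY", "ollama")

    # New reranker settings; URL and key fall back to the embedding values.
    RERANKER_PROVIDER = os.environ.get("RERANKER_PROVIDER", "ollama")
    RERANKER_MODEL = os.environ.get("RERANKER_MODEL", "qwen2.5:3b")
    RERANKER_BASE_URL = os.environ.get("RERANKER_BASE_URL", EMBEDDING_BASE_URL)
    RERANKER_API_KEY = os.environ.get("RERANKER_API_KEY", EMBEDDING_API_KEY)
```

Because class attributes are evaluated once at import time, an operator who overrides only `EMBEDDING_BASE_URL` gets the same host for reranking without setting `RERANKER_BASE_URL`.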
|
||||
|
||||
### Requirement 4: `none` provider preserves the passthrough fallback for CI / lightweight setups
|
||||
**Objective:** As a developer running tests or a slim container that cannot pull the reranker model, I want to disable reranking explicitly so the Flask app still boots and graph search still works.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. Where `RERANKER_PROVIDER=none`, the Graphiti Adapter shall continue to inject `_PassthroughReranker` and shall not attempt any model call at startup.
|
||||
2. While `RERANKER_PROVIDER=none`, graph search shall return results in the order Graphiti supplies them with the existing synthetic-descending-score behavior.
|
||||
3. The Graphiti Adapter shall log at INFO level the selected reranker provider during initialization so operators can confirm whether reranking is active.
|
||||
|
||||
### Requirement 5: Graceful degradation when the configured Ollama model is unreachable
|
||||
**Objective:** As an operator who forgot to run `ollama pull <model>` (or whose Ollama service is down), I want the Flask backend to keep serving requests with a clear log signal rather than crashing.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. If the Ollama Reranker fails to score passages for a given query (e.g. connection refused, 404 model not found, timeout, or unparseable response), the Graphiti Adapter shall log a warning that names the failing model and the error class.
|
||||
2. If the Ollama Reranker raises during a `rank` call, the calling `_GraphNamespace.search` shall not propagate the exception to HTTP callers; existing search-error handling already swallows reranker errors into a logged warning, and this behavior shall be preserved.
|
||||
3. When the Ollama Reranker fails for a query, the rerank-failure path shall return the passages in their original Graphiti order so search remains functional.
|
||||
4. The Ollama Reranker shall not raise during construction (i.e. `_get_graphiti()` must succeed even if the Ollama service is unavailable); failures are deferred until the first `rank` call.
|
||||
|
||||
### Requirement 6: Documentation reflects the new reranker configuration
|
||||
**Objective:** As a new contributor reading the docs, I want the reranker env vars, defaults, and prerequisites documented in the same places the other LLM/embedder settings live so configuration is discoverable.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. The Environment Example file (`.env.example`) shall include entries for `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, and `RERANKER_API_KEY`, each commented with its default and accepted values.
|
||||
2. The CLAUDE.md document shall list the four reranker variables in its "Required Environment Variables" section with the same level of detail used for `EMBEDDING_MODEL`.
|
||||
3. The README.md document shall mention the `ollama pull <reranker model>` prerequisite alongside the existing `ollama pull mxbai-embed-large` note (or wherever Ollama setup is documented).
|
||||
4. Where the `.kiro/specs/graphiti-neo4j-finalize` documents state that the reranker is a passthrough no-op, those documents shall either be updated to point at this spec or left untouched (decided in design); the constraint is that no documentation shall continue to claim "a real per-provider reranker is a follow-up" once this spec is implemented.
|
||||
|
||||
### Requirement 7: Report-tool integration verifies reranked output reaches consumers
|
||||
**Objective:** As a developer using the ReportAgent tools, I want `SearchResult`, `InsightForge`, `Panorama`, and `Interview` to receive properly reranked edges/nodes so their report output reflects model-judged relevance, not Graphiti's hybrid-search ordering alone.
|
||||
|
||||
#### Acceptance Criteria
|
||||
1. When `RERANKER_PROVIDER=ollama` is active and the configured model is available, `_GraphNamespace.search` shall return passages whose order is determined by the Ollama Reranker, not Graphiti's default RRF ordering.
|
||||
2. The ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) shall require no changes for this feature; the rerank improvement reaches them transparently through `_GraphNamespace.search`.
|
||||
3. While the Ollama Reranker is active, the per-project `group_id` scoping of all Graphiti queries shall remain unchanged.
|
||||
|
|
@ -0,0 +1,112 @@
|
|||
# Research & Design Decisions — graphiti-ollama-reranker
|
||||
|
||||
## Summary
|
||||
- **Feature**: `graphiti-ollama-reranker`
|
||||
- **Discovery Scope**: Extension (one new service module + factory branch + config + docs).
|
||||
- **Key Findings**:
|
||||
- `CrossEncoderClient.rank(query, passages) -> list[tuple[str, float]]` is the only abstract contract Graphiti requires of the reranker. The existing `_PassthroughReranker` already exercises this contract correctly.
|
||||
- Ollama's OpenAI-compatible `/v1/chat/completions` endpoint does not reliably expose `logprobs` / `logit_bias`, so Graphiti's default OpenAI scoring approach (binary YES/NO over token logits) cannot be ported. The reranker must use **prompted numeric scoring** with text-output parsing.
|
||||
- The `openai` SDK already shipped in `backend/.venv` (v2.35.1) exposes `AsyncOpenAI`, which is the right client for the async `rank()` method without introducing any new dependency.
|
||||
|
||||
## Research Log
|
||||
|
||||
### Graphiti's `CrossEncoderClient` contract
|
||||
- **Context**: Need to confirm the precise shape of the `rank` interface and any other abstract members.
|
||||
- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:38-51` (`_PassthroughReranker`); `.kiro/specs/graphiti-neo4j-finalize/research.md` and `gap-analysis.md` (which captured the upstream contract on first integration); ticket #39 narrative.
|
||||
- **Findings**:
|
||||
- `_PassthroughReranker` subclasses `CrossEncoderClient` and only overrides `async def rank(query: str, passages: list[str]) -> list[tuple[str, float]]`.
|
||||
- Graphiti's internal call site (`graphiti_core/graphiti.py:154`) constructs the reranker once and calls `rank` per search. There is no separate batch interface to satisfy.
|
||||
- Passages are short text snippets (entity-edge facts / node summaries). Typical N per search ≤ 10 (limit defaulted in `_GraphNamespace.search`).
|
||||
- **Implications**: A drop-in subclass that implements `rank` is sufficient. No additional abstract methods to wire.
|
||||
|
||||
### Ollama OpenAI-compatible scoring surface
|
||||
- **Context**: Decide how to obtain a relevance score per passage from a small Ollama-served chat model.
|
||||
- **Sources Consulted**: Project-internal `backend/app/utils/llm_client.py` (uses `openai.OpenAI` + `chat.completions.create` against Dashscope / OpenAI / Ollama uniformly); ticket #39 "Proposed approach" section enumerating Ollama chat-model scoring vs. embedding cosine.
|
||||
- **Findings**:
|
||||
- Ollama supports `/v1/chat/completions` for chat models like `qwen2.5:3b`, `llama3.2:3b`, `phi3:3.8b`. Pulling a model is required (`ollama pull <model>`).
|
||||
- JSON-mode (`response_format={"type": "json_object"}`) is honored by recent Ollama versions but not universally; project convention is to fall back gracefully (cf. `LLMClient.chat_json`).
|
||||
- Embedding-cosine reranker is feasible (re-embed query and passages with `mxbai-embed-large`) but produces a weaker ordering signal than an LLM that can reason about the question. Picking LLM scoring matches the ticket's preferred path.
|
||||
- **Implications**:
|
||||
- Use a chat-completion call per passage with a deterministic temperature (0.0) and a tight system prompt asking for a JSON score in [0.0, 1.0].
|
||||
- Parse with the same defensive strategy used elsewhere: strip `<think>` blocks, strip markdown fences, attempt `json.loads`, regex-fallback to first float, deterministic low score on hard failure.
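The defensive parse chain listed above can be sketched as a single function. This is an illustration under assumptions, not the project's `LLMClient` code: `parse_score` is a hypothetical name, clamping to [0.0, 1.0] is an added assumption, and `None` is used to signal a hard failure so the caller can apply the deterministic low score.

```python
import json
import re
from typing import Optional

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# Leading or trailing markdown fence (three backticks, optional "json" tag).
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)
_FLOAT_RE = re.compile(r"-?\d+(?:\.\d+)?")


def _clamp(value: float) -> float:
    # Assumption: out-of-range scores are clamped rather than rejected.
    return min(1.0, max(0.0, value))


def parse_score(raw: str) -> Optional[float]:
    """Extract a relevance score from model output, or None if nothing parses."""
    text = _THINK_RE.sub("", raw).strip()    # 1. drop reasoning blocks
    text = _FENCE_RE.sub("", text).strip()   # 2. drop markdown fences
    try:
        value = json.loads(text)             # 3. preferred: {"score": 0.83}
        if isinstance(value, dict):
            value = value.get("score")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return _clamp(float(value))
    except json.JSONDecodeError:
        pass
    match = _FLOAT_RE.search(text)           # 4. fallback: first number in text
    if match:
        return _clamp(float(match.group()))
    return None                              # 5. hard failure
```

Stripping `<think>` blocks first matters for the regex fallback: otherwise a number mentioned during the model's reasoning could be mistaken for the final score.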
|
||||
|
||||
### Concurrency strategy
|
||||
- **Context**: Decide between per-passage parallel calls vs. one batched call.
|
||||
- **Findings**:
|
||||
- Per-passage with `asyncio.gather` is simpler to align outputs and resilient — a single bad output only loses one passage's score.
|
||||
- Single batched prompt requires the model to emit aligned scores (often by index); LLMs occasionally drop entries or misorder them, demanding additional validation.
|
||||
- With typical `limit ≤ 10`, parallel per-passage calls hit Ollama briefly; on a 3B model this is < 5s for 10 passages.
|
||||
- **Implications**: Default to per-passage `asyncio.gather`. Expose no extra concurrency knob initially (avoid premature configuration surface; YAGNI per project guidelines).
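The per-passage fan-out can be sketched with `asyncio.gather` and `return_exceptions=True`, so that one failed call does not cancel the others. The names `score_passages` and `ScoreFn` are hypothetical, and the single-passage scorer is injected so the sketch runs without an Ollama host or the `openai` SDK.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical signature: (query, passage) -> relevance score.
ScoreFn = Callable[[str, str], Awaitable[float]]


async def score_passages(
    score_one: ScoreFn, query: str, passages: list[str]
) -> list[Optional[float]]:
    """Score all passages concurrently; a failed call yields None at its index."""
    results = await asyncio.gather(
        *(score_one(query, p) for p in passages),
        return_exceptions=True,  # collect errors instead of aborting the batch
    )
    # gather preserves input order, so index i still refers to passages[i].
    return [None if isinstance(r, BaseException) else float(r) for r in results]
```

Since results come back in input order, no index bookkeeping is needed to align scores with passages, which is the alignment advantage over a single batched prompt.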
|
||||
|
||||
### Failure semantics
|
||||
- **Context**: Required by R5 — Flask must keep serving on Ollama outage, and graph search should remain functional.
|
||||
- **Sources Consulted**: `backend/app/services/graphiti_adapter.py:515-517` (`_GraphNamespace.search` swallows all exceptions and logs a warning); `_get_graphiti()` runs once at first call.
|
||||
- **Findings**:
|
||||
- Construction of an `openai.AsyncOpenAI` client does not perform any network I/O. Therefore `OllamaReranker.__init__` can be safe at startup even when Ollama is down.
|
||||
- If `rank()` itself raises, the upstream `Graphiti.search` may surface the exception. The new reranker should therefore catch its own errors and degrade to passthrough behavior in-method rather than relying on the outer `try/except` in `_GraphNamespace.search`.
|
||||
- **Implications**: `OllamaReranker.rank` should never raise. On exception or unparseable output it returns the input passages in the original order with passthrough-style synthetic scores and emits a single WARNING log per failure (rate-limited by intent: one log per rank() call).
|
||||
|
||||
## Architecture Pattern Evaluation
|
||||
|
||||
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| A: Add class to `graphiti_adapter.py` | Define `OllamaReranker` next to `_PassthroughReranker` in the same file. | Minimal diff; single file to read. | Bloats an already-long adapter; mixes wiring with provider-specific logic. | — |
| B: New `services/ollama_reranker.py` module | Dedicated module owns prompt + parse + async client; adapter only selects it. | Single-responsibility module; matches ticket suggestion; reusable in isolation. | One extra import in adapter. | **Selected.** Aligns with project pattern of one concern per `services/*` file. |
| C: Hybrid provider registry | Map `RERANKER_PROVIDER → builder` in adapter; class still in B's module. | Future providers are a one-line registry change. | Over-engineering for two providers (`ollama` + `none`). | Deferred until a third provider is needed. |
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Decision: Provider selected via env var, branch lives in `_get_graphiti()`
|
||||
- **Context**: R3 requires env-driven provider selection; only two values supported by this spec (`ollama` and `none`).
|
||||
- **Alternatives Considered**:
|
||||
1. Function-pointer registry (Option C).
|
||||
2. Inline `if/else` in the factory selecting one of two classes.
|
||||
- **Selected Approach**: Inline branch in `_get_graphiti()` reads `Config.RERANKER_PROVIDER`, picks `_build_ollama_reranker()` or `_PassthroughReranker()`, validates unknown values with a `ValueError` matching the existing `_ALLOWED_GRAPHITI_PROVIDERS` convention.
|
||||
- **Rationale**: Mirrors the established `GRAPHITI_LLM_PROVIDER` validation pattern (`_ALLOWED_GRAPHITI_PROVIDERS`) without adding speculative abstraction. Two values, two branches.
|
||||
- **Trade-offs**: Adding a third provider later costs one more `elif`; acceptable.
|
||||
- **Follow-up**: Surface the selected provider in the INFO startup log so operators can confirm.
|
||||
|
||||
### Decision: Per-passage scoring with `asyncio.gather`, no concurrency knob
|
||||
- **Context**: R2.3 requires one score per passage in descending order; R5 requires graceful per-call failure.
|
||||
- **Alternatives Considered**:
|
||||
1. Single batched prompt with index-aligned output.
|
||||
2. Per-passage call with bounded `Semaphore`.
|
||||
- **Selected Approach**: Per-passage `asyncio.gather` with no explicit limit; rely on default `limit ≤ 10` in `_GraphNamespace.search`.
|
||||
- **Rationale**: Simple, deterministic, isolates per-passage failures. Avoids premature configuration knob.
|
||||
- **Trade-offs**: If a future caller asks for `limit=100`, Ollama may queue 100 requests; acceptable for now because no caller does this.
|
||||
- **Follow-up**: If real-world rerank latency becomes a concern, add `RERANKER_MAX_PARALLEL` then.
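
If the follow-up is ever needed, a bounded variant of the per-passage fan-out could look roughly like the sketch below. It is an assumption-laden illustration only: `gather_bounded`, `score_one`, and the `max_parallel` parameter are hypothetical and do not exist in this spec or in the code it ships.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    score_one: Callable[[int, str], Awaitable[float]],
    passages: List[str],
    max_parallel: int,
) -> List[float]:
    """Score passages concurrently with at most max_parallel calls in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(index: int, passage: str) -> float:
        # Wait for a free slot before issuing the (hypothetical) model call.
        async with semaphore:
            return await score_one(index, passage)

    # gather preserves input order, so scores[i] belongs to passages[i].
    return await asyncio.gather(
        *(guarded(i, p) for i, p in enumerate(passages))
    )
```

With `max_parallel` at or above the number of passages this behaves like the unbounded `asyncio.gather` selected here, which is why the knob can be deferred without changing the output contract.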
|
||||
|
||||
### Decision: Default model = `qwen2.5:3b`
|
||||
- **Context**: Need a small, broadly-available Ollama chat model that reliably emits a numeric score in 1–2 tokens.
|
||||
- **Alternatives Considered**:
|
||||
1. `qwen2.5:3b` (Apache-2.0, 3B params, strong instruction following).
|
||||
2. `llama3.2:3b` (Llama community license, 3B).
|
||||
3. `phi3:3.8b` (MIT, 3.8B).
|
||||
- **Selected Approach**: `qwen2.5:3b`.
|
||||
- **Rationale**: Matches the Qwen-family alignment of the rest of the project (`qwen-plus` is the documented LLM default). Apache-2.0 license is permissive. Small enough for typical dev machines.
|
||||
- **Trade-offs**: Operators on systems without `qwen2.5:3b` must `ollama pull qwen2.5:3b` or override `RERANKER_MODEL`.
|
||||
- **Follow-up**: README will document `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large` step.
|
||||
|
||||
### Decision: Defensive output parsing (`json.loads` → regex float → deterministic low score)
|
||||
- **Context**: R2.6 requires deterministic handling of unparseable model responses.
|
||||
- **Selected Approach**:
|
||||
1. Strip `<think>...</think>` blocks (project convention from `llm_client.py:64`).
|
||||
2. Strip markdown fences (project convention from `llm_client.chat_json`).
|
||||
3. `json.loads` and read `score` (float in `[0, 1]`, clipped on out-of-range).
|
||||
4. On JSON failure, regex-extract the first float token; clip to `[0, 1]`.
|
||||
5. On total failure, assign `-0.001 * (passage_index + 1)` (deterministic and strictly below any successfully-parsed score, because parsed scores are clipped to `[0, 1]`).
|
||||
- **Rationale**: Reuses patterns already in the codebase. Keeps every passage in the output (R2.6).
|
||||
- **Trade-offs**: One failed parse silently downranks a passage; logged at DEBUG (not WARNING) to avoid log spam.
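
The parsing chain in steps 1 to 4 can be sketched as below. This is a simplified stand-in rather than the module's code: `parse_score` and the three regex names are hypothetical, and the function returns `None` on total failure so the caller can apply its own fallback score.

```python
import json
import re
from typing import Optional

# Hypothetical patterns for this sketch.
THINK = re.compile(r"<think>[\s\S]*?</think>", re.IGNORECASE)
FENCE = re.compile(r"^```(?:json)?\s*|\s*```$", re.IGNORECASE)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(raw: str) -> Optional[float]:
    """Return a score clipped to [0, 1], or None if nothing numeric is found."""
    text = THINK.sub("", raw or "").strip()   # 1. drop reasoning blocks
    text = FENCE.sub("", text).strip()        # 2. drop markdown fences
    value = None
    try:
        parsed = json.loads(text)             # 3. preferred path: JSON object
        if isinstance(parsed, dict) and "score" in parsed:
            value = float(parsed["score"])
    except (ValueError, TypeError):
        value = None
    if value is None:
        match = NUMBER.search(text)           # 4. fall back to the first number
        if match is None:
            return None
        value = float(match.group(0))
    return min(1.0, max(0.0, value))          # clip out-of-range values
```

For example, a response of `relevance: 0.42` fails `json.loads` but still yields a score through the regex step, while `no idea` yields `None`.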
|
||||
|
||||
## Risks & Mitigations
|
||||
- **Risk**: Ollama service is not running on startup → boot must not fail. **Mitigation**: Construct only `AsyncOpenAI` (no network call) during `__init__`. Defer connectivity to first `rank()`. R5.4.
|
||||
- **Risk**: Model is not pulled → `rank()` raises 404 from Ollama. **Mitigation**: Catch within `rank()`, log WARNING naming model + error class, return passthrough-ordered tuples so search still works. R5.1, R5.3.
|
||||
- **Risk**: Operator misconfigures `RERANKER_PROVIDER` to an unknown value → silent fallthrough to wrong reranker. **Mitigation**: `_get_graphiti()` raises `ValueError` listing allowed values, mirroring `_ALLOWED_GRAPHITI_PROVIDERS`. R3.5.
|
||||
- **Risk**: Multiple concurrent `rank()` calls overwhelm a small local Ollama daemon. **Mitigation**: Accept default Graphiti `limit ≤ 10`; document `RERANKER_MAX_PARALLEL` as a future follow-up if needed.
|
||||
|
||||
## References
|
||||
- `backend/app/services/graphiti_adapter.py:38-51` — current passthrough reranker contract.
|
||||
- `backend/app/services/graphiti_adapter.py:142-162` — current `_get_graphiti()` wiring point.
|
||||
- `backend/app/utils/llm_client.py` — project pattern for OpenAI-SDK chat + JSON parsing + reasoning-block stripping.
|
||||
- `.kiro/specs/graphiti-neo4j-finalize/research.md` — historical context for why the passthrough was introduced.
|
||||
- Ticket `#39` in `.ticket/39.md` — feature brief and acceptance criteria.
|
||||
|
|
@ -0,0 +1,23 @@
|
|||
{
  "feature_name": "graphiti-ollama-reranker",
  "created_at": "2026-05-11T10:24:16Z",
  "updated_at": "2026-05-11T10:45:00Z",
  "language": "en",
  "phase": "tasks-generated",
  "approvals": {
    "requirements": {
      "generated": true,
      "approved": true
    },
    "design": {
      "generated": true,
      "approved": true
    },
    "tasks": {
      "generated": true,
      "approved": true
    }
  },
  "ready_for_implementation": true,
  "ticket": 39
}
|
||||
|
|
@ -0,0 +1,89 @@
|
|||
# Implementation Plan
|
||||
|
||||
> Foundation tasks introduce the four `RERANKER_*` configuration knobs.
|
||||
> Core tasks add the new `OllamaReranker` and the factory selection branch.
|
||||
> Integration tasks wire documentation parity.
|
||||
> Validation closes the loop with a structural sweep.
|
||||
|
||||
## Foundation
|
||||
|
||||
- [x] 1. Add reranker configuration surface
|
||||
- [x] 1.1 Introduce four `RERANKER_*` settings on the `Config` class
|
||||
- Add `RERANKER_PROVIDER` with default `ollama`, read via `os.environ.get('RERANKER_PROVIDER', 'ollama')`.
|
||||
- Add `RERANKER_MODEL` with default `qwen2.5:3b`, read via `os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')`.
|
||||
- Add `RERANKER_BASE_URL` with default that chains to the embedding host: `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`. Do not reference `Config.EMBEDDING_BASE_URL` directly; use the env-lookup form so behaviour stays consistent under reload patterns.
|
||||
- Add `RERANKER_API_KEY` with default that chains to the embedding key the same way (`os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))`).
|
||||
- Do not add the reranker to `Config.validate()`; the provider has no mandatory credentials.
|
||||
- Observable completion: a Python REPL that imports `Config` shows the four attributes with the documented defaults, and overriding `EMBEDDING_BASE_URL` in the environment is visible on `Config.RERANKER_BASE_URL` too.
|
||||
- _Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6_
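
The chained defaults in task 1.1 can be illustrated with a small sketch. It assumes a hypothetical helper, `resolve_reranker_settings`, that takes a plain mapping instead of reading `os.environ`, purely so the chaining is easy to exercise in a REPL; the real settings are class attributes on `Config`.

```python
from typing import Dict, Mapping


def resolve_reranker_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve the four RERANKER_* values with embedding-chained defaults."""
    # The reranker falls back to the embedding host and key when unset.
    base_url_default = env.get("EMBEDDING_BASE_URL", "http://localhost:11434/v1")
    api_key_default = env.get("EMBEDDING_API_KEY", "ollama")
    return {
        "RERANKER_PROVIDER": env.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        "RERANKER_BASE_URL": env.get("RERANKER_BASE_URL", base_url_default),
        "RERANKER_API_KEY": env.get("RERANKER_API_KEY", api_key_default),
    }
```

Passing `{"EMBEDDING_BASE_URL": "http://gpu-box:11434/v1"}` (a made-up host) shows the override propagating to `RERANKER_BASE_URL`, which is the observable behaviour the task asks for.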
|
||||
|
||||
## Core
|
||||
|
||||
- [x] 2. Implement the Ollama-backed reranker
|
||||
- [x] 2.1 Create the new reranker module with the `CrossEncoderClient` subclass
|
||||
- Define a new module under `backend/app/services/` that hosts the reranker class. The class subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements only the async `rank` method.
|
||||
- Constructor accepts `model`, `base_url`, `api_key` as keyword arguments; it instantiates `openai.AsyncOpenAI(base_url=..., api_key=...)` but performs no network I/O so the Flask app can boot when Ollama is unreachable.
|
||||
- `rank(query, passages)` short-circuits on empty `passages` and returns `[]` without any model call.
|
||||
- For each passage, send a single chat-completion request with `temperature=0.0` and a deterministic system prompt asking for a JSON object `{"score": <0.0..1.0>}` describing the passage's relevance to the query. Use `asyncio.gather` to run all per-passage requests concurrently.
|
||||
- Parse each model response defensively: strip any `<think>...</think>` block, strip markdown code fences, attempt `json.loads`, fall back to regex-extract the first floating-point number, clip the value to `[0.0, 1.0]`. On any per-passage parse failure, assign a deterministic fallback score of `-0.001 * (passage_index + 1)` and log at DEBUG once per failure naming the model and error class. The passage string is echoed byte-for-byte regardless of parse outcome.
|
||||
- Wrap the whole call in a `try/except`. On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` so search remains functional. The method must not raise.
|
||||
- Sort the returned list by score descending before returning.
|
||||
- Observable completion: instantiating the new class with a deliberately bad `base_url` does not raise; an async call to `rank("q", [])` returns `[]`; an async call with two non-empty passages against a reachable Ollama returns two `(passage, float)` tuples in descending-score order, with every input passage byte-identical in the output.
|
||||
- _Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1_
|
||||
- _Boundary: OllamaReranker module_
|
||||
|
||||
## Integration
|
||||
|
||||
- [x] 3. Wire the new reranker into the Graphiti factory
|
||||
- [x] 3.1 Select the reranker inside `_get_graphiti()` based on `Config.RERANKER_PROVIDER`
|
||||
- Introduce a small allow-list constant alongside `_ALLOWED_GRAPHITI_PROVIDERS` enumerating `("ollama", "none")`.
|
||||
- Read `Config.RERANKER_PROVIDER`, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise `ValueError` with a message that names the offending value and lists the accepted values — same shape as the existing `GRAPHITI_LLM_PROVIDER` validation.
|
||||
- For `ollama`, construct the new `OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY)` and pass it as the `cross_encoder=` argument to `Graphiti(...)`.
|
||||
- For `none`, continue to pass `_PassthroughReranker()` as today; do not change the passthrough class.
|
||||
- Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log).
|
||||
- Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime.
|
||||
- Observable completion: with `RERANKER_PROVIDER` unset, app startup logs `Initializing Graphiti reranker (provider=ollama)...` and Graphiti is constructed with the `OllamaReranker`. With `RERANKER_PROVIDER=none`, the log reports `none` and Graphiti uses `_PassthroughReranker`. With `RERANKER_PROVIDER=banana`, `_get_graphiti()` raises `ValueError` listing `('ollama', 'none')`.
|
||||
- _Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3_
|
||||
- _Depends: 1.1, 2.1_
|
||||
|
||||
- [ ] 4. Update operator-facing documentation
|
||||
- [ ] 4.1 (P) Add the new env knobs to `.env.example` *(deferred — sandbox hook blocks all `.env*` access; see HANDOFF.md)*
|
||||
- Insert a four-line `RERANKER_*` block adjacent to the existing `EMBEDDING_*` block, mirroring the comment style (default, accepted values, and a one-line note that `RERANKER_PROVIDER=none` disables reranking).
|
||||
- Observable completion: opening `.env.example` shows the four new variables with documented defaults, positioned next to the embedding block.
|
||||
- _Requirements: 6.1_
|
||||
- _Boundary: .env.example_
|
||||
- _Depends: 1.1_
|
||||
|
||||
- [x] 4.2 (P) Extend the `Required Environment Variables` snippet in `CLAUDE.md`
|
||||
- Add the four `RERANKER_*` variables to the existing fenced code block under "Required Environment Variables" in `CLAUDE.md`, keeping the same comment style used for the `EMBEDDING_*` block.
|
||||
- Observable completion: `CLAUDE.md` documents the four reranker variables next to the embedding block and includes a note that `RERANKER_PROVIDER=none` keeps the previous passthrough behaviour.
|
||||
- _Requirements: 6.2_
|
||||
- _Boundary: CLAUDE.md_
|
||||
- _Depends: 1.1_
|
||||
|
||||
- [x] 4.3 (P) Document the Ollama pull prerequisite and env block in `README.md`
|
||||
- In the existing "Install Ollama and pull the default embedding model" section, add a parallel `ollama pull qwen2.5:3b` step (or note that the model used for reranking must be pulled, using the documented default).
|
||||
- In the `.env` snippet under "Configure Environment Variables", add the four `RERANKER_*` lines with brief comments mirroring the embedding-block style.
|
||||
- Treat `README-EN.md` and `README-ZH.md` translations as out of scope for this ticket — translation belongs to the active i18n workstream and would otherwise drift.
|
||||
- Observable completion: `README.md` shows the `ollama pull qwen2.5:3b` step and the four reranker env lines in the `.env` snippet.
|
||||
- _Requirements: 6.3_
|
||||
- _Boundary: README.md_
|
||||
- _Depends: 1.1_
|
||||
|
||||
- [x] 4.4 (P) Update the stale follow-up claim in the prior spec
|
||||
- In `.kiro/specs/graphiti-neo4j-finalize/research.md`, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that follow-up has shipped under `graphiti-ollama-reranker`. The constraint is that no remaining documentation continues to claim the reranker remains a deferred passthrough.
|
||||
- Observable completion: a grep for "real per-provider reranker is a follow-up" across `.kiro/specs/` returns either zero hits or a pointer note to `graphiti-ollama-reranker`.
|
||||
- _Requirements: 6.4_
|
||||
- _Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md_
|
||||
|
||||
## Validation
|
||||
|
||||
- [x] 5. Structural verification sweep
|
||||
- [x] 5.1 Grep for legacy reranker references and verify the new wiring is reachable
|
||||
- Grep `backend/app/services/` for `gpt-4.1-nano` and `OpenAIRerankerClient`; both must return zero hits in code paths owned by this spec.
|
||||
- Grep `backend/app/services/graphiti_adapter.py` for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the `_get_graphiti()` branch).
|
||||
- Confirm the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) require no source changes by grepping for `client.graph.search(` call sites and verifying the kwarg shape is unchanged.
|
||||
- Confirm `_GraphNamespace.search` still filters by `group_id` (no regression to project isolation).
|
||||
- Observable completion: a short verification summary captured during implementation lists each grep outcome with the expected zero / single hit, and the report-tool call sites are unchanged.
|
||||
- _Requirements: 1.4, 7.1, 7.2, 7.3_
|
||||
- _Depends: 3.1_
|
||||
11
CLAUDE.md
|
|
@ -84,6 +84,17 @@ EMBEDDING_API_KEY # Default: "ollama" (Ollama ignores the value)
|
|||
# nomic-embed-text are not supported.
|
||||
# Prerequisite for the default: `ollama pull mxbai-embed-large`.
|
||||
|
||||
# Reranker (cross-encoder for Graphiti search results)
|
||||
RERANKER_PROVIDER # Default: ollama (allowed: "ollama", "none")
|
||||
# "none" keeps the legacy passthrough — useful for CI /
|
||||
# slim containers that cannot pull a reranker model.
|
||||
RERANKER_MODEL # Default: qwen2.5:3b (local Ollama chat model)
|
||||
# Prerequisite for the default: `ollama pull qwen2.5:3b`.
|
||||
RERANKER_BASE_URL # Default: value of EMBEDDING_BASE_URL
|
||||
# (typically http://localhost:11434/v1)
|
||||
RERANKER_API_KEY # Default: value of EMBEDDING_API_KEY
|
||||
# (Ollama ignores the value)
|
||||
|
||||
# Optional — Accelerated LLM (omit entirely if not used)
|
||||
LLM_BOOST_API_KEY
|
||||
LLM_BOOST_BASE_URL
|
||||
|
|
|
|||
16
README.md
|
|
@ -137,11 +137,12 @@ neo4j-admin dbms set-initial-password your_neo4j_password
|
|||
neo4j start
|
||||
```
|
||||
|
||||
**Install Ollama and pull the default embedding model:**
|
||||
**Install Ollama and pull the default models:**
|
||||
|
||||
```bash
|
||||
# macOS / Linux: https://ollama.com/download
|
||||
ollama pull mxbai-embed-large
|
||||
ollama pull mxbai-embed-large # embedder for the knowledge graph
|
||||
ollama pull qwen2.5:3b # reranker for Graphiti search results
|
||||
# Ollama serves the OpenAI-compatible /v1 endpoint on http://localhost:11434
|
||||
# by default — no further configuration required.
|
||||
```
|
||||
|
|
@ -181,6 +182,17 @@ EMBEDDING_BASE_URL=http://localhost:11434/v1
|
|||
EMBEDDING_API_KEY=ollama
|
||||
EMBEDDING_MODEL=mxbai-embed-large
|
||||
|
||||
# Reranker — reorders Graphiti search results before the report tools see them.
|
||||
# Default targets the same local Ollama host used for embeddings.
|
||||
# Prerequisite for the default: `ollama pull qwen2.5:3b`.
|
||||
# Set RERANKER_PROVIDER=none to keep the legacy passthrough (useful for CI /
|
||||
# slim containers that cannot pull a reranker model).
|
||||
RERANKER_PROVIDER=ollama
|
||||
RERANKER_MODEL=qwen2.5:3b
|
||||
# Optional — both default to the EMBEDDING_* equivalents when unset.
|
||||
# RERANKER_BASE_URL=http://localhost:11434/v1
|
||||
# RERANKER_API_KEY=ollama
|
||||
|
||||
# Embeddings — remote fallback (uncomment ONE block if you prefer not to run
|
||||
# Ollama locally). Note: any override must produce 1024-dim vectors to match
|
||||
# Graphiti's vector index — 768-dim models (e.g. nomic-embed-text) are NOT
|
||||
|
|
|
|||
|
|
@ -52,6 +52,24 @@ class Config:
|
|||
    # to use Google Gemini directly.
    GRAPHITI_LLM_PROVIDER = os.environ.get('GRAPHITI_LLM_PROVIDER', 'openai')

    # Reranker (cross-encoder) settings. The reranker reorders Graphiti search
    # results before they reach the ReportAgent tools. Defaults target the same
    # local Ollama host used for embeddings; setting RERANKER_PROVIDER=none
    # disables reranking and keeps the legacy passthrough (useful for CI or
    # slim containers that cannot pull the reranker model). RERANKER_BASE_URL
    # and RERANKER_API_KEY chain through EMBEDDING_BASE_URL / EMBEDDING_API_KEY
    # so a single-host Ollama deployment needs no extra configuration.
    RERANKER_PROVIDER = os.environ.get('RERANKER_PROVIDER', 'ollama')
    RERANKER_MODEL = os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')
    RERANKER_BASE_URL = os.environ.get(
        'RERANKER_BASE_URL',
        os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'),
    )
    RERANKER_API_KEY = os.environ.get(
        'RERANKER_API_KEY',
        os.environ.get('EMBEDDING_API_KEY', 'ollama'),
    )

    # Zep settings (kept for backwards compatibility; deprecated).
    ZEP_API_KEY = os.environ.get('ZEP_API_KEY', '')
|
||||
|
||||
|
|
|
|||
|
|
@ -31,6 +31,7 @@ from graphiti_core.cross_encoder.client import CrossEncoderClient
|
|||
|
||||
from ..config import Config
|
||||
from ..utils.logger import get_logger
|
||||
from .ollama_reranker import OllamaReranker
|
||||
|
||||
logger = get_logger('mirofish.graphiti_adapter')
|
||||
|
||||
|
|
@ -42,7 +43,9 @@ class _PassthroughReranker(CrossEncoderClient):
|
|||
descending scores. Injected explicitly so Graphiti does not fall back
|
||||
to its default ``OpenAIRerankerClient`` (which uses a hard-coded
|
||||
``gpt-4.1-nano`` model with logprobs and would 401 against Qwen /
|
||||
Dashscope keys). A real per-provider reranker is a follow-up.
|
||||
Dashscope keys). Selected when ``Config.RERANKER_PROVIDER == "none"``
|
||||
— useful for CI / slim containers that cannot pull the reranker model.
|
||||
For real reranking, set ``RERANKER_PROVIDER=ollama`` (the default).
|
||||
"""
|
||||
|
||||
async def rank(self, query: str, passages: list[str]) -> list[tuple[str, float]]:
|
||||
|
|
@ -87,6 +90,31 @@ _graphiti_lock = threading.Lock()
|
|||
|
||||
|
||||
_ALLOWED_GRAPHITI_PROVIDERS = ("openai", "gemini")
|
||||
_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")
|
||||
|
||||
|
||||
def _build_reranker(provider: str) -> CrossEncoderClient:
    """Build the cross-encoder reranker for the configured provider.

    Defers to ``_PassthroughReranker`` when ``provider`` is ``"none"``
    (the legacy no-op behaviour, useful for CI / slim containers that
    cannot pull the reranker model). For ``"ollama"`` it constructs the
    real Ollama-backed reranker; the construction is side-effect-free, so
    Graphiti initialisation does not depend on the Ollama daemon being
    reachable at startup.
    """
    if provider == "none":
        return _PassthroughReranker()
    if provider == "ollama":
        return OllamaReranker(
            model=Config.RERANKER_MODEL,
            base_url=Config.RERANKER_BASE_URL,
            api_key=Config.RERANKER_API_KEY,
        )
    raise ValueError(
        f"Unknown RERANKER_PROVIDER={provider!r}; "
        f"allowed: {_ALLOWED_RERANKER_PROVIDERS}"
    )
|
||||
|
||||
|
||||
def _build_llm_and_embedder(provider: str):
|
||||
|
|
@ -146,14 +174,19 @@ def _get_graphiti() -> Graphiti:
|
|||
if _graphiti_instance is None:
|
||||
provider = (Config.GRAPHITI_LLM_PROVIDER or "openai").lower()
|
||||
logger.info(f"Initializing Graphiti client (provider={provider})...")
|
||||
reranker_provider = (Config.RERANKER_PROVIDER or "ollama").lower()
|
||||
logger.info(
|
||||
f"Initializing Graphiti reranker (provider={reranker_provider})..."
|
||||
)
|
||||
llm_client, embedder = _build_llm_and_embedder(provider)
|
||||
cross_encoder = _build_reranker(reranker_provider)
|
||||
g = Graphiti(
|
||||
Config.NEO4J_URI,
|
||||
Config.NEO4J_USER,
|
||||
Config.NEO4J_PASSWORD,
|
||||
llm_client=llm_client,
|
||||
embedder=embedder,
|
||||
cross_encoder=_PassthroughReranker(),
|
||||
cross_encoder=cross_encoder,
|
||||
)
|
||||
# Use the persistent loop so the driver is bound to it from the start
|
||||
_run(g.build_indices_and_constraints())
|
||||
|
|
|
|||
|
|
@ -0,0 +1,170 @@
|
|||
"""Ollama-backed cross-encoder reranker for Graphiti search.
|
||||
|
||||
Replaces the no-op ``_PassthroughReranker`` injected into Graphiti by default
|
||||
with a real reranker that scores passages against a query through an Ollama
|
||||
chat model exposed over its OpenAI-compatible ``/v1`` surface.
|
||||
|
||||
The class implements only ``CrossEncoderClient.rank`` (the sole abstract
|
||||
member Graphiti requires) and is constructed by ``graphiti_adapter._get_graphiti``
|
||||
when ``Config.RERANKER_PROVIDER == "ollama"``. It does not perform any
|
||||
network I/O at construction time so the Flask app can boot even when the
|
||||
Ollama daemon is unreachable; failures are handled inside ``rank`` and never
|
||||
propagate, so graph search remains functional under degradation.
|
||||
"""
|
||||
|
||||
import asyncio
import json
import re
from typing import List, Tuple

from openai import AsyncOpenAI
from graphiti_core.cross_encoder.client import CrossEncoderClient

from ..utils.logger import get_logger

logger = get_logger('mirofish.ollama_reranker')


_THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>", re.IGNORECASE)
_CODE_FENCE_START = re.compile(r"^```(?:json)?\s*\n?", re.IGNORECASE)
_CODE_FENCE_END = re.compile(r"\n?```\s*$")
_FIRST_FLOAT = re.compile(r"-?\d+(?:\.\d+)?")

_SYSTEM_PROMPT = (
    "You are a relevance grader. Given a user query and a single passage, "
    "rate how relevant the passage is to the query on a continuous scale "
    "from 0.0 (not relevant at all) to 1.0 (perfectly relevant). "
    "Respond with a single JSON object of the form {\"score\": <float>} "
    "and nothing else."
)


def _clip_unit(value: float) -> float:
    """Clamp ``value`` into the closed interval [0.0, 1.0]."""
    if value < 0.0:
        return 0.0
    if value > 1.0:
        return 1.0
    return value


def _parse_score(raw: str) -> float:
    """Parse a model response into a relevance score in [0.0, 1.0].

    Strips reasoning ``<think>`` blocks and markdown fences (the same
    defensive pattern used in ``utils/llm_client.py``), then attempts
    ``json.loads`` and reads ``score``. Falls back to extracting the first
    floating-point number from the cleaned text. Raises ``ValueError`` when
    no numeric value can be recovered.
    """
    text = _THINK_BLOCK.sub("", raw or "").strip()
    text = _CODE_FENCE_START.sub("", text)
    text = _CODE_FENCE_END.sub("", text).strip()

    try:
        parsed = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        parsed = None

    if isinstance(parsed, dict) and "score" in parsed:
        try:
            return _clip_unit(float(parsed["score"]))
        except (TypeError, ValueError):
            pass

    match = _FIRST_FLOAT.search(text)
    if match is not None:
        try:
            return _clip_unit(float(match.group(0)))
        except ValueError:
            pass

    raise ValueError(f"no numeric score in model response: {text!r}")


class OllamaReranker(CrossEncoderClient):
    """Cross-encoder reranker that scores passages via an Ollama chat model.

    Subclass of :class:`graphiti_core.cross_encoder.client.CrossEncoderClient`
    that implements ``rank`` by issuing one chat-completion request per
    passage through ``openai.AsyncOpenAI`` (which speaks the OpenAI-compatible
    surface exposed by Ollama on ``/v1``).

    Construction is side-effect-free: building the underlying ``AsyncOpenAI``
    client does not perform any network I/O, so ``_get_graphiti`` can wire
    this class up at startup even when the Ollama daemon is unavailable.
    Failures surface only at ``rank`` call time and are degraded to a
    passthrough-style result with a single ``WARNING`` log per failed call.
    """

    def __init__(self, *, model: str, base_url: str, api_key: str) -> None:
        """Configure the reranker.

        Args:
            model: Name of the Ollama chat model used to score passages
                (for example ``qwen2.5:3b``). The operator is expected to
                have run ``ollama pull <model>`` before reranking is exercised.
            base_url: OpenAI-compatible endpoint for the Ollama server, for
                example ``http://localhost:11434/v1``.
            api_key: API key forwarded to the OpenAI client. Ollama ignores
                the value but the SDK requires a non-empty string.
        """
        self._model = model
        self._client = AsyncOpenAI(base_url=base_url, api_key=api_key)

    async def _score_passage(self, query: str, passage: str, index: int) -> float:
        """Score a single passage; deterministic low fallback on parse failure."""
        user_prompt = (
            f"Query:\n{query}\n\n"
            f"Passage:\n{passage}\n\n"
            "Reply with only the JSON object described in the system prompt."
        )
        response = await self._client.chat.completions.create(
            model=self._model,
            messages=[
                {"role": "system", "content": _SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.0,
            max_tokens=32,
        )
        raw = response.choices[0].message.content or ""
        try:
            return _parse_score(raw)
        except ValueError as exc:
            logger.debug(
                "Reranker parse failure (model=%s, passage_index=%d): %s",
                self._model, index, exc,
            )
            return -0.001 * (index + 1)

    async def rank(
        self,
        query: str,
        passages: List[str],
    ) -> List[Tuple[str, float]]:
        """Return ``(passage, score)`` tuples sorted by score descending.

        Empty ``passages`` returns ``[]`` without any model call. On a
        whole-call failure (connection refused, model 404, timeout, etc.)
        the method logs a single ``WARNING`` and returns the passages in
        their original order with synthetic descending scores so graph
        search keeps functioning. The method does not raise.
        """
        if not passages:
            return []

        try:
            scores = await asyncio.gather(
                *(self._score_passage(query, p, i) for i, p in enumerate(passages))
            )
        except Exception as exc:  # noqa: BLE001 — graceful degrade per design R5
            logger.warning(
                "Ollama reranker failed (model=%s, error=%s); falling back to passthrough order.",
                self._model, type(exc).__name__,
            )
            return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]

        scored = list(zip(passages, scores))
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored
|
||||