396 lines
24 KiB
Markdown
396 lines
24 KiB
Markdown
# Design — graphiti-ollama-reranker
|
||
|
||
## Overview
|
||
**Purpose**: Replace the no-op `_PassthroughReranker` injected into Graphiti with a real Ollama-backed `CrossEncoderClient`, so that hybrid search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (`RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY`) with Ollama-aligned defaults; an explicit `RERANKER_PROVIDER=none` preserves the passthrough for CI and slim containers.
|
||
|
||
**Users**: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.
|
||
|
||
**Impact**: Adds one new module under `backend/app/services/`, four `Config` attributes, a small selection branch in `_get_graphiti()`, and documentation in `.env.example`, `CLAUDE.md`, `README.md`. No data schema, no API, no UI changes. Behavior under `RERANKER_PROVIDER=none` is identical to today.
|
||
|
||
### Goals
|
||
- Default Ollama-backed reranker producing one `(passage, score)` tuple per input passage, sorted descending by score.
|
||
- Env-driven configuration with sensible Ollama defaults inherited from existing `EMBEDDING_*` settings.
|
||
- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
|
||
- Documentation parity with `EMBEDDING_*` knobs in `.env.example`, `CLAUDE.md`, and `README.md`.
|
||
|
||
### Non-Goals
|
||
- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
|
||
- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
|
||
- Upstream contributions to `graphiti-core`.
|
||
- Adding a `sentence-transformers` or other non-`openai` reranker dependency.
|
||
|
||
## Boundary Commitments
|
||
|
||
### This Spec Owns
|
||
- The Ollama reranker implementation and its prompt/parse logic.
|
||
- The `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY` settings and their defaults.
|
||
- The branch in `_get_graphiti()` that selects between the Ollama reranker and the passthrough.
|
||
- The startup INFO log line that announces the selected reranker.
|
||
- Documentation entries in `.env.example`, `CLAUDE.md` "Required Environment Variables", and `README.md` Ollama prerequisites.
|
||
|
||
### Out of Boundary
|
||
- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
|
||
- Per-passage retrieval (still owned by `_GraphNamespace.search` and Graphiti).
|
||
- The `group_id` scoping rules.
|
||
- Any change to the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) — they receive reranked output transparently.
|
||
- Implementation of additional reranker providers; this design covers only `ollama` and `none`.
|
||
|
||
### Allowed Dependencies
|
||
- Upstream library: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0).
|
||
- In-repo: `Config` (`backend/app/config.py`), `get_logger` (`backend/app/utils/logger.py`), `openai.AsyncOpenAI` (already installed).
|
||
- Existing factory: `_get_graphiti()` continues to be the singleton chokepoint.
|
||
|
||
### Revalidation Triggers
|
||
- If `graphiti-core` changes the `CrossEncoderClient.rank` signature, this design must be revisited.
|
||
- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in `research.md`).
|
||
- If `Config.GRAPHITI_LLM_PROVIDER` semantics change in a way that re-couples LLM and reranker, this design must be checked.
|
||
|
||
## Architecture
|
||
|
||
### Existing Architecture Analysis
|
||
- `_get_graphiti()` already injects an explicit `cross_encoder=_PassthroughReranker()` (line 156). The pattern of double-checked-locking singleton with provider switch (`GRAPHITI_LLM_PROVIDER`) is mature and must be preserved.
|
||
- The persistent event loop (`_get_loop`, `_run`) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does **not** need to schedule work onto `_get_loop()`.
|
||
- All four ReportAgent tools call `_GraphNamespace.search`, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.
|
||
|
||
### Architecture Pattern & Boundary Map
|
||
|
||
```mermaid
|
||
graph LR
|
||
subgraph Config
|
||
EnvVars[RERANKER_*\nenv vars]
|
||
ConfigCls[Config attributes]
|
||
EnvVars --> ConfigCls
|
||
end
|
||
|
||
subgraph Adapter
|
||
Factory[_get_graphiti]
|
||
Passthrough[_PassthroughReranker]
|
||
OllamaCls[OllamaReranker]
|
||
Factory -->|provider=none| Passthrough
|
||
Factory -->|provider=ollama| OllamaCls
|
||
end
|
||
|
||
subgraph Graphiti
|
||
GraphitiCore[Graphiti instance]
|
||
Search[_GraphNamespace.search]
|
||
Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
|
||
end
|
||
|
||
ConfigCls --> Factory
|
||
Passthrough -->|injected as cross_encoder| GraphitiCore
|
||
OllamaCls -->|injected as cross_encoder| GraphitiCore
|
||
GraphitiCore --> Search
|
||
Search --> Tools
|
||
|
||
OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
|
||
```
|
||
|
||
**Architecture Integration**:
|
||
- **Selected pattern**: Strategy pattern with two implementations selected at factory time. Same shape as the existing `GRAPHITI_LLM_PROVIDER` branch.
|
||
- **Domain/feature boundaries**: Reranker construction and prompt/parse live in `ollama_reranker.py`. Wiring lives in `graphiti_adapter.py`. Config lives in `config.py`. No overlap.
|
||
- **Existing patterns preserved**: Double-checked-locking singleton; explicit `cross_encoder` injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; `Config` reads via `os.environ.get(..., default)`.
|
||
- **New components rationale**: `OllamaReranker` is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
|
||
- **Steering compliance**: Single OpenAI-SDK convention preserved; per-project `group_id` scoping unaffected; no new dependency.
|
||
|
||
### Technology Stack
|
||
|
||
| Layer | Choice / Version | Role in Feature | Notes |
|
||
|-------|------------------|-----------------|-------|
|
||
| Backend / Services | Python ≥3.11, async via `asyncio` | Hosts the new reranker class. | Inherits project minimum. |
|
||
| LLM client | `openai` SDK (already pinned, v2.x) | `AsyncOpenAI` chat completions against Ollama's `/v1`. | No new dependency. |
|
||
| Model | Ollama-served chat model, default `qwen2.5:3b` | Produces a numeric relevance score per passage. | Operator may override via `RERANKER_MODEL`. |
|
||
| Endpoint | Ollama's OpenAI-compatible `/v1` | Default `http://localhost:11434/v1`. | Reuses `EMBEDDING_BASE_URL` semantics. |
|
||
| Graph layer | `graphiti-core ≥ 0.3` | Consumes the new `CrossEncoderClient`. | No upstream change. |
|
||
|
||
## File Structure Plan
|
||
|
||
### Directory Structure
|
||
```
|
||
backend/app/
|
||
├── services/
|
||
│ ├── graphiti_adapter.py # MODIFIED — factory branches on RERANKER_PROVIDER
|
||
│ └── ollama_reranker.py # NEW — OllamaReranker(CrossEncoderClient)
|
||
├── config.py # MODIFIED — adds RERANKER_* attrs
|
||
└── utils/
|
||
└── logger.py # unchanged
|
||
|
||
repo-root/
|
||
├── .env.example # MODIFIED — adds RERANKER_* block
|
||
├── CLAUDE.md # MODIFIED — Required Environment Variables
|
||
└── README.md # MODIFIED — Ollama prerequisites note
|
||
```
|
||
|
||
### Modified Files
|
||
- `backend/app/services/graphiti_adapter.py` — Add small branch in `_get_graphiti()` that picks `OllamaReranker()` or `_PassthroughReranker()` based on `Config.RERANKER_PROVIDER`. Log the selection at INFO. `_PassthroughReranker` class is unchanged.
|
||
- `backend/app/config.py` — Add four new class attributes with documented defaults. No change to existing `validate()` (reranker has no mandatory key).
|
||
- `.env.example` — Add a four-line `RERANKER_*` block with comments mirroring the `EMBEDDING_*` style.
|
||
- `CLAUDE.md` — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
|
||
- `README.md` — Update the Ollama prerequisite section to mention `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large`.
|
||
|
||
> `_PassthroughReranker` stays in `graphiti_adapter.py` (unchanged contract); only the wiring around it changes.
|
||
|
||
## System Flows
|
||
|
||
```mermaid
|
||
sequenceDiagram
|
||
participant Search as _GraphNamespace.search
|
||
participant Graphiti as graphiti-core
|
||
participant Reranker as OllamaReranker.rank
|
||
participant Ollama as Ollama /v1/chat/completions
|
||
|
||
Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
|
||
Graphiti->>Graphiti: hybrid retrieval (RRF)
|
||
Graphiti->>Reranker: rank(query, [p1..pN])
|
||
par per-passage scoring
|
||
Reranker->>Ollama: chat.completions(prompt p1, temp=0)
|
||
Reranker->>Ollama: chat.completions(prompt p2, temp=0)
|
||
Reranker->>Ollama: chat.completions(prompt pN, temp=0)
|
||
end
|
||
alt all scores parsed
|
||
Reranker-->>Graphiti: sorted [(p, score), ...]
|
||
else any failure
|
||
Reranker->>Reranker: log WARNING, return passthrough order
|
||
Reranker-->>Graphiti: original order with synthetic scores
|
||
end
|
||
Graphiti-->>Search: ranked edges/nodes
|
||
Search-->>Tools: ranked results
|
||
```
|
||
|
||
**Decision points after diagram**:
|
||
- `temperature=0.0` makes the score deterministic per (query, passage, model) tuple.
|
||
- Per-passage failures (one bad parse out of N) downrank that passage to `0.0 - 0.001 * index` and continue; only whole-call exceptions degrade to passthrough.
|
||
- The reranker never raises; this isolates Graphiti from upstream noise even when `_GraphNamespace.search`'s existing exception swallow is removed in a future refactor.
|
||
|
||
## Requirements Traceability
|
||
|
||
| Requirement | Summary | Components | Interfaces | Flows |
|
||
|-------------|---------|------------|------------|-------|
|
||
| 1.1 | Default reranker is Ollama-backed | `_get_graphiti()`, `OllamaReranker` | Inline factory branch | Adapter init |
|
||
| 1.2 | No dependency on `OpenAIRerankerClient` | `_get_graphiti()` | Explicit `cross_encoder=` injection (unchanged behavior) | — |
|
||
| 1.3 | Unset → defaults to `ollama` | `Config.RERANKER_PROVIDER` | `os.environ.get('RERANKER_PROVIDER', 'ollama')` | — |
|
||
| 1.4 | No `gpt-4.1-nano` reference | All new files | — | — |
|
||
| 2.1 | Subclass `CrossEncoderClient.rank` | `OllamaReranker` | `async rank(query, passages) -> list[tuple[str, float]]` | Per-passage scoring |
|
||
| 2.2 | Uses `openai.AsyncOpenAI` | `OllamaReranker.__init__` | `AsyncOpenAI(base_url, api_key)` | — |
|
||
| 2.3 | Returns passages sorted descending | `OllamaReranker.rank` | Postcondition: descending by score | — |
|
||
| 2.4 | Empty input → empty output, no model call | `OllamaReranker.rank` | Guard at method entry | — |
|
||
| 2.5 | Preserves passage strings byte-for-byte | `OllamaReranker.rank` | Strings are echoed, never rewritten | — |
|
||
| 2.6 | Unparseable score → deterministic low fallback | `OllamaReranker.rank` | Internal `_parse_score` helper | Failure branch |
|
||
| 3.1 | `RERANKER_PROVIDER` env knob | `Config` | Class attr, default `ollama`, validated `{ollama, none}` | Adapter init |
|
||
| 3.2 | `RERANKER_MODEL` env knob | `Config` | Class attr, default `qwen2.5:3b` | — |
|
||
| 3.3 | `RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL` | `Config` | Class attr resolves at read time | — |
|
||
| 3.4 | `RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY` | `Config` | Class attr | — |
|
||
| 3.5 | Unknown value → `ValueError` | `_get_graphiti()` | `_ALLOWED_RERANKER_PROVIDERS` validation | Adapter init |
|
||
| 3.6 | Reads via `os.environ.get` only | `Config` | — | — |
|
||
| 4.1 | `none` keeps `_PassthroughReranker` | `_get_graphiti()` | Factory branch | Adapter init |
|
||
| 4.2 | Graph search remains functional under `none` | `_PassthroughReranker.rank` (unchanged) | — | — |
|
||
| 4.3 | INFO log announces selected provider | `_get_graphiti()` | `logger.info` line | Adapter init |
|
||
| 5.1 | WARNING log on rerank failure | `OllamaReranker.rank` | `logger.warning` with model + error class | Failure branch |
|
||
| 5.2 | No exception propagation to HTTP callers | `OllamaReranker.rank` (never raises) | — | — |
|
||
| 5.3 | Original order on whole-call failure | `OllamaReranker.rank` | Passthrough fallback inside method | Failure branch |
|
||
| 5.4 | `__init__` never raises | `OllamaReranker.__init__` | `AsyncOpenAI()` lazy I/O | Adapter init |
|
||
| 6.1 | `.env.example` documents the four vars | `.env.example` | — | — |
|
||
| 6.2 | `CLAUDE.md` lists the four vars | `CLAUDE.md` | — | — |
|
||
| 6.3 | `README.md` mentions `ollama pull <model>` | `README.md` | — | — |
|
||
| 6.4 | Old "follow-up" claim updated | `graphiti-neo4j-finalize/research.md` (or design.md) | — | — |
|
||
| 7.1 | Reranked order reaches `_GraphNamespace.search` | `OllamaReranker`, `_get_graphiti()` | Through Graphiti's own `search()` | End-to-end |
|
||
| 7.2 | No changes to report tools | n/a | n/a | — |
|
||
| 7.3 | `group_id` scoping unchanged | `_GraphNamespace.search` (unchanged) | — | — |
|
||
|
||
## Components and Interfaces
|
||
|
||
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|
||
|-----------|--------------|--------|--------------|--------------------------|-----------|
|
||
| `OllamaReranker` | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 | `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1) | Service |
|
||
| `Config` (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.1–3.6, 4.1 | `os.environ.get` (P0) | State (configuration) |
|
||
| `_get_graphiti()` (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | `Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0) | Service |
|
||
| `.env.example`, `CLAUDE.md`, `README.md` | Docs | Communicate new knobs and Ollama prerequisite. | 6.1–6.4 | — | — |
|
||
|
||
---
|
||
|
||
### Backend / Services
|
||
|
||
#### `OllamaReranker`
|
||
|
||
| Field | Detail |
|
||
|-------|--------|
|
||
| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
|
||
| Requirements | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 |
|
||
|
||
**Responsibilities & Constraints**
|
||
- Subclass `graphiti_core.cross_encoder.client.CrossEncoderClient`; implement only `rank`.
|
||
- Use `openai.AsyncOpenAI`; no second SDK; no top-level network I/O in `__init__`.
|
||
- Preserve passage strings byte-for-byte; never rewrite or truncate.
|
||
- Never raise from `rank()`. On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
|
||
- Deterministic scoring: `temperature=0.0`, no randomness in fallback scores.
|
||
- Thread-safety: stateless beyond the immutable `AsyncOpenAI` client and string config; safe under Graphiti's concurrent search.
|
||
|
||
**Dependencies**
|
||
- Inbound: `_get_graphiti()` — instantiates a single instance and passes it as `cross_encoder=` to `Graphiti(...)` (P0).
|
||
- Outbound: `Ollama /v1/chat/completions` via `openai.AsyncOpenAI` (P0).
|
||
- External: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai` SDK (P0).
|
||
|
||
**Contracts**: Service [x]
|
||
|
||
##### Service Interface
|
||
|
||
```python
|
||
class OllamaReranker(CrossEncoderClient):
|
||
def __init__(
|
||
self,
|
||
*,
|
||
model: str,
|
||
base_url: str,
|
||
api_key: str,
|
||
) -> None: ...
|
||
|
||
async def rank(
|
||
self,
|
||
query: str,
|
||
passages: list[str],
|
||
) -> list[tuple[str, float]]:
|
||
"""
|
||
Score each passage's relevance to `query` and return
|
||
`(passage, score)` tuples sorted in descending order of score.
|
||
|
||
Preconditions:
|
||
- `passages` is a (possibly empty) list of strings.
|
||
|
||
Postconditions:
|
||
- len(return) == len(passages).
|
||
- return is sorted by score descending.
|
||
- For all i, return[i][0] is byte-identical to one of the inputs.
|
||
- For any rank() call, this method does not raise.
|
||
|
||
Invariants:
|
||
- Successfully-parsed scores fall in [0.0, 1.0].
|
||
- Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
|
||
and are strictly less than every successfully-parsed score.
|
||
"""
|
||
```
|
||
|
||
**Implementation Notes**
|
||
- **Integration**: Constructed inside `_get_graphiti()` when `Config.RERANKER_PROVIDER == "ollama"`; injected into `Graphiti(..., cross_encoder=...)`.
|
||
- **Validation**:
|
||
- Reject empty `passages` immediately with `return []`.
|
||
- Clip parsed `score` to `[0.0, 1.0]`.
|
||
- Treat any uncaught per-passage exception as parse failure and assign deterministic fallback `-0.001 * passage_index`.
|
||
- Treat any whole-call exception (e.g. connection refused) as graceful degrade: return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]`.
|
||
- **Risks**: Default `qwen2.5:3b` must be `ollama pull`-ed by operators; documented in README. If absent, R5 path kicks in.
|
||
|
||
---
|
||
|
||
### Backend / Config
|
||
|
||
#### `Config` (extended)
|
||
|
||
| Field | Detail |
|
||
|-------|--------|
|
||
| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
|
||
| Requirements | 1.3, 3.1–3.6, 4.1 |
|
||
|
||
**Responsibilities & Constraints**
|
||
- Read from `os.environ.get` only; no new dependency.
|
||
- `RERANKER_PROVIDER` default `ollama`; valid values: `ollama`, `none`.
|
||
- `RERANKER_MODEL` default `qwen2.5:3b`.
|
||
- `RERANKER_BASE_URL` default = `EMBEDDING_BASE_URL` value at module load time.
|
||
- `RERANKER_API_KEY` default = `EMBEDDING_API_KEY` value at module load time.
|
||
- Validation of `RERANKER_PROVIDER` happens in `_get_graphiti()` (not `Config.validate()`) to keep the validate-at-boot list focused on credential presence.
|
||
|
||
**Contracts**: State [x]
|
||
|
||
##### State Management
|
||
- **State model**: Read-only class attributes resolved once at import.
|
||
- **Persistence & consistency**: None; values come from environment.
|
||
- **Concurrency strategy**: Immutable after import; safe.
|
||
|
||
**Implementation Notes**
|
||
- **Integration**: Defaults for `RERANKER_BASE_URL` / `RERANKER_API_KEY` should reference the corresponding `EMBEDDING_*` env vars (not the resolved `Config.EMBEDDING_BASE_URL` constant) so an operator setting only `EMBEDDING_BASE_URL` still gets the reranker pointed at the same Ollama host without needing to set `RERANKER_BASE_URL` explicitly. Implementation reads `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`.
|
||
- **Validation**: None at config-load time. Provider value is validated by `_get_graphiti()`.
|
||
- **Risks**: An operator who overrides `EMBEDDING_BASE_URL` but not `RERANKER_BASE_URL` will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
|
||
|
||
---
|
||
|
||
### Backend / Adapter
|
||
|
||
#### `_get_graphiti()` (extended)
|
||
|
||
| Field | Detail |
|
||
|-------|--------|
|
||
| Intent | Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice. |
|
||
| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |
|
||
|
||
**Responsibilities & Constraints**
|
||
- Preserve double-checked locking and singleton semantics exactly.
|
||
- Read `Config.RERANKER_PROVIDER` once at construction; do not re-read.
|
||
- For `ollama`: construct `OllamaReranker(model=..., base_url=..., api_key=...)`.
|
||
- For `none`: construct `_PassthroughReranker()` (current behavior preserved).
|
||
- For any other value: raise `ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')")` — mirrors the existing `_ALLOWED_GRAPHITI_PROVIDERS` validation pattern.
|
||
- Log at INFO once: `f"Initializing Graphiti reranker (provider={provider})..."`.
|
||
|
||
**Contracts**: Service [x]
|
||
|
||
##### Service Interface
|
||
|
||
```python
|
||
def _get_graphiti() -> Graphiti:
|
||
"""Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
|
||
```
|
||
|
||
**Implementation Notes**
|
||
- **Integration**: Replaces the unconditional `cross_encoder=_PassthroughReranker()` at `graphiti_adapter.py:156` with a `cross_encoder=_build_reranker(provider)` call. The factory helper lives next to `_build_llm_and_embedder` in the same file.
|
||
- **Validation**: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
|
||
- **Risks**: A typo such as `RERANKER_PROVIDER=Ollama` (capitalized) would raise; the helper lowercases the value before comparison, matching `_get_graphiti`'s existing `(... or "openai").lower()` pattern.
|
||
|
||
---
|
||
|
||
### Documentation
|
||
|
||
| File | Change | Requirements |
|
||
|------|--------|--------------|
|
||
| `.env.example` | Add commented block with the four `RERANKER_*` vars and their defaults. Position adjacent to the existing `EMBEDDING_*` block. | 6.1 |
|
||
| `CLAUDE.md` | Extend the "Required Environment Variables" code fence under "Architecture" → "Required Environment Variables" with the four new vars and a one-line note about `RERANKER_PROVIDER=none`. | 6.2 |
|
||
| `README.md` | In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments. | 6.3 |
|
||
| `.kiro/specs/graphiti-neo4j-finalize/research.md` | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |
|
||
|
||
> README also has `README-EN.md` and `README-ZH.md` — the canonical user-facing README is `README.md` per the existing structure. Other localized READMEs are out of scope unless a quick parity edit fits without translation work; if a Chinese translation already exists for the embedder section, the Chinese README receives the same one-line addition.
|
||
|
||
## Data Models
|
||
Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the `list[tuple[str, float]]` already defined by `CrossEncoderClient.rank`.
|
||
|
||
## Error Handling
|
||
|
||
### Error Strategy
|
||
- **Construction errors**: None possible (no network in `__init__`; no required keys to validate).
|
||
- **Per-passage errors**: Caught inside `OllamaReranker.rank`. Logged at DEBUG once per failed passage (suppress spam). Passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
|
||
- **Whole-call errors** (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost `try/except` in `rank`. Logged at WARNING with model name and error class. Returns `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` — same shape as `_PassthroughReranker` so consumers cannot tell the difference structurally.
|
||
- **Configuration errors**: `_get_graphiti()` raises `ValueError` at startup if `RERANKER_PROVIDER` is unknown. The Flask app fails to boot — preferred over silent misconfiguration.
|
||
|
||
### Error Categories and Responses
|
||
| Category | Trigger | Response |
|
||
|----------|---------|----------|
|
||
| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
|
||
| User input (4xx-equivalent) | Unknown `RERANKER_PROVIDER` value | `ValueError` at startup; clear message naming allowed values. |
|
||
| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |
|
||
|
||
### Monitoring
|
||
- INFO log at startup states the selected provider.
|
||
- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
|
||
- No metrics surface yet; can be added if the reranker becomes a hot path.
|
||
|
||
## Testing Strategy
|
||
|
||
This project intentionally keeps the test surface minimal (`backend/scripts/test_profile_format.py` is the lone pytest target). Per `steering/tech.md`, do **not** add a heavy test harness.
|
||
|
||
- **Unit-level verification** (manual, by the implementer, no committed test files unless small and clearly worth keeping):
|
||
1. Constructing `OllamaReranker` with a bad host does not raise; first `rank()` call logs WARNING and returns passthrough output.
|
||
2. `rank(query, [])` returns `[]` and does not call the client.
|
||
3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
|
||
4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
|
||
- **Integration smoke** (manual): With `qwen2.5:3b` pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. `RERANKER_PROVIDER=none`.
|
||
- **Boundary verification**: Grep that `gpt-4.1-nano` and `OpenAIRerankerClient` do not appear in any new code path.
|
||
|
||
## Supporting References
|
||
- `research.md` — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
|
||
- `gap-analysis.md` — Requirement-to-asset map.
|
||
- `.ticket/39.md` — Source ticket text.
|