Design — graphiti-ollama-reranker
Overview
Purpose: Replace the no-op _PassthroughReranker injected into Graphiti with a real Ollama-backed CrossEncoderClient, so that hybrid search results consumed by the ReportAgent tools (SearchResult, InsightForge, Panorama, Interview) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (RERANKER_PROVIDER, RERANKER_MODEL, RERANKER_BASE_URL, RERANKER_API_KEY) with Ollama-aligned defaults; an explicit RERANKER_PROVIDER=none preserves the passthrough for CI and slim containers.
Users: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.
Impact: Adds one new module under backend/app/services/, four Config attributes, a small selection branch in _get_graphiti(), and documentation in .env.example, CLAUDE.md, README.md. No data schema, no API, no UI changes. Behavior under RERANKER_PROVIDER=none is identical to today.
Goals
- Default Ollama-backed reranker producing one (passage, score) tuple per input passage, sorted descending by score.
- Env-driven configuration with sensible Ollama defaults inherited from existing EMBEDDING_* settings.
- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
- Documentation parity with EMBEDDING_* knobs in .env.example, CLAUDE.md, and README.md.
Non-Goals
- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
- Changing LLM_MODEL_NAME or EMBEDDING_MODEL defaults.
- Upstream contributions to graphiti-core.
- Adding a sentence-transformers or other non-openai reranker dependency.
Boundary Commitments
This Spec Owns
- The Ollama reranker implementation and its prompt/parse logic.
- The RERANKER_PROVIDER, RERANKER_MODEL, RERANKER_BASE_URL, RERANKER_API_KEY settings and their defaults.
- The branch in _get_graphiti() that selects between the Ollama reranker and the passthrough.
- The startup INFO log line that announces the selected reranker.
- Documentation entries in .env.example, CLAUDE.md "Required Environment Variables", and README.md Ollama prerequisites.
Out of Boundary
- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
- Per-passage retrieval (still owned by _GraphNamespace.search and Graphiti).
- The group_id scoping rules.
- Any change to the four ReportAgent tools (SearchResult, InsightForge, Panorama, Interview); they receive reranked output transparently.
- Implementation of additional reranker providers; this design covers only ollama and none.
Allowed Dependencies
- Upstream library: graphiti_core.cross_encoder.client.CrossEncoderClient (P0).
- In-repo: Config (backend/app/config.py), get_logger (backend/app/utils/logger.py), openai.AsyncOpenAI (already installed).
- Existing factory: _get_graphiti() continues to be the singleton chokepoint.
Revalidation Triggers
- If graphiti-core changes the CrossEncoderClient.rank signature, this design must be revisited.
- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in research.md).
- If Config.GRAPHITI_LLM_PROVIDER semantics change in a way that re-couples LLM and reranker, this design must be checked.
Architecture
Existing Architecture Analysis
- _get_graphiti() already injects an explicit cross_encoder=_PassthroughReranker() (line 156). The pattern of double-checked-locking singleton with provider switch (GRAPHITI_LLM_PROVIDER) is mature and must be preserved.
- The persistent event loop (_get_loop, _run) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does not need to schedule work onto _get_loop().
- All four ReportAgent tools call _GraphNamespace.search, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.
Architecture Pattern & Boundary Map
graph LR
subgraph Config
EnvVars[RERANKER_*\nenv vars]
ConfigCls[Config attributes]
EnvVars --> ConfigCls
end
subgraph Adapter
Factory[_get_graphiti]
Passthrough[_PassthroughReranker]
OllamaCls[OllamaReranker]
Factory -->|provider=none| Passthrough
Factory -->|provider=ollama| OllamaCls
end
subgraph Graphiti
GraphitiCore[Graphiti instance]
Search[_GraphNamespace.search]
Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
end
ConfigCls --> Factory
Passthrough -->|injected as cross_encoder| GraphitiCore
OllamaCls -->|injected as cross_encoder| GraphitiCore
GraphitiCore --> Search
Search --> Tools
OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
Architecture Integration:
- Selected pattern: Strategy pattern with two implementations selected at factory time. Same shape as the existing GRAPHITI_LLM_PROVIDER branch.
- Domain/feature boundaries: Reranker construction and prompt/parse live in ollama_reranker.py. Wiring lives in graphiti_adapter.py. Config lives in config.py. No overlap.
- Existing patterns preserved: Double-checked-locking singleton; explicit cross_encoder injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; Config reads via os.environ.get(..., default).
- New components rationale: OllamaReranker is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
- Steering compliance: Single OpenAI-SDK convention preserved; per-project group_id scoping unaffected; no new dependency.
Technology Stack
| Layer | Choice / Version | Role in Feature | Notes |
|---|---|---|---|
| Backend / Services | Python ≥3.11, async via asyncio | Hosts the new reranker class. | Inherits project minimum. |
| LLM client | openai SDK (already pinned, v2.x) | AsyncOpenAI chat completions against Ollama's /v1. | No new dependency. |
| Model | Ollama-served chat model, default qwen2.5:3b | Produces a numeric relevance score per passage. | Operator may override via RERANKER_MODEL. |
| Endpoint | Ollama's OpenAI-compatible /v1 | Default http://localhost:11434/v1. | Reuses EMBEDDING_BASE_URL semantics. |
| Graph layer | graphiti-core ≥ 0.3 | Consumes the new CrossEncoderClient. | No upstream change. |
File Structure Plan
Directory Structure
backend/app/
├── services/
│ ├── graphiti_adapter.py # MODIFIED — factory branches on RERANKER_PROVIDER
│ └── ollama_reranker.py # NEW — OllamaReranker(CrossEncoderClient)
├── config.py # MODIFIED — adds RERANKER_* attrs
└── utils/
└── logger.py # unchanged
repo-root/
├── .env.example # MODIFIED — adds RERANKER_* block
├── CLAUDE.md # MODIFIED — Required Environment Variables
└── README.md # MODIFIED — Ollama prerequisites note
Modified Files
- backend/app/services/graphiti_adapter.py: Add a small branch in _get_graphiti() that picks OllamaReranker() or _PassthroughReranker() based on Config.RERANKER_PROVIDER. Log the selection at INFO. The _PassthroughReranker class is unchanged.
- backend/app/config.py: Add four new class attributes with documented defaults. No change to existing validate() (reranker has no mandatory key).
- .env.example: Add a four-line RERANKER_* block with comments mirroring the EMBEDDING_* style.
- CLAUDE.md: Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
- README.md: Update the Ollama prerequisite section to mention ollama pull qwen2.5:3b alongside the existing ollama pull mxbai-embed-large.
_PassthroughReranker stays in graphiti_adapter.py (unchanged contract); only the wiring around it changes.
System Flows
sequenceDiagram
participant Search as _GraphNamespace.search
participant Graphiti as graphiti-core
participant Reranker as OllamaReranker.rank
participant Ollama as Ollama /v1/chat/completions
Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
Graphiti->>Graphiti: hybrid retrieval (RRF)
Graphiti->>Reranker: rank(query, [p1..pN])
par per-passage scoring
Reranker->>Ollama: chat.completions(prompt p1, temp=0)
Reranker->>Ollama: chat.completions(prompt p2, temp=0)
Reranker->>Ollama: chat.completions(prompt pN, temp=0)
end
alt scoring completed (unparseable passages downranked)
Reranker-->>Graphiti: sorted [(p, score), ...]
else whole-call failure
Reranker->>Reranker: log WARNING, return passthrough order
Reranker-->>Graphiti: original order with synthetic scores
end
Graphiti-->>Search: ranked edges/nodes
Search-->>Tools: ranked results
Decision points after diagram:
- temperature=0.0 makes the score deterministic per (query, passage, model) tuple.
- Per-passage failures (one bad parse out of N) downrank that passage to -0.001 * (index + 1) and continue; only whole-call exceptions degrade to passthrough. The + 1 keeps the fallback for index 0 strictly below 0.0, as the rank() invariants require.
- The reranker never raises; this isolates Graphiti from upstream noise even if _GraphNamespace.search's existing exception swallow is removed in a future refactor.
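The per-passage decision point can be illustrated with a small pure-Python sketch. The helper name order_by_score and the convention of passing None for an unparseable reply are hypothetical, introduced only for this example. The fallback uses index + 1 (an assumption of this sketch) so that even the first passage falls strictly below 0.0, in line with the [-1.0, 0.0) fallback range stated in the rank() invariants.

```python
from typing import Optional


def order_by_score(
    passages: list[str],
    parsed: list[Optional[float]],
) -> list[tuple[str, float]]:
    """Pair each passage with its score and sort descending.

    parsed[i] is the model score for passages[i], or None when the reply
    could not be parsed (hypothetical convention for this sketch).
    """
    scored: list[tuple[str, float]] = []
    for index, (passage, score) in enumerate(zip(passages, parsed)):
        if score is None:
            # Assumption: offset by index + 1 so the fallback is always < 0.0.
            scored.append((passage, -0.001 * (index + 1)))
        else:
            # Clip successfully parsed scores into [0.0, 1.0].
            scored.append((passage, min(1.0, max(0.0, score))))
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because every fallback is negative and every parsed score is clipped to be non-negative, a failed passage always lands after the successfully scored ones while still appearing exactly once in the output.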
Requirements Traceability
| Requirement | Summary | Components | Interfaces | Flows |
|---|---|---|---|---|
| 1.1 | Default reranker is Ollama-backed | _get_graphiti(), OllamaReranker | Inline factory branch | Adapter init |
| 1.2 | No dependency on OpenAIRerankerClient | _get_graphiti() | Explicit cross_encoder= injection (unchanged behavior) | — |
| 1.3 | Unset → defaults to ollama | Config.RERANKER_PROVIDER | os.environ.get('RERANKER_PROVIDER', 'ollama') | — |
| 1.4 | No gpt-4.1-nano reference | All new files | — | — |
| 2.1 | Subclass CrossEncoderClient.rank | OllamaReranker | async rank(query, passages) -> list[tuple[str, float]] | Per-passage scoring |
| 2.2 | Uses openai.AsyncOpenAI | OllamaReranker.__init__ | AsyncOpenAI(base_url, api_key) | — |
| 2.3 | Returns passages sorted descending | OllamaReranker.rank | Postcondition: descending by score | — |
| 2.4 | Empty input → empty output, no model call | OllamaReranker.rank | Guard at method entry | — |
| 2.5 | Preserves passage strings byte-for-byte | OllamaReranker.rank | Strings are echoed, never rewritten | — |
| 2.6 | Unparseable score → deterministic low fallback | OllamaReranker.rank | Internal _parse_score helper | Failure branch |
| 3.1 | RERANKER_PROVIDER env knob | Config | Class attr, default ollama, validated {ollama, none} | Adapter init |
| 3.2 | RERANKER_MODEL env knob | Config | Class attr, default qwen2.5:3b | — |
| 3.3 | RERANKER_BASE_URL defaults to EMBEDDING_BASE_URL | Config | Class attr resolves at read time | — |
| 3.4 | RERANKER_API_KEY defaults to EMBEDDING_API_KEY | Config | Class attr | — |
| 3.5 | Unknown value → ValueError | _get_graphiti() | _ALLOWED_RERANKER_PROVIDERS validation | Adapter init |
| 3.6 | Reads via os.environ.get only | Config | — | — |
| 4.1 | none keeps _PassthroughReranker | _get_graphiti() | Factory branch | Adapter init |
| 4.2 | Graph search remains functional under none | _PassthroughReranker.rank (unchanged) | — | — |
| 4.3 | INFO log announces selected provider | _get_graphiti() | logger.info line | Adapter init |
| 5.1 | WARNING log on rerank failure | OllamaReranker.rank | logger.warning with model + error class | Failure branch |
| 5.2 | No exception propagation to HTTP callers | OllamaReranker.rank (never raises) | — | — |
| 5.3 | Original order on whole-call failure | OllamaReranker.rank | Passthrough fallback inside method | Failure branch |
| 5.4 | __init__ never raises | OllamaReranker.__init__ | AsyncOpenAI() lazy I/O | Adapter init |
| 6.1 | .env.example documents the four vars | .env.example | — | — |
| 6.2 | CLAUDE.md lists the four vars | CLAUDE.md | — | — |
| 6.3 | README.md mentions ollama pull <model> | README.md | — | — |
| 6.4 | Old "follow-up" claim updated | graphiti-neo4j-finalize/research.md (or design.md) | — | — |
| 7.1 | Reranked order reaches _GraphNamespace.search | OllamaReranker, _get_graphiti() | Through Graphiti's own search() | End-to-end |
| 7.2 | No changes to report tools | n/a | n/a | — |
| 7.3 | group_id scoping unchanged | _GraphNamespace.search (unchanged) | — | — |
Components and Interfaces
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|---|---|---|---|---|---|
| OllamaReranker | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 | graphiti_core.cross_encoder.client.CrossEncoderClient (P0); openai.AsyncOpenAI (P0); Config (P0); get_logger (P1) | Service |
| Config (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.1–3.6, 4.1 | os.environ.get (P0) | State (configuration) |
| _get_graphiti() (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | Config (P0); OllamaReranker (P0); _PassthroughReranker (P0); Graphiti (P0) | Service |
| .env.example, CLAUDE.md, README.md | Docs | Communicate new knobs and Ollama prerequisite. | 6.1–6.4 | — | — |
Backend / Services
OllamaReranker
| Field | Detail |
|---|---|
| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
| Requirements | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 |
Responsibilities & Constraints
- Subclass graphiti_core.cross_encoder.client.CrossEncoderClient; implement only rank.
- Use openai.AsyncOpenAI; no second SDK; no network I/O in __init__.
- Preserve passage strings byte-for-byte; never rewrite or truncate.
- Never raise from rank(). On a whole-call failure, log once at WARNING and fall back to passthrough order with deterministic synthetic scores; on a per-passage failure, log at DEBUG and assign that passage a deterministic fallback score.
- Deterministic scoring: temperature=0.0, no randomness in fallback scores.
- Thread-safety: stateless beyond the immutable AsyncOpenAI client and string config; safe under Graphiti's concurrent search.
Dependencies
- Inbound: _get_graphiti(), which instantiates a single instance and passes it as cross_encoder= to Graphiti(...) (P0).
- Outbound: Ollama /v1/chat/completions via openai.AsyncOpenAI (P0).
- External: graphiti_core.cross_encoder.client.CrossEncoderClient (P0); openai SDK (P0).
Contracts: Service [x]
Service Interface
from graphiti_core.cross_encoder.client import CrossEncoderClient


class OllamaReranker(CrossEncoderClient):
    def __init__(
        self,
        *,
        model: str,
        base_url: str,
        api_key: str,
    ) -> None: ...

    async def rank(
        self,
        query: str,
        passages: list[str],
    ) -> list[tuple[str, float]]:
        """
        Score each passage's relevance to `query` and return
        `(passage, score)` tuples sorted in descending order of score.

        Preconditions:
        - `passages` is a (possibly empty) list of strings.

        Postconditions:
        - len(return) == len(passages).
        - return is sorted by score descending.
        - For all i, return[i][0] is byte-identical to one of the inputs.
        - For any rank() call, this method does not raise.

        Invariants:
        - Successfully-parsed scores fall in [0.0, 1.0].
        - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
          and are strictly less than every successfully-parsed score.
        """
Implementation Notes
- Integration: Constructed inside _get_graphiti() when Config.RERANKER_PROVIDER == "ollama"; injected into Graphiti(..., cross_encoder=...).
- Validation:
  - Reject empty passages immediately with return [].
  - Clip parsed score to [0.0, 1.0].
  - Treat any uncaught per-passage exception as a parse failure and assign the deterministic fallback -0.001 * (passage_index + 1), which stays strictly below 0.0 even for the first passage.
  - Treat any whole-call exception (e.g. connection refused) as graceful degradation: return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)].
- Risks: The default qwen2.5:3b must be pulled by operators with ollama pull; this is documented in the README. If the model is absent, the graceful-degradation path (requirements 5.1–5.4) applies.
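The parse-and-clip step in the validation notes could look roughly like the sketch below. The prompt and reply format are defined in research.md and not reproduced here, so this example assumes the model reply is free text containing a decimal number; the function name parse_score and the regular expression are hypothetical, not the project's _parse_score helper.

```python
import math
import re
from typing import Optional

# Matches the first integer or decimal number in the reply, e.g. "0.8" or "1".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(reply: Optional[str]) -> Optional[float]:
    """Return a score clipped to [0.0, 1.0], or None if no number is found."""
    if not reply:
        return None
    match = _NUMBER.search(reply)
    if match is None:
        return None
    value = float(match.group())
    if not math.isfinite(value):
        # Extremely long digit strings can overflow to infinity.
        return None
    return min(1.0, max(0.0, value))
```

Returning None rather than raising lets the caller decide which fallback score to assign, which keeps the parse step free of ranking policy.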
Backend / Config
Config (extended)
| Field | Detail |
|---|---|
| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
| Requirements | 1.3, 3.1–3.6, 4.1 |
Responsibilities & Constraints
- Read from os.environ.get only; no new dependency.
- RERANKER_PROVIDER default ollama; valid values: ollama, none.
- RERANKER_MODEL default qwen2.5:3b.
- RERANKER_BASE_URL default = EMBEDDING_BASE_URL value at module load time.
- RERANKER_API_KEY default = EMBEDDING_API_KEY value at module load time.
- Validation of RERANKER_PROVIDER happens in _get_graphiti() (not Config.validate()) to keep the validate-at-boot list focused on credential presence.
Contracts: State [x]
State Management
- State model: Read-only class attributes resolved once at import.
- Persistence & consistency: None; values come from environment.
- Concurrency strategy: Immutable after import; safe.
Implementation Notes
- Integration: Defaults for RERANKER_BASE_URL / RERANKER_API_KEY should reference the corresponding EMBEDDING_* env vars (not the resolved Config.EMBEDDING_BASE_URL constant) so an operator setting only EMBEDDING_BASE_URL still gets the reranker pointed at the same Ollama host without needing to set RERANKER_BASE_URL explicitly. Implementation reads os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1')).
- Validation: None at config-load time. Provider value is validated by _get_graphiti().
- Risks: An operator who overrides EMBEDDING_BASE_URL but not RERANKER_BASE_URL will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
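The nested-default lookup described above can be sketched as a standalone function. The function name resolve_reranker_settings and the env parameter are hypothetical and exist only to make the example easy to exercise; the real design uses Config class attributes. The placeholder EMBEDDING_API_KEY default is also an assumption, since this document does not state it.

```python
import os
from typing import Mapping

DEFAULT_OLLAMA_URL = "http://localhost:11434/v1"
# Assumption: the source does not state the EMBEDDING_API_KEY default,
# so this placeholder value is purely illustrative.
DEFAULT_API_KEY = "ollama"


def resolve_reranker_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve the four RERANKER_* values, falling back to EMBEDDING_* vars."""
    return {
        "RERANKER_PROVIDER": env.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Nested lookup: explicit reranker value, then the embedding value,
        # then the local Ollama default.
        "RERANKER_BASE_URL": env.get(
            "RERANKER_BASE_URL", env.get("EMBEDDING_BASE_URL", DEFAULT_OLLAMA_URL)
        ),
        "RERANKER_API_KEY": env.get(
            "RERANKER_API_KEY", env.get("EMBEDDING_API_KEY", DEFAULT_API_KEY)
        ),
    }
```

Passing a plain dict, for example resolve_reranker_settings({"EMBEDDING_BASE_URL": "http://gpu-box:11434/v1"}), shows the retargeting behavior named in the Risks item: the reranker URL follows the embedding URL unless RERANKER_BASE_URL is set explicitly.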
Backend / Adapter
_get_graphiti() (extended)
| Field | Detail |
|---|---|
| Intent | Select and inject the appropriate CrossEncoderClient based on Config.RERANKER_PROVIDER; log the choice. |
| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |
Responsibilities & Constraints
- Preserve double-checked locking and singleton semantics exactly.
- Read Config.RERANKER_PROVIDER once at construction; do not re-read.
- For ollama: construct OllamaReranker(model=..., base_url=..., api_key=...).
- For none: construct _PassthroughReranker() (current behavior preserved).
- For any other value: raise ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')"), mirroring the existing _ALLOWED_GRAPHITI_PROVIDERS validation pattern.
- Log at INFO once: f"Initializing Graphiti reranker (provider={provider})...".
Contracts: Service [x]
Service Interface
def _get_graphiti() -> Graphiti:
    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
Implementation Notes
- Integration: Replaces the unconditional cross_encoder=_PassthroughReranker() at graphiti_adapter.py:156 with a cross_encoder=_build_reranker(provider) call. The factory helper lives next to _build_llm_and_embedder in the same file.
- Validation: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and visibly.
- Risks: A capitalization variant such as RERANKER_PROVIDER=Ollama would be rejected by a case-sensitive comparison; to avoid that, the helper lowercases the value before comparison, matching _get_graphiti's existing (... or "openai").lower() pattern.
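A minimal sketch of the selection branch follows. The two stub classes stand in for OllamaReranker and _PassthroughReranker so the example runs without graphiti-core; the function name build_reranker and its keyword parameters are hypothetical, and the real helper may differ.

```python
import logging
from typing import Optional

logger = logging.getLogger("graphiti_adapter_sketch")

_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


class PassthroughStub:
    """Stand-in for _PassthroughReranker (the real class lives in graphiti_adapter.py)."""


class OllamaStub:
    """Stand-in for OllamaReranker; stores settings and performs no I/O."""

    def __init__(self, *, model: str, base_url: str, api_key: str) -> None:
        self.model = model
        self.base_url = base_url
        self.api_key = api_key


def build_reranker(provider: Optional[str], *, model: str, base_url: str, api_key: str):
    """Validate the provider, log the choice, and construct the reranker."""
    # Lowercase first so "Ollama" and "OLLAMA" are accepted.
    normalized = (provider or "ollama").lower()
    if normalized not in _ALLOWED_RERANKER_PROVIDERS:
        # Raise before any Graphiti construction so misconfiguration fails fast.
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; allowed: {_ALLOWED_RERANKER_PROVIDERS}"
        )
    logger.info("Initializing Graphiti reranker (provider=%s)...", normalized)
    if normalized == "none":
        return PassthroughStub()
    return OllamaStub(model=model, base_url=base_url, api_key=api_key)
```

Keeping validation ahead of construction means an unknown provider never produces a half-initialized singleton.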
Documentation
| File | Change | Requirements |
|---|---|---|
| .env.example | Add commented block with the four RERANKER_* vars and their defaults. Position adjacent to the existing EMBEDDING_* block. | 6.1 |
| CLAUDE.md | Extend the "Required Environment Variables" code fence under "Architecture" with the four new vars and a one-line note about RERANKER_PROVIDER=none. | 6.2 |
| README.md | In the "Install Ollama and pull the default embedding model" section, add an ollama pull qwen2.5:3b step (or reference the model variable). In the .env snippet, add the four RERANKER_* lines with brief comments. | 6.3 |
| .kiro/specs/graphiti-neo4j-finalize/research.md | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |
The repository also has README-EN.md and README-ZH.md; the canonical user-facing README is README.md per the existing structure. Other localized READMEs are out of scope unless a quick parity edit fits without translation work; if a Chinese translation already exists for the embedder section, the Chinese README receives the same one-line addition.
Data Models
Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the list[tuple[str, float]] already defined by CrossEncoderClient.rank.
Error Handling
Error Strategy
- Construction errors: None possible (no network in __init__; no required keys to validate).
- Per-passage errors: Caught inside OllamaReranker.rank. Logged at DEBUG once per failed passage (DEBUG rather than WARNING, to avoid log spam). The passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
- Whole-call errors (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost try/except in rank. Logged at WARNING with model name and error class. Returns [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)], the same shape as _PassthroughReranker, so consumers cannot tell the difference structurally.
- Configuration errors: _get_graphiti() raises ValueError at startup if RERANKER_PROVIDER is unknown. The Flask app fails to boot, which is preferred over silent misconfiguration.
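The whole-call branch of this strategy can be sketched with an injected scoring coroutine in place of the real Ollama client, which keeps the example runnable offline. The names rank_safely and score_one are hypothetical. The sketch assumes per-passage parse failures are handled inside score_one, so any exception that escapes it is treated as a whole-call failure.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("ollama_reranker_sketch")

# Hypothetical scorer type: (query, passage) -> relevance score.
ScoreFn = Callable[[str, str], Awaitable[float]]


async def rank_safely(
    query: str,
    passages: list[str],
    score_one: ScoreFn,
    model: str = "qwen2.5:3b",
) -> list[tuple[str, float]]:
    """Score passages concurrently; never raise, degrade to original order."""
    if not passages:
        # Empty input: return early without calling the scorer.
        return []
    try:
        scores = await asyncio.gather(*(score_one(query, p) for p in passages))
    except Exception as exc:
        # Whole-call failure: one WARNING, then passthrough-shaped output.
        logger.warning(
            "Rerank failed (model=%s, error=%s); keeping original order",
            model,
            type(exc).__name__,
        )
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
```

Catching Exception rather than BaseException is a deliberate choice in this sketch: task cancellation still propagates, while connection and SDK errors degrade to the passthrough order.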
Error Categories and Responses
| Category | Trigger | Response |
|---|---|---|
| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
| User input (4xx-equivalent) | Unknown RERANKER_PROVIDER value | ValueError at startup; clear message naming allowed values. |
| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |
Monitoring
- INFO log at startup states the selected provider.
- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
- No metrics surface yet; can be added if the reranker becomes a hot path.
Testing Strategy
This project intentionally keeps the test surface minimal (backend/scripts/test_profile_format.py is the lone pytest target). Per steering/tech.md, do not add a heavy test harness.
- Unit-level verification (manual, by the implementer, no committed test files unless small and clearly worth keeping):
  - Constructing OllamaReranker with a bad host does not raise; first rank() call logs WARNING and returns passthrough output.
  - rank(query, []) returns [] and does not call the client.
  - Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
  - Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
- Integration smoke (manual): With qwen2.5:3b pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. RERANKER_PROVIDER=none.
- Boundary verification: Grep that gpt-4.1-nano and OpenAIRerankerClient do not appear in any new code path.
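For the manual unit-level checks, a throwaway helper along these lines may be enough; it is a sketch, not a proposed committed test. The name check_rank_postconditions is hypothetical. It compares passages as multisets, which is slightly stricter than the "byte-identical to one of the inputs" postcondition because it also catches dropped or duplicated passages.

```python
from collections import Counter


def check_rank_postconditions(
    passages: list[str],
    ranked: list[tuple[str, float]],
) -> None:
    """Raise AssertionError if ranked violates the rank() postconditions."""
    assert len(ranked) == len(passages), "length mismatch"
    scores = [score for _, score in ranked]
    # Each score must be >= the one that follows it.
    assert all(a >= b for a, b in zip(scores, scores[1:])), "not sorted descending"
    # Multiset comparison so duplicate input passages are handled correctly.
    assert Counter(p for p, _ in ranked) == Counter(passages), "passages altered"
```

It can be called in a REPL against the output of a real rank() call, for instance check_rank_postconditions(passages, ranked), and discarded afterwards.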
Supporting References
- research.md: Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
- gap-analysis.md: Requirement-to-asset map.
- .ticket/39.md: Source ticket text.