24 KiB

Raw Blame History

Design — graphiti-ollama-reranker

Overview

Purpose: Replace the no-op _PassthroughReranker injected into Graphiti with a real Ollama-backed CrossEncoderClient, so that hybrid search results consumed by the ReportAgent tools (SearchResult, InsightForge, Panorama, Interview) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (RERANKER_PROVIDER, RERANKER_MODEL, RERANKER_BASE_URL, RERANKER_API_KEY) with Ollama-aligned defaults; an explicit RERANKER_PROVIDER=none preserves the passthrough for CI and slim containers.

Users: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.

Impact: Adds one new module under backend/app/services/, four Config attributes, a small selection branch in _get_graphiti(), and documentation in .env.example, CLAUDE.md, README.md. No data schema, no API, no UI changes. Behavior under RERANKER_PROVIDER=none is identical to today.

Goals

Default Ollama-backed reranker producing one (passage, score) tuple per input passage, sorted descending by score.
Env-driven configuration with sensible Ollama defaults inherited from existing EMBEDDING_* settings.
Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
Documentation parity with EMBEDDING_* knobs in .env.example, CLAUDE.md, and README.md.

Non-Goals

Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
Changing LLM_MODEL_NAME or EMBEDDING_MODEL defaults.
Upstream contributions to graphiti-core.
Adding a sentence-transformers or other non-openai reranker dependency.

Boundary Commitments

This Spec Owns

The Ollama reranker implementation and its prompt/parse logic.
The RERANKER_PROVIDER, RERANKER_MODEL, RERANKER_BASE_URL, RERANKER_API_KEY settings and their defaults.
The branch in _get_graphiti() that selects between the Ollama reranker and the passthrough.
The startup INFO log line that announces the selected reranker.
Documentation entries in .env.example, CLAUDE.md "Required Environment Variables", and README.md Ollama prerequisites.

Out of Boundary

Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
Per-passage retrieval (still owned by _GraphNamespace.search and Graphiti).
The group_id scoping rules.
Any change to the four ReportAgent tools (SearchResult, InsightForge, Panorama, Interview) — they receive reranked output transparently.
Implementation of additional reranker providers; this design covers only ollama and none.

Allowed Dependencies

Upstream library: graphiti_core.cross_encoder.client.CrossEncoderClient (P0).
In-repo: Config (backend/app/config.py), get_logger (backend/app/utils/logger.py), openai.AsyncOpenAI (already installed).
Existing factory: _get_graphiti() continues to be the singleton chokepoint.

Revalidation Triggers

If graphiti-core changes the CrossEncoderClient.rank signature, this design must be revisited.
If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in research.md).
If Config.GRAPHITI_LLM_PROVIDER semantics change in a way that re-couples LLM and reranker, this design must be checked.

Architecture

Existing Architecture Analysis

_get_graphiti() already injects an explicit cross_encoder=_PassthroughReranker() (line 156). The pattern of double-checked-locking singleton with provider switch (GRAPHITI_LLM_PROVIDER) is mature and must be preserved.
The persistent event loop (_get_loop, _run) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does not need to schedule work onto _get_loop().
All four ReportAgent tools call _GraphNamespace.search, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.

Architecture Pattern & Boundary Map

graph LR
    subgraph Config
        EnvVars[RERANKER_*\nenv vars]
        ConfigCls[Config attributes]
        EnvVars --> ConfigCls
    end

    subgraph Adapter
        Factory[_get_graphiti]
        Passthrough[_PassthroughReranker]
        OllamaCls[OllamaReranker]
        Factory -->|provider=none| Passthrough
        Factory -->|provider=ollama| OllamaCls
    end

    subgraph Graphiti
        GraphitiCore[Graphiti instance]
        Search[_GraphNamespace.search]
        Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
    end

    ConfigCls --> Factory
    Passthrough -->|injected as cross_encoder| GraphitiCore
    OllamaCls -->|injected as cross_encoder| GraphitiCore
    GraphitiCore --> Search
    Search --> Tools

    OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]

Architecture Integration:

Selected pattern: Strategy pattern with two implementations selected at factory time. Same shape as the existing GRAPHITI_LLM_PROVIDER branch.
Domain/feature boundaries: Reranker construction and prompt/parse live in ollama_reranker.py. Wiring lives in graphiti_adapter.py. Config lives in config.py. No overlap.
Existing patterns preserved: Double-checked-locking singleton; explicit cross_encoder injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; Config reads via os.environ.get(..., default).
New components rationale: OllamaReranker is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
Steering compliance: Single OpenAI-SDK convention preserved; per-project group_id scoping unaffected; no new dependency.

Technology Stack

Layer	Choice / Version	Role in Feature	Notes
Backend / Services	Python ≥3.11, async via `asyncio`	Hosts the new reranker class.	Inherits project minimum.
LLM client	`openai` SDK (already pinned, v2.x)	`AsyncOpenAI` chat completions against Ollama's `/v1`.	No new dependency.
Model	Ollama-served chat model, default `qwen2.5:3b`	Produces a numeric relevance score per passage.	Operator may override via `RERANKER_MODEL`.
Endpoint	Ollama's OpenAI-compatible `/v1`	Default `http://localhost:11434/v1`.	Reuses `EMBEDDING_BASE_URL` semantics.
Graph layer	`graphiti-core ≥ 0.3`	Consumes the new `CrossEncoderClient`.	No upstream change.

File Structure Plan

Directory Structure

backend/app/
├── services/
│   ├── graphiti_adapter.py        # MODIFIED — factory branches on RERANKER_PROVIDER
│   └── ollama_reranker.py         # NEW — OllamaReranker(CrossEncoderClient)
├── config.py                      # MODIFIED — adds RERANKER_* attrs
└── utils/
    └── logger.py                  # unchanged

repo-root/
├── .env.example                   # MODIFIED — adds RERANKER_* block
├── CLAUDE.md                      # MODIFIED — Required Environment Variables
└── README.md                      # MODIFIED — Ollama prerequisites note

Modified Files

backend/app/services/graphiti_adapter.py — Add small branch in _get_graphiti() that picks OllamaReranker() or _PassthroughReranker() based on Config.RERANKER_PROVIDER. Log the selection at INFO. _PassthroughReranker class is unchanged.
backend/app/config.py — Add four new class attributes with documented defaults. No change to existing validate() (reranker has no mandatory key).
.env.example — Add a four-line RERANKER_* block with comments mirroring the EMBEDDING_* style.
CLAUDE.md — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
README.md — Update the Ollama prerequisite section to mention ollama pull qwen2.5:3b alongside the existing ollama pull mxbai-embed-large.

_PassthroughReranker stays in graphiti_adapter.py (unchanged contract); only the wiring around it changes.

System Flows

sequenceDiagram
    participant Search as _GraphNamespace.search
    participant Graphiti as graphiti-core
    participant Reranker as OllamaReranker.rank
    participant Ollama as Ollama /v1/chat/completions

    Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
    Graphiti->>Graphiti: hybrid retrieval (RRF)
    Graphiti->>Reranker: rank(query, [p1..pN])
    par per-passage scoring
        Reranker->>Ollama: chat.completions(prompt p1, temp=0)
        Reranker->>Ollama: chat.completions(prompt p2, temp=0)
        Reranker->>Ollama: chat.completions(prompt pN, temp=0)
    end
    alt all scores parsed
        Reranker-->>Graphiti: sorted [(p, score), ...]
    else any failure
        Reranker->>Reranker: log WARNING, return passthrough order
        Reranker-->>Graphiti: original order with synthetic scores
    end
    Graphiti-->>Search: ranked edges/nodes
    Search-->>Tools: ranked results

Decision points after diagram:

temperature=0.0 makes the score deterministic per (query, passage, model) tuple.
Per-passage failures (one bad parse out of N) downrank that passage to 0.0 - 0.001 * index and continue; only whole-call exceptions degrade to passthrough.
The reranker never raises; this isolates Graphiti from upstream noise even when _GraphNamespace.search's existing exception swallow is removed in a future refactor.

Requirements Traceability

Requirement	Summary	Components	Interfaces	Flows
1.1	Default reranker is Ollama-backed	`_get_graphiti()`, `OllamaReranker`	Inline factory branch	Adapter init
1.2	No dependency on `OpenAIRerankerClient`	`_get_graphiti()`	Explicit `cross_encoder=` injection (unchanged behavior)	—
1.3	Unset → defaults to `ollama`	`Config.RERANKER_PROVIDER`	`os.environ.get('RERANKER_PROVIDER', 'ollama')`	—
1.4	No `gpt-4.1-nano` reference	All new files	—	—
2.1	Subclass `CrossEncoderClient.rank`	`OllamaReranker`	`async rank(query, passages) -> list[tuple[str, float]]`	Per-passage scoring
2.2	Uses `openai.AsyncOpenAI`	`OllamaReranker.__init__`	`AsyncOpenAI(base_url, api_key)`	—
2.3	Returns passages sorted descending	`OllamaReranker.rank`	Postcondition: descending by score	—
2.4	Empty input → empty output, no model call	`OllamaReranker.rank`	Guard at method entry	—
2.5	Preserves passage strings byte-for-byte	`OllamaReranker.rank`	Strings are echoed, never rewritten	—
2.6	Unparseable score → deterministic low fallback	`OllamaReranker.rank`	Internal `_parse_score` helper	Failure branch
3.1	`RERANKER_PROVIDER` env knob	`Config`	Class attr, default `ollama`, validated `{ollama, none}`	Adapter init
3.2	`RERANKER_MODEL` env knob	`Config`	Class attr, default `qwen2.5:3b`	—
3.3	`RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL`	`Config`	Class attr resolves at read time	—
3.4	`RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY`	`Config`	Class attr	—
3.5	Unknown value → `ValueError`	`_get_graphiti()`	`_ALLOWED_RERANKER_PROVIDERS` validation	Adapter init
3.6	Reads via `os.environ.get` only	`Config`	—	—
4.1	`none` keeps `_PassthroughReranker`	`_get_graphiti()`	Factory branch	Adapter init
4.2	Graph search remains functional under `none`	`_PassthroughReranker.rank` (unchanged)	—	—
4.3	INFO log announces selected provider	`_get_graphiti()`	`logger.info` line	Adapter init
5.1	WARNING log on rerank failure	`OllamaReranker.rank`	`logger.warning` with model + error class	Failure branch
5.2	No exception propagation to HTTP callers	`OllamaReranker.rank` (never raises)	—	—
5.3	Original order on whole-call failure	`OllamaReranker.rank`	Passthrough fallback inside method	Failure branch
5.4	`__init__` never raises	`OllamaReranker.__init__`	`AsyncOpenAI()` lazy I/O	Adapter init
6.1	`.env.example` documents the four vars	`.env.example`	—	—
6.2	`CLAUDE.md` lists the four vars	`CLAUDE.md`	—	—
6.3	`README.md` mentions `ollama pull <model>`	`README.md`	—	—
6.4	Old "follow-up" claim updated	`graphiti-neo4j-finalize/research.md` (or design.md)	—	—
7.1	Reranked order reaches `_GraphNamespace.search`	`OllamaReranker`, `_get_graphiti()`	Through Graphiti's own `search()`	End-to-end
7.2	No changes to report tools	n/a	n/a	—
7.3	`group_id` scoping unchanged	`_GraphNamespace.search` (unchanged)	—	—

Components and Interfaces

Component	Domain/Layer	Intent	Req Coverage	Key Dependencies (P0/P1)	Contracts
`OllamaReranker`	Backend / Services	Score passages against a query via Ollama chat completions.	1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1	`graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1)	Service
`Config` (extended)	Backend / Config	Expose four new reranker attrs with documented defaults.	1.3, 3.1–3.6, 4.1	`os.environ.get` (P0)	State (configuration)
`_get_graphiti()` (extended)	Backend / Adapter	Pick reranker implementation; validate provider; log selection.	1.1, 1.2, 3.5, 4.1, 4.3	`Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0)	Service
`.env.example`, `CLAUDE.md`, `README.md`	Docs	Communicate new knobs and Ollama prerequisite.	6.1–6.4	—	—

Backend / Services

`OllamaReranker`

Field	Detail
Intent	Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score.
Requirements	1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1

Responsibilities & Constraints

Subclass graphiti_core.cross_encoder.client.CrossEncoderClient; implement only rank.
Use openai.AsyncOpenAI; no second SDK; no top-level network I/O in __init__.
Preserve passage strings byte-for-byte; never rewrite or truncate.
Never raise from rank(). On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
Deterministic scoring: temperature=0.0, no randomness in fallback scores.
Thread-safety: stateless beyond the immutable AsyncOpenAI client and string config; safe under Graphiti's concurrent search.

Dependencies

Inbound: _get_graphiti() — instantiates a single instance and passes it as cross_encoder= to Graphiti(...) (P0).
Outbound: Ollama /v1/chat/completions via openai.AsyncOpenAI (P0).
External: graphiti_core.cross_encoder.client.CrossEncoderClient (P0); openai SDK (P0).

Contracts: Service [x]

Service Interface

class OllamaReranker(CrossEncoderClient):
    def __init__(
        self,
        *,
        model: str,
        base_url: str,
        api_key: str,
    ) -> None: ...

    async def rank(
        self,
        query: str,
        passages: list[str],
    ) -> list[tuple[str, float]]:
        """
        Score each passage's relevance to `query` and return
        `(passage, score)` tuples sorted in descending order of score.

        Preconditions:
            - `passages` is a (possibly empty) list of strings.

        Postconditions:
            - len(return) == len(passages).
            - return is sorted by score descending.
            - For all i, return[i][0] is byte-identical to one of the inputs.
            - For any rank() call, this method does not raise.

        Invariants:
            - Successfully-parsed scores fall in [0.0, 1.0].
            - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
              and are strictly less than every successfully-parsed score.
        """

Implementation Notes

Integration: Constructed inside _get_graphiti() when Config.RERANKER_PROVIDER == "ollama"; injected into Graphiti(..., cross_encoder=...).
Validation:
- Reject empty passages immediately with return [].
- Clip parsed score to [0.0, 1.0].
- Treat any uncaught per-passage exception as parse failure and assign deterministic fallback -0.001 * passage_index.
- Treat any whole-call exception (e.g. connection refused) as graceful degrade: return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)].
Risks: Default qwen2.5:3b must be ollama pull-ed by operators; documented in README. If absent, R5 path kicks in.

Backend / Config

`Config` (extended)

Field	Detail
Intent	Surface env-driven configuration for the reranker with Ollama-aligned defaults.
Requirements	1.3, 3.1–3.6, 4.1

Responsibilities & Constraints

Read from os.environ.get only; no new dependency.
RERANKER_PROVIDER default ollama; valid values: ollama, none.
RERANKER_MODEL default qwen2.5:3b.
RERANKER_BASE_URL default = EMBEDDING_BASE_URL value at module load time.
RERANKER_API_KEY default = EMBEDDING_API_KEY value at module load time.
Validation of RERANKER_PROVIDER happens in _get_graphiti() (not Config.validate()) to keep the validate-at-boot list focused on credential presence.

Contracts: State [x]

State Management

State model: Read-only class attributes resolved once at import.
Persistence & consistency: None; values come from environment.
Concurrency strategy: Immutable after import; safe.

Implementation Notes

Integration: Defaults for RERANKER_BASE_URL / RERANKER_API_KEY should reference the corresponding EMBEDDING_* env vars (not the resolved Config.EMBEDDING_BASE_URL constant) so an operator setting only EMBEDDING_BASE_URL still gets the reranker pointed at the same Ollama host without needing to set RERANKER_BASE_URL explicitly. Implementation reads os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1')).
Validation: None at config-load time. Provider value is validated by _get_graphiti().
Risks: An operator who overrides EMBEDDING_BASE_URL but not RERANKER_BASE_URL will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.

Backend / Adapter

`_get_graphiti()` (extended)

Field	Detail
Intent	Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice.
Requirements	1.1, 1.2, 3.5, 4.1, 4.3

Responsibilities & Constraints

Preserve double-checked locking and singleton semantics exactly.
Read Config.RERANKER_PROVIDER once at construction; do not re-read.
For ollama: construct OllamaReranker(model=..., base_url=..., api_key=...).
For none: construct _PassthroughReranker() (current behavior preserved).
For any other value: raise ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')") — mirrors the existing _ALLOWED_GRAPHITI_PROVIDERS validation pattern.
Log at INFO once: f"Initializing Graphiti reranker (provider={provider})...".

Contracts: Service [x]

Service Interface

def _get_graphiti() -> Graphiti:
    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""

Implementation Notes

Integration: Replaces the unconditional cross_encoder=_PassthroughReranker() at graphiti_adapter.py:156 with a cross_encoder=_build_reranker(provider) call. The factory helper lives next to _build_llm_and_embedder in the same file.
Validation: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
Risks: A typo such as RERANKER_PROVIDER=Ollama (capitalized) would raise; the helper lowercases the value before comparison, matching _get_graphiti's existing (... or "openai").lower() pattern.

Documentation

File	Change	Requirements
`.env.example`	Add commented block with the four `RERANKER_` vars and their defaults. Position adjacent to the existing `EMBEDDING_` block.	6.1
`CLAUDE.md`	Extend the "Required Environment Variables" code fence under "Architecture" → "Required Environment Variables" with the four new vars and a one-line note about `RERANKER_PROVIDER=none`.	6.2
`README.md`	In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments.	6.3
`.kiro/specs/graphiti-neo4j-finalize/research.md`	Update the "A real per-provider reranker is a follow-up" claim to point at this spec.	6.4

README also has README-EN.md and README-ZH.md — the canonical user-facing README is README.md per the existing structure. Other localized READMEs are out of scope unless a quick parity edit fits without translation work; if a Chinese translation already exists for the embedder section, the Chinese README receives the same one-line addition.

Data Models

Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the list[tuple[str, float]] already defined by CrossEncoderClient.rank.

Error Handling

Error Strategy

Construction errors: None possible (no network in __init__; no required keys to validate).
Per-passage errors: Caught inside OllamaReranker.rank. Logged at DEBUG once per failed passage (suppress spam). Passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
Whole-call errors (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost try/except in rank. Logged at WARNING with model name and error class. Returns [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)] — same shape as _PassthroughReranker so consumers cannot tell the difference structurally.
Configuration errors: _get_graphiti() raises ValueError at startup if RERANKER_PROVIDER is unknown. The Flask app fails to boot — preferred over silent misconfiguration.

Error Categories and Responses

Category	Trigger	Response
System (5xx-equivalent)	Ollama unreachable, timeout	WARNING log; passthrough order; search succeeds.
User input (4xx-equivalent)	Unknown `RERANKER_PROVIDER` value	`ValueError` at startup; clear message naming allowed values.
Business rule	Model emits unparseable score	DEBUG log; per-passage fallback score; passage retained.

Monitoring

INFO log at startup states the selected provider.
WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
No metrics surface yet; can be added if the reranker becomes a hot path.

Testing Strategy

This project intentionally keeps the test surface minimal (backend/scripts/test_profile_format.py is the lone pytest target). Per steering/tech.md, do not add a heavy test harness.

Unit-level verification (manual, by the implementer, no committed test files unless small and clearly worth keeping):
1. Constructing OllamaReranker with a bad host does not raise; first rank() call logs WARNING and returns passthrough output.
2. rank(query, []) returns [] and does not call the client.
3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
Integration smoke (manual): With qwen2.5:3b pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. RERANKER_PROVIDER=none.
Boundary verification: Grep that gpt-4.1-nano and OpenAIRerankerClient do not appear in any new code path.

Supporting References

research.md — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
gap-analysis.md — Requirement-to-asset map.
.ticket/39.md — Source ticket text.

24 KiB Raw Blame History Unescape Escape

Design — graphiti-ollama-reranker

Overview

Goals

Non-Goals

Boundary Commitments

This Spec Owns

Out of Boundary

Allowed Dependencies

Revalidation Triggers

Architecture

Existing Architecture Analysis

Architecture Pattern & Boundary Map

Technology Stack

File Structure Plan

Directory Structure

Modified Files

System Flows

Requirements Traceability

Components and Interfaces

Backend / Services

OllamaReranker

Service Interface

Backend / Config

Config (extended)

State Management

Backend / Adapter

_get_graphiti() (extended)

Service Interface

Documentation

Data Models

Error Handling

Error Strategy

Error Categories and Responses

Monitoring

Testing Strategy

Supporting References

24 KiB

Raw Blame History

`OllamaReranker`

`Config` (extended)

`_get_graphiti()` (extended)