MicroFish/.kiro/specs/graphiti-ollama-reranker/design.md

24 KiB
Raw Blame History

Design — graphiti-ollama-reranker

Overview

Purpose: Replace the no-op _PassthroughReranker injected into Graphiti with a real Ollama-backed CrossEncoderClient, so that hybrid search results consumed by the ReportAgent tools (SearchResult, InsightForge, Panorama, Interview) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (RERANKER_PROVIDER, RERANKER_MODEL, RERANKER_BASE_URL, RERANKER_API_KEY) with Ollama-aligned defaults; an explicit RERANKER_PROVIDER=none preserves the passthrough for CI and slim containers.

Users: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.

Impact: Adds one new module under backend/app/services/, four Config attributes, a small selection branch in _get_graphiti(), and documentation in .env.example, CLAUDE.md, README.md. No data schema, no API, no UI changes. Behavior under RERANKER_PROVIDER=none is identical to today.

Goals

  • Default Ollama-backed reranker producing one (passage, score) tuple per input passage, sorted descending by score.
  • Env-driven configuration with sensible Ollama defaults inherited from existing EMBEDDING_* settings.
  • Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
  • Documentation parity with EMBEDDING_* knobs in .env.example, CLAUDE.md, and README.md.

Non-Goals

  • Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
  • Changing LLM_MODEL_NAME or EMBEDDING_MODEL defaults.
  • Upstream contributions to graphiti-core.
  • Adding a sentence-transformers or other non-openai reranker dependency.

Boundary Commitments

This Spec Owns

  • The Ollama reranker implementation and its prompt/parse logic.
  • The RERANKER_PROVIDER, RERANKER_MODEL, RERANKER_BASE_URL, RERANKER_API_KEY settings and their defaults.
  • The branch in _get_graphiti() that selects between the Ollama reranker and the passthrough.
  • The startup INFO log line that announces the selected reranker.
  • Documentation entries in .env.example, CLAUDE.md "Required Environment Variables", and README.md Ollama prerequisites.

Out of Boundary

  • Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
  • Per-passage retrieval (still owned by _GraphNamespace.search and Graphiti).
  • The group_id scoping rules.
  • Any change to the four ReportAgent tools (SearchResult, InsightForge, Panorama, Interview) — they receive reranked output transparently.
  • Implementation of additional reranker providers; this design covers only ollama and none.

Allowed Dependencies

  • Upstream library: graphiti_core.cross_encoder.client.CrossEncoderClient (P0).
  • In-repo: Config (backend/app/config.py), get_logger (backend/app/utils/logger.py), openai.AsyncOpenAI (already installed).
  • Existing factory: _get_graphiti() continues to be the singleton chokepoint.

Revalidation Triggers

  • If graphiti-core changes the CrossEncoderClient.rank signature, this design must be revisited.
  • If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in research.md).
  • If Config.GRAPHITI_LLM_PROVIDER semantics change in a way that re-couples LLM and reranker, this design must be checked.

Architecture

Existing Architecture Analysis

  • _get_graphiti() already injects an explicit cross_encoder=_PassthroughReranker() (line 156). The pattern of double-checked-locking singleton with provider switch (GRAPHITI_LLM_PROVIDER) is mature and must be preserved.
  • The persistent event loop (_get_loop, _run) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does not need to schedule work onto _get_loop().
  • All four ReportAgent tools call _GraphNamespace.search, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.

Architecture Pattern & Boundary Map

graph LR
    subgraph Config
        EnvVars[RERANKER_*\nenv vars]
        ConfigCls[Config attributes]
        EnvVars --> ConfigCls
    end

    subgraph Adapter
        Factory[_get_graphiti]
        Passthrough[_PassthroughReranker]
        OllamaCls[OllamaReranker]
        Factory -->|provider=none| Passthrough
        Factory -->|provider=ollama| OllamaCls
    end

    subgraph Graphiti
        GraphitiCore[Graphiti instance]
        Search[_GraphNamespace.search]
        Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
    end

    ConfigCls --> Factory
    Passthrough -->|injected as cross_encoder| GraphitiCore
    OllamaCls -->|injected as cross_encoder| GraphitiCore
    GraphitiCore --> Search
    Search --> Tools

    OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]

Architecture Integration:

  • Selected pattern: Strategy pattern with two implementations selected at factory time. Same shape as the existing GRAPHITI_LLM_PROVIDER branch.
  • Domain/feature boundaries: Reranker construction and prompt/parse live in ollama_reranker.py. Wiring lives in graphiti_adapter.py. Config lives in config.py. No overlap.
  • Existing patterns preserved: Double-checked-locking singleton; explicit cross_encoder injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; Config reads via os.environ.get(..., default).
  • New components rationale: OllamaReranker is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
  • Steering compliance: Single OpenAI-SDK convention preserved; per-project group_id scoping unaffected; no new dependency.

Technology Stack

Layer Choice / Version Role in Feature Notes
Backend / Services Python ≥3.11, async via asyncio Hosts the new reranker class. Inherits project minimum.
LLM client openai SDK (already pinned, v2.x) AsyncOpenAI chat completions against Ollama's /v1. No new dependency.
Model Ollama-served chat model, default qwen2.5:3b Produces a numeric relevance score per passage. Operator may override via RERANKER_MODEL.
Endpoint Ollama's OpenAI-compatible /v1 Default http://localhost:11434/v1. Reuses EMBEDDING_BASE_URL semantics.
Graph layer graphiti-core ≥ 0.3 Consumes the new CrossEncoderClient. No upstream change.

File Structure Plan

Directory Structure

backend/app/
├── services/
│   ├── graphiti_adapter.py        # MODIFIED — factory branches on RERANKER_PROVIDER
│   └── ollama_reranker.py         # NEW — OllamaReranker(CrossEncoderClient)
├── config.py                      # MODIFIED — adds RERANKER_* attrs
└── utils/
    └── logger.py                  # unchanged

repo-root/
├── .env.example                   # MODIFIED — adds RERANKER_* block
├── CLAUDE.md                      # MODIFIED — Required Environment Variables
└── README.md                      # MODIFIED — Ollama prerequisites note

Modified Files

  • backend/app/services/graphiti_adapter.py — Add small branch in _get_graphiti() that picks OllamaReranker() or _PassthroughReranker() based on Config.RERANKER_PROVIDER. Log the selection at INFO. _PassthroughReranker class is unchanged.
  • backend/app/config.py — Add four new class attributes with documented defaults. No change to existing validate() (reranker has no mandatory key).
  • .env.example — Add a four-line RERANKER_* block with comments mirroring the EMBEDDING_* style.
  • CLAUDE.md — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
  • README.md — Update the Ollama prerequisite section to mention ollama pull qwen2.5:3b alongside the existing ollama pull mxbai-embed-large.

_PassthroughReranker stays in graphiti_adapter.py (unchanged contract); only the wiring around it changes.

System Flows

sequenceDiagram
    participant Search as _GraphNamespace.search
    participant Graphiti as graphiti-core
    participant Reranker as OllamaReranker.rank
    participant Ollama as Ollama /v1/chat/completions

    Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
    Graphiti->>Graphiti: hybrid retrieval (RRF)
    Graphiti->>Reranker: rank(query, [p1..pN])
    par per-passage scoring
        Reranker->>Ollama: chat.completions(prompt p1, temp=0)
        Reranker->>Ollama: chat.completions(prompt p2, temp=0)
        Reranker->>Ollama: chat.completions(prompt pN, temp=0)
    end
    alt all scores parsed
        Reranker-->>Graphiti: sorted [(p, score), ...]
    else any failure
        Reranker->>Reranker: log WARNING, return passthrough order
        Reranker-->>Graphiti: original order with synthetic scores
    end
    Graphiti-->>Search: ranked edges/nodes
    Search-->>Tools: ranked results

Decision points after diagram:

  • temperature=0.0 makes the score deterministic per (query, passage, model) tuple.
  • Per-passage failures (one bad parse out of N) downrank that passage to 0.0 - 0.001 * index and continue; only whole-call exceptions degrade to passthrough.
  • The reranker never raises; this isolates Graphiti from upstream noise even when _GraphNamespace.search's existing exception swallow is removed in a future refactor.

Requirements Traceability

Requirement Summary Components Interfaces Flows
1.1 Default reranker is Ollama-backed _get_graphiti(), OllamaReranker Inline factory branch Adapter init
1.2 No dependency on OpenAIRerankerClient _get_graphiti() Explicit cross_encoder= injection (unchanged behavior)
1.3 Unset → defaults to ollama Config.RERANKER_PROVIDER os.environ.get('RERANKER_PROVIDER', 'ollama')
1.4 No gpt-4.1-nano reference All new files
2.1 Subclass CrossEncoderClient.rank OllamaReranker async rank(query, passages) -> list[tuple[str, float]] Per-passage scoring
2.2 Uses openai.AsyncOpenAI OllamaReranker.__init__ AsyncOpenAI(base_url, api_key)
2.3 Returns passages sorted descending OllamaReranker.rank Postcondition: descending by score
2.4 Empty input → empty output, no model call OllamaReranker.rank Guard at method entry
2.5 Preserves passage strings byte-for-byte OllamaReranker.rank Strings are echoed, never rewritten
2.6 Unparseable score → deterministic low fallback OllamaReranker.rank Internal _parse_score helper Failure branch
3.1 RERANKER_PROVIDER env knob Config Class attr, default ollama, validated {ollama, none} Adapter init
3.2 RERANKER_MODEL env knob Config Class attr, default qwen2.5:3b
3.3 RERANKER_BASE_URL defaults to EMBEDDING_BASE_URL Config Class attr resolves at read time
3.4 RERANKER_API_KEY defaults to EMBEDDING_API_KEY Config Class attr
3.5 Unknown value → ValueError _get_graphiti() _ALLOWED_RERANKER_PROVIDERS validation Adapter init
3.6 Reads via os.environ.get only Config
4.1 none keeps _PassthroughReranker _get_graphiti() Factory branch Adapter init
4.2 Graph search remains functional under none _PassthroughReranker.rank (unchanged)
4.3 INFO log announces selected provider _get_graphiti() logger.info line Adapter init
5.1 WARNING log on rerank failure OllamaReranker.rank logger.warning with model + error class Failure branch
5.2 No exception propagation to HTTP callers OllamaReranker.rank (never raises)
5.3 Original order on whole-call failure OllamaReranker.rank Passthrough fallback inside method Failure branch
5.4 __init__ never raises OllamaReranker.__init__ AsyncOpenAI() lazy I/O Adapter init
6.1 .env.example documents the four vars .env.example
6.2 CLAUDE.md lists the four vars CLAUDE.md
6.3 README.md mentions ollama pull <model> README.md
6.4 Old "follow-up" claim updated graphiti-neo4j-finalize/research.md (or design.md)
7.1 Reranked order reaches _GraphNamespace.search OllamaReranker, _get_graphiti() Through Graphiti's own search() End-to-end
7.2 No changes to report tools n/a n/a
7.3 group_id scoping unchanged _GraphNamespace.search (unchanged)

Components and Interfaces

Component Domain/Layer Intent Req Coverage Key Dependencies (P0/P1) Contracts
OllamaReranker Backend / Services Score passages against a query via Ollama chat completions. 1.1, 1.4, 2.12.6, 5.15.4, 7.1 graphiti_core.cross_encoder.client.CrossEncoderClient (P0); openai.AsyncOpenAI (P0); Config (P0); get_logger (P1) Service
Config (extended) Backend / Config Expose four new reranker attrs with documented defaults. 1.3, 3.13.6, 4.1 os.environ.get (P0) State (configuration)
_get_graphiti() (extended) Backend / Adapter Pick reranker implementation; validate provider; log selection. 1.1, 1.2, 3.5, 4.1, 4.3 Config (P0); OllamaReranker (P0); _PassthroughReranker (P0); Graphiti (P0) Service
.env.example, CLAUDE.md, README.md Docs Communicate new knobs and Ollama prerequisite. 6.16.4

Backend / Services

OllamaReranker

Field Detail
Intent Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score.
Requirements 1.1, 1.4, 2.12.6, 5.15.4, 7.1

Responsibilities & Constraints

  • Subclass graphiti_core.cross_encoder.client.CrossEncoderClient; implement only rank.
  • Use openai.AsyncOpenAI; no second SDK; no top-level network I/O in __init__.
  • Preserve passage strings byte-for-byte; never rewrite or truncate.
  • Never raise from rank(). On any failure path, log once at WARNING and fall back to passthrough order with deterministic synthetic scores.
  • Deterministic scoring: temperature=0.0, no randomness in fallback scores.
  • Thread-safety: stateless beyond the immutable AsyncOpenAI client and string config; safe under Graphiti's concurrent search.

Dependencies

  • Inbound: _get_graphiti() — instantiates a single instance and passes it as cross_encoder= to Graphiti(...) (P0).
  • Outbound: Ollama /v1/chat/completions via openai.AsyncOpenAI (P0).
  • External: graphiti_core.cross_encoder.client.CrossEncoderClient (P0); openai SDK (P0).

Contracts: Service [x]

Service Interface
class OllamaReranker(CrossEncoderClient):
    def __init__(
        self,
        *,
        model: str,
        base_url: str,
        api_key: str,
    ) -> None: ...

    async def rank(
        self,
        query: str,
        passages: list[str],
    ) -> list[tuple[str, float]]:
        """
        Score each passage's relevance to `query` and return
        `(passage, score)` tuples sorted in descending order of score.

        Preconditions:
            - `passages` is a (possibly empty) list of strings.

        Postconditions:
            - len(return) == len(passages).
            - return is sorted by score descending.
            - For all i, return[i][0] is byte-identical to one of the inputs.
            - For any rank() call, this method does not raise.

        Invariants:
            - Successfully-parsed scores fall in [0.0, 1.0].
            - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
              and are strictly less than every successfully-parsed score.
        """

Implementation Notes

  • Integration: Constructed inside _get_graphiti() when Config.RERANKER_PROVIDER == "ollama"; injected into Graphiti(..., cross_encoder=...).
  • Validation:
    • Reject empty passages immediately with return [].
    • Clip parsed score to [0.0, 1.0].
    • Treat any uncaught per-passage exception as parse failure and assign deterministic fallback -0.001 * passage_index.
    • Treat any whole-call exception (e.g. connection refused) as graceful degrade: return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)].
  • Risks: Default qwen2.5:3b must be ollama pull-ed by operators; documented in README. If absent, R5 path kicks in.

Backend / Config

Config (extended)

Field Detail
Intent Surface env-driven configuration for the reranker with Ollama-aligned defaults.
Requirements 1.3, 3.13.6, 4.1

Responsibilities & Constraints

  • Read from os.environ.get only; no new dependency.
  • RERANKER_PROVIDER default ollama; valid values: ollama, none.
  • RERANKER_MODEL default qwen2.5:3b.
  • RERANKER_BASE_URL default = EMBEDDING_BASE_URL value at module load time.
  • RERANKER_API_KEY default = EMBEDDING_API_KEY value at module load time.
  • Validation of RERANKER_PROVIDER happens in _get_graphiti() (not Config.validate()) to keep the validate-at-boot list focused on credential presence.

Contracts: State [x]

State Management
  • State model: Read-only class attributes resolved once at import.
  • Persistence & consistency: None; values come from environment.
  • Concurrency strategy: Immutable after import; safe.

Implementation Notes

  • Integration: Defaults for RERANKER_BASE_URL / RERANKER_API_KEY should reference the corresponding EMBEDDING_* env vars (not the resolved Config.EMBEDDING_BASE_URL constant) so an operator setting only EMBEDDING_BASE_URL still gets the reranker pointed at the same Ollama host without needing to set RERANKER_BASE_URL explicitly. Implementation reads os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1')).
  • Validation: None at config-load time. Provider value is validated by _get_graphiti().
  • Risks: An operator who overrides EMBEDDING_BASE_URL but not RERANKER_BASE_URL will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.

Backend / Adapter

_get_graphiti() (extended)

Field Detail
Intent Select and inject the appropriate CrossEncoderClient based on Config.RERANKER_PROVIDER; log the choice.
Requirements 1.1, 1.2, 3.5, 4.1, 4.3

Responsibilities & Constraints

  • Preserve double-checked locking and singleton semantics exactly.
  • Read Config.RERANKER_PROVIDER once at construction; do not re-read.
  • For ollama: construct OllamaReranker(model=..., base_url=..., api_key=...).
  • For none: construct _PassthroughReranker() (current behavior preserved).
  • For any other value: raise ValueError("Unknown RERANKER_PROVIDER=%r; allowed: ('ollama', 'none')") — mirrors the existing _ALLOWED_GRAPHITI_PROVIDERS validation pattern.
  • Log at INFO once: f"Initializing Graphiti reranker (provider={provider})...".

Contracts: Service [x]

Service Interface
def _get_graphiti() -> Graphiti:
    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""

Implementation Notes

  • Integration: Replaces the unconditional cross_encoder=_PassthroughReranker() at graphiti_adapter.py:156 with a cross_encoder=_build_reranker(provider) call. The factory helper lives next to _build_llm_and_embedder in the same file.
  • Validation: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
  • Risks: A typo such as RERANKER_PROVIDER=Ollama (capitalized) would raise; the helper lowercases the value before comparison, matching _get_graphiti's existing (... or "openai").lower() pattern.

Documentation

File Change Requirements
.env.example Add commented block with the four RERANKER_* vars and their defaults. Position adjacent to the existing EMBEDDING_* block. 6.1
CLAUDE.md Extend the "Required Environment Variables" code fence under "Architecture" → "Required Environment Variables" with the four new vars and a one-line note about RERANKER_PROVIDER=none. 6.2
README.md In the "Install Ollama and pull the default embedding model" section, add ollama pull qwen2.5:3b step (or reference the model variable). In the .env snippet, add the four RERANKER_* lines with brief comments. 6.3
.kiro/specs/graphiti-neo4j-finalize/research.md Update the "A real per-provider reranker is a follow-up" claim to point at this spec. 6.4

README also has README-EN.md and README-ZH.md — the canonical user-facing README is README.md per the existing structure. Other localized READMEs are out of scope unless a quick parity edit fits without translation work; if a Chinese translation already exists for the embedder section, the Chinese README receives the same one-line addition.

Data Models

Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the list[tuple[str, float]] already defined by CrossEncoderClient.rank.

Error Handling

Error Strategy

  • Construction errors: None possible (no network in __init__; no required keys to validate).
  • Per-passage errors: Caught inside OllamaReranker.rank. Logged at DEBUG once per failed passage (suppress spam). Passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
  • Whole-call errors (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost try/except in rank. Logged at WARNING with model name and error class. Returns [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)] — same shape as _PassthroughReranker so consumers cannot tell the difference structurally.
  • Configuration errors: _get_graphiti() raises ValueError at startup if RERANKER_PROVIDER is unknown. The Flask app fails to boot — preferred over silent misconfiguration.

Error Categories and Responses

Category Trigger Response
System (5xx-equivalent) Ollama unreachable, timeout WARNING log; passthrough order; search succeeds.
User input (4xx-equivalent) Unknown RERANKER_PROVIDER value ValueError at startup; clear message naming allowed values.
Business rule Model emits unparseable score DEBUG log; per-passage fallback score; passage retained.

Monitoring

  • INFO log at startup states the selected provider.
  • WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
  • No metrics surface yet; can be added if the reranker becomes a hot path.

Testing Strategy

This project intentionally keeps the test surface minimal (backend/scripts/test_profile_format.py is the lone pytest target). Per steering/tech.md, do not add a heavy test harness.

  • Unit-level verification (manual, by the implementer, no committed test files unless small and clearly worth keeping):
    1. Constructing OllamaReranker with a bad host does not raise; first rank() call logs WARNING and returns passthrough output.
    2. rank(query, []) returns [] and does not call the client.
    3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
    4. Bad JSON output for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
  • Integration smoke (manual): With qwen2.5:3b pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. RERANKER_PROVIDER=none.
  • Boundary verification: Grep that gpt-4.1-nano and OpenAIRerankerClient do not appear in any new code path.

Supporting References

  • research.md — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
  • gap-analysis.md — Requirement-to-asset map.
  • .ticket/39.md — Source ticket text.