MicroFish/.kiro/specs/graphiti-ollama-reranker/design.md

# Design — graphiti-ollama-reranker

## Overview
**Purpose**: Replace the no-op `_PassthroughReranker` injected into Graphiti with a real Ollama-backed `CrossEncoderClient`, so that hybrid search results consumed by the ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) are ordered by model-judged relevance rather than Graphiti's RRF fallback ordering. Configuration is env-driven (`RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY`) with Ollama-aligned defaults; an explicit `RERANKER_PROVIDER=none` preserves the passthrough for CI and slim containers.

**Users**: Backend developers running the local-first stack against Ollama; operators deploying MiroFish behind any OpenAI-compatible reranker endpoint; CI users who explicitly disable reranking.

**Impact**: Adds one new module under `backend/app/services/`, four `Config` attributes, a small selection branch in `_get_graphiti()`, and documentation in `.env.example`, `CLAUDE.md`, `README.md`. No data schema, no API, no UI changes. Behavior under `RERANKER_PROVIDER=none` is identical to today.

### Goals
- Default Ollama-backed reranker producing one `(passage, score)` tuple per input passage, sorted descending by score.
- Env-driven configuration with sensible Ollama defaults inherited from existing `EMBEDDING_*` settings.
- Graceful degradation: Flask boots and graph search keeps working even when the Ollama service or the configured model is unavailable.
- Documentation parity with `EMBEDDING_*` knobs in `.env.example`, `CLAUDE.md`, and `README.md`.

### Non-Goals
- Building a Dashscope/OpenAI/Gemini reranker (out of scope per ticket #39).
- Changing `LLM_MODEL_NAME` or `EMBEDDING_MODEL` defaults.
- Upstream contributions to `graphiti-core`.
- Adding a `sentence-transformers` or other non-`openai` reranker dependency.

## Boundary Commitments

### This Spec Owns
- The Ollama reranker implementation and its prompt/parse logic.
- The `RERANKER_PROVIDER`, `RERANKER_MODEL`, `RERANKER_BASE_URL`, `RERANKER_API_KEY` settings and their defaults.
- The branch in `_get_graphiti()` that selects between the Ollama reranker and the passthrough.
- The startup INFO log line that announces the selected reranker.
- Documentation entries in `.env.example`, `CLAUDE.md` "Required Environment Variables", and `README.md` Ollama prerequisites.

### Out of Boundary
- Graphiti's own search ranking, hybrid retrieval, or embedding pipeline.
- Per-passage retrieval (still owned by `_GraphNamespace.search` and Graphiti).
- The `group_id` scoping rules.
- Any change to the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) — they receive reranked output transparently.
- Implementation of additional reranker providers; this design covers only `ollama` and `none`.

### Allowed Dependencies
- Upstream library: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0).
- In-repo: `Config` (`backend/app/config.py`), `get_logger` (`backend/app/utils/logger.py`), `openai.AsyncOpenAI` (already installed).
- Existing factory: `_get_graphiti()` continues to be the singleton chokepoint.

### Revalidation Triggers
- If `graphiti-core` changes the `CrossEncoderClient.rank` signature, this design must be revisited.
- If a future spec adds a third reranker provider, the inline branch should be considered for promotion to a registry (Option C in `research.md`).
- If `Config.GRAPHITI_LLM_PROVIDER` semantics change in a way that re-couples LLM and reranker, this design must be checked.

## Architecture

### Existing Architecture Analysis
- `_get_graphiti()` already injects an explicit `cross_encoder=_PassthroughReranker()` (line 156). The pattern of double-checked-locking singleton with provider switch (`GRAPHITI_LLM_PROVIDER`) is mature and must be preserved.
- The persistent event loop (`_get_loop`, `_run`) is used for Graphiti async calls from the synchronous Flask layer. The reranker itself runs inside Graphiti's own awaited path; the new reranker therefore does **not** need to schedule work onto `_get_loop()`.
- All four ReportAgent tools call `_GraphNamespace.search`, which already swallows reranker exceptions into a logged warning. The new reranker tightens this further by handling its own errors internally so it never raises.

### Architecture Pattern & Boundary Map

```mermaid
graph LR
    subgraph Config
        EnvVars[RERANKER_*\nenv vars]
        ConfigCls[Config attributes]
        EnvVars --> ConfigCls
    end

    subgraph Adapter
        Factory[_get_graphiti]
        Passthrough[_PassthroughReranker]
        OllamaCls[OllamaReranker]
        Factory -->|provider=none| Passthrough
        Factory -->|provider=ollama| OllamaCls
    end

    subgraph Graphiti
        GraphitiCore[Graphiti instance]
        Search[_GraphNamespace.search]
        Tools[Report tools\nSearchResult, InsightForge,\nPanorama, Interview]
    end

    ConfigCls --> Factory
    Passthrough -->|injected as cross_encoder| GraphitiCore
    OllamaCls -->|injected as cross_encoder| GraphitiCore
    GraphitiCore --> Search
    Search --> Tools

    OllamaCls -->|chat.completions| Ollama[Ollama OpenAI\n-compatible endpoint]
```

**Architecture Integration**:
- **Selected pattern**: Strategy pattern with two implementations selected at factory time. Same shape as the existing `GRAPHITI_LLM_PROVIDER` branch.
- **Domain/feature boundaries**: Reranker construction and prompt/parse live in `ollama_reranker.py`. Wiring lives in `graphiti_adapter.py`. Config lives in `config.py`. No overlap.
- **Existing patterns preserved**: Double-checked-locking singleton; explicit `cross_encoder` injection (Graphiti never falls back to its OpenAI default); persistent event loop unchanged; `Config` reads via `os.environ.get(..., default)`.
- **New components rationale**: `OllamaReranker` is a new boundary because it owns external I/O against a different endpoint (the Ollama chat surface), separate from the existing OpenAI embedder/LLM clients.
- **Steering compliance**: Single OpenAI-SDK convention preserved; per-project `group_id` scoping unaffected; no new dependency.

### Technology Stack

| Layer | Choice / Version | Role in Feature | Notes |
|-------|------------------|-----------------|-------|
| Backend / Services | Python ≥3.11, async via `asyncio` | Hosts the new reranker class. | Inherits project minimum. |
| LLM client | `openai` SDK (already pinned, v2.x) | `AsyncOpenAI` chat completions against Ollama's `/v1`. | No new dependency. |
| Model | Ollama-served chat model, default `qwen2.5:3b` | Produces a numeric relevance score per passage. | Operator may override via `RERANKER_MODEL`. |
| Endpoint | Ollama's OpenAI-compatible `/v1` | Default `http://localhost:11434/v1`. | Reuses `EMBEDDING_BASE_URL` semantics. |
| Graph layer | `graphiti-core ≥ 0.3` | Consumes the new `CrossEncoderClient`. | No upstream change. |

## File Structure Plan

### Directory Structure
```
backend/app/
├── services/
│   ├── graphiti_adapter.py        # MODIFIED — factory branches on RERANKER_PROVIDER
│   └── ollama_reranker.py         # NEW — OllamaReranker(CrossEncoderClient)
├── config.py                      # MODIFIED — adds RERANKER_* attrs
└── utils/
    └── logger.py                  # unchanged

repo-root/
├── .env.example                   # MODIFIED — adds RERANKER_* block
├── CLAUDE.md                      # MODIFIED — Required Environment Variables
└── README.md                      # MODIFIED — Ollama prerequisites note
```

### Modified Files
- `backend/app/services/graphiti_adapter.py` — Add small branch in `_get_graphiti()` that picks `OllamaReranker()` or `_PassthroughReranker()` based on `Config.RERANKER_PROVIDER`. Log the selection at INFO. `_PassthroughReranker` class is unchanged.
- `backend/app/config.py` — Add four new class attributes with documented defaults. No change to existing `validate()` (reranker has no mandatory key).
- `.env.example` — Add a four-line `RERANKER_*` block with comments mirroring the `EMBEDDING_*` style.
- `CLAUDE.md` — Extend the "Required Environment Variables" code block under "Architecture" with the four new vars.
- `README.md` — Update the Ollama prerequisite section to mention `ollama pull qwen2.5:3b` alongside the existing `ollama pull mxbai-embed-large`.

> `_PassthroughReranker` stays in `graphiti_adapter.py` (unchanged contract); only the wiring around it changes.

## System Flows

```mermaid
sequenceDiagram
    participant Search as _GraphNamespace.search
    participant Graphiti as graphiti-core
    participant Reranker as OllamaReranker.rank
    participant Ollama as Ollama /v1/chat/completions

    Search->>Graphiti: search(query, group_ids=[gid], num_results=N)
    Graphiti->>Graphiti: hybrid retrieval (RRF)
    Graphiti->>Reranker: rank(query, [p1..pN])
    par per-passage scoring
        Reranker->>Ollama: chat.completions(prompt p1, temp=0)
        Reranker->>Ollama: chat.completions(prompt p2, temp=0)
        Reranker->>Ollama: chat.completions(prompt pN, temp=0)
    end
    alt scoring completes, per-passage fallbacks applied
        Reranker-->>Graphiti: sorted [(p, score), ...]
    else whole-call failure
        Reranker->>Reranker: log WARNING, return passthrough order
        Reranker-->>Graphiti: original order with synthetic scores
    end
    Graphiti-->>Search: ranked edges/nodes
    Search-->>Tools: ranked results
```

**Decision points after diagram**:
- `temperature=0.0` keeps the score reproducible per (query, passage, model) tuple in practice; bit-identical output across Ollama versions or hardware is not guaranteed.
- A per-passage failure (one bad parse out of N) downranks only that passage, to `-0.001 * (index + 1)`, and scoring continues; only whole-call exceptions degrade to passthrough. The `+ 1` offset keeps the fallback strictly below `0.0`, as the `rank` invariants require.
- The reranker never raises; this isolates Graphiti from upstream noise even if `_GraphNamespace.search`'s existing exception swallow is removed in a future refactor.
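
The per-passage fan-out and fallback described above can be sketched as follows. This is a minimal illustration, not the actual `OllamaReranker`: `rank_with_fallback`, `ScoreFn`, and `fake_score` are hypothetical names, the scorer is injected so the sketch runs without Ollama, and the fallback offsets the index by one so it stays strictly below `0.0`.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical scorer signature: one relevance score per (query, passage) pair.
ScoreFn = Callable[[str, str], Awaitable[float]]


async def rank_with_fallback(
    query: str, passages: list[str], score_fn: ScoreFn
) -> list[tuple[str, float]]:
    if not passages:
        return []  # empty input: no scoring call at all
    # One scoring call per passage; exceptions are collected, not raised.
    results = await asyncio.gather(
        *(score_fn(query, p) for p in passages), return_exceptions=True
    )
    scored: list[tuple[str, float]] = []
    for index, (passage, result) in enumerate(zip(passages, results)):
        if isinstance(result, BaseException):
            # Assumption: index + 1 keeps every fallback strictly below 0.0.
            score = -0.001 * (index + 1)
        else:
            score = min(1.0, max(0.0, float(result)))  # clip to [0.0, 1.0]
        scored.append((passage, score))  # passage string is echoed untouched
    # sorted() is stable, so ties keep their retrieval order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


async def fake_score(query: str, passage: str) -> float:
    """Stand-in for the chat-completions call; fails on one marked passage."""
    if "broken" in passage:
        raise ValueError("unparseable score")
    return 0.9 if query in passage else 0.2
```

Calling `asyncio.run(rank_with_fallback("cats", ["dogs bark", "broken reply", "cats purr"], fake_score))` places the failing passage last while the other two keep their parsed scores.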

## Requirements Traceability

| Requirement | Summary | Components | Interfaces | Flows |
|-------------|---------|------------|------------|-------|
| 1.1 | Default reranker is Ollama-backed | `_get_graphiti()`, `OllamaReranker` | Inline factory branch | Adapter init |
| 1.2 | No dependency on `OpenAIRerankerClient` | `_get_graphiti()` | Explicit `cross_encoder=` injection (unchanged behavior) | — |
| 1.3 | Unset → defaults to `ollama` | `Config.RERANKER_PROVIDER` | `os.environ.get('RERANKER_PROVIDER', 'ollama')` | — |
| 1.4 | No `gpt-4.1-nano` reference | All new files | — | — |
| 2.1 | Subclass `CrossEncoderClient.rank` | `OllamaReranker` | `async rank(query, passages) -> list[tuple[str, float]]` | Per-passage scoring |
| 2.2 | Uses `openai.AsyncOpenAI` | `OllamaReranker.__init__` | `AsyncOpenAI(base_url, api_key)` | — |
| 2.3 | Returns passages sorted descending | `OllamaReranker.rank` | Postcondition: descending by score | — |
| 2.4 | Empty input → empty output, no model call | `OllamaReranker.rank` | Guard at method entry | — |
| 2.5 | Preserves passage strings byte-for-byte | `OllamaReranker.rank` | Strings are echoed, never rewritten | — |
| 2.6 | Unparseable score → deterministic low fallback | `OllamaReranker.rank` | Internal `_parse_score` helper | Failure branch |
| 3.1 | `RERANKER_PROVIDER` env knob | `Config` | Class attr, default `ollama`, validated `{ollama, none}` | Adapter init |
| 3.2 | `RERANKER_MODEL` env knob | `Config` | Class attr, default `qwen2.5:3b` | — |
| 3.3 | `RERANKER_BASE_URL` defaults to `EMBEDDING_BASE_URL` | `Config` | Class attr, resolved once at import | — |
| 3.4 | `RERANKER_API_KEY` defaults to `EMBEDDING_API_KEY` | `Config` | Class attr | — |
| 3.5 | Unknown value → `ValueError` | `_get_graphiti()` | `_ALLOWED_RERANKER_PROVIDERS` validation | Adapter init |
| 3.6 | Reads via `os.environ.get` only | `Config` | — | — |
| 4.1 | `none` keeps `_PassthroughReranker` | `_get_graphiti()` | Factory branch | Adapter init |
| 4.2 | Graph search remains functional under `none` | `_PassthroughReranker.rank` (unchanged) | — | — |
| 4.3 | INFO log announces selected provider | `_get_graphiti()` | `logger.info` line | Adapter init |
| 5.1 | WARNING log on rerank failure | `OllamaReranker.rank` | `logger.warning` with model + error class | Failure branch |
| 5.2 | No exception propagation to HTTP callers | `OllamaReranker.rank` (never raises) | — | — |
| 5.3 | Original order on whole-call failure | `OllamaReranker.rank` | Passthrough fallback inside method | Failure branch |
| 5.4 | `__init__` never raises | `OllamaReranker.__init__` | `AsyncOpenAI()` lazy I/O | Adapter init |
| 6.1 | `.env.example` documents the four vars | `.env.example` | — | — |
| 6.2 | `CLAUDE.md` lists the four vars | `CLAUDE.md` | — | — |
| 6.3 | `README.md` mentions `ollama pull <model>` | `README.md` | — | — |
| 6.4 | Old "follow-up" claim updated | `graphiti-neo4j-finalize/research.md` (or design.md) | — | — |
| 7.1 | Reranked order reaches `_GraphNamespace.search` | `OllamaReranker`, `_get_graphiti()` | Through Graphiti's own `search()` | End-to-end |
| 7.2 | No changes to report tools | n/a | n/a | — |
| 7.3 | `group_id` scoping unchanged | `_GraphNamespace.search` (unchanged) | — | — |

## Components and Interfaces

| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|-----------|--------------|--------|--------------|--------------------------|-----------|
| `OllamaReranker` | Backend / Services | Score passages against a query via Ollama chat completions. | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 | `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai.AsyncOpenAI` (P0); `Config` (P0); `get_logger` (P1) | Service |
| `Config` (extended) | Backend / Config | Expose four new reranker attrs with documented defaults. | 1.3, 3.1–3.6, 4.1 | `os.environ.get` (P0) | State (configuration) |
| `_get_graphiti()` (extended) | Backend / Adapter | Pick reranker implementation; validate provider; log selection. | 1.1, 1.2, 3.5, 4.1, 4.3 | `Config` (P0); `OllamaReranker` (P0); `_PassthroughReranker` (P0); `Graphiti` (P0) | Service |
| `.env.example`, `CLAUDE.md`, `README.md` | Docs | Communicate new knobs and Ollama prerequisite. | 6.1–6.4 | — | — |

---

### Backend / Services

#### `OllamaReranker`

| Field | Detail |
|-------|--------|
| Intent | Score each passage's relevance to a query via an Ollama-served chat model, returning passages sorted descending by score. |
| Requirements | 1.1, 1.4, 2.1–2.6, 5.1–5.4, 7.1 |

**Responsibilities & Constraints**
- Subclass `graphiti_core.cross_encoder.client.CrossEncoderClient`; implement only `rank`.
- Use `openai.AsyncOpenAI`; no second SDK; no top-level network I/O in `__init__`.
- Preserve passage strings byte-for-byte; never rewrite or truncate.
- Never raise from `rank()`. On a whole-call failure, log once at WARNING and fall back to passthrough order with deterministic synthetic scores; on a per-passage failure, log at DEBUG and assign that passage a deterministic fallback score.
- Deterministic scoring: `temperature=0.0`, no randomness in fallback scores.
- Thread-safety: stateless beyond the immutable `AsyncOpenAI` client and string config; safe under Graphiti's concurrent search.

**Dependencies**
- Inbound: `_get_graphiti()` — instantiates a single instance and passes it as `cross_encoder=` to `Graphiti(...)` (P0).
- Outbound: `Ollama /v1/chat/completions` via `openai.AsyncOpenAI` (P0).
- External: `graphiti_core.cross_encoder.client.CrossEncoderClient` (P0); `openai` SDK (P0).

**Contracts**: Service [x]

##### Service Interface

```python
class OllamaReranker(CrossEncoderClient):
    def __init__(
        self,
        *,
        model: str,
        base_url: str,
        api_key: str,
    ) -> None: ...

    async def rank(
        self,
        query: str,
        passages: list[str],
    ) -> list[tuple[str, float]]:
        """
        Score each passage's relevance to `query` and return
        `(passage, score)` tuples sorted in descending order of score.

        Preconditions:
            - `passages` is a (possibly empty) list of strings.

        Postconditions:
            - len(return) == len(passages).
            - return is sorted by score descending.
            - For all i, return[i][0] is byte-identical to one of the inputs.
            - For any rank() call, this method does not raise.

        Invariants:
            - Successfully-parsed scores fall in [0.0, 1.0].
            - Fallback scores assigned to unparseable passages fall in [-1.0, 0.0)
              and are strictly less than every successfully-parsed score.
        """
```

**Implementation Notes**
- **Integration**: Constructed inside `_get_graphiti()` when `Config.RERANKER_PROVIDER == "ollama"`; injected into `Graphiti(..., cross_encoder=...)`.
- **Validation**:
  - Short-circuit empty `passages` immediately with `return []`; no model call is made.
  - Clip parsed `score` to `[0.0, 1.0]`.
  - Treat any uncaught per-passage exception as parse failure and assign deterministic fallback `-0.001 * (passage_index + 1)`, which stays strictly below `0.0`.
  - Treat any whole-call exception (e.g. connection refused) as graceful degrade: return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]`.
- **Risks**: Default `qwen2.5:3b` must be pulled with `ollama pull` by operators, as documented in the README. If the model is absent, the graceful-degradation path (Requirement 5) applies.

---

### Backend / Config

#### `Config` (extended)

| Field | Detail |
|-------|--------|
| Intent | Surface env-driven configuration for the reranker with Ollama-aligned defaults. |
| Requirements | 1.3, 3.1–3.6, 4.1 |

**Responsibilities & Constraints**
- Read from `os.environ.get` only; no new dependency.
- `RERANKER_PROVIDER` default `ollama`; valid values: `ollama`, `none`.
- `RERANKER_MODEL` default `qwen2.5:3b`.
- `RERANKER_BASE_URL` default = `EMBEDDING_BASE_URL` value at module load time.
- `RERANKER_API_KEY` default = `EMBEDDING_API_KEY` value at module load time.
- Validation of `RERANKER_PROVIDER` happens in `_get_graphiti()` (not `Config.validate()`) to keep the validate-at-boot list focused on credential presence.

**Contracts**: State [x]

##### State Management
- **State model**: Read-only class attributes resolved once at import.
- **Persistence & consistency**: None; values come from environment.
- **Concurrency strategy**: Immutable after import; safe.

**Implementation Notes**
- **Integration**: Defaults for `RERANKER_BASE_URL` / `RERANKER_API_KEY` should reference the corresponding `EMBEDDING_*` env vars (not the resolved `Config.EMBEDDING_BASE_URL` constant) so an operator setting only `EMBEDDING_BASE_URL` still gets the reranker pointed at the same Ollama host without needing to set `RERANKER_BASE_URL` explicitly. Implementation reads `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`.
- **Validation**: None at config-load time. Provider value is validated by `_get_graphiti()`.
- **Risks**: An operator who overrides `EMBEDDING_BASE_URL` but not `RERANKER_BASE_URL` will silently retarget the reranker too. This is intentional (single-host Ollama deployment) and documented.
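
The fallback chain for the four settings might look like the sketch below. `resolve_reranker_settings` is a hypothetical helper used only to make the lookup order testable; the design itself specifies class attributes on `Config`. The final `"ollama"` placeholder for the API key is an assumption, since the `EMBEDDING_API_KEY` default is not stated here.

```python
import os
from typing import Mapping

_DEFAULT_OLLAMA_URL = "http://localhost:11434/v1"


def resolve_reranker_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve RERANKER_* values, falling back to the EMBEDDING_* env vars."""
    return {
        "RERANKER_PROVIDER": env.get("RERANKER_PROVIDER", "ollama"),
        "RERANKER_MODEL": env.get("RERANKER_MODEL", "qwen2.5:3b"),
        # Falls back to the raw EMBEDDING_BASE_URL env var, then to localhost.
        "RERANKER_BASE_URL": env.get(
            "RERANKER_BASE_URL", env.get("EMBEDDING_BASE_URL", _DEFAULT_OLLAMA_URL)
        ),
        # Assumption: "ollama" is only a placeholder key for this sketch.
        "RERANKER_API_KEY": env.get(
            "RERANKER_API_KEY", env.get("EMBEDDING_API_KEY", "ollama")
        ),
    }
```

With only `EMBEDDING_BASE_URL` set, the reranker URL follows it; an explicit `RERANKER_BASE_URL` always wins.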

---

### Backend / Adapter

#### `_get_graphiti()` (extended)

| Field | Detail |
|-------|--------|
| Intent | Select and inject the appropriate `CrossEncoderClient` based on `Config.RERANKER_PROVIDER`; log the choice. |
| Requirements | 1.1, 1.2, 3.5, 4.1, 4.3 |

**Responsibilities & Constraints**
- Preserve double-checked locking and singleton semantics exactly.
- Read `Config.RERANKER_PROVIDER` once at construction; do not re-read.
- For `ollama`: construct `OllamaReranker(model=..., base_url=..., api_key=...)`.
- For `none`: construct `_PassthroughReranker()` (current behavior preserved).
- For any other value: raise `ValueError(f"Unknown RERANKER_PROVIDER={provider!r}; allowed: ('ollama', 'none')")`, mirroring the existing `_ALLOWED_GRAPHITI_PROVIDERS` validation pattern.
- Log at INFO once: `f"Initializing Graphiti reranker (provider={provider})..."`.

**Contracts**: Service [x]

##### Service Interface

```python
def _get_graphiti() -> Graphiti:
    """Singleton Graphiti factory; selects reranker via Config.RERANKER_PROVIDER."""
```

**Implementation Notes**
- **Integration**: Replaces the unconditional `cross_encoder=_PassthroughReranker()` at `graphiti_adapter.py:156` with a `cross_encoder=_build_reranker(provider)` call. The factory helper lives next to `_build_llm_and_embedder` in the same file.
- **Validation**: Provider validation raises before constructing the Graphiti instance, so misconfiguration fails fast and obvious.
- **Risks**: A differently cased value such as `RERANKER_PROVIDER=Ollama` would otherwise raise. To avoid that, the helper lowercases the value before comparison, matching `_get_graphiti`'s existing `(... or "openai").lower()` pattern.
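
A minimal sketch of the selection and validation logic, assuming stand-in classes so it runs without `graphiti-core` or `openai` installed. `build_reranker`, `PassthroughStandIn`, and `OllamaRerankerStandIn` are hypothetical names; the real helper would return `_PassthroughReranker()` or `OllamaReranker(...)` and read its arguments from `Config`.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("graphiti_adapter_sketch")

_ALLOWED_RERANKER_PROVIDERS = ("ollama", "none")


class PassthroughStandIn:
    """Stand-in for _PassthroughReranker."""


@dataclass(frozen=True)
class OllamaRerankerStandIn:
    """Stand-in for OllamaReranker; only stores its configuration."""
    model: str
    base_url: str
    api_key: str


def build_reranker(provider: str, *, model: str, base_url: str, api_key: str):
    # Normalize case first so "Ollama" and "NONE" are accepted.
    normalized = (provider or "ollama").strip().lower()
    if normalized not in _ALLOWED_RERANKER_PROVIDERS:
        # Fail fast, before any Graphiti instance is constructed.
        raise ValueError(
            f"Unknown RERANKER_PROVIDER={provider!r}; "
            f"allowed: {_ALLOWED_RERANKER_PROVIDERS}"
        )
    logger.info("Initializing Graphiti reranker (provider=%s)...", normalized)
    if normalized == "none":
        return PassthroughStandIn()
    return OllamaRerankerStandIn(model=model, base_url=base_url, api_key=api_key)
```

Validating before construction means a misconfigured provider surfaces as a startup `ValueError` rather than as silently unranked search results.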

---

### Documentation

| File | Change | Requirements |
|------|--------|--------------|
| `.env.example` | Add commented block with the four `RERANKER_*` vars and their defaults. Position adjacent to the existing `EMBEDDING_*` block. | 6.1 |
| `CLAUDE.md` | Extend the "Required Environment Variables" code fence (under "Architecture") with the four new vars and a one-line note about `RERANKER_PROVIDER=none`. | 6.2 |
| `README.md` | In the "Install Ollama and pull the default embedding model" section, add `ollama pull qwen2.5:3b` step (or reference the model variable). In the `.env` snippet, add the four `RERANKER_*` lines with brief comments. | 6.3 |
| `.kiro/specs/graphiti-neo4j-finalize/research.md` | Update the "A real per-provider reranker is a follow-up" claim to point at this spec. | 6.4 |

> The repository also contains `README-EN.md` and `README-ZH.md`; `README.md` is the canonical user-facing README per the existing structure. The localized READMEs are out of scope unless a parity edit fits without translation work: if the Chinese README already covers the embedder section, it receives the same one-line addition.

## Data Models
Not applicable. No persistent storage, no schema changes, no API payloads. The only structured value flowing through the system is the `list[tuple[str, float]]` already defined by `CrossEncoderClient.rank`.

## Error Handling

### Error Strategy
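- **Construction errors**: None expected. `__init__` performs no network I/O and validates no required keys. One caveat: `AsyncOpenAI` raises when its `api_key` argument resolves to `None` and `OPENAI_API_KEY` is unset, so the `RERANKER_API_KEY` default must always resolve to a string.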
- **Per-passage errors**: Caught inside `OllamaReranker.rank`. Logged at DEBUG once per failed passage (suppress spam). Passage receives a deterministic fallback score that places it after all successfully-scored passages but keeps it in the output exactly once.
- **Whole-call errors** (connection refused, 404 model not found, timeout, OpenAI SDK exception): Caught at the outermost `try/except` in `rank`. Logged at WARNING with model name and error class. Returns `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` — same shape as `_PassthroughReranker` so consumers cannot tell the difference structurally.
- **Configuration errors**: `_get_graphiti()` raises `ValueError` at startup if `RERANKER_PROVIDER` is unknown. The Flask app fails to boot — preferred over silent misconfiguration.
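
The outermost guard for whole-call errors could be sketched as below. This assumes the per-passage work is wrapped in an inner coroutine; `rank_or_passthrough`, `RankFn`, and `unreachable_rank` are hypothetical names introduced for illustration.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("ollama_reranker_sketch")

# Hypothetical inner ranking coroutine: does the real per-passage scoring.
RankFn = Callable[[str, list[str]], Awaitable[list[tuple[str, float]]]]


async def rank_or_passthrough(
    query: str, passages: list[str], inner_rank: RankFn, model: str
) -> list[tuple[str, float]]:
    try:
        return await inner_rank(query, passages)
    except Exception as exc:  # deliberately broad: rank() must never raise
        logger.warning(
            "Rerank failed (model=%s, error=%s); keeping retrieval order",
            model,
            type(exc).__name__,
        )
        # Same shape as the passthrough: original order, descending scores.
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]


async def unreachable_rank(query: str, passages: list[str]) -> list[tuple[str, float]]:
    """Simulates an Ollama host that refuses the connection."""
    raise ConnectionError("connection refused")
```

Logging the error class rather than the full message keeps the WARNING line short and easy to aggregate on.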

### Error Categories and Responses
| Category | Trigger | Response |
|----------|---------|----------|
| System (5xx-equivalent) | Ollama unreachable, timeout | WARNING log; passthrough order; search succeeds. |
| User input (4xx-equivalent) | Unknown `RERANKER_PROVIDER` value | `ValueError` at startup; clear message naming allowed values. |
| Business rule | Model emits unparseable score | DEBUG log; per-passage fallback score; passage retained. |

### Monitoring
- INFO log at startup states the selected provider.
- WARNING log on whole-call failure includes model and error class; aggregation systems can alert on rate.
- No metrics surface yet; can be added if the reranker becomes a hot path.

## Testing Strategy

This project intentionally keeps the test surface minimal (`backend/scripts/test_profile_format.py` is the lone pytest target). Per `steering/tech.md`, do **not** add a heavy test harness.

- **Unit-level verification** (manual, by the implementer, no committed test files unless small and clearly worth keeping):
  1. Constructing `OllamaReranker` with a bad host does not raise; first `rank()` call logs WARNING and returns passthrough output.
  2. `rank(query, [])` returns `[]` and does not call the client.
  3. Successful path returns the correct number of passages, sorted descending, every input echoed byte-for-byte.
  4. Unparseable model output (e.g. malformed JSON) for one passage out of N leaves that passage at the bottom; other passages keep their parsed scores.
- **Integration smoke** (manual): With `qwen2.5:3b` pulled, run a graph build and a report-tool search; confirm the WARNING log is absent and the result order changes vs. `RERANKER_PROVIDER=none`.
- **Boundary verification**: Grep that `gpt-4.1-nano` and `OpenAIRerankerClient` do not appear in any new code path.
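
For the manual unit-level checks, a small throwaway helper along these lines may be enough; it is a sketch for ad hoc use in a REPL, not a proposal to add a committed test harness. `check_rank_postconditions` is a hypothetical name.

```python
from collections import Counter


def check_rank_postconditions(
    passages: list[str], ranked: list[tuple[str, float]]
) -> list[str]:
    """Return a list of violated postconditions (empty means all hold)."""
    problems: list[str] = []
    if len(ranked) != len(passages):
        problems.append("length mismatch")
    scores = [score for _, score in ranked]
    if any(earlier < later for earlier, later in zip(scores, scores[1:])):
        problems.append("not sorted descending")
    # Multiset comparison: every input echoed exactly once, unmodified.
    if Counter(p for p, _ in ranked) != Counter(passages):
        problems.append("passages altered or dropped")
    return problems
```

An empty return value corresponds to checks 2 and 3 above passing for a given `rank()` result.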

## Supporting References
- `research.md` — Discovery findings, alternative scoring strategies, model-choice rationale, defensive parse pattern.
- `gap-analysis.md` — Requirement-to-asset map.
- `.ticket/39.md` — Source ticket text.