Implementation Plan
Foundation tasks introduce the four `RERANKER_*` configuration knobs. Core tasks add the new `OllamaReranker` and the factory selection branch. Integration tasks wire documentation parity. Validation closes the loop with a structural sweep.
Foundation
- 1. Add reranker configuration surface
  - 1.1 Introduce four `RERANKER_*` settings on the `Config` class
    - Add `RERANKER_PROVIDER` with default `ollama`, read via `os.environ.get('RERANKER_PROVIDER', 'ollama')`.
    - Add `RERANKER_MODEL` with default `qwen2.5:3b`, read via `os.environ.get('RERANKER_MODEL', 'qwen2.5:3b')`.
    - Add `RERANKER_BASE_URL` with a default that chains to the embedding host: `os.environ.get('RERANKER_BASE_URL', os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'))`. Do not reference `Config.EMBEDDING_BASE_URL` directly; use the env-lookup form so behaviour stays consistent under reload patterns.
    - Add `RERANKER_API_KEY` with a default that chains to the embedding key the same way (`os.environ.get('RERANKER_API_KEY', os.environ.get('EMBEDDING_API_KEY', 'ollama'))`).
    - Do not add the reranker to `Config.validate()`; the provider has no mandatory credentials.
    - Observable completion: a Python REPL that imports `Config` shows the four attributes with the documented defaults, and overriding `EMBEDDING_BASE_URL` in the environment is visible on `Config.RERANKER_BASE_URL` too.
    - Requirements: 1.3, 3.1, 3.2, 3.3, 3.4, 3.6
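The chained defaults in task 1.1 can be sketched as follows. This is a minimal illustration, not the project's actual `Config` class: `reranker_settings` is a hypothetical helper introduced only so the lookup can be exercised with a plain dict, and the `Config` shown here contains nothing but the four reranker knobs.

```python
import os
from typing import Dict, Mapping


def reranker_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve the four RERANKER_* knobs from an environment-like mapping.

    Hypothetical helper: passing os.environ makes env.get(...) the same
    os.environ.get(...) lookup that the task describes.
    """
    return {
        'RERANKER_PROVIDER': env.get('RERANKER_PROVIDER', 'ollama'),
        'RERANKER_MODEL': env.get('RERANKER_MODEL', 'qwen2.5:3b'),
        # Chain to the embedding host via the environment, not via Config.
        'RERANKER_BASE_URL': env.get(
            'RERANKER_BASE_URL',
            env.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1'),
        ),
        # Chain to the embedding key the same way.
        'RERANKER_API_KEY': env.get(
            'RERANKER_API_KEY',
            env.get('EMBEDDING_API_KEY', 'ollama'),
        ),
    }


class Config:
    """Stand-in for the real Config class; only the reranker knobs are shown."""

    _reranker = reranker_settings(os.environ)
    RERANKER_PROVIDER = _reranker['RERANKER_PROVIDER']
    RERANKER_MODEL = _reranker['RERANKER_MODEL']
    RERANKER_BASE_URL = _reranker['RERANKER_BASE_URL']
    RERANKER_API_KEY = _reranker['RERANKER_API_KEY']
```

With only `EMBEDDING_BASE_URL` set, `RERANKER_BASE_URL` follows it; setting `RERANKER_BASE_URL` explicitly takes precedence.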
Core
- 2. Implement the Ollama-backed reranker
  - 2.1 Create the new reranker module with the `CrossEncoderClient` subclass
    - Define a new module under `backend/app/services/` that hosts the reranker class. The class subclasses `graphiti_core.cross_encoder.client.CrossEncoderClient` and implements only the async `rank` method.
    - Constructor accepts `model`, `base_url`, `api_key` as keyword arguments; it instantiates `openai.AsyncOpenAI(base_url=..., api_key=...)` but performs no network I/O so the Flask app can boot when Ollama is unreachable.
    - `rank(query, passages)` short-circuits on empty `passages` and returns `[]` without any model call.
    - For each passage, send a single chat-completion request with `temperature=0.0` and a deterministic system prompt asking for a JSON object `{"score": <0.0..1.0>}` describing the passage's relevance to the query. Use `asyncio.gather` to run all per-passage requests concurrently.
    - Parse each model response defensively: strip any `<think>...</think>` block, strip markdown code fences, attempt `json.loads`, fall back to regex-extracting the first floating-point number, and clip the value to `[0.0, 1.0]`. On any per-passage failure, assign a deterministic fallback score of `-0.001 * passage_index` and log at DEBUG once per failure, naming the model and error class. The passage string is echoed byte-for-byte regardless of parse outcome.
    - Wrap the whole call in a `try/except`. On a whole-call failure (connection refused, 404, timeout, etc.), log a single WARNING naming the model and error class, then return `[(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]` so search remains functional. The method must not raise.
    - Sort the returned list by score descending before returning.
    - Observable completion: instantiating the new class with a deliberately bad `base_url` does not raise; an async call to `rank("q", [])` returns `[]`; an async call with two non-empty passages against a reachable Ollama returns two `(passage, float)` tuples in descending-score order, with every input passage byte-identical in the output.
    - Requirements: 1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 5.1, 5.2, 5.3, 5.4, 7.1
    - Boundary: OllamaReranker module
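The parsing and fallback rules in task 2.1 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the shipped module. `ask_model` is a hypothetical async callable standing in for the per-passage chat-completion request, so no `openai` client or `CrossEncoderClient` subclass appears here. The sketch also assumes that parse errors count as per-passage failures while transport errors propagate and trigger the whole-call fallback.

```python
import asyncio
import json
import logging
import re
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger(__name__)
MODEL = 'qwen2.5:3b'  # documented default; used only in log messages here

_THINK_RE = re.compile(r'<think>.*?</think>', re.DOTALL)
_FENCE_RE = re.compile(r'`{3}[a-zA-Z]*')  # opening or closing markdown fence
_FLOAT_RE = re.compile(r'-?\d+(?:\.\d+)?')

# Hypothetical signature: (query, passage) -> raw model text.
AskModel = Callable[[str, str], Awaitable[str]]


def parse_score(raw: str) -> float:
    """Extract a relevance score from raw model text, clipped to [0.0, 1.0]."""
    text = _FENCE_RE.sub('', _THINK_RE.sub('', raw)).strip()
    try:
        value = float(json.loads(text)['score'])
    except (ValueError, KeyError, TypeError):
        # Not a usable JSON object: fall back to the first number in the text.
        match = _FLOAT_RE.search(text)
        if match is None:
            raise ValueError('no score found in model output')
        value = float(match.group())
    return max(0.0, min(1.0, value))


async def _score_one(index: int, query: str, passage: str, ask_model: AskModel) -> float:
    raw = await ask_model(query, passage)  # transport errors propagate to rank()
    try:
        return parse_score(raw)
    except ValueError as exc:
        logger.debug('rerank parse failure (model=%s, error=%s)', MODEL, type(exc).__name__)
        return -0.001 * index  # deterministic per-passage fallback


async def rank(query: str, passages: List[str], ask_model: AskModel) -> List[Tuple[str, float]]:
    """Score passages concurrently; never raises, always echoes passages unchanged."""
    if not passages:
        return []
    try:
        scores = await asyncio.gather(
            *(_score_one(i, query, p, ask_model) for i, p in enumerate(passages))
        )
    except Exception as exc:
        logger.warning('rerank call failed (model=%s, error=%s)', MODEL, type(exc).__name__)
        # Positional fallback keeps the input order so search stays functional.
        return [(p, 1.0 - 0.01 * i) for i, p in enumerate(passages)]
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
```

Because `sorted` is stable, passages with equal scores keep their input order, which keeps the output deterministic.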
Integration
- 3. Wire the new reranker into the Graphiti factory
  - 3.1 Select the reranker inside `_get_graphiti()` based on `Config.RERANKER_PROVIDER`
    - Introduce a small allow-list constant alongside `_ALLOWED_GRAPHITI_PROVIDERS` enumerating `("ollama", "none")`.
    - Read `Config.RERANKER_PROVIDER`, lowercase it, and validate against the allow-list. If the value is not in the allow-list, raise `ValueError` with a message that names the offending value and lists the accepted values (same shape as the existing `GRAPHITI_LLM_PROVIDER` validation).
    - For `ollama`, construct the new `OllamaReranker(model=Config.RERANKER_MODEL, base_url=Config.RERANKER_BASE_URL, api_key=Config.RERANKER_API_KEY)` and pass it as the `cross_encoder=` argument to `Graphiti(...)`.
    - For `none`, continue to pass `_PassthroughReranker()` as today; do not change the passthrough class.
    - Add one INFO log line at construction time that announces the selected reranker provider (sibling of the existing "Initializing Graphiti client (provider=...)" log).
    - Preserve the double-checked locking and singleton pattern exactly. The provider is read once at first construction; do not re-read at runtime.
    - Observable completion: with `RERANKER_PROVIDER` unset, app startup logs `Initializing Graphiti reranker (provider=ollama)...` and Graphiti is constructed with the `OllamaReranker`. With `RERANKER_PROVIDER=none`, the log reports `none` and Graphiti uses `_PassthroughReranker`. With `RERANKER_PROVIDER=banana`, `_get_graphiti()` raises `ValueError` listing `('ollama', 'none')`.
    - Requirements: 1.1, 1.2, 3.5, 4.1, 4.2, 4.3
    - Depends: 1.1, 2.1
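The allow-list check in task 3.1 might look like the sketch below. The names `_ALLOWED_RERANKER_PROVIDERS` and `resolve_reranker_provider` are assumptions made for illustration, as is the exact error wording; the existing `GRAPHITI_LLM_PROVIDER` validation and the singleton locking inside `_get_graphiti()` are not reproduced here.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical constant name; the task only says it sits next to
# _ALLOWED_GRAPHITI_PROVIDERS and enumerates these two values.
_ALLOWED_RERANKER_PROVIDERS = ('ollama', 'none')


def resolve_reranker_provider(raw: str) -> str:
    """Lowercase and validate the configured provider, or raise ValueError."""
    provider = raw.lower()
    if provider not in _ALLOWED_RERANKER_PROVIDERS:
        raise ValueError(
            f'Unsupported RERANKER_PROVIDER={raw!r}; '
            f'accepted values are {_ALLOWED_RERANKER_PROVIDERS}'
        )
    # One INFO line at construction time, mirroring the client log.
    logger.info('Initializing Graphiti reranker (provider=%s)...', provider)
    return provider
```

The factory would then branch on the returned value: `'ollama'` builds the new reranker and `'none'` keeps `_PassthroughReranker()`.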
- 4. Update operator-facing documentation
  - 4.1 (P) Add the new env knobs to `.env.example` (deferred: sandbox hook blocks all `.env*` access; see HANDOFF.md)
    - Insert a four-line `RERANKER_*` block adjacent to the existing `EMBEDDING_*` block, mirroring the comment style (default, accepted values, and a one-line note that `RERANKER_PROVIDER=none` disables reranking).
    - Observable completion: opening `.env.example` shows the four new variables with documented defaults, positioned next to the embedding block.
    - Requirements: 6.1
    - Boundary: .env.example
    - Depends: 1.1
  - 4.2 (P) Extend the `Required Environment Variables` snippet in `CLAUDE.md`
    - Add the four `RERANKER_*` variables to the existing fenced code block under "Required Environment Variables" in `CLAUDE.md`, keeping the same comment style used for the `EMBEDDING_*` block.
    - Observable completion: `CLAUDE.md` documents the four reranker variables next to the embedding block and includes a note that `RERANKER_PROVIDER=none` keeps the previous passthrough behaviour.
    - Requirements: 6.2
    - Boundary: CLAUDE.md
    - Depends: 1.1
  - 4.3 (P) Document the Ollama pull prerequisite and env block in `README.md`
    - In the existing "Install Ollama and pull the default embedding model" section, add a parallel `ollama pull qwen2.5:3b` step (or note that the model used for reranking must be pulled, using the documented default).
    - In the `.env` snippet under "Configure Environment Variables", add the four `RERANKER_*` lines with brief comments mirroring the embedding-block style.
    - Treat `README-EN.md` and `README-ZH.md` translations as out of scope for this ticket; translation belongs to the active i18n workstream and would otherwise drift.
    - Observable completion: `README.md` shows the `ollama pull qwen2.5:3b` step and the four reranker env lines in the `.env` snippet.
    - Requirements: 6.3
    - Boundary: README.md
    - Depends: 1.1
  - 4.4 (P) Update the stale follow-up claim in the prior spec
    - In `.kiro/specs/graphiti-neo4j-finalize/research.md`, find the "A real per-provider reranker is a follow-up" text and either replace it with a pointer to this spec or note that the follow-up has shipped under `graphiti-ollama-reranker`. The constraint is that no remaining documentation continues to claim the reranker is a deferred passthrough.
    - Observable completion: a grep for "real per-provider reranker is a follow-up" across `.kiro/specs/` returns either zero hits or a pointer note to `graphiti-ollama-reranker`.
    - Requirements: 6.4
    - Boundary: .kiro/specs/graphiti-neo4j-finalize/research.md
Validation
- 5. Structural verification sweep
  - 5.1 Grep for legacy reranker references and verify the new wiring is reachable
    - Grep `backend/app/services/` for `gpt-4.1-nano` and `OpenAIRerankerClient`; both must return zero hits in code paths owned by this spec.
    - Grep `backend/app/services/graphiti_adapter.py` for the symbol of the new reranker class; confirm there is exactly one import site and one use site (the `_get_graphiti()` branch).
    - Confirm the four ReportAgent tools (`SearchResult`, `InsightForge`, `Panorama`, `Interview`) require no source changes by grepping for `client.graph.search(` call sites and verifying the kwarg shape is unchanged.
    - Confirm `_GraphNamespace.search` still filters by `group_id` (no regression to project isolation).
    - Observable completion: a short verification summary captured during implementation lists each grep outcome with the expected zero / single hit, and the report-tool call sites are unchanged.
    - Requirements: 1.4, 7.1, 7.2, 7.3
    - Depends: 3.1
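For anyone who prefers a scripted sweep over manual greps, the zero-hit and single-site checks in task 5.1 could be approximated as below. `count_hits` is a hypothetical helper, and restricting the search to `*.py` files is an assumption; it counts literal substring occurrences per file and does not replace reading the call sites.

```python
from pathlib import Path
from typing import Dict, Union


def count_hits(root: Union[str, Path], needle: str, pattern: str = '*.py') -> Dict[str, int]:
    """Return {file path: occurrence count} for files containing `needle`.

    `root` may be a directory (searched recursively) or a single file.
    """
    root = Path(root)
    files = [root] if root.is_file() else sorted(root.rglob(pattern))
    hits: Dict[str, int] = {}
    for path in files:
        count = path.read_text(encoding='utf-8', errors='replace').count(needle)
        if count:
            hits[str(path)] = count
    return hits


# Intended use against the paths named in the task (not executed here):
#   count_hits('backend/app/services', 'gpt-4.1-nano')          -> expect {}
#   count_hits('backend/app/services', 'OpenAIRerankerClient')  -> expect {}
#   count_hits('backend/app/services/graphiti_adapter.py', 'OllamaReranker')
#       -> expect a count of 2 (one import site, one use site)
```

An empty dict corresponds to the "zero hits" outcome that the verification summary should record.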