# Research & Design Decisions — graphiti-ollama-embedder
## Summary
- **Feature**: `graphiti-ollama-embedder`
- **Discovery Scope**: Extension (small, narrowly scoped change to an existing adapter + supporting docs)
- **Key Findings**:
  - The Graphiti `OpenAIEmbedder` already accepts an arbitrary `base_url` and `api_key`. Pointing it at Ollama's OpenAI-compatible `/v1/embeddings` endpoint requires **no code change**, only documentation.
  - The silent placeholder-UUID fallback in `_GraphNamespace.add_batch` violates the project's existing background-task error-handling contract (`error-handling.md`: "Long-running tasks must always reach a terminal state"). The plumbing to surface a failure already exists in `_build_graph_worker`.
  - `mxbai-embed-large` is the only widely available local embedder that matches Graphiti's hard-coded `EMBEDDING_DIM = 1024`. Lower-dimensional models (`nomic-embed-text` at 768) would silently mismatch the Neo4j vector indexes and are out of scope.
## Research Log
### Ollama's OpenAI-compatible embeddings API
- **Context**: Verify that no Ollama-specific Graphiti embedder class is required.
- **Sources Consulted**: Existing code at `backend/app/services/graphiti_adapter.py:92-115` (`OpenAIEmbedderConfig` accepts arbitrary `base_url`); ticket #18 description; Graphiti `embedder/client.py:22` (`EMBEDDING_DIM = 1024`).
- **Findings**:
  - Ollama exposes `POST /v1/embeddings` mirroring the OpenAI shape.
  - The current `_build_llm_and_embedder("openai")` branch already uses `EMBEDDING_API_KEY or LLM_API_KEY` and `EMBEDDING_BASE_URL or LLM_BASE_URL`, so any OpenAI-compatible endpoint just works.
  - Ollama ignores the auth header, but `OpenAIEmbedderConfig` requires a non-empty `api_key`; the literal string `"ollama"` is the de-facto convention.
- **Implications**: This is a documentation-only ask for R1. No new provider literal, no new factory branch.
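
The fallback described in the findings can be illustrated with a minimal sketch. It is not the adapter's actual code: `EmbedderSettings` and `resolve_embedder_settings` are hypothetical names, `EMBEDDING_MODEL` is an assumed variable name for the third `EMBEDDING_*` setting, and the base URL assumes Ollama running locally on its default port.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EmbedderSettings:
    """Hypothetical container for the values handed to the embedder config."""
    api_key: str
    base_url: str
    model: str


def resolve_embedder_settings(env: Mapping[str, str]) -> EmbedderSettings:
    """Mirror the `EMBEDDING_* or LLM_*` fallback described in the findings."""
    # Empty strings count as unset, matching Python's `or` semantics.
    api_key = env.get("EMBEDDING_API_KEY") or env.get("LLM_API_KEY") or ""
    base_url = env.get("EMBEDDING_BASE_URL") or env.get("LLM_BASE_URL") or ""
    # EMBEDDING_MODEL is an assumed variable name, not confirmed by the source.
    model = env.get("EMBEDDING_MODEL") or ""
    if not api_key:
        # The config needs a non-empty key even though Ollama ignores it.
        raise ValueError("embedder api_key must be non-empty (use 'ollama')")
    return EmbedderSettings(api_key=api_key, base_url=base_url, model=model)


# Example: the LLM stays on a hosted provider, embeddings go to local Ollama.
ollama_env = {
    "LLM_API_KEY": "sk-llm-placeholder",
    "LLM_BASE_URL": "https://llm.example.invalid/v1",
    "EMBEDDING_API_KEY": "ollama",
    "EMBEDDING_BASE_URL": "http://localhost:11434/v1",
    "EMBEDDING_MODEL": "mxbai-embed-large",
}
settings = resolve_embedder_settings(ollama_env)
```

Because the `EMBEDDING_*` values take precedence, the LLM and the embedder can point at different endpoints without a new provider literal.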
### Failure-propagation contract
- **Context**: Confirm that removing the broad `except` in `_GraphNamespace.add_batch` will result in `Task.status = FAILED` in the UI.
- **Sources Consulted**:
  - `.kiro/steering/error-handling.md` § Background Task Errors: the outer `except Exception` in the worker calls `fail_task(task_id, str(e))`.
  - `backend/app/services/graph_builder.py:289-308`: `add_text_batches` already wraps `client.graph.add_batch` in `try/except` and re-raises after a localized progress message.
  - `backend/app/services/graph_builder.py:231-234`: `_build_graph_worker` catches every exception and calls `self.task_manager.fail_task(task_id, error_msg)` with a full traceback.
- **Findings**: The chain `add_episode → _GraphNamespace.add_batch → add_text_batches → _build_graph_worker → fail_task` is intact except for the swallow at the adapter layer. Removing the swallow is sufficient; no caller-side change is required.
- **Implications**: R2.3 / R2.5 are realized for free as soon as R2.2 is implemented.
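
The propagation chain can be pictured with a toy model. Everything here is a simplified stand-in: `TaskManager`, `TaskStatus`, and the function signatures are hypothetical and only borrow names from the real code to show how an exception raised at the adapter layer ends in `fail_task`.

```python
from enum import Enum


class TaskStatus(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class TaskManager:
    """Hypothetical stand-in for the project's task manager."""

    def __init__(self) -> None:
        self.status = {}
        self.errors = {}

    def start_task(self, task_id: str) -> None:
        self.status[task_id] = TaskStatus.RUNNING

    def complete_task(self, task_id: str) -> None:
        self.status[task_id] = TaskStatus.COMPLETED

    def fail_task(self, task_id: str, error: str) -> None:
        self.status[task_id] = TaskStatus.FAILED
        self.errors[task_id] = error


def add_batch(episodes, add_episode):
    """Adapter layer with the swallow removed: exceptions propagate."""
    return [add_episode(text) for text in episodes]


def add_text_batches(episodes, add_episode):
    """Service layer: the real code reports progress here, then re-raises."""
    try:
        return add_batch(episodes, add_episode)
    except Exception:
        raise


def build_graph_worker(tasks, task_id, episodes, add_episode):
    """Worker layer: every exception becomes a terminal FAILED state."""
    tasks.start_task(task_id)
    try:
        add_text_batches(episodes, add_episode)
        tasks.complete_task(task_id)
    except Exception as exc:
        tasks.fail_task(task_id, str(exc))


def broken_embedder(text):
    raise ConnectionError("embedder unreachable")


tasks = TaskManager()
build_graph_worker(tasks, "t1", ["a", "b"], broken_embedder)
build_graph_worker(tasks, "t2", ["a"], lambda text: f"uuid-for-{text}")
```

In this model the failing task `t1` ends in `FAILED` with the embedder's message attached, while `t2` completes normally; no layer needs to know about placeholder UUIDs.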
### Single vs. batch ingestion path
- **Context**: Determine whether the single-episode `_GraphNamespace.add(...)` (line 441) needs a parallel fix.
- **Sources Consulted**: `graphiti_adapter.py:441-453`. No `try/except`; exceptions bubble naturally.
- **Findings**: Only the batch path swallows. The single path already complies.
- **Implications**: Fix is local to `add_batch`. Do not introduce symmetric handling in `add(...)`.
### Logging level
- **Context**: Decide between `WARNING` and `ERROR` for the failure log line.
- **Sources Consulted**: `.kiro/steering/error-handling.md` § Logging:
  - `ERROR`: task failure, unrecoverable exception
  - `WARNING`: retry triggered, transient failure, recovered state
- **Findings**: A failure that terminates the surrounding task is unrecoverable from the task's perspective, so `ERROR` is correct. The current `WARNING` is mislabelled.
- **Implications**: R2.4 — change to `logger.exception(...)` (which logs at ERROR with traceback).
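
The difference between the two levels can be checked with the standard `logging` module alone. The sketch below uses a hypothetical logger name and an in-memory handler (`ListHandler`, introduced only for illustration) to compare the record produced by `logger.warning(...)` with the one produced by `logger.exception(...)` inside the same `except` block.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("graphiti_adapter_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

try:
    raise RuntimeError("embedding request failed")
except RuntimeError:
    logger.warning("Episode add failed")    # current behavior: WARNING, no traceback
    logger.exception("Episode add failed")  # proposed: ERROR with traceback attached

warning_record, error_record = handler.records
```

The second record carries `exc_info`, so any formatter that prints tracebacks will include the original stack without extra arguments.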
### Documentation surfaces
- **Context**: Decide which files need updating to satisfy R1.
- **Sources Consulted**: `.env.example` (canonical config), `CLAUDE.md` lines 60-82, `README.md` lines 148-165, `docker-compose.yml` lines 21-37.
- **Findings**: All four are appropriate. `README.md` already has a placeholder for "non-OpenAI provider" and is the natural home for the `curl` smoke test snippet. `docker-compose.yml` benefits from one additional comment about `host.docker.internal`.
- **Implications**: Update all four; keep edits minimal and additive.
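
For readers who prefer a script over `curl`, a Python equivalent of the smoke test might look like the sketch below. It assumes Ollama is listening on its default local port with `mxbai-embed-large` already pulled; the helper names are hypothetical, and the response shape follows the OpenAI embeddings format that Ollama mirrors. The dimension check reflects the `EMBEDDING_DIM = 1024` constraint noted in the summary.

```python
import json
import urllib.request

EXPECTED_DIM = 1024  # Graphiti's EMBEDDING_DIM, per the research findings


def build_embeddings_request(base_url, api_key, model, text):
    """Build (but do not send) a POST to the OpenAI-compatible embeddings route."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key, but OpenAI-style clients always send one.
            "Authorization": f"Bearer {api_key}",
        },
    )


def check_embedding_response(payload, expected_dim=EXPECTED_DIM):
    """Return the vector length, or raise if it does not match the index size."""
    vector = payload["data"][0]["embedding"]
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    return len(vector)


def smoke_test(base_url="http://localhost:11434/v1", model="mxbai-embed-large"):
    """Perform the live call; needs a running Ollama with the model available."""
    request = build_embeddings_request(base_url, "ollama", model, "hello")
    with urllib.request.urlopen(request, timeout=30) as response:
        return check_embedding_response(json.load(response))
```

Calling `smoke_test()` against a 768-dimensional model such as `nomic-embed-text` would raise immediately, which is the kind of early signal the documentation is meant to give operators.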
## Architecture Pattern Evaluation
| Option | Description | Strengths | Risks / Limitations | Notes |
|--------|-------------|-----------|---------------------|-------|
| A. Drop swallow + docs | Remove `except` block in `add_batch`; update four docs files | Smallest surface; honors steering rules; symmetric with `add()` | Loses (broken) "best effort" intent | Recommended |
| B. Narrow + retry | Catch only transient classes (`httpx.TimeoutException`, `openai.APIConnectionError`); use `retry_with_backoff` from `app/utils/retry.py`; raise everything else | Adds resilience to genuine network blips | More moving parts; would also need to update `add()` for symmetry | Defer to follow-up |
| C. New `ollama` provider literal | Extend `_build_llm_and_embedder` with a third branch | Symmetric with `openai`/`gemini` | Explicitly out of scope per ticket; duplicate code path (Ollama is OpenAI-SDK with custom `base_url`) | Rejected |
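
To make the deferred Option B concrete, here is a rough sketch of "narrow + retry". It is not the project's `retry_with_backoff` (whose signature is not shown in this document); `call_with_retry` is a hypothetical helper, and the built-in `TimeoutError` and `ConnectionError` stand in for `httpx.TimeoutException` and `openai.APIConnectionError` so the example runs without third-party packages.

```python
import time

# Stand-ins for the transient classes named in Option B.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def call_with_retry(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `fn` on transient errors only; anything else raises immediately."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                raise  # retries exhausted: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


calls = {"n": 0}


def flaky_add_episode():
    """Fails twice with a timeout, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("embedder timed out")
    return "episode-uuid"


delays = []
result = call_with_retry(flaky_add_episode, sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run; the important property is that a non-transient error (for example a misconfigured model name) is never retried and still fails the task.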
## Design Decisions
### Decision: Adopt Option A (drop the placeholder fallback entirely; documentation only for Ollama support)
- **Context**: R2 mandates that embedding failures during graph build surface as visible task failures. R1 mandates documentation for an Ollama embedder. The adapter already supports any OpenAI-compatible base URL.
- **Alternatives Considered**:
  1. **Option B (narrow + retry)**: keep a small `except` clause for transient errors and use the project's `retry_with_backoff`.
  2. **Option C (new provider literal)**: add an `ollama` branch in `_build_llm_and_embedder`.
- **Selected Approach**:
  - In `_GraphNamespace.add_batch`, replace the `try/except Exception` block and its placeholder-UUID fallback with a straightforward call. Failures from `_run(self._g.add_episode(...))` propagate to the caller.
  - An adapter-level log is not strictly required, because `_build_graph_worker` already calls `logger.exception(f"task {task_id} failed")` per the error-handling steering. To honor R2.4 explicitly, the direct call may instead be wrapped in a narrow `try/except` whose only job is `logger.exception(...)` followed by `raise`, so the adapter-level context (`group_id`, episode index) is captured before the exception bubbles up.
  - Update `.env.example`, `CLAUDE.md`, `README.md`, and `docker-compose.yml` to document Ollama configuration (R1).
- **Rationale**:
  - The ticket explicitly lists transient-retry behavior and per-provider factory as out of scope.
  - Steering's error-handling chapter forbids catch-and-continue in service code.
  - Smaller surface = lower regression risk.
- **Trade-offs**:
  - +Visibility: real config errors now surface at the UI.
  - +Code symmetry: `add()` and `add_batch()` behave the same on failure.
  - -One-time noise: operators whose graph builds were "succeeding" only because of the silent fallback will now see a failed task. This is the intended correction; mention it in the PR body.
- **Follow-up**:
  - If transient blips become an operational issue, revisit Option B in a separate ticket using `retry_with_backoff` against `_g.add_episode`.
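
The selected approach for `add_batch` can be sketched as follows. This is an illustration under assumptions, not the adapter's real signature: the function takes a plain `add_episode` callable instead of the Graphiti client, and the logger name is hypothetical. The point is the shape of the handler: log adapter-level context, then re-raise, with no placeholder UUID.

```python
import logging

logger = logging.getLogger("graphiti_adapter_sketch")  # hypothetical logger name


def add_batch(graph_id, episodes, add_episode):
    """Add episodes one by one; log adapter context and re-raise on failure."""
    uuids = []
    for index, text in enumerate(episodes):
        try:
            uuids.append(add_episode(graph_id, text))
        except Exception:
            # No placeholder UUID: record where the failure happened, then
            # let the exception travel up to the worker that fails the task.
            logger.exception(
                "Episode add failed (group_id=%s, episode_index=%d)",
                graph_id,
                index,
            )
            raise
    return uuids
```

The bare `raise` preserves the original exception and traceback, so the task-level handler still sees the embedder's own error message.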
### Decision: Use `logger.exception(...)` not `logger.error(...)`
- **Context**: R2.4 requires ERROR-level logging of the underlying exception.
- **Alternatives Considered**: `logger.error(str(e))` (no traceback), `logger.warning(...)` (current behavior).
- **Selected Approach**: `logger.exception("Episode add failed (group_id=%s)", graph_id)` then `raise`.
- **Rationale**: `logger.exception` logs at ERROR with the full traceback, which is what the steering doc prescribes for unrecoverable adapter failures.
- **Trade-offs**: A small amount of duplication if `_build_graph_worker` also logs via `logger.exception`. Acceptable — the two log lines describe different layers (adapter vs. task) and have different identifying context.
### Decision: Document Ollama under the existing OpenAI provider, not as a separate provider literal
- **Context**: The ticket lists "per-provider embedder factory" as out of scope; Ollama is already reachable via the existing `openai` branch.
- **Selected Approach**: Document Ollama as a configuration *choice* of the existing `openai` Graphiti provider (set the three `EMBEDDING_*` env vars).
- **Rationale**: Avoids code duplication and matches the ticket's scope.
## Risks & Mitigations
- **Risk**: Operators currently relying on the silent fallback see new failed tasks. **Mitigation**: PR body calls this out explicitly with a "what changed" note pointing at the embedder env vars.
- **Risk**: The `except` is removed but a transient timeout intermittently fails the entire graph build. **Mitigation**: Documented as a known follow-up (Option B). Acceptable today because the alternative was an empty graph that *looked* successful.
- **Risk**: Documentation drifts among `.env.example`, `CLAUDE.md`, `README.md`, and `docker-compose.yml`. **Mitigation**: Keep all four edits in this PR and reference the same env-var triple verbatim.
## References
- Ticket #18: `.ticket/18.md` (snapshot in this repo)
- Steering: `.kiro/steering/error-handling.md` § Background Task Errors and § Logging
- Steering: `.kiro/steering/tech.md` § Key Libraries (`graphiti-core` adapter rule)
- Code: `backend/app/services/graphiti_adapter.py:92-115, :441-475`
- Code: `backend/app/services/graph_builder.py:143-234, :256-310`