104 lines
9.5 KiB
Markdown
104 lines
9.5 KiB
Markdown
# Research & Design Decisions — graphiti-ollama-embedder
|
||
|
||
## Summary
|
||
- **Feature**: `graphiti-ollama-embedder`
|
||
- **Discovery Scope**: Extension (small, narrowly scoped change to an existing adapter + supporting docs)
|
||
- **Key Findings**:
|
||
- The Graphiti `OpenAIEmbedder` already accepts an arbitrary `base_url` and `api_key`. Pointing it at Ollama's OpenAI-compatible `/v1/embeddings` endpoint requires **no code change** — only documentation.
|
||
- The silent placeholder-UUID fallback in `_GraphNamespace.add_batch` violates the project's existing background-task error-handling contract (`error-handling.md`: "Long-running tasks must always reach a terminal state"). The plumbing to surface a failure already exists in `_build_graph_worker`.
|
||
- `mxbai-embed-large` is the only widely-available local embedder that matches Graphiti's hard-coded `EMBEDDING_DIM = 1024`. Smaller models (`nomic-embed-text` at 768) would silently mis-fit Neo4j vector indexes and are out of scope.
|
||
|
||
## Research Log
|
||
|
||
### Ollama's OpenAI-compatible embeddings API
|
||
- **Context**: Verify that no Ollama-specific Graphiti embedder class is required.
|
||
- **Sources Consulted**: Existing code at `backend/app/services/graphiti_adapter.py:92–115` (`OpenAIEmbedderConfig` accepts arbitrary `base_url`); ticket #18 description; Graphiti `embedder/client.py:22` (`EMBEDDING_DIM = 1024`).
|
||
- **Findings**:
|
||
- Ollama exposes `POST /v1/embeddings` mirroring the OpenAI shape.
|
||
- The current `_build_llm_and_embedder("openai")` branch already uses `EMBEDDING_API_KEY or LLM_API_KEY` and `EMBEDDING_BASE_URL or LLM_BASE_URL`, so any OpenAI-compatible endpoint just works.
|
||
- Ollama ignores the auth header but `OpenAIEmbedderConfig` requires a non-empty `api_key`; the literal string `"ollama"` is the de-facto convention.
|
||
- **Implications**: This is a documentation-only ask for R1. No new provider literal, no new factory branch.
|
||
|
||
### Failure-propagation contract
|
||
- **Context**: Confirm that removing the broad `except` in `_GraphNamespace.add_batch` will result in `Task.status = FAILED` in the UI.
|
||
- **Sources Consulted**:
|
||
- `.kiro/steering/error-handling.md` § Background Task Errors — outer `except Exception` in worker calls `fail_task(task_id, str(e))`.
|
||
- `backend/app/services/graph_builder.py:289–308` — `add_text_batches` already wraps `client.graph.add_batch` in `try/except` and re-raises after a localized progress message.
|
||
- `backend/app/services/graph_builder.py:231–234` — `_build_graph_worker` catches every exception and calls `self.task_manager.fail_task(task_id, error_msg)` with a full traceback.
|
||
- **Findings**: The chain `add_episode → _GraphNamespace.add_batch → add_text_batches → _build_graph_worker → fail_task` is intact except for the swallow at the adapter layer. Removing the swallow is sufficient; no caller-side change is required.
|
||
- **Implications**: R2.3 / R2.5 are realized for free as soon as R2.2 is implemented.
|
||
|
||
### Single vs. batch ingestion path
|
||
- **Context**: Determine whether the single-episode `_GraphNamespace.add(...)` (line 441) needs a parallel fix.
|
||
- **Sources Consulted**: `graphiti_adapter.py:441–453`. No `try/except`; exceptions bubble naturally.
|
||
- **Findings**: Only the batch path swallows. The single path already complies.
|
||
- **Implications**: Fix is local to `add_batch`. Do not introduce symmetric handling in `add(...)`.
|
||
|
||
### Logging level
|
||
- **Context**: Decide between `WARNING` and `ERROR` for the failure log line.
|
||
- **Sources Consulted**: `.kiro/steering/error-handling.md` § Logging:
|
||
- `ERROR` — task failure, unrecoverable exception
|
||
- `WARNING` — retry triggered, transient failure, recovered state
|
||
- **Findings**: A failure that terminates the surrounding task is unrecoverable from the task's perspective, so `ERROR` is correct. The current `WARNING` is mislabelled.
|
||
- **Implications**: R2.4 — change to `logger.exception(...)` (which logs at ERROR with traceback).
|
||
|
||
### Documentation surfaces
|
||
- **Context**: Decide which files need updating to satisfy R1.
|
||
- **Sources Consulted**: `.env.example` (canonical config), `CLAUDE.md` lines 60–82, `README.md` lines 148–165, `docker-compose.yml` lines 21–37.
|
||
- **Findings**: All four are appropriate. `README.md` already has a placeholder for "non-OpenAI provider" and is the natural home for the `curl` smoke test snippet. `docker-compose.yml` benefits from one additional comment about `host.docker.internal`.
|
||
- **Implications**: Update all four; keep edits minimal and additive.
|
||
|
||
## Architecture Pattern Evaluation
|
||
|
||
| Option | Description | Strengths | Risks / Limitations | Notes |
|
||
|--------|-------------|-----------|---------------------|-------|
|
||
| A. Drop swallow + docs | Remove `except` block in `add_batch`; update four docs files | Smallest surface; honors steering rules; symmetric with `add()` | Loses (broken) "best effort" intent | Recommended |
|
||
| B. Narrow + retry | Catch only transient classes (`httpx.TimeoutException`, `openai.APIConnectionError`); use `retry_with_backoff` from `app/utils/retry.py`; raise everything else | Adds resilience to genuine network blips | More moving parts; would also need to update `add()` for symmetry | Defer to follow-up |
|
||
| C. New `ollama` provider literal | Extend `_build_llm_and_embedder` with a third branch | Symmetric with `openai`/`gemini` | Explicitly out of scope per ticket; duplicate code path (Ollama is OpenAI-SDK with custom `base_url`) | Rejected |
|
||
|
||
## Design Decisions
|
||
|
||
### Decision: Adopt Option A (drop the placeholder fallback entirely; documentation only for Ollama support)
|
||
- **Context**: R2 mandates that embedding failures during graph build surface as visible task failures. R1 mandates documentation for an Ollama embedder. The adapter already supports any OpenAI-compatible base URL.
|
||
- **Alternatives Considered**:
|
||
1. **Option B (narrow + retry)** — keep a small `except` clause for transient errors and use the project's `retry_with_backoff`.
|
||
2. **Option C (new provider literal)** — add an `ollama` branch in `_build_llm_and_embedder`.
|
||
- **Selected Approach**:
|
||
- In `_GraphNamespace.add_batch`, replace the `try/except Exception` block with a straightforward call. Failures from `_run(self._g.add_episode(...))` propagate to the caller.
|
||
- Use `logger.exception(...)` immediately before re-raise is unnecessary — `_build_graph_worker` already calls `logger.exception(f"task {task_id} failed")` per the error-handling steering. To honor R2.4 explicitly without double-logging, wrap the call in a narrow `try/except: logger.exception(...); raise` so the adapter-level context (`group_id`, episode index) is captured before bubbling.
|
||
- Update `.env.example`, `CLAUDE.md`, `README.md`, and `docker-compose.yml` to document Ollama configuration (R1).
|
||
- **Rationale**:
|
||
- The ticket explicitly lists transient-retry behavior and per-provider factory as out of scope.
|
||
- Steering's error-handling chapter forbids catch-and-continue in service code.
|
||
- Smaller surface = lower regression risk.
|
||
- **Trade-offs**:
|
||
- +Visibility: real config errors now surface at the UI.
|
||
- +Code symmetry: `add()` and `add_batch()` behave the same on failure.
|
||
- −One-time noise: operators whose graph builds were "succeeding" only because of the silent fallback will now see a failed task. This is the intended correction; mention in PR body.
|
||
- **Follow-up**:
|
||
- If transient blips become an operational issue, revisit Option B in a separate ticket using `retry_with_backoff` against `_g.add_episode`.
|
||
|
||
### Decision: Use `logger.exception(...)` not `logger.error(...)`
|
||
- **Context**: R2.4 requires ERROR-level logging of the underlying exception.
|
||
- **Alternatives Considered**: `logger.error(str(e))` (no traceback), `logger.warning(...)` (current behavior).
|
||
- **Selected Approach**: `logger.exception("Episode add failed (group_id=%s)", graph_id)` then `raise`.
|
||
- **Rationale**: `logger.exception` logs at ERROR with the full traceback, which is what the steering doc prescribes for unrecoverable adapter failures.
|
||
- **Trade-offs**: A small amount of duplication if `_build_graph_worker` also logs via `logger.exception`. Acceptable — the two log lines describe different layers (adapter vs. task) and have different identifying context.
|
||
|
||
### Decision: Document Ollama under the existing OpenAI provider, not as a separate provider literal
|
||
- **Context**: The ticket lists "per-provider embedder factory" as out of scope; Ollama is already reachable via the existing `openai` branch.
|
||
- **Selected Approach**: Document Ollama as a configuration *choice* of the existing `openai` Graphiti provider (set the three `EMBEDDING_*` env vars).
|
||
- **Rationale**: Avoids code duplication and matches the ticket's scope.
|
||
|
||
## Risks & Mitigations
|
||
- **Risk**: Operators currently relying on the silent fallback see new failed tasks. **Mitigation**: PR body calls this out explicitly with a "what changed" note pointing at the embedder env vars.
|
||
- **Risk**: The `except` is removed but a transient timeout intermittently fails the entire graph build. **Mitigation**: Documented as a known follow-up (Option B). Acceptable today because the alternative was an empty graph that *looked* successful.
|
||
- **Risk**: Documentation drifts between `.env.example`, `CLAUDE.md`, `README.md`. **Mitigation**: Keep all four edits in this PR and reference the same env-var triple verbatim.
|
||
|
||
## References
|
||
- Ticket #18 — `.ticket/18.md` (snapshot in this repo)
|
||
- Steering — `.kiro/steering/error-handling.md` § Background Task Errors and § Logging
|
||
- Steering — `.kiro/steering/tech.md` § Key Libraries (`graphiti-core` adapter rule)
|
||
- Code — `backend/app/services/graphiti_adapter.py:92–115, :441–475`
|
||
- Code — `backend/app/services/graph_builder.py:143–234, :256–310`
|