23 KiB
Design: graph-build-empty-fix
Overview
Purpose: Restore non-empty knowledge-graph builds under the post-migration Graphiti + Neo4j stack and migrate the embedding pipeline to a local-by-default model so the documented happy path produces a working pipeline end-to-end.
Users: MiroFish maintainers and operators running a fresh checkout, plus existing operators who already pinned EMBEDDING_* to a remote provider.
Impact: Flips three default values in backend/app/config.py (EMBEDDING_MODEL, EMBEDDING_BASE_URL, EMBEDDING_API_KEY) so the embedder targets a local Ollama instance with mxbai-embed-large, adds a non-zero-node-count gate to the graph-build worker's completion path, and updates README.md / CLAUDE.md / docker-compose.yml comments / .env.example so the documentation matches the new defaults. No new env var, no new dependency, no new provider branch in _build_llm_and_embedder — Ollama is reached through the existing "openai" provider against its OpenAI-compatible /v1 endpoint.
Goals
- Default
.env-free configuration produces a non-empty(:Entity {group_id})set in Neo4j for the uploaded seed material. - Any silent "succeeded but empty" graph-build outcome is converted into a
Task.status = FAILEDwith an actionable error. - Existing OpenAI- / Gemini-compatible operators are unaffected on the happy path.
- Documentation (README, CLAUDE.md, docker-compose.yml,
.env.example) reflects the new default unambiguously.
Non-Goals
- Startup-time embedder health probe that refuses to boot on dim/model mismatch.
- Tunable
EMBEDDING_DIM(768/1536 support) — explicit follow-up. - New provider branch in
_build_llm_and_embedder(e.g., a dedicated"ollama"enum). - Bundling Ollama or any model binary in
docker-compose.yml. - Auto-rebuilding or invalidating project graphs created before this change.
- LLM-side default change — only embedding defaults move.
Boundary Commitments
This Spec Owns
- The three
EMBEDDING_*default values inbackend/app/config.py. - The
.env.exampleblock ordering / commenting that presents Ollama as active and OpenAI/Gemini as fallbacks. - A non-zero-node-count gate in
GraphBuilderService._build_graph_workerthat converts an empty-graph completion into afail_task(...). - Wording of the embedder section in
README.md,CLAUDE.md, anddocker-compose.ymlcomments. - One new locale key (
progress.emptyGraphFailure) inlocales/en.jsonandlocales/zh.jsonfor the gate's failure message.
Out of Boundary
- Any change to
_build_llm_and_embedder's provider factory beyond what these defaults exercise. - Changes to
_recover_stuck_projects— itscount > 0gate already matches the contract. - The single-episode
_GraphNamespace.add(...)path (already raises naturally). - The graphiti-core dependency version.
- Pre-existing project-graph migration / backfill.
Allowed Dependencies
backend/app/services/graphiti_adapter.py(read-only — the OpenAI branch is already Ollama-compatible).backend/app/models/task.pyTaskManager.fail_task/complete_task.backend/app/utils/locale.tfor the new failure message.- Existing loud-failure contract from spec
graphiti-ollama-embedder.
Revalidation Triggers
- A graphiti-core upgrade that changes
EMBEDDING_DIMaway from 1024. - A change to the recovery contract in
_recover_stuck_projects. - Introduction of a new embedder provider branch (would invalidate the "Ollama-via-openai-branch" assumption).
- Any new long-running task type built on the same pattern would need a parallel non-zero-count gate.
Architecture
Existing Architecture Analysis
- Embedder construction:
_build_llm_and_embedder(graphiti_adapter.py:92-139) branches onGRAPHITI_LLM_PROVIDER∈ {openai,gemini}. The"openai"branch composesOpenAIEmbedder(OpenAIEmbedderConfig(api_key, base_url, embedding_model))where each field falls back fromEMBEDDING_*toLLM_*. Ollama's/v1/embeddingsis OpenAI-shape-compatible, so the existing branch suffices. - Graph-build worker:
_build_graph_worker(graph_builder.py:140-230) ingests chunks viaadd_text_batches→_GraphNamespace.add_batch, waits on a no-opepisode.getpoll, fetches_get_graph_info(graph_id), then callscomplete_taskwith the resulting node/edge counts. Failure-path is a broadexcept Exception→ traceback →fail_task(task_id, error_msg). - Loud-failure contract (from spec #18):
_GraphNamespace.add_batchlogs the underlyingadd_episodeexception atERRORandraises — no placeholder UUID return path. - Startup recovery:
_recover_stuck_projects(__init__.py:88-109) promotesGRAPH_BUILDING→GRAPH_COMPLETEDonly whencount(:Entity {group_id}) > 0.
These patterns are preserved; this design extends _build_graph_worker with a symmetric count > 0 check before complete_task is called.
Architecture Pattern & Boundary Map
graph TB
EnvFile[dotenv]
Config[Config]
Adapter[GraphitiAdapter _build_llm_and_embedder]
Embedder[OpenAIEmbedder]
Ollama[Local Ollama mxbai_embed_large]
Worker[_build_graph_worker]
Neo4j[Neo4j Vector Index 1024 dim]
TaskMgr[TaskManager]
Recovery[_recover_stuck_projects]
EnvFile --> Config
Config --> Adapter
Adapter --> Embedder
Embedder --> Ollama
Worker --> Adapter
Adapter --> Neo4j
Worker --> Neo4j
Worker --> TaskMgr
Recovery --> Neo4j
Recovery --> TaskMgr
Architecture Integration:
- Selected pattern: Defaults-flip + completion-gate (Option C from the gap analysis). Preserves the existing layered flow; adds one synchronous read inside the worker.
- Domain / feature boundaries:
Configowns env-driven defaults;GraphitiAdapterowns provider construction;GraphBuilderServiceowns the worker lifecycle and the newcount > 0gate;_recover_stuck_projectsowns the symmetric startup-side gate. No cross-cutting changes. - Existing patterns preserved: Single-Graphiti-singleton; persistent event loop; loud
add_batch; broad workerexcept Exception;group_id-scoped reads. - New components rationale: None. Only one new locale key and a ~5-line gate inside the existing worker.
- Steering compliance: Stays inside the adapter (
database.md); reachesfail_taskon the unhappy path (error-handling.md); configuration centralised inconfig.py(structure.md); per-projectgroup_idfilter preserved.
Technology Stack & Alignment
| Layer | Choice / Version | Role in Feature | Notes |
|---|---|---|---|
| Backend / Services | Python ≥3.11, Flask 3.0 | Hosts the unchanged graph-build worker and the new non-zero-count gate | Existing stack; no change. |
| Data / Storage | Neo4j 5.x Community + graphiti-core ≥ 0.3 |
Owns the 1024-dim vector index that the embedder must match | EMBEDDING_DIM = 1024 is a graphiti-core invariant; not exposed. |
| External | Ollama (operator-managed) + mxbai-embed-large (1024-dim) |
New default embedding provider, reached over OpenAI-shaped /v1/embeddings |
Reached via http://localhost:11434/v1 in host mode; http://host.docker.internal:11434/v1 in Docker. |
| Frontend / CLI | Vue 3 + vue-i18n |
Renders the new "graph build produced 0 entities" failure message | One new locale key in locales/en.json and locales/zh.json. |
File Structure Plan
Modified Files
backend/app/config.py— Change the threeEMBEDDING_*defaults (lines 42, 52, 53). No new fields.backend/app/services/graph_builder.py— After_get_graph_info(graph_id)in_build_graph_worker, checkgraph_info.node_count > 0; if zero, calltask_manager.fail_task(task_id, …)with at('progress.emptyGraphFailure')message andreturninstead ofcomplete_task. Log atERRORlevel.locales/en.json— Addprogress.emptyGraphFailure(English).locales/zh.json— Addprogress.emptyGraphFailure(Chinese, mirroring the existingprogress.*style).README.md— In the env-block code fence (around lines 163-173), move the Ollama lines out of comments and demote the OpenAI/Gemini line to a commented fallback example. Adjust the surrounding prose so the Ollama prerequisite (ollama pull mxbai-embed-large) is part of the default setup checklist alongside Neo4j.CLAUDE.md— In the "Required Environment Variables" section (around lines 72-80), state that the active defaultEMBEDDING_MODELismxbai-embed-largevia Ollama; demote OpenAI/Gemini to "Other supported configurations".docker-compose.yml— Tighten the L31-33 comment so it points operators at the.env.exampleOllama block as the active default rather than as an optional override.
Hook-Protected File (operator-coordinated)
.env.example— The block layout must end up: uncommentedEMBEDDING_BASE_URL=http://host.docker.internal:11434/v1,EMBEDDING_API_KEY=ollama,EMBEDDING_MODEL=mxbai-embed-large; OpenAI and Gemini examples remain present but as commented blocks below. The implementation phase produces the exact diff and either coordinates the edit with the developer or records it inHANDOFF.md.
Directory structure is unchanged; no new files are introduced.
System Flows
Graph-build completion gate
sequenceDiagram
participant API as graph_bp
participant Worker as _build_graph_worker
participant Adapter as GraphitiAdapter
participant Neo as Neo4j
participant Task as TaskManager
API->>Worker: start (text, ontology, group_id)
Worker->>Adapter: add_batch(chunks)
Adapter->>Neo: add_episode per chunk (entities, edges)
Adapter-->>Worker: episode_uuids OR raises
alt add_batch raised
Worker->>Task: fail_task(err)
else add_batch returned
Worker->>Adapter: _get_graph_info(group_id)
Adapter-->>Worker: GraphInfo(node_count, edge_count)
alt node_count == 0
Worker->>Task: fail_task("graph build produced 0 entities")
else node_count > 0
Worker->>Task: complete_task(graph_info)
end
end
Key decisions captured by the diagram:
- The gate runs after the existing
_get_graph_infocall so it costs one extra branch, not an extra Neo4j round-trip. - The gate fires only when
add_batchreturned without raising — it is strictly a defense for "succeeded but empty," not a replacement for the loud-failure contract. - Edge count is not part of the gate: the contract from
_recover_stuck_projectsis "non-zero entities ⇒ COMPLETED", and edges may legitimately lag entities in some graphiti-core flows.
Requirements Traceability
| Requirement | Summary | Components | Interfaces | Flows |
|---|---|---|---|---|
| 1.1 | Reproduce on main defaults before fixing |
Implementation log (PR) | — | — |
| 1.2 | Document root cause(s) in PR + design.md | design.md Overview, research.md Research Log |
— | — |
| 1.3 | If dim-mismatch, record dims | research.md Research Log → Embedder construction path |
— | — |
| 1.4 | If a new silent path is found, remediate via R4 | graph_builder.py worker gate |
TaskManager.fail_task |
Graph-build completion gate |
| 1.5 | Post-fix reproduction writes non-zero entities | End-to-end smoke (PR description) | — | — |
| 2.1 | Config.EMBEDDING_* defaults point to local Ollama |
backend/app/config.py |
— | — |
| 2.2 | .env.example presents Ollama uncommented |
.env.example |
— | — |
| 2.3 | Default config end-to-end produces non-empty graph | All modified files | — | Graph-build completion gate |
| 2.4 | No reachable Ollama ⇒ Task.FAILED with named error |
graph_builder.py worker (existing except) |
TaskManager.fail_task |
— |
| 2.5 | Ollama goes through existing _build_llm_and_embedder openai branch |
graphiti_adapter.py (read-only) |
— | — |
| 3.1 | Keep EMBEDDING_DIM = 1024; default model is 1024-dim |
config.py, CLAUDE.md |
— | — |
| 3.2 | CLAUDE.md states the 1024 invariant and rules out 768-dim | CLAUDE.md |
— | — |
| 3.3 | Dim-mismatch override ⇒ loud Task.FAILED |
graph_builder.py worker gate + existing loud add_batch |
TaskManager.fail_task |
Graph-build completion gate |
| 3.4 | No new EMBEDDING_DIM env var |
— | — | — |
| 4.1 | Preserve loud add_batch from #18 |
graphiti_adapter.py (read-only) |
— | — |
| 4.2 | Remediate any new silent call site found in R1 | graph_builder.py worker gate |
TaskManager.fail_task |
Graph-build completion gate |
| 4.3 | Embedder-construction failure ⇒ worker Task.FAILED |
graph_builder.py worker (existing except) |
TaskManager.fail_task |
— |
| 4.4 | Log propagated failure at ERROR before fail_task |
graph_builder.py worker gate |
logger.error / logger.exception |
— |
| 4.5 | GRAPH_COMPLETED only when node_count > 0 |
graph_builder.py worker gate |
TaskManager.complete_task |
Graph-build completion gate |
| 5.1 | Existing OpenAI/Gemini configs unchanged behavior | graphiti_adapter.py (read-only), config.py |
— | — |
| 5.2 | No new env var | — | — | — |
| 5.3 | Pre-existing 1536-dim graphs remain readable when operator keeps their override | config.py (override-wins semantics unchanged) |
— | — |
| 5.4 | GRAPHITI_LLM_PROVIDER default stays openai |
config.py (unchanged) |
— | — |
| 6.1 | CLAUDE.md describes Ollama as default | CLAUDE.md |
— | — |
| 6.2 | README setup names ollama pull prerequisite |
README.md |
— | — |
| 6.3 | docker-compose / README documents host.docker.internal:11434 | docker-compose.yml, README.md |
— | — |
| 6.4 | One-line curl smoke test in docs |
README.md (already present, retain) |
— | — |
| 7.1 | Profile generation reads the new graph | End-to-end smoke (PR description) | — | — |
| 7.2 | Report-agent tools return non-empty results | End-to-end smoke (PR description) | — | — |
| 7.3 | PR documents the smoke-test path | PR description | — | — |
| 7.4 | If smoke test not run, PR says so explicitly | PR description | — | — |
Components and Interfaces
| Component | Domain/Layer | Intent | Req Coverage | Key Dependencies (P0/P1) | Contracts |
|---|---|---|---|---|---|
Config (modified) |
Backend / config | Owns the three EMBEDDING_* defaults that flip from OpenAI to Ollama |
2.1, 5.4 | dotenv (P0) | State |
GraphBuilderService._build_graph_worker (modified) |
Backend / services | Adds the non-zero-node-count gate before complete_task |
1.4, 3.3, 4.2, 4.4, 4.5 | _get_graph_info (P0), TaskManager (P0), locale.t (P1) |
Batch |
| Locale entries (new key) | Shared / i18n | One key (progress.emptyGraphFailure) so the gate's message is translated |
4.4, 4.5 | vue-i18n (P1), utils.locale.t (P1) |
State |
Docs set (README.md, CLAUDE.md, docker-compose.yml, .env.example) |
Docs | Updates the documented happy path to local-by-default | 2.2, 6.1, 6.2, 6.3, 6.4 | — | — |
Backend / Config
Config (modified)
| Field | Detail |
|---|---|
| Intent | Flip the three EMBEDDING_* defaults from OpenAI to Ollama. |
| Requirements | 2.1, 5.4 |
Responsibilities & Constraints
- Owns the env-driven embedder defaults consumed by
_build_llm_and_embedder. - Must not introduce a new env var or remove any existing one.
- Operator-set
EMBEDDING_*continues to win over the defaults (override semantics unchanged).
Dependencies
- Inbound:
_build_llm_and_embedder(P0) reads the three values. - Outbound: none.
- External: dotenv (P0) loads the
.envfile before class evaluation.
Contracts: State ☑.
State Management
- State model: Three module-level class attributes on
Config:EMBEDDING_MODEL = os.environ.get('EMBEDDING_MODEL', 'mxbai-embed-large')EMBEDDING_BASE_URL = os.environ.get('EMBEDDING_BASE_URL', 'http://localhost:11434/v1')EMBEDDING_API_KEY = os.environ.get('EMBEDDING_API_KEY', 'ollama')
- Persistence & consistency: Read once at import; no runtime mutation.
- Concurrency strategy: N/A (read-only after import).
Implementation Notes
- Integration:
_build_llm_and_embedder's existing fallbackConfig.EMBEDDING_API_KEY or Config.LLM_API_KEYcontinues to work; with the new defaults, the fallback is no longer triggered on a clean checkout. - Validation: None added — embedder errors continue to surface via the worker's existing
except. - Risks: An operator who previously relied on "leave
EMBEDDING_*unset to inheritLLM_*" will, after this change, hit Ollama athttp://localhost:11434/v1instead. README and CLAUDE.md call this out under "Backwards compatibility".
Backend / Services
GraphBuilderService._build_graph_worker (modified)
| Field | Detail |
|---|---|
| Intent | Convert "graph build succeeded but produced 0 entities" into a Task.FAILED. |
| Requirements | 1.4, 3.3, 4.2, 4.4, 4.5 |
Responsibilities & Constraints
- Preserves the existing 5-stage progression (create → set ontology → split → batch → wait → fetch info → complete).
- New behavior: after
graph_info = self._get_graph_info(graph_id), ifgraph_info.node_count == 0, calltask_manager.fail_task(...)with a localised error andreturn(skipcomplete_task). - Must log at
ERRORlevel before thefail_taskcall so server logs carry the diagnostic ahead of the task envelope. - Must not weaken the existing
except Exceptionbranch — the gate is additional, not a replacement.
Dependencies
- Inbound:
graph_bp(P0) invokesbuild_graph_asyncwhich calls this worker. - Outbound:
_get_graph_info(graph_id)(P0),TaskManager.fail_task(P0),TaskManager.complete_task(P0),utils.locale.t(P1). - External:
logger.error(P1).
Contracts: Batch ☑.
Batch / Job Contract
- Trigger:
build_graph_asyncspawns the worker thread from aPOST /api/graph/buildrequest. - Input / validation: Unchanged from current contract (
text,ontology,graph_name,chunk_size,chunk_overlap,batch_size,locale). - Output / destination:
Taskenvelope onTaskManager. On success:Task.status = COMPLETED,Task.result = {graph_id, graph_info, chunks_processed}. On gate trip:Task.status = FAILED,Task.error = t('progress.emptyGraphFailure'). - Idempotency & recovery: Unchanged.
_recover_stuck_projectscontinues to gate oncount(:Entity) > 0; the worker's new gate makes the live-completion path symmetric.
Implementation Notes
- Integration: One block inserted between the existing
graph_info = self._get_graph_info(graph_id)(line ~219) andself.task_manager.complete_task(...)(line ~221). Approximately 5 lines. - Validation: Confirmed empirically that
_get_graph_inforeturnsnode_count == 0when Neo4j holds no(:Entity {group_id})rows. - Risks: A worker that ran on a misconfigured embedder would previously surface via the existing
except Exception(becauseadd_batchre-raises). The new gate catches the residual case where graphiti-core returns successfully but writes nothing — the exact failure mode the ticket reports.
Shared / i18n
progress.emptyGraphFailure (new locale key)
| Field | Detail |
|---|---|
| Intent | Localised failure message for the new gate. |
| Requirements | 4.4, 4.5 |
Contracts: State ☑.
State Management
- State model: One additional entry in the
progressnamespace oflocales/en.jsonandlocales/zh.json. - Persistence & consistency: File-based locales loaded by
vue-i18n(frontend) andutils.locale(backend). Keys must exist in both files; theprogress.*namespace is the established home for graph-build status strings. - Concurrency strategy: N/A.
Implementation Notes
- Integration: Backend calls
t('progress.emptyGraphFailure')from_build_graph_worker. Frontend renders the same key inStep1GraphBuild.vue's failure surface (no code change — it already displaysTask.error). - Validation: Smoke-test the key resolves in both locales (
set_locale('en')/set_locale('zh')). - Risks: None — additive change.
Error Handling
Error Strategy
This spec contributes one new error case (progress.emptyGraphFailure) and re-uses the existing transport (TaskManager.fail_task → polling endpoint → frontend renders Task.error).
Error Categories and Responses
- Embedder unreachable (e.g., Ollama not running): caught by
_build_graph_worker's existingexcept Exceptionafteradd_batchraises.Task.FAILEDwith the underlying connection error. - Dim-mismatch override (operator points
EMBEDDING_MODELat a non-1024-dim model): caught byadd_batch's loud-failure contract (Neo4j or graphiti-core raises).Task.FAILEDwith the underlying dim-mismatch error. - Empty graph after a clean
add_batch(new case): caught by the gate.Task.FAILEDwitht('progress.emptyGraphFailure'). Logged atERRORbefore thefail_taskcall.
Monitoring
- Existing
logger.exception/logger.errorlines ingraphiti_adapter.pyandgraph_builder.pycarry the underlying error. - New
logger.error('graph build produced 0 entities for group_id=%s', graph_id)line precedes the gate'sfail_taskcall.
Testing Strategy
- Unit-level smoke (manual, scripted): With Ollama down →
npm run dev→ start a graph build → expectTask.status = FAILEDandTask.errorcontaining a connectivity message. With Ollama up andmxbai-embed-largepulled → expectTask.status = COMPLETEDandgraph_info.node_count > 0. - Configuration smoke: Confirm
Config.EMBEDDING_MODEL,Config.EMBEDDING_BASE_URL,Config.EMBEDDING_API_KEYresolve to the new defaults when.envis empty. - Backwards-compat smoke: With
.envsettingEMBEDDING_*to OpenAI's values, confirm_build_llm_and_embedderconstructs the OpenAI embedder exactly as before (no observable change). - Gate unit-style test: Patch
_get_graph_infoto returnGraphInfo(graph_id=…, node_count=0, edge_count=0, entity_types=[])and assert the worker callsfail_taskwith the localised key (no real pytest harness expansion — short repro in PR description is sufficient given the existing minimal test policy). - End-to-end (Req 7): Graph build → env-setup (profile generation) → report-agent query on a representative seed file. PR description documents the run; if the maintainer cannot run it locally, the PR description states that explicitly.
Migration Strategy
flowchart LR
Start[merge to main]
Pull[operator runs ollama pull mxbai_embed_large]
Restart[restart backend]
Build[start fresh graph build]
Verify[verify entities in Neo4j]
Done[done]
Start --> Pull
Pull --> Restart
Restart --> Build
Build --> Verify
Verify --> Done
- Phase 1 (no operator action): For operators with explicit
EMBEDDING_*overrides — no change. Pre-existing project graphs remain readable. - Phase 2 (default-using operators): One-time
ollama pull mxbai-embed-largeand restart. Pre-existing project graphs created against the previous default (1536-dim text-embedding-3-small with a Dashscope LLM key, which most likely already produced empty graphs per the ticket) are invalidated; operators rebuild them. - Rollback trigger: If an operator cannot run Ollama, they re-add the OpenAI or Gemini
EMBEDDING_*block to.env(the README's commented fallback) and restart. No code rollback required. - Validation checkpoint: After the first graph build under the new defaults, the
node_count > 0gate proves the migration succeeded.