Database / Knowledge Graph Standards
The "database" in MiroFish is Neo4j accessed via Graphiti, not a relational store. There is no SQL, no migrations file, no ORM. Generic relational guidance does not apply — these are the project-specific patterns.
Architecture
- Engine: Neo4j 5.x Community over bolt://.
- Graph layer: graphiti-core ≥ 0.3, which handles node/edge writes, embeddings, hybrid search, and reranking.
- Adapter: backend/app/services/graphiti_adapter.py is the only module that imports graphiti_core directly. Every other module talks to the graph through this adapter.
The adapter exposes a Zep-Cloud-shaped namespace
(client.graph.add_episode(...), client.graph.search(...), etc.) so
legacy zep_* services kept their existing call sites after the
migration. New code should use the same surface — do not introduce a
parallel API.
Core Rule: group_id Isolation
Every read or write to the graph must be scoped by the project's
group_id. The graph is multi-tenant by construction; cross-project
access is not permitted and is grounds for rejecting a change in review.
- A project's group_id lives on its Project model and never changes after creation.
- When constructing search filters, episode adds, or node/edge fetches, always pass group_id=project.group_id (or the equivalent group_ids=[...]).
- If you need data spanning projects (e.g. an admin view), aggregate per-project at the API layer; do not query the graph without a group_id filter.
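The per-project aggregation rule above can be sketched as follows. This is a minimal illustration, not the project's code: ProjectRef, admin_search, and the search_project callable are hypothetical names, with search_project standing in for whatever scoped search the adapter exposes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ProjectRef:
    """Hypothetical stand-in for the fields of the real Project model."""
    project_id: str
    group_id: str  # assigned at creation, never changed afterwards


def admin_search(
    projects: Iterable[ProjectRef],
    query: str,
    search_project: Callable[[str, str], List[str]],
) -> Dict[str, List[str]]:
    """Run one scoped search per project and merge the results at the API layer.

    search_project(group_id, query) is an assumed wrapper around the adapter's
    search; it is never invoked without a group_id.
    """
    results: Dict[str, List[str]] = {}
    for project in projects:
        if not project.group_id:
            # Refuse to fall back to an unscoped query.
            raise ValueError(f"project {project.project_id} has no group_id")
        results[project.project_id] = search_project(project.group_id, query)
    return results
```

Keeping the loop at the API layer means every graph call still carries exactly one group_id, so the isolation rule holds even for cross-project views.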
Adapter Patterns That Must Stay Intact
These are non-obvious and break subtly when violated:
- Single Graphiti singleton. _get_graphiti() lazily constructs one Graphiti instance for the whole process. Do not instantiate Graphiti in services or tests.
- Persistent event loop in a dedicated thread. All async graph calls are dispatched through _run(coro) onto a single background event loop (see the graphiti-event-loop thread). The Neo4j async driver is bound to whichever loop opened it; crossing loops corrupts the driver state. Never call asyncio.run(...) on a Graphiti coroutine, and never schedule one on a request thread's loop.
- Indices and constraints on first init. build_indices_and_constraints() runs once when the singleton is created. New required indexes go through Graphiti's mechanisms, not raw Cypher in services.
What Belongs in the Graph
- Entities — Domain objects extracted by the ontology generator (people, organizations, concepts, events, etc.).
- Edges — Relationships between entities, typed per the project's generated ontology.
- Episodes — The raw text/units the entities were derived from; Graphiti owns chunking and embedding.
What does not belong in the graph:
- Project / task metadata (lives in the in-memory ProjectManager and TaskManager).
- Simulation state (owned by OASIS subprocesses).
- User-uploaded files (filesystem only; paths, not contents, are passed through the API).
Schema & Ontology
- Ontology (entity types + edge types) is generated per project by the LLM in step 1, stored on the Project model, and used to constrain extraction during graph build.
- There is no global, hand-maintained schema file. Don't add one: the ontology is intentionally per-project.
- Reasoning-model outputs from ontology generation are stripped of <think> blocks and code fences before JSON parsing (see tech.md's "reasoning-model output stripping" decision).
Embeddings
- EMBEDDING_MODEL is configurable per provider:
  - OpenAI default: text-embedding-3-small
  - Gemini: text-embedding-004 / gemini-embedding-001
- Embedding model selection lives in config.py. Don't hard-code it in services.
- Switching embedding model invalidates existing project graphs; document this if you add an option that changes the default.
Query Patterns
- Read via the adapter's search methods (hybrid RRF recipes are wired in graphiti_adapter.py); avoid raw Cypher in feature code.
- If a feature genuinely requires raw Cypher, add it as a method on the adapter, scoped by group_id, with a comment explaining why Graphiti's API is insufficient.
- Pagination over Graphiti results uses utils/zep_paging.py (legacy name, still applicable).
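As an illustration of the raw-Cypher rule, the sketch below builds a scoped, parameterized query that an adapter method could then execute. Everything here is hypothetical: count_entities_query is not a real adapter method, and the Entity label and group_id property are assumptions about how Graphiti stores nodes that should be checked against the installed version.

```python
from typing import Any, Dict, Tuple


def count_entities_query(group_id: str) -> Tuple[str, Dict[str, Any]]:
    """Build a group_id-scoped Cypher query and its parameters.

    Why raw Cypher (illustrative justification): search-style APIs return
    ranked results rather than aggregates, so a plain count needs its own query.
    """
    if not group_id:
        # Never emit an unscoped query.
        raise ValueError("group_id is required")
    query = (
        "MATCH (n:Entity) "
        "WHERE n.group_id = $group_id "  # assumed label and property names
        "RETURN count(n) AS entity_count"
    )
    # The value is passed as a parameter, never interpolated into the string.
    return query, {"group_id": group_id}
```

Separating query construction from execution keeps the scoping check easy to unit-test without a running Neo4j instance.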
Startup Recovery
_recover_stuck_projects runs on app boot and promotes any project
left in GRAPH_BUILDING to GRAPH_COMPLETED if the graph already has
that project's nodes — handling the case where the original task was
killed by a restart. Any new long-running graph operation must
either:
- Be safe to re-run from the start, OR
- Add an analogous recovery path so a restart mid-task doesn't strand the project.
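A recovery pass of the kind described above could be sketched like this. The names are hypothetical except for the two status values: StuckProject, the string values of the enum, and the graph_has_nodes callable are assumptions, and the real _recover_stuck_projects may behave differently for projects whose graph is empty.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class ProjectStatus(str, Enum):
    GRAPH_BUILDING = "graph_building"    # string values are assumed
    GRAPH_COMPLETED = "graph_completed"


@dataclass
class StuckProject:
    """Hypothetical stand-in for the fields recovery needs."""
    project_id: str
    group_id: str
    status: ProjectStatus


def recover_stuck_projects(
    projects: Iterable[StuckProject],
    graph_has_nodes: Callable[[str], bool],
) -> List[str]:
    """Promote projects stranded in GRAPH_BUILDING whose graph already has nodes.

    graph_has_nodes(group_id) is an assumed, group_id-scoped existence check.
    Returns the ids of the projects that were promoted.
    """
    recovered: List[str] = []
    for project in projects:
        if project.status is not ProjectStatus.GRAPH_BUILDING:
            continue
        if graph_has_nodes(project.group_id):
            project.status = ProjectStatus.GRAPH_COMPLETED
            recovered.append(project.project_id)
        # Projects with no nodes are left untouched in this sketch.
    return recovered
```

The pass is idempotent: running it twice promotes nothing new the second time, which is the property a boot-time hook needs.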
Backups
Graph data is treated as regenerable from seed material, not as durable user data — there is no project-managed backup/restore. If a deployment requires durability, that's an operator concern (Neo4j backups), not a feature-code one.
Focus on patterns and decisions. No environment-specific settings.