MicroFish/.kiro/steering/database.md

5.0 KiB

Database / Knowledge Graph Standards

The "database" in MiroFish is Neo4j accessed via Graphiti, not a relational store. There is no SQL, no migrations file, no ORM. Generic relational guidance does not apply — these are the project-specific patterns.

Architecture

  • Engine: Neo4j 5.x Community over bolt://.
  • Graph layer: graphiti-core ≥ 0.3 — handles node/edge writes, embeddings, hybrid search, reranking.
  • Adapter: backend/app/services/graphiti_adapter.py is the only module that imports graphiti_core directly. Every other module talks to the graph through this adapter.

The adapter exposes a Zep-Cloud-shaped namespace (client.graph.add_episode(...), client.graph.search(...), etc.) so legacy zep_* services kept their existing call sites after the migration. New code should use the same surface — do not introduce a parallel API.

Core Rule: group_id Isolation

Every read or write to the graph must be scoped by the project's group_id. The graph is multi-tenant by construction; cross-project access is not permitted and is grounds for rejecting a change in review.

  • A project's group_id lives on its Project model and never changes after creation.
  • When constructing search filters, episode adds, or node/edge fetches, always pass group_id=project.group_id (or the equivalent group_ids=[...]).
  • If you need data spanning projects (e.g. an admin view), aggregate per-project at the API layer; do not query the graph without a group_id filter.

Adapter Patterns That Must Stay Intact

These are non-obvious and break subtly when violated:

  • Single Graphiti singleton. _get_graphiti() lazily constructs one Graphiti instance for the whole process. Do not instantiate Graphiti in services or tests.
  • Persistent event loop in a dedicated thread. All async graph calls are dispatched through _run(coro) onto a single background event loop (see graphiti-event-loop thread). The Neo4j async driver is bound to whichever loop opened it; crossing loops corrupts the driver state. Never call asyncio.run(...) on a Graphiti coroutine, and never schedule one on a request thread's loop.
  • Indices and constraints on first init. build_indices_and_constraints() runs once when the singleton is created. New required indexes go through Graphiti's mechanisms, not raw Cypher in services.

What Belongs in the Graph

  • Entities — Domain objects extracted by the ontology generator (people, organizations, concepts, events, etc.).
  • Edges — Relationships between entities, typed per the project's generated ontology.
  • Episodes — The raw text/units the entities were derived from; Graphiti owns chunking and embedding.

What does not belong in the graph:

  • Project / task metadata (lives in in-memory ProjectManager and TaskManager).
  • Simulation state (owned by OASIS subprocesses).
  • User-uploaded files (filesystem only — paths, not contents, are passed through the API).

Schema & Ontology

  • Ontology (entity types + edge types) is generated per project by the LLM in step 1, stored on the Project model, and used to constrain extraction during graph build.
  • There is no global, hand-maintained schema file. Don't add one — the ontology is intentionally per-project.
  • Reasoning-model outputs from ontology generation are stripped of <think> blocks and code fences before JSON parsing (see tech.md's "reasoning-model output stripping" decision).

Embeddings

  • EMBEDDING_MODEL is configurable per provider:
    • OpenAI default: text-embedding-3-small
    • Gemini: text-embedding-004 / gemini-embedding-001
  • Embedding model selection lives in config.py. Don't hard-code it in services.
  • Switching embedding model invalidates existing project graphs — document this if you add an option that changes the default.

Query Patterns

  • Read via the adapter's search methods (hybrid RRF recipes are wired in graphiti_adapter.py); avoid raw Cypher in feature code.
  • If a feature genuinely requires raw Cypher, add it as a method on the adapter, scoped by group_id, with a comment explaining why Graphiti's API is insufficient.
  • Pagination over Graphiti results uses utils/zep_paging.py (legacy name, still applicable).

Startup Recovery

_recover_stuck_projects runs on app boot and promotes any project left in GRAPH_BUILDING to GRAPH_COMPLETED if the graph already has that project's nodes — handling the case where the original task was killed by a restart. Any new long-running graph operation must either:

  1. Be safe to re-run from the start, OR
  2. Add an analogous recovery path so a restart mid-task doesn't strand the project.

Backups

Graph data is treated as regenerable from seed material, not as durable user data — there is no project-managed backup/restore. If a deployment requires durability, that's an operator concern (Neo4j backups), not a feature-code one.


Focus on patterns and decisions. No environment-specific settings.