MicroFish/.kiro/steering/tech.md

Technology Stack

Architecture

A two-tier web app with a long-running background-task core:

  • Frontend (Vue 3 + Vite) — Single-page UI orchestrating the 5-step workflow. Polls the backend for task progress; renders the knowledge graph with D3.
  • Backend (Flask + uv) — Stateless HTTP API on top of in-memory Project and Task models. Heavy work (ontology extraction, graph build, profile generation, simulation, report) runs as background tasks tracked through Task and exposed via polling endpoints.
  • Knowledge graph — Neo4j is the durable store; Graphiti is the write/read layer. All queries are scoped by per-project group_id.
  • Simulation — CAMEL-OASIS executes in subprocesses; the Flask app communicates with them only through services/simulation_ipc.py.

The system favors process isolation for the simulator and in-memory state with restart recovery for project/task tracking, rather than a classic job queue + persistent DB.

Core Technologies

  • Backend language: Python ≥3.11, ≤3.12
  • Backend framework: Flask 3.0 + flask-cors
  • Backend tooling: uv for dependency management
  • Frontend framework: Vue 3.5 + Vue Router 4 + vue-i18n 11
  • Frontend tooling: Vite 7
  • Graph DB: Neo4j 5.x (Community) via bolt://
  • Graph layer: graphiti-core ≥ 0.3
  • Simulation: camel-oasis 0.2.5 + camel-ai 0.2.78
  • LLM access: OpenAI SDK against any OpenAI-compatible endpoint

Key Libraries

Only the libraries that shape how new code is written:

  • openai — Sole LLM client; new providers are integrated by changing LLM_BASE_URL/LLM_MODEL_NAME, not by adding a second SDK.
  • graphiti-core — All graph reads/writes go through the graphiti_adapter; do not call Neo4j drivers directly from feature code.
  • camel-oasis / camel-ai — Pinned versions; upgrading either requires re-validating the simulation pipeline end-to-end.
  • PyMuPDF, charset-normalizer, chardet — File ingestion; encoding detection is mandatory because seed material is frequently non-UTF-8 (notably mixed Chinese/English).
  • pydantic v2 — Used for structured LLM output / validation.
  • axios (frontend) — All API calls go through src/api/*.js services with a 5-min timeout and exponential retry; components must not call fetch/axios directly.
  • d3 v7 — Knowledge-graph visualization in GraphPanel.vue.
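
To illustrate the "change configuration, not SDKs" rule for the openai entry above, here is a minimal sketch of provider settings read from the environment. LLM_BASE_URL and LLM_MODEL_NAME come from this document; the LLMSettings class, the load_llm_settings helper, the LLM_API_KEY variable, and both default values are assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model_name: str
    api_key: str


def load_llm_settings(env=os.environ) -> LLMSettings:
    """Read provider settings from a mapping (defaults to the process environment)."""
    return LLMSettings(
        # Defaults below are placeholders for the sketch, not the repo's values.
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        model_name=env.get("LLM_MODEL_NAME", "gpt-4o-mini"),
        # LLM_API_KEY is an assumed variable name.
        api_key=env.get("LLM_API_KEY", ""),
    )


# With the OpenAI SDK installed, the same client class serves any compatible endpoint:
#   from openai import OpenAI
#   s = load_llm_settings()
#   client = OpenAI(base_url=s.base_url, api_key=s.api_key)
#   client.chat.completions.create(model=s.model_name, messages=[...])
```

Switching providers then means changing two environment values; no feature code is touched.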

Development Standards

Type Safety

  • Python: type hints where the surrounding file uses them. Don't retrofit hints into untyped modules just for consistency.
  • Frontend: plain JavaScript, not TypeScript. Use JSDoc only when it improves clarity.

Code Quality

  • No enforced linter or formatter in this repo by design. Match the surrounding file's style. Discuss with the user before introducing ESLint/Prettier/Ruff/Black.
  • 4-space indentation everywhere.
  • Python: snake_case. Existing files mix English and Chinese in comments/docstrings — preserve both; do not translate one into the other unless asked.

Testing

  • pytest is wired up (see backend/scripts/test_profile_format.py), but coverage is intentionally minimal. Don't add a heavy test harness without discussing scope.
  • For UI changes, run npm run dev and exercise the feature in a browser; passing type checks or tests do not prove that the feature works here.

Internationalization

  • User-visible strings live in repo-root /locales/*.json (en.json, zh.json, languages.json). frontend/vite.config.js aliases @locales to that root folder so the backend logger and the frontend share the same keys.
  • Backend logger messages are part of the i18n surface: when adding a log that surfaces to users, add a translation key to the locale files rather than writing a raw log string.
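
As a rough illustration of logging through a shared key instead of a raw string, the sketch below resolves a dotted key against a locale dictionary. The translate helper, the dotted-key convention, the {name} placeholder style, and the sample messages are all assumptions; check the actual layout of the locale files before relying on any of them.

```python
def translate(messages: dict, key: str, **params) -> str:
    """Resolve a dotted key in a nested locale dict; fall back to the key itself."""
    node = messages
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return key  # missing translation: show the key, never crash
        node = node[part]
    if not isinstance(node, str):
        return key  # key points at a group, not a message
    return node.format(**params)


# Hypothetical excerpts of en.json and zh.json, inlined for the example.
EN = {"log": {"graph_built": "Graph built with {count} nodes"}}
ZH = {"log": {"graph_built": "图谱构建完成，共 {count} 个节点"}}

print(translate(EN, "log.graph_built", count=3))
print(translate(ZH, "log.graph_built", count=3))
```

The backend would pass the key and parameters to its logger, so the English and Chinese messages stay in sync with what the frontend displays.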

Development Environment

Required Tools

Tool      Version
Node.js   ≥18
Python    ≥3.11, ≤3.12
uv        latest
Neo4j     5.x Community
Docker    optional

Common Commands

# Setup (one-shot)
npm run setup:all

# Dev (backend on :5001, frontend on :3000 with /api proxy)
npm run dev

# Run individually
npm run backend
npm run frontend

# Build frontend
npm run build

# Backend tests
cd backend && uv run python -m pytest

# Full stack (incl. Neo4j)
docker compose up

Key Technical Decisions

  • Neo4j + Graphiti replaces Zep Cloud. Several services still carry the legacy zep_* filename prefix (zep_tools.py, zep_entity_reader.py, zep_graph_memory_updater.py). New code must not depend on Zep Cloud. The ZEP_API_KEY env var is kept (empty string is fine) only for backwards compatibility.
  • Per-project graph isolation via group_id. Every Graphiti read or write must filter by the project's group_id. There is no cross-project graph access.
  • Reasoning-model output stripping. Models like MiniMax and GLM emit <think> blocks and markdown fences; outputs are stripped before JSON parsing (see commit 985f89f). New LLM-output parsers must do the same.
  • Background tasks via Task model, not a queue. Anything taking more than a few seconds returns immediately and tracks progress on a Task object the frontend polls. There is no Celery/RQ/etc.
  • Startup recovery for stuck projects. On boot, _recover_stuck_projects promotes projects in GRAPH_BUILDING to GRAPH_COMPLETED if Neo4j already has their nodes. New long-running task types should follow the same recovery pattern.
  • Subprocess cleanup is centralized. SimulationRunner.register_cleanup() registers a shutdown hook so simulation subprocesses die with the app. Don't spawn subprocesses outside this path.
  • Configuration is a single Python file. backend/app/config.py holds LLM, Neo4j, embedding, chunking, OASIS, and ReportAgent settings. Prefer extending it over scattering env-var reads through the codebase.
  • Default simulation parameters. Max 10 rounds. Twitter actions: CREATE_POST, LIKE_POST, REPOST, FOLLOW, QUOTE_POST, DO_NOTHING. Reddit additionally: CREATE_COMMENT, LIKE_COMMENT, DISLIKE_*, SEARCH_*, TREND, REFRESH, MUTE. Changes go in config.py, not per-call.
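
The reasoning-model output stripping rule above can be sketched as follows. This is an illustrative helper, not the code from commit 985f89f: the name parse_llm_json and the exact regular expressions are assumptions, and it handles only well-formed <think>...</think> blocks and a single fence wrapping the whole payload.

```python
import json
import re

# Built programmatically so the fence marker never appears literally in this example.
FENCE = "`" * 3

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
_FENCE_RE = re.compile(
    r"^" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def parse_llm_json(raw: str):
    """Drop <think> blocks and an enclosing markdown fence, then parse JSON."""
    text = _THINK_RE.sub("", raw).strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    return json.loads(text)  # raises json.JSONDecodeError on malformed output


raw = (
    "<think>The user wants an ontology...</think>\n"
    + FENCE + "json\n"
    + '{"entities": ["Person", "Org"]}\n'
    + FENCE
)
print(parse_llm_json(raw))
```

Output from models that emit neither wrapper passes through unchanged, so one parser can serve every provider.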

Document standards and patterns, not every dependency