Technology Stack

Architecture

A two-tier web app with a long-running background-task core:

Frontend (Vue 3 + Vite) — Single-page UI orchestrating the 5-step workflow. Polls the backend for task progress; renders the knowledge graph with D3.
Backend (Flask + uv) — Stateless HTTP API on top of in-memory Project and Task models. Heavy work (ontology extraction, graph build, profile generation, simulation, report) runs as background tasks tracked through Task and exposed via polling endpoints.
Knowledge graph — Neo4j is the durable store; Graphiti is the write/read layer. All queries are scoped by per-project group_id.
Simulation — CAMEL-OASIS executes in subprocesses; the Flask app communicates with them only through services/simulation_ipc.py.

The system favors process isolation for the simulator and in-memory state with restart recovery for project/task tracking, rather than a classic job queue + persistent DB.
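The in-memory, poll-based task tracking described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual Task model: the names `TaskRegistry`, `submit`, `snapshot`, and `poll_until_done`, and the field names, are all hypothetical, and a plain thread stands in for whatever the backend really uses to run background work.

```python
import threading
import time
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    # Hypothetical fields; the real Task model may differ.
    task_id: str
    status: TaskStatus = TaskStatus.PENDING
    progress: int = 0
    error: Optional[str] = None


class TaskRegistry:
    """In-memory task store guarded by a lock; contents are lost on restart."""

    def __init__(self):
        self._tasks = {}
        self._lock = threading.Lock()

    def submit(self, work):
        """Run work(report_progress) in a daemon thread and return the task id at once."""
        task = Task(task_id=uuid.uuid4().hex)
        with self._lock:
            self._tasks[task.task_id] = task

        def report(percent):
            with self._lock:
                task.progress = percent

        def run():
            with self._lock:
                task.status = TaskStatus.RUNNING
            try:
                work(report)
                with self._lock:
                    task.progress = 100
                    task.status = TaskStatus.COMPLETED
            except Exception as exc:
                with self._lock:
                    task.error = str(exc)
                    task.status = TaskStatus.FAILED

        threading.Thread(target=run, daemon=True).start()
        return task.task_id

    def snapshot(self, task_id):
        """Return a JSON-serializable view, as a polling endpoint might."""
        with self._lock:
            task = self._tasks[task_id]
            return {
                "task_id": task.task_id,
                "status": task.status.value,
                "progress": task.progress,
                "error": task.error,
            }


def poll_until_done(registry, task_id, interval=0.01, timeout=5.0):
    """Mimic the frontend: poll the snapshot until the task finishes or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        snap = registry.snapshot(task_id)
        if snap["status"] in ("completed", "failed"):
            return snap
        time.sleep(interval)
    return registry.snapshot(task_id)
```

In this sketch a request handler would call `submit(...)` and return the task id immediately, while a separate polling endpoint would return `snapshot(task_id)`. Because the registry lives in process memory, a restart loses it, which is why the restart recovery mentioned above matters.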

Core Technologies

Backend language: Python ≥3.11, ≤3.12
Backend framework: Flask 3.0 + flask-cors
Backend tooling: uv for dependency management
Frontend framework: Vue 3.5 + Vue Router 4 + vue-i18n 11
Frontend tooling: Vite 7
Graph DB: Neo4j 5.x (Community) via bolt://
Graph layer: graphiti-core ≥ 0.3
Simulation: camel-oasis 0.2.5 + camel-ai 0.2.78
LLM access: OpenAI SDK against any OpenAI-compatible endpoint

Key Libraries

Only the libraries that shape how new code is written:

openai — Sole LLM client; new providers are integrated by changing LLM_BASE_URL/LLM_MODEL_NAME, not by adding a second SDK.
graphiti-core — All graph reads/writes go through the graphiti_adapter; do not call Neo4j drivers directly from feature code.
camel-oasis / camel-ai — Pinned versions; upgrading either requires re-validating the simulation pipeline end-to-end.
PyMuPDF, charset-normalizer, chardet — File ingestion; encoding detection is mandatory because seed material is frequently non-UTF-8 (notably mixed Chinese/English).
pydantic v2 — Used for structured LLM output / validation.
axios (frontend) — All API calls go through src/api/*.js services with a 5-min timeout and exponential retry; components must not call fetch/axios directly.
d3 v7 — Knowledge-graph visualization in GraphPanel.vue.
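To illustrate why the file-ingestion entry above treats encoding detection as mandatory, here is a standard-library-only sketch that tries a fixed list of candidate encodings. It is a deliberately simplified stand-in: charset-normalizer and chardet detect encodings statistically, whereas this fallback only tries codecs in order. The function name `decode_seed_bytes` and the candidate list are assumptions made for illustration.

```python
from typing import Tuple


def decode_seed_bytes(raw: bytes, candidates=("utf-8-sig", "gb18030")) -> Tuple[str, str]:
    """Decode raw file bytes, returning (text, encoding_label).

    Candidates are tried strictly in order. "utf-8-sig" also accepts plain
    UTF-8 and strips a leading byte-order mark if one is present.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep the pipeline moving, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Mixed Chinese/English text saved in a legacy encoding is not valid UTF-8,
# so the first candidate fails and the second one is used.
legacy_bytes = "中文 mixed text".encode("gb18030")
text, used = decode_seed_bytes(legacy_bytes)
```

A fixed candidate order can misread short inputs, because some legacy byte sequences happen to be valid UTF-8. That is one reason a real detector is preferable for ingestion.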

Development Standards

Type Safety

Python: type hints where the surrounding file uses them. Don't retrofit hints into untyped modules just for consistency.
Frontend: plain JavaScript, not TypeScript. Use JSDoc only when it improves clarity.

Code Quality

By design, this repo enforces no linter or formatter. Match the surrounding file's style. Discuss with the user before introducing ESLint/Prettier/Ruff/Black.
4-space indentation everywhere.
Python: snake_case. Existing files mix English and Chinese in comments/docstrings — preserve both; do not translate one into the other unless asked.

Testing

pytest is set up (see backend/scripts/test_profile_format.py), but coverage is intentionally minimal. Don't add a heavy test harness without discussing scope first.
For UI changes, run npm run dev and exercise the feature in a browser; passing type checks or tests does not prove feature correctness here.

Internationalization

User-visible strings live in repo-root /locales/*.json (en.json, zh.json, languages.json). The frontend/vite.config.js aliases @locales to that root folder so the backend logger and frontend share the same keys.
Backend logger messages are part of the i18n surface — translate keys, not raw log lines, when adding new logs that surface to users.

Development Environment

Required Tools

Tool	Version
Node.js	≥18
Python	≥3.11, ≤3.12
uv	latest
Neo4j	5.x Community
Docker	optional

Common Commands

# Setup (one-shot)
npm run setup:all

# Dev (backend on :5001, frontend on :3000 with /api proxy)
npm run dev

# Run individually
npm run backend
npm run frontend

# Build frontend
npm run build

# Backend tests
cd backend && uv run python -m pytest

# Full stack (incl. Neo4j)
docker compose up

Key Technical Decisions

Neo4j + Graphiti replaces Zep Cloud. Several services still carry the legacy zep_* filename prefix (zep_tools.py, zep_entity_reader.py, zep_graph_memory_updater.py). New code must not depend on Zep Cloud. The ZEP_API_KEY env var is kept (empty string is fine) only for backwards compatibility.
Per-project graph isolation via group_id. Every Graphiti read or write must filter by the project's group_id. There is no cross-project graph access.
Reasoning-model output stripping. Models like MiniMax and GLM emit <think> blocks and markdown fences; outputs are stripped before JSON parsing (see commit 985f89f). New LLM-output parsers must do the same.
Background tasks via Task model, not a queue. Anything taking more than a few seconds returns immediately and tracks progress on a Task object the frontend polls. There is no Celery/RQ/etc.
Startup recovery for stuck projects. On boot, _recover_stuck_projects promotes projects in GRAPH_BUILDING to GRAPH_COMPLETED if Neo4j already has their nodes. New long-running task types should follow the same recovery pattern.
Subprocess cleanup is centralized. SimulationRunner.register_cleanup() registers a shutdown hook so simulation subprocesses die with the app. Don't spawn subprocesses outside this path.
Configuration is a single Python file. backend/app/config.py holds LLM, Neo4j, embedding, chunking, OASIS, and ReportAgent settings. Prefer extending it over scattering env-var reads through the codebase.
Default simulation parameters. Max 10 rounds. Twitter actions: CREATE_POST, LIKE_POST, REPOST, FOLLOW, QUOTE_POST, DO_NOTHING. Reddit additionally: CREATE_COMMENT, LIKE_COMMENT, DISLIKE_*, SEARCH_*, TREND, REFRESH, MUTE. Changes go in config.py, not per-call.
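The reasoning-model output stripping described in the list above might look roughly like the sketch below. It is not the code from the referenced commit: the function name `parse_llm_json` and the exact patterns are assumptions. It handles only closed think blocks and a single fence wrapping the whole payload.

```python
import json
import re

# Built programmatically so this example never contains a literal fence marker.
FENCE_MARK = "`" * 3

# Matches a closed <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)

# Matches text that is entirely wrapped in one markdown fence,
# with an optional language tag such as "json" after the opening marker.
_FENCED = re.compile(
    r"^" + re.escape(FENCE_MARK) + r"[\w-]*[ \t]*\n(.*?)\n?" + re.escape(FENCE_MARK) + r"\s*$",
    re.DOTALL,
)


def parse_llm_json(raw: str):
    """Strip think blocks and a wrapping markdown fence, then parse JSON.

    Raises json.JSONDecodeError if the remaining text is not valid JSON.
    """
    text = _THINK_BLOCK.sub("", raw).strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1).strip()
    return json.loads(text)


# A reply shaped like what a reasoning model might emit.
reply = (
    "<think>The user wants an object {not json}.</think>\n"
    + FENCE_MARK + "json\n"
    + '{"entities": ["A", "B"]}\n'
    + FENCE_MARK
)
parsed = parse_llm_json(reply)
```

Routing every new LLM-output parser through one helper like this keeps the stripping rules in a single place, so a newly supported model with different quirks needs only one change.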

Document standards and patterns, not every dependency
