From d69d1976d8eb04fe4f7e5d5a13a8eb814dc1aa83 Mon Sep 17 00:00:00 2001 From: Ubuntu Date: Sun, 26 Apr 2026 17:40:18 +0000 Subject: [PATCH] docs(plan): add Fase 1 implementation plan for persistence infrastructure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers 9 tasks with TDD approach: - Task 1: Dependencies (SQLAlchemy, Alembic, Azure Blob, bcrypt, flask-jwt-extended) - Task 2: Config vars (DATABASE_URL, STORAGE_TYPE, AZURE_STORAGE_*, JWT_*) - Task 3: SQLAlchemy models (all 11 entities) - Task 4: Alembic migrations - Task 5: StorageService (Protocol + LocalFS + AzureBlob + factory) - Task 6: Flask app factory injection - Task 7: TaskManager refactor → DB - Task 8: ProjectManager refactor → DB + Storage - Task 9: Test suite + e2e verification Co-Authored-By: Claude Sonnet 4.6 --- .../plans/2026-04-26-persistencia-fase1.md | 1743 +++++++++++++++++ 1 file changed, 1743 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-26-persistencia-fase1.md diff --git a/docs/superpowers/plans/2026-04-26-persistencia-fase1.md b/docs/superpowers/plans/2026-04-26-persistencia-fase1.md new file mode 100644 index 00000000..bf7562b3 --- /dev/null +++ b/docs/superpowers/plans/2026-04-26-persistencia-fase1.md @@ -0,0 +1,1743 @@ +# Persistència Fase 1: Infraestructura Base + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Substituir la persistència JSON+memòria per SQLAlchemy 2.x (SQLite dev / PostgreSQL prod) i un `StorageService` abstracte (LocalFS dev / Azure Blob prod), de manera que projectes i tasques sobrevisquin reinicis del servidor. + +**Architecture:** S'afegeix una capa de BD via SQLAlchemy sota els `Manager` existents. `ProjectManager` i `TaskManager` es refactoritzen per llegir/escriure a la BD en comptes de JSON i memòria. 
L'`StorageService` substitueix totes les operacions directes de fitxers. L'app factory (`create_app`) injecta la sessió DB i el storage com a extensions Flask. + +**Tech Stack:** SQLAlchemy 2.x, Alembic, `flask-sqlalchemy`, `azure-storage-blob`, `bcrypt` (per a fases posteriors, s'afegeix al `pyproject.toml` ara) + +--- + +## Mapa de fitxers + +### Nous fitxers a crear + +| Fitxer | Responsabilitat | +|--------|-----------------| +| `backend/app/db.py` | Engine SQLAlchemy, `Base`, `get_db()` session factory, `init_db()` | +| `backend/app/models/db_models.py` | Tots els models SQLAlchemy (Project, ProjectFile, Ontology, Graph, Simulation, Report, Task, SystemConfig, User, InvitationToken, PasswordResetToken) | +| `backend/app/storage/__init__.py` | Exporta `StorageService`, `get_storage()` | +| `backend/app/storage/protocol.py` | `StorageService` Protocol (interfície) | +| `backend/app/storage/local.py` | `LocalFSStorage` (pathlib) | +| `backend/app/storage/azure_blob.py` | `AzureBlobStorage` (azure-storage-blob) | +| `backend/app/storage/factory.py` | `create_storage_service()` — selecció per STORAGE_TYPE | +| `backend/alembic.ini` | Config Alembic | +| `backend/alembic/env.py` | Entorn Alembic (llegeix DATABASE_URL) | +| `backend/alembic/versions/0001_initial_schema.py` | Migració inicial (totes les taules) | +| `backend/tests/test_db_models.py` | Tests dels models SQLAlchemy | +| `backend/tests/test_storage.py` | Tests de LocalFSStorage | +| `backend/tests/test_project_manager_db.py` | Tests de ProjectManager amb BD | +| `backend/tests/test_task_manager_db.py` | Tests de TaskManager amb BD | + +### Fitxers a modificar + +| Fitxer | Canvi | +|--------|-------| +| `backend/pyproject.toml` | Afegir `sqlalchemy`, `alembic`, `flask-sqlalchemy`, `azure-storage-blob`, `bcrypt`, `flask-jwt-extended` | +| `backend/app/config.py` | Afegir `DATABASE_URL`, `STORAGE_TYPE`, `STORAGE_LOCAL_PATH`, `AZURE_STORAGE_*`, `JWT_SECRET`, `JWT_REFRESH_SECRET` | +| 
`backend/app/__init__.py` | Inicialitzar DB + Storage a `create_app()`; substituir auth provisional per `flask-jwt-extended` stub | +| `backend/app/models/project.py` | Refactoritzar `ProjectManager` per usar BD + `StorageService` | +| `backend/app/models/task.py` | Refactoritzar `TaskManager` per usar BD | +| `backend/tests/conftest.py` | Afegir fixtures de BD en memòria i storage temporal | + +--- + +## Task 1: Afegir dependències + +**Files:** +- Modify: `backend/pyproject.toml` + +- [ ] **Step 1: Afegir dependències al pyproject.toml** + +```toml +# backend/pyproject.toml — secció dependencies, afegir: + "sqlalchemy>=2.0.0", + "alembic>=1.13.0", + "flask-sqlalchemy>=3.1.0", + "azure-storage-blob>=12.19.0", + "bcrypt>=4.1.0", + "flask-jwt-extended>=4.6.0", +``` + +- [ ] **Step 2: Instal·lar dependències** + +```bash +cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend +uv sync +``` + +Expected: sense errors. `uv sync` actualitza el `.venv`. + +- [ ] **Step 3: Verificar importació** + +```bash +cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend +.venv/bin/python -c "import sqlalchemy; import alembic; import flask_sqlalchemy; print('OK')" +``` + +Expected: `OK` + +- [ ] **Step 4: Commit** + +```bash +git add backend/pyproject.toml +git commit -m "chore(deps): add SQLAlchemy, Alembic, Azure Blob, bcrypt, flask-jwt-extended" +``` + +--- + +## Task 2: Afegir variables de configuració + +**Files:** +- Modify: `backend/app/config.py` +- Create: `backend/tests/test_config.py` + +- [ ] **Step 1: Escriure test de la nova configuració** + +Crear un fitxer nou, `backend/tests/test_config.py`, separat de `backend/tests/test_db_models.py` (que es crea al Task 3): + +```python +# backend/tests/test_config.py +import os +import pytest + + +def test_database_url_default(): + """DATABASE_URL per defecte ha de ser SQLite""" + from backend.app.config import Config + assert Config.DATABASE_URL.startswith("sqlite") + + +def test_storage_type_default(): + from backend.app.config import Config + assert 
Config.STORAGE_TYPE == "local" + + +def test_storage_local_path_exists(): + from backend.app.config import Config + assert Config.STORAGE_LOCAL_PATH is not None +``` + +- [ ] **Step 2: Executar test per verificar que falla** + +```bash +cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia +.venv/bin/pytest backend/tests/test_config.py -v 2>/dev/null || \ +backend/.venv/bin/pytest backend/tests/test_config.py -v +``` + +Expected: `AttributeError: type object 'Config' has no attribute 'DATABASE_URL'` + +- [ ] **Step 3: Afegir configuració a config.py** + +Afegir al final de la classe `Config` (just abans del mètode `validate`): + +```python + # ── Persistència ────────────────────────────────────────────── + # Base de dades + DATABASE_URL = os.environ.get('DATABASE_URL', 'sqlite:///mirofish_dev.db') + + # Storage de fitxers + STORAGE_TYPE = os.environ.get('STORAGE_TYPE', 'local') # local | azure + STORAGE_LOCAL_PATH = os.environ.get( + 'STORAGE_LOCAL_PATH', + os.path.join(os.path.dirname(__file__), '../uploads') + ) + AZURE_STORAGE_CONNECTION_STRING = os.environ.get('AZURE_STORAGE_CONNECTION_STRING', '') + AZURE_STORAGE_CONTAINER = os.environ.get('AZURE_STORAGE_CONTAINER', 'mirofish') + + # JWT (per a la Fase 2 d'autenticació — definits aquí perquè flask-jwt-extended els necessita en create_app) + JWT_SECRET_KEY = os.environ.get('JWT_SECRET', 'change-me-in-production') + JWT_REFRESH_SECRET_KEY = os.environ.get('JWT_REFRESH_SECRET', 'change-me-refresh-in-production') + JWT_ACCESS_TOKEN_EXPIRES_HOURS = int(os.environ.get('JWT_ACCESS_TOKEN_EXPIRES_HOURS', '8')) + JWT_REFRESH_TOKEN_EXPIRES_DAYS = int(os.environ.get('JWT_REFRESH_TOKEN_EXPIRES_DAYS', '7')) +``` + +- [ ] **Step 4: Executar test per verificar que passa** + +```bash +backend/.venv/bin/pytest backend/tests/test_config.py -v +``` + +Expected: 3 passed + +- [ ] **Step 5: Commit** + +```bash +git add backend/app/config.py backend/tests/test_config.py +git commit -m "feat(config): add DATABASE_URL, STORAGE_TYPE, 
AZURE_STORAGE_*, JWT config vars" +``` + +--- + +## Task 3: Crear models SQLAlchemy + +**Files:** +- Create: `backend/app/models/db_models.py` +- Create: `backend/app/db.py` +- Create: `backend/tests/test_db_models.py` + +- [ ] **Step 1: Crear backend/app/db.py** + +```python +# backend/app/db.py +"""SQLAlchemy engine, session factory i Base declarativa.""" +from contextlib import contextmanager +from sqlalchemy import create_engine +from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session +from typing import Generator + + +class Base(DeclarativeBase): + pass + + +_engine = None +_SessionLocal = None + + +def init_db(database_url: str) -> None: + global _engine, _SessionLocal + connect_args = {"check_same_thread": False} if database_url.startswith("sqlite") else {} + _engine = create_engine(database_url, connect_args=connect_args, echo=False) + _SessionLocal = sessionmaker(bind=_engine, autocommit=False, autoflush=False) + Base.metadata.create_all(_engine) + + +@contextmanager +def get_session() -> Generator[Session, None, None]: + """Context manager de sessió SQLAlchemy.""" + if _SessionLocal is None: + raise RuntimeError("Database not initialized. 
Call init_db() first.") + db = _SessionLocal() + try: + yield db + except Exception: + db.rollback() + raise + finally: + db.close() +``` + +- [ ] **Step 2: Crear backend/app/models/db_models.py** + +```python +# backend/app/models/db_models.py +"""Models SQLAlchemy per a tota la persistència de MiroFish.""" +import uuid +from datetime import datetime +from typing import Optional +from sqlalchemy import ( + String, Integer, Text, Boolean, DateTime, JSON, + ForeignKey, UniqueConstraint +) +from sqlalchemy.orm import Mapped, mapped_column, relationship +from ..db import Base + + +def _uuid() -> str: + return str(uuid.uuid4()) + + +def _now() -> datetime: + return datetime.utcnow() + + +class UserModel(Base): + __tablename__ = "users" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False) + name: Mapped[str] = mapped_column(String(255), nullable=False, default="") + password_hash: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + role: Mapped[str] = mapped_column(String(20), nullable=False, default="user") + status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending") + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + + projects: Mapped[list["ProjectModel"]] = relationship( + back_populates="owner", cascade="all, delete-orphan" + ) + invitation_tokens: Mapped[list["InvitationTokenModel"]] = relationship( + back_populates="user", cascade="all, delete-orphan" + ) + password_reset_tokens: Mapped[list["PasswordResetTokenModel"]] = relationship( + back_populates="user", cascade="all, delete-orphan" + ) + + +class ProjectModel(Base): + __tablename__ = "projects" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + user_id: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("users.id", 
ondelete="CASCADE"), nullable=True + ) + name: Mapped[str] = mapped_column(String(255), nullable=False, default="Unnamed Project") + status: Mapped[str] = mapped_column(String(50), nullable=False, default="created") + analysis_summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + simulation_requirement: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + chunk_size: Mapped[int] = mapped_column(Integer, default=500) + chunk_overlap: Mapped[int] = mapped_column(Integer, default=50) + active_task_id: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("tasks.id", ondelete="SET NULL"), nullable=True + ) + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + + owner: Mapped[Optional["UserModel"]] = relationship(back_populates="projects") + files: Mapped[list["ProjectFileModel"]] = relationship( + back_populates="project", cascade="all, delete-orphan" + ) + ontologies: Mapped[list["OntologyModel"]] = relationship( + back_populates="project", cascade="all, delete-orphan" + ) + graphs: Mapped[list["GraphModel"]] = relationship( + back_populates="project", cascade="all, delete-orphan" + ) + simulations: Mapped[list["SimulationModel"]] = relationship( + back_populates="project", cascade="all, delete-orphan" + ) + reports: Mapped[list["ReportModel"]] = relationship( + back_populates="project", cascade="all, delete-orphan" + ) + + +class ProjectFileModel(Base): + __tablename__ = "project_files" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + project_id: Mapped[str] = mapped_column( + String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False + ) + original_name: Mapped[str] = mapped_column(String(255), nullable=False) + storage_path: Mapped[str] = mapped_column(Text, nullable=False) + size: Mapped[int] = mapped_column(Integer, default=0) + mime_type: Mapped[str] = 
mapped_column(String(100), default="application/octet-stream") + file_type: Mapped[str] = mapped_column(String(30), default="upload") # upload | extracted_text + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + + project: Mapped["ProjectModel"] = relationship(back_populates="files") + + +class OntologyModel(Base): + __tablename__ = "ontologies" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + project_id: Mapped[str] = mapped_column( + String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False + ) + version: Mapped[int] = mapped_column(Integer, default=1) + entity_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True) + edge_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True) + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + + project: Mapped["ProjectModel"] = relationship(back_populates="ontologies") + graphs: Mapped[list["GraphModel"]] = relationship(back_populates="ontology") + + +class GraphModel(Base): + __tablename__ = "graphs" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + project_id: Mapped[str] = mapped_column( + String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False + ) + ontology_id: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("ontologies.id", ondelete="SET NULL"), nullable=True + ) + backend: Mapped[str] = mapped_column(String(20), default="zep") # zep | graphiti + external_id: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + status: Mapped[str] = mapped_column(String(20), default="building") # building | ready | failed + node_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True) + edge_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True) + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + + project: 
Mapped["ProjectModel"] = relationship(back_populates="graphs") + ontology: Mapped[Optional["OntologyModel"]] = relationship(back_populates="graphs") + simulations: Mapped[list["SimulationModel"]] = relationship(back_populates="graph") + reports: Mapped[list["ReportModel"]] = relationship(back_populates="graph") + + +class SimulationModel(Base): + __tablename__ = "simulations" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + project_id: Mapped[str] = mapped_column( + String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False + ) + graph_id: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True + ) + status: Mapped[str] = mapped_column(String(30), default="prepared") + platform: Mapped[str] = mapped_column(String(20), default="twitter") # twitter | reddit | both + config: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True) + profiles_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + db_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + actions_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + rounds_total: Mapped[Optional[int]] = mapped_column(Integer, nullable=True) + rounds_completed: Mapped[int] = mapped_column(Integer, default=0) + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + + project: Mapped["ProjectModel"] = relationship(back_populates="simulations") + graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="simulations") + reports: Mapped[list["ReportModel"]] = relationship(back_populates="simulation") + + +class ReportModel(Base): + __tablename__ = "reports" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + project_id: Mapped[str] = mapped_column( + String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False + ) + 
simulation_id: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("simulations.id", ondelete="SET NULL"), nullable=True + ) + graph_id: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True + ) + status: Mapped[str] = mapped_column(String(30), default="generating") + outline: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True) + storage_prefix: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + + project: Mapped["ProjectModel"] = relationship(back_populates="reports") + simulation: Mapped[Optional["SimulationModel"]] = relationship(back_populates="reports") + graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="reports") + + +class TaskModel(Base): + __tablename__ = "tasks" + + id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid) + task_type: Mapped[str] = mapped_column(String(100), nullable=False) + entity_type: Mapped[Optional[str]] = mapped_column(String(50), nullable=True) + entity_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True) + status: Mapped[str] = mapped_column(String(20), default="pending") + progress: Mapped[int] = mapped_column(Integer, default=0) + message: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + result: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True) + error: Mapped[Optional[str]] = mapped_column(Text, nullable=True) + progress_detail: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True) + created_at: Mapped[datetime] = mapped_column(DateTime, default=_now) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + + +class SystemConfigModel(Base): + __tablename__ = "system_config" + + key: Mapped[str] = mapped_column(String(100), primary_key=True) + value: Mapped[Optional[str]] = 
mapped_column(Text, nullable=True) + value_type: Mapped[str] = mapped_column(String(20), default="string") + group: Mapped[str] = mapped_column(String(50), default="general") + label: Mapped[str] = mapped_column(String(255), default="") + description: Mapped[str] = mapped_column(Text, default="") + is_secret: Mapped[bool] = mapped_column(Boolean, default=False) + updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now) + updated_by: Mapped[Optional[str]] = mapped_column( + String(36), ForeignKey("users.id", ondelete="SET NULL"), nullable=True + ) + + +class InvitationTokenModel(Base): + __tablename__ = "invitation_tokens" + + token: Mapped[str] = mapped_column(String(36), primary_key=True) + user_id: Mapped[str] = mapped_column( + String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False + ) + expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False) + used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True) + + user: Mapped["UserModel"] = relationship(back_populates="invitation_tokens") + + +class PasswordResetTokenModel(Base): + __tablename__ = "password_reset_tokens" + + token: Mapped[str] = mapped_column(String(36), primary_key=True) + user_id: Mapped[str] = mapped_column( + String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False + ) + expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False) + used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True) + + user: Mapped["UserModel"] = relationship(back_populates="password_reset_tokens") +``` + +- [ ] **Step 3: Crear test dels models** + +```python +# backend/tests/test_db_models.py +import pytest +from sqlalchemy import create_engine +from sqlalchemy.orm import sessionmaker +from backend.app.db import Base, init_db, get_session +from backend.app.models.db_models import ( + ProjectModel, TaskModel, OntologyModel, GraphModel, + SimulationModel, ReportModel, UserModel +) + + +@pytest.fixture +def 
db_session(): + """Sessió SQLite en memòria per a tests.""" + from sqlalchemy import event + from backend.app import db as db_module + db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False}) + + # SQLite només aplica ON DELETE SET NULL / CASCADE si s'activa PRAGMA foreign_keys a cada connexió + @event.listens_for(db_module._engine, "connect") + def _enable_sqlite_fks(dbapi_conn, _record): + dbapi_conn.execute("PRAGMA foreign_keys=ON") + + db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False) + Base.metadata.create_all(db_module._engine) + session = db_module._SessionLocal() + yield session + session.close() + Base.metadata.drop_all(db_module._engine) + db_module._engine = None + db_module._SessionLocal = None + + +def test_create_project(db_session): + proj = ProjectModel(id="proj-1", name="Test Project") + db_session.add(proj) + db_session.commit() + result = db_session.get(ProjectModel, "proj-1") + assert result.name == "Test Project" + assert result.status == "created" + assert result.chunk_size == 500 + + +def test_create_task(db_session): + task = TaskModel(id="task-1", task_type="graph_build", entity_type="project", entity_id="proj-1") + db_session.add(task) + db_session.commit() + result = db_session.get(TaskModel, "task-1") + assert result.status == "pending" + assert result.progress == 0 + + +def test_project_cascade_delete(db_session): + proj = ProjectModel(id="proj-del", name="Del Project") + db_session.add(proj) + db_session.flush() + ont = OntologyModel(id="ont-1", project_id="proj-del", version=1) + db_session.add(ont) + db_session.commit() + db_session.delete(proj) + db_session.commit() + assert db_session.get(OntologyModel, "ont-1") is None + + +def test_task_set_null_on_delete(db_session): + task = TaskModel(id="task-del", task_type="graph_build") + proj = ProjectModel(id="proj-2", name="P2", active_task_id="task-del") + db_session.add_all([task, proj]) + db_session.commit() + db_session.delete(task) + db_session.commit() + db_session.expire(proj) + refreshed = db_session.get(ProjectModel, "proj-2") + assert refreshed.active_task_id is None + + +def test_graph_linked_to_ontology(db_session): + proj = ProjectModel(id="proj-g", name="Graph 
Project") + ont = OntologyModel(id="ont-g", project_id="proj-g", version=1) + graph = GraphModel(id="graph-1", project_id="proj-g", ontology_id="ont-g", backend="zep") + db_session.add_all([proj, ont, graph]) + db_session.commit() + result = db_session.get(GraphModel, "graph-1") + assert result.ontology_id == "ont-g" + assert result.backend == "zep" +``` + +- [ ] **Step 4: Executar tests dels models** + +```bash +cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia +backend/.venv/bin/pytest backend/tests/test_db_models.py -v +``` + +Expected: 5 passed + +- [ ] **Step 5: Commit** + +```bash +git add backend/app/db.py backend/app/models/db_models.py backend/tests/test_db_models.py +git commit -m "feat(db): add SQLAlchemy Base, session factory, and all ORM models" +``` + +--- + +## Task 4: Configurar Alembic + +**Files:** +- Create: `backend/alembic.ini` +- Create: `backend/alembic/env.py` +- Create: `backend/alembic/script.py.mako` +- Create: `backend/alembic/versions/0001_initial_schema.py` + +- [ ] **Step 1: Inicialitzar Alembic** + +```bash +cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend +.venv/bin/alembic init alembic +``` + +Expected: crea `alembic/` i `alembic.ini` + +- [ ] **Step 2: Actualitzar alembic.ini** + +Substituir la línia `sqlalchemy.url = ...` a `alembic.ini`: + +```ini +# Canviar aquesta línia: +sqlalchemy.url = driver://user:pass@localhost/dbname +# Per: +sqlalchemy.url = sqlite:///mirofish_dev.db +``` + +I comprovar que sota `[alembic]` la clau `script_location` apunta al directori `alembic` (l'`alembic init` ja la genera; no s'ha de duplicar): +```ini +script_location = alembic +``` + +- [ ] **Step 3: Actualitzar alembic/env.py** + +Substituir el contingut complet d'`alembic/env.py`: + +```python +# backend/alembic/env.py +import os +import sys +from logging.config import fileConfig +from sqlalchemy import engine_from_config, pool +from alembic import context + +# Afegir el backend al path perquè els imports funcionin +sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) + +from app.db import Base +import 
app.models.db_models # noqa: F401 — registra tots els models al Base + +config = context.config + +# Llegir DATABASE_URL de l'entorn (prioritat sobre alembic.ini) +db_url = os.environ.get('DATABASE_URL', config.get_main_option('sqlalchemy.url')) +config.set_main_option('sqlalchemy.url', db_url) + +if config.config_file_name is not None: + fileConfig(config.config_file_name) + +target_metadata = Base.metadata + + +def run_migrations_offline(): + url = config.get_main_option("sqlalchemy.url") + context.configure(url=url, target_metadata=target_metadata, literal_binds=True) + with context.begin_transaction(): + context.run_migrations() + + +def run_migrations_online(): + connectable = engine_from_config( + config.get_section(config.config_ini_section, {}), + prefix="sqlalchemy.", + poolclass=pool.NullPool, + ) + with connectable.connect() as connection: + context.configure(connection=connection, target_metadata=target_metadata) + with context.begin_transaction(): + context.run_migrations() + + +if context.is_offline_mode(): + run_migrations_offline() +else: + run_migrations_online() +``` + +- [ ] **Step 4: Generar migració inicial** + +```bash +cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend +.venv/bin/alembic revision --autogenerate -m "initial_schema" +``` + +Expected: crea `alembic/versions/XXXX_initial_schema.py` amb totes les taules + +- [ ] **Step 5: Aplicar migració** + +Des del mateix directori `backend/`: + +```bash +.venv/bin/alembic upgrade head +``` + +Expected: `Running upgrade -> XXXX, initial_schema` + +- [ ] **Step 6: Verificar que la BD té les taules** + +```bash +.venv/bin/python -c " +import sqlite3 +conn = sqlite3.connect('mirofish_dev.db') +tables = conn.execute(\"SELECT name FROM sqlite_master WHERE type='table'\").fetchall() +print([t[0] for t in tables]) +conn.close() +" +``` + +Expected: llista que inclou `projects`, `tasks`, `users`, `ontologies`, `graphs`, `simulations`, `reports`, `system_config` + +- [ ] **Step 7: Commit** + +```bash +git add 
backend/alembic.ini backend/alembic/ +git commit -m "feat(alembic): add initial schema migration for all SQLAlchemy models" +``` + +--- + +## Task 5: Implementar StorageService + +**Files:** +- Create: `backend/app/storage/__init__.py` +- Create: `backend/app/storage/protocol.py` +- Create: `backend/app/storage/local.py` +- Create: `backend/app/storage/azure_blob.py` +- Create: `backend/app/storage/factory.py` +- Create: `backend/tests/test_storage.py` + +- [ ] **Step 1: Crear el directori i el Protocol** + +```python +# backend/app/storage/protocol.py +"""Interfície abstracta per a la capa de storage de fitxers.""" +from typing import IO, Iterator, Protocol, runtime_checkable + + +@runtime_checkable +class StorageService(Protocol): + def upload(self, path: str, data: bytes | IO, content_type: str = "application/octet-stream") -> None: + ... + + def download(self, path: str) -> bytes: + ... + + def download_stream(self, path: str) -> IO: + ... + + def delete(self, path: str) -> None: + ... + + def delete_prefix(self, prefix: str) -> None: + """Esborra tots els fitxers que comencen per prefix.""" + ... + + def exists(self, path: str) -> bool: + ... + + def list(self, prefix: str = "") -> list[str]: + """Retorna paths relatius sota el prefix.""" + ... + + def public_url(self, path: str) -> str | None: + """URL pública si el backend ho suporta, None si no.""" + ... 
+``` + +- [ ] **Step 2: Crear LocalFSStorage** + +```python +# backend/app/storage/local.py +"""Adapter de storage per a filesystem local.""" +import io +import os +import shutil +from pathlib import Path +from .protocol import StorageService + + +class LocalFSStorage: + """Implementació de StorageService per a filesystem local.""" + + def __init__(self, base_path: str) -> None: + self._base = Path(base_path).resolve() + self._base.mkdir(parents=True, exist_ok=True) + + def _safe_path(self, relative: str) -> Path: + """Resol el path i valida que estigui dins del base per evitar path traversal.""" + resolved = (self._base / relative).resolve() + # is_relative_to compara components del path; un startswith sobre strings acceptaria directoris germans (p. ex. /data/uploads_evil per a /data/uploads) + if not resolved.is_relative_to(self._base): + raise ValueError(f"Path traversal detectat: {relative!r}") + return resolved + + def upload(self, path: str, data: bytes | io.IOBase, content_type: str = "application/octet-stream") -> None: + dest = self._safe_path(path) + dest.parent.mkdir(parents=True, exist_ok=True) + if isinstance(data, bytes): + dest.write_bytes(data) + else: + with open(dest, "wb") as f: + shutil.copyfileobj(data, f) + + def download(self, path: str) -> bytes: + return self._safe_path(path).read_bytes() + + def download_stream(self, path: str) -> io.BytesIO: + return io.BytesIO(self.download(path)) + + def delete(self, path: str) -> None: + p = self._safe_path(path) + if p.exists(): + p.unlink() + + def delete_prefix(self, prefix: str) -> None: + p = self._safe_path(prefix) + if p.is_dir(): + shutil.rmtree(p) + elif p.exists(): + p.unlink() + + def exists(self, path: str) -> bool: + return self._safe_path(path).exists() + + def list(self, prefix: str = "") -> list[str]: + base = self._safe_path(prefix) if prefix else self._base + if not base.exists(): + return [] + result = [] + for p in base.rglob("*"): + if p.is_file(): + result.append(str(p.relative_to(self._base))) + return result + + def public_url(self, path: str) -> str | None: + return None +``` + +- [ ] **Step 3: Crear AzureBlobStorage** + 
+```python +# backend/app/storage/azure_blob.py +"""Adapter de storage per a Azure Blob Storage.""" +import io +from .protocol import StorageService + + +class AzureBlobStorage: + """Implementació de StorageService per a Azure Blob Storage.""" + + def __init__(self, connection_string: str, container_name: str) -> None: + from azure.storage.blob import BlobServiceClient + self._client = BlobServiceClient.from_connection_string(connection_string) + self._container = container_name + self._ensure_container() + + def _ensure_container(self) -> None: + container_client = self._client.get_container_client(self._container) + if not container_client.exists(): + container_client.create_container() + + def _blob_client(self, path: str): + return self._client.get_blob_client(container=self._container, blob=path) + + def upload(self, path: str, data: bytes | io.IOBase, content_type: str = "application/octet-stream") -> None: + from azure.storage.blob import ContentSettings + blob = self._blob_client(path) + # upload_blob accepta tant bytes com streams; content_settings ha de ser un objecte ContentSettings, no un dict + blob.upload_blob(data, overwrite=True, content_settings=ContentSettings(content_type=content_type)) + + def download(self, path: str) -> bytes: + return self._blob_client(path).download_blob().readall() + + def download_stream(self, path: str) -> io.BytesIO: + return io.BytesIO(self.download(path)) + + def delete(self, path: str) -> None: + self._blob_client(path).delete_blob(delete_snapshots="include") + + def delete_prefix(self, prefix: str) -> None: + container = self._client.get_container_client(self._container) + blobs = container.list_blobs(name_starts_with=prefix) + for blob in blobs: + container.delete_blob(blob.name, delete_snapshots="include") + + def exists(self, path: str) -> bool: + return self._blob_client(path).exists() + + def list(self, prefix: str = "") -> list[str]: + container = self._client.get_container_client(self._container) + return [b.name for b in 
container.list_blobs(name_starts_with=prefix)]
+
+    def public_url(self, path: str) -> str | None:
+        return self._blob_client(path).url
+```
+
+- [ ] **Step 4: Create the factory**
+
+```python
+# backend/app/storage/factory.py
+"""Selects the StorageService implementation based on STORAGE_TYPE."""
+import os
+from .protocol import StorageService
+
+
+def create_storage_service() -> StorageService:
+    storage_type = os.environ.get("STORAGE_TYPE", "local")
+    match storage_type:
+        case "azure":
+            from .azure_blob import AzureBlobStorage
+            conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING", "")
+            container = os.environ.get("AZURE_STORAGE_CONTAINER", "mirofish")
+            if not conn_str:
+                raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING is not set for STORAGE_TYPE=azure")
+            return AzureBlobStorage(conn_str, container)
+        case _:
+            # Any other value falls back to local filesystem storage.
+            from .local import LocalFSStorage
+            base = os.environ.get("STORAGE_LOCAL_PATH",
+                                  os.path.join(os.path.dirname(__file__), "../../../uploads"))
+            return LocalFSStorage(base)
+```
+
+- [ ] **Step 5: Create the package's __init__.py**
+
+```python
+# backend/app/storage/__init__.py
+from .protocol import StorageService
+from .factory import create_storage_service
+
+__all__ = ["StorageService", "create_storage_service"]
+```
+
+- [ ] **Step 6: Write the LocalFSStorage tests**
+
+```python
+# backend/tests/test_storage.py
+import io
+import pytest
+from backend.app.storage.local import LocalFSStorage
+
+
+@pytest.fixture
+def storage(tmp_path):
+    return LocalFSStorage(str(tmp_path))
+
+
+def test_upload_and_download_bytes(storage):
+    storage.upload("foo/bar.txt", b"hello world", "text/plain")
+    assert storage.download("foo/bar.txt") == b"hello world"
+
+
+def test_upload_and_download_stream(storage):
+    data = io.BytesIO(b"stream data")
+    storage.upload("test/stream.bin", data)
+    result = storage.download("test/stream.bin")
+    assert result == b"stream data"
+
+
+def test_exists(storage):
+    assert not
storage.exists("not/there.txt")
+    storage.upload("yes.txt", b"x")
+    assert storage.exists("yes.txt")
+
+
+def test_delete(storage):
+    storage.upload("del.txt", b"bye")
+    storage.delete("del.txt")
+    assert not storage.exists("del.txt")
+
+
+def test_delete_prefix(storage):
+    storage.upload("dir/a.txt", b"a")
+    storage.upload("dir/b.txt", b"b")
+    storage.delete_prefix("dir")
+    assert not storage.exists("dir/a.txt")
+    assert not storage.exists("dir/b.txt")
+
+
+def test_list(storage):
+    storage.upload("root/x.txt", b"x")
+    storage.upload("root/y.txt", b"y")
+    paths = storage.list("root")
+    assert len(paths) == 2
+    assert all("root" in p for p in paths)
+
+
+def test_path_traversal_blocked(storage):
+    with pytest.raises(ValueError, match="Path traversal"):
+        storage._safe_path("../../etc/passwd")
+
+
+def test_public_url_is_none(storage):
+    storage.upload("f.txt", b"x")
+    assert storage.public_url("f.txt") is None
+```
+
+- [ ] **Step 7: Run the storage tests**
+
+```bash
+cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
+backend/.venv/bin/pytest backend/tests/test_storage.py -v
+```
+
+Expected: 8 passed
+
+- [ ] **Step 8: Commit**
+
+```bash
+git add backend/app/storage/ backend/tests/test_storage.py
+git commit -m "feat(storage): add StorageService protocol, LocalFSStorage, AzureBlobStorage, factory"
+```
+
+---
+
+## Task 6: Inject DB and Storage into Flask
+
+**Files:**
+- Modify: `backend/app/__init__.py`
+
+- [ ] **Step 1: Update create_app to initialize DB and Storage**
+
+Add right after `app = Flask(__name__)` and `app.config.from_object(...)`:
+
+```python
+    # Initialize the DB
+    from .db import init_db
+    init_db(app.config['DATABASE_URL'])
+
+    # Initialize Storage
+    from .storage import create_storage_service
+    app.extensions['storage'] = create_storage_service()
+```
+
+Then add a helper function at the end of the file (outside `create_app`):
+
+```python
+def get_storage():
+    """Access the StorageService from any Flask context."""
+    from flask import current_app
+    return current_app.extensions['storage']
+```
+
+- [ ] **Step 2: Verify that the app starts correctly**
+
+```bash
+cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
+DATABASE_URL=sqlite:///test_startup.db STORAGE_TYPE=local \
+  backend/.venv/bin/python -c "
+from backend.app import create_app
+app = create_app()
+print('App created OK')
+print('Storage:', app.extensions.get('storage'))
+"
+```
+
+Expected: `App created OK` + `Storage: `
+
+- [ ] **Step 3: Clean up the test file**
+
+```bash
+rm -f /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend/test_startup.db
+```
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/app/__init__.py
+git commit -m "feat(app): inject SQLAlchemy DB and StorageService into Flask app factory"
+```
+
+---
+
+## Task 7: Refactor TaskManager → DB
+
+**Files:**
+- Modify: `backend/app/models/task.py`
+- Create: `backend/tests/test_task_manager_db.py`
+
+The current `TaskManager` is in-memory. We refactor it to use the DB via `get_session()`, keeping the same public interface (`create_task`, `get_task`, `update_task`, `complete_task`, `fail_task`, `list_tasks`) so that no caller breaks.
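
The "same public interface" requirement can be pinned down in a small check that runs before and after the refactor. The sketch below is hypothetical and not part of the plan: `TaskManagerAPI` and `missing_methods` are illustrative names, and the signatures are inferred from the method names listed above rather than copied from the existing code.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class TaskManagerAPI(Protocol):
    """Hypothetical protocol naming the public methods that callers rely on."""

    def create_task(self, task_type: str, metadata: Optional[dict] = None) -> str: ...
    def get_task(self, task_id: str) -> Optional[dict[str, Any]]: ...
    def update_task(self, task_id: str, **fields: Any) -> None: ...
    def complete_task(self, task_id: str, result: dict) -> None: ...
    def fail_task(self, task_id: str, error: str) -> None: ...
    def list_tasks(self, task_type: Optional[str] = None) -> list[dict[str, Any]]: ...


PUBLIC_METHODS = [
    "create_task", "get_task", "update_task",
    "complete_task", "fail_task", "list_tasks",
]


def missing_methods(obj: object) -> list[str]:
    """Return the public method names that `obj` does not provide as callables."""
    return [name for name in PUBLIC_METHODS if not callable(getattr(obj, name, None))]
```

A test could then assert `missing_methods(TaskManager()) == []`. Note that `isinstance(obj, TaskManagerAPI)` only checks that the methods exist, not their signatures, so the existing behavioural tests are still needed.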
+
+- [ ] **Step 1: Write the tests for the new TaskManager**
+
+```python
+# backend/tests/test_task_manager_db.py
+import pytest
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from backend.app.db import Base
+import backend.app.db as db_module
+from backend.app.models.db_models import TaskModel
+
+
+@pytest.fixture(autouse=True)
+def isolated_db():
+    """In-memory SQLite DB for each test."""
+    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
+    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
+    Base.metadata.create_all(db_module._engine)
+    yield
+    Base.metadata.drop_all(db_module._engine)
+    db_module._engine = None
+    db_module._SessionLocal = None
+
+
+def test_create_and_get_task():
+    from backend.app.models.task import TaskManager
+    tm = TaskManager()
+    task_id = tm.create_task("graph_build", {"project_id": "proj-1"})
+    task = tm.get_task(task_id)
+    assert task is not None
+    assert task["task_type"] == "graph_build"
+    assert task["status"] == "pending"
+    assert task["progress"] == 0
+
+
+def test_update_task_progress():
+    from backend.app.models.task import TaskManager
+    tm = TaskManager()
+    task_id = tm.create_task("ontology_generate")
+    tm.update_task(task_id, progress=50, message="Halfway")
+    task = tm.get_task(task_id)
+    assert task["progress"] == 50
+    assert task["message"] == "Halfway"
+
+
+def test_complete_task():
+    from backend.app.models.task import TaskManager
+    tm = TaskManager()
+    task_id = tm.create_task("graph_build")
+    tm.complete_task(task_id, {"graph_id": "g-1"})
+    task = tm.get_task(task_id)
+    assert task["status"] == "completed"
+    assert task["progress"] == 100
+    assert task["result"]["graph_id"] == "g-1"
+
+
+def test_fail_task():
+    from backend.app.models.task import TaskManager
+    tm = TaskManager()
+    task_id = tm.create_task("simulation_prepare")
+    tm.fail_task(task_id, "LLM timeout")
+    task =
tm.get_task(task_id)
+    assert task["status"] == "failed"
+    assert task["error"] == "LLM timeout"
+
+
+def test_task_survives_new_manager_instance():
+    """The task must live in the DB, not in memory."""
+    from backend.app.models.task import TaskManager
+    tm1 = TaskManager()
+    task_id = tm1.create_task("graph_build")
+    # Create a new instance (simulates a restart)
+    TaskManager._instance = None
+    tm2 = TaskManager()
+    task = tm2.get_task(task_id)
+    assert task is not None
+    assert task["task_id"] == task_id
+
+
+def test_list_tasks():
+    from backend.app.models.task import TaskManager
+    tm = TaskManager()
+    tm.create_task("graph_build")
+    tm.create_task("graph_build")
+    tm.create_task("ontology_generate")
+    all_tasks = tm.list_tasks()
+    assert len(all_tasks) == 3
+    graph_tasks = tm.list_tasks(task_type="graph_build")
+    assert len(graph_tasks) == 2
+```
+
+- [ ] **Step 2: Run the tests to verify that they fail**
+
+```bash
+cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
+backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v
+```
+
+Expected: `test_task_survives_new_manager_instance` FAIL (because the manager is currently in-memory)
+
+- [ ] **Step 3: Refactor TaskManager**
+
+Replace the contents of `backend/app/models/task.py`:
+
+```python
+"""Task state management, persistent via SQLAlchemy."""
+import uuid
+import threading
+from datetime import datetime
+from enum import Enum
+from typing import Dict, Any, Optional, List
+
+from ..db import get_session
+from ..models.db_models import TaskModel
+from ..utils.locale import t
+
+
+class TaskStatus(str, Enum):
+    PENDING = "pending"
+    PROCESSING = "processing"
+    COMPLETED = "completed"
+    FAILED = "failed"
+
+
+class TaskManager:
+    """Task manager: thread-safe, persistent via SQLAlchemy."""
+
+    _instance = None
+    _lock = threading.Lock()
+
+    def __new__(cls):
+        # Double-checked locking keeps the singleton safe across threads.
+        if cls._instance is None:
+            with cls._lock:
+                if cls._instance is None:
+                    cls._instance = super().__new__(cls)
+        return cls._instance
+
+    def
create_task(self, task_type: str, metadata: Optional[Dict] = None) -> str:
+        task_id = str(uuid.uuid4())
+        with get_session() as db:
+            task = TaskModel(
+                id=task_id,
+                task_type=task_type,
+                status="pending",
+                progress=0,
+                progress_detail=metadata or {},
+            )
+            db.add(task)
+            db.commit()
+        return task_id
+
+    def get_task(self, task_id: str) -> Optional[Dict[str, Any]]:
+        with get_session() as db:
+            task = db.get(TaskModel, task_id)
+            if task is None:
+                return None
+            return self._to_dict(task)
+
+    def update_task(
+        self,
+        task_id: str,
+        status: Optional[str] = None,
+        progress: Optional[int] = None,
+        message: Optional[str] = None,
+        result: Optional[Dict] = None,
+        error: Optional[str] = None,
+        progress_detail: Optional[Dict] = None,
+    ) -> None:
+        with get_session() as db:
+            task = db.get(TaskModel, task_id)
+            if task is None:
+                return
+            # Only the fields that were explicitly passed are updated.
+            if status is not None:
+                task.status = status
+            if progress is not None:
+                task.progress = progress
+            if message is not None:
+                task.message = message
+            if result is not None:
+                task.result = result
+            if error is not None:
+                task.error = error
+            if progress_detail is not None:
+                task.progress_detail = progress_detail
+            task.updated_at = datetime.utcnow()
+            db.commit()
+
+    def complete_task(self, task_id: str, result: Dict) -> None:
+        self.update_task(
+            task_id,
+            status=TaskStatus.COMPLETED,
+            progress=100,
+            message=t("progress.taskComplete"),
+            result=result,
+        )
+
+    def fail_task(self, task_id: str, error: str) -> None:
+        self.update_task(
+            task_id,
+            status=TaskStatus.FAILED,
+            message=t("progress.taskFailed"),
+            error=error,
+        )
+
+    def list_tasks(self, task_type: Optional[str] = None) -> List[Dict[str, Any]]:
+        from sqlalchemy import select, desc
+        with get_session() as db:
+            stmt = select(TaskModel).order_by(desc(TaskModel.created_at))
+            if task_type:
+                stmt = stmt.where(TaskModel.task_type == task_type)
+            tasks = db.execute(stmt).scalars().all()
+            # Loop variable is named `task` to avoid shadowing the imported `t` helper.
+            return [self._to_dict(task) for task in tasks]
+
+    def
cleanup_old_tasks(self, max_age_hours: int = 24) -> None:
+        from datetime import timedelta
+        from sqlalchemy import delete
+        cutoff = datetime.utcnow() - timedelta(hours=max_age_hours)
+        with get_session() as db:
+            # Only finished tasks are removed; pending and processing ones are kept.
+            db.execute(
+                delete(TaskModel).where(
+                    TaskModel.created_at < cutoff,
+                    TaskModel.status.in_(["completed", "failed"]),
+                )
+            )
+            db.commit()
+
+    @staticmethod
+    def _to_dict(task: TaskModel) -> Dict[str, Any]:
+        return {
+            "task_id": task.id,
+            "task_type": task.task_type,
+            "status": task.status,
+            "created_at": task.created_at.isoformat(),
+            "updated_at": task.updated_at.isoformat(),
+            "progress": task.progress,
+            "message": task.message or "",
+            "progress_detail": task.progress_detail or {},
+            "result": task.result,
+            "error": task.error,
+            "metadata": task.progress_detail or {},
+        }
+```
+
+**Note:** `get_session()` has been a context manager since Task 3. Use `with get_session() as db:` as shown in the code.
+
+- [ ] **Step 4: Run the TaskManager tests**
+
+```bash
+backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v
+```
+
+Expected: 6 passed
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add backend/app/models/task.py backend/app/db.py backend/tests/test_task_manager_db.py
+git commit -m "feat(task): refactor TaskManager to persist tasks in SQLAlchemy DB"
+```
+
+---
+
+## Task 8: Refactor ProjectManager → DB + Storage
+
+**Files:**
+- Modify: `backend/app/models/project.py`
+- Create: `backend/tests/test_project_manager_db.py`
+
+We refactor `ProjectManager` to use the DB for metadata and `StorageService` for files, keeping the same public interface.
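
The split between metadata in the DB and files in storage can be exercised without touching disk. What follows is a minimal sketch, assuming only the `StorageService` methods this plan uses (`upload`, `download`, `exists`, `delete_prefix`). `InMemoryStorage` and `extracted_text_key` are hypothetical names for illustration; the per-project key layout mirrors the paths used in the refactored `ProjectManager` code.

```python
import io


class InMemoryStorage:
    """Hypothetical test double: keeps blobs in a dict keyed by path."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, path: str, data, content_type: str = "application/octet-stream") -> None:
        # Accept raw bytes or a readable binary stream, like the real adapters.
        self._blobs[path] = data if isinstance(data, bytes) else data.read()

    def download(self, path: str) -> bytes:
        return self._blobs[path]

    def exists(self, path: str) -> bool:
        return path in self._blobs

    def delete_prefix(self, prefix: str) -> None:
        # Match whole path segments so "projects/1" does not also remove "projects/10".
        root = prefix.rstrip("/") + "/"
        for key in [k for k in self._blobs if k == prefix or k.startswith(root)]:
            del self._blobs[key]


def extracted_text_key(project_id: str) -> str:
    """Key under which a project's extracted text is stored (assumed layout)."""
    return f"projects/{project_id}/extracted_text.txt"


# Usage: two projects whose ids share a prefix; deleting one leaves the other intact.
storage = InMemoryStorage()
storage.upload(extracted_text_key("1"), b"first")
storage.upload(extracted_text_key("10"), io.BytesIO(b"tenth"))
storage.delete_prefix("projects/1")
```

The segment-aware match in `delete_prefix` is a design choice of this sketch. A raw string prefix is only safe when no project id is a prefix of another, which fixed-length UUIDs should satisfy.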
+
+- [ ] **Step 1: Write the tests for the new ProjectManager**
+
+```python
+# backend/tests/test_project_manager_db.py
+import pytest
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from backend.app.db import Base
+import backend.app.db as db_module
+from backend.app.storage.local import LocalFSStorage
+
+
+@pytest.fixture(autouse=True)
+def isolated_db():
+    """In-memory SQLite DB for each test."""
+    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
+    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
+    Base.metadata.create_all(db_module._engine)
+    yield
+    Base.metadata.drop_all(db_module._engine)
+    db_module._engine = None
+    db_module._SessionLocal = None
+
+
+@pytest.fixture
+def storage(tmp_path):
+    return LocalFSStorage(str(tmp_path))
+
+
+def test_create_project(storage):
+    from backend.app.models.project import ProjectManager
+    proj = ProjectManager.create_project("Test Project", storage=storage)
+    assert proj["name"] == "Test Project"
+    assert proj["status"] == "created"
+    assert "id" in proj
+
+
+def test_get_project(storage):
+    from backend.app.models.project import ProjectManager
+    created = ProjectManager.create_project("My Project", storage=storage)
+    fetched = ProjectManager.get_project(created["id"])
+    assert fetched is not None
+    assert fetched["name"] == "My Project"
+
+
+def test_project_not_found(storage):
+    from backend.app.models.project import ProjectManager
+    result = ProjectManager.get_project("nonexistent-id")
+    assert result is None
+
+
+def test_save_and_get_extracted_text(storage):
+    from backend.app.models.project import ProjectManager
+    proj = ProjectManager.create_project("Text Project", storage=storage)
+    ProjectManager.save_extracted_text(proj["id"], "hello extracted", storage=storage)
+    text = ProjectManager.get_extracted_text(proj["id"], storage=storage)
+    assert text == "hello extracted"
+
+
+def
test_project_survives_manager_reset(storage):
+    """The data must live in the DB, not in memory."""
+    from backend.app.models.project import ProjectManager
+    created = ProjectManager.create_project("Persist Me", storage=storage)
+    # Simulate a restart: clear any in-memory state, if there is any
+    fetched = ProjectManager.get_project(created["id"])
+    assert fetched is not None
+
+
+def test_list_projects(storage):
+    from backend.app.models.project import ProjectManager
+    ProjectManager.create_project("P1", storage=storage)
+    ProjectManager.create_project("P2", storage=storage)
+    projects = ProjectManager.list_projects()
+    assert len(projects) == 2
+
+
+def test_delete_project(storage):
+    from backend.app.models.project import ProjectManager
+    proj = ProjectManager.create_project("Del Me", storage=storage)
+    result = ProjectManager.delete_project(proj["id"], storage=storage)
+    assert result is True
+    assert ProjectManager.get_project(proj["id"]) is None
+```
+
+- [ ] **Step 2: Run the tests to verify that they fail**
+
+```bash
+backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v
+```
+
+Expected: errors (the current interface does not accept a `storage=` parameter)
+
+- [ ] **Step 3: Refactor ProjectManager**
+
+Replace the contents of `backend/app/models/project.py`:
+
+```python
+"""Project context management, persistent via SQLAlchemy + StorageService."""
+import uuid
+import io
+from datetime import datetime
+from typing import Dict, Any, List, Optional
+from enum import Enum
+
+from ..db import get_session
+from ..models.db_models import ProjectModel, ProjectFileModel
+
+
+class ProjectStatus(str, Enum):
+    CREATED = "created"
+    ONTOLOGY_GENERATED = "ontology_generated"
+    GRAPH_BUILDING = "graph_building"
+    GRAPH_COMPLETED = "graph_completed"
+    FAILED = "failed"
+
+
+class ProjectManager:
+    """Manages projects: metadata in the DB, files in StorageService."""
+
+    @classmethod
+    def create_project(cls, name: str = "Unnamed Project", storage=None) ->
Dict[str, Any]:
+        project_id = str(uuid.uuid4())
+        with get_session() as db:
+            proj = ProjectModel(id=project_id, name=name, status="created")
+            db.add(proj)
+            db.commit()
+            # Reload server-side defaults (timestamps) before serializing.
+            db.refresh(proj)
+            return cls._to_dict(proj)
+
+    @classmethod
+    def get_project(cls, project_id: str) -> Optional[Dict[str, Any]]:
+        with get_session() as db:
+            proj = db.get(ProjectModel, project_id)
+            if proj is None:
+                return None
+            return cls._to_dict(proj)
+
+    @classmethod
+    def save_project(cls, project_data: Dict[str, Any]) -> None:
+        """Update the fields of an existing project."""
+        project_id = project_data.get("id") or project_data.get("project_id")
+        with get_session() as db:
+            proj = db.get(ProjectModel, project_id)
+            if proj is None:
+                return
+            updatable = [
+                "name", "status", "analysis_summary", "simulation_requirement",
+                "chunk_size", "chunk_overlap", "active_task_id",
+            ]
+            for field in updatable:
+                if field in project_data:
+                    setattr(proj, field, project_data[field])
+            proj.updated_at = datetime.utcnow()
+            db.commit()
+
+    @classmethod
+    def list_projects(cls, limit: int = 50) -> List[Dict[str, Any]]:
+        from sqlalchemy import select, desc
+        with get_session() as db:
+            stmt = select(ProjectModel).order_by(desc(ProjectModel.created_at)).limit(limit)
+            projects = db.execute(stmt).scalars().all()
+            return [cls._to_dict(p) for p in projects]
+
+    @classmethod
+    def delete_project(cls, project_id: str, storage=None) -> bool:
+        with get_session() as db:
+            proj = db.get(ProjectModel, project_id)
+            if proj is None:
+                return False
+            # Delete the stored files if a storage service was passed in
+            if storage is not None:
+                storage.delete_prefix(f"projects/{project_id}")
+            db.delete(proj)
+            db.commit()
+            return True
+
+    @classmethod
+    def save_file_to_project(
+        cls,
+        project_id: str,
+        file_storage,  # Flask FileStorage
+        original_filename: str,
+        storage,
+    ) -> Dict[str, Any]:
+        import os
+        ext = os.path.splitext(original_filename)[1].lower()
+        # Random name: the user-supplied filename never reaches the storage path.
+        safe_filename = f"{uuid.uuid4().hex[:8]}{ext}"
+        storage_path = f"projects/{project_id}/files/{safe_filename}"
+
+        data = file_storage.read()
+        mime_type = getattr(file_storage, "content_type", "application/octet-stream") or "application/octet-stream"
+        # Pass the MIME type through so the storage backend records it with the file.
+        storage.upload(storage_path, data, mime_type)
+
+        with get_session() as db:
+            file_rec = ProjectFileModel(
+                id=str(uuid.uuid4()),
+                project_id=project_id,
+                original_name=original_filename,
+                storage_path=storage_path,
+                size=len(data),
+                mime_type=mime_type,
+                file_type="upload",
+            )
+            db.add(file_rec)
+            db.commit()
+
+        return {
+            "original_filename": original_filename,
+            "saved_filename": safe_filename,
+            "storage_path": storage_path,
+            "size": len(data),
+        }
+
+    @classmethod
+    def save_extracted_text(cls, project_id: str, text: str, storage) -> None:
+        storage_path = f"projects/{project_id}/extracted_text.txt"
+        # Encode once; the bytes are reused for the upload and for the recorded size.
+        data = text.encode("utf-8")
+        storage.upload(storage_path, data, "text/plain")
+
+        with get_session() as db:
+            from sqlalchemy import select
+            stmt = select(ProjectFileModel).where(
+                ProjectFileModel.project_id == project_id,
+                ProjectFileModel.file_type == "extracted_text",
+            )
+            existing = db.execute(stmt).scalar_one_or_none()
+            if existing:
+                existing.storage_path = storage_path
+                existing.size = len(data)
+            else:
+                rec = ProjectFileModel(
+                    id=str(uuid.uuid4()),
+                    project_id=project_id,
+                    original_name="extracted_text.txt",
+                    storage_path=storage_path,
+                    size=len(data),
+                    mime_type="text/plain",
+                    file_type="extracted_text",
+                )
+                db.add(rec)
+            db.commit()
+
+    @classmethod
+    def get_extracted_text(cls, project_id: str, storage) -> Optional[str]:
+        storage_path = f"projects/{project_id}/extracted_text.txt"
+        if not storage.exists(storage_path):
+            return None
+        return storage.download(storage_path).decode("utf-8")
+
+    @staticmethod
+    def _to_dict(proj: ProjectModel) -> Dict[str, Any]:
+        return {
+            "id": proj.id,
+            "project_id": proj.id,  # kept for compatibility with existing code
+            "name": proj.name,
+            "status": proj.status,
+            "analysis_summary":
proj.analysis_summary,
+            "simulation_requirement": proj.simulation_requirement,
+            "chunk_size": proj.chunk_size,
+            "chunk_overlap": proj.chunk_overlap,
+            "active_task_id": proj.active_task_id,
+            "created_at": proj.created_at.isoformat(),
+            "updated_at": proj.updated_at.isoformat(),
+            # Fields from the old model, now empty placeholders kept for compatibility
+            "files": [],
+            "total_text_length": 0,
+            "ontology": None,
+            "graph_id": None,
+            "graph_build_task_id": None,
+            "error": None,
+        }
+```
+
+- [ ] **Step 4: Run the ProjectManager tests**
+
+```bash
+backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v
+```
+
+Expected: 7 passed
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add backend/app/models/project.py backend/tests/test_project_manager_db.py
+git commit -m "feat(project): refactor ProjectManager to persist via SQLAlchemy + StorageService"
+```
+
+---
+
+## Task 9: Update existing tests and final verification
+
+**Files:**
+- Modify: `backend/tests/conftest.py`
+- Modify: `backend/tests/test_project_task_recovery.py` (if affected)
+
+- [ ] **Step 1: Update conftest.py to add global fixtures**
+
+```python
+# backend/tests/conftest.py
+import pytest
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from backend.app.db import Base
+import backend.app.db as db_module
+
+
+@pytest.fixture(autouse=True)
+def reset_graph_factory_singleton():
+    """Reset the graph backend singleton after each test."""
+    yield
+    try:
+        import backend.app.graph.factory as fmod
+        fmod._backend_instance = None
+    except ImportError:
+        pass
+
+
+@pytest.fixture(autouse=True)
+def reset_task_manager_singleton():
+    """Reset TaskManager singleton between tests."""
+    from backend.app.models import task as task_module
+    task_module.TaskManager._instance = None
+    yield
+    task_module.TaskManager._instance = None
+
+
+@pytest.fixture
+def in_memory_db():
+    """In-memory SQLite DB for tests that need a database."""
+    db_module._engine =
create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
+    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
+    Base.metadata.create_all(db_module._engine)
+    yield db_module._engine
+    Base.metadata.drop_all(db_module._engine)
+    db_module._engine = None
+    db_module._SessionLocal = None
+```
+
+- [ ] **Step 2: Run the whole test suite**
+
+```bash
+cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
+backend/.venv/bin/pytest backend/tests/ -v --tb=short 2>&1 | tail -30
+```
+
+Expected: all tests from Tasks 2-8 pass. The test `test_config_graph_backend_default` may keep failing (a pre-existing, unrelated failure).
+
+- [ ] **Step 3: Verify that the app starts and the DB is created correctly**
+
+```bash
+cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
+DATABASE_URL=sqlite:///verify_startup.db \
+STORAGE_TYPE=local \
+STORAGE_LOCAL_PATH=/tmp/mirofish_test_uploads \
+LLM_API_KEY=test-key \
+ZEP_API_KEY=test-key \
+  .venv/bin/python -c "
+from app import create_app
+app = create_app()
+with app.app_context():
+    from app.models.project import ProjectManager
+    storage = app.extensions['storage']
+    proj = ProjectManager.create_project('Startup Test', storage=storage)
+    print('Project created:', proj['id'])
+    fetched = ProjectManager.get_project(proj['id'])
+    print('Project fetched:', fetched['name'])
+    print('Verification OK')
+"
+rm -f verify_startup.db
+```
+
+Expected: `Verification OK`
+
+- [ ] **Step 4: Final commit for Phase 1**
+
+```bash
+git add backend/tests/conftest.py
+git commit -m "test(conftest): add in_memory_db and task manager singleton reset fixtures"
+
+git tag fase1-infraestructura-base
+```
+
+---
+
+## End-to-end verification of Phase 1
+
+```bash
+# 1. All tests pass
+backend/.venv/bin/pytest backend/tests/ -v
+
+# 2. The DB is created by the migrations
+backend/.venv/bin/alembic upgrade head
+
+# 3.
The app starts correctly
+DATABASE_URL=sqlite:///mirofish_dev.db STORAGE_TYPE=local LLM_API_KEY=x ZEP_API_KEY=x \
+  backend/.venv/bin/python backend/run.py &
+sleep 2
+curl -s http://localhost:5001/health | python3 -m json.tool
+kill %1
+```
+
+Final expected output: `{"service": "MiroFish Backend", "status": "ok"}`
+
+---
+
+> **Note:** Phases 2 (Auth+RBAC), 3 (pipeline) and 4 (production hardening) will have their own plans, written when each phase begins.
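
The end-to-end check waits a fixed `sleep 2` before calling `/health`, which can race with a slow start. A polling helper is one possible alternative. The sketch below is an assumption layered on the plan, not part of it: `wait_for_health` is a hypothetical name, and it relies only on the `/health` endpoint returning JSON with a `status` field, as in the expected output.

```python
import json
import time
import urllib.request


def wait_for_health(url: str, timeout: float = 10.0, interval: float = 0.25):
    """Poll `url` until it returns JSON with status == "ok".

    Returns the decoded payload, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                payload = json.load(resp)
            if isinstance(payload, dict) and payload.get("status") == "ok":
                return payload
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error or invalid JSON: retry.
            pass
        time.sleep(interval)
    return None
```

With the server started in the background, `wait_for_health("http://localhost:5001/health")` would return as soon as the endpoint answers, and `None` would signal that the app never came up.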