# Persistence Phase 1: Base Infrastructure

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace the JSON + in-memory persistence with SQLAlchemy 2.x (SQLite in dev / PostgreSQL in prod) and an abstract `StorageService` (LocalFS in dev / Azure Blob in prod), so that projects and tasks survive server restarts.

**Architecture:** A database layer built on SQLAlchemy is added beneath the existing `Manager` classes. `ProjectManager` and `TaskManager` are refactored to read from and write to the database instead of JSON files and memory. The `StorageService` replaces all direct file operations. The app factory (`create_app`) initializes the database and registers the storage service as a Flask extension.

**Tech Stack:** SQLAlchemy 2.x, Alembic, `flask-sqlalchemy`, `azure-storage-blob`, `bcrypt` (needed in later phases, added to `pyproject.toml` now)

---

## File map

### New files to create

| File | Responsibility |
|------|----------------|
| `backend/app/db.py` | SQLAlchemy engine, `Base`, `get_session()` session context manager, `init_db()` |
| `backend/app/models/db_models.py` | All SQLAlchemy models (Project, ProjectFile, Ontology, Graph, Simulation, Report, Task, SystemConfig, User, InvitationToken, PasswordResetToken) |
| `backend/app/storage/__init__.py` | Exports `StorageService`, `create_storage_service()` |
| `backend/app/storage/protocol.py` | `StorageService` Protocol (interface) |
| `backend/app/storage/local.py` | `LocalFSStorage` (pathlib) |
| `backend/app/storage/azure_blob.py` | `AzureBlobStorage` (azure-storage-blob) |
| `backend/app/storage/factory.py` | `create_storage_service()`: selects the backend from `STORAGE_TYPE` |
| `backend/alembic.ini` | Alembic config |
| `backend/alembic/env.py` | Alembic environment (reads `DATABASE_URL`) |
| `backend/alembic/versions/0001_initial_schema.py` | Initial migration (all tables) |
| `backend/tests/test_db_models.py` | Tests for the SQLAlchemy models |
| `backend/tests/test_storage.py` | Tests for `LocalFSStorage` |
| `backend/tests/test_project_manager_db.py` | Tests for `ProjectManager` backed by the DB |
| `backend/tests/test_task_manager_db.py` | Tests for `TaskManager` backed by the DB |

### Files to modify

| File | Change |
|------|--------|
| `backend/pyproject.toml` | Add `sqlalchemy`, `alembic`, `flask-sqlalchemy`, `azure-storage-blob`, `bcrypt`, `flask-jwt-extended` |
| `backend/app/config.py` | Add `DATABASE_URL`, `STORAGE_TYPE`, `STORAGE_LOCAL_PATH`, `AZURE_STORAGE_*`, `JWT_SECRET`, `JWT_REFRESH_SECRET` |
| `backend/app/__init__.py` | Initialize DB + Storage in `create_app()`; replace the provisional auth with a `flask-jwt-extended` stub |
| `backend/app/models/project.py` | Refactor `ProjectManager` to use the DB + `StorageService` |
| `backend/app/models/task.py` | Refactor `TaskManager` to use the DB |
| `backend/tests/conftest.py` | Add fixtures for an in-memory DB and temporary storage |

---

## Task 1: Add dependencies

**Files:**
- Modify: `backend/pyproject.toml`

- [ ] **Step 1: Add the dependencies to pyproject.toml**

```toml
# backend/pyproject.toml, dependencies section; add:
"sqlalchemy>=2.0.0",
"alembic>=1.13.0",
"flask-sqlalchemy>=3.1.0",
"azure-storage-blob>=12.19.0",
"bcrypt>=4.1.0",
"flask-jwt-extended>=4.6.0",
```

- [ ] **Step 2: Install the dependencies**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
uv sync
```

Expected: no errors. `uv sync` updates the `.venv`.
- [ ] **Step 3: Verify the imports**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/python -c "import sqlalchemy; import alembic; import flask_sqlalchemy; print('OK')"
```

Expected: `OK`

- [ ] **Step 4: Commit**

```bash
git add backend/pyproject.toml
git commit -m "chore(deps): add SQLAlchemy, Alembic, Azure Blob, bcrypt, flask-jwt-extended"
```

---

## Task 2: Add configuration variables

**Files:**
- Modify: `backend/app/config.py`
- Create: `backend/tests/test_config.py`

- [ ] **Step 1: Write a test for the new configuration**

Create a new file, `backend/tests/test_config.py`. (`backend/tests/test_db_models.py` is only created in Task 3, so the config tests get their own file.)

```python
# backend/tests/test_config.py


def test_database_url_default():
    """DATABASE_URL must default to SQLite."""
    from backend.app.config import Config
    assert Config.DATABASE_URL.startswith("sqlite")


def test_storage_type_default():
    from backend.app.config import Config
    assert Config.STORAGE_TYPE == "local"


def test_storage_local_path_exists():
    from backend.app.config import Config
    assert Config.STORAGE_LOCAL_PATH is not None
```

- [ ] **Step 2: Run the test to verify that it fails**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
.venv/bin/pytest backend/tests/test_config.py -v 2>/dev/null || \
  backend/.venv/bin/pytest backend/tests/test_config.py -v
```

Expected: `AttributeError: type object 'Config' has no attribute 'DATABASE_URL'`

- [ ] **Step 3: Add the configuration to config.py**

Add at the end of the `Config` class (just before the `validate` method):

```python
    # ── Persistence ───────────────────────────────────────────────

    # Database
    DATABASE_URL = os.environ.get('DATABASE_URL', 'sqlite:///mirofish_dev.db')

    # File storage
    STORAGE_TYPE = os.environ.get('STORAGE_TYPE', 'local')  # local | azure
    STORAGE_LOCAL_PATH = os.environ.get(
        'STORAGE_LOCAL_PATH',
        os.path.join(os.path.dirname(__file__), '../uploads')
    )
    AZURE_STORAGE_CONNECTION_STRING = os.environ.get('AZURE_STORAGE_CONNECTION_STRING', '')
    AZURE_STORAGE_CONTAINER = os.environ.get('AZURE_STORAGE_CONTAINER', 'mirofish')

    # JWT (for the Phase 2 authentication work; defined here because
    # flask-jwt-extended needs them in create_app)
    JWT_SECRET_KEY = os.environ.get('JWT_SECRET', 'change-me-in-production')
    JWT_REFRESH_SECRET_KEY = os.environ.get('JWT_REFRESH_SECRET', 'change-me-refresh-in-production')
    JWT_ACCESS_TOKEN_EXPIRES_HOURS = int(os.environ.get('JWT_ACCESS_TOKEN_EXPIRES_HOURS', '8'))
    JWT_REFRESH_TOKEN_EXPIRES_DAYS = int(os.environ.get('JWT_REFRESH_TOKEN_EXPIRES_DAYS', '7'))
```

- [ ] **Step 4: Run the test to verify that it passes**

```bash
backend/.venv/bin/pytest backend/tests/test_config.py -v
```

Expected: 3 passed

- [ ] **Step 5: Commit**

```bash
git add backend/app/config.py backend/tests/test_config.py
git commit -m "feat(config): add DATABASE_URL, STORAGE_TYPE, AZURE_STORAGE_*, JWT config vars"
```

---

## Task 3: Create the SQLAlchemy models

**Files:**
- Create: `backend/app/models/db_models.py`
- Create: `backend/app/db.py`
- Create: `backend/tests/test_db_models.py`

- [ ] **Step 1: Create backend/app/db.py**

```python
# backend/app/db.py
"""SQLAlchemy engine, session factory and declarative Base."""
from contextlib import contextmanager
from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker, Session
from typing import Generator


class Base(DeclarativeBase):
    pass


_engine = None
_SessionLocal = None


def init_db(database_url: str) -> None:
    global _engine, _SessionLocal
    # SQLite connections are thread-bound by default; Flask serves from several threads.
    connect_args = {"check_same_thread": False} if database_url.startswith("sqlite") else {}
    _engine = create_engine(database_url, connect_args=connect_args, echo=False)
    _SessionLocal = sessionmaker(bind=_engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(_engine)


@contextmanager
def get_session() -> Generator[Session, None, None]:
    """SQLAlchemy session context manager.

    Rolls back on error and always closes the session; callers commit explicitly.
    """
    if _SessionLocal is None:
        raise RuntimeError("Database not initialized. Call init_db() first.")
    db = _SessionLocal()
    try:
        yield db
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()
```

- [ ] **Step 2: Create backend/app/models/db_models.py**

```python
# backend/app/models/db_models.py
"""SQLAlchemy models for all MiroFish persistence."""
import uuid
from datetime import datetime
from typing import Optional
from sqlalchemy import (
    String, Integer, Text, Boolean, DateTime, JSON,
    ForeignKey, UniqueConstraint
)
from sqlalchemy.orm import Mapped, mapped_column, relationship
from ..db import Base


def _uuid() -> str:
    return str(uuid.uuid4())


def _now() -> datetime:
    return datetime.utcnow()


class UserModel(Base):
    __tablename__ = "users"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    name: Mapped[str] = mapped_column(String(255), nullable=False, default="")
    password_hash: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    role: Mapped[str] = mapped_column(String(20), nullable=False, default="user")
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    projects: Mapped[list["ProjectModel"]] = relationship(
        back_populates="owner", cascade="all, delete-orphan"
    )
    invitation_tokens: Mapped[list["InvitationTokenModel"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )
    password_reset_tokens: Mapped[list["PasswordResetTokenModel"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )


class ProjectModel(Base):
    __tablename__ = "projects"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    user_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=True
    )
    name: Mapped[str] = mapped_column(String(255), nullable=False,
                                      default="Unnamed Project")
    status: Mapped[str] = mapped_column(String(50), nullable=False, default="created")
    analysis_summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    simulation_requirement: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    chunk_size: Mapped[int] = mapped_column(Integer, default=500)
    chunk_overlap: Mapped[int] = mapped_column(Integer, default=50)
    active_task_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("tasks.id", ondelete="SET NULL"), nullable=True
    )
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    owner: Mapped[Optional["UserModel"]] = relationship(back_populates="projects")
    files: Mapped[list["ProjectFileModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    ontologies: Mapped[list["OntologyModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    graphs: Mapped[list["GraphModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    simulations: Mapped[list["SimulationModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    reports: Mapped[list["ReportModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )


class ProjectFileModel(Base):
    __tablename__ = "project_files"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    original_name: Mapped[str] = mapped_column(String(255), nullable=False)
    storage_path: Mapped[str] = mapped_column(Text, nullable=False)
    size: Mapped[int] = mapped_column(Integer, default=0)
    mime_type: Mapped[str] = mapped_column(String(100), default="application/octet-stream")
    file_type: Mapped[str] = mapped_column(String(30), default="upload")  # upload | extracted_text
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="files")


class OntologyModel(Base):
    __tablename__ = "ontologies"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    version: Mapped[int] = mapped_column(Integer, default=1)
    entity_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    edge_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="ontologies")
    graphs: Mapped[list["GraphModel"]] = relationship(back_populates="ontology")


class GraphModel(Base):
    __tablename__ = "graphs"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    ontology_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("ontologies.id", ondelete="SET NULL"), nullable=True
    )
    backend: Mapped[str] = mapped_column(String(20), default="zep")  # zep | graphiti
    external_id: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    status: Mapped[str] = mapped_column(String(20), default="building")  # building | ready | failed
    node_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    edge_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="graphs")
    ontology: Mapped[Optional["OntologyModel"]] = relationship(back_populates="graphs")
    simulations: Mapped[list["SimulationModel"]] = relationship(back_populates="graph")
    reports: Mapped[list["ReportModel"]] = relationship(back_populates="graph")


class SimulationModel(Base):
    __tablename__ = "simulations"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    graph_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True
    )
    status: Mapped[str] = mapped_column(String(30), default="prepared")
    platform: Mapped[str] = mapped_column(String(20), default="twitter")  # twitter | reddit | both
    config: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    profiles_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    db_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    actions_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    rounds_total: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    rounds_completed: Mapped[int] = mapped_column(Integer, default=0)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="simulations")
    graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="simulations")
    reports: Mapped[list["ReportModel"]] = relationship(back_populates="simulation")


class ReportModel(Base):
    __tablename__ = "reports"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    simulation_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("simulations.id", ondelete="SET NULL"), nullable=True
    )
    graph_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True
    )
    status: Mapped[str] = mapped_column(String(30), default="generating")
    outline: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    storage_prefix: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="reports")
    simulation: Mapped[Optional["SimulationModel"]] = relationship(back_populates="reports")
    graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="reports")


class TaskModel(Base):
    __tablename__ = "tasks"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    task_type: Mapped[str] = mapped_column(String(100), nullable=False)
    entity_type: Mapped[Optional[str]] = mapped_column(String(50), nullable=True)
    entity_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True)
    status: Mapped[str] = mapped_column(String(20), default="pending")
    progress: Mapped[int] = mapped_column(Integer, default=0)
    message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    result: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    progress_detail: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)


class SystemConfigModel(Base):
    __tablename__ = "system_config"

    key: Mapped[str] = mapped_column(String(100), primary_key=True)
    value: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    value_type: Mapped[str] = mapped_column(String(20), default="string")
    group: Mapped[str] = mapped_column(String(50), default="general")
    label: Mapped[str] = mapped_column(String(255), default="")
    description: Mapped[str] = mapped_column(Text, default="")
    is_secret: Mapped[bool] = mapped_column(Boolean, default=False)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now,
                                                 onupdate=_now)
    updated_by: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
    )


class InvitationTokenModel(Base):
    __tablename__ = "invitation_tokens"

    token: Mapped[str] = mapped_column(String(36), primary_key=True)
    user_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
    used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)

    user: Mapped["UserModel"] = relationship(back_populates="invitation_tokens")


class PasswordResetTokenModel(Base):
    __tablename__ = "password_reset_tokens"

    token: Mapped[str] = mapped_column(String(36), primary_key=True)
    user_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
    used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)

    user: Mapped["UserModel"] = relationship(back_populates="password_reset_tokens")
```

- [ ] **Step 3: Create the model tests**

```python
# backend/tests/test_db_models.py
import pytest
from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base, init_db, get_session
from backend.app.models.db_models import (
    ProjectModel, TaskModel, OntologyModel, GraphModel,
    SimulationModel, ReportModel, UserModel
)


@pytest.fixture
def db_session():
    """In-memory SQLite session for tests."""
    from backend.app import db as db_module
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})

    # SQLite only applies ON DELETE rules (such as SET NULL) when foreign keys
    # are switched on per connection; test_task_set_null_on_delete relies on it.
    @event.listens_for(db_module._engine, "connect")
    def _enable_foreign_keys(dbapi_connection, connection_record):
        cursor = dbapi_connection.cursor()
        cursor.execute("PRAGMA foreign_keys=ON")
        cursor.close()

    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    session = db_module._SessionLocal()
    yield session
    session.close()
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None


def test_create_project(db_session):
    proj = ProjectModel(id="proj-1", name="Test Project")
    db_session.add(proj)
    db_session.commit()

    result = db_session.get(ProjectModel, "proj-1")
    assert result.name == "Test Project"
    assert result.status == "created"
    assert result.chunk_size == 500


def test_create_task(db_session):
    task = TaskModel(id="task-1", task_type="graph_build", entity_type="project", entity_id="proj-1")
    db_session.add(task)
    db_session.commit()

    result = db_session.get(TaskModel, "task-1")
    assert result.status == "pending"
    assert result.progress == 0


def test_project_cascade_delete(db_session):
    proj = ProjectModel(id="proj-del", name="Del Project")
    db_session.add(proj)
    db_session.flush()

    ont = OntologyModel(id="ont-1", project_id="proj-del", version=1)
    db_session.add(ont)
    db_session.commit()

    db_session.delete(proj)
    db_session.commit()

    assert db_session.get(OntologyModel, "ont-1") is None


def test_task_set_null_on_delete(db_session):
    task = TaskModel(id="task-del", task_type="graph_build")
    db_session.add(task)
    # No relationship() links projects to tasks, so flush the task first to
    # guarantee it is inserted before the project that references it.
    db_session.flush()
    proj = ProjectModel(id="proj-2", name="P2", active_task_id="task-del")
    db_session.add(proj)
    db_session.commit()

    db_session.delete(task)
    db_session.commit()
    db_session.expire(proj)

    refreshed = db_session.get(ProjectModel, "proj-2")
    assert refreshed.active_task_id is None


def test_graph_linked_to_ontology(db_session):
    proj = ProjectModel(id="proj-g", name="Graph Project")
    ont = OntologyModel(id="ont-g", project_id="proj-g", version=1)
    graph = GraphModel(id="graph-1", project_id="proj-g", ontology_id="ont-g", backend="zep")
    db_session.add_all([proj, ont, graph])
    db_session.commit()

    result = db_session.get(GraphModel, "graph-1")
    assert result.ontology_id == "ont-g"
    assert result.backend == "zep"
```

- [ ] **Step 4: Run the model tests**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_db_models.py -v
```

Expected: 5 passed

- [ ] **Step 5: Commit**

```bash
git add backend/app/db.py backend/app/models/db_models.py backend/tests/test_db_models.py
git commit -m "feat(db): add SQLAlchemy Base, session factory, and all ORM models"
```

---

## Task 4: Configure Alembic

**Files:**
- Create: `backend/alembic.ini`
- Create: `backend/alembic/env.py`
- Create: `backend/alembic/script.py.mako`
- Create: `backend/alembic/versions/0001_initial_schema.py`

- [ ] **Step 1: Initialize Alembic**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic init alembic
```

Expected: creates `alembic/` and `alembic.ini`

- [ ] **Step 2: Update alembic.ini**

Replace the `sqlalchemy.url = ...` line in `alembic.ini`:

```ini
# Change this line:
sqlalchemy.url = driver://user:pass@localhost/dbname
# To:
sqlalchemy.url = sqlite:///mirofish_dev.db
```

Then make sure `script_location` is set just below `[alembic]` (`alembic init` normally writes it already):

```ini
script_location = alembic
```

- [ ] **Step 3: Update alembic/env.py**

Replace the entire contents of `alembic/env.py`:

```python
# backend/alembic/env.py
import os
import sys
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
from alembic import context

# Add the backend directory to the path so the imports below resolve
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))

from app.db import Base
import app.models.db_models  # noqa: F401 (registers every model on Base)

config = context.config

# Read DATABASE_URL from the environment (takes priority over alembic.ini)
db_url = os.environ.get('DATABASE_URL', config.get_main_option('sqlalchemy.url'))
config.set_main_option('sqlalchemy.url', db_url)

if config.config_file_name is not None:
    fileConfig(config.config_file_name)

target_metadata = Base.metadata


def run_migrations_offline():
    url = config.get_main_option("sqlalchemy.url")
    context.configure(url=url, target_metadata=target_metadata, literal_binds=True)
    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)
        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
```

- [ ] **Step 4: Generate the initial migration**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic revision --autogenerate -m "initial_schema"
```

Expected: creates `alembic/versions/XXXX_initial_schema.py` with all the tables

- [ ] **Step 5: Apply the migration**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic upgrade head
```

Expected: `Running upgrade -> XXXX, initial_schema`

- [ ] **Step 6: Verify that the DB has the tables**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/python -c "
import sqlite3
conn = sqlite3.connect('mirofish_dev.db')
tables = conn.execute(\"SELECT name FROM sqlite_master WHERE type='table'\").fetchall()
print([t[0] for t in tables])
conn.close()
"
```

Expected: a list that includes `projects`, `tasks`, `users`, `ontologies`, `graphs`, `simulations`, `reports`, `system_config`

- [ ] **Step 7: Commit**

```bash
git add backend/alembic.ini backend/alembic/ backend/mirofish_dev.db
git commit -m "feat(alembic): add initial schema migration for all SQLAlchemy models"
```

---

## Task 5: Implement StorageService

**Files:**
- Create: `backend/app/storage/__init__.py`
- Create: `backend/app/storage/protocol.py`
- Create: `backend/app/storage/local.py`
- Create: `backend/app/storage/azure_blob.py`
- Create: `backend/app/storage/factory.py`
- Create: `backend/tests/test_storage.py`

- [ ] **Step 1: Create the directory and the Protocol**

```python
# backend/app/storage/protocol.py
"""Abstract interface for the file storage layer."""
from typing import IO, Iterator, Protocol, runtime_checkable


@runtime_checkable
class StorageService(Protocol):
    def upload(self, path: str, data: bytes | IO,
               content_type: str = "application/octet-stream") -> None: ...

    def download(self, path: str) -> bytes: ...

    def download_stream(self, path: str) -> IO: ...

    def delete(self, path: str) -> None: ...

    def delete_prefix(self, prefix: str) -> None:
        """Delete every file whose path starts with prefix."""
        ...

    def exists(self, path: str) -> bool: ...

    def list(self, prefix: str = "") -> list[str]:
        """Return relative paths under the prefix."""
        ...

    def public_url(self, path: str) -> str | None:
        """Public URL if the backend supports it, None otherwise."""
        ...
```

- [ ] **Step 2: Create LocalFSStorage**

```python
# backend/app/storage/local.py
"""Storage adapter for the local filesystem."""
import io
import os
import shutil
from pathlib import Path
from .protocol import StorageService


class LocalFSStorage:
    """StorageService implementation for the local filesystem."""

    def __init__(self, base_path: str) -> None:
        self._base = Path(base_path).resolve()
        self._base.mkdir(parents=True, exist_ok=True)

    def _safe_path(self, relative: str) -> Path:
        """Resolve the path and check that it stays inside base, to block path traversal."""
        resolved = (self._base / relative).resolve()
        # Compare path components rather than string prefixes: a plain
        # startswith() would accept a sibling such as "<base>_evil/file".
        if not resolved.is_relative_to(self._base):
            raise ValueError(f"Path traversal detected: {relative!r}")
        return resolved

    def upload(self, path: str, data: bytes | io.IOBase,
               content_type: str = "application/octet-stream") -> None:
        dest = self._safe_path(path)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(data, bytes):
            dest.write_bytes(data)
        else:
            with open(dest, "wb") as f:
                shutil.copyfileobj(data, f)

    def download(self, path: str) -> bytes:
        return self._safe_path(path).read_bytes()

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        p = self._safe_path(path)
        if p.exists():
            p.unlink()

    def delete_prefix(self, prefix: str) -> None:
        p = self._safe_path(prefix)
        if p.is_dir():
            shutil.rmtree(p)
        elif p.exists():
            p.unlink()

    def exists(self,
               path: str) -> bool:
        return self._safe_path(path).exists()

    def list(self, prefix: str = "") -> list[str]:
        base = self._safe_path(prefix) if prefix else self._base
        if not base.exists():
            return []
        result = []
        for p in base.rglob("*"):
            if p.is_file():
                result.append(str(p.relative_to(self._base)))
        return result

    def public_url(self, path: str) -> str | None:
        return None
```

- [ ] **Step 3: Create AzureBlobStorage**

```python
# backend/app/storage/azure_blob.py
"""Storage adapter for Azure Blob Storage."""
import io
from .protocol import StorageService


class AzureBlobStorage:
    """StorageService implementation for Azure Blob Storage."""

    def __init__(self, connection_string: str, container_name: str) -> None:
        # Imported lazily so the SDK is only required when STORAGE_TYPE=azure.
        from azure.storage.blob import BlobServiceClient
        self._client = BlobServiceClient.from_connection_string(connection_string)
        self._container = container_name
        self._ensure_container()

    def _ensure_container(self) -> None:
        container_client = self._client.get_container_client(self._container)
        if not container_client.exists():
            container_client.create_container()

    def _blob_client(self, path: str):
        return self._client.get_blob_client(container=self._container, blob=path)

    def upload(self, path: str, data: bytes | io.IOBase,
               content_type: str = "application/octet-stream") -> None:
        from azure.storage.blob import ContentSettings
        # upload_blob accepts both bytes and file-like objects; the content
        # type must be passed as a ContentSettings object, not a plain dict.
        self._blob_client(path).upload_blob(
            data,
            overwrite=True,
            content_settings=ContentSettings(content_type=content_type),
        )

    def download(self, path: str) -> bytes:
        return self._blob_client(path).download_blob().readall()

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        self._blob_client(path).delete_blob(delete_snapshots="include")

    def delete_prefix(self, prefix: str) -> None:
        container = self._client.get_container_client(self._container)
        blobs = container.list_blobs(name_starts_with=prefix)
        for blob in blobs:
            container.delete_blob(blob.name, delete_snapshots="include")

    def exists(self, path: str) -> bool:
        return self._blob_client(path).exists()

    def list(self, prefix: str = "") -> list[str]:
        container = self._client.get_container_client(self._container)
        return [b.name for b in container.list_blobs(name_starts_with=prefix)]

    def public_url(self, path: str) -> str | None:
        return self._blob_client(path).url
```

- [ ] **Step 4: Create the factory**

```python
# backend/app/storage/factory.py
"""Select the StorageService implementation from STORAGE_TYPE."""
import os
from .protocol import StorageService


def create_storage_service() -> StorageService:
    storage_type = os.environ.get("STORAGE_TYPE", "local")
    match storage_type:
        case "azure":
            from .azure_blob import AzureBlobStorage
            conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING", "")
            container = os.environ.get("AZURE_STORAGE_CONTAINER", "mirofish")
            if not conn_str:
                raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING is not set for STORAGE_TYPE=azure")
            return AzureBlobStorage(conn_str, container)
        case _:
            from .local import LocalFSStorage
            # Same default as Config.STORAGE_LOCAL_PATH, i.e. backend/uploads
            base = os.environ.get("STORAGE_LOCAL_PATH",
                                  os.path.join(os.path.dirname(__file__), "../../uploads"))
            return LocalFSStorage(base)
```

- [ ] **Step 5: Create the package's __init__.py**

```python
# backend/app/storage/__init__.py
from .protocol import StorageService
from .factory import create_storage_service

__all__ = ["StorageService", "create_storage_service"]
```

- [ ] **Step 6: Write the LocalFSStorage tests**

```python
# backend/tests/test_storage.py
import io
import pytest
from backend.app.storage.local import LocalFSStorage


@pytest.fixture
def storage(tmp_path):
    return LocalFSStorage(str(tmp_path))


def test_upload_and_download_bytes(storage):
    storage.upload("foo/bar.txt", b"hello world", "text/plain")
    assert storage.download("foo/bar.txt") == b"hello world"


def test_upload_and_download_stream(storage):
    data = io.BytesIO(b"stream data")
    storage.upload("test/stream.bin", data)
    result = storage.download("test/stream.bin")
    assert result == b"stream data"


def test_exists(storage):
    assert not storage.exists("not/there.txt")
    storage.upload("yes.txt", b"x")
    assert storage.exists("yes.txt")


def test_delete(storage):
    storage.upload("del.txt", b"bye")
    storage.delete("del.txt")
    assert not storage.exists("del.txt")


def test_delete_prefix(storage):
    storage.upload("dir/a.txt", b"a")
    storage.upload("dir/b.txt", b"b")
    storage.delete_prefix("dir")
    assert not storage.exists("dir/a.txt")
    assert not storage.exists("dir/b.txt")


def test_list(storage):
    storage.upload("root/x.txt", b"x")
    storage.upload("root/y.txt", b"y")
    paths = storage.list("root")
    assert len(paths) == 2
    assert all("root" in p for p in paths)


def test_path_traversal_blocked(storage):
    with pytest.raises(ValueError, match="Path traversal"):
        storage._safe_path("../../etc/passwd")


def test_public_url_is_none(storage):
    storage.upload("f.txt", b"x")
    assert storage.public_url("f.txt") is None
```

- [ ] **Step 7: Run the storage tests**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_storage.py -v
```

Expected: 8 passed

- [ ] **Step 8: Commit**

```bash
git add backend/app/storage/ backend/tests/test_storage.py
git commit -m "feat(storage): add StorageService protocol, LocalFSStorage, AzureBlobStorage, factory"
```

---

## Task 6: Inject DB and Storage into Flask

**Files:**
- Modify: `backend/app/__init__.py`

- [ ] **Step 1: Update create_app to initialize DB and Storage**

Add right after `app = Flask(__name__)` and `app.config.from_object(...)`:

```python
    # Initialize the database
    from .db import init_db
    init_db(app.config['DATABASE_URL'])

    # Initialize storage
    from .storage import create_storage_service
    app.extensions['storage'] = create_storage_service()
```

Then add a helper function at the end of the file (outside `create_app`):

```python
def get_storage():
    """Access the StorageService from
qualsevol context Flask.""" from flask import current_app return current_app.extensions['storage'] ``` - [ ] **Step 2: Verificar que l'app arrenca correctament** ```bash cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia DATABASE_URL=sqlite:///test_startup.db STORAGE_TYPE=local \ backend/.venv/bin/python -c " from backend.app import create_app app = create_app() print('App created OK') print('Storage:', app.extensions.get('storage')) " ``` Expected: `App created OK` + `Storage: ` - [ ] **Step 3: Netejar fitxer de test** ```bash rm -f /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend/test_startup.db ``` - [ ] **Step 4: Commit** ```bash git add backend/app/__init__.py git commit -m "feat(app): inject SQLAlchemy DB and StorageService into Flask app factory" ``` --- ## Task 7: Refactoritzar TaskManager → BD **Files:** - Modify: `backend/app/models/task.py` - Create: `backend/tests/test_task_manager_db.py` El `TaskManager` actual és in-memory. El refactoritzem per usar la BD via `get_session()`. Mantenim la mateixa interfície pública (`create_task`, `get_task`, `update_task`, `complete_task`, `fail_task`, `list_tasks`) per no trencar cap cridador. 
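To make the motivation concrete, here is a minimal, dependency-free sketch of why in-memory task state disappears when a new manager instance replaces the old one, while database-backed state does not. It is an illustration only: `InMemoryTasks` and `SqliteTasks` are hypothetical names, and the standard-library `sqlite3` module stands in for SQLAlchemy and `get_session()`, which the real refactor uses.

```python
import sqlite3
import uuid


class InMemoryTasks:
    """Hypothetical stand-in for the current manager: state lives in a dict."""

    def __init__(self):
        self._tasks = {}

    def create_task(self, task_type):
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = {
            "task_id": task_id,
            "task_type": task_type,
            "status": "pending",
        }
        return task_id

    def get_task(self, task_id):
        return self._tasks.get(task_id)


class SqliteTasks:
    """Same two methods, but state lives in a database (sqlite3, not SQLAlchemy)."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks "
            "(id TEXT PRIMARY KEY, task_type TEXT, status TEXT)"
        )

    def create_task(self, task_type):
        task_id = str(uuid.uuid4())
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO tasks VALUES (?, ?, ?)",
                (task_id, task_type, "pending"),
            )
        return task_id

    def get_task(self, task_id):
        row = self._conn.execute(
            "SELECT id, task_type, status FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()
        if row is None:
            return None
        return {"task_id": row[0], "task_type": row[1], "status": row[2]}


# The shared connection plays the role of the database that outlives the manager.
conn = sqlite3.connect(":memory:")

mem_id = InMemoryTasks().create_task("graph_build")
db_id = SqliteTasks(conn).create_task("graph_build")

# A fresh manager instance stands in for a server restart.
lost = InMemoryTasks().get_task(mem_id)
kept = SqliteTasks(conn).get_task(db_id)
```

Both classes expose the same methods and return the same dictionary shape, which is the property the refactor preserves so that callers do not need to change.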
- [ ] **Step 1: Write the tests for the new TaskManager**

```python
# backend/tests/test_task_manager_db.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from backend.app.db import Base
import backend.app.db as db_module
from backend.app.models.db_models import TaskModel  # noqa: F401  (registers the tables on Base.metadata)


@pytest.fixture(autouse=True)
def isolated_db():
    """In-memory SQLite database for each test."""
    db_module._engine = create_engine(
        "sqlite:///:memory:", connect_args={"check_same_thread": False}
    )
    db_module._SessionLocal = sessionmaker(
        bind=db_module._engine, autocommit=False, autoflush=False
    )
    Base.metadata.create_all(db_module._engine)
    yield
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None


def test_create_and_get_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("graph_build", {"project_id": "proj-1"})
    task = tm.get_task(task_id)
    assert task is not None
    assert task["task_type"] == "graph_build"
    assert task["status"] == "pending"
    assert task["progress"] == 0


def test_update_task_progress():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("ontology_generate")
    tm.update_task(task_id, progress=50, message="Halfway")
    task = tm.get_task(task_id)
    assert task["progress"] == 50
    assert task["message"] == "Halfway"


def test_complete_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("graph_build")
    tm.complete_task(task_id, {"graph_id": "g-1"})
    task = tm.get_task(task_id)
    assert task["status"] == "completed"
    assert task["progress"] == 100
    assert task["result"]["graph_id"] == "g-1"


def test_fail_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("simulation_prepare")
    tm.fail_task(task_id, "LLM timeout")
    task = tm.get_task(task_id)
    assert task["status"] == "failed"
    assert task["error"] == "LLM timeout"


def test_task_survives_new_manager_instance():
    """The task must live in the database, not in memory."""
    from backend.app.models.task import TaskManager
    tm1 = TaskManager()
    task_id = tm1.create_task("graph_build")
    # Create a new instance (simulates a restart)
    TaskManager._instance = None
    tm2 = TaskManager()
    task = tm2.get_task(task_id)
    assert task is not None
    assert task["task_id"] == task_id


def test_list_tasks():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    tm.create_task("graph_build")
    tm.create_task("graph_build")
    tm.create_task("ontology_generate")
    all_tasks = tm.list_tasks()
    assert len(all_tasks) == 3
    graph_tasks = tm.list_tasks(task_type="graph_build")
    assert len(graph_tasks) == 2
```

- [ ] **Step 2: Run the tests to verify that they fail**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v
```

Expected: `test_task_survives_new_manager_instance` FAIL (because the manager is still in-memory at this point)

- [ ] **Step 3: Refactor TaskManager**

Replace the contents of `backend/app/models/task.py`:

```python
"""Task state management, persisted via SQLAlchemy."""
import threading
import uuid
from datetime import datetime, timedelta
from enum import Enum
from typing import Any, Dict, List, Optional

from sqlalchemy import delete, desc, select

from ..db import get_session
from ..models.db_models import TaskModel
from ..utils.locale import t


class TaskStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class TaskManager:
    """Task manager: thread-safe singleton, persisted via SQLAlchemy."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so only one instance is ever created.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def create_task(self, task_type: str, metadata: Optional[Dict] = None) -> str:
        task_id = str(uuid.uuid4())
        with get_session() as db:
            task = TaskModel(
                id=task_id,
                task_type=task_type,
                status=TaskStatus.PENDING.value,
                progress=0,
                progress_detail=metadata or {},
            )
            db.add(task)
            db.commit()
        return task_id

    def get_task(self, task_id: str) -> Optional[Dict[str, Any]]:
        with get_session() as db:
            task = db.get(TaskModel, task_id)
            if task is None:
                return None
            return self._to_dict(task)

    def update_task(
        self,
        task_id: str,
        status: Optional[str] = None,
        progress: Optional[int] = None,
        message: Optional[str] = None,
        result: Optional[Dict] = None,
        error: Optional[str] = None,
        progress_detail: Optional[Dict] = None,
    ) -> None:
        with get_session() as db:
            task = db.get(TaskModel, task_id)
            if task is None:
                return
            if status is not None:
                # Accept TaskStatus members as well as plain strings.
                task.status = status.value if isinstance(status, TaskStatus) else status
            if progress is not None:
                task.progress = progress
            if message is not None:
                task.message = message
            if result is not None:
                task.result = result
            if error is not None:
                task.error = error
            if progress_detail is not None:
                task.progress_detail = progress_detail
            task.updated_at = datetime.utcnow()
            db.commit()

    def complete_task(self, task_id: str, result: Dict) -> None:
        self.update_task(
            task_id,
            status=TaskStatus.COMPLETED.value,
            progress=100,
            message=t("progress.taskComplete"),
            result=result,
        )

    def fail_task(self, task_id: str, error: str) -> None:
        self.update_task(
            task_id,
            status=TaskStatus.FAILED.value,
            message=t("progress.taskFailed"),
            error=error,
        )

    def list_tasks(self, task_type: Optional[str] = None) -> List[Dict[str, Any]]:
        with get_session() as db:
            stmt = select(TaskModel).order_by(desc(TaskModel.created_at))
            if task_type:
                stmt = stmt.where(TaskModel.task_type == task_type)
            tasks = db.execute(stmt).scalars().all()
            # Loop variable is named `task` to avoid shadowing the imported `t`.
            return [self._to_dict(task) for task in tasks]

    def cleanup_old_tasks(self, max_age_hours: int = 24) -> None:
        """Delete finished tasks older than max_age_hours."""
        cutoff = datetime.utcnow() - timedelta(hours=max_age_hours)
        with get_session() as db:
            db.execute(
                delete(TaskModel).where(
                    TaskModel.created_at < cutoff,
                    TaskModel.status.in_(
                        [TaskStatus.COMPLETED.value, TaskStatus.FAILED.value]
                    ),
                )
            )
            db.commit()

    @staticmethod
    def _to_dict(task: TaskModel) -> Dict[str, Any]:
        return {
            "task_id": task.id,
            "task_type": task.task_type,
            "status": task.status,
            "created_at": task.created_at.isoformat(),
            "updated_at": task.updated_at.isoformat(),
            "progress": task.progress,
            "message": task.message or "",
            "progress_detail": task.progress_detail or {},
            "result": task.result,
            "error": task.error,
            # Metadata is stored in progress_detail, so both keys return the same value.
            "metadata": task.progress_detail or {},
        }
```

**Note:** `get_session()` has been a context manager since Task 3. Use `with get_session() as db:` as shown in the code.

- [ ] **Step 4: Run the TaskManager tests**

```bash
backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v
```

Expected: 6 passed

- [ ] **Step 5: Commit**

```bash
git add backend/app/models/task.py backend/app/db.py backend/tests/test_task_manager_db.py
git commit -m "feat(task): refactor TaskManager to persist tasks in SQLAlchemy DB"
```

---

## Task 8: Refactor ProjectManager → DB + Storage

**Files:**
- Modify: `backend/app/models/project.py`
- Create: `backend/tests/test_project_manager_db.py`

We refactor `ProjectManager` to use the database for metadata and `StorageService` for files, keeping the same public interface.
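The split between metadata and files means `ProjectManager` only needs a handful of storage calls. As a rough sketch of that contract, the hypothetical `DictStorage` below is an in-memory double that covers `upload`, `download`, `exists` and `delete_prefix` for bytes payloads only; the `extracted_text_path` helper is also an illustrative assumption, mirroring the `projects/{project_id}/...` layout used in the refactored code.

```python
class DictStorage:
    """Hypothetical in-memory double for the storage calls ProjectManager uses."""

    def __init__(self):
        self._blobs = {}

    def upload(self, path, data, content_type=None):
        # Bytes only in this sketch; the real service also accepts streams.
        self._blobs[path] = bytes(data)

    def download(self, path):
        return self._blobs[path]

    def exists(self, path):
        return path in self._blobs

    def delete_prefix(self, prefix):
        # Collect the keys first so the dict is not mutated while iterating.
        for key in [k for k in self._blobs if k.startswith(prefix)]:
            del self._blobs[key]


def extracted_text_path(project_id):
    """Illustrative helper for the per-project extracted-text location."""
    return f"projects/{project_id}/extracted_text.txt"


storage = DictStorage()
storage.upload(extracted_text_path("p1"), "hello extracted".encode("utf-8"), "text/plain")
storage.upload("projects/p1/files/a.txt", b"a")
storage.upload("projects/p10/files/b.txt", b"b")

text = storage.download(extracted_text_path("p1")).decode("utf-8")

# The trailing slash keeps "projects/p1/" from also matching "projects/p10/".
storage.delete_prefix("projects/p1/")
```

With plain string-prefix matching, short ids such as `p1` and `p10` could collide without the trailing slash. Fixed-length UUID project ids avoid that collision, which is what the plan generates.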
- [ ] **Step 1: Write the tests for the new ProjectManager**

```python
# backend/tests/test_project_manager_db.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from backend.app.db import Base
import backend.app.db as db_module
import backend.app.models.db_models  # noqa: F401  (ensures the tables are registered before create_all)
from backend.app.storage.local import LocalFSStorage


@pytest.fixture(autouse=True)
def isolated_db():
    """In-memory SQLite database for each test."""
    db_module._engine = create_engine(
        "sqlite:///:memory:", connect_args={"check_same_thread": False}
    )
    db_module._SessionLocal = sessionmaker(
        bind=db_module._engine, autocommit=False, autoflush=False
    )
    Base.metadata.create_all(db_module._engine)
    yield
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None


@pytest.fixture
def storage(tmp_path):
    return LocalFSStorage(str(tmp_path))


def test_create_project(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Test Project", storage=storage)
    assert proj["name"] == "Test Project"
    assert proj["status"] == "created"
    assert "id" in proj


def test_get_project(storage):
    from backend.app.models.project import ProjectManager
    created = ProjectManager.create_project("My Project", storage=storage)
    fetched = ProjectManager.get_project(created["id"])
    assert fetched is not None
    assert fetched["name"] == "My Project"


def test_project_not_found(storage):
    from backend.app.models.project import ProjectManager
    result = ProjectManager.get_project("nonexistent-id")
    assert result is None


def test_save_and_get_extracted_text(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Text Project", storage=storage)
    ProjectManager.save_extracted_text(proj["id"], "hello extracted", storage=storage)
    text = ProjectManager.get_extracted_text(proj["id"], storage=storage)
    assert text == "hello extracted"


def test_project_survives_manager_reset(storage):
    """The data must live in the database, not in memory."""
    from backend.app.models.project import ProjectManager
    created = ProjectManager.create_project("Persist Me", storage=storage)
    # Simulate a restart: there is no in-memory state to clear, so this lookup must hit the DB
    fetched = ProjectManager.get_project(created["id"])
    assert fetched is not None


def test_list_projects(storage):
    from backend.app.models.project import ProjectManager
    ProjectManager.create_project("P1", storage=storage)
    ProjectManager.create_project("P2", storage=storage)
    projects = ProjectManager.list_projects()
    assert len(projects) == 2


def test_delete_project(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Del Me", storage=storage)
    result = ProjectManager.delete_project(proj["id"], storage=storage)
    assert result is True
    assert ProjectManager.get_project(proj["id"]) is None
```

- [ ] **Step 2: Run the tests to verify that they fail**

```bash
backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v
```

Expected: errors (the current interface does not accept the `storage=` parameter)

- [ ] **Step 3: Refactor ProjectManager**

Replace the contents of `backend/app/models/project.py`:

```python
"""Project context management, persisted via SQLAlchemy + StorageService."""
import uuid
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from ..db import get_session
from ..models.db_models import ProjectModel, ProjectFileModel


class ProjectStatus(str, Enum):
    CREATED = "created"
    ONTOLOGY_GENERATED = "ontology_generated"
    GRAPH_BUILDING = "graph_building"
    GRAPH_COMPLETED = "graph_completed"
    FAILED = "failed"


class ProjectManager:
    """Manages projects: metadata in the DB, files in the StorageService."""

    @classmethod
    def create_project(cls, name: str = "Unnamed Project", storage=None) -> Dict[str, Any]:
        project_id = str(uuid.uuid4())
        with get_session() as db:
            proj = ProjectModel(id=project_id, name=name, status="created")
            db.add(proj)
            db.commit()
            # Reload server-side defaults (timestamps) before serializing.
            db.refresh(proj)
            return cls._to_dict(proj)

    @classmethod
    def get_project(cls, project_id: str) -> Optional[Dict[str, Any]]:
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return None
            return cls._to_dict(proj)

    @classmethod
    def save_project(cls, project_data: Dict[str, Any]) -> None:
        """Update the fields of an existing project."""
        project_id = project_data.get("id") or project_data.get("project_id")
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return
            # Only these columns may be overwritten from the incoming dict.
            updatable = [
                "name", "status", "analysis_summary", "simulation_requirement",
                "chunk_size", "chunk_overlap", "active_task_id",
            ]
            for field in updatable:
                if field in project_data:
                    setattr(proj, field, project_data[field])
            proj.updated_at = datetime.utcnow()
            db.commit()

    @classmethod
    def list_projects(cls, limit: int = 50) -> List[Dict[str, Any]]:
        from sqlalchemy import select, desc
        with get_session() as db:
            stmt = select(ProjectModel).order_by(desc(ProjectModel.created_at)).limit(limit)
            projects = db.execute(stmt).scalars().all()
            return [cls._to_dict(p) for p in projects]

    @classmethod
    def delete_project(cls, project_id: str, storage=None) -> bool:
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return False
            # Delete the files from storage if the service was passed in
            if storage is not None:
                storage.delete_prefix(f"projects/{project_id}")
            db.delete(proj)
            db.commit()
            return True

    @classmethod
    def save_file_to_project(
        cls,
        project_id: str,
        file_storage,  # Flask FileStorage
        original_filename: str,
        storage,
    ) -> Dict[str, Any]:
        import os
        # Store under a random name, keeping only the original extension.
        ext = os.path.splitext(original_filename)[1].lower()
        safe_filename = f"{uuid.uuid4().hex[:8]}{ext}"
        storage_path = f"projects/{project_id}/files/{safe_filename}"
        data = file_storage.read()
        storage.upload(storage_path, data)
        mime_type = (
            getattr(file_storage, "content_type", "application/octet-stream")
            or "application/octet-stream"
        )
        with get_session() as db:
            file_rec = ProjectFileModel(
                id=str(uuid.uuid4()),
                project_id=project_id,
                original_name=original_filename,
                storage_path=storage_path,
                size=len(data),
                mime_type=mime_type,
                file_type="upload",
            )
            db.add(file_rec)
            db.commit()
        return {
            "original_filename": original_filename,
            "saved_filename": safe_filename,
            "storage_path": storage_path,
            "size": len(data),
        }

    @classmethod
    def save_extracted_text(cls, project_id: str, text: str, storage) -> None:
        from sqlalchemy import select
        storage_path = f"projects/{project_id}/extracted_text.txt"
        data = text.encode("utf-8")  # encode once, reuse for upload and size
        storage.upload(storage_path, data, "text/plain")
        with get_session() as db:
            stmt = select(ProjectFileModel).where(
                ProjectFileModel.project_id == project_id,
                ProjectFileModel.file_type == "extracted_text",
            )
            existing = db.execute(stmt).scalar_one_or_none()
            if existing:
                existing.storage_path = storage_path
                existing.size = len(data)
            else:
                rec = ProjectFileModel(
                    id=str(uuid.uuid4()),
                    project_id=project_id,
                    original_name="extracted_text.txt",
                    storage_path=storage_path,
                    size=len(data),
                    mime_type="text/plain",
                    file_type="extracted_text",
                )
                db.add(rec)
            db.commit()

    @classmethod
    def get_extracted_text(cls, project_id: str, storage) -> Optional[str]:
        storage_path = f"projects/{project_id}/extracted_text.txt"
        if not storage.exists(storage_path):
            return None
        return storage.download(storage_path).decode("utf-8")

    @staticmethod
    def _to_dict(proj: ProjectModel) -> Dict[str, Any]:
        return {
            "id": proj.id,
            "project_id": proj.id,  # compatibility with existing code
            "name": proj.name,
            "status": proj.status,
            "analysis_summary": proj.analysis_summary,
            "simulation_requirement": proj.simulation_requirement,
            "chunk_size": proj.chunk_size,
            "chunk_overlap": proj.chunk_overlap,
            "active_task_id": proj.active_task_id,
            "created_at": proj.created_at.isoformat(),
            "updated_at": proj.updated_at.isoformat(),
            # Fields from the old model, now returned empty for compatibility
            "files": [],
            "total_text_length": 0,
            "ontology": None,
            "graph_id": None,
            "graph_build_task_id": None,
            "error": None,
        }
```

- [ ] **Step 4: Run the ProjectManager tests**

```bash
backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v
```

Expected: 7 passed

- [ ] **Step 5: Commit**

```bash
git add backend/app/models/project.py backend/tests/test_project_manager_db.py
git commit -m "feat(project): refactor ProjectManager to persist via SQLAlchemy + StorageService"
```

---

## Task 9: Update existing tests and final verification

**Files:**
- Modify: `backend/tests/conftest.py`
- Modify: `backend/tests/test_project_task_recovery.py` (if affected)

- [ ] **Step 1: Update conftest.py to add global fixtures**

```python
# backend/tests/conftest.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from backend.app.db import Base
import backend.app.db as db_module


@pytest.fixture(autouse=True)
def reset_graph_factory_singleton():
    """Reset the graph backend singleton after each test."""
    yield
    try:
        import backend.app.graph.factory as fmod
        fmod._backend_instance = None
    except ImportError:
        pass


@pytest.fixture(autouse=True)
def reset_task_manager_singleton():
    """Reset TaskManager singleton between tests."""
    from backend.app.models import task as task_module
    task_module.TaskManager._instance = None
    yield
    task_module.TaskManager._instance = None


@pytest.fixture
def in_memory_db():
    """In-memory SQLite database for tests that need a DB."""
    db_module._engine = create_engine(
        "sqlite:///:memory:", connect_args={"check_same_thread": False}
    )
    db_module._SessionLocal = sessionmaker(
        bind=db_module._engine, autocommit=False, autoflush=False
    )
    Base.metadata.create_all(db_module._engine)
    yield db_module._engine
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None
```

- [ ] **Step 2: Run the whole test suite**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/ -v --tb=short 2>&1 | tail -30
```

Expected: all tests from Tasks 2-8 pass.
The `test_config_graph_backend_default` test may keep failing (a pre-existing, unrelated failure).

- [ ] **Step 3: Verify that the app starts and the database is created correctly**

```bash
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
DATABASE_URL=sqlite:///verify_startup.db \
STORAGE_TYPE=local \
STORAGE_LOCAL_PATH=/tmp/mirofish_test_uploads \
LLM_API_KEY=test-key \
ZEP_API_KEY=test-key \
  .venv/bin/python -c "
from app import create_app
app = create_app()
with app.app_context():
    from app.models.project import ProjectManager
    storage = app.extensions['storage']
    proj = ProjectManager.create_project('Startup Test', storage=storage)
    print('Project created:', proj['id'])
    fetched = ProjectManager.get_project(proj['id'])
    print('Project fetched:', fetched['name'])
    print('Verification OK')
"
rm -f verify_startup.db
```

Expected: `Verification OK`

- [ ] **Step 4: Final commit for Phase 1**

```bash
git add backend/tests/conftest.py
git commit -m "test(conftest): add in_memory_db and task manager singleton reset fixtures"
git tag fase1-infraestructura-base
```

---

## End-to-end verification of Phase 1

```bash
# 1. All tests pass
backend/.venv/bin/pytest backend/tests/ -v

# 2. The database is created by the migrations (alembic.ini lives in backend/)
(cd backend && .venv/bin/alembic upgrade head)

# 3. The app starts correctly
DATABASE_URL=sqlite:///mirofish_dev.db STORAGE_TYPE=local LLM_API_KEY=x ZEP_API_KEY=x \
  backend/.venv/bin/python backend/run.py &
sleep 2
curl -s http://localhost:5001/health | python3 -m json.tool
kill %1
```

Final expected output: `{"service": "MiroFish Backend", "status": "ok"}`

---

> **Note:** Phases 2 (Auth+RBAC), 3 (pipeline) and 4 (production hardening) will each have their own plan, written when that phase begins.