Persistence Phase 1: Base Infrastructure

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Replace the JSON + in-memory persistence with SQLAlchemy 2.x (SQLite in dev / PostgreSQL in prod) and an abstract StorageService (LocalFS in dev / Azure Blob in prod), so that projects and tasks survive server restarts.

Architecture: A database layer built on SQLAlchemy is added beneath the existing Managers. ProjectManager and TaskManager are refactored to read from and write to the database instead of JSON files and memory. The StorageService replaces all direct file operations. The app factory (create_app) injects the DB session and the storage as Flask extensions.

Tech Stack: SQLAlchemy 2.x, Alembic, flask-sqlalchemy, azure-storage-blob, bcrypt (only needed in later phases, but added to pyproject.toml now)


File map

New files to create

| File | Responsibility |
|---|---|
| backend/app/db.py | SQLAlchemy engine, Base, get_session() session context manager, init_db() |
| backend/app/models/db_models.py | All SQLAlchemy models (Project, ProjectFile, Ontology, Graph, Simulation, Report, Task, SystemConfig, User, InvitationToken, PasswordResetToken) |
| backend/app/storage/__init__.py | Exports StorageService, create_storage_service() |
| backend/app/storage/protocol.py | StorageService Protocol (interface) |
| backend/app/storage/local.py | LocalFSStorage (pathlib) |
| backend/app/storage/azure_blob.py | AzureBlobStorage (azure-storage-blob) |
| backend/app/storage/factory.py | create_storage_service(): selects the backend by STORAGE_TYPE |
| backend/alembic.ini | Alembic config |
| backend/alembic/env.py | Alembic environment (reads DATABASE_URL) |
| backend/alembic/versions/0001_initial_schema.py | Initial migration (all tables) |
| backend/tests/test_db_models.py | Tests for the SQLAlchemy models |
| backend/tests/test_storage.py | Tests for LocalFSStorage |
| backend/tests/test_project_manager_db.py | Tests for ProjectManager backed by the DB |
| backend/tests/test_task_manager_db.py | Tests for TaskManager backed by the DB |

Files to modify

| File | Change |
|---|---|
| backend/pyproject.toml | Add sqlalchemy, alembic, flask-sqlalchemy, azure-storage-blob, bcrypt, flask-jwt-extended |
| backend/app/config.py | Add DATABASE_URL, STORAGE_TYPE, STORAGE_LOCAL_PATH, AZURE_STORAGE_*, JWT_SECRET, JWT_REFRESH_SECRET |
| backend/app/__init__.py | Initialize DB + Storage in create_app(); replace the provisional auth with a flask-jwt-extended stub |
| backend/app/models/project.py | Refactor ProjectManager to use the DB + StorageService |
| backend/app/models/task.py | Refactor TaskManager to use the DB |
| backend/tests/conftest.py | Add fixtures for an in-memory DB and temporary storage |

Task 1: Add dependencies

Files:

  • Modify: backend/pyproject.toml

  • Step 1: Add the dependencies to pyproject.toml

# backend/pyproject.toml, in the dependencies section, add:
    "sqlalchemy>=2.0.0",
    "alembic>=1.13.0",
    "flask-sqlalchemy>=3.1.0",
    "azure-storage-blob>=12.19.0",
    "bcrypt>=4.1.0",
    "flask-jwt-extended>=4.6.0",
  • Step 2: Install the dependencies
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
uv sync

Expected: no errors. uv sync updates the .venv.

  • Step 3: Verify the imports
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/python -c "import sqlalchemy; import alembic; import flask_sqlalchemy; print('OK')"

Expected: OK
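
The one-liner above only checks three of the six packages added in Step 1. As a hedged sketch, the check could be extended to all of them without importing anything heavy; the helper name missing_modules is hypothetical, and the module names are the usual import names for these distributions.

```python
import importlib.util

# Import names for the packages added in Step 1 (distribution name -> module name).
REQUIRED_MODULES = [
    "sqlalchemy",
    "alembic",
    "flask_sqlalchemy",
    "azure.storage.blob",
    "bcrypt",
    "flask_jwt_extended",
]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "azure") is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED_MODULES))
```

Run it with the project's interpreter (.venv/bin/python); an empty list means every dependency resolved.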

  • Step 4: Commit
git add backend/pyproject.toml
git commit -m "chore(deps): add SQLAlchemy, Alembic, Azure Blob, bcrypt, flask-jwt-extended"

Task 2: Add configuration variables

Files:

  • Modify: backend/app/config.py

  • Create: backend/tests/test_config.py

  • Step 1: Write a test for the new configuration

Create a new file, backend/tests/test_config.py (test_db_models.py is created in Task 3; the config tests live in their own file):

# backend/tests/test_config.py
import os
import pytest


def test_database_url_default():
    """The default DATABASE_URL must be SQLite."""
    from backend.app.config import Config
    assert Config.DATABASE_URL.startswith("sqlite")


def test_storage_type_default():
    from backend.app.config import Config
    assert Config.STORAGE_TYPE == "local"


def test_storage_local_path_exists():
    from backend.app.config import Config
    assert Config.STORAGE_LOCAL_PATH is not None
  • Step 2: Run the test to verify that it fails
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
.venv/bin/pytest backend/tests/test_config.py -v 2>/dev/null || \
backend/.venv/bin/pytest backend/tests/test_config.py -v

Expected: AttributeError: type object 'Config' has no attribute 'DATABASE_URL'

  • Step 3: Add the configuration to config.py

Add at the end of the Config class (just before the validate method):

    # ── Persistence ───────────────────────────────────────────────
    # Database
    DATABASE_URL = os.environ.get('DATABASE_URL', 'sqlite:///mirofish_dev.db')

    # File storage
    STORAGE_TYPE = os.environ.get('STORAGE_TYPE', 'local')          # local | azure
    STORAGE_LOCAL_PATH = os.environ.get(
        'STORAGE_LOCAL_PATH',
        os.path.join(os.path.dirname(__file__), '../uploads')
    )
    AZURE_STORAGE_CONNECTION_STRING = os.environ.get('AZURE_STORAGE_CONNECTION_STRING', '')
    AZURE_STORAGE_CONTAINER = os.environ.get('AZURE_STORAGE_CONTAINER', 'mirofish')

    # JWT (for the Phase 2 authentication work; defined here because flask-jwt-extended needs them in create_app)
    JWT_SECRET_KEY = os.environ.get('JWT_SECRET', 'change-me-in-production')
    JWT_REFRESH_SECRET_KEY = os.environ.get('JWT_REFRESH_SECRET', 'change-me-refresh-in-production')
    JWT_ACCESS_TOKEN_EXPIRES_HOURS = int(os.environ.get('JWT_ACCESS_TOKEN_EXPIRES_HOURS', '8'))
    JWT_REFRESH_TOKEN_EXPIRES_DAYS = int(os.environ.get('JWT_REFRESH_TOKEN_EXPIRES_DAYS', '7'))
  • Step 4: Run the test to verify that it passes
backend/.venv/bin/pytest backend/tests/test_config.py -v

Expected: 3 passed
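
Because the Config attributes are evaluated once, when config.py is imported, a test that sets an environment variable after the import will not see the change. The sketch below shows the same defaults read from an explicit mapping, which makes overrides easy to test. The function load_persistence_settings is hypothetical and not part of the plan; only the variable names and defaults come from Step 3.

```python
import os
from datetime import timedelta


def load_persistence_settings(env=None):
    """Read persistence settings from a mapping, using the defaults from Step 3."""
    env = os.environ if env is None else env
    return {
        "DATABASE_URL": env.get("DATABASE_URL", "sqlite:///mirofish_dev.db"),
        "STORAGE_TYPE": env.get("STORAGE_TYPE", "local"),
        "AZURE_STORAGE_CONTAINER": env.get("AZURE_STORAGE_CONTAINER", "mirofish"),
        # Convert the hour/day counts into timedeltas for easier arithmetic.
        "JWT_ACCESS_TOKEN_EXPIRES": timedelta(
            hours=int(env.get("JWT_ACCESS_TOKEN_EXPIRES_HOURS", "8"))
        ),
        "JWT_REFRESH_TOKEN_EXPIRES": timedelta(
            days=int(env.get("JWT_REFRESH_TOKEN_EXPIRES_DAYS", "7"))
        ),
    }


defaults = load_persistence_settings({})
prod = load_persistence_settings(
    {"DATABASE_URL": "postgresql://db.example/mirofish", "STORAGE_TYPE": "azure"}
)
```

Passing an empty mapping yields the development defaults; passing a dict simulates a production environment without touching os.environ.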

  • Step 5: Commit
git add backend/app/config.py backend/tests/test_config.py
git commit -m "feat(config): add DATABASE_URL, STORAGE_TYPE, AZURE_STORAGE_*, JWT config vars"

Task 3: Create the SQLAlchemy models

Files:

  • Create: backend/app/models/db_models.py

  • Create: backend/app/db.py

  • Create: backend/tests/test_db_models.py

  • Step 1: Create backend/app/db.py

# backend/app/db.py
"""SQLAlchemy engine, session factory, and declarative Base."""
from contextlib import contextmanager
from typing import Generator

from sqlalchemy import create_engine, event
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker


class Base(DeclarativeBase):
    pass


_engine = None
_SessionLocal = None


def _enable_sqlite_foreign_keys(dbapi_connection, connection_record) -> None:
    # SQLite ignores ON DELETE actions unless foreign keys are enabled per connection.
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()


def init_db(database_url: str) -> None:
    global _engine, _SessionLocal
    is_sqlite = database_url.startswith("sqlite")
    connect_args = {"check_same_thread": False} if is_sqlite else {}
    _engine = create_engine(database_url, connect_args=connect_args, echo=False)
    if is_sqlite:
        event.listen(_engine, "connect", _enable_sqlite_foreign_keys)
    _SessionLocal = sessionmaker(bind=_engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(_engine)


@contextmanager
def get_session() -> Generator[Session, None, None]:
    """SQLAlchemy session context manager.

    Rolls back on exception and always closes; callers commit explicitly.
    """
    if _SessionLocal is None:
        raise RuntimeError("Database not initialized. Call init_db() first.")
    db = _SessionLocal()
    try:
        yield db
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()
  • Step 2: Create backend/app/models/db_models.py
# backend/app/models/db_models.py
"""SQLAlchemy models for all MiroFish persistence."""
import uuid
from datetime import datetime
from typing import Optional
from sqlalchemy import (
    String, Integer, Text, Boolean, DateTime, JSON,
    ForeignKey, UniqueConstraint
)
from sqlalchemy.orm import Mapped, mapped_column, relationship
from ..db import Base


def _uuid() -> str:
    return str(uuid.uuid4())


def _now() -> datetime:
    return datetime.utcnow()


class UserModel(Base):
    __tablename__ = "users"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    name: Mapped[str] = mapped_column(String(255), nullable=False, default="")
    password_hash: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    role: Mapped[str] = mapped_column(String(20), nullable=False, default="user")
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    projects: Mapped[list["ProjectModel"]] = relationship(
        back_populates="owner", cascade="all, delete-orphan"
    )
    invitation_tokens: Mapped[list["InvitationTokenModel"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )
    password_reset_tokens: Mapped[list["PasswordResetTokenModel"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )


class ProjectModel(Base):
    __tablename__ = "projects"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    user_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=True
    )
    name: Mapped[str] = mapped_column(String(255), nullable=False, default="Unnamed Project")
    status: Mapped[str] = mapped_column(String(50), nullable=False, default="created")
    analysis_summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    simulation_requirement: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    chunk_size: Mapped[int] = mapped_column(Integer, default=500)
    chunk_overlap: Mapped[int] = mapped_column(Integer, default=50)
    active_task_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("tasks.id", ondelete="SET NULL"), nullable=True
    )
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    owner: Mapped[Optional["UserModel"]] = relationship(back_populates="projects")
    files: Mapped[list["ProjectFileModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    ontologies: Mapped[list["OntologyModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    graphs: Mapped[list["GraphModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    simulations: Mapped[list["SimulationModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    reports: Mapped[list["ReportModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )


class ProjectFileModel(Base):
    __tablename__ = "project_files"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    original_name: Mapped[str] = mapped_column(String(255), nullable=False)
    storage_path: Mapped[str] = mapped_column(Text, nullable=False)
    size: Mapped[int] = mapped_column(Integer, default=0)
    mime_type: Mapped[str] = mapped_column(String(100), default="application/octet-stream")
    file_type: Mapped[str] = mapped_column(String(30), default="upload")  # upload | extracted_text
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="files")


class OntologyModel(Base):
    __tablename__ = "ontologies"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    version: Mapped[int] = mapped_column(Integer, default=1)
    entity_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    edge_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="ontologies")
    graphs: Mapped[list["GraphModel"]] = relationship(back_populates="ontology")


class GraphModel(Base):
    __tablename__ = "graphs"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    ontology_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("ontologies.id", ondelete="SET NULL"), nullable=True
    )
    backend: Mapped[str] = mapped_column(String(20), default="zep")  # zep | graphiti
    external_id: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    status: Mapped[str] = mapped_column(String(20), default="building")  # building | ready | failed
    node_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    edge_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="graphs")
    ontology: Mapped[Optional["OntologyModel"]] = relationship(back_populates="graphs")
    simulations: Mapped[list["SimulationModel"]] = relationship(back_populates="graph")
    reports: Mapped[list["ReportModel"]] = relationship(back_populates="graph")


class SimulationModel(Base):
    __tablename__ = "simulations"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    graph_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True
    )
    status: Mapped[str] = mapped_column(String(30), default="prepared")
    platform: Mapped[str] = mapped_column(String(20), default="twitter")  # twitter | reddit | both
    config: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    profiles_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    db_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    actions_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    rounds_total: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    rounds_completed: Mapped[int] = mapped_column(Integer, default=0)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="simulations")
    graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="simulations")
    reports: Mapped[list["ReportModel"]] = relationship(back_populates="simulation")


class ReportModel(Base):
    __tablename__ = "reports"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    simulation_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("simulations.id", ondelete="SET NULL"), nullable=True
    )
    graph_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True
    )
    status: Mapped[str] = mapped_column(String(30), default="generating")
    outline: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    storage_prefix: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)

    project: Mapped["ProjectModel"] = relationship(back_populates="reports")
    simulation: Mapped[Optional["SimulationModel"]] = relationship(back_populates="reports")
    graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="reports")


class TaskModel(Base):
    __tablename__ = "tasks"

    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    task_type: Mapped[str] = mapped_column(String(100), nullable=False)
    entity_type: Mapped[Optional[str]] = mapped_column(String(50), nullable=True)
    entity_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True)
    status: Mapped[str] = mapped_column(String(20), default="pending")
    progress: Mapped[int] = mapped_column(Integer, default=0)
    message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    result: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    progress_detail: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)


class SystemConfigModel(Base):
    __tablename__ = "system_config"

    key: Mapped[str] = mapped_column(String(100), primary_key=True)
    value: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    value_type: Mapped[str] = mapped_column(String(20), default="string")
    group: Mapped[str] = mapped_column(String(50), default="general")
    label: Mapped[str] = mapped_column(String(255), default="")
    description: Mapped[str] = mapped_column(Text, default="")
    is_secret: Mapped[bool] = mapped_column(Boolean, default=False)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    updated_by: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
    )


class InvitationTokenModel(Base):
    __tablename__ = "invitation_tokens"

    token: Mapped[str] = mapped_column(String(36), primary_key=True)
    user_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
    used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)

    user: Mapped["UserModel"] = relationship(back_populates="invitation_tokens")


class PasswordResetTokenModel(Base):
    __tablename__ = "password_reset_tokens"

    token: Mapped[str] = mapped_column(String(36), primary_key=True)
    user_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
    used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)

    user: Mapped["UserModel"] = relationship(back_populates="password_reset_tokens")
  • Step 3: Create the model tests
# backend/tests/test_db_models.py
import pytest
from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base, init_db, get_session
from backend.app.models.db_models import (
    ProjectModel, TaskModel, OntologyModel, GraphModel,
    SimulationModel, ReportModel, UserModel
)


@pytest.fixture
def db_session():
    """In-memory SQLite session for tests."""
    from backend.app import db as db_module
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    # SQLite only honors ON DELETE SET NULL / CASCADE when foreign keys are
    # enabled on the connection; test_task_set_null_on_delete depends on this.
    event.listen(
        db_module._engine, "connect",
        lambda dbapi_conn, _record: dbapi_conn.execute("PRAGMA foreign_keys=ON"),
    )
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    session = db_module._SessionLocal()
    yield session
    session.close()
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None


def test_create_project(db_session):
    proj = ProjectModel(id="proj-1", name="Test Project")
    db_session.add(proj)
    db_session.commit()
    result = db_session.get(ProjectModel, "proj-1")
    assert result.name == "Test Project"
    assert result.status == "created"
    assert result.chunk_size == 500


def test_create_task(db_session):
    task = TaskModel(id="task-1", task_type="graph_build", entity_type="project", entity_id="proj-1")
    db_session.add(task)
    db_session.commit()
    result = db_session.get(TaskModel, "task-1")
    assert result.status == "pending"
    assert result.progress == 0


def test_project_cascade_delete(db_session):
    proj = ProjectModel(id="proj-del", name="Del Project")
    db_session.add(proj)
    db_session.flush()
    ont = OntologyModel(id="ont-1", project_id="proj-del", version=1)
    db_session.add(ont)
    db_session.commit()
    db_session.delete(proj)
    db_session.commit()
    assert db_session.get(OntologyModel, "ont-1") is None


def test_task_set_null_on_delete(db_session):
    task = TaskModel(id="task-del", task_type="graph_build")
    proj = ProjectModel(id="proj-2", name="P2", active_task_id="task-del")
    db_session.add_all([task, proj])
    db_session.commit()
    db_session.delete(task)
    db_session.commit()
    db_session.expire(proj)
    refreshed = db_session.get(ProjectModel, "proj-2")
    assert refreshed.active_task_id is None


def test_graph_linked_to_ontology(db_session):
    proj = ProjectModel(id="proj-g", name="Graph Project")
    ont = OntologyModel(id="ont-g", project_id="proj-g", version=1)
    graph = GraphModel(id="graph-1", project_id="proj-g", ontology_id="ont-g", backend="zep")
    db_session.add_all([proj, ont, graph])
    db_session.commit()
    result = db_session.get(GraphModel, "graph-1")
    assert result.ontology_id == "ont-g"
    assert result.backend == "zep"
  • Step 4: Run the model tests
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_db_models.py -v

Expected: 5 passed
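
test_task_set_null_on_delete relies on the database applying ON DELETE SET NULL, since ProjectModel has no ORM relationship to TaskModel. SQLite only enforces foreign key actions when PRAGMA foreign_keys is switched on for the connection. The following standard-library sketch reproduces the behavior with two simplified tables; the table shapes are illustrative, not the real schema.

```python
import sqlite3


def active_task_after_delete(enforce_fk):
    """Delete a referenced task and return the project's active_task_id afterwards."""
    conn = sqlite3.connect(":memory:")
    if enforce_fk:
        # Must run outside a transaction, so it is issued right after connecting.
        conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE projects ("
        " id TEXT PRIMARY KEY,"
        " active_task_id TEXT REFERENCES tasks(id) ON DELETE SET NULL)"
    )
    conn.execute("INSERT INTO tasks VALUES ('task-del')")
    conn.execute("INSERT INTO projects VALUES ('proj-2', 'task-del')")
    conn.execute("DELETE FROM tasks WHERE id = 'task-del'")
    row = conn.execute(
        "SELECT active_task_id FROM projects WHERE id = 'proj-2'"
    ).fetchone()
    conn.close()
    return row[0]


with_fk = active_task_after_delete(enforce_fk=True)
without_fk = active_task_after_delete(enforce_fk=False)
```

With enforcement on, the column is nulled; with it off, the project keeps a dangling reference to the deleted task. If the SET NULL test fails, checking that the pragma is enabled on the test engine is a reasonable first step.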

  • Step 5: Commit
git add backend/app/db.py backend/app/models/db_models.py backend/tests/test_db_models.py
git commit -m "feat(db): add SQLAlchemy Base, session factory, and all ORM models"

Task 4: Configure Alembic

Files:

  • Create: backend/alembic.ini

  • Create: backend/alembic/env.py

  • Create: backend/alembic/script.py.mako

  • Create: backend/alembic/versions/0001_initial_schema.py

  • Step 1: Initialize Alembic

cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic init alembic

Expected: creates alembic/ and alembic.ini

  • Step 2: Update alembic.ini

Replace the sqlalchemy.url = ... line in alembic.ini:

# Change this line:
sqlalchemy.url = driver://user:pass@localhost/dbname
# To:
sqlalchemy.url = sqlite:///mirofish_dev.db

Then check that the following line is present just under [alembic] (alembic init normally writes a script_location entry already; add it if it is missing):

script_location = alembic
  • Step 3: Update alembic/env.py

Replace the entire contents of alembic/env.py:

# backend/alembic/env.py
import os
import sys
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
from alembic import context

# Add the backend directory to sys.path so the app imports resolve
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))

from app.db import Base
import app.models.db_models  # noqa: F401 - registers every model on Base

config = context.config

# Read DATABASE_URL from the environment (takes priority over alembic.ini).
# Alembic hands the value to ConfigParser, so literal % signs must be doubled.
db_url = os.environ.get('DATABASE_URL', config.get_main_option('sqlalchemy.url'))
config.set_main_option('sqlalchemy.url', db_url.replace('%', '%%'))

if config.config_file_name is not None:
    fileConfig(config.config_file_name)

target_metadata = Base.metadata


def run_migrations_offline():
    url = config.get_main_option("sqlalchemy.url")
    context.configure(url=url, target_metadata=target_metadata, literal_binds=True)
    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)
        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
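
One detail worth knowing about env.py: Alembic keeps its main options in a ConfigParser, which treats % as the start of an interpolation. A DATABASE_URL containing a URL-encoded character (for example %40 for @ in a password) can therefore be rejected when it is stored unescaped. The sketch below demonstrates the underlying behavior with the standard library only; the URL and the helper names are made up for illustration.

```python
import configparser


def store_url(url, escape_percent):
    """Store a URL in a ConfigParser section and read it back."""
    parser = configparser.ConfigParser()
    parser.add_section("alembic")
    # Doubling % tells ConfigParser the sign is literal, not an interpolation.
    value = url.replace("%", "%%") if escape_percent else url
    parser.set("alembic", "sqlalchemy.url", value)
    return parser.get("alembic", "sqlalchemy.url")


def rejects_unescaped(url):
    """Return True if ConfigParser refuses the URL when it is stored as-is."""
    try:
        store_url(url, escape_percent=False)
    except ValueError:
        return True
    return False


# Hypothetical URL whose password contains an encoded "@".
ENCODED_URL = "postgresql://user:p%40ss@localhost/mirofish"
round_tripped = store_url(ENCODED_URL, escape_percent=True)
```

Escaped values read back as the original URL, so doubling the percent signs before set_main_option is safe for URLs that contain none.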
  • Step 4: Generate the initial migration
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic revision --autogenerate -m "initial_schema"

Expected: creates alembic/versions/XXXX_initial_schema.py with all the tables

  • Step 5: Apply the migration
.venv/bin/alembic upgrade head

Expected: Running upgrade -> XXXX, initial_schema

  • Step 6: Verify that the database has the tables
.venv/bin/python -c "
import sqlite3
conn = sqlite3.connect('mirofish_dev.db')
tables = conn.execute(\"SELECT name FROM sqlite_master WHERE type='table'\").fetchall()
print([t[0] for t in tables])
conn.close()
"

Expected: a list that includes projects, tasks, users, ontologies, graphs, simulations, reports, system_config
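
Reading the printed list by eye is error-prone with eleven tables. A small sketch of an automated check follows; the expected names are the __tablename__ values from db_models.py, while the helper missing_tables and the in-memory demo database are illustrative assumptions.

```python
import sqlite3

# __tablename__ values declared in backend/app/models/db_models.py
EXPECTED_TABLES = {
    "users", "projects", "project_files", "ontologies", "graphs",
    "simulations", "reports", "tasks", "system_config",
    "invitation_tokens", "password_reset_tokens",
}


def missing_tables(conn, expected=EXPECTED_TABLES):
    """Return the expected table names that are absent from the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    present = {row[0] for row in rows}
    return sorted(set(expected) - present)


# Demo against a throwaway in-memory database with a single table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE projects (id TEXT PRIMARY KEY)")
demo_missing = missing_tables(demo, {"projects", "tasks"})
demo.close()
```

Against the real file, sqlite3.connect('mirofish_dev.db') followed by missing_tables(conn) should return an empty list once the migration has been applied.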

  • Step 7: Commit
git add backend/alembic.ini backend/alembic/
git commit -m "feat(alembic): add initial schema migration for all SQLAlchemy models"

Task 5: Implement the StorageService

Files:

  • Create: backend/app/storage/__init__.py

  • Create: backend/app/storage/protocol.py

  • Create: backend/app/storage/local.py

  • Create: backend/app/storage/azure_blob.py

  • Create: backend/app/storage/factory.py

  • Create: backend/tests/test_storage.py

  • Step 1: Create the directory and the Protocol

# backend/app/storage/protocol.py
"""Abstract interface for the file storage layer."""
from typing import IO, Protocol, runtime_checkable


@runtime_checkable
class StorageService(Protocol):
    def upload(self, path: str, data: bytes | IO, content_type: str = "application/octet-stream") -> None:
        ...

    def download(self, path: str) -> bytes:
        ...

    def download_stream(self, path: str) -> IO:
        ...

    def delete(self, path: str) -> None:
        ...

    def delete_prefix(self, prefix: str) -> None:
        """Delete every file whose path starts with prefix."""
        ...

    def exists(self, path: str) -> bool:
        ...

    def list(self, prefix: str = "") -> list[str]:
        """Return relative paths under the prefix."""
        ...

    def public_url(self, path: str) -> str | None:
        """Public URL if the backend supports one, otherwise None."""
        ...
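
Because the Protocol is marked runtime_checkable, any class with matching method names satisfies isinstance checks without inheriting from it. The sketch below illustrates this with a trimmed copy of the protocol (four methods only) and a dict-backed InMemoryStorage, which is a hypothetical test double and not part of the plan. Note that runtime checks only verify that the methods exist, not their signatures.

```python
from __future__ import annotations

from typing import IO, Protocol, runtime_checkable


@runtime_checkable
class StorageService(Protocol):
    """Trimmed copy of the plan's protocol, kept short for illustration."""

    def upload(self, path: str, data: bytes | IO, content_type: str = "application/octet-stream") -> None: ...
    def download(self, path: str) -> bytes: ...
    def delete(self, path: str) -> None: ...
    def exists(self, path: str) -> bool: ...


class InMemoryStorage:
    """Hypothetical dict-backed storage; it does not subclass the Protocol."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, path, data, content_type="application/octet-stream") -> None:
        # Accept raw bytes or any file-like object with read().
        self._blobs[path] = data if isinstance(data, bytes) else data.read()

    def download(self, path: str) -> bytes:
        return self._blobs[path]

    def delete(self, path: str) -> None:
        self._blobs.pop(path, None)

    def exists(self, path: str) -> bool:
        return path in self._blobs


store = InMemoryStorage()
store.upload("projects/p1/notes.txt", b"hello")
```

A double like this can stand in for LocalFSStorage in manager tests that do not need a real filesystem.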
  • Step 2: Create LocalFSStorage
# backend/app/storage/local.py
"""Storage adapter for the local filesystem."""
import io
import os
import shutil
from pathlib import Path
from .protocol import StorageService


class LocalFSStorage:
    """StorageService implementation backed by the local filesystem."""

    def __init__(self, base_path: str) -> None:
        self._base = Path(base_path).resolve()
        self._base.mkdir(parents=True, exist_ok=True)

    def _safe_path(self, relative: str) -> Path:
        """Resolve the path and check that it stays inside base, to block path traversal."""
        resolved = (self._base / relative).resolve()
        # Compare path components, not string prefixes: "/data/up_evil" starts with "/data/up".
        if not resolved.is_relative_to(self._base):
            raise ValueError(f"Path traversal detected: {relative!r}")
        return resolved

    def upload(self, path: str, data: bytes | io.IOBase, content_type: str = "application/octet-stream") -> None:
        dest = self._safe_path(path)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(data, bytes):
            dest.write_bytes(data)
        else:
            with open(dest, "wb") as f:
                shutil.copyfileobj(data, f)

    def download(self, path: str) -> bytes:
        return self._safe_path(path).read_bytes()

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        p = self._safe_path(path)
        if p.exists():
            p.unlink()

    def delete_prefix(self, prefix: str) -> None:
        p = self._safe_path(prefix)
        if p.is_dir():
            shutil.rmtree(p)
        elif p.exists():
            p.unlink()

    def exists(self, path: str) -> bool:
        return self._safe_path(path).exists()

    def list(self, prefix: str = "") -> list[str]:
        base = self._safe_path(prefix) if prefix else self._base
        if not base.exists():
            return []
        result = []
        for p in base.rglob("*"):
            if p.is_file():
                result.append(str(p.relative_to(self._base)))
        return result

    def public_url(self, path: str) -> str | None:
        return None
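
The containment check in _safe_path is the security-critical part of this adapter. As a brief illustration of why the comparison should work on path components and not on raw strings, the sketch below contrasts the two approaches using pure paths (no filesystem access). The directory names are invented for the example.

```python
from pathlib import PurePosixPath


def inside_by_string_prefix(base, candidate):
    """Naive check: compares the string forms of the two paths."""
    return str(candidate).startswith(str(base))


def inside_by_components(base, candidate):
    """Component-aware check (PurePath.is_relative_to, Python 3.9+)."""
    return candidate.is_relative_to(base)


base = PurePosixPath("/srv/uploads")
child = PurePosixPath("/srv/uploads/projects/p1/a.txt")
# A sibling directory whose name merely begins with the base directory's name.
sibling = PurePosixPath("/srv/uploads_backup/secret.txt")

naive_accepts_sibling = inside_by_string_prefix(base, sibling)
strict_accepts_sibling = inside_by_components(base, sibling)
```

Both checks accept the genuine child path, but only the component-aware one rejects the sibling directory, which a relative path such as "../uploads_backup/secret.txt" would resolve to.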
  • Step 3: Create AzureBlobStorage
# backend/app/storage/azure_blob.py
"""Storage adapter for Azure Blob Storage."""
import io
from .protocol import StorageService


class AzureBlobStorage:
    """StorageService implementation backed by Azure Blob Storage."""

    def __init__(self, connection_string: str, container_name: str) -> None:
        from azure.storage.blob import BlobServiceClient
        self._client = BlobServiceClient.from_connection_string(connection_string)
        self._container = container_name
        self._ensure_container()

    def _ensure_container(self) -> None:
        container_client = self._client.get_container_client(self._container)
        if not container_client.exists():
            container_client.create_container()

    def _blob_client(self, path: str):
        return self._client.get_blob_client(container=self._container, blob=path)

    def upload(self, path: str, data: bytes | io.IOBase, content_type: str = "application/octet-stream") -> None:
        # upload_blob accepts bytes or a stream, and expects a ContentSettings
        # object (not a plain dict) for the content type.
        from azure.storage.blob import ContentSettings
        self._blob_client(path).upload_blob(
            data,
            overwrite=True,
            content_settings=ContentSettings(content_type=content_type),
        )

    def download(self, path: str) -> bytes:
        return self._blob_client(path).download_blob().readall()

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        self._blob_client(path).delete_blob(delete_snapshots="include")

    def delete_prefix(self, prefix: str) -> None:
        container = self._client.get_container_client(self._container)
        blobs = container.list_blobs(name_starts_with=prefix)
        for blob in blobs:
            container.delete_blob(blob.name, delete_snapshots="include")

    def exists(self, path: str) -> bool:
        return self._blob_client(path).exists()

    def list(self, prefix: str = "") -> list[str]:
        container = self._client.get_container_client(self._container)
        return [b.name for b in container.list_blobs(name_starts_with=prefix)]

    def public_url(self, path: str) -> str | None:
        return self._blob_client(path).url
  • Step 4: Crear factory
# backend/app/storage/factory.py
"""Selecciona la implementació de StorageService per STORAGE_TYPE."""
import os
from .protocol import StorageService


def create_storage_service() -> StorageService:
    storage_type = os.environ.get("STORAGE_TYPE", "local")
    match storage_type:
        case "azure":
            from .azure_blob import AzureBlobStorage
            conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING", "")
            container = os.environ.get("AZURE_STORAGE_CONTAINER", "mirofish")
            if not conn_str:
                raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING no configurada per STORAGE_TYPE=azure")
            return AzureBlobStorage(conn_str, container)
        case _:
            from .local import LocalFSStorage
            base = os.environ.get("STORAGE_LOCAL_PATH",
                                   os.path.join(os.path.dirname(__file__), "../../../uploads"))
            return LocalFSStorage(base)
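
As a rough illustration of the selection logic above, the following self-contained sketch swaps the real classes for hypothetical stand-ins (`FakeLocal`, `FakeAzure`) and takes the environment as a parameter, so it can be exercised without touching `os.environ`. It is not the plan's factory, only a way to see which branch each setting takes. The `"uploads"` default is a simplification of the path computed above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class FakeLocal:
    """Hypothetical stand-in for LocalFSStorage."""
    base: str


@dataclass
class FakeAzure:
    """Hypothetical stand-in for AzureBlobStorage."""
    conn_str: str
    container: str


def pick_storage(env: Mapping[str, str]):
    """Mirror the factory's branching on STORAGE_TYPE, using the stand-in classes."""
    if env.get("STORAGE_TYPE", "local") == "azure":
        conn_str = env.get("AZURE_STORAGE_CONNECTION_STRING", "")
        if not conn_str:
            # Fail at startup rather than at the first upload.
            raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING is not set")
        return FakeAzure(conn_str, env.get("AZURE_STORAGE_CONTAINER", "mirofish"))
    # Any other value, including an unset variable, falls back to local storage.
    return FakeLocal(env.get("STORAGE_LOCAL_PATH", "uploads"))
```

As in the factory, an unrecognised value such as `s3` silently falls back to local storage; a stricter variant could raise instead.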
  • Step 5: Create the package's __init__.py
# backend/app/storage/__init__.py
from .protocol import StorageService
from .factory import create_storage_service

__all__ = ["StorageService", "create_storage_service"]
  • Step 6: Write the LocalFSStorage tests
# backend/tests/test_storage.py
import io
import pytest
from backend.app.storage.local import LocalFSStorage


@pytest.fixture
def storage(tmp_path):
    return LocalFSStorage(str(tmp_path))


def test_upload_and_download_bytes(storage):
    storage.upload("foo/bar.txt", b"hello world", "text/plain")
    assert storage.download("foo/bar.txt") == b"hello world"


def test_upload_and_download_stream(storage):
    data = io.BytesIO(b"stream data")
    storage.upload("test/stream.bin", data)
    result = storage.download("test/stream.bin")
    assert result == b"stream data"


def test_exists(storage):
    assert not storage.exists("not/there.txt")
    storage.upload("yes.txt", b"x")
    assert storage.exists("yes.txt")


def test_delete(storage):
    storage.upload("del.txt", b"bye")
    storage.delete("del.txt")
    assert not storage.exists("del.txt")


def test_delete_prefix(storage):
    storage.upload("dir/a.txt", b"a")
    storage.upload("dir/b.txt", b"b")
    storage.delete_prefix("dir")
    assert not storage.exists("dir/a.txt")
    assert not storage.exists("dir/b.txt")


def test_list(storage):
    storage.upload("root/x.txt", b"x")
    storage.upload("root/y.txt", b"y")
    paths = storage.list("root")
    assert len(paths) == 2
    assert all("root" in p for p in paths)


def test_path_traversal_blocked(storage):
    with pytest.raises(ValueError, match="Path traversal"):
        storage._safe_path("../../etc/passwd")


def test_public_url_is_none(storage):
    storage.upload("f.txt", b"x")
    assert storage.public_url("f.txt") is None
  • Step 7: Run the storage tests
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_storage.py -v

Expected: 8 passed
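
The contract these tests exercise can also be met by an in-memory double, which may be handy for unit tests that should not touch the filesystem. The class below is a hypothetical sketch, not part of the plan: the method names come from the code above, while the default `content_type` is an assumption.

```python
import io
from typing import BinaryIO, Optional, Union


class InMemoryStorage:
    """Hypothetical test double that keeps blobs in a dict keyed by path."""

    def __init__(self) -> None:
        self._blobs = {}

    def upload(self, path: str, data: Union[bytes, BinaryIO],
               content_type: str = "application/octet-stream") -> None:
        # Accept raw bytes or any file-like object, as the tests above do.
        self._blobs[path] = data if isinstance(data, bytes) else data.read()

    def download(self, path: str) -> bytes:
        return self._blobs[path]

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        self._blobs.pop(path, None)

    def delete_prefix(self, prefix: str) -> None:
        # Collect the keys first so the dict is not mutated while iterating.
        for key in [k for k in self._blobs if k.startswith(prefix)]:
            del self._blobs[key]

    def exists(self, path: str) -> bool:
        return path in self._blobs

    def list(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self._blobs if k.startswith(prefix))

    def public_url(self, path: str) -> Optional[str]:
        # Like the local backend in the tests above, nothing is publicly reachable.
        return None
```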

  • Step 8: Commit
git add backend/app/storage/ backend/tests/test_storage.py
git commit -m "feat(storage): add StorageService protocol, LocalFSStorage, AzureBlobStorage, factory"

Task 6: Inject the DB and Storage into Flask

Files:

  • Modify: backend/app/__init__.py

  • Step 1: Update create_app to initialize the DB and Storage

Add the following right after app = Flask(__name__) and app.config.from_object(...):

    # Initialize the DB
    from .db import init_db
    init_db(app.config['DATABASE_URL'])

    # Initialize Storage
    from .storage import create_storage_service
    app.extensions['storage'] = create_storage_service()

Then add a helper function at the end of the file (outside create_app):

def get_storage():
    """Access the StorageService from any Flask application context."""
    from flask import current_app
    return current_app.extensions['storage']
  • Step 2: Verify that the app starts correctly
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
DATABASE_URL=sqlite:///test_startup.db STORAGE_TYPE=local \
  backend/.venv/bin/python -c "
from backend.app import create_app
app = create_app()
print('App created OK')
print('Storage:', app.extensions.get('storage'))
"

Expected: App created OK + Storage: <LocalFSStorage ...>

  • Step 3: Clean up the test file
rm -f /home/ubuntu/dev/MiroFish/.worktrees/persistencia/test_startup.db
  • Step 4: Commit
git add backend/app/__init__.py
git commit -m "feat(app): inject SQLAlchemy DB and StorageService into Flask app factory"

Task 7: Refactor TaskManager → DB

Files:

  • Modify: backend/app/models/task.py
  • Create: backend/tests/test_task_manager_db.py

The current TaskManager is in-memory. We refactor it to use the DB via get_session(), keeping the same public interface (create_task, get_task, update_task, complete_task, fail_task, list_tasks) so that no caller breaks.
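
To make the goal concrete, here is a minimal stand-in built on the standard-library `sqlite3` module rather than SQLAlchemy. `SqliteTaskStore` and its schema are hypothetical and are not the plan's `TaskManager`; the point is only that state written through one instance is visible to a fresh one, which is the property `test_task_survives_new_manager_instance` checks.

```python
import json
import sqlite3
import uuid
from typing import Optional


class SqliteTaskStore:
    """Hypothetical stand-in: task state lives in a SQLite file, not in the object."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id TEXT PRIMARY KEY, task_type TEXT, status TEXT, "
            "progress INTEGER, result TEXT)"
        )

    def _execute(self, sql: str, params: tuple = ()) -> list:
        conn = sqlite3.connect(self._db_path)
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def create_task(self, task_type: str) -> str:
        task_id = str(uuid.uuid4())
        self._execute(
            "INSERT INTO tasks VALUES (?, ?, 'pending', 0, NULL)",
            (task_id, task_type),
        )
        return task_id

    def complete_task(self, task_id: str, result: dict) -> None:
        self._execute(
            "UPDATE tasks SET status = 'completed', progress = 100, result = ? "
            "WHERE id = ?",
            (json.dumps(result), task_id),
        )

    def get_task(self, task_id: str) -> Optional[dict]:
        rows = self._execute(
            "SELECT id, task_type, status, progress, result FROM tasks WHERE id = ?",
            (task_id,),
        )
        if not rows:
            return None
        tid, task_type, status, progress, result = rows[0]
        return {
            "task_id": tid,
            "task_type": task_type,
            "status": status,
            "progress": progress,
            "result": json.loads(result) if result else None,
        }
```

Creating a second `SqliteTaskStore` on the same file and calling `get_task` with an id from the first returns the stored row, whereas an in-memory dict would come back empty.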

  • Step 1: Write the tests for the new TaskManager
# backend/tests/test_task_manager_db.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base
import backend.app.db as db_module
from backend.app.models.db_models import TaskModel


@pytest.fixture(autouse=True)
def isolated_db():
    """In-memory SQLite DB for each test."""
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    yield
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None


def test_create_and_get_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("graph_build", {"project_id": "proj-1"})
    task = tm.get_task(task_id)
    assert task is not None
    assert task["task_type"] == "graph_build"
    assert task["status"] == "pending"
    assert task["progress"] == 0


def test_update_task_progress():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("ontology_generate")
    tm.update_task(task_id, progress=50, message="Halfway")
    task = tm.get_task(task_id)
    assert task["progress"] == 50
    assert task["message"] == "Halfway"


def test_complete_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("graph_build")
    tm.complete_task(task_id, {"graph_id": "g-1"})
    task = tm.get_task(task_id)
    assert task["status"] == "completed"
    assert task["progress"] == 100
    assert task["result"]["graph_id"] == "g-1"


def test_fail_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("simulation_prepare")
    tm.fail_task(task_id, "LLM timeout")
    task = tm.get_task(task_id)
    assert task["status"] == "failed"
    assert task["error"] == "LLM timeout"


def test_task_survives_new_manager_instance():
    """The task must live in the DB, not in memory."""
    from backend.app.models.task import TaskManager
    tm1 = TaskManager()
    task_id = tm1.create_task("graph_build")
    # Create a new instance (simulates a restart)
    TaskManager._instance = None
    tm2 = TaskManager()
    task = tm2.get_task(task_id)
    assert task is not None
    assert task["task_id"] == task_id


def test_list_tasks():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    tm.create_task("graph_build")
    tm.create_task("graph_build")
    tm.create_task("ontology_generate")
    all_tasks = tm.list_tasks()
    assert len(all_tasks) == 3
    graph_tasks = tm.list_tasks(task_type="graph_build")
    assert len(graph_tasks) == 2
  • Step 2: Run the tests to verify that they fail
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v

Expected: test_task_survives_new_manager_instance FAIL (because TaskManager is still in-memory at this point)

  • Step 3: Refactor TaskManager

Replace the contents of backend/app/models/task.py with:

"""Task state management — persistent via SQLAlchemy."""
import uuid
import threading
from datetime import datetime
from enum import Enum
from typing import Dict, Any, Optional, List

from ..db import get_session
from ..models.db_models import TaskModel
from ..utils.locale import t


class TaskStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


class TaskManager:
    """Task manager — thread-safe, persistent via SQLAlchemy."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def create_task(self, task_type: str, metadata: Optional[Dict] = None) -> str:
        task_id = str(uuid.uuid4())
        with get_session() as db:
            task = TaskModel(
                id=task_id,
                task_type=task_type,
                status="pending",
                progress=0,
                progress_detail=metadata or {},
            )
            db.add(task)
            db.commit()
        return task_id

    def get_task(self, task_id: str) -> Optional[Dict[str, Any]]:
        with get_session() as db:
            task = db.get(TaskModel, task_id)
            if task is None:
                return None
            return self._to_dict(task)

    def update_task(
        self,
        task_id: str,
        status: Optional[str] = None,
        progress: Optional[int] = None,
        message: Optional[str] = None,
        result: Optional[Dict] = None,
        error: Optional[str] = None,
        progress_detail: Optional[Dict] = None,
    ) -> None:
        with get_session() as db:
            task = db.get(TaskModel, task_id)
            if task is None:
                return
            if status is not None:
                task.status = status
            if progress is not None:
                task.progress = progress
            if message is not None:
                task.message = message
            if result is not None:
                task.result = result
            if error is not None:
                task.error = error
            if progress_detail is not None:
                task.progress_detail = progress_detail
            task.updated_at = datetime.utcnow()
            db.commit()

    def complete_task(self, task_id: str, result: Dict) -> None:
        self.update_task(
            task_id,
            status=TaskStatus.COMPLETED,
            progress=100,
            message=t("progress.taskComplete"),
            result=result,
        )

    def fail_task(self, task_id: str, error: str) -> None:
        self.update_task(
            task_id,
            status=TaskStatus.FAILED,
            message=t("progress.taskFailed"),
            error=error,
        )

    def list_tasks(self, task_type: Optional[str] = None) -> List[Dict[str, Any]]:
        from sqlalchemy import select, desc
        with get_session() as db:
            stmt = select(TaskModel).order_by(desc(TaskModel.created_at))
            if task_type:
                stmt = stmt.where(TaskModel.task_type == task_type)
            tasks = db.execute(stmt).scalars().all()
            # Loop variable named `task` so it does not shadow the imported `t` helper.
            return [self._to_dict(task) for task in tasks]

    def cleanup_old_tasks(self, max_age_hours: int = 24) -> None:
        from datetime import timedelta
        from sqlalchemy import delete
        cutoff = datetime.utcnow() - timedelta(hours=max_age_hours)
        with get_session() as db:
            db.execute(
                delete(TaskModel).where(
                    TaskModel.created_at < cutoff,
                    TaskModel.status.in_(["completed", "failed"]),
                )
            )
            db.commit()

    @staticmethod
    def _to_dict(task: TaskModel) -> Dict[str, Any]:
        return {
            "task_id": task.id,
            "task_type": task.task_type,
            "status": task.status,
            "created_at": task.created_at.isoformat(),
            "updated_at": task.updated_at.isoformat(),
            "progress": task.progress,
            "message": task.message or "",
            "progress_detail": task.progress_detail or {},
            "result": task.result,
            "error": task.error,
            "metadata": task.progress_detail or {},
        }

Note: get_session() has been a context manager since Task 3. Use with get_session() as db: as shown in the code above.

  • Step 4: Run the TaskManager tests
backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v

Expected: 6 passed

  • Step 5: Commit
git add backend/app/models/task.py backend/app/db.py backend/tests/test_task_manager_db.py
git commit -m "feat(task): refactor TaskManager to persist tasks in SQLAlchemy DB"

Task 8: Refactor ProjectManager → DB + Storage

Files:

  • Modify: backend/app/models/project.py
  • Create: backend/tests/test_project_manager_db.py

We refactor ProjectManager to use the DB for metadata and the StorageService for files, keeping the same public interface.

  • Step 1: Write the tests for the new ProjectManager
# backend/tests/test_project_manager_db.py
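import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base
import backend.app.db as db_module
# Imported for its side effect: registers the tables on Base.metadata before create_all runs.
from backend.app.models.db_models import ProjectModel  # noqa: F401
from backend.app.storage.local import LocalFSStorage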


@pytest.fixture(autouse=True)
def isolated_db(tmp_path):
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    yield
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None


@pytest.fixture
def storage(tmp_path):
    return LocalFSStorage(str(tmp_path))


def test_create_project(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Test Project", storage=storage)
    assert proj["name"] == "Test Project"
    assert proj["status"] == "created"
    assert "id" in proj


def test_get_project(storage):
    from backend.app.models.project import ProjectManager
    created = ProjectManager.create_project("My Project", storage=storage)
    fetched = ProjectManager.get_project(created["id"])
    assert fetched is not None
    assert fetched["name"] == "My Project"


def test_project_not_found(storage):
    from backend.app.models.project import ProjectManager
    result = ProjectManager.get_project("nonexistent-id")
    assert result is None


def test_save_and_get_extracted_text(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Text Project", storage=storage)
    ProjectManager.save_extracted_text(proj["id"], "hello extracted", storage=storage)
    text = ProjectManager.get_extracted_text(proj["id"], storage=storage)
    assert text == "hello extracted"


def test_project_survives_manager_reset(storage):
    """The data must live in the DB, not in memory."""
    from backend.app.models.project import ProjectManager
    created = ProjectManager.create_project("Persist Me", storage=storage)
    # Simulate a restart: clear any in-memory state, if there is any
    fetched = ProjectManager.get_project(created["id"])
    assert fetched is not None


def test_list_projects(storage):
    from backend.app.models.project import ProjectManager
    ProjectManager.create_project("P1", storage=storage)
    ProjectManager.create_project("P2", storage=storage)
    projects = ProjectManager.list_projects()
    assert len(projects) == 2


def test_delete_project(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Del Me", storage=storage)
    result = ProjectManager.delete_project(proj["id"], storage=storage)
    assert result is True
    assert ProjectManager.get_project(proj["id"]) is None
  • Step 2: Run the tests to verify that they fail
backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v

Expected: errors (the current interface does not accept the storage= parameter)

  • Step 3: Refactor ProjectManager

Replace the contents of backend/app/models/project.py with:

"""Project context management — persistent via SQLAlchemy + StorageService."""
import uuid
from datetime import datetime
from typing import Dict, Any, List, Optional
from enum import Enum

from ..db import get_session
from ..models.db_models import ProjectModel, ProjectFileModel


class ProjectStatus(str, Enum):
    CREATED = "created"
    ONTOLOGY_GENERATED = "ontology_generated"
    GRAPH_BUILDING = "graph_building"
    GRAPH_COMPLETED = "graph_completed"
    FAILED = "failed"


class ProjectManager:
    """Manages projects: metadata in the DB, files in the StorageService."""

    @classmethod
    def create_project(cls, name: str = "Unnamed Project", storage=None) -> Dict[str, Any]:
        project_id = str(uuid.uuid4())
        with get_session() as db:
            proj = ProjectModel(id=project_id, name=name, status="created")
            db.add(proj)
            db.commit()
            db.refresh(proj)
            return cls._to_dict(proj)

    @classmethod
    def get_project(cls, project_id: str) -> Optional[Dict[str, Any]]:
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return None
            return cls._to_dict(proj)

    @classmethod
    def save_project(cls, project_data: Dict[str, Any]) -> None:
        """Update the fields of an existing project."""
        project_id = project_data.get("id") or project_data.get("project_id")
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return
            updatable = [
                "name", "status", "analysis_summary", "simulation_requirement",
                "chunk_size", "chunk_overlap", "active_task_id",
            ]
            for field in updatable:
                if field in project_data:
                    setattr(proj, field, project_data[field])
            proj.updated_at = datetime.utcnow()
            db.commit()

    @classmethod
    def list_projects(cls, limit: int = 50) -> List[Dict[str, Any]]:
        from sqlalchemy import select, desc
        with get_session() as db:
            stmt = select(ProjectModel).order_by(desc(ProjectModel.created_at)).limit(limit)
            projects = db.execute(stmt).scalars().all()
            return [cls._to_dict(p) for p in projects]

    @classmethod
    def delete_project(cls, project_id: str, storage=None) -> bool:
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return False
            # Delete the files from storage if the service was provided
            if storage is not None:
                storage.delete_prefix(f"projects/{project_id}")
            db.delete(proj)
            db.commit()
        return True

    @classmethod
    def save_file_to_project(
        cls,
        project_id: str,
        file_storage,  # Flask FileStorage
        original_filename: str,
        storage,
    ) -> Dict[str, Any]:
        import os
        ext = os.path.splitext(original_filename)[1].lower()
        safe_filename = f"{uuid.uuid4().hex[:8]}{ext}"
        storage_path = f"projects/{project_id}/files/{safe_filename}"

        data = file_storage.read()
        storage.upload(storage_path, data)

        mime_type = getattr(file_storage, "content_type", "application/octet-stream") or "application/octet-stream"

        with get_session() as db:
            file_rec = ProjectFileModel(
                id=str(uuid.uuid4()),
                project_id=project_id,
                original_name=original_filename,
                storage_path=storage_path,
                size=len(data),
                mime_type=mime_type,
                file_type="upload",
            )
            db.add(file_rec)
            db.commit()

        return {
            "original_filename": original_filename,
            "saved_filename": safe_filename,
            "storage_path": storage_path,
            "size": len(data),
        }

    @classmethod
    def save_extracted_text(cls, project_id: str, text: str, storage) -> None:
        storage_path = f"projects/{project_id}/extracted_text.txt"
        payload = text.encode("utf-8")  # encode once; reused for the upload and the size
        storage.upload(storage_path, payload, "text/plain")

        with get_session() as db:
            from sqlalchemy import select
            stmt = select(ProjectFileModel).where(
                ProjectFileModel.project_id == project_id,
                ProjectFileModel.file_type == "extracted_text",
            )
            existing = db.execute(stmt).scalar_one_or_none()
            if existing:
                existing.storage_path = storage_path
                existing.size = len(payload)
            else:
                rec = ProjectFileModel(
                    id=str(uuid.uuid4()),
                    project_id=project_id,
                    original_name="extracted_text.txt",
                    storage_path=storage_path,
                    size=len(payload),
                    mime_type="text/plain",
                    file_type="extracted_text",
                )
                db.add(rec)
            db.commit()

    @classmethod
    def get_extracted_text(cls, project_id: str, storage) -> Optional[str]:
        storage_path = f"projects/{project_id}/extracted_text.txt"
        if not storage.exists(storage_path):
            return None
        return storage.download(storage_path).decode("utf-8")

    @staticmethod
    def _to_dict(proj: ProjectModel) -> Dict[str, Any]:
        return {
            "id": proj.id,
            "project_id": proj.id,  # compatibility with existing code
            "name": proj.name,
            "status": proj.status,
            "analysis_summary": proj.analysis_summary,
            "simulation_requirement": proj.simulation_requirement,
            "chunk_size": proj.chunk_size,
            "chunk_overlap": proj.chunk_overlap,
            "active_task_id": proj.active_task_id,
            "created_at": proj.created_at.isoformat(),
            "updated_at": proj.updated_at.isoformat(),
            # Fields read from the old model; now empty, kept for compatibility
            "files": [],
            "total_text_length": 0,
            "ontology": None,
            "graph_id": None,
            "graph_build_task_id": None,
            "error": None,
        }
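
The storage keys used above all hang off a single per-project prefix, which is why `delete_prefix` can remove a project's files in one call. The helpers below are a hypothetical restatement of that layout (the function names are not part of the plan), meant as a reference when adding new file types.

```python
import os
import uuid


def project_prefix(project_id: str) -> str:
    """Prefix shared by every object belonging to one project."""
    return f"projects/{project_id}"


def upload_key(project_id: str, original_filename: str) -> str:
    """Key for an uploaded file: random 8-hex-digit name, original extension lowercased."""
    ext = os.path.splitext(original_filename)[1].lower()
    return f"{project_prefix(project_id)}/files/{uuid.uuid4().hex[:8]}{ext}"


def extracted_text_key(project_id: str) -> str:
    """Key for the single extracted-text object of a project."""
    return f"{project_prefix(project_id)}/extracted_text.txt"
```

Because the generated name carries only 32 bits of randomness, collisions within one project are unlikely but not impossible; a longer slice of the UUID would reduce that risk.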
  • Step 4: Run the ProjectManager tests
backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v

Expected: 7 passed

  • Step 5: Commit
git add backend/app/models/project.py backend/tests/test_project_manager_db.py
git commit -m "feat(project): refactor ProjectManager to persist via SQLAlchemy + StorageService"

Task 9: Update existing tests and final verification

Files:

  • Modify: backend/tests/conftest.py

  • Modify: backend/tests/test_project_task_recovery.py (if affected)

  • Step 1: Update conftest.py to add global fixtures

# backend/tests/conftest.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base
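import backend.app.db as db_module
# Imported for its side effect: registers all tables on Base.metadata.
from backend.app.models import db_models  # noqa: F401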


@pytest.fixture(autouse=True)
def reset_graph_factory_singleton():
    """Reset the graph backend singleton before each test."""
    yield
    try:
        import backend.app.graph.factory as fmod
        fmod._backend_instance = None
    except ImportError:
        pass


@pytest.fixture(autouse=True)
def reset_task_manager_singleton():
    """Reset TaskManager singleton between tests."""
    from backend.app.models import task as task_module
    task_module.TaskManager._instance = None
    yield
    task_module.TaskManager._instance = None


@pytest.fixture
def in_memory_db():
    """In-memory SQLite DB for tests that need a database."""
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    yield db_module._engine
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None
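
The reason the fixture resets `_instance` is easier to see with a stripped-down version of the pattern `TaskManager` uses. `Registry` is a hypothetical class written only for this illustration; it copies the double-checked locking from `TaskManager.__new__`.

```python
import threading


class Registry:
    """Hypothetical singleton using the same double-checked locking as TaskManager."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:          # fast path, no lock taken
            with cls._lock:
                if cls._instance is None:  # re-check while holding the lock
                    cls._instance = super().__new__(cls)
        return cls._instance


first = Registry()
assert Registry() is first        # every call returns the same object

Registry._instance = None         # what the fixture does between tests
assert Registry() is not first    # a fresh object is created afterwards
```

Without the reset, any state attached to the instance by one test would leak into the next.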
  • Step 2: Run the whole test suite
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/ -v --tb=short 2>&1 | tail -30

Expected: all tests from Tasks 2-8 pass. The test test_config_graph_backend_default may keep failing (a pre-existing, unrelated failure).

  • Step 3: Verify that the app starts and the DB is created correctly
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
DATABASE_URL=sqlite:///verify_startup.db \
STORAGE_TYPE=local \
STORAGE_LOCAL_PATH=/tmp/mirofish_test_uploads \
LLM_API_KEY=test-key \
ZEP_API_KEY=test-key \
  .venv/bin/python -c "
from app import create_app
app = create_app()
with app.app_context():
    from app.models.project import ProjectManager
    storage = app.extensions['storage']
    proj = ProjectManager.create_project('Startup Test', storage=storage)
    print('Project created:', proj['id'])
    fetched = ProjectManager.get_project(proj['id'])
    print('Project fetched:', fetched['name'])
    print('Verification OK')
"
rm -f verify_startup.db

Expected: Verification OK

  • Step 4: Final commit for Phase 1
git add backend/tests/conftest.py
git commit -m "test(conftest): add in_memory_db and task manager singleton reset fixtures"

git tag fase1-infraestructura-base

End-to-end verification of Phase 1

# 1. All tests pass
backend/.venv/bin/pytest backend/tests/ -v

# 2. The DB is created by the migrations (run from backend/, where alembic.ini lives)
(cd backend && .venv/bin/alembic upgrade head)

# 3. The app starts correctly
DATABASE_URL=sqlite:///mirofish_dev.db STORAGE_TYPE=local LLM_API_KEY=x ZEP_API_KEY=x \
  backend/.venv/bin/python backend/run.py &
sleep 2
curl -s http://localhost:5001/health | python3 -m json.tool
kill %1

Final expected output: {"service": "MiroFish Backend", "status": "ok"}
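
The fixed `sleep 2` above can be flaky on a slow machine. One alternative, sketched here with only the standard library, polls the health endpoint until it answers or a deadline passes. `wait_for_health` is a hypothetical helper, not part of the plan, and its defaults are arbitrary.

```python
import json
import time
import urllib.request

# Bypass any configured HTTP proxy: the target is a local dev server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url: str, timeout: float = 10.0, interval: float = 0.2) -> dict:
    """Poll `url` until it answers HTTP 200 with a JSON body, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=2.0) as resp:
                if resp.status == 200:
                    return json.loads(resp.read().decode("utf-8"))
        except (OSError, ValueError) as exc:
            # OSError covers refused connections, timeouts and HTTP errors;
            # ValueError covers a body that is not valid JSON.
            last_error = exc
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s (last error: {last_error})")
```

For example, `wait_for_health("http://localhost:5001/health")` could stand in for the `sleep 2` and `curl` pair when the check is driven from Python.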


Note: Phases 2 (Auth+RBAC), 3 (pipeline), and 4 (production hardening) will each have their own plan, written when that phase begins.