Persistence Phase 1: Base Infrastructure
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Replace the JSON-plus-in-memory persistence with SQLAlchemy 2.x (SQLite in dev, PostgreSQL in prod) and an abstract StorageService (LocalFS in dev, Azure Blob in prod), so that projects and tasks survive server restarts.
Architecture: A database layer built on SQLAlchemy is added beneath the existing Managers. ProjectManager and TaskManager are refactored to read from and write to the database instead of JSON files and memory. The StorageService replaces all direct file operations. The app factory (create_app) injects the DB session and the storage service as Flask extensions.
Tech Stack: SQLAlchemy 2.x, Alembic, flask-sqlalchemy, azure-storage-blob, bcrypt (needed in later phases, added to pyproject.toml now)
File map
New files to create
| File | Responsibility |
|---|---|
| backend/app/db.py | SQLAlchemy engine, Base, get_session() session context manager, init_db() |
| backend/app/models/db_models.py | All SQLAlchemy models (Project, ProjectFile, Ontology, Graph, Simulation, Report, Task, SystemConfig, User, InvitationToken, PasswordResetToken) |
| backend/app/storage/__init__.py | Exports StorageService and create_storage_service() |
| backend/app/storage/protocol.py | StorageService Protocol (interface) |
| backend/app/storage/local.py | LocalFSStorage (pathlib) |
| backend/app/storage/azure_blob.py | AzureBlobStorage (azure-storage-blob) |
| backend/app/storage/factory.py | create_storage_service(), which selects the backend from STORAGE_TYPE |
| backend/alembic.ini | Alembic config |
| backend/alembic/env.py | Alembic environment (reads DATABASE_URL) |
| backend/alembic/versions/0001_initial_schema.py | Initial migration (all tables) |
| backend/tests/test_db_models.py | Tests for the SQLAlchemy models |
| backend/tests/test_storage.py | Tests for LocalFSStorage |
| backend/tests/test_project_manager_db.py | Tests for ProjectManager with the database |
| backend/tests/test_task_manager_db.py | Tests for TaskManager with the database |
Files to modify
| File | Change |
|---|---|
| backend/pyproject.toml | Add sqlalchemy, alembic, flask-sqlalchemy, azure-storage-blob, bcrypt, flask-jwt-extended |
| backend/app/config.py | Add DATABASE_URL, STORAGE_TYPE, STORAGE_LOCAL_PATH, AZURE_STORAGE_*, JWT_SECRET, JWT_REFRESH_SECRET |
| backend/app/__init__.py | Initialize DB + Storage in create_app(); replace the provisional auth with a flask-jwt-extended stub |
| backend/app/models/project.py | Refactor ProjectManager to use the database + StorageService |
| backend/app/models/task.py | Refactor TaskManager to use the database |
| backend/tests/conftest.py | Add fixtures for an in-memory database and temporary storage |
Task 1: Add dependencies
Files:
- Modify: backend/pyproject.toml
- Step 1: Add the dependencies to pyproject.toml
# backend/pyproject.toml, in the dependencies section, add:
"sqlalchemy>=2.0.0",
"alembic>=1.13.0",
"flask-sqlalchemy>=3.1.0",
"azure-storage-blob>=12.19.0",
"bcrypt>=4.1.0",
"flask-jwt-extended>=4.6.0",
- Step 2: Install the dependencies
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
uv sync
Expected: no errors. uv sync updates the .venv.
- Step 3: Verify the imports
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/python -c "import sqlalchemy; import alembic; import flask_sqlalchemy; print('OK')"
Expected: OK
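The one-line import check stops at the first missing package. A minimal sketch of a variant that reports every missing top-level module at once, using only the standard library; find_missing is a hypothetical helper name, not something the plan defines:

```python
from importlib.util import find_spec


def find_missing(modules: list[str]) -> list[str]:
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["sqlalchemy", "alembic", "flask_sqlalchemy", "bcrypt", "flask_jwt_extended"]
    print(find_missing(required) or "OK")
```

Only top-level names are passed here, because find_spec raises for a dotted name whose parent package is absent.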
- Step 4: Commit
git add backend/pyproject.toml
git commit -m "chore(deps): add SQLAlchemy, Alembic, Azure Blob, bcrypt, flask-jwt-extended"
Task 2: Add configuration variables
Files:
- Modify: backend/app/config.py
- Create: backend/tests/test_config.py
- Step 1: Write a test for the new configuration
Create a new file, backend/tests/test_config.py (backend/tests/test_db_models.py is created later, in Task 3, and holds the model tests only):
# backend/tests/test_config.py
def test_database_url_default():
    """DATABASE_URL must default to SQLite."""
    from backend.app.config import Config
    assert Config.DATABASE_URL.startswith("sqlite")


def test_storage_type_default():
    from backend.app.config import Config
    assert Config.STORAGE_TYPE == "local"


def test_storage_local_path_exists():
    from backend.app.config import Config
    assert Config.STORAGE_LOCAL_PATH is not None
- Step 2: Run the test to verify that it fails
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
.venv/bin/pytest backend/tests/test_config.py -v 2>/dev/null || \
backend/.venv/bin/pytest backend/tests/test_config.py -v
Expected: AttributeError: type object 'Config' has no attribute 'DATABASE_URL'
- Step 3: Add the configuration to config.py
Add at the end of the Config class (just before the validate method):
    # ── Persistence ──────────────────────────────────────────────
    # Database
    DATABASE_URL = os.environ.get('DATABASE_URL', 'sqlite:///mirofish_dev.db')
    # File storage
    STORAGE_TYPE = os.environ.get('STORAGE_TYPE', 'local')  # local | azure
    STORAGE_LOCAL_PATH = os.environ.get(
        'STORAGE_LOCAL_PATH',
        os.path.join(os.path.dirname(__file__), '../uploads')
    )
    AZURE_STORAGE_CONNECTION_STRING = os.environ.get('AZURE_STORAGE_CONNECTION_STRING', '')
    AZURE_STORAGE_CONTAINER = os.environ.get('AZURE_STORAGE_CONTAINER', 'mirofish')
    # JWT (for the Phase 2 authentication work; defined here because flask-jwt-extended needs them in create_app)
    JWT_SECRET_KEY = os.environ.get('JWT_SECRET', 'change-me-in-production')
    JWT_REFRESH_SECRET_KEY = os.environ.get('JWT_REFRESH_SECRET', 'change-me-refresh-in-production')
    JWT_ACCESS_TOKEN_EXPIRES_HOURS = int(os.environ.get('JWT_ACCESS_TOKEN_EXPIRES_HOURS', '8'))
    JWT_REFRESH_TOKEN_EXPIRES_DAYS = int(os.environ.get('JWT_REFRESH_TOKEN_EXPIRES_DAYS', '7'))
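Because these are class attributes, they are evaluated once, when config.py is imported; changing the environment afterwards has no effect on Config. The resolution rule itself (environment value if present, otherwise the default, with integers parsed from strings) can be sketched in isolation. load_persistence_settings is a hypothetical helper used only for illustration and covers just three of the settings:

```python
import os
from typing import Mapping


def load_persistence_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve a few settings with the same defaults as the Config attributes."""
    return {
        "DATABASE_URL": env.get("DATABASE_URL", "sqlite:///mirofish_dev.db"),
        "STORAGE_TYPE": env.get("STORAGE_TYPE", "local"),
        # Environment values are always strings, so numeric settings are parsed.
        "JWT_ACCESS_TOKEN_EXPIRES_HOURS": int(env.get("JWT_ACCESS_TOKEN_EXPIRES_HOURS", "8")),
    }


defaults = load_persistence_settings({})
overridden = load_persistence_settings(
    {"DATABASE_URL": "postgresql://db/mirofish", "JWT_ACCESS_TOKEN_EXPIRES_HOURS": "2"}
)
```

Passing the mapping explicitly keeps the function testable without touching the real process environment.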
- Step 4: Run the test to verify that it passes
backend/.venv/bin/pytest backend/tests/test_config.py -v
Expected: 3 passed
- Step 5: Commit
git add backend/app/config.py backend/tests/test_config.py
git commit -m "feat(config): add DATABASE_URL, STORAGE_TYPE, AZURE_STORAGE_*, JWT config vars"
Task 3: Create the SQLAlchemy models
Files:
- Create: backend/app/models/db_models.py
- Create: backend/app/db.py
- Create: backend/tests/test_db_models.py
- Step 1: Create backend/app/db.py
# backend/app/db.py
"""SQLAlchemy engine, session factory, and declarative Base."""
from contextlib import contextmanager
from typing import Generator

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker


class Base(DeclarativeBase):
    pass


_engine = None
_SessionLocal = None


def init_db(database_url: str) -> None:
    global _engine, _SessionLocal
    # SQLite connections are bound to their creating thread unless this check is disabled.
    connect_args = {"check_same_thread": False} if database_url.startswith("sqlite") else {}
    _engine = create_engine(database_url, connect_args=connect_args, echo=False)
    _SessionLocal = sessionmaker(bind=_engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(_engine)


@contextmanager
def get_session() -> Generator[Session, None, None]:
    """SQLAlchemy session context manager: rolls back on error, always closes. Callers commit explicitly."""
    if _SessionLocal is None:
        raise RuntimeError("Database not initialized. Call init_db() first.")
    db = _SessionLocal()
    try:
        yield db
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()
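Note that get_session() never commits: work is persisted only if the caller commits inside the with block, and an exception rolls it back. A minimal sketch of that contract using the standard-library sqlite3 module rather than SQLAlchemy; session_scope is a hypothetical stand-in, and it skips the close step so one in-memory database can be reused across both blocks:

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def session_scope(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Yield the connection; roll back on error. Committing is left to the caller."""
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY)")
conn.commit()

# A failing block: the insert is rolled back before the error propagates.
try:
    with session_scope(conn) as s:
        s.execute("INSERT INTO tasks VALUES ('t1')")
        raise RuntimeError("boom")
except RuntimeError:
    pass

# A successful block with an explicit commit keeps its row.
with session_scope(conn) as s:
    s.execute("INSERT INTO tasks VALUES ('t2')")
    s.commit()

rows = [r[0] for r in conn.execute("SELECT id FROM tasks ORDER BY id")]
```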
- Step 2: Create backend/app/models/db_models.py
# backend/app/models/db_models.py
"""SQLAlchemy models for all MiroFish persistence."""
import uuid
from datetime import datetime
from typing import Optional

from sqlalchemy import (
    String, Integer, Text, Boolean, DateTime, JSON,
    ForeignKey, UniqueConstraint
)
from sqlalchemy.orm import Mapped, mapped_column, relationship

from ..db import Base


def _uuid() -> str:
    return str(uuid.uuid4())


def _now() -> datetime:
    return datetime.utcnow()


class UserModel(Base):
    __tablename__ = "users"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    name: Mapped[str] = mapped_column(String(255), nullable=False, default="")
    password_hash: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    role: Mapped[str] = mapped_column(String(20), nullable=False, default="user")
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    projects: Mapped[list["ProjectModel"]] = relationship(
        back_populates="owner", cascade="all, delete-orphan"
    )
    invitation_tokens: Mapped[list["InvitationTokenModel"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )
    password_reset_tokens: Mapped[list["PasswordResetTokenModel"]] = relationship(
        back_populates="user", cascade="all, delete-orphan"
    )

class ProjectModel(Base):
    __tablename__ = "projects"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    user_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=True
    )
    name: Mapped[str] = mapped_column(String(255), nullable=False, default="Unnamed Project")
    status: Mapped[str] = mapped_column(String(50), nullable=False, default="created")
    analysis_summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    simulation_requirement: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    chunk_size: Mapped[int] = mapped_column(Integer, default=500)
    chunk_overlap: Mapped[int] = mapped_column(Integer, default=50)
    active_task_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("tasks.id", ondelete="SET NULL"), nullable=True
    )
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    owner: Mapped[Optional["UserModel"]] = relationship(back_populates="projects")
    files: Mapped[list["ProjectFileModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    ontologies: Mapped[list["OntologyModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    graphs: Mapped[list["GraphModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    simulations: Mapped[list["SimulationModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
    reports: Mapped[list["ReportModel"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )


class ProjectFileModel(Base):
    __tablename__ = "project_files"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    original_name: Mapped[str] = mapped_column(String(255), nullable=False)
    storage_path: Mapped[str] = mapped_column(Text, nullable=False)
    size: Mapped[int] = mapped_column(Integer, default=0)
    mime_type: Mapped[str] = mapped_column(String(100), default="application/octet-stream")
    file_type: Mapped[str] = mapped_column(String(30), default="upload")  # upload | extracted_text
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    project: Mapped["ProjectModel"] = relationship(back_populates="files")


class OntologyModel(Base):
    __tablename__ = "ontologies"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    version: Mapped[int] = mapped_column(Integer, default=1)
    entity_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    edge_types: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    project: Mapped["ProjectModel"] = relationship(back_populates="ontologies")
    graphs: Mapped[list["GraphModel"]] = relationship(back_populates="ontology")

class GraphModel(Base):
    __tablename__ = "graphs"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    ontology_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("ontologies.id", ondelete="SET NULL"), nullable=True
    )
    backend: Mapped[str] = mapped_column(String(20), default="zep")  # zep | graphiti
    external_id: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    status: Mapped[str] = mapped_column(String(20), default="building")  # building | ready | failed
    node_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    edge_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    project: Mapped["ProjectModel"] = relationship(back_populates="graphs")
    ontology: Mapped[Optional["OntologyModel"]] = relationship(back_populates="graphs")
    simulations: Mapped[list["SimulationModel"]] = relationship(back_populates="graph")
    reports: Mapped[list["ReportModel"]] = relationship(back_populates="graph")


class SimulationModel(Base):
    __tablename__ = "simulations"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    graph_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True
    )
    status: Mapped[str] = mapped_column(String(30), default="prepared")
    platform: Mapped[str] = mapped_column(String(20), default="twitter")  # twitter | reddit | both
    config: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    profiles_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    db_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    actions_path: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    rounds_total: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    rounds_completed: Mapped[int] = mapped_column(Integer, default=0)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    project: Mapped["ProjectModel"] = relationship(back_populates="simulations")
    graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="simulations")
    reports: Mapped[list["ReportModel"]] = relationship(back_populates="simulation")


class ReportModel(Base):
    __tablename__ = "reports"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    project_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    simulation_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("simulations.id", ondelete="SET NULL"), nullable=True
    )
    graph_id: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("graphs.id", ondelete="SET NULL"), nullable=True
    )
    status: Mapped[str] = mapped_column(String(30), default="generating")
    outline: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    storage_prefix: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    project: Mapped["ProjectModel"] = relationship(back_populates="reports")
    simulation: Mapped[Optional["SimulationModel"]] = relationship(back_populates="reports")
    graph: Mapped[Optional["GraphModel"]] = relationship(back_populates="reports")

class TaskModel(Base):
    __tablename__ = "tasks"
    id: Mapped[str] = mapped_column(String(36), primary_key=True, default=_uuid)
    task_type: Mapped[str] = mapped_column(String(100), nullable=False)
    entity_type: Mapped[Optional[str]] = mapped_column(String(50), nullable=True)
    entity_id: Mapped[Optional[str]] = mapped_column(String(36), nullable=True)
    status: Mapped[str] = mapped_column(String(20), default="pending")
    progress: Mapped[int] = mapped_column(Integer, default=0)
    message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    result: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    error: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    progress_detail: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=_now)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)


class SystemConfigModel(Base):
    __tablename__ = "system_config"
    key: Mapped[str] = mapped_column(String(100), primary_key=True)
    value: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    value_type: Mapped[str] = mapped_column(String(20), default="string")
    group: Mapped[str] = mapped_column(String(50), default="general")
    label: Mapped[str] = mapped_column(String(255), default="")
    description: Mapped[str] = mapped_column(Text, default="")
    is_secret: Mapped[bool] = mapped_column(Boolean, default=False)
    updated_at: Mapped[datetime] = mapped_column(DateTime, default=_now, onupdate=_now)
    updated_by: Mapped[Optional[str]] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="SET NULL"), nullable=True
    )


class InvitationTokenModel(Base):
    __tablename__ = "invitation_tokens"
    token: Mapped[str] = mapped_column(String(36), primary_key=True)
    user_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
    used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)
    user: Mapped["UserModel"] = relationship(back_populates="invitation_tokens")


class PasswordResetTokenModel(Base):
    __tablename__ = "password_reset_tokens"
    token: Mapped[str] = mapped_column(String(36), primary_key=True)
    user_id: Mapped[str] = mapped_column(
        String(36), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    expires_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)
    used_at: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)
    user: Mapped["UserModel"] = relationship(back_populates="password_reset_tokens")
- Step 3: Create the model tests
# backend/tests/test_db_models.py
import pytest
from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker

from backend.app.db import Base
from backend.app.models.db_models import (
    ProjectModel, TaskModel, OntologyModel, GraphModel
)


@pytest.fixture
def db_session():
    """In-memory SQLite session for tests."""
    from backend.app import db as db_module
    engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})

    # SQLite honours ON DELETE actions only when foreign keys are enabled per connection.
    @event.listens_for(engine, "connect")
    def _enable_foreign_keys(dbapi_connection, connection_record):
        cursor = dbapi_connection.cursor()
        cursor.execute("PRAGMA foreign_keys=ON")
        cursor.close()

    db_module._engine = engine
    db_module._SessionLocal = sessionmaker(bind=engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(engine)
    session = db_module._SessionLocal()
    yield session
    session.close()
    Base.metadata.drop_all(engine)
    db_module._engine = None
    db_module._SessionLocal = None


def test_create_project(db_session):
    proj = ProjectModel(id="proj-1", name="Test Project")
    db_session.add(proj)
    db_session.commit()
    result = db_session.get(ProjectModel, "proj-1")
    assert result.name == "Test Project"
    assert result.status == "created"
    assert result.chunk_size == 500


def test_create_task(db_session):
    task = TaskModel(id="task-1", task_type="graph_build", entity_type="project", entity_id="proj-1")
    db_session.add(task)
    db_session.commit()
    result = db_session.get(TaskModel, "task-1")
    assert result.status == "pending"
    assert result.progress == 0


def test_project_cascade_delete(db_session):
    proj = ProjectModel(id="proj-del", name="Del Project")
    db_session.add(proj)
    db_session.flush()
    ont = OntologyModel(id="ont-1", project_id="proj-del", version=1)
    db_session.add(ont)
    db_session.commit()
    db_session.delete(proj)
    db_session.commit()
    assert db_session.get(OntologyModel, "ont-1") is None


def test_task_set_null_on_delete(db_session):
    task = TaskModel(id="task-del", task_type="graph_build")
    db_session.add(task)
    # Flush the task first: no ORM relationship links it to the project,
    # so the insert order is otherwise not guaranteed to satisfy the foreign key.
    db_session.flush()
    proj = ProjectModel(id="proj-2", name="P2", active_task_id="task-del")
    db_session.add(proj)
    db_session.commit()
    db_session.delete(task)
    db_session.commit()
    db_session.expire(proj)
    refreshed = db_session.get(ProjectModel, "proj-2")
    assert refreshed.active_task_id is None


def test_graph_linked_to_ontology(db_session):
    proj = ProjectModel(id="proj-g", name="Graph Project")
    ont = OntologyModel(id="ont-g", project_id="proj-g", version=1)
    graph = GraphModel(id="graph-1", project_id="proj-g", ontology_id="ont-g", backend="zep")
    db_session.add_all([proj, ont, graph])
    db_session.commit()
    result = db_session.get(GraphModel, "graph-1")
    assert result.ontology_id == "ont-g"
    assert result.backend == "zep"
- Step 4: Run the model tests
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_db_models.py -v
Expected: 5 passed
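test_task_set_null_on_delete depends on the database applying ON DELETE SET NULL, since TaskModel and ProjectModel share no ORM relationship. SQLite parses such clauses but ignores them unless PRAGMA foreign_keys is switched on for the connection. A minimal sketch with the standard-library sqlite3 module that shows both outcomes; the two-table schema is a trimmed-down assumption, not the full model:

```python
import sqlite3
from typing import Optional


def active_task_after_delete(enforce_fk: bool) -> Optional[str]:
    """Delete a referenced task and return the project's active_task_id afterwards."""
    conn = sqlite3.connect(":memory:")
    if enforce_fk:
        # Must be issued per connection, outside any open transaction.
        conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE projects ("
        " id TEXT PRIMARY KEY,"
        " active_task_id TEXT REFERENCES tasks(id) ON DELETE SET NULL)"
    )
    conn.execute("INSERT INTO tasks VALUES ('task-del')")
    conn.execute("INSERT INTO projects VALUES ('proj-2', 'task-del')")
    conn.execute("DELETE FROM tasks WHERE id = 'task-del'")
    value = conn.execute("SELECT active_task_id FROM projects").fetchone()[0]
    conn.close()
    return value
```

With enforcement off the project keeps a dangling reference to the deleted task; with it on the column is set to NULL. PostgreSQL always enforces foreign keys, so the difference only shows up on the SQLite side.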
- Step 5: Commit
git add backend/app/db.py backend/app/models/db_models.py backend/tests/test_db_models.py
git commit -m "feat(db): add SQLAlchemy Base, session factory, and all ORM models"
Task 4: Configure Alembic
Files:
- Create: backend/alembic.ini
- Create: backend/alembic/env.py
- Create: backend/alembic/script.py.mako
- Create: backend/alembic/versions/0001_initial_schema.py
- Step 1: Initialize Alembic
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic init alembic
Expected: creates alembic/ and alembic.ini
- Step 2: Update alembic.ini
Replace the sqlalchemy.url = ... line in alembic.ini:
# Change this line:
sqlalchemy.url = driver://user:pass@localhost/dbname
# To:
sqlalchemy.url = sqlite:///mirofish_dev.db
And make sure the [alembic] section contains the following line (alembic init normally generates a script_location entry already):
script_location = alembic
- Step 3: Update alembic/env.py
Replace the entire contents of alembic/env.py:
# backend/alembic/env.py
import os
import sys
from logging.config import fileConfig

from sqlalchemy import engine_from_config, pool
from alembic import context

# Add the backend directory to sys.path so that the app imports resolve
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
from app.db import Base
import app.models.db_models  # noqa: F401  (registers every model on Base)

config = context.config
# Read DATABASE_URL from the environment (takes priority over alembic.ini)
db_url = os.environ.get('DATABASE_URL', config.get_main_option('sqlalchemy.url'))
config.set_main_option('sqlalchemy.url', db_url)
if config.config_file_name is not None:
    fileConfig(config.config_file_name)
target_metadata = Base.metadata


def run_migrations_offline():
    url = config.get_main_option("sqlalchemy.url")
    context.configure(url=url, target_metadata=target_metadata, literal_binds=True)
    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)
        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
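One caveat when DATABASE_URL overrides the ini value: Alembic keeps its options in a ConfigParser, so a URL containing a percent sign (for example a percent-encoded password) may need each % doubled before it is passed to set_main_option. A minimal sketch of the precedence rule and the escaping, using only the standard library; resolve_db_url, escape_for_ini, and the sample URL are hypothetical:

```python
from configparser import ConfigParser


def resolve_db_url(env: dict, ini_url: str) -> str:
    """Prefer DATABASE_URL from the environment, else the alembic.ini value."""
    return env.get("DATABASE_URL") or ini_url


def escape_for_ini(url: str) -> str:
    """Double '%' so ConfigParser interpolation leaves the URL intact."""
    return url.replace("%", "%%")


parser = ConfigParser()
parser.add_section("alembic")
url = resolve_db_url(
    {"DATABASE_URL": "postgresql://user:p%40ss@db/mirofish"},
    "sqlite:///mirofish_dev.db",
)
# Storing the raw URL would raise ValueError (invalid interpolation syntax).
parser.set("alembic", "sqlalchemy.url", escape_for_ini(url))
round_tripped = parser.get("alembic", "sqlalchemy.url")
```

Reading the option back yields the original single-% URL, which is what the database driver expects.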
- Step 4: Generate the initial migration
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
.venv/bin/alembic revision --autogenerate --rev-id 0001 -m "initial_schema"
Expected: creates alembic/versions/0001_initial_schema.py with all the tables
- Step 5: Apply the migration
.venv/bin/alembic upgrade head
Expected: Running upgrade -> 0001, initial_schema
- Step 6: Verify that the database has the tables
.venv/bin/python -c "
import sqlite3
conn = sqlite3.connect('mirofish_dev.db')
tables = conn.execute(\"SELECT name FROM sqlite_master WHERE type='table'\").fetchall()
print([t[0] for t in tables])
conn.close()
"
Expected: a list that includes projects, tasks, users, ontologies, graphs, simulations, reports, system_config
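Reading the printed list by eye is easy to get wrong with eleven tables. A minimal sketch of an explicit check, using the standard-library sqlite3 module; missing_tables is a hypothetical helper, and the expected names are the __tablename__ values from db_models.py:

```python
import sqlite3

EXPECTED_TABLES = {
    "users", "projects", "project_files", "ontologies", "graphs",
    "simulations", "reports", "tasks", "system_config",
    "invitation_tokens", "password_reset_tokens",
}


def missing_tables(conn: sqlite3.Connection, expected: set[str]) -> set[str]:
    """Return the expected table names that are absent from the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    return expected - {name for (name,) in rows}


# Demonstration against an in-memory database that has only one of the tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")
still_missing = missing_tables(demo, EXPECTED_TABLES)
demo.close()
```

Against the real file, an empty set from missing_tables(sqlite3.connect('mirofish_dev.db'), EXPECTED_TABLES) would indicate that the migration created every model table.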
- Step 7: Commit
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
git add backend/alembic.ini backend/alembic/
git commit -m "feat(alembic): add initial schema migration for all SQLAlchemy models"
Task 5: Implement StorageService
Files:
- Create: backend/app/storage/__init__.py
- Create: backend/app/storage/protocol.py
- Create: backend/app/storage/local.py
- Create: backend/app/storage/azure_blob.py
- Create: backend/app/storage/factory.py
- Create: backend/tests/test_storage.py
- Step 1: Create the directory and the Protocol
# backend/app/storage/protocol.py
"""Abstract interface for the file storage layer."""
from typing import IO, Protocol, runtime_checkable


@runtime_checkable
class StorageService(Protocol):
    def upload(self, path: str, data: bytes | IO, content_type: str = "application/octet-stream") -> None:
        ...

    def download(self, path: str) -> bytes:
        ...

    def download_stream(self, path: str) -> IO:
        ...

    def delete(self, path: str) -> None:
        ...

    def delete_prefix(self, prefix: str) -> None:
        """Delete every file whose path starts with prefix."""
        ...

    def exists(self, path: str) -> bool:
        ...

    def list(self, prefix: str = "") -> list[str]:
        """Return relative paths under the prefix."""
        ...

    def public_url(self, path: str) -> str | None:
        """Public URL if the backend supports it, otherwise None."""
        ...
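Because StorageService is a typing.Protocol, adapters satisfy it structurally: they need matching methods, not inheritance. A minimal sketch of that idea with a trimmed-down three-method protocol and a dict-backed adapter; Storage and InMemoryStorage are hypothetical names introduced for illustration and are not part of the plan:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Storage(Protocol):
    """Trimmed-down stand-in for StorageService (three methods only)."""

    def upload(self, path: str, data: bytes) -> None: ...
    def download(self, path: str) -> bytes: ...
    def exists(self, path: str) -> bool: ...


class InMemoryStorage:
    """Dict-backed adapter; note that it does not inherit from Storage."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def download(self, path: str) -> bytes:
        return self._blobs[path]

    def exists(self, path: str) -> bool:
        return path in self._blobs


store = InMemoryStorage()
store.upload("foo/bar.txt", b"hello")
```

With runtime_checkable, isinstance only checks that the methods are present, not their signatures, so a static type checker remains the stronger guarantee.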
- Step 2: Create LocalFSStorage
# backend/app/storage/local.py
"""Storage adapter for the local filesystem."""
import io
import shutil
from pathlib import Path


class LocalFSStorage:
    """StorageService implementation for the local filesystem."""

    def __init__(self, base_path: str) -> None:
        self._base = Path(base_path).resolve()
        self._base.mkdir(parents=True, exist_ok=True)

    def _safe_path(self, relative: str) -> Path:
        """Resolve the path and check that it stays inside the base directory (blocks path traversal)."""
        resolved = (self._base / relative).resolve()
        # Compare path components, not string prefixes: a sibling such as
        # "<base>_evil" shares the string prefix but lies outside the base.
        if not resolved.is_relative_to(self._base):
            raise ValueError(f"Path traversal detected: {relative!r}")
        return resolved

    def upload(self, path: str, data: bytes | io.IOBase, content_type: str = "application/octet-stream") -> None:
        dest = self._safe_path(path)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(data, bytes):
            dest.write_bytes(data)
        else:
            with open(dest, "wb") as f:
                shutil.copyfileobj(data, f)

    def download(self, path: str) -> bytes:
        return self._safe_path(path).read_bytes()

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        p = self._safe_path(path)
        if p.exists():
            p.unlink()

    def delete_prefix(self, prefix: str) -> None:
        p = self._safe_path(prefix)
        if p.is_dir():
            shutil.rmtree(p)
        elif p.exists():
            p.unlink()

    def exists(self, path: str) -> bool:
        return self._safe_path(path).exists()

    def list(self, prefix: str = "") -> list[str]:
        base = self._safe_path(prefix) if prefix else self._base
        if not base.exists():
            return []
        result = []
        for p in base.rglob("*"):
            if p.is_file():
                result.append(str(p.relative_to(self._base)))
        return result

    def public_url(self, path: str) -> str | None:
        return None
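A containment check for path traversal is easy to get subtly wrong. Comparing resolved paths as strings accepts a sibling directory whose name merely begins with the base directory's name, whereas a component-wise comparison rejects it. A minimal sketch of the difference, using PurePosixPath so that it does not touch the filesystem; the directory names are made up:

```python
from pathlib import PurePosixPath


def prefix_check(base: str, candidate: str) -> bool:
    """String-prefix containment test (too permissive)."""
    return candidate.startswith(base)


def containment_check(base: str, candidate: str) -> bool:
    """Component-wise containment test (Python 3.9+)."""
    return PurePosixPath(candidate).is_relative_to(PurePosixPath(base))


base = "/srv/uploads"
inside = "/srv/uploads/project/a.txt"
sibling = "/srv/uploads_evil/a.txt"
```

Both checks accept the path inside the base directory, but only the component-wise one rejects the sibling.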
- Step 3: Create AzureBlobStorage
# backend/app/storage/azure_blob.py
"""Storage adapter for Azure Blob Storage."""
import io


class AzureBlobStorage:
    """StorageService implementation for Azure Blob Storage."""

    def __init__(self, connection_string: str, container_name: str) -> None:
        # Imported lazily so the SDK is only needed when STORAGE_TYPE=azure.
        from azure.storage.blob import BlobServiceClient
        self._client = BlobServiceClient.from_connection_string(connection_string)
        self._container = container_name
        self._ensure_container()

    def _ensure_container(self) -> None:
        container_client = self._client.get_container_client(self._container)
        if not container_client.exists():
            container_client.create_container()

    def _blob_client(self, path: str):
        return self._client.get_blob_client(container=self._container, blob=path)

    def upload(self, path: str, data: bytes | io.IOBase, content_type: str = "application/octet-stream") -> None:
        from azure.storage.blob import ContentSettings
        # upload_blob accepts bytes and file-like objects alike; the content
        # type must be wrapped in a ContentSettings object, not a plain dict.
        self._blob_client(path).upload_blob(
            data, overwrite=True, content_settings=ContentSettings(content_type=content_type)
        )

    def download(self, path: str) -> bytes:
        return self._blob_client(path).download_blob().readall()

    def download_stream(self, path: str) -> io.BytesIO:
        return io.BytesIO(self.download(path))

    def delete(self, path: str) -> None:
        from azure.core.exceptions import ResourceNotFoundError
        try:
            self._blob_client(path).delete_blob(delete_snapshots="include")
        except ResourceNotFoundError:
            # Match LocalFSStorage.delete, which ignores missing files.
            pass

    def delete_prefix(self, prefix: str) -> None:
        container = self._client.get_container_client(self._container)
        blobs = container.list_blobs(name_starts_with=prefix)
        for blob in blobs:
            container.delete_blob(blob.name, delete_snapshots="include")

    def exists(self, path: str) -> bool:
        return self._blob_client(path).exists()

    def list(self, prefix: str = "") -> list[str]:
        container = self._client.get_container_client(self._container)
        return [b.name for b in container.list_blobs(name_starts_with=prefix)]

    def public_url(self, path: str) -> str | None:
        return self._blob_client(path).url
- Step 4: Create the factory
# backend/app/storage/factory.py
"""Select the StorageService implementation from STORAGE_TYPE."""
import os

from .protocol import StorageService


def create_storage_service() -> StorageService:
    storage_type = os.environ.get("STORAGE_TYPE", "local")
    match storage_type:
        case "azure":
            from .azure_blob import AzureBlobStorage
            conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING", "")
            container = os.environ.get("AZURE_STORAGE_CONTAINER", "mirofish")
            if not conn_str:
                raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING is not set for STORAGE_TYPE=azure")
            return AzureBlobStorage(conn_str, container)
        case _:
            from .local import LocalFSStorage
            # Resolves to backend/uploads, the same default as Config.STORAGE_LOCAL_PATH.
            base = os.environ.get("STORAGE_LOCAL_PATH",
                                  os.path.join(os.path.dirname(__file__), "../../uploads"))
            return LocalFSStorage(base)
- Step 5: Create the package's __init__.py
# backend/app/storage/__init__.py
from .protocol import StorageService
from .factory import create_storage_service
__all__ = ["StorageService", "create_storage_service"]
- Step 6: Write the LocalFSStorage tests
# backend/tests/test_storage.py
import io

import pytest

from backend.app.storage.local import LocalFSStorage


@pytest.fixture
def storage(tmp_path):
    return LocalFSStorage(str(tmp_path))


def test_upload_and_download_bytes(storage):
    storage.upload("foo/bar.txt", b"hello world", "text/plain")
    assert storage.download("foo/bar.txt") == b"hello world"


def test_upload_and_download_stream(storage):
    data = io.BytesIO(b"stream data")
    storage.upload("test/stream.bin", data)
    result = storage.download("test/stream.bin")
    assert result == b"stream data"


def test_exists(storage):
    assert not storage.exists("not/there.txt")
    storage.upload("yes.txt", b"x")
    assert storage.exists("yes.txt")


def test_delete(storage):
    storage.upload("del.txt", b"bye")
    storage.delete("del.txt")
    assert not storage.exists("del.txt")


def test_delete_prefix(storage):
    storage.upload("dir/a.txt", b"a")
    storage.upload("dir/b.txt", b"b")
    storage.delete_prefix("dir")
    assert not storage.exists("dir/a.txt")
    assert not storage.exists("dir/b.txt")


def test_list(storage):
    storage.upload("root/x.txt", b"x")
    storage.upload("root/y.txt", b"y")
    paths = storage.list("root")
    assert len(paths) == 2
    assert all("root" in p for p in paths)


def test_path_traversal_blocked(storage):
    with pytest.raises(ValueError, match="Path traversal"):
        storage._safe_path("../../etc/passwd")


def test_public_url_is_none(storage):
    storage.upload("f.txt", b"x")
    assert storage.public_url("f.txt") is None
- Step 7: Run the storage tests
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_storage.py -v
Expected: 8 passed
- Step 8: Commit
git add backend/app/storage/ backend/tests/test_storage.py
git commit -m "feat(storage): add StorageService protocol, LocalFSStorage, AzureBlobStorage, factory"
Task 6: Injectar DB i Storage a Flask
Files:
-
Modify:
backend/app/__init__.py -
Step 1: Actualitzar create_app per inicialitzar DB i Storage
Afegir just després de app = Flask(__name__) i app.config.from_object(...):
    # Initialize the DB
    from .db import init_db
    init_db(app.config['DATABASE_URL'])
    # Initialize Storage
    from .storage import create_storage_service
    app.extensions['storage'] = create_storage_service()
And add a helper function at the end of the file (outside create_app):
def get_storage():
    """Access the StorageService from any Flask context."""
    from flask import current_app
    return current_app.extensions['storage']
- Step 2: Verify that the app starts correctly
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
DATABASE_URL=sqlite:///test_startup.db STORAGE_TYPE=local \
backend/.venv/bin/python -c "
from backend.app import create_app
app = create_app()
print('App created OK')
print('Storage:', app.extensions.get('storage'))
"
Expected: App created OK + Storage: <LocalFSStorage ...>
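- Step 3: Clean up the test file
# The relative SQLite URL is resolved against the directory the command ran from (the worktree root), so remove both candidate locations
rm -f /home/ubuntu/dev/MiroFish/.worktrees/persistencia/test_startup.db /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend/test_startup.db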
- Step 4: Commit
git add backend/app/__init__.py
git commit -m "feat(app): inject SQLAlchemy DB and StorageService into Flask app factory"
Task 7: Refactor TaskManager → DB
Files:
- Modify: backend/app/models/task.py
- Create: backend/tests/test_task_manager_db.py
The current TaskManager is in-memory. We refactor it to use the DB via get_session(), keeping the same public interface (create_task, get_task, update_task, complete_task, fail_task, list_tasks) so that no caller breaks.
- Step 1: Write the tests for the new TaskManager
# backend/tests/test_task_manager_db.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base
import backend.app.db as db_module
from backend.app.models.db_models import TaskModel  # also registers the tables on Base.metadata

@pytest.fixture(autouse=True)
def isolated_db():
    """In-memory SQLite DB for each test."""
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    yield
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None

def test_create_and_get_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("graph_build", {"project_id": "proj-1"})
    task = tm.get_task(task_id)
    assert task is not None
    assert task["task_type"] == "graph_build"
    assert task["status"] == "pending"
    assert task["progress"] == 0

def test_update_task_progress():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("ontology_generate")
    tm.update_task(task_id, progress=50, message="Halfway")
    task = tm.get_task(task_id)
    assert task["progress"] == 50
    assert task["message"] == "Halfway"

def test_complete_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("graph_build")
    tm.complete_task(task_id, {"graph_id": "g-1"})
    task = tm.get_task(task_id)
    assert task["status"] == "completed"
    assert task["progress"] == 100
    assert task["result"]["graph_id"] == "g-1"

def test_fail_task():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    task_id = tm.create_task("simulation_prepare")
    tm.fail_task(task_id, "LLM timeout")
    task = tm.get_task(task_id)
    assert task["status"] == "failed"
    assert task["error"] == "LLM timeout"

def test_task_survives_new_manager_instance():
    """The task must live in the DB, not in memory."""
    from backend.app.models.task import TaskManager
    tm1 = TaskManager()
    task_id = tm1.create_task("graph_build")
    # Create a new instance (simulates a restart)
    TaskManager._instance = None
    tm2 = TaskManager()
    task = tm2.get_task(task_id)
    assert task is not None
    assert task["task_id"] == task_id

def test_list_tasks():
    from backend.app.models.task import TaskManager
    tm = TaskManager()
    tm.create_task("graph_build")
    tm.create_task("graph_build")
    tm.create_task("ontology_generate")
    all_tasks = tm.list_tasks()
    assert len(all_tasks) == 3
    graph_tasks = tm.list_tasks(task_type="graph_build")
    assert len(graph_tasks) == 2
- Step 2: Run the tests to verify that they fail
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v
Expected: test_task_survives_new_manager_instance FAIL (because the manager is still in-memory)
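To see why that test fails before the refactor, consider a minimal sketch of an in-memory singleton. The class `InMemoryTaskManager` is a simplified, hypothetical stand-in for the current implementation, not a copy of it.

```python
import uuid
from typing import Dict, Optional


class InMemoryTaskManager:
    """Hypothetical in-memory manager: tasks live in a dict on the instance."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._tasks = {}  # state is tied to this one object
        return cls._instance

    def create_task(self, task_type: str) -> str:
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = {"task_id": task_id, "task_type": task_type}
        return task_id

    def get_task(self, task_id: str) -> Optional[Dict[str, str]]:
        return self._tasks.get(task_id)


tm1 = InMemoryTaskManager()
task_id = tm1.create_task("graph_build")

# Dropping the singleton simulates a server restart: the dict goes with it
InMemoryTaskManager._instance = None
tm2 = InMemoryTaskManager()
lost = tm2.get_task(task_id)  # None: nothing was persisted
```

Once the state lives in the database instead of on the instance, the same reset no longer loses the task, which is what the test asserts.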
- Step 3: Refactor TaskManager
Replace the contents of backend/app/models/task.py:
"""Task state management, persistent via SQLAlchemy."""
import threading
import uuid
from datetime import datetime, timedelta
from enum import Enum
from typing import Any, Dict, List, Optional

from sqlalchemy import delete, desc, select

from ..db import get_session
from ..models.db_models import TaskModel
from ..utils.locale import t

class TaskStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

class TaskManager:
    """Task manager: thread-safe, persistent via SQLAlchemy."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: only take the lock on first construction
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def create_task(self, task_type: str, metadata: Optional[Dict] = None) -> str:
        task_id = str(uuid.uuid4())
        with get_session() as db:
            task = TaskModel(
                id=task_id,
                task_type=task_type,
                status=TaskStatus.PENDING.value,
                progress=0,
                progress_detail=metadata or {},
            )
            db.add(task)
            db.commit()
        return task_id

    def get_task(self, task_id: str) -> Optional[Dict[str, Any]]:
        with get_session() as db:
            task = db.get(TaskModel, task_id)
            if task is None:
                return None
            # Serialize while the session is still open
            return self._to_dict(task)

    def update_task(
        self,
        task_id: str,
        status: Optional[str] = None,
        progress: Optional[int] = None,
        message: Optional[str] = None,
        result: Optional[Dict] = None,
        error: Optional[str] = None,
        progress_detail: Optional[Dict] = None,
    ) -> None:
        """Update only the fields that are passed; None means "leave unchanged"."""
        with get_session() as db:
            task = db.get(TaskModel, task_id)
            if task is None:
                return
            if status is not None:
                # Store the plain string even when a TaskStatus member is passed
                task.status = status.value if isinstance(status, TaskStatus) else status
            if progress is not None:
                task.progress = progress
            if message is not None:
                task.message = message
            if result is not None:
                task.result = result
            if error is not None:
                task.error = error
            if progress_detail is not None:
                task.progress_detail = progress_detail
            task.updated_at = datetime.utcnow()
            db.commit()

    def complete_task(self, task_id: str, result: Dict) -> None:
        self.update_task(
            task_id,
            status=TaskStatus.COMPLETED,
            progress=100,
            message=t("progress.taskComplete"),
            result=result,
        )

    def fail_task(self, task_id: str, error: str) -> None:
        self.update_task(
            task_id,
            status=TaskStatus.FAILED,
            message=t("progress.taskFailed"),
            error=error,
        )

    def list_tasks(self, task_type: Optional[str] = None) -> List[Dict[str, Any]]:
        with get_session() as db:
            stmt = select(TaskModel).order_by(desc(TaskModel.created_at))
            if task_type:
                stmt = stmt.where(TaskModel.task_type == task_type)
            tasks = db.execute(stmt).scalars().all()
            # Loop variable is named `task` so it does not shadow the imported t()
            return [self._to_dict(task) for task in tasks]

    def cleanup_old_tasks(self, max_age_hours: int = 24) -> None:
        """Delete finished (completed or failed) tasks older than max_age_hours."""
        cutoff = datetime.utcnow() - timedelta(hours=max_age_hours)
        with get_session() as db:
            db.execute(
                delete(TaskModel).where(
                    TaskModel.created_at < cutoff,
                    TaskModel.status.in_(["completed", "failed"]),
                )
            )
            db.commit()

    @staticmethod
    def _to_dict(task: TaskModel) -> Dict[str, Any]:
        return {
            "task_id": task.id,
            "task_type": task.task_type,
            "status": task.status,
            "created_at": task.created_at.isoformat(),
            "updated_at": task.updated_at.isoformat(),
            "progress": task.progress,
            "message": task.message or "",
            "progress_detail": task.progress_detail or {},
            "result": task.result,
            "error": task.error,
            "metadata": task.progress_detail or {},
        }
Note: get_session() has been a context manager since Task 3. Use with get_session() as db: as shown in the code.
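For readers who do not have Task 3 at hand, the contract the managers rely on can be sketched as follows. This is only an illustration under assumptions: `FakeSession` and `make_get_session` are hypothetical names, and the real get_session() in backend/app/db.py wraps a SQLAlchemy session and may differ in detail. The sketch assumes the caller commits explicitly, which matches how the managers in this plan call db.commit().

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy Session; records calls made on it."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")

    def close(self) -> None:
        self.calls.append("close")


def make_get_session(session_factory: Callable[[], FakeSession]):
    """Build a get_session() context manager around any session factory."""

    @contextmanager
    def get_session() -> Iterator[FakeSession]:
        session = session_factory()
        try:
            yield session  # the caller commits explicitly
        except Exception:
            session.rollback()  # undo partial work before re-raising
            raise
        finally:
            session.close()  # always release the connection

    return get_session
```

With this shape, an exception inside the with block rolls back and closes the session, while the normal path only closes it after the caller's own commit.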
- Step 4: Run the TaskManager tests
backend/.venv/bin/pytest backend/tests/test_task_manager_db.py -v
Expected: 6 passed
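The refactored TaskManager keeps its singleton with double-checked locking. The following standalone sketch shows that pattern in isolation and checks that many threads end up with the same object. The names `Singleton` and `instances_from_threads` are illustrative only and do not exist in the codebase.

```python
import threading
from typing import List


class Singleton:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:  # fast path: no lock once constructed
            with cls._lock:
                if cls._instance is None:  # re-check while holding the lock
                    cls._instance = super().__new__(cls)
        return cls._instance


def instances_from_threads(n: int = 16) -> List[Singleton]:
    """Construct the singleton from n threads and collect what each one got."""
    seen: List[Singleton] = []

    def worker() -> None:
        seen.append(Singleton())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return seen
```

The second check inside the lock is what prevents two threads that both passed the first check from each creating their own instance.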
- Step 5: Commit
git add backend/app/models/task.py backend/app/db.py backend/tests/test_task_manager_db.py
git commit -m "feat(task): refactor TaskManager to persist tasks in SQLAlchemy DB"
Task 8: Refactor ProjectManager → DB + Storage
Files:
- Modify: backend/app/models/project.py
- Create: backend/tests/test_project_manager_db.py
We refactor ProjectManager to use the DB for metadata and the StorageService for files, keeping the same public interface.
- Step 1: Write the tests for the new ProjectManager
# backend/tests/test_project_manager_db.py
import io
import pytest
import tempfile
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base
import backend.app.db as db_module
# Imported for its side effect: registers the tables on Base.metadata before create_all
import backend.app.models.db_models  # noqa: F401
from backend.app.storage.local import LocalFSStorage

@pytest.fixture(autouse=True)
def isolated_db(tmp_path):
    """In-memory SQLite DB for each test."""
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    yield
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None

@pytest.fixture
def storage(tmp_path):
    return LocalFSStorage(str(tmp_path))

def test_create_project(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Test Project", storage=storage)
    assert proj["name"] == "Test Project"
    assert proj["status"] == "created"
    assert "id" in proj

def test_get_project(storage):
    from backend.app.models.project import ProjectManager
    created = ProjectManager.create_project("My Project", storage=storage)
    fetched = ProjectManager.get_project(created["id"])
    assert fetched is not None
    assert fetched["name"] == "My Project"

def test_project_not_found(storage):
    from backend.app.models.project import ProjectManager
    result = ProjectManager.get_project("nonexistent-id")
    assert result is None

def test_save_and_get_extracted_text(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Text Project", storage=storage)
    ProjectManager.save_extracted_text(proj["id"], "hello extracted", storage=storage)
    text = ProjectManager.get_extracted_text(proj["id"], storage=storage)
    assert text == "hello extracted"

def test_project_survives_manager_reset(storage):
    """The data must live in the DB, not in memory."""
    from backend.app.models.project import ProjectManager
    created = ProjectManager.create_project("Persist Me", storage=storage)
    # Simulate a restart: clear any in-memory state, if there is any
    fetched = ProjectManager.get_project(created["id"])
    assert fetched is not None

def test_list_projects(storage):
    from backend.app.models.project import ProjectManager
    ProjectManager.create_project("P1", storage=storage)
    ProjectManager.create_project("P2", storage=storage)
    projects = ProjectManager.list_projects()
    assert len(projects) == 2

def test_delete_project(storage):
    from backend.app.models.project import ProjectManager
    proj = ProjectManager.create_project("Del Me", storage=storage)
    result = ProjectManager.delete_project(proj["id"], storage=storage)
    assert result is True
    assert ProjectManager.get_project(proj["id"]) is None
- Step 2: Run the tests to verify that they fail
backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v
Expected: errors (the current interface does not accept the storage= parameter)
- Step 3: Refactor ProjectManager
Replace the contents of backend/app/models/project.py:
"""Project context management, persistent via SQLAlchemy + StorageService."""
import os
import uuid
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from sqlalchemy import desc, select

from ..db import get_session
from ..models.db_models import ProjectModel, ProjectFileModel

class ProjectStatus(str, Enum):
    CREATED = "created"
    ONTOLOGY_GENERATED = "ontology_generated"
    GRAPH_BUILDING = "graph_building"
    GRAPH_COMPLETED = "graph_completed"
    FAILED = "failed"

class ProjectManager:
    """Manages projects: metadata in the DB, files in the StorageService."""

    @classmethod
    def create_project(cls, name: str = "Unnamed Project", storage=None) -> Dict[str, Any]:
        project_id = str(uuid.uuid4())
        with get_session() as db:
            proj = ProjectModel(id=project_id, name=name, status="created")
            db.add(proj)
            db.commit()
            db.refresh(proj)  # reload defaults such as created_at
            return cls._to_dict(proj)

    @classmethod
    def get_project(cls, project_id: str) -> Optional[Dict[str, Any]]:
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return None
            return cls._to_dict(proj)

    @classmethod
    def save_project(cls, project_data: Dict[str, Any]) -> None:
        """Update the fields of an existing project."""
        project_id = project_data.get("id") or project_data.get("project_id")
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return
            updatable = [
                "name", "status", "analysis_summary", "simulation_requirement",
                "chunk_size", "chunk_overlap", "active_task_id",
            ]
            for field in updatable:
                if field in project_data:
                    setattr(proj, field, project_data[field])
            proj.updated_at = datetime.utcnow()
            db.commit()

    @classmethod
    def list_projects(cls, limit: int = 50) -> List[Dict[str, Any]]:
        with get_session() as db:
            stmt = select(ProjectModel).order_by(desc(ProjectModel.created_at)).limit(limit)
            projects = db.execute(stmt).scalars().all()
            return [cls._to_dict(p) for p in projects]

    @classmethod
    def delete_project(cls, project_id: str, storage=None) -> bool:
        with get_session() as db:
            proj = db.get(ProjectModel, project_id)
            if proj is None:
                return False
            # Delete the files from storage if the service was passed in
            if storage is not None:
                storage.delete_prefix(f"projects/{project_id}")
            db.delete(proj)
            db.commit()
            return True

    @classmethod
    def save_file_to_project(
        cls,
        project_id: str,
        file_storage,  # Flask FileStorage
        original_filename: str,
        storage,
    ) -> Dict[str, Any]:
        # Keep only the extension; the stored name is random to avoid collisions
        ext = os.path.splitext(original_filename)[1].lower()
        safe_filename = f"{uuid.uuid4().hex[:8]}{ext}"
        storage_path = f"projects/{project_id}/files/{safe_filename}"
        data = file_storage.read()
        storage.upload(storage_path, data)
        mime_type = getattr(file_storage, "content_type", "application/octet-stream") or "application/octet-stream"
        with get_session() as db:
            file_rec = ProjectFileModel(
                id=str(uuid.uuid4()),
                project_id=project_id,
                original_name=original_filename,
                storage_path=storage_path,
                size=len(data),
                mime_type=mime_type,
                file_type="upload",
            )
            db.add(file_rec)
            db.commit()
        return {
            "original_filename": original_filename,
            "saved_filename": safe_filename,
            "storage_path": storage_path,
            "size": len(data),
        }

    @classmethod
    def save_extracted_text(cls, project_id: str, text: str, storage) -> None:
        storage_path = f"projects/{project_id}/extracted_text.txt"
        data = text.encode("utf-8")  # encode once; reused for upload and size
        storage.upload(storage_path, data, "text/plain")
        with get_session() as db:
            stmt = select(ProjectFileModel).where(
                ProjectFileModel.project_id == project_id,
                ProjectFileModel.file_type == "extracted_text",
            )
            existing = db.execute(stmt).scalar_one_or_none()
            if existing:
                existing.storage_path = storage_path
                existing.size = len(data)
            else:
                rec = ProjectFileModel(
                    id=str(uuid.uuid4()),
                    project_id=project_id,
                    original_name="extracted_text.txt",
                    storage_path=storage_path,
                    size=len(data),
                    mime_type="text/plain",
                    file_type="extracted_text",
                )
                db.add(rec)
            db.commit()

    @classmethod
    def get_extracted_text(cls, project_id: str, storage) -> Optional[str]:
        storage_path = f"projects/{project_id}/extracted_text.txt"
        if not storage.exists(storage_path):
            return None
        return storage.download(storage_path).decode("utf-8")

    @staticmethod
    def _to_dict(proj: ProjectModel) -> Dict[str, Any]:
        return {
            "id": proj.id,
            "project_id": proj.id,  # compatibility with existing code
            "name": proj.name,
            "status": proj.status,
            "analysis_summary": proj.analysis_summary,
            "simulation_requirement": proj.simulation_requirement,
            "chunk_size": proj.chunk_size,
            "chunk_overlap": proj.chunk_overlap,
            "active_task_id": proj.active_task_id,
            "created_at": proj.created_at.isoformat(),
            "updated_at": proj.updated_at.isoformat(),
            # Fields read by the old model; now empty for compatibility
            "files": [],
            "total_text_length": 0,
            "ontology": None,
            "graph_id": None,
            "graph_build_task_id": None,
            "error": None,
        }
- Step 4: Run the ProjectManager tests
backend/.venv/bin/pytest backend/tests/test_project_manager_db.py -v
Expected: 7 passed
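One detail of the storage layout is worth illustrating. ProjectManager keeps everything under projects/{project_id}/ and deletes it with delete_prefix(f"projects/{project_id}"), without a trailing slash. If a backend treats the prefix as a plain string match, as prefix listing in blob stores typically does, a prefix without the slash would also match a sibling whose id merely starts with the same characters. Fixed-length UUID ids make that collision unlikely in practice, so the following is only a defensive sketch; `keys_under_prefix` is a hypothetical helper, not part of the StorageService interface.

```python
from typing import Iterable, List


def keys_under_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Return the keys that live inside the `prefix` directory.

    The prefix is normalized to end with exactly one "/", so
    "projects/ab" does not match "projects/abc/...".
    """
    normalized = prefix.rstrip("/") + "/"
    return [key for key in keys if key.startswith(normalized)]


keys = [
    "projects/ab/files/x.txt",
    "projects/ab/extracted_text.txt",
    "projects/abc/files/y.txt",
]
naive = [k for k in keys if k.startswith("projects/ab")]  # matches all three
scoped = keys_under_prefix(keys, "projects/ab")  # only the two under ab/
```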
- Step 5: Commit
git add backend/app/models/project.py backend/tests/test_project_manager_db.py
git commit -m "feat(project): refactor ProjectManager to persist via SQLAlchemy + StorageService"
Task 9: Update existing tests and final verification
Files:
- Modify: backend/tests/conftest.py
- Modify: backend/tests/test_project_task_recovery.py (if affected)
- Step 1: Update conftest.py to add global fixtures
# backend/tests/conftest.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from backend.app.db import Base
import backend.app.db as db_module

@pytest.fixture(autouse=True)
def reset_graph_factory_singleton():
    """Reset the graph backend singleton after each test."""
    yield
    try:
        import backend.app.graph.factory as fmod
        fmod._backend_instance = None
    except ImportError:
        pass

@pytest.fixture(autouse=True)
def reset_task_manager_singleton():
    """Reset TaskManager singleton between tests."""
    from backend.app.models import task as task_module
    task_module.TaskManager._instance = None
    yield
    task_module.TaskManager._instance = None

@pytest.fixture
def in_memory_db():
    """In-memory SQLite DB for tests that need a database."""
    db_module._engine = create_engine("sqlite:///:memory:", connect_args={"check_same_thread": False})
    db_module._SessionLocal = sessionmaker(bind=db_module._engine, autocommit=False, autoflush=False)
    Base.metadata.create_all(db_module._engine)
    yield db_module._engine
    Base.metadata.drop_all(db_module._engine)
    db_module._engine = None
    db_module._SessionLocal = None
- Step 2: Run the whole test suite
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia
backend/.venv/bin/pytest backend/tests/ -v --tb=short 2>&1 | tail -30
Expected: all tests from Tasks 2-8 pass. The test test_config_graph_backend_default may keep failing (a pre-existing, unrelated failure).
- Step 3: Verify that the app starts and the DB is created correctly
cd /home/ubuntu/dev/MiroFish/.worktrees/persistencia/backend
DATABASE_URL=sqlite:///verify_startup.db \
STORAGE_TYPE=local \
STORAGE_LOCAL_PATH=/tmp/mirofish_test_uploads \
LLM_API_KEY=test-key \
ZEP_API_KEY=test-key \
.venv/bin/python -c "
from app import create_app
app = create_app()
with app.app_context():
    from app.models.project import ProjectManager
    from app.storage import create_storage_service
    storage = app.extensions['storage']
    proj = ProjectManager.create_project('Startup Test', storage=storage)
    print('Project created:', proj['id'])
    fetched = ProjectManager.get_project(proj['id'])
    print('Project fetched:', fetched['name'])
    print('Verification OK')
"
rm -f verify_startup.db
Expected: Verification OK
- Step 4: Final commit for Phase 1
git add backend/tests/conftest.py
git commit -m "test(conftest): add in_memory_db and task manager singleton reset fixtures"
git tag fase1-infraestructura-base
End-to-end verification of Phase 1
# 1. All tests pass
backend/.venv/bin/pytest backend/tests/ -v
# 2. The DB is created by the migrations
backend/.venv/bin/alembic upgrade head
# 3. The app starts correctly
DATABASE_URL=sqlite:///mirofish_dev.db STORAGE_TYPE=local LLM_API_KEY=x ZEP_API_KEY=x \
backend/.venv/bin/python backend/run.py &
sleep 2
curl -s http://localhost:5001/health | python3 -m json.tool
kill %1
Final expected output: {"service": "MiroFish Backend", "status": "ok"}
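The fixed sleep 2 in the check above can be flaky on a slow machine. As an alternative, the endpoint can be polled until it answers. The sketch below is one possible way to do that with the standard library; `fetch_json` and `wait_for` are hypothetical helper names, and the URL in the usage note is the one from the curl command.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional


def fetch_json(url: str, timeout: float = 2.0) -> Any:
    """GET `url` and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for(probe: Callable[[], Any], timeout: float = 10.0, interval: float = 0.25) -> Any:
    """Call `probe` until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    last_error: Optional[Exception] = None
    while True:
        try:
            return probe()
        except (OSError, ValueError) as exc:
            # OSError covers refused connections and URL errors,
            # ValueError covers a body that is not valid JSON yet
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"probe did not succeed within {timeout}s") from last_error
        time.sleep(interval)
```

A call such as wait_for(lambda: fetch_json("http://localhost:5001/health")) would then return the parsed health payload as soon as the server is up, or raise TimeoutError if it never responds.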
Note: Phases 2 (Auth+RBAC), 3 (pipeline) and 4 (production hardening) will have their own plans, written when each phase starts.