diff --git a/.gitignore b/.gitignore
index 1b8926ef..01976b09 100644
--- a/.gitignore
+++ b/.gitignore
@@ -63,3 +63,7 @@ data/
 
 # Azure configuration with secrets (never commit)
 azure/config.sh
+
+# Local working files (not for the repo)
+exports.txt
+docs/2026-04-26-enterprise-roadmap.md
diff --git a/TechnicalDesign.md b/TechnicalDesign.md
new file mode 100644
index 00000000..7ecb3b13
--- /dev/null
+++ b/TechnicalDesign.md
@@ -0,0 +1,146 @@
+# Technical Design — MiroFish
+
+## Graph Backend
+
+MiroFish supports two knowledge graph backends, selectable via `GRAPH_BACKEND` in `.env`:
+
+| Value | Backend | Requirements |
+|-------|---------|-----------|
+| `zep` (default) | Zep Cloud (managed) | `ZEP_API_KEY` |
+| `graphiti` | Graphiti + Neo4j (self-hosted) | `NEO4J_PASSWORD` + LLM variables |
+
+Selection happens in the factory `backend/app/graph/factory.py`, a singleton that instantiates `ZepBackend` or `GraphitiBackend` depending on `GRAPH_BACKEND`. Configuration validation is conditional: if `GRAPH_BACKEND=graphiti`, `ZEP_API_KEY` is not required, and vice versa.
+
+### Switching between backends
+
+Only `.env` needs to change:
+
+```env
+# To use Zep Cloud:
+GRAPH_BACKEND=zep
+ZEP_API_KEY=z_...
+
+# To use Graphiti + Neo4j:
+GRAPH_BACKEND=graphiti
+NEO4J_URI=bolt://:7687
+NEO4J_USER=neo4j
+NEO4J_PASSWORD=
+```
+
+---
+
+## LLM Models
+
+The project uses up to four groups of LLM variables, each for a different purpose. All `LLM_SMALL_*`, `LLM_EMBED_*`, and `LLM_BOOST_*` variables **fall back** to the `LLM_*` values when they are not set.
+
+### Configuration variables
+
+```env
+# ── Main model (generative, powerful) ────────────────────────────────────────
+LLM_API_KEY=...
+LLM_BASE_URL=https://.cognitiveservices.azure.com/openai/deployments//chat/completions?api-version=2024-05-01-preview
+LLM_MODEL_NAME=gpt-5.4
+
+# ── Small/fast model (lightweight, inexpensive) ──────────────────────────────
+# Falls back to LLM_* if not set
+LLM_SMALL_API_KEY=...
+LLM_SMALL_BASE_URL=https://.cognitiveservices.azure.com/openai/deployments//chat/completions?api-version=2024-05-01-preview
+LLM_SMALL_MODEL_NAME=gpt-5-mini
+
+# ── Embedding model (vectorization) ──────────────────────────────────────────
+# Falls back to LLM_* if not set. Required for Graphiti.
+LLM_EMBED_API_KEY=...
+LLM_EMBED_BASE_URL=https://.services.ai.azure.com/openai/deployments//embeddings?api-version=2024-05-01-preview
+LLM_EMBED_MODEL_NAME=text-embedding-3-large
+
+# ── Boost model (OASIS simulation, optional) ─────────────────────────────────
+# Falls back to LLM_* if not set
+LLM_BOOST_API_KEY=...
+LLM_BOOST_BASE_URL=...
+LLM_BOOST_MODEL_NAME=gpt-5.4
+```
+
+### Usage map by operation
+
+| Variable group | Component | Operation |
+|---|---|---|
+| `LLM_*` | `OntologyGenerator` | Step 1: Document analysis and ontology generation (entity and relation types) |
+| `LLM_*` | `GraphBuilderService` (Zep mode) | Step 2: Entity and relation extraction from text via the Zep SDK |
+| `LLM_*` | Graphiti `OpenAIGenericClient` (graphiti mode) | Step 2: Entity and relation extraction from text via graphiti-core |
+| `LLM_*` | `OasisProfileGenerator` | Step 2: Generation of OASIS agent profiles from the graph |
+| `LLM_*` | `ReportAgent` | Step 4: Generation of the final analytical report (multi-turn, tool use) |
+| `LLM_SMALL_*` | Graphiti `OpenAIRerankerClient` | Step 2: Reranking of graph search results (graphiti mode) |
+| `LLM_SMALL_*` | Graphiti `ModelSize.small` | Step 2: Lightweight internal graphiti tasks (simplified extraction) |
+| `LLM_EMBED_*` | Graphiti `OpenAIEmbedder` | Step 2: Embedding vector generation for indexing and semantic search in Neo4j (graphiti mode) |
+| `LLM_BOOST_*` | `SimulationRunner` / `run_parallel_simulation.py` | Step 3: Action decisions for each agent during the OASIS simulation |
+
+### API endpoint used by each component
+
+All project components use **`chat.completions`** or **`embeddings`**, never `responses` (beta).
+
+| Component | API endpoint | Note |
+|---|---|---|
+| `LLMClient` (project wrapper) | `chat.completions.create` | Synchronous (`OpenAI`) |
+| `OntologyGenerator` | `chat.completions.create` | Via `LLMClient` |
+| `OasisProfileGenerator` | `chat.completions.create` | Internal client |
+| `SimulationConfigGenerator` | `chat.completions.create` | Internal client |
+| `ReportAgent` | `chat.completions.create` | Via `LLMClient` |
+| Graphiti `OpenAIGenericClient` | `chat.completions.create` | AsyncOpenAI, injectable |
+| Graphiti `OpenAIRerankerClient` | `chat.completions.create` | With `logprobs=True` for scoring |
+| Graphiti `OpenAIEmbedder` | `embeddings.create` | AsyncOpenAI, injectable |
+| OASIS/CAMEL-AI | `chat.completions.create` | Via `ModelFactory` (CAMEL abstraction) |
+
+> **Note:** graphiti-core also includes an `OpenAIClient` that uses `responses.parse` (OpenAI beta API). MiroFish **does not use it**; it configures `OpenAIGenericClient`, which always uses `chat.completions` and is compatible with Azure and any OpenAI-compatible API.
+
+### Notes on Azure OpenAI
+
+- `LLM_BASE_URL` accepts the full Azure URL (`/chat/completions?api-version=...`). The code processes it automatically: it extracts `api-version` as `default_query` and trims the suffix for the SDK.
+- The same handling applies to `LLM_EMBED_BASE_URL` (suffix `/embeddings?api-version=...`).
+- `LLM_SMALL_BASE_URL` accepts the Azure AI Foundry base URL directly (`services.ai.azure.com/api/projects//openai/v1/`), with no suffix or `api-version`.
+- `LLM_EMBED_BASE_URL` can use either the `services.ai.azure.com` or the `cognitiveservices.azure.com` domain.
+
+### Model recommendations (Azure OpenAI)
+
+| Group | Recommended model | Rationale |
+|------|----------------|-------|
+| `LLM_*` | `gpt-5.4` | Complex reasoning: ontology, graph extraction, reports |
+| `LLM_SMALL_*` | `gpt-5-mini` | Lightweight, inexpensive tasks: reranking, classification |
+| `LLM_EMBED_*` | `text-embedding-3-large` | Highest semantic embedding quality |
+| `LLM_BOOST_*` | `gpt-5.4` or `gpt-5-mini` | Simulation: many short calls, prioritize speed/cost |
+
+---
+
+## 5-step pipeline
+
+```
+Step 1: Graph Build (ontology)
+  └─ OntologyGenerator → LLM_*
+
+Step 2: Graph Build (construction)
+  ├─ zep mode: GraphBuilderService + Zep SDK → LLM_*
+  └─ graphiti mode: GraphitiBackend
+       ├─ extraction: OpenAIGenericClient → LLM_* (chat.completions)
+       ├─ reranking: OpenAIRerankerClient → LLM_SMALL_*
+       └─ embedding: OpenAIEmbedder → LLM_EMBED_*
+
+Step 3: OASIS simulation
+  └─ SimulationRunner / run_parallel_simulation.py → LLM_BOOST_* (or LLM_*)
+
+Step 4: Report
+  └─ ReportAgent (multi-turn + tool use) → LLM_*
+
+Step 5: Live interaction
+  └─ Chat with simulated agents → LLM_*
+```
+
+---
+
+## Internationalization (i18n)
+
+- Translation files: `/locales/{ca,en,es,zh}.json`, shared by frontend and backend.
+- Language instructions for the LLM: `/locales/languages.json` (key `llmInstruction`).
+- The frontend injects the current locale via the `Accept-Language` header on every API request.
+- The backend detects the locale in `backend/app/utils/locale.py:get_locale()` and uses it for:
+  - Translating error messages (`t()`)
+  - Language instructions in LLM prompts (`get_language_instruction()`)
+- The generated ontology (descriptions, examples, `analysis_summary`) is produced in the UI language. Entity and relation type **names** follow PascalCase/UPPER\_SNAKE\_CASE in the UI language (e.g. `AgenciaGovern`, `TREBALLA_PER` in Catalan).
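Reviewer sketch (not part of the diff): the Azure OpenAI notes in `TechnicalDesign.md` say the code extracts `api-version` into `default_query` and trims the endpoint suffix before handing the URL to the SDK. The function that does this is not shown in this diff, so the following is only a minimal illustration of that idea. The helper name `split_azure_url`, the suffix list, and the return shape are assumptions, not the project's actual implementation.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical suffix list: the two endpoint suffixes named in the design notes.
_ENDPOINT_SUFFIXES = ("/chat/completions", "/embeddings")


def split_azure_url(url: str):
    """Return (base_url, default_query) for an OpenAI-compatible SDK client.

    Hypothetical helper written for illustration only.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # Keep only api-version; the notes mention no other query parameters.
    default_query = {}
    if "api-version" in query:
        default_query["api-version"] = query["api-version"][0]

    # Trim the operation suffix so the SDK can append its own path.
    path = parts.path.rstrip("/")
    for suffix in _ENDPOINT_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break

    base_url = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return base_url, default_query
```

Under these assumptions, a Foundry-style base URL with no suffix and no `api-version` passes through unchanged apart from the trailing slash, which matches the note that `LLM_SMALL_BASE_URL` can be given directly. The resulting pair would typically be passed to the client constructor as `base_url` and `default_query`.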
diff --git a/backend/app/api/graph.py b/backend/app/api/graph.py index 9dbcc927..cc0a9053 100644 --- a/backend/app/api/graph.py +++ b/backend/app/api/graph.py @@ -4,6 +4,7 @@ Uses project context mechanism with server-side persistent state """ import os +import json import traceback import threading from flask import request, jsonify @@ -255,6 +256,126 @@ def generate_ontology(): }), 500 +# ============== Endpoint 1b: Import ontology ============== + +@graph_bp.route('/ontology/import', methods=['POST']) +def import_ontology(): + """ + Endpoint 1b: Upload files and import a pre-existing ontology definition + + Request method: multipart/form-data + + Parameters: + files: Uploaded files (PDF/MD/TXT), multiple allowed + simulation_requirement: Simulation requirement description (required) + ontology: JSON string with entity_types and edge_types (required) + project_name: Project name (optional) + + Returns same structure as generate_ontology. + """ + try: + logger.info("=== Starting ontology import ===") + + simulation_requirement = request.form.get('simulation_requirement', '') + project_name = request.form.get('project_name', 'Unnamed Project') + ontology_json = request.form.get('ontology', '') + + if not simulation_requirement: + return jsonify({ + "success": False, + "error": t('api.requireSimulationRequirement') + }), 400 + + if not ontology_json: + return jsonify({ + "success": False, + "error": t('api.requireOntologyJson') + }), 400 + + try: + ontology = json.loads(ontology_json) + except (ValueError, TypeError): + return jsonify({ + "success": False, + "error": t('api.invalidOntologyJson') + }), 400 + + if not isinstance(ontology.get('entity_types'), list) or not isinstance(ontology.get('edge_types'), list): + return jsonify({ + "success": False, + "error": t('api.invalidOntologyStructure') + }), 400 + + uploaded_files = request.files.getlist('files') + if not uploaded_files or all(not f.filename for f in uploaded_files): + return jsonify({ + "success": False, 
+ "error": t('api.requireFileUpload') + }), 400 + + project = ProjectManager.create_project(name=project_name) + project.simulation_requirement = simulation_requirement + logger.info(f"Project created for import: {project.project_id}") + + document_texts = [] + all_text = "" + + for file in uploaded_files: + if file and file.filename and allowed_file(file.filename): + file_info = ProjectManager.save_file_to_project( + project.project_id, + file, + file.filename + ) + project.files.append({ + "filename": file_info["original_filename"], + "size": file_info["size"] + }) + + text = FileParser.extract_text(file_info["path"]) + text = TextProcessor.preprocess_text(text) + document_texts.append(text) + all_text += f"\n\n=== {file_info['original_filename']} ===\n{text}" + + if not document_texts: + ProjectManager.delete_project(project.project_id) + return jsonify({ + "success": False, + "error": t('api.noDocProcessed') + }), 400 + + project.total_text_length = len(all_text) + ProjectManager.save_extracted_text(project.project_id, all_text) + + project.ontology = { + "entity_types": ontology.get("entity_types", []), + "edge_types": ontology.get("edge_types", []) + } + project.analysis_summary = ontology.get("analysis_summary", "") + project.status = ProjectStatus.ONTOLOGY_GENERATED + ProjectManager.save_project(project) + logger.info(f"=== Ontology import complete === Project ID: {project.project_id}") + + return jsonify({ + "success": True, + "data": { + "project_id": project.project_id, + "project_name": project.name, + "ontology": project.ontology, + "analysis_summary": project.analysis_summary, + "files": project.files, + "total_text_length": project.total_text_length + } + }) + + except Exception as e: + return jsonify({ + "success": False, + "error": str(e), + "traceback": traceback.format_exc() + }), 500 + + # ============== Endpoint 2: Build graph ============== @graph_bp.route('/build', methods=['POST']) diff --git a/backend/app/api/simulation.py 
b/backend/app/api/simulation.py index 6ca6d78b..008a6089 100644 --- a/backend/app/api/simulation.py +++ b/backend/app/api/simulation.py @@ -1397,7 +1397,7 @@ def generate_profiles(): "error": t('api.noMatchingEntities') }), 400 - generator = OasisProfileGenerator() + generator = OasisProfileGenerator(graph_id=graph_id) profiles = generator.generate_profiles_from_entities( entities=filtered.entities, use_llm=use_llm diff --git a/backend/app/graph/graphiti_backend.py b/backend/app/graph/graphiti_backend.py index e8f9576f..c8a100a7 100644 --- a/backend/app/graph/graphiti_backend.py +++ b/backend/app/graph/graphiti_backend.py @@ -196,22 +196,51 @@ class GraphitiBackend(GraphBackend): @staticmethod def _patch_extract_entity_attributes() -> None: - """Monkey-patch graphiti's _extract_entity_attributes to sanitize LLM output. + """Monkey-patch graphiti internals to fix two LLM quirks: - Some LLMs return attribute values as nested dicts ({"value": "CTTI"}) instead - of plain strings. Neo4j rejects these with TypeError. We intercept the raw - llm_response dict before it is stored in node.attributes and flatten it. + 1. _extract_entity_attributes: some LLMs wrap attribute values in nested + dicts ({"value": "CTTI"}). Neo4j rejects these — flatten them. + 2. _extract_nodes_single: some LLMs omit entity_type_id from extracted + entities, causing a Pydantic ValidationError. Default missing IDs to 0 + (the generic "Entity" type) before validation runs. 
""" import graphiti_core.utils.maintenance.node_operations as _node_ops - original = _node_ops._extract_entity_attributes + # --- patch 1: attribute flattening --- + original_attrs = _node_ops._extract_entity_attributes - async def _patched(llm_client, node, episode, previous_episodes, entity_type): - result = await original(llm_client, node, episode, previous_episodes, entity_type) - # result is a dict — flatten any dict-valued attributes + async def _patched_attrs(llm_client, node, episode, previous_episodes, entity_type): + result = await original_attrs(llm_client, node, episode, previous_episodes, entity_type) return _flatten_attributes(result) if result else result - _node_ops._extract_entity_attributes = _patched + _node_ops._extract_entity_attributes = _patched_attrs + + # --- patch 2: entity_type_id defaulting --- + original_nodes = _node_ops._extract_nodes_single + + async def _patched_nodes(llm_client, episode, context): + from graphiti_core.utils.maintenance.node_operations import ExtractedEntities + # Call the LLM the normal way but catch the Pydantic validation error + # that arises when the LLM forgets entity_type_id. 
+ try: + return await original_nodes(llm_client, episode, context) + except Exception as exc: + # Only intercept Pydantic validation errors about entity_type_id + if "entity_type_id" not in str(exc): + raise + logger.warning(f"LLM omitted entity_type_id — defaulting to 0 and retrying validation: {exc}") + # Re-run the LLM call via the internal helper to get the raw dict + from graphiti_core.utils.maintenance.node_operations import _call_extraction_llm + llm_response = await _call_extraction_llm(llm_client, episode, context) + # Inject entity_type_id=0 for any entity that is missing it + entities = llm_response.get("extracted_entities", []) + for ent in entities: + if isinstance(ent, dict) and "entity_type_id" not in ent: + ent["entity_type_id"] = 0 + response_object = ExtractedEntities(**llm_response) + return response_object.extracted_entities + + _node_ops._extract_nodes_single = _patched_nodes def create_graph(self, graph_id: str, name: str, description: str = "") -> None: logger.info(f"Graphiti graph namespace ready: {graph_id}") @@ -428,20 +457,46 @@ class GraphitiBackend(GraphBackend): return edges def search(self, graph_id: str, query: str, limit: int = 10, scope: str = "edges") -> Dict[str, Any]: - results = _run_async( - self._client.search(query=query, group_ids=[graph_id], num_results=limit) + max_retries = 3 + delay = 2.0 + last_exc = None + for attempt in range(max_retries): + try: + results = _run_async( + self._client.search(query=query, group_ids=[graph_id], num_results=limit) + ) + edges = [ + { + "uuid": getattr(r, "uuid", ""), + "name": getattr(r, "name", ""), + "fact": getattr(r, "fact", ""), + "source_node_uuid": getattr(r, "source_node_uuid", ""), + "target_node_uuid": getattr(r, "target_node_uuid", ""), + } + for r in (results or []) + ] + return {"edges": edges, "nodes": []} + except Exception as e: + last_exc = e + logger.debug( + f"Graphiti search attempt {attempt + 1}/{max_retries} failed: " + f"{type(e).__name__}: {e}" + ) + if attempt < 
max_retries - 1: + import time as _time + _time.sleep(delay) + delay *= 2 + # Reconnect in case the Neo4j TCP connection dropped + try: + self._client = self._build_client() + except Exception as rebuild_exc: + logger.warning(f"Graphiti client rebuild failed: {rebuild_exc}") + import traceback as _tb + logger.error( + f"Graphiti search failed after {max_retries} attempts: " + f"{type(last_exc).__name__}: {last_exc}\n{_tb.format_exc()}" ) - edges = [ - { - "uuid": getattr(r, "uuid", ""), - "name": getattr(r, "name", ""), - "fact": getattr(r, "fact", ""), - "source_node_uuid": getattr(r, "source_node_uuid", ""), - "target_node_uuid": getattr(r, "target_node_uuid", ""), - } - for r in (results or []) - ] - return {"edges": edges, "nodes": []} + raise last_exc def add_text(self, graph_id: str, data: str) -> None: from graphiti_core.nodes import EpisodeType diff --git a/backend/app/services/oasis_profile_generator.py b/backend/app/services/oasis_profile_generator.py index 7a42b7b6..e1ec54c5 100644 --- a/backend/app/services/oasis_profile_generator.py +++ b/backend/app/services/oasis_profile_generator.py @@ -11,6 +11,7 @@ Improvements: import json import random +import re import time from typing import Dict, Any, List, Optional from dataclasses import dataclass, field @@ -28,6 +29,19 @@ from .zep_entity_reader import EntityNode, ZepEntityReader logger = get_logger('mirofish.oasis_profile') +def _normalize_topics(value) -> List[str]: + """Ensure interested_topics is always List[str], even if the LLM returns a delimited string or a list with a single packed element.""" + if isinstance(value, str): + value = [value] + if not isinstance(value, list): + return [] + result = [] + for item in value: + if isinstance(item, str) and item.strip(): + result.extend(part.strip() for part in re.split(r'[,;|\n]+', item) if part.strip()) + return result + + @dataclass class OasisAgentProfile: """OASIS Agent Profile data structure""" @@ -204,12 +218,13 @@ class OasisProfileGenerator: 
default_query=_default_query if _default_query else None ) - # Zep client for enriching context via retrieval + # Graph retrieval client — only initialise Zep when it is the active backend self.zep_api_key = zep_api_key or Config.ZEP_API_KEY self.zep_client = None self.graph_id = graph_id + self._use_graphiti = (Config.GRAPH_BACKEND == "graphiti") - if self.zep_api_key: + if not self._use_graphiti and self.zep_api_key: try: self.zep_client = Zep(api_key=self.zep_api_key) except Exception as e: @@ -274,7 +289,7 @@ class OasisProfileGenerator: mbti=profile_data.get("mbti"), country=profile_data.get("country"), profession=profile_data.get("profession"), - interested_topics=profile_data.get("interested_topics", []), + interested_topics=_normalize_topics(profile_data.get("interested_topics", [])), source_entity_uuid=entity.uuid, source_entity_type=entity_type, ) @@ -290,45 +305,84 @@ class OasisProfileGenerator: return f"{username}_{suffix}" def _search_zep_for_entity(self, entity: EntityNode) -> Dict[str, Any]: + """Retrieve rich context for an entity via graph hybrid search. + + Dispatches to Graphiti (Neo4j) or Zep Cloud depending on the active backend. """ - Retrieve rich information about an entity using the Zep graph hybrid search. + results = {"facts": [], "node_summaries": [], "context": ""} - Zep has no built-in hybrid search endpoint, so edges and nodes are searched - separately and the results are merged. Parallel requests are used for - efficiency. 
- - Args: - entity: Entity node object - - Returns: - Dictionary containing facts, node_summaries, and context - """ - import concurrent.futures - - if not self.zep_client: - return {"facts": [], "node_summaries": [], "context": ""} - - entity_name = entity.name - - results = { - "facts": [], - "node_summaries": [], - "context": "" - } - - # graph_id is required for searching if not self.graph_id: - logger.debug(f"Skipping Zep retrieval: graph_id not set") + logger.debug("Skipping graph retrieval: graph_id not set") return results - - comprehensive_query = t('progress.zepSearchQuery', name=entity_name) - - def search_edges(): - """Search edges (facts/relationships) - with retry logic""" - max_retries = 3 - last_exception = None - delay = 2.0 + entity_name = entity.name + + if self._use_graphiti: + return self._search_graphiti_for_entity(entity_name, results) + else: + return self._search_zep_cloud_for_entity(entity_name, results) + + def _search_graphiti_for_entity(self, entity_name: str, results: Dict[str, Any]) -> Dict[str, Any]: + """Use the Graphiti backend's search() to retrieve context for an entity.""" + import traceback + from ..graph.factory import get_graph_backend + + max_retries = 3 + delay = 2.0 + last_exc = None + + for attempt in range(max_retries): + try: + backend = get_graph_backend() + query = t('progress.zepSearchQuery', name=entity_name) + search_result = backend.search( + graph_id=self.graph_id, + query=query, + limit=30, + scope="edges" + ) + all_facts = set() + for edge in search_result.get("edges", []): + fact = edge.get("fact", "") + if fact: + all_facts.add(fact) + results["facts"] = list(all_facts) + + context_parts = [] + if results["facts"]: + context_parts.append("Facts:\n" + "\n".join(f"- {f}" for f in results["facts"][:20])) + results["context"] = "\n\n".join(context_parts) + + logger.info(f"Graphiti retrieval complete: {entity_name}, fetched {len(results['facts'])} facts") + return results + except Exception as e: + last_exc = e + 
if attempt < max_retries - 1: + logger.debug( + f"Graphiti retrieval attempt {attempt + 1} failed ({entity_name}): " + f"{type(e).__name__}: {e} — retrying in {delay}s" + ) + time.sleep(delay) + delay *= 2 + + logger.warning( + f"Graphiti retrieval failed after {max_retries} attempts ({entity_name}): " + f"{type(last_exc).__name__}: {last_exc}\n{traceback.format_exc()}" + ) + return results + + def _search_zep_cloud_for_entity(self, entity_name: str, results: Dict[str, Any]) -> Dict[str, Any]: + """Use the Zep Cloud graph.search() to retrieve context for an entity.""" + import concurrent.futures + + if not self.zep_client: + return results + + comprehensive_query = t('progress.zepSearchQuery', name=entity_name) + + def search_edges(): + max_retries = 3 + delay = 2.0 for attempt in range(max_retries): try: return self.zep_client.graph.search( @@ -339,7 +393,6 @@ class OasisProfileGenerator: reranker="rrf" ) except Exception as e: - last_exception = e if attempt < max_retries - 1: logger.debug(f"Zep edge search attempt {attempt + 1} failed: {str(e)[:80]}, retrying...") time.sleep(delay) @@ -349,11 +402,8 @@ class OasisProfileGenerator: return None def search_nodes(): - """Search nodes (entity summaries) - with retry logic""" max_retries = 3 - last_exception = None delay = 2.0 - for attempt in range(max_retries): try: return self.zep_client.graph.search( @@ -364,7 +414,6 @@ class OasisProfileGenerator: reranker="rrf" ) except Exception as e: - last_exception = e if attempt < max_retries - 1: logger.debug(f"Zep node search attempt {attempt + 1} failed: {str(e)[:80]}, retrying...") time.sleep(delay) @@ -372,18 +421,14 @@ class OasisProfileGenerator: else: logger.debug(f"Zep node search failed after {max_retries} attempts: {e}") return None - + try: - # Run edge and node searches in parallel with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor: edge_future = executor.submit(search_edges) node_future = executor.submit(search_nodes) - - # Collect results 
edge_result = edge_future.result(timeout=30) node_result = node_future.result(timeout=30) - # Process edge search results all_facts = set() if edge_result and hasattr(edge_result, 'edges') and edge_result.edges: for edge in edge_result.edges: @@ -391,7 +436,6 @@ class OasisProfileGenerator: all_facts.add(edge.fact) results["facts"] = list(all_facts) - # Process node search results all_summaries = set() if node_result and hasattr(node_result, 'nodes') and node_result.nodes: for node in node_result.nodes: @@ -401,7 +445,6 @@ class OasisProfileGenerator: all_summaries.add(f"Related entity: {node.name}") results["node_summaries"] = list(all_summaries) - # Build comprehensive context context_parts = [] if results["facts"]: context_parts.append("Facts:\n" + "\n".join(f"- {f}" for f in results["facts"][:20])) @@ -415,7 +458,7 @@ class OasisProfileGenerator: logger.warning(f"Zep retrieval timed out ({entity_name})") except Exception as e: logger.warning(f"Zep retrieval failed ({entity_name}): {e}") - + return results def _build_entity_context(self, entity: EntityNode) -> str:
diff --git a/docs/superpowers/plans/2026-04-25-report-pdf-download.md b/docs/superpowers/plans/2026-04-25-report-pdf-download.md
new file mode 100644
index 00000000..96ba2edd
--- /dev/null
+++ b/docs/superpowers/plans/2026-04-25-report-pdf-download.md
@@ -0,0 +1,577 @@
+# Report PDF/MD Download Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add download of the generated report in MD and PDF formats from the frontend, with a dropdown button that appears once the report is complete.
+
+**Architecture:** The existing download endpoint `GET /api/report//download` is extended with the `?format=md|pdf` parameter. The backend converts `full_report.md` → HTML (via `markdown`) → PDF (via `PyMuPDF` / `fitz.Story`). The frontend adds a dropdown button to `Step4Report.vue` with MD and PDF options that open a direct download URL.
+
+**Tech Stack:** Python `markdown>=3.6` (new dependency), `PyMuPDF>=1.24.0` (already installed), Vue 3 SPA.
+
+---
+
+## Affected files
+
+| File | Change |
+|---|---|
+| `backend/pyproject.toml` | +`"markdown>=3.6"` in dependencies |
+| `backend/app/api/report.py` | Extend `download_report()` with `?format` + generate PDF |
+| `frontend/src/api/report.js` | +`getReportDownloadUrl(reportId, format)` |
+| `frontend/src/components/Step4Report.vue` | +MD/PDF dropdown button when `isComplete` |
+
+---
+
+## Task 1: Add the `markdown` dependency to the backend
+
+**Files:**
+- Modify: `backend/pyproject.toml`
+
+- [ ] **Step 1: Add the dependency**
+
+In `backend/pyproject.toml`, add `"markdown>=3.6"` to the `dependencies` list, right after `"PyMuPDF>=1.24.0"`:
+
+```toml
+    # 文件处理
+    "PyMuPDF>=1.24.0",
+    "markdown>=3.6",
+```
+
+- [ ] **Step 2: Install the dependency**
+
+```bash
+cd backend && uv sync
+```
+
+Expected result: `markdown` installs without errors.
+
+- [ ] **Step 3: Verify that it imports correctly**
+
+```bash
+cd backend && uv run python -c "import markdown; print(markdown.__version__)"
+```
+
+Expected result: prints a version ≥ 3.6 (e.g. `3.7`).
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/pyproject.toml backend/uv.lock
+git commit -m "chore(deps): add markdown>=3.6 for PDF generation"
+```
+
+---
+
+## Task 2: Write the test for the PDF download endpoint
+
+**Files:**
+- Test: `backend/tests/test_report_download.py`
+
+- [ ] **Step 1: Write the test**
+
+Create the file `backend/tests/test_report_download.py`:
+
+```python
+"""Tests for report download endpoint (MD and PDF formats)."""
+import io
+import os
+import json
+import tempfile
+import pytest
+from unittest.mock import patch, MagicMock
+from app import create_app
+
+
+@pytest.fixture
+def app():
+    app = create_app({'TESTING': True})
+    yield app
+
+
+@pytest.fixture
+def client(app):
+    return app.test_client()
+
+
+def _make_mock_report(report_id="report_test123", content="# Test Report\n\nHello **world**."):
+    mock = MagicMock()
+    mock.report_id = report_id
+    mock.markdown_content = content
+    return mock
+
+
+def _make_md_file(tmp_path, report_id, content):
+    md_path = os.path.join(tmp_path, f"{report_id}_full_report.md")
+    with open(md_path, 'w', encoding='utf-8') as f:
+        f.write(content)
+    return md_path
+
+
+class TestDownloadMD:
+    def test_download_md_format_param(self, client, tmp_path):
+        """?format=md returns a .md file."""
+        mock_report = _make_mock_report()
+        md_path = _make_md_file(tmp_path, mock_report.report_id, mock_report.markdown_content)
+
+        with patch('app.api.report.ReportManager.get_report', return_value=mock_report), \
+             patch('app.api.report.ReportManager._get_report_markdown_path', return_value=md_path):
+            resp = client.get(f'/api/report/{mock_report.report_id}/download?format=md')
+
+        assert resp.status_code == 200
+        assert 'attachment' in resp.headers.get('Content-Disposition', '')
+        assert '.md' in resp.headers.get('Content-Disposition', '')
+
+    def test_download_default_is_md(self, client, tmp_path):
+        """No format param defaults to md."""
+        mock_report = _make_mock_report()
+        md_path = _make_md_file(tmp_path, mock_report.report_id, mock_report.markdown_content)
+
+        with patch('app.api.report.ReportManager.get_report', return_value=mock_report), \
+             patch('app.api.report.ReportManager._get_report_markdown_path', return_value=md_path):
+            resp = client.get(f'/api/report/{mock_report.report_id}/download')
+
+        assert resp.status_code == 200
+        assert '.md' in resp.headers.get('Content-Disposition', '')
+
+
+class TestDownloadPDF:
+    def test_download_pdf_returns_pdf_bytes(self, client, tmp_path):
+        """?format=pdf returns a valid PDF file."""
+        mock_report = _make_mock_report()
+        md_path = _make_md_file(tmp_path, mock_report.report_id, mock_report.markdown_content)
+
+        with patch('app.api.report.ReportManager.get_report', return_value=mock_report), \
+             patch('app.api.report.ReportManager._get_report_markdown_path', return_value=md_path):
+            resp = client.get(f'/api/report/{mock_report.report_id}/download?format=pdf')
+
+        assert resp.status_code == 200
+        assert resp.headers.get('Content-Type', '').startswith('application/pdf')
+        assert resp.data[:4] == b'%PDF'
+
+    def test_download_pdf_report_not_found(self, client):
+        """Returns 404 when report does not exist."""
+        with patch('app.api.report.ReportManager.get_report', return_value=None):
+            resp = client.get('/api/report/nonexistent/download?format=pdf')
+
+        assert resp.status_code == 404
+
+    def test_download_pdf_invalid_format(self, client, tmp_path):
+        """Returns 400 for unknown format parameter."""
+        mock_report = _make_mock_report()
+        md_path = _make_md_file(tmp_path, mock_report.report_id, mock_report.markdown_content)
+
+        with patch('app.api.report.ReportManager.get_report', return_value=mock_report), \
+             patch('app.api.report.ReportManager._get_report_markdown_path', return_value=md_path):
+            resp = client.get(f'/api/report/{mock_report.report_id}/download?format=docx')
+
+        assert resp.status_code == 400
+```
+
+- [ ] **Step 2: Run the tests to verify that they fail**
+
+```bash
+cd backend && uv run pytest tests/test_report_download.py -v 2>&1 | head -40
+```
+
+Expected result: FAILED (the new PDF functionality does not exist yet).
+
+---
+
+## Task 3: Implement PDF generation in the backend
+
+**Files:**
+- Modify: `backend/app/api/report.py` (function `download_report`, lines 398–441)
+
+- [ ] **Step 1: Add imports at the top of `report.py`**
+
+Locate the import block in `backend/app/api/report.py` (lines 1–19) and add:
+
+```python
+import io
+import tempfile
+import markdown as md_lib
+import fitz  # PyMuPDF
+```
+
+If `import io` already exists, do not duplicate it. Add the missing ones right after `import traceback`.
+
+- [ ] **Step 2: Add the helper function `_generate_pdf_bytes`**
+
+Add this function just **before** `@report_bp.route('//download', ...)` (line 398):
+
+```python
+def _generate_pdf_bytes(markdown_content: str) -> bytes:
+    """Convert Markdown string to PDF bytes using PyMuPDF (fitz.Story)."""
+    html_body = md_lib.markdown(
+        markdown_content,
+        extensions=['tables', 'fenced_code']
+    )
+    html = f"""
+
+
+
+
+
+{html_body}
+"""
+
+    story = fitz.Story(html)
+    buf = io.BytesIO()
+    writer = fitz.DocumentWriter(buf)
+    mediabox = fitz.paper_rect("a4")
+    where = mediabox + (36, 36, -36, -36)  # margins
+    more = True
+    while more:
+        device = writer.begin_page(mediabox)
+        more, _ = story.place(where)
+        story.draw(device)
+        writer.end_page()
+    writer.close()
+    return buf.getvalue()
+```
+
+- [ ] **Step 3: Replace the entire `download_report` function**
+
+Replace the whole body of `download_report` (lines 398–441) with:
+
+```python
+@report_bp.route('//download', methods=['GET'])
+def download_report(report_id: str):
+    """
+    Download report in the requested format.
+
+    Query params:
+        format: 'md' (default) | 'pdf'
+    """
+    try:
+        fmt = request.args.get('format', 'md').lower()
+        if fmt not in ('md', 'pdf'):
+            return jsonify({
+                "success": False,
+                "error": f"Unsupported format '{fmt}'. Use 'md' or 'pdf'."
+ }), 400 + + report = ReportManager.get_report(report_id) + if not report: + return jsonify({ + "success": False, + "error": t('api.reportNotFound', id=report_id) + }), 404 + + md_path = ReportManager._get_report_markdown_path(report_id) + if os.path.exists(md_path): + with open(md_path, 'r', encoding='utf-8') as f: + markdown_content = f.read() + else: + markdown_content = report.markdown_content + + if fmt == 'md': + if os.path.exists(md_path): + return send_file( + md_path, + as_attachment=True, + download_name=f"{report_id}.md" + ) + with tempfile.NamedTemporaryFile(mode='w', suffix='.md', + delete=False, encoding='utf-8') as f: + f.write(markdown_content) + temp_path = f.name + return send_file(temp_path, as_attachment=True, + download_name=f"{report_id}.md") + + # fmt == 'pdf' + pdf_bytes = _generate_pdf_bytes(markdown_content) + return send_file( + io.BytesIO(pdf_bytes), + mimetype='application/pdf', + as_attachment=True, + download_name=f"{report_id}.pdf" + ) + + except Exception as e: + logger.error(f"Failed to download report: {str(e)}") + return jsonify({ + "success": False, + "error": str(e), + "traceback": traceback.format_exc() + }), 500 +``` + +- [ ] **Step 4: Run the tests** + +```bash +cd backend && uv run pytest tests/test_report_download.py -v +``` + +Expected result: all tests PASS. + +- [ ] **Step 5: Commit** + +```bash +git add backend/app/api/report.py backend/tests/test_report_download.py +git commit -m "feat(report): add PDF download endpoint via PyMuPDF" +``` + +--- + +## Task 4: Add a helper to the frontend API + +**Files:** +- Modify: `frontend/src/api/report.js` + +- [ ] **Step 1: Add `getReportDownloadUrl` at the end of `report.js`** + +Append the following to `frontend/src/api/report.js`: + +```javascript +/** + * Build the direct download URL for a report. 
+ * @param {string} reportId + * @param {'md'|'pdf'} format + * @returns {string} URL to use as the download href (absolute when VITE_API_BASE_URL is set) + */ +export const getReportDownloadUrl = (reportId, format = 'md') => { + const base = import.meta.env.VITE_API_BASE_URL || '' + return `${base}/api/report/${reportId}/download?format=${format}` +} +``` + +- [ ] **Step 2: Verify that the frontend builds without errors** + +```bash +cd frontend && npm run build 2>&1 | tail -20 +``` + +Expected result: the build completes without errors. + +- [ ] **Step 3: Commit** + +```bash +git add frontend/src/api/report.js +git commit -m "feat(report): add getReportDownloadUrl helper" +``` + +--- + +## Task 5: Add an MD/PDF dropdown button to the Vue component + +**Files:** +- Modify: `frontend/src/components/Step4Report.vue` + +The changes fall into three parts: (a) import the function, (b) add the template, (c) add the styles. + +- [ ] **Step 1: Add the import for `getReportDownloadUrl`** + +In `Step4Report.vue`, locate the line that imports from `report.js`. Look for something like: + +```javascript +import { getReport, ... } from '../api/report' +``` + +Add `getReportDownloadUrl` to that import list. If nothing is imported from `report.js` yet, add: + +```javascript +import { getReportDownloadUrl } from '../api/report' +``` + +- [ ] **Step 2: Add the reactive state for the menu** + +Locate the line `const isComplete = ref(false)` (around line 427) and add just below it: + +```javascript +const showDownloadMenu = ref(false) +``` + +- [ ] **Step 3: Add the function that closes the menu on an outside click** + +Locate the function `goToInteraction` (around line 410) and add just above it: + +```javascript +const closeDownloadMenu = () => { showDownloadMenu.value = false } +``` + +- [ ] **Step 4: Add the dropdown button to the template** + +Locate the "next step" button block in the template (around line 130): + +```html + + + + +``` + +**Note:** Vue 3 has no built-in `v-click-outside` directive. 
If the project does not provide this directive, remove `v-click-outside="closeDownloadMenu"` and add to the `