Real LLMs (observed with anthropic/claude-haiku-4-5 on a 23-agent run)
sometimes return Likert values as JSON strings ("3" rather than 3). The four
subagent validators rejected these with an isinstance(v, int) check, which
dropped ~30% of agents at N=23. Added a shared coerce_int helper in base.py
that accepts ints and numeric strings, rejects bools, floats, and garbage,
and is now used by:
- Longitudinal: response values 1-5
- Diversity: Q-sort placements -3..+3 and 6 Likert axes 1-7
- Delphi: R2 and R3 importance/plausibility 1-5
- Scenario: 4 dimensions 1-7
Validators now coerce in place so downstream code sees ints regardless of
the wire format. Added 8 tests (4 unit on coerce_int + 4 per-subagent
contract tests showing stringified values are accepted).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
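
The coercion rule described in the commit above (accept ints and numeric strings, reject bools, floats, and garbage) could look roughly like the sketch below. The signature of coerce_int, the regular expression, and the validate_likert wrapper are assumptions made for illustration; the real helper in base.py may differ.

```python
import re
from typing import Any, Dict

# Optional sign followed by ASCII digits only, e.g. "3", "-2", "+1".
_INT_RE = re.compile(r"[+-]?[0-9]+")


def coerce_int(value: Any) -> int:
    """Return value as an int, or raise ValueError (hypothetical signature)."""
    # bool is a subclass of int, so it has to be rejected before the int check.
    if isinstance(value, bool):
        raise ValueError(f"bool is not accepted: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        if _INT_RE.fullmatch(text):
            return int(text)
    # Floats, None, "3.5", "abc", and empty strings all end up here.
    raise ValueError(f"not an integer: {value!r}")


def validate_likert(responses: Dict[str, Any], lo: int = 1, hi: int = 5) -> Dict[str, Any]:
    """Coerce every response in place and check the range (illustrative only)."""
    for key, raw in responses.items():
        number = coerce_int(raw)
        if not lo <= number <= hi:
            raise ValueError(f"{key}={number} is outside {lo}..{hi}")
        # Overwriting an existing key while iterating is safe; the dict size is unchanged.
        responses[key] = number
    return responses
```

Coercing in place means callers keep working with the same dict and never see the string form, which matches the "downstream code sees ints" goal stated above.
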
Adds a SchemaValidationFailure exception that carries the raw output of both
retry attempts, so audit.jsonl preserves what the model actually said when an
agent's response can't be coerced into the instrument schema. This lets us
diagnose persona-vs-format failures without re-running. Two new tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
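
A minimal sketch of an exception that keeps the raw model output for auditing, as the commit above describes. The attribute names (agent_id, raw_attempts, reason) and the shape of the audit record are assumptions; only the exception name and the audit.jsonl target come from the commit message.

```python
import json
from typing import List


class SchemaValidationFailure(Exception):
    """Raised when no retry attempt could be coerced into the schema."""

    def __init__(self, agent_id: int, raw_attempts: List[str], reason: str):
        super().__init__(f"agent {agent_id}: {reason}")
        self.agent_id = agent_id
        # Copy the list so later mutation by the caller cannot alter the record.
        self.raw_attempts = list(raw_attempts)
        self.reason = reason

    def to_audit_record(self) -> dict:
        """Build a JSON-serialisable record (hypothetical field names)."""
        return {
            "event": "schema_validation_failure",
            "agent_id": self.agent_id,
            "reason": self.reason,
            "raw_attempts": self.raw_attempts,
        }


def audit_line(exc: SchemaValidationFailure) -> str:
    """Serialise one failure as a single JSON Lines entry."""
    return json.dumps(exc.to_audit_record(), ensure_ascii=False)
```

Because the raw strings travel with the exception, whatever code catches it can append audit_line(exc) to the log without having to reach back into the retry loop.
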
Five tightly coupled fixes for bugs that were causing the interview subsystem
to degrade silently in production:
- C1+C2: `_build_orchestrator` now resolves `graph_id` from
  `SimulationManager().get_simulation(sim_id).graph_id` (the real persisted
  state) instead of a `graph_id.txt` that nothing in the codebase writes.
  `ZepGraphMemoryUpdater(graph_id=...)` is now called with the correct
  positional argument; the bare `try/except Exception` that was swallowing the
  TypeError is replaced with a narrow fallback that logs explicitly.
- C3: `SimulationManager._on_ready_hooks` / `_on_completed_hooks` are now
  class-level (mirroring `SimulationRunner._on_completed_callbacks`).
  Hooks registered at app startup now survive across the per-request
  `SimulationManager()` instances created by the Flask API, so the T0
  longitudinal auto-survey actually fires.
- C4: `ZepGraphMemoryUpdater` gains an explicit `add_text_episode(graph_id, text)`
  method for synchronous text writes. `InterviewZepWriter._emit` no longer
  silently falls back to a dict-shaped `add_activity` call that the real
  implementation rejects (its `add_activity` requires an `AgentActivity`
  dataclass).
- C5: `FileSystemPersonaProvider.agent_to_entity()` builds an
  `{agent_id: zep_entity_uuid}` map from the persisted profile files; the map
  is now passed to `ZepMemoryProvider` so `get_entity_with_context` is called
  with real Zep UUIDs instead of `str(agent_id)`. To make this work,
  `OasisProfileGenerator._save_reddit_json` and `_save_twitter_csv` now persist
  `source_entity_uuid` (Reddit JSON: optional field; Twitter CSV: appended
  column).
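
The C3 fix above hinges on the difference between instance-level and class-level hook lists. The sketch below shows that difference with two stand-in classes; neither is the real `SimulationManager`, and the method names `register_on_ready` and `notify_ready` are assumptions.

```python
from typing import Callable, List


class InstanceHooksManager:
    """Stand-in for the old behaviour: every instance starts with no hooks."""

    def __init__(self) -> None:
        self._on_ready_hooks: List[Callable[[str], None]] = []

    def register_on_ready(self, hook: Callable[[str], None]) -> None:
        self._on_ready_hooks.append(hook)

    def notify_ready(self, sim_id: str) -> None:
        for hook in self._on_ready_hooks:
            hook(sim_id)


class ClassHooksManager:
    """Stand-in for the fixed behaviour: one list shared by all instances."""

    _on_ready_hooks: List[Callable[[str], None]] = []

    @classmethod
    def register_on_ready(cls, hook: Callable[[str], None]) -> None:
        cls._on_ready_hooks.append(hook)

    def notify_ready(self, sim_id: str) -> None:
        # Read from the class so a fresh per-request instance still sees the hooks.
        for hook in type(self)._on_ready_hooks:
            hook(sim_id)
```

With the instance-level variant, a hook registered on the startup instance is invisible to the instance a request handler creates later, which is the silent failure described in C3.
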
Tests: 51 unit + 2 integration pass (was 40 + 2). New tests lock in each fix:
- `test_hooks_survive_across_instances` (C3)
- `test_build_orchestrator_reads_graph_id_from_state` (C1+C2+C5)
- `test_build_orchestrator_falls_back_when_state_missing` (C1+C2)
- `test_emit_uses_add_text_episode_with_graph_id`,
  `test_emit_raises_when_updater_lacks_add_text_episode`,
  `test_real_updater_exposes_add_text_episode` (C4)
- `test_agent_to_entity_from_reddit_json`,
  `test_agent_to_entity_empty_when_no_field`,
  `test_agent_to_entity_falls_back_to_twitter_csv`,
  `test_agent_to_entity_reddit_takes_precedence` (C5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
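
The `{agent_id: zep_entity_uuid}` map from C5 in the commit above could be built along these lines. The file names (`reddit_profiles.json`, `twitter_profiles.csv`), the `user_id` key, and the exact precedence rule are assumptions; only `source_entity_uuid` and the Reddit-first, CSV-fallback behaviour are taken from the commit message and test names.

```python
import csv
import json
from pathlib import Path
from typing import Dict


def agent_to_entity(profile_dir: Path) -> Dict[int, str]:
    """Map agent ids to Zep entity UUIDs from persisted profile files."""
    reddit_path = profile_dir / "reddit_profiles.json"   # hypothetical name
    twitter_path = profile_dir / "twitter_profiles.csv"  # hypothetical name
    mapping: Dict[int, str] = {}

    if reddit_path.exists():
        profiles = json.loads(reddit_path.read_text(encoding="utf-8"))
        for profile in profiles:
            # The field is optional, so older profile files simply yield nothing.
            entity_uuid = profile.get("source_entity_uuid")
            if entity_uuid:
                mapping[int(profile["user_id"])] = entity_uuid

    # Fall back to the CSV only when the Reddit JSON produced no entries.
    if not mapping and twitter_path.exists():
        with twitter_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                entity_uuid = row.get("source_entity_uuid")
                if entity_uuid:
                    mapping[int(row["user_id"])] = entity_uuid

    return mapping
```

An empty result is a valid outcome here: it signals that the profiles predate the `source_entity_uuid` field, and the caller can decide how to handle lookups without a UUID.
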
- Add backend/app/services/interviews/lifecycle.py with install_hooks() that
  registers on_ready (pre-survey) and on_completed (post-survey + synthesis)
  daemon-thread callbacks on a SimulationManager.
- Add SimulationRunner.register_on_completed() / _fire_on_completed() so
  external callbacks can be notified when _monitor_simulation transitions to
  COMPLETED (both exit-code-0 path and simulation_end event path).
- Wire both in app/__init__.py: create singleton SimulationManager, install
  lifecycle hooks, and register its _notify_on_completed with SimulationRunner.
- Add test_lifecycle.py: verifies install_hooks registers one callable for each
  of ready and completed.
- All 40 unit tests + 2 integration tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
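
As a rough illustration of the install_hooks() idea in the commit above, the sketch below registers one daemon-thread callback for each of the two lifecycle events. HookRegistry is a stand-in for the real SimulationManager, and the parameter names and the returned Thread object are assumptions made so the example can be exercised.

```python
import threading
from typing import Callable, List


class HookRegistry:
    """Stand-in for a manager that exposes ready/completed hook lists."""

    def __init__(self) -> None:
        self.on_ready: List[Callable[[str], threading.Thread]] = []
        self.on_completed: List[Callable[[str], threading.Thread]] = []


def _in_daemon_thread(work: Callable[[str], None]) -> Callable[[str], threading.Thread]:
    """Wrap work so each hook invocation runs in its own daemon thread."""

    def hook(sim_id: str) -> threading.Thread:
        thread = threading.Thread(target=work, args=(sim_id,), daemon=True)
        thread.start()
        # Returning the thread lets tests join it; callers may ignore it.
        return thread

    return hook


def install_hooks(
    registry: HookRegistry,
    pre_survey: Callable[[str], None],
    post_survey: Callable[[str], None],
) -> None:
    """Register exactly one callable for each lifecycle event."""
    registry.on_ready.append(_in_daemon_thread(pre_survey))
    registry.on_completed.append(_in_daemon_thread(post_survey))
```

Daemon threads keep a slow survey from blocking the code path that fires the hook, at the cost that unfinished work is dropped if the process exits.
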
Background threads (graph building, simulation prep, report generation,
profile generation) now inherit the requesting user's locale preference.
Previously these fell back to 'zh' because the Flask request context is
unavailable in spawned threads.
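
One way to carry a per-request locale into a background thread is to copy the caller's context when the thread is created. The sketch below uses the standard-library contextvars module rather than Flask, so it is an analogy for the fix above and not the project's actual code; get_locale, set_locale, and spawn_with_locale are hypothetical names.

```python
import threading
from contextvars import ContextVar, copy_context
from typing import Any, Callable

# The default mirrors the 'zh' fallback mentioned above.
_current_locale = ContextVar("current_locale", default="zh")


def set_locale(locale: str) -> None:
    """Record the locale for the current context (e.g. the request handler)."""
    _current_locale.set(locale)


def get_locale() -> str:
    """Return the locale visible in the current context."""
    return _current_locale.get()


def spawn_with_locale(target: Callable[..., Any], *args: Any) -> threading.Thread:
    """Start a daemon thread that runs target inside a copy of the caller's context."""
    # copy_context() snapshots the values at spawn time, so later changes in the
    # request thread do not leak into the worker.
    context = copy_context()
    thread = threading.Thread(target=context.run, args=(target, *args), daemon=True)
    thread.start()
    return thread
```

An equivalent and simpler option is to read the locale in the request handler and pass it to the worker as an explicit argument; the context copy is mainly useful when the locale is looked up deep inside shared helpers.
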
Ensure poster_type stays PascalCase English and stance stays English enum
values regardless of language setting. Only natural language fields follow
the user's language preference.
The language instruction was causing the LLM to change entity/relation naming
conventions. The prompt now explicitly enforces PascalCase/UPPER_SNAKE_CASE for
technical identifiers and applies the language preference only to description
fields.
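
The commit above enforces the naming convention through the prompt. A post-hoc check such as the following could additionally catch cases where the model drifts anyway; it is not part of the commit, and the function name, the regular expressions, and the assumption that entity types are PascalCase while relation types are UPPER_SNAKE_CASE are all illustrative.

```python
import re
from typing import Iterable, List

# ASCII-only patterns, so translated names such as "媒体机构" are flagged.
_PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")
_UPPER_SNAKE_CASE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def find_naming_violations(
    entity_types: Iterable[str], relation_types: Iterable[str]
) -> List[str]:
    """Return every identifier that breaks the assumed convention, in input order."""
    violations: List[str] = []
    for name in entity_types:
        if not _PASCAL_CASE.fullmatch(name):
            violations.append(name)
    for name in relation_types:
        if not _UPPER_SNAKE_CASE.fullmatch(name):
            violations.append(name)
    return violations
```

The PascalCase pattern is deliberately lenient (it accepts acronyms like "NGO"), since its main job is to reject translated or snake_case names rather than to police capitalisation.
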
- Decreased the maximum tool calls per section from 8 to 5.
- Reduced the maximum iterations of the ReAct loop from 8 to 5 to streamline report generation.
- Reduced maximum tool calls per chat from 5 to 2 for improved efficiency.
- Simplified system prompt to focus on concise responses and report content.
- Implemented report content retrieval with a length limit to prevent context overflow.
- Limited tool execution to one call per iteration to keep responses clear.
- Updated user message prompts to encourage concise answers based on retrieved data.
- Increased the maximum tool calls per section from 4 to 8, enhancing the agent's capabilities.
- Raised the maximum reflection rounds from 2 to 3 to allow for deeper analysis.
- Adjusted the maximum tool calls per chat from 3 to 5 for improved interaction.
- Expanded the maximum agents for interviews from 5 to 20, facilitating more comprehensive data gathering.
- Increased the maximum iterations for ReAct loops from 5 to 8 and from 3 to 5 in different contexts, optimizing the report generation process.
- Updated the `to_text` method in the `PanoramaResult` class to provide complete outputs for current facts, historical facts, and involved entities, improving data visibility.
- Modified the `to_text` method in the `AgentInterview` class to display the full agent bio without truncation.
- Adjusted the `ZepToolsService` class to retrieve all related entity details and facts without limiting the output, ensuring comprehensive data representation.
- Renamed log_section_complete to log_section_content to better reflect its purpose, and added is_subsection parameter for improved logging of subsection content.
- Introduced log_section_full_complete method to log the completion of entire sections, including all subsections, enhancing tracking of report generation status.
- Adjusted maximum tool call limits for sections and chats to optimize performance during report generation.
- Updated system prompts and user prompts in the ReportAgent class to clarify the report's focus on future predictions rather than current analysis.
- Enhanced the Step3Simulation and Step4Report components for improved user experience, including UI updates and better handling of report generation states.
- Updated the AgentInterview class to display the full agent bio, truncating only if it exceeds 1000 characters for better readability.
- Enhanced the Step4Report component to include structured display for tool results, allowing users to toggle between raw and structured views for various tools, improving user experience and clarity.
- Introduced new components for parsing and displaying results from different tools, including InsightForge, PanoramaSearch, InterviewAgents, and QuickSearch, providing a comprehensive view of the data.
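
Several entries above cap text length (report content retrieval, agent bios longer than 1000 characters). A small helper along the following lines would cover both cases; truncate_text and its marker parameter are hypothetical, and the real limits live in the respective classes.

```python
def truncate_text(text: str, max_chars: int, marker: str = "...") -> str:
    """Return text unchanged if it fits, else cut it to max_chars including marker."""
    if len(text) <= max_chars:
        return text
    # Reserve room for the marker so the result never exceeds max_chars.
    keep = max(max_chars - len(marker), 0)
    return (text[:keep] + marker)[:max_chars]
```

For example, truncate_text(bio, 1000) leaves short bios untouched and shortens longer ones to exactly 1000 characters ending in the marker.
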