Translate the system prompt and the individual / group persona prompt
builders in backend/app/services/oasis_profile_generator.py from
Chinese to English. The base prompt language was biasing persona
prose (bio, persona, profession, interested_topics) toward Chinese
even under Accept-Language: en, despite the existing
get_language_instruction() postfix mechanism. Translating the base
prompts removes that bias.
All locale-steering call sites are preserved verbatim (the inline
{get_language_instruction()} in each builder, the system-prompt
assembly), so non-English locales continue to receive Chinese output
of equivalent quality. Locale-independent constraints stay English
inside the prompt: gender stays the literal "male"/"female" enum
for individuals and "other" for groups; age stays an integer (30
for institutional accounts). The two attrs_str / context_str fallback
defaults ("无", "无额外上下文") are translated to "None" /
"No additional context" so they compose with the English body.
The country-language hint country: 国家(使用中文,如"中国") is
dropped during translation; locale now decides the country language
via the postfix.
Out of scope (untouched): logger calls (issue #6, already merged),
docstrings and comments (issue #7), the rule-based fallback
_generate_profile_rule_based, and the resilience helpers
_fix_truncated_json / _try_fix_json. No public API change, no new
dependencies, no edits outside the target file.
Closes#3
Replace the last hard-coded Chinese log/print strings in the Flask
graph API, OASIS profile generator, and retry utility with calls to
the existing t() helper, completing the backend i18n coverage started
by ticket #6 so EN-locale operators see English logs end to end.
Adds nine entries to locales/{en,zh}.json: log.graph_api.m027-m029,
log.profile_generator.m024-m025, and a new log.retry.m001-m004
sub-namespace for the retry utility.
Closes#24
Replace the silent placeholder-UUID fallback in
_GraphNamespace.add_batch with logger.exception(...) + raise so
embedder misconfiguration (404 unknown model, connection refused, etc.)
fails the surrounding graph-build Task with a visible error instead of
producing a Task that looks completed while the graph stays empty.
Document the existing-but-undocumented Ollama embedder configuration
in .env.example, CLAUDE.md, README.md, and docker-compose.yml.
mxbai-embed-large is the recommended local model because its 1024-dim
output matches Graphiti's default EMBEDDING_DIM. Adds a curl smoke
test to verify embedder reachability before the first graph build.
No new env var or provider literal: Ollama is reached through the
existing openai-provider branch by setting EMBEDDING_BASE_URL,
EMBEDDING_API_KEY, and EMBEDDING_MODEL.
Closes#18
Replace the chinese tagline on README.md and README-EN.md with the
existing english subtitle (collapsing the duplicate stack), and switch
the package.json and backend/pyproject.toml description fields to
english so the project's metadata surface no longer surprises
non-chinese readers.
Rename nine chinese-named static image files under static/image/ to
ASCII slugs (six screenshots, two video covers, the QQ-group image)
via git mv so rename history is preserved, and update every <img src>
in README.md, README-EN.md, and README-ZH.md to the new paths. The
chinese body text of README-ZH.md is preserved by design.
A ripgrep scan for chinese characters in README.md and README-EN.md
(excluding the language-switcher line) now returns zero matches,
satisfying the ticket's acceptance criteria.
Closes#12
Extract every Chinese string inside backend logger.{info,warning,error,
debug,exception} calls and inside user-facing jsonify({"error|message":
...}) responses across the listed in-scope modules into
locales/{en,zh}.json under nested namespaces (log.<module>.*,
api.{error,message}.<scope>.*). Locale dictionaries stay structurally
identical; the existing flat frontend-facing keys at log.* / api.* are
left untouched. The locale helper (backend/app/utils/locale.py) now
emits a single deduplicated mirofish.locale warning per (locale, key)
pair when a translation is missing instead of silently returning the
raw key, so unknown keys are visible without crashing requests or
background tasks. A repo-root scripts/check_i18n_logs.py verifier
performs an AST-aware source scan for residual Chinese inside the
in-scope logger/jsonify calls and a recursive parity diff between
en.json and zh.json — both modes pass.
Why: backend logs and API errors previously emitted Chinese-only
strings, leaving English-speaking operators with unreadable log
aggregator output and API consumers with locale-mismatched error
messages. The t() helper and per-thread set_locale propagation already
existed; this change makes every backend caller route through them.
Closes#6
translate every llm-facing string-literal in
backend/app/services/report_agent.py — the four tool-description
constants, the plan/section/chat system+user prompts, the react loop
templates, the inline messages re-injected during the section and chat
loops, the _execute_tool error returns, and the plan_outline default and
fallback outline content. preserve every {interpolation} token, the
literal final answer: trigger and <tool_call> xml tag, the four primary
tool names, and all three get_language_instruction() postfix call sites.
also switch the unused-tools join separator from "、" to ", " so it
renders naturally inside the now-english react templates.
removes the chinese language bias that the english postfix alone could
not overcome — under accept-language: en the report agent now produces
english-flavoured analytical reports and chat replies; under
accept-language: zh the postfix continues to steer the model into
chinese with no semantic delta. logger calls (#6) and docstrings or
comments (#7) are deliberately untouched.
Closes#5
translate the three llm prompt blocks plus the two prompt-feeding helpers
(_build_context, _summarize_entities) in
backend/app/services/simulation_config_generator.py from chinese to english.
the chinese base prompts were biasing the model toward chinese structure
and lexical choice for content, narrative_direction, hot_topics, and
reasoning fields even when accept-language was en, because
get_language_instruction() only steers the response language as a
postfix.
translation is in-place and preserves every functional contract: the
json output schema for all three prompts, every variable interpolation,
the per-entity-type heuristic ranges in the agent-config prompt, the
trailing english IMPORTANT directives that lock poster_type to
PascalCase and stance to {supportive,opposing,neutral,observer}, and
all three get_language_instruction() postfix call sites. the two
default-path reasoning literals are translated to locale-agnostic
english so generation_reasoning no longer mixes chinese and english on
the failure path.
logger calls, docstrings, and inline comments are intentionally left
chinese (out of scope; covered by issues #6 and #7). public api,
dataclasses, class constants, and the SimulationParameters payload
shape are unchanged.
Closes#4
translate the system prompt constant and the user-message template in
backend/app/services/ontology_generator.py from chinese to english.
the chinese base prompt was biasing the model toward chinese structure
and word choice even when accept-language was en, leaving ontology
descriptions and analysis_summary fields chinese-flavoured.
translation is in-place and preserves every functional contract: the
json output schema, the entity-type and relationship-type taxonomies
verbatim, the reserved-attribute-name list, the count and length
constraints, and all f-string interpolations. the
get_language_instruction() postfix call site and the trailing english
identifier-format directive are unchanged, so zh and other locales
continue to receive locale-appropriate descriptions.
logger calls, docstrings, and inline comments are intentionally left
in chinese — they are owned by issues #6 and #7.
a small static guard script (backend/scripts/test_ontology_prompts_no_cjk.py)
ast-parses the module and asserts zero cjk in the system prompt and in
every string literal of _build_user_message except the docstring, so
the regression cannot reappear silently.
Closes#2
Adds a Neo4j service to docker-compose so `docker compose up -d` works
on a clean checkout, and unhardcodes Graphiti's LLM/embedder so the
documented default provider (Qwen via Dashscope) actually works.
- docker-compose: neo4j:5-community service with cypher-shell
healthcheck, named volumes, and `depends_on: service_healthy` on the
app container; in-Docker NEO4J_URI override leaves the host-mode
default untouched.
- Config: new GRAPHITI_LLM_PROVIDER (openai|gemini, default openai) plus
optional EMBEDDING_API_KEY / EMBEDDING_BASE_URL that fall back to the
chat LLM credentials.
- graphiti_adapter: provider switch inside the singleton factory with
lazy per-provider imports; Gemini path is preserved exactly. The
no-op `_GeminiReranker` becomes a provider-agnostic
`_PassthroughReranker`, still injected explicitly so Graphiti does
not fall back to its OpenAI-only default reranker.
- Drop the ignored `reranker=` kwarg from `_GraphNamespace.search` and
the misleading callers in `zep_tools.py` and
`oasis_profile_generator.py`.
- Refresh `.env.example` to mirror the README env section.
Spec, requirements, and design under
`.kiro/specs/graphiti-neo4j-finalize/`.
Closes#1
Background threads (graph building, simulation prep, report generation,
profile generation) now inherit the requesting user's locale preference.
Previously these fell back to 'zh' because Flask request context was
unavailable in spawned threads.
Ensure poster_type stays PascalCase English and stance stays English enum
values regardless of language setting. Only natural language fields follow
the user's language preference.
The language instruction was causing LLM to change entity/relation naming
conventions. Now explicitly enforce PascalCase/UPPER_SNAKE_CASE for technical
identifiers while only applying language preference to description fields.
- Changed error messages to reflect the new configuration requirement for Neo4j.
- Ensured consistent handling of missing credentials across multiple functions.
Replaces the paid, rate-limited Zep Cloud service with Graphiti (graphiti-core
0.11.6) backed by a local Neo4j instance — free, unlimited, and self-hosted.
Key changes:
- Add GraphitiAdapter: drop-in Zep-compatible wrapper around Graphiti with a
persistent event-loop thread to avoid asyncio/Neo4j driver conflicts
- Switch LLM client to native GeminiClient + GeminiEmbedder (text-embedding-004
fails on Gemini compat endpoint; use google-genai SDK directly)
- Add _GeminiReranker passthrough replacing OpenAIRerankerClient (which
hardcodes gpt-4.1-nano and uses logprobs unsupported by Gemini)
- Fix Cypher queries: use s.uuid/t.uuid for edge source/target instead of
r.source_node_uuid (null property in Graphiti's schema)
- Add ontology-based entity type classifier (_classify_entity_type) so nodes
get colored by type in the D3 graph visualization instead of all being Entity
- Apply classifier in ZepEntityReader so filter_defined_entities finds entities
(previously 0 personas loaded because all labels were ['Entity'])
- Add startup recovery: auto-mark graph_building projects as graph_completed
on backend restart if Neo4j already has their data
- Add resume capability to graph build: skip already-processed episodes after
a restart (断点续传)
- Add non-blocking graph data cache with background refresh in graph.py
- Add EMBEDDING_MODEL config (default: gemini-embedding-001 for Gemini users)
- Add CLAUDE.md with project architecture and dev commands
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>