Summary
The current insight pipeline and NLP heuristics are optimized for English text. Users reading in other languages may get lower-quality or missing insights. This issue tracks improving multi-language support.
Areas to Investigate
- Review services/insights.py for English-specific assumptions (e.g., keyword lists, regex patterns)
- Check how the sentence-transformers embeddings handle non-English text; some models are multilingual by default
- Verify that services/chat.py works with mixed-language libraries
Considerations
- all-MiniLM-L6-v2 (the default model) has limited multilingual support; paraphrase-multilingual-MiniLM-L12-v2 may be a better default, at the cost of a larger model (12 layers vs. 6) and re-embedding existing content, since vectors from different models are not comparable
- Insight classification heuristics may need language-aware variants
- This could be scoped incrementally: start with language detection and graceful fallback, then add full support
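A minimal sketch of the "detection and graceful fallback" first step, assuming a crude stopword-ratio check stands in for real language identification. Only the two model names come from this issue. ENGLISH_STOPWORDS, INSIGHT_KEYWORDS, the 0.15 threshold, and all function names are hypothetical and do not reflect what services/insights.py currently does. The sketch keeps the English default model only when every text looks English. It also disables English keyword heuristics for text it cannot identify, so embedding-only classification can take over.

```python
import re

# Model names come from the issue; everything else here is a hypothetical sketch.
DEFAULT_MODEL = "all-MiniLM-L6-v2"
MULTILINGUAL_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"

# Hypothetical: a small set of common English function words used as a cheap
# language signal. Words that are also frequent in other European languages
# (e.g. "a", "on", "was", "in") are left out to reduce false positives.
ENGLISH_STOPWORDS = frozenset({
    "the", "and", "of", "is", "it", "that", "this", "with",
    "are", "be", "not", "but", "have", "you", "which", "from",
})

# Hypothetical per-language keyword lists for insight classification.
# Only English exists here; other languages would be added incrementally.
INSIGHT_KEYWORDS = {
    "en": ("in summary", "the key point", "importantly"),
}

# Runs of Unicode letters (no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def english_ratio(text: str) -> float:
    """Fraction of words in `text` that are common English function words."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in ENGLISH_STOPWORDS)
    return hits / len(words)


def detect_language(text: str, threshold: float = 0.15) -> str:
    """Return "en" if the text looks English, else "und" (undetermined)."""
    return "en" if english_ratio(text) >= threshold else "und"


def choose_model(texts: list[str]) -> str:
    """Keep the English default only if every text looks English."""
    if all(detect_language(t) == "en" for t in texts):
        return DEFAULT_MODEL
    return MULTILINGUAL_MODEL


def insight_keywords(text: str) -> tuple[str, ...]:
    """Keywords for the text's language. An empty tuple means: skip keyword
    heuristics and fall back to embedding-only classification."""
    return INSIGHT_KEYWORDS.get(detect_language(text), ())


if __name__ == "__main__":
    english = "This is the key point of the chapter."
    french = "Le chat est assis sur le tapis."
    print(choose_model([english]))          # all-MiniLM-L6-v2
    print(choose_model([english, french]))  # paraphrase-multilingual-MiniLM-L12-v2
    print(insight_keywords(french))         # ()
```

The chosen name could then be passed to sentence_transformers.SentenceTransformer to load the model. A stopword ratio is crude: short texts and closely related languages will be misclassified. A dedicated language-identification library would likely be a better detector, while the surrounding fallback logic could stay the same.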