Feature: Support non-English annotations in AI insights #3

@texasbe2trill

Summary

The current insight pipeline and NLP heuristics are optimized for English text. Users whose annotations are in other languages may get lower-quality insights or none at all. This issue tracks improving multi-language support.

Areas to Investigate

  • Audit services/insights.py for English-specific assumptions (e.g., keyword lists, regex patterns)
  • Test the embedding model (sentence-transformers) with non-English text; some models are trained on multilingual data, while others are trained mainly on English
  • Ensure the chat system prompt in services/chat.py works with mixed-language libraries
  • Add sample non-English annotations to test fixtures
  • Document supported languages in the README
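One way to approach the embedding test and the fixture item above is to embed translation pairs and compare their cosine similarity: a model that handles a language well should place an annotation and its translation close together. The sketch below is only an illustration. `PARALLEL_ANNOTATIONS`, `cosine`, and `translation_similarities` are hypothetical names, the sample sentences are made up, and the embedding function is passed in so the check does not assume how services/insights.py loads its model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical fixture: (English annotation, translation) pairs.
PARALLEL_ANNOTATIONS: List[Tuple[str, str]] = [
    ("Memory is reconstructed, not replayed.",
     "La memoria se reconstruye, no se reproduce."),
    ("Habits form through repetition.",
     "Gewohnheiten entstehen durch Wiederholung."),
]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def translation_similarities(
    embed: Callable[[str], Vector],
    pairs: Sequence[Tuple[str, str]],
) -> List[float]:
    """Embed both sides of each pair and return one similarity per pair."""
    return [cosine(embed(source), embed(target)) for source, target in pairs]


# Possible usage with sentence-transformers (assumes the package is installed):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
#   scores = translation_similarities(model.encode, PARALLEL_ANNOTATIONS)
```

Running the same pairs through both candidate models and comparing the scores side by side would give a rough, language-by-language picture before changing any default.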

Considerations

  • all-MiniLM-L6-v2 (the default model) is trained mainly on English data, so its multilingual support is limited; paraphrase-multilingual-MiniLM-L12-v2 may be a better default
  • Insight classification heuristics may need language-aware variants
  • This could be scoped incrementally — start with detection and graceful fallback, then add full support
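The "detection and graceful fallback" first step could look roughly like the sketch below. It is a minimal illustration, not the project's code: `looks_english`, `build_insight`, the `Insight` fields, the stopword list, and the thresholds are all assumptions. The detector is a crude stopword-ratio heuristic using only the standard library; a dedicated language-identification library would likely be more accurate.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Small illustrative stopword list (an assumption, not from the project).
ENGLISH_STOPWORDS = frozenset({
    "the", "and", "of", "to", "is", "that", "it", "for", "with",
    "as", "was", "on", "are", "this", "not", "be",
})

# Runs of Unicode letters (no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def looks_english(text: str, min_ratio: float = 0.2, min_words: int = 4) -> bool:
    """Return True only when enough words are common English stopwords.

    Short or ambiguous text returns False, so it takes the fallback path.
    """
    words = [w.lower() for w in WORD_RE.findall(text)]
    if len(words) < min_words:
        return False
    hits = sum(1 for w in words if w in ENGLISH_STOPWORDS)
    return hits / len(words) >= min_ratio


@dataclass
class Insight:
    text: str
    kind: str
    english_heuristics_applied: bool


def build_insight(text: str, classify_english: Callable[[str], str]) -> Insight:
    """Run the English-only classifier when safe; otherwise degrade gracefully."""
    if looks_english(text):
        return Insight(text, classify_english(text), True)
    # Fallback: keep the annotation, but skip English-specific heuristics.
    return Insight(text, "unclassified", False)
```

Treating uncertain text as non-English is a deliberately conservative choice here: an unclassified insight is easier to explain to users than a confidently wrong one.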
