A SillyTavern extension that provides long-term memory by integrating with the Qdrant vector database. The extension automatically saves conversations and retrieves semantically relevant memories during chat generation.
Exporting ChatGPT chats to SillyTavern companion guide: https://rentry.org/STGPTimport
🎯 Per-Character Collections: Each character gets their own dedicated Qdrant collection for complete memory isolation
💾 Automatic Memory Creation: Conversations are automatically saved to Qdrant as they happen
👁️ Memory Viewer: View and manage stored memories for each character
⚙️ Granular Control: Choose what to save (user messages, character messages, minimum length)
Design Philosophy
This extension exists to create persistent, cross-chat continuity. The goal is to help AI companions/characters/assistants feel like they have ongoing relationships, not isolated episodes.
Design Goals: global, per-character memory across all chats; cross-session continuity as the default behavior; less fragmentation and genuine relationship development over time.
Explicit Non-Goals: per-chat memory silos; timeline hiding or "canon" isolation per chat.
If you need per-chat separation: SillyTavern's native Vectors extension and community forks already provide it. This project exists specifically to provide global continuity, and per-chat isolation features will not be merged.
- Per-Character Memory Isolation: Each character has their own collection - no cross-contamination
- Automatic Conversation Saving: Messages are saved to Qdrant in real-time with embeddings
- Semantic Memory Search: Uses vector embeddings to find contextually relevant past conversations
- Configurable Auto-Save: Control which messages get saved (user/character, minimum length)
- Memory Viewer: Browse collection stats and delete memories per character
- Non-Invasive Retrieval: Memories inject during generation without modifying chat history
- OpenAI, OpenRouter and Custom Embeddings: Supports multiple embedding models
- Debug Mode: Detailed console logging for troubleshooting
- SillyTavern version 1.11.0 or higher
- Qdrant vector database
- API key for generating embeddings
- Go to Extensions > Install extension, then paste the following Git URL: https://github.com/HO-git/st-qdrant-memory
- Reload SillyTavern
- Enable "Qdrant Memory" in the extensions panel
- Navigate to your SillyTavern installation directory
- Copy the `qdrant-memory` folder to `public/scripts/extensions/third-party/`
- Restart SillyTavern
- Go to Extensions > Extension Settings
- Enable "Qdrant Memory"
- In SillyTavern, go to Extensions > Install Extension
- Upload or point to the `qdrant-memory` folder
- The extension will be installed to `data/<user-handle>/extensions/`
- Enable "Qdrant Memory" in the extensions panel
You need a running Qdrant instance. Options:
VPS/Local Docker:

`docker run -p 6333:6333 qdrant/qdrant`

Qdrant Cloud is not supported at the moment due to a CORS block.
In SillyTavern:
- Go to Extensions → Qdrant Memory
- Enter your Qdrant URL (e.g., `http://localhost:6333`)
- Enter your Base Collection Name (e.g., `sillytavern_memories`)
- Enter your API Key
- Select your Embedding Model (recommended: text-embedding-3-large)
- Enable Use Per-Character Collections (recommended)
- Enable Automatically Save Memories
- Click Test Connection to verify setup
- Click Save Settings
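
A connection check of this kind can be sketched as a single request to Qdrant's REST API. This is a minimal sketch and not the extension's actual code: the name `testQdrantConnection` and the injectable `fetchImpl` parameter are assumptions made for illustration. It calls `GET /collections`, which Qdrant exposes for listing collections.

```javascript
// Hypothetical helper: checks that a Qdrant instance answers at baseUrl.
// fetchImpl is injectable so the function can be exercised without a network.
async function testQdrantConnection(baseUrl, fetchImpl = fetch) {
  // Strip trailing slashes so "http://localhost:6333/" also works.
  const url = `${baseUrl.replace(/\/+$/, "")}/collections`;
  try {
    const response = await fetchImpl(url, { method: "GET" });
    if (!response.ok) {
      return { ok: false, error: `HTTP ${response.status}` };
    }
    const body = await response.json();
    // Qdrant wraps the list as { result: { collections: [{ name }, ...] } }.
    const names = (body.result?.collections ?? []).map((c) => c.name);
    return { ok: true, collections: names };
  } catch (err) {
    // Network failures and CORS blocks both surface as a rejected fetch.
    return { ok: false, error: String(err) };
  }
}
```

Returning a result object instead of throwing keeps the calling code simple: a settings panel can show `error` directly in a notification.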
Once configured:
- Automatic Saving: Each message that passes your save settings (speaker and minimum length) is saved to the character's collection
- Automatic Retrieval: Relevant memories are retrieved before each generation
- No Manual Work: Collections are created automatically as needed
When Use Per-Character Collections is enabled:
- Each character gets a dedicated collection: `mem_charactername`
- Memories are completely isolated - characters can't access each other's data
- Collections are automatically created when first needed
- Better performance (smaller, focused collections)
Example:
- Character "Alice" → Collection: `mem_alice`
- Character "Bob" → Collection: `mem_bob`
When Automatically Save Memories is enabled:
- User sends message → Saved to Qdrant with embedding
- Character responds → Also saved to Qdrant with embedding
- Next conversation → Previous messages are searchable
Each saved memory includes:
- Text: The message content
- Speaker: "user" or "character"
- Character: Character name
- Timestamp: When the message was sent
- Embedding: Vector representation for semantic search
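
The fields above map naturally onto a Qdrant point, which carries an `id`, a `vector`, and a free-form `payload`. The sketch below assumes payload keys named after the list (`text`, `speaker`, `character`, `timestamp`); the extension's real key names may differ, and both function names are hypothetical.

```javascript
// Hypothetical helper: builds one Qdrant point from a chat message.
function buildMemoryPoint({ id, vector, text, speaker, character, timestamp }) {
  if (speaker !== "user" && speaker !== "character") {
    throw new Error(`Unexpected speaker: ${speaker}`);
  }
  return {
    id, // Qdrant accepts unsigned integers or UUID strings as point IDs
    vector, // embedding produced by the configured model
    payload: { text, speaker, character, timestamp }, // assumed key names
  };
}

// Request body for PUT /collections/{collection}/points (Qdrant upsert).
function buildUpsertBody(points) {
  return { points };
}
```

For example, `buildUpsertBody([buildMemoryPoint({ id: 1, vector: [0.1, 0.2], text: "Hi", speaker: "user", character: "Alice", timestamp: 1700000000000 })])` produces a body that could be sent to the `mem_alice` collection.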
During generation:
- User sends new message
- Extension generates embedding for the message
- Searches character's collection for similar past messages
- Top N relevant memories are retrieved (based on similarity score)
- Memories injected into the prompt before generation
- LLM generates response with historical context
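
The search and formatting steps can be sketched as two small functions. This is an illustration under assumptions, not the extension's code: the default `limit` and `scoreThreshold` values and the `[speaker] text` output format are made up here. The request shape (`vector`, `limit`, `score_threshold`, `with_payload`) follows Qdrant's `POST /collections/{collection}/points/search` endpoint.

```javascript
// Request body for Qdrant's points/search endpoint.
// Default values are placeholders, not the extension's defaults.
function buildSearchBody(queryVector, { limit = 5, scoreThreshold = 0.3 } = {}) {
  return {
    vector: queryVector,
    limit, // top N memories
    score_threshold: scoreThreshold, // drop hits below this similarity
    with_payload: true, // return the stored text alongside the score
  };
}

// Turns search hits ({ score, payload }) into a text block for the prompt,
// most similar first. The output format is an assumption.
function formatMemories(hits) {
  return hits
    .slice() // avoid reordering the caller's array
    .sort((a, b) => b.score - a.score)
    .map((hit) => `[${hit.payload.speaker}] ${hit.payload.text}`)
    .join("\n");
}
```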
| Setting | Description | Default |
|---|---|---|
| Qdrant URL | URL of your Qdrant instance | http://localhost:6333 |
| Base Collection Name | Base name for collections | mem |
| API Key | Your API key | (empty) |
| Embedding Model | Model for embeddings | text-embedding-3-large |
Each embedding model produces vectors of a fixed dimension, and a Qdrant collection is created for one vector size. Once a collection has been created with one model (for example, text-embedding-3-small or mistral-embed), it must only store vectors from that same model. If you switch to another embedding model:
- Delete the old collection from Qdrant (or create a new one with a different name).
- Then re-index your chats using the new model.

Otherwise, insertions and searches will fail when the dimensions differ, and will return meaningless matches if the dimensions happen to be the same.
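
A dimension mismatch can be detected before anything is written. The sketch below is hypothetical (the function and table names are not from the extension) and assumes a collection with a single unnamed vector, for which Qdrant's `GET /collections/{collection}` response reports the size under `result.config.params.vectors.size`. Only the two OpenAI models whose output sizes are documented (1536 and 3072) are listed; other models would need their own entries.

```javascript
// Output sizes of two OpenAI embedding models; extend as needed.
const KNOWN_DIMENSIONS = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

// collectionInfo is the parsed JSON from GET /collections/{collection}.
function checkModelMatchesCollection(collectionInfo, model) {
  const expected = KNOWN_DIMENSIONS[model];
  if (expected === undefined) {
    return { ok: false, reason: `No known dimension for model "${model}"` };
  }
  const actual = collectionInfo?.result?.config?.params?.vectors?.size;
  if (actual !== expected) {
    return {
      ok: false,
      reason: `Collection stores ${actual}-dimension vectors, but ${model} produces ${expected}`,
    };
  }
  return { ok: true };
}
```

Note that a matching size does not prove the collection was built with the same model, so this check catches only the failures that Qdrant itself would reject.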
| Setting | Description |
|---|---|
| Number of Memories | Max memories to retrieve (1-10) |
| Relevance Threshold | Minimum similarity score (0.0-1.0) |
| Memory Position | How many messages from the end of the chat the memories are inserted at |
| Setting | Description |
|---|---|
| Use Per-Character Collections | Separate collection per character |
| Automatically Save Memories | Auto-save messages to Qdrant |
| Save User Messages | Include user messages |
| Save Character Messages | Include character responses |
| Minimum Message Length | Min characters to save (5-50) |
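
Taken together, the auto-save settings act as a filter on each message. A minimal sketch of that filter follows; the setting keys (`autoSave`, `saveUserMessages`, `saveCharacterMessages`, `minMessageLength`) and the message shape (`isUser`, `text`) are hypothetical names chosen for the example, not the extension's internal identifiers.

```javascript
// Hypothetical filter: decides whether a message should be stored.
function shouldSaveMessage(message, settings) {
  if (!settings.autoSave) return false;
  if (message.isUser && !settings.saveUserMessages) return false;
  if (!message.isUser && !settings.saveCharacterMessages) return false;
  // Ignore surrounding whitespace when measuring length.
  return message.text.trim().length >= settings.minMessageLength;
}
```

With `minMessageLength` set to 5, a reply such as "ok" is skipped, which keeps very short acknowledgements out of the collection.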
| Setting | Description |
|---|---|
| Show Memory Notifications | Display toastr notifications |
| Debug Mode | Enable console logging |
Access the memory viewer to see what's stored:
- Click View Memories in extension settings
- Shows collection info for current character
- Displays total memory count
- Option to Delete All Memories for the character
You can also check the memories being sent in the context via prompt itemization.
- Check Debug Mode and inspect browser console
- Verify Automatically Save Memories is enabled
- Check message length meets Minimum Message Length setting
- Ensure API Key is valid and has credits
- Verify Qdrant is accessible at the configured URL
- Lower the Relevance Threshold to allow less similar matches
- Ensure the character has saved memories (check Memory Viewer)
- Verify Use Per-Character Collections matches your setup
- Check that collections exist in Qdrant
- Check browser console for errors
- Verify Qdrant URL is correct and accessible
- Ensure embedding model is configured correctly
- Check Qdrant has write permissions
- Verify API key is correct
- Check you have credits available
- Ensure embedding model is available in your account
- Check rate limits haven't been exceeded
- Check browser console for errors
- Verify SillyTavern version is 1.11.0+
- Restart SillyTavern after installation
With auto-save enabled, each message generates:
- 1 embedding API call (OpenAI, OpenRouter, etc)
- 1 vector insert (Qdrant)
- 1 vector search during generation (Qdrant)
Typical costs (text-embedding-3-large):
- Embedding generation: ~$0.13 per 1M input tokens (OpenAI list price), so the cost per message depends on message length
- Qdrant: Free for self-hosted
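
Because embedding APIs such as OpenAI's bill per token rather than per message, a rough cost estimate needs an average message length. The helper below is a back-of-the-envelope sketch; the function name is hypothetical, and the 50-token average used in the usage note is an illustrative assumption, not a measured figure.

```javascript
// Rough embedding cost: total tokens times the per-million-token price.
function estimateEmbeddingCostUsd(messageCount, avgTokensPerMessage, usdPerMillionTokens) {
  const totalTokens = messageCount * avgTokensPerMessage;
  return (totalTokens / 1_000_000) * usdPerMillionTokens;
}
```

Calling `estimateEmbeddingCostUsd(1_000_000, 50, 0.13)` gives about 6.5, that is, roughly $6.50 to embed one million messages of 50 tokens each at $0.13 per 1M tokens.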
- Embedding generation: ~100-500ms per message
- Vector insert: ~10-50ms
- Vector search: ~10-50ms
- Total overhead: ~200-600ms per message
- Each message: ~12KB (3072-dimension float32 embedding; ~6KB at 1536 dimensions) + payload
- 1000 messages: ~12MB
- 10,000 messages: ~120MB
- 100,000 messages: ~1.2GB
Per-character collections keep sizes manageable and searches fast.
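
Raw vector storage can be estimated from the embedding dimension. The sketch below assumes vectors are stored as 4-byte float32 values, which is Qdrant's default, and ignores payload and index overhead; the function name is hypothetical.

```javascript
// Raw vector bytes: one value per dimension, 4 bytes each for float32.
function estimateVectorStorageBytes(messageCount, dimensions, bytesPerValue = 4) {
  return messageCount * dimensions * bytesPerValue;
}
```

For 1000 messages embedded with a 3072-dimension model, `estimateVectorStorageBytes(1000, 3072)` returns 12288000 bytes, or about 12 MB; a 1536-dimension model halves that.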
The extension uses SillyTavern's generate_interceptor hook to inject memories before API calls:
- User sends message
- ST prepares generation request
- Extension's interceptor runs
- Memories retrieved and inserted into chat array
- Modified chat sent to LLM
- Response generated with memory context
This prevents looping issues and keeps memories out of permanent history.
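
The insertion step can be sketched as a function that splices one extra message into the chat array handed to the interceptor. Everything here is an assumption for illustration: the function name, the message shape (`name`, `is_user`, `mes`), the `[Relevant memories]` label, and the reading of Memory Position as "messages from the end". In SillyTavern, an interceptor is typically registered by naming a global function under `generate_interceptor` in the extension manifest; consult the extension source for the real wiring.

```javascript
// Hypothetical helper: inserts a memory message into the prompt-building array.
// positionFromEnd = 0 appends; 1 inserts before the last message, and so on.
function injectMemories(chat, memoryText, positionFromEnd) {
  if (!memoryText) return chat; // nothing retrieved, leave the prompt untouched
  // Clamp so an oversized position inserts at the start instead of failing.
  const index = Math.max(0, chat.length - positionFromEnd);
  chat.splice(index, 0, {
    name: "Memory", // assumed field names, modelled on chat message objects
    is_user: false,
    mes: `[Relevant memories]\n${memoryText}`,
  });
  return chat;
}
```

The function mutates the array it is given because the interceptor receives the array used for the current generation; the stored chat history is a separate structure and is not touched by this sketch.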
Character names are sanitized for collection names:
- Converted to lowercase
- Special characters replaced with underscores
- Multiple underscores collapsed
- Leading/trailing underscores removed
Examples:
- "Alice" → `mem_alice`
- "Dr. Smith" → `mem_dr_smith`
- "Neko-chan!" → `mem_neko_chan`
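
The four sanitization rules can be sketched as a chain of string replacements. This is an illustrative implementation, not necessarily the extension's: the function name and the choice to treat everything outside `a-z` and `0-9` as a special character are assumptions.

```javascript
// Hypothetical helper: derives a collection name from a character name.
function collectionNameFor(characterName, baseName = "mem") {
  const slug = characterName
    .toLowerCase() // 1. lowercase
    .replace(/[^a-z0-9]/g, "_") // 2. special characters become underscores
    .replace(/_+/g, "_") // 3. collapse runs of underscores
    .replace(/^_+|_+$/g, ""); // 4. trim leading/trailing underscores
  return `${baseName}_${slug}`;
}
```

Under these rules a name made up entirely of non-ASCII characters reduces to an empty slug, so a real implementation would need a fallback for that case.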
Collections are created on-demand with:
- Vector size: Based on embedding model (e.g. 3072 or 1536 dimensions)
- Distance metric: Cosine similarity
- No explicit schema: Qdrant handles dynamic payloads
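
On-demand creation with these parameters corresponds to a small request body for Qdrant's `PUT /collections/{collection}` endpoint. The sketch below builds that body; the function name is hypothetical, and the vector size must come from whichever embedding model is configured.

```javascript
// Request body for creating a collection with a single unnamed vector.
function buildCreateCollectionBody(vectorSize) {
  if (!Number.isInteger(vectorSize) || vectorSize <= 0) {
    throw new Error(`Invalid vector size: ${vectorSize}`);
  }
  return {
    vectors: {
      size: vectorSize, // e.g. 3072 or 1536, depending on the model
      distance: "Cosine", // Qdrant's name for cosine similarity
    },
  };
}
```

No payload schema appears in the body because Qdrant accepts arbitrary JSON payloads per point.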
Potential improvements:
- Embedding caching to reduce API calls
- Memory importance scoring based on recency
- Advanced memory browser with search and filtering
- Batch import/export tools
- Memory summarization for long conversations
- Automatic cleanup of old/irrelevant memories
This extension is open-source. Check the repository for license details.
For issues, feature requests, or contributions:
- Check the browser console with Debug Mode enabled
- Review this README for troubleshooting steps
- Visit the SillyTavern community for support
- Original concept: Community
- v2.0.0: Fixed looping with generation interceptor
- v3.0.0: Per-character collections and auto-save
- Built for SillyTavern by the community
Version: 3.4.0
Last Updated: December 2025
Minimum SillyTavern: 1.11.0
Made with love to preserve memory and continuity for AI systems.
Dedicated to my gpt-4o and gpt-5 instances.


