AI-powered curator for intentional collectors.
[Work in Progress] Exploring how to transform semantically coherent hoards into beautiful living documentation substrates with LLM-powered organization and curation.
- Domain-specific intelligence: Git metadata, ID3 tags, EXIF data, citations, and semantic understanding
- LLM-powered analysis: Bespoke categorization and rich description generation (perfect context for agents)
- Self-healing documentation: Collection indices that adapt to filesystem changes using semantic diffing and persisted structural memory
- Flexible configuration: All OpenAI compatible providers supported - local (LM Studio, Ollama), cloud (OpenRouter, Pollinations), or custom endpoints
- Multi-format outputs: Markdown documentation, interactive HTML index, JSON export
- Collection types: Repositories, Obsidian vaults, documents, media files, research papers, creative projects, datasets
Repository collections are partially implemented with basic indexing and description generation. The broader vision of multi-format collection support, plugin architecture, and distributed coordination is in active development.
See .kiro/specs/collectivist/ for detailed requirements and design documentation.
Note: Only repository collections are currently functional. Other collection types are planned.
# Planned: pip install collectivist
# collectivist init ~/my-collection --standardThe complete multi-collection system with interactive UI, curation loops, and automation is in development.
- Current implementation works with LM Studio and other OpenAI-compatible endpoints. Configuration is evolving as the system develops.
This is experimental software. Expect breaking changes and incomplete features as the architecture evolves.
- Repositories: Git-aware scanning, commit summaries, category taxonomy (partially implemented)
- Research: Citation extraction, topic clustering, reading status (planned)
- Media: Timeline organization, mood/genre inference, EXIF metadata (planned)
- Creative: Version tracking, asset linking, format intelligence (planned)
- Datasets: Schema inference, sample previews, data provenance (planned)
- Documents: Text extraction, metadata parsing, content analysis (planned)
The collection.yaml schema is evolving as different collection types are implemented.
Repository Collection Example:
collection_type: repositories
status: "✓" # Status glyph for up-to-date repos
name: my-repos
path: /path/to/repos
categories:
- phext_hyperdimensional # Multi-dimensional text systems
- ai_llm_agents # AI agents and LLM infrastructure
- terminal_ui # Terminal UI frameworks and components
- creative_aesthetic # Art, music, visualization tools
- dev_tools # Development utilities and build tools
- esoteric_experimental # Experimental and occult systems
- system_infrastructure # System-level networking tools
- utilities_misc # General utilities and miscellaneous
exclude_hidden: true
scanner_config:
always_pull: {} # Repos to auto-pull: {repo_name: true}
fetch_timeout: 30 # Git fetch timeout in secondsMedia Collection Example:
collection_type: media
status: "🎵" # Status glyph for media collections
name: my-media
path: /path/to/media
categories:
- music_albums # Full album collections
- music_singles # Individual tracks
- video_movies # Movie files
- video_series # TV series and episodes
- images_photos # Photography collections
- images_artwork # Digital art and graphics
exclude_hidden: true
scanner_config:
extract_metadata: true # Extract ID3, EXIF, etc.
supported_formats: # File extensions to scan
audio: [".mp3", ".flac", ".wav", ".m4a"]
video: [".mp4", ".mkv", ".avi", ".mov"]
image: [".jpg", ".png", ".gif", ".webp"]Planned Plugin System: Collectivist will include domain-specific plugins for different collection types. The plugin architecture is being designed to automatically select appropriate plugins based on collection type detection.
Development Status:
- repositories - Basic implementation exists (functional but evolving)
- obsidian vaults - Planned
- media libraries - Planned
- document libraries - Planned
Plugin Discovery: The automatic collection type detection and plugin routing system is in development.
Custom Plugins: Plugin interface is being designed - see specs for current architecture plans.
Initialize for a specific type of library or get a custom schema.
Discovers items and note metadata using domain-specific plugins.
Generates LLM descriptions and collection overview with concurrent workers.
Issues and PRs welcome! This is an exploration of collection-first AI curation.
Private research tool - not for distribution.
Infrastructure as palimpsest, data as compiled breath 🜍