Summary
The current index configuration is coupled to specific backend implementations (VectorBackendConfig). We want third-party developers to be able to create custom index backends that plug into OSA without modifying core code.
Current State
# osa/config.py - coupled to specific implementations
from typing import Annotated, Union

from pydantic import Field

from osa.infrastructure.index.vector.config import VectorBackendConfig

AnyBackendConfig = Annotated[
    Union[VectorBackendConfig],  # Must modify this for each new backend
    Field(discriminator=None),
]
Adding a new backend (e.g., Elasticsearch, Meilisearch) requires modifying config.py.
Goals
- Third-party backends can be installed as packages
- No modification to OSA core code required
- Config validation happens at startup (fail fast)
- Clear error messages for invalid configs
- Type-safe config within the backend implementation
Proposed Design
1. Backend Protocol with Config Class
The StorageBackend protocol should declare its config class:
# osa/sdk/index/backend.py
from typing import Any, ClassVar, Protocol

# BackendConfig and QueryResult are assumed to be defined alongside this protocol.


class StorageBackend(Protocol):
    """Protocol for pluggable index storage backends."""

    # The config class for this backend - used for validation at load time
    config_class: ClassVar[type[BackendConfig]]

    @property
    def name(self) -> str: ...

    async def ingest(self, srn: str, record: dict[str, Any]) -> None: ...
    async def delete(self, srn: str) -> None: ...
    async def query(self, q: str, limit: int = 20) -> QueryResult: ...
    async def health(self) -> bool: ...
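As a rough illustration of what conforming to this protocol might look like, the sketch below implements a toy in-memory backend. It is not part of OSA: `InMemoryBackend`, `InMemoryConfig`, and the list-of-SRNs query result are hypothetical, and a plain dataclass stands in for `BackendConfig` so the snippet runs with the standard library only.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, ClassVar


@dataclass
class InMemoryConfig:
    """Hypothetical stand-in for a BackendConfig subclass."""
    max_records: int = 1000


class InMemoryBackend:
    """Toy backend that matches the StorageBackend shape structurally."""

    # Mirrors the protocol's config_class attribute.
    config_class: ClassVar[type] = InMemoryConfig

    def __init__(self, name: str, config: InMemoryConfig) -> None:
        self._name = name
        self._config = config
        self._records: dict[str, dict[str, Any]] = {}

    @property
    def name(self) -> str:
        return self._name

    async def ingest(self, srn: str, record: dict[str, Any]) -> None:
        is_new = srn not in self._records
        if is_new and len(self._records) >= self._config.max_records:
            raise RuntimeError(f"Index {self._name!r} is full")
        self._records[srn] = record

    async def delete(self, srn: str) -> None:
        self._records.pop(srn, None)

    async def query(self, q: str, limit: int = 20) -> list[str]:
        # Naive substring match over stringified records. The real QueryResult
        # type is not specified here, so a list of matching SRNs is returned.
        hits = [srn for srn, rec in self._records.items() if q.lower() in str(rec).lower()]
        return hits[:limit]

    async def health(self) -> bool:
        return True


async def demo() -> list[str]:
    backend = InMemoryBackend("scratch", InMemoryConfig(max_records=2))
    await backend.ingest("srn:1", {"title": "Vector search"})
    await backend.ingest("srn:2", {"title": "Keyword search"})
    await backend.delete("srn:2")
    return await backend.query("search")


result = asyncio.run(demo())
```

Because the protocol is structural, the class does not need to inherit from `StorageBackend`; matching attribute and method signatures should be enough for a type checker.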
2. Backend Registration via Entry Points
Third-party packages register their backends:
# Third-party pyproject.toml
[project.entry-points."osa.index.backends"]
elasticsearch = "my_plugin.elasticsearch:ElasticsearchBackend"
3. Discovery and Validation at Startup
# osa/infrastructure/index/registry.py
from importlib.metadata import entry_points
from typing import Any

from osa.sdk.index.backend import BackendConfig, StorageBackend

# ConfigError is assumed to be OSA's existing configuration error type.


def discover_backends() -> dict[str, type[StorageBackend]]:
    """Discover all registered index backends."""
    eps = entry_points(group="osa.index.backends")
    return {ep.name: ep.load() for ep in eps}


def validate_index_config(backend_name: str, raw_config: dict[str, Any]) -> BackendConfig:
    """Validate raw config against the backend's config class."""
    backends = discover_backends()
    backend_cls = backends.get(backend_name)
    if backend_cls is None:
        raise ConfigError(
            f"Unknown index backend: {backend_name}. "
            f"Available: {list(backends.keys())}"
        )
    config_cls = backend_cls.config_class
    return config_cls.model_validate(raw_config)
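One way the "clear error messages" goal could be met is to wrap validation failures so that every message names the backend involved. The sketch below is an assumption-laden illustration, not the proposed implementation: it takes the registry as an explicit argument instead of reading entry points, defines its own `ConfigError`, and uses a dataclass (`DemoConfig`, hypothetical) in place of a pydantic model, so it only catches missing or unexpected fields, not wrong types.

```python
from dataclasses import dataclass
from typing import Any


class ConfigError(Exception):
    """Hypothetical error type; the real ConfigError is assumed to live in OSA."""


@dataclass
class DemoConfig:
    """Stand-in for a backend config class (a pydantic model in the proposal)."""
    host: str
    port: int = 9200


class DemoBackend:
    config_class = DemoConfig


def validate_index_config(
    backend_name: str,
    raw_config: dict[str, Any],
    backends: dict[str, type],
) -> Any:
    """Validate raw_config and name the backend in every failure message."""
    backend_cls = backends.get(backend_name)
    if backend_cls is None:
        raise ConfigError(
            f"Unknown index backend: {backend_name!r}. "
            f"Available: {sorted(backends)}"
        )
    try:
        # With pydantic this would be config_class.model_validate(raw_config).
        return backend_cls.config_class(**raw_config)
    except TypeError as exc:
        raise ConfigError(f"Invalid config for backend {backend_name!r}: {exc}") from exc


registry = {"demo": DemoBackend}
ok = validate_index_config("demo", {"host": "localhost"}, registry)

try:
    validate_index_config("demo", {"port": 1}, registry)
except ConfigError as exc:
    missing_field_error = str(exc)

try:
    validate_index_config("nope", {}, registry)
except ConfigError as exc:
    unknown_backend_error = str(exc)
```

Passing the registry in explicitly also avoids re-running entry point discovery on every call, which may matter if many indexes are configured.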
4. Simplified Core Config
# osa/config.py - no longer coupled to implementations
from typing import Any

from pydantic import BaseModel

class IndexConfig(BaseModel):
    """Configuration for a named index."""
    backend: str  # "vector", "elasticsearch", etc.
    config: dict[str, Any]  # Validated against backend's config_class at load time
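Since `config` is an untyped dict at this level, the fail-fast goal depends on a startup pass that checks every named index before the application serves traffic. A minimal sketch of such a pass follows, assuming each backend exposes a validator callable. The names `validate_all_indexes` and `validate_demo`, and the `path` field, are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable


class ConfigError(Exception):
    """Hypothetical error type used for illustration."""


def validate_all_indexes(
    indexes: dict[str, dict[str, Any]],
    validators: dict[str, Callable[[dict[str, Any]], Any]],
) -> dict[str, Any]:
    """Validate every named index and report all problems in one error."""
    validated: dict[str, Any] = {}
    problems: list[str] = []
    for index_name, entry in indexes.items():
        validator = validators.get(entry["backend"])
        if validator is None:
            problems.append(f"{index_name}: unknown backend {entry['backend']!r}")
            continue
        try:
            validated[index_name] = validator(entry["config"])
        except (TypeError, ValueError) as exc:
            problems.append(f"{index_name}: {exc}")
    if problems:
        raise ConfigError("Invalid index configuration:\n" + "\n".join(problems))
    return validated


def validate_demo(raw: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical validator; a real one would call the backend's config_class.
    if "path" not in raw:
        raise ValueError("missing required field 'path'")
    return raw


indexes = {
    "main": {"backend": "demo", "config": {"path": "/tmp/idx"}},
    "broken": {"backend": "demo", "config": {}},
    "other": {"backend": "missing", "config": {}},
}

try:
    validate_all_indexes(indexes, {"demo": validate_demo})
except ConfigError as exc:
    report = str(exc)
```

Collecting all problems before raising is a design choice: it lets an operator fix every misconfigured index in one pass instead of restarting once per error.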
5. Example Third-Party Backend
# my_plugin/elasticsearch.py
from osa.sdk.index.backend import BackendConfig


class ElasticsearchConfig(BackendConfig):
    host: str
    port: int = 9200
    index_name: str
    api_key: str | None = None


class ElasticsearchBackend:
    """Satisfies the StorageBackend protocol structurally (no inheritance needed)."""

    config_class = ElasticsearchConfig  # Used for validation

    def __init__(self, name: str, config: ElasticsearchConfig) -> None:
        self._name = name
        self._config = config

    # ...
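The example above implies a two-step construction: raw config is turned into the backend's typed config object, and only then is the backend instantiated. A small sketch of that hand-off is shown here, assuming backends take `(name, config)` as in the example. `EchoBackend`, `EchoConfig`, and `build_backend` are hypothetical, and a dataclass again stands in for a pydantic model.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EchoConfig:
    """Hypothetical config; stands in for a BackendConfig subclass."""
    prefix: str = ""


class EchoBackend:
    """Hypothetical backend following the (name, config) constructor shape."""

    config_class = EchoConfig

    def __init__(self, name: str, config: EchoConfig) -> None:
        self._name = name
        self._config = config

    def describe(self) -> str:
        # Typed attribute access: no dict lookups inside the backend.
        return f"{self._config.prefix}{self._name}"


def build_backend(index_name: str, backend_cls: type, raw_config: dict[str, Any]) -> Any:
    """Turn raw config into a typed config object, then construct the backend."""
    # With pydantic this line would be config_class.model_validate(raw_config).
    config = backend_cls.config_class(**raw_config)
    return backend_cls(index_name, config)


backend = build_backend("papers", EchoBackend, {"prefix": "idx-"})
description = backend.describe()
```

Keeping the dict-to-object conversion outside the backend is what gives the backend author type-safe access to its own settings.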
Built-in Backends
OSA ships with:
- vector: ChromaDB + sentence-transformers (current)
Built-in backends are registered via entry points in OSA's own pyproject.toml:
[project.entry-points."osa.index.backends"]
vector = "osa.infrastructure.index.vector:VectorStorageBackend"
Tasks
- StorageBackend protocol: include config_class
- IndexConfig: use dict[str, Any]
Future Considerations
- Same pattern should apply to Ingestors
- Consider a CLI command to list available backends:
osa backends list
- Consider config schema export for documentation