
feat: design pluggable Index backend system #10

@rorybyrne

Summary

The current index configuration is coupled to specific backend implementations (VectorBackendConfig). We want third-party developers to be able to create custom index backends that plug into OSA without modifying core code.

Current State

# osa/config.py - coupled to specific implementations
from typing import Annotated, Union

from pydantic import Field

from osa.infrastructure.index.vector.config import VectorBackendConfig

AnyBackendConfig = Annotated[
    Union[VectorBackendConfig],  # Must be extended by hand for each new backend
    Field(discriminator=None),
]

Adding a new backend (e.g., Elasticsearch or Meilisearch) therefore requires editing osa/config.py to import its config class and add it to the Union.

Goals

  1. Third-party backends can be installed as packages
  2. No modification to OSA core code required
  3. Config validation happens at startup (fail fast)
  4. Clear error messages for invalid configs
  5. Type-safe config within the backend implementation

Proposed Design

1. Backend Protocol with Config Class

The StorageBackend protocol should declare its config class:

# osa/sdk/index/backend.py
from __future__ import annotations

from typing import Any, ClassVar, Protocol

from pydantic import BaseModel


class BackendConfig(BaseModel):
    """Base class for backend-specific configuration."""


class StorageBackend(Protocol):
    """Protocol for pluggable index storage backends."""

    # The config class for this backend - used for validation at load time
    config_class: ClassVar[type[BackendConfig]]

    @property
    def name(self) -> str: ...

    async def ingest(self, srn: str, record: dict[str, Any]) -> None: ...
    async def delete(self, srn: str) -> None: ...
    # QueryResult is assumed to be defined elsewhere in the SDK (not shown here)
    async def query(self, q: str, limit: int = 20) -> QueryResult: ...
    async def health(self) -> bool: ...
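A loaded class that only loosely matches the protocol would otherwise fail on first use, not at startup. `issubclass(cls, StorageBackend)` cannot catch this. Plain protocols must be decorated with `@runtime_checkable` before they can be used with `issubclass()`, and even then Python rejects `issubclass()` for protocols that have data members such as `config_class`. A minimal explicit check could look like this sketch. `check_backend_class` and `REQUIRED_ASYNC_METHODS` are hypothetical names, and `BackendConfig` is a dependency-free stand-in for the pydantic model.

```python
from __future__ import annotations

import inspect


# Stand-in for osa.sdk.index.backend.BackendConfig (a pydantic model in the
# proposal); a plain base class keeps this sketch dependency-free.
class BackendConfig:
    pass


# Coroutine methods required by the StorageBackend protocol.
REQUIRED_ASYNC_METHODS = ("ingest", "delete", "query", "health")


def check_backend_class(entry_name: str, cls: object) -> None:
    """Fail fast if an entry point does not resolve to a usable backend class."""
    if not inspect.isclass(cls):
        raise TypeError(f"backend {entry_name!r}: expected a class, got {cls!r}")

    config_cls = getattr(cls, "config_class", None)
    if not (inspect.isclass(config_cls) and issubclass(config_cls, BackendConfig)):
        raise TypeError(
            f"backend {entry_name!r}: config_class must be a BackendConfig "
            f"subclass, got {config_cls!r}"
        )

    # getattr(..., None) makes absent methods fail the coroutine check too.
    missing = [
        m for m in REQUIRED_ASYNC_METHODS
        if not inspect.iscoroutinefunction(getattr(cls, m, None))
    ]
    if missing:
        raise TypeError(f"backend {entry_name!r}: missing async methods {missing}")
```

Running this right after `ep.load()` turns a mis-registered plugin into a startup error that names the entry point. This supports Goals 3 and 4.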

2. Backend Registration via Entry Points

Third-party packages register their backends:

# Third-party pyproject.toml
[project.entry-points."osa.index.backends"]
elasticsearch = "my_plugin.elasticsearch:ElasticsearchBackend"
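An entry point value has the form `module:attribute`. `importlib.metadata.EntryPoint.load()` imports the module and returns the named attribute. This sketch builds an `EntryPoint` by hand that points at a stdlib class rather than `my_plugin`, so it runs without any plugin installed.

```python
import collections
from importlib.metadata import EntryPoint

# Same shape as the pyproject.toml line above, but pointing at a stdlib class
# (collections.OrderedDict) so the sketch runs without any plugin installed.
ep = EntryPoint(
    name="ordered",
    value="collections:OrderedDict",
    group="osa.index.backends",
)

print(ep.module)  # collections
print(ep.attr)    # OrderedDict

# load() imports ep.module, then returns getattr(module, ep.attr)
backend_cls = ep.load()
print(backend_cls is collections.OrderedDict)  # True
```

Entry points are read from the installed distribution's metadata, not directly from pyproject.toml. A plugin therefore has to be (re)installed, even in editable mode, before OSA can see a new or renamed entry point.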

3. Discovery and Validation at Startup

# osa/infrastructure/index/registry.py
from importlib.metadata import entry_points
from typing import Any

from osa.sdk.index.backend import BackendConfig, StorageBackend


class ConfigError(Exception):
    """Raised for invalid index configuration (may already exist elsewhere in OSA)."""


def discover_backends() -> dict[str, type[StorageBackend]]:
    """Discover all registered index backends (entry_points(group=...) needs Python 3.10+)."""
    eps = entry_points(group="osa.index.backends")
    return {ep.name: ep.load() for ep in eps}


def validate_index_config(backend_name: str, raw_config: dict[str, Any]) -> BackendConfig:
    """Validate raw config against the backend's config class."""
    backends = discover_backends()
    backend_cls = backends.get(backend_name)
    if backend_cls is None:
        raise ConfigError(
            f"Unknown index backend: {backend_name!r}. "
            f"Available: {sorted(backends)}"
        )

    config_cls = backend_cls.config_class
    # Raises pydantic.ValidationError if raw_config does not match the schema
    return config_cls.model_validate(raw_config)
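`discover_backends` imports every installed plugin, so a broken plugin that the config never uses can still block startup. Its dict comprehension also keeps only the last entry point when two packages register the same name. One possible refinement is to load only the backend the config names and to refuse ambiguous names. The sketch below assumes a hypothetical `load_backend` helper with an injectable `eps` argument for tests. `ConfigError` is a local stand-in.

```python
from __future__ import annotations

from collections.abc import Iterable
from importlib.metadata import EntryPoint, entry_points

GROUP = "osa.index.backends"


class ConfigError(Exception):
    """Stand-in for OSA's configuration error type."""


def load_backend(backend_name: str, eps: Iterable[EntryPoint] | None = None) -> type:
    """Load only the backend named in config.

    eps can be injected in tests; by default the installed entry points are
    used (Python 3.10+ selection API).
    """
    candidates = list(entry_points(group=GROUP) if eps is None else eps)
    matches = [ep for ep in candidates if ep.name == backend_name]

    if not matches:
        available = sorted({ep.name for ep in candidates})
        raise ConfigError(
            f"Unknown index backend: {backend_name!r}. Available: {available}"
        )
    if len(matches) > 1:
        # Two installed packages claim the same name; refuse to guess.
        sources = sorted(ep.value for ep in matches)
        raise ConfigError(
            f"Index backend {backend_name!r} is registered more than once: {sources}"
        )

    ep = matches[0]
    try:
        return ep.load()
    except Exception as exc:  # ImportError, AttributeError, errors at import time
        raise ConfigError(
            f"Could not load index backend {backend_name!r} from {ep.value!r}: {exc}"
        ) from exc
```

Only the named plugin is imported. A broken `"broken"` entry point therefore does not prevent loading `"ordered"`, but it still fails loudly if a config actually selects it.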

4. Simplified Core Config

# osa/config.py - no longer coupled to implementations
from typing import Any

from pydantic import BaseModel


class IndexConfig(BaseModel):
    """Configuration for a named index."""

    backend: str  # entry point name: "vector", "elasticsearch", etc.
    config: dict[str, Any]  # validated against the backend's config_class at load time
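Goals 3 and 4 suggest validating every configured index at startup. All problems could then be reported together instead of stopping at the first one. The sketch below is dependency-free, so dataclasses stand in for the pydantic models. A dataclass constructor raises `TypeError` for unknown or missing keys but does not check types, whereas pydantic's `model_validate` would raise `ValidationError` and also check types. `VectorConfig`, its fields, and `validate_indexes` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


class ConfigError(Exception):
    """Stand-in for OSA's configuration error type."""


@dataclass
class IndexConfig:  # dataclass stand-in for the pydantic IndexConfig
    backend: str
    config: dict[str, Any]


@dataclass
class VectorConfig:  # hypothetical fields, for illustration only
    persist_dir: str
    collection: str = "default"


class VectorBackend:  # hypothetical stand-in for the built-in backend
    config_class = VectorConfig


def validate_indexes(
    indexes: dict[str, IndexConfig], backends: dict[str, type]
) -> dict[str, Any]:
    """Validate every index at startup and report all problems in one error."""
    validated: dict[str, Any] = {}
    errors: list[str] = []
    for index_name, index in indexes.items():
        backend_cls = backends.get(index.backend)
        if backend_cls is None:
            errors.append(
                f"index {index_name!r}: unknown backend {index.backend!r} "
                f"(available: {sorted(backends)})"
            )
            continue
        try:
            # Dataclasses reject unknown/missing keys with TypeError;
            # pydantic's model_validate would raise ValidationError instead.
            validated[index_name] = backend_cls.config_class(**index.config)
        except TypeError as exc:
            errors.append(f"index {index_name!r} (backend {index.backend!r}): {exc}")
    if errors:
        raise ConfigError("Invalid index configuration:\n  " + "\n  ".join(errors))
    return validated
```

Prefixing each message with the index name and backend name tells the operator which block of the config file to fix.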

5. Example Third-Party Backend

# my_plugin/elasticsearch.py
from typing import ClassVar

from osa.sdk.index.backend import BackendConfig


class ElasticsearchConfig(BackendConfig):
    host: str
    port: int = 9200
    index_name: str
    api_key: str | None = None


class ElasticsearchBackend:
    # Satisfies StorageBackend structurally; no need to inherit from the protocol
    config_class: ClassVar[type[BackendConfig]] = ElasticsearchConfig  # Used for validation

    def __init__(self, name: str, config: ElasticsearchConfig) -> None:
        self._name = name
        self._config = config
        # ... name, ingest, delete, query and health omitted
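For contrast, here is a toy backend that implements every protocol member. It could double as a test fixture for plugin discovery. It is a sketch under stated assumptions. The proposal does not define `QueryResult`, so a minimal dataclass stands in for it. `InMemoryConfig` would subclass `BackendConfig` in OSA but is a plain dataclass here. The substring search and the `max_records` limit are illustrative only.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, ClassVar


@dataclass
class InMemoryConfig:  # would subclass BackendConfig in OSA
    max_records: int = 10_000


@dataclass
class QueryResult:  # hypothetical stand-in; not defined in the proposal
    srns: list[str] = field(default_factory=list)


class InMemoryBackend:
    config_class: ClassVar[type[InMemoryConfig]] = InMemoryConfig

    def __init__(self, name: str, config: InMemoryConfig) -> None:
        self._name = name
        self._config = config
        self._records: dict[str, dict[str, Any]] = {}

    @property
    def name(self) -> str:
        return self._name

    async def ingest(self, srn: str, record: dict[str, Any]) -> None:
        if srn not in self._records and len(self._records) >= self._config.max_records:
            raise RuntimeError(f"{self._name}: max_records={self._config.max_records} reached")
        self._records[srn] = record  # re-ingesting an SRN overwrites it

    async def delete(self, srn: str) -> None:
        self._records.pop(srn, None)  # deleting an unknown SRN is a no-op

    async def query(self, q: str, limit: int = 20) -> QueryResult:
        # Case-insensitive substring match over the record's values.
        needle = q.lower()
        hits = [
            srn for srn, rec in self._records.items()
            if any(needle in str(v).lower() for v in rec.values())
        ]
        return QueryResult(srns=hits[:limit])

    async def health(self) -> bool:
        return True


async def demo() -> None:
    backend = InMemoryBackend("mem", InMemoryConfig(max_records=2))
    await backend.ingest("srn:1", {"title": "Protein folding"})
    print((await backend.query("protein")).srns)  # ['srn:1']


asyncio.run(demo())
```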

Built-in Backends

OSA ships with:

  • vector - ChromaDB + sentence-transformers (current)

These are registered via entry points in OSA's own pyproject.toml:

[project.entry-points."osa.index.backends"]
vector = "osa.infrastructure.index.vector:VectorStorageBackend"

Tasks

  • Update the StorageBackend protocol to include config_class
  • Add entry point registration for the built-in vector backend
  • Implement backend discovery in the DI provider
  • Update config loading to validate against discovered config classes
  • Simplify IndexConfig to use dict[str, Any]
  • Document how to create third-party backends
  • Add integration test for plugin discovery

Future Considerations

  • The same pattern should apply to ingestors
  • Consider a CLI command to list available backends: osa backends list
  • Consider exporting each backend's config schema for documentation
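Because each backend declares `config_class`, a listing command could describe each backend's accepted keys without importing any backend-specific code. With pydantic v2 models, `config_class.model_json_schema()` would give a JSON Schema for the documentation-export idea. This dependency-free sketch reads dataclass fields instead. `describe_backend` and `list_backends` are hypothetical helpers, and the output format is illustrative.

```python
from __future__ import annotations

import dataclasses
from typing import ClassVar


@dataclasses.dataclass
class ElasticsearchConfig:  # dataclass stand-in for the pydantic model
    host: str
    index_name: str  # listed before defaulted fields, as dataclasses require
    port: int = 9200
    api_key: str | None = None


class ElasticsearchBackend:
    config_class: ClassVar[type] = ElasticsearchConfig


def describe_backend(name: str, backend_cls: type) -> str:
    """Render one line of a hypothetical `osa backends list` output."""
    parts = []
    for f in dataclasses.fields(backend_cls.config_class):
        if f.default is not dataclasses.MISSING:
            parts.append(f"{f.name}={f.default!r}")
        elif f.default_factory is not dataclasses.MISSING:
            parts.append(f"{f.name}=<factory>")
        else:
            parts.append(f"{f.name} (required)")
    return f"{name}: " + ", ".join(parts)


def list_backends(backends: dict[str, type]) -> list[str]:
    """One line per backend, sorted by entry point name."""
    return [describe_backend(n, cls) for n, cls in sorted(backends.items())]


for line in list_backends({"elasticsearch": ElasticsearchBackend}):
    print(line)
# elasticsearch: host (required), index_name (required), port=9200, api_key=None
```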

Metadata

Labels: design-needed (Needs architectural discussion before implementation), feature (New functionality)