
Per-connector JSON schemas + config validation & discovery APIs (credentials, source, sink) (support MCP client) #6

@saksham-nexla

Description

Add a schema-driven connector catalog to the Nexla Python SDK

Each supported connector ships with JSON Schema files describing:

  • Data credentials configuration
  • Source configuration
  • Sink configuration (destinations)

plus lightweight metadata.

The SDK will expose discovery + validation helpers (e.g., get_all_sources, get_all_sinks, get_schema(kind), and validate_config(...)). When users create credentials/sources/sinks, the SDK will validate the provided config against the corresponding schema before calling Nexla APIs—surfacing actionable errors early and doubling as human-readable docs. The same schemas will power an MCP client flow to guide users interactively.


Motivation

  • Strong validation before API calls: Fail fast with precise, field-level errors (missing required keys, wrong types, bad enums).
  • Self-documenting connectors: Schemas serve as the single source of truth for required/optional fields.
  • Better UX for IDEs/agents: Clear prompts and autocompletion become possible from structured definitions.
  • MCP integration: Enable a generic agent flow: list connectors → fetch schema → collect inputs → validate → create.

Scope

  • Ship versioned JSON Schemas per connector for data credentials, source, and sink configs, plus metadata.
  • Provide Python APIs to discover connectors, fetch schemas, validate configs, and list entities by connector.
  • Enforce validation in create_credential, create_source, create_sink (opt-out flag for power users if needed).
  • Include tests that meta-validate every schema and exercise typical/edge configs.
  • Optional (nice-to-have): simple CLI to list connectors, show schemas, and validate a config file.

Proposed file layout (SDK package)

nexla_sdk/
  connectors/
    s3_connector_schema.json
    example_connector_schema.json
    <connector_slug>_connector_schema.json
...

Connector schema file format:

{
    "name": "s3",
    "display_name": "Amazon S3",
    "config": {
        "isSource": true,
        "isSink": true,
        "connectionCategory": "file",
        "industryCategory": "File Systems"
    },
    "small_logo": "https://cdn.nexla.io/ui/assets/data-sources/s3.png",
    "logo": "https://cdn.nexla.io/ui/assets/data-sources/s3.png",
    "connection_type": "file",
    "data_credentials_json_schema": { /* JSON Schema for credentials */ },
    "source_configuration_json_schema": { /* JSON Schema for source config */ },
    "sink_configuration_json_schema": { /* JSON Schema for sink/destination config */ }
}
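A loader for these files could be a thin wrapper around the slug-to-filename convention above. This is a hedged sketch: the function name, the default directory, and the error type are assumptions, not the shipped SDK API.

```python
# Hypothetical schema loader: resolves a connector slug to its bundled
# <slug>_connector_schema.json file and returns the parsed document
# (metadata plus the three embedded JSON Schemas).
import json
from pathlib import Path

# Illustrative default location of the bundled schema files in the package.
SCHEMA_DIR = Path("nexla_sdk") / "connectors"

def load_connector_schema(slug: str, base_dir: Path = SCHEMA_DIR) -> dict:
    """Return the full connector document for the given slug."""
    path = base_dir / f"{slug}_connector_schema.json"
    if not path.is_file():
        raise FileNotFoundError(f"No schema bundled for connector '{slug}'")
    return json.loads(path.read_text(encoding="utf-8"))
```

Keeping the lookup purely convention-based (slug plus a fixed suffix) means new connectors are discovered by dropping a file into the directory, with no registry to update.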

Schema conventions

  • Use JSON Schema draft-07 ("$schema": "http://json-schema.org/draft-07/schema#").
  • Include "$id" per schema; keep title, description, and examples.
  • Prefer "additionalProperties": false for strictness.
  • Document enums, formats, and constraints (regex for IDs, ranges, etc.).
  • Use conditional validation with allOf, if, then for complex requirements.

Example schema snippets

data_credentials_json_schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Amazon S3 Data Credentials",
    "description": "Provide authentication information to allow Nexla to access your Amazon S3 buckets.",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Credential Name",
            "minLength": 1
        },
        "description": {
            "type": "string",
            "title": "Description"
        },
        "credentials_type": {
            "type": "string",
            "const": "s3"
        },
        "credentials": {
            "type": "object",
            "title": "Credential Details",
            "properties": {
                "s3_auth_type": {
                    "type": "string",
                    "title": "Authenticate Using",
                    "description": "Select the AWS authentication mechanism you want to use.",
                    "enum": ["Access Key", "ARN", "Instance Role"],
                    "default": "Access Key"
                },
                "access_key_id": {
                    "type": "string",
                    "title": "AWS Access Key"
                },
                "secret_key": {
                    "type": "string",
                    "title": "AWS Secret Key",
                    "minLength": 1
                },
                "region": {
                    "type": "string",
                    "title": "AWS Region",
                    "default": "us-east-1",
                    "enum": ["us-east-2", "us-east-1", "us-west-1", "us-west-2"]
                }
            },
            "required": ["s3_auth_type"],
            "allOf": [
                {
                    "if": {
                        "properties": {
                            "s3_auth_type": { "const": "Access Key" }
                        }
                    },
                    "then": {
                        "required": ["access_key_id", "secret_key"]
                    }
                }
            ]
        }
    },
    "required": ["name", "credentials_type", "credentials"]
}

source_configuration_json_schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Amazon S3 Source Configuration",
    "description": "Configure a source to read data from Amazon S3.",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Source Name",
            "minLength": 1
        },
        "description": {
            "type": "string",
            "title": "Description"
        },
        "data_credentials_id": {
            "type": ["integer", "null"],
            "title": "Credential ID",
            "description": "Select the Amazon S3 credential to use."
        },
        "source_type": {
            "type": "string",
            "const": "s3"
        },
        "source_config": {
            "type": "object",
            "title": "Configuration",
            "properties": {
                "start.cron": {
                    "type": "string",
                    "title": "Check for files",
                    "description": "Cron expression that defines how frequently Nexla scans S3.",
                    "minLength": 1
                },
                "path": {
                    "type": "string",
                    "title": "Root Folder / Bucket",
                    "description": "Bucket or folder path to scan.",
                    "minLength": 1
                },
                "advanced_settings": {
                    "type": "string",
                    "title": "File Processor",
                    "enum": ["Auto Detect", "Custom Text Format", "XML", "JSON"],
                    "default": "Auto Detect"
                }
            },
            "required": ["start.cron", "path"]
        }
    },
    "required": ["name", "data_credentials_id", "source_type", "source_config"]
}

sink_configuration_json_schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Amazon S3 Sink Configuration",
    "description": "Configure a destination to write data to Amazon S3.",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Sink Name",
            "minLength": 1
        },
        "description": {
            "type": "string",
            "title": "Description"
        },
        "data_credentials_id": {
            "type": ["integer", "null"],
            "title": "Credential ID",
            "description": "Select the Amazon S3 credential to use."
        },
        "data_set_id": {
            "type": ["integer", "null"],
            "title": "Nexset ID"
        },
        "sink_type": {
            "type": "string",
            "const": "s3"
        },
        "sink_config": {
            "type": "object",
            "title": "Configuration",
            "properties": {
                "path": {
                    "type": "string",
                    "title": "Path to Write",
                    "minLength": 1
                },
                "data_format": {
                    "type": "string",
                    "title": "Data Format",
                    "enum": ["csv", "tsv", "json", "xml", "xlsx", "parquet"],
                    "default": "json"
                },
                "max.file.size.mb": {
                    "type": "integer",
                    "title": "Maximum File Size (MB)",
                    "default": 4096,
                    "minimum": 1
                }
            },
            "required": ["path", "data_format"]
        }
    },
    "required": ["name", "data_credentials_id", "data_set_id", "sink_type", "sink_config"]
}

Public Python API (proposed)

from typing import Literal, Dict, Any, List, Optional
from nexla import connectors, validation, entities, exceptions

# Discovery
connectors.list_connectors() -> List[connectors.ConnectorMeta]
connectors.get_connector(slug: str) -> connectors.Connector

# Schema access
Connector.get_schema(kind: Literal["data_credentials", "source_configuration", "sink_configuration"]) -> Dict[str, Any]
Connector.get_metadata() -> Dict[str, Any]

# Validation helpers
validation.validate_config(
    connector: str,
    kind: Literal["data_credentials", "source_configuration", "sink_configuration"],
    config: Dict[str, Any],
) -> None  # raises exceptions.ConfigValidationError

# CRUD (these already exist or are planned; validation happens first)
connectors.create_credential(connector: str, config: Dict[str, Any]) -> entities.Credential
connectors.create_source(connector: str, config: Dict[str, Any]) -> entities.Source
connectors.create_sink(connector: str, config: Dict[str, Any]) -> entities.Sink

# Listing by connector (names match the requested helpers)
connectors.get_all_sources(connector: Optional[str] = None) -> List[entities.Source]
connectors.get_all_sinks(connector: Optional[str] = None) -> List[entities.Sink]

Behavior: create_* calls run validate_config(...) first. If the config is invalid, they raise ConfigValidationError with:

  • connector, kind, path (JSON Pointer), message, and context (if available).

Dependency: use the jsonschema package for validation (optionally with referencing for $ref resolution).
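One way validate_config could sit on top of jsonschema is sketched below. The schema is passed in explicitly to keep the snippet self-contained (the real helper would resolve it from the connector slug and kind), and the ConfigValidationError shape is an assumption matching the fields listed above.

```python
# Hedged sketch of validate_config on top of the jsonschema package.
from typing import Any, Dict
from jsonschema import Draft7Validator

class ConfigValidationError(Exception):
    """Carries connector, kind, a JSON-Pointer-style path, and the message."""
    def __init__(self, connector: str, kind: str, path: str, message: str):
        super().__init__(f"{message} at {path}")
        self.connector = connector
        self.kind = kind
        self.path = path
        self.message = message

def validate_config(connector: str, kind: str,
                    config: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Raise ConfigValidationError for the first schema violation, if any."""
    validator = Draft7Validator(schema)
    errors = sorted(validator.iter_errors(config),
                    key=lambda e: list(e.absolute_path))
    if errors:
        err = errors[0]
        # Build a "$.a.b" style path from the error's location in the config.
        parts = [str(p) for p in err.absolute_path]
        pointer = "$." + ".".join(parts) if parts else "$"
        raise ConfigValidationError(connector, kind, pointer, err.message)
```

Sorting by path and reporting the first error keeps messages deterministic; a richer variant could collect all errors, or use jsonschema's best_match heuristic to pick the most relevant one.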

Extensibility: future schemas can be added without breaking the API.


Example usage

from nexla import connectors, validation, exceptions

c = connectors.get_connector("s3")
cred_schema = c.get_schema("data_credentials")

user_cred = {
    "name": "My S3 Credentials",
    "credentials_type": "s3",
    "credentials": {
        "s3_auth_type": "Access Key",
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1"
    }
}

try:
    validation.validate_config("s3", "data_credentials", user_cred)
    cred = connectors.create_credential("s3", user_cred)
except exceptions.ConfigValidationError as e:
    print("Validation failed:", e.message, "at", e.path)

Example error

ConfigValidationError: 'access_key_id' is required when s3_auth_type is 'Access Key' at $.credentials.access_key_id

MCP client flow (target)

  1. nexla.list_connectors → show available connectors.
  2. nexla.get_connector_schema(connector=<slug>, kind="data_credentials") → prompt user for required fields.
  3. nexla.validate_config(...) → show precise errors if any.
  4. nexla.create_credential(...).
  5. Repeat for source and then sink using the respective schemas.
  6. Optionally, nexla.get_all_sources / nexla.get_all_sinks to confirm setup.
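Steps 2–4 of this flow can be sketched as plain Python, with each nexla.* tool injected as a callable so the orchestration is testable outside MCP. Every name here is a hypothetical placeholder for the eventual tool surface.

```python
# Hedged sketch of the guided create flow (schema -> inputs -> validate -> create).
from typing import Any, Callable, Dict

def guided_create(
    get_schema: Callable[[str, str], Dict[str, Any]],
    collect_inputs: Callable[[Dict[str, Any]], Dict[str, Any]],
    validate: Callable[[str, str, Dict[str, Any]], None],
    create: Callable[[str, Dict[str, Any]], Any],
    connector: str,
    kind: str,
) -> Any:
    schema = get_schema(connector, kind)   # step 2: fetch the schema
    config = collect_inputs(schema)        # the agent prompts the user here
    validate(connector, kind, config)      # step 3: raises on invalid input
    return create(connector, config)       # step 4: call the Nexla API
```

Running the same loop with the source and then the sink schemas covers step 5.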

Docs & Generation

  • Auto-generate Markdown reference pages from each schema: titles, descriptions, required/optional tables, enums, examples.
  • Link docs from connector metadata (e.g., logos, display_name).
  • Add a README section describing the schema architecture and how to extend a connector.

Backwards compatibility

  • Default behavior is non-breaking. Validation runs before API calls.
  • If a user relies on previously permissive behavior, allow validate=False overrides on create_* (discouraged, but possible).

Testing & CI

Unit tests:

  • Iterate all connector schema files and meta-validate against the JSON Schema meta-schema.
  • Happy-path and failure-path tests for each connector.
  • Test conditional validation logic (allOf, if/then constructs).
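The meta-validation bullet could look roughly like the helper below, using jsonschema's check_schema to validate each embedded schema against the draft-07 meta-schema. The file layout and key names follow the format shown earlier; the function name is illustrative.

```python
# Hedged sketch of a meta-validation check for one connector schema file.
import json
from pathlib import Path
from jsonschema import Draft7Validator

SCHEMA_KEYS = (
    "data_credentials_json_schema",
    "source_configuration_json_schema",
    "sink_configuration_json_schema",
)

def check_connector_file(path: Path) -> None:
    """Raise jsonschema.SchemaError if any embedded schema is malformed."""
    doc = json.loads(path.read_text(encoding="utf-8"))
    for key in SCHEMA_KEYS:
        # check_schema validates the schema itself against the meta-schema.
        Draft7Validator.check_schema(doc[key])
```

A pytest suite would parametrize this over every *_connector_schema.json file so that a malformed schema fails CI before it ships.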

CI:

  • Lint JSON.
  • Run jsonschema validation.
  • Ensure schemas are importable and versioned.
  • (Optional) Contract tests against sandbox connectors.

Tasks

  • Define schema conventions and author guidelines in CONTRIBUTING.md.
  • Implement schema loader (by connector slug + schema kind).
  • Implement validation.validate_config(...) with rich error types.
  • Wire validation into create_credential, create_source, create_sink.
  • Implement discovery helpers: list_connectors, get_connector, get_all_sources, get_all_sinks.
  • Add first batch of connector schemas (s3_connector_schema.json + 2–3 real connectors).
  • Docs generation from schemas.
  • CLI (optional): nexla connectors list|show|schema|validate.
  • Tests + CI checks for all the above.

Definition of Done

  • Users can discover connectors, view schemas, and validate configs locally.
  • Creating credentials/sources/sinks fails fast with clear errors if configs are invalid.
  • At least one real connector (S3) fully documented/validated by shipped schemas.
  • Docs page(s) generated from schemas and linked in the SDK README.
  • Tests and CI green.
