
Per-connector JSON schemas + config validation & discovery APIs (credentials, source, sink) (support MCP client) #6

@saksham-nexla

Description

Add a schema-driven connector catalog to the Nexla Python SDK

Each supported connector ships with JSON Schema files describing:

  • Data credentials configuration
  • Source configuration
  • Sink configuration (destinations)

plus lightweight metadata.

The SDK will expose discovery + validation helpers (e.g., get_all_sources, get_all_sinks, get_schema(kind), and validate_config(...)). When users create credentials/sources/sinks, the SDK will validate the provided config against the corresponding schema before calling Nexla APIs—surfacing actionable errors early and doubling as human-readable docs. The same schemas will power an MCP client flow to guide users interactively.


Motivation

  • Strong validation before API calls: Fail fast with precise, field-level errors (missing required keys, wrong types, bad enums).
  • Self-documenting connectors: Schemas serve as the single source of truth for required/optional fields.
  • Better UX for IDEs/agents: Clear prompts and autocompletion become possible from structured definitions.
  • MCP integration: Enable a generic agent flow: list connectors → fetch schema → collect inputs → validate → create.

Scope

  • Ship versioned JSON Schemas per connector for data credentials, source, and sink configs, plus metadata.
  • Provide Python APIs to discover connectors, fetch schemas, validate configs, and list entities by connector.
  • Enforce validation in create_credential, create_source, create_sink (opt-out flag for power users if needed).
  • Include tests that meta-validate every schema and exercise typical/edge configs.
  • Optional (nice-to-have): simple CLI to list connectors, show schemas, and validate a config file.

Proposed file layout (SDK package)

nexla_sdk/
  connectors/
    s3_connector_schema.json
    example_connector_schema.json
    <connector_slug>_connector_schema.json
...

Connector schema file format:

{
    "name": "s3",
    "display_name": "Amazon S3",
    "config": {
        "isSource": true,
        "isSink": true,
        "connectionCategory": "file",
        "industryCategory": "File Systems"
    },
    "small_logo": "https://cdn.nexla.io/ui/assets/data-sources/s3.png",
    "logo": "https://cdn.nexla.io/ui/assets/data-sources/s3.png",
    "connection_type": "file",
    "data_credentials_json_schema": { /* JSON Schema for credentials */ },
    "source_configuration_json_schema": { /* JSON Schema for source config */ },
    "sink_configuration_json_schema": { /* JSON Schema for sink/destination config */ }
}
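A loader for these files could be a thin wrapper around the slug-to-filename convention above. This is a hedged sketch: the function name, the default directory, and the error type are assumptions, not the shipped SDK API.

```python
# Hypothetical schema loader: resolves a connector slug to its bundled
# <slug>_connector_schema.json file and returns the parsed document
# (metadata plus the three embedded JSON Schemas).
import json
from pathlib import Path

# Illustrative default location of the bundled schema files in the package.
SCHEMA_DIR = Path("nexla_sdk") / "connectors"

def load_connector_schema(slug: str, base_dir: Path = SCHEMA_DIR) -> dict:
    """Return the full connector document for the given slug."""
    path = base_dir / f"{slug}_connector_schema.json"
    if not path.is_file():
        raise FileNotFoundError(f"No schema bundled for connector '{slug}'")
    return json.loads(path.read_text(encoding="utf-8"))
```

Keeping the lookup purely convention-based (slug plus a fixed suffix) means new connectors are discovered by dropping a file into the directory, with no registry to update.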

Schema conventions

  • Use JSON Schema draft-07 ("$schema": "http://json-schema.org/draft-07/schema#").
  • Include "$id" per schema; keep title, description, and examples.
  • Prefer "additionalProperties": false for strictness.
  • Document enums, formats, and constraints (regex for IDs, ranges, etc.).
  • Use conditional validation with allOf, if, then for complex requirements.

Example schema snippets

data_credentials_json_schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Amazon S3 Data Credentials",
    "description": "Provide authentication information to allow Nexla to access your Amazon S3 buckets.",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Credential Name",
            "minLength": 1
        },
        "description": {
            "type": "string",
            "title": "Description"
        },
        "credentials_type": {
            "type": "string",
            "const": "s3"
        },
        "credentials": {
            "type": "object",
            "title": "Credential Details",
            "properties": {
                "s3_auth_type": {
                    "type": "string",
                    "title": "Authenticate Using",
                    "description": "Select the AWS authentication mechanism you want to use.",
                    "enum": ["Access Key", "ARN", "Instance Role"],
                    "default": "Access Key"
                },
                "access_key_id": {
                    "type": "string",
                    "title": "AWS Access Key"
                },
                "secret_key": {
                    "type": "string",
                    "title": "AWS Secret Key",
                    "minLength": 1
                },
                "region": {
                    "type": "string",
                    "title": "AWS Region",
                    "default": "us-east-1",
                    "enum": ["us-east-2", "us-east-1", "us-west-1", "us-west-2"]
                }
            },
            "required": ["s3_auth_type"],
            "allOf": [
                {
                    "if": {
                        "properties": {
                            "s3_auth_type": { "const": "Access Key" }
                        }
                    },
                    "then": {
                        "required": ["access_key_id", "secret_key"]
                    }
                }
            ]
        }
    },
    "required": ["name", "credentials_type", "credentials"]
}

source_configuration_json_schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Amazon S3 Source Configuration",
    "description": "Configure a source to read data from Amazon S3.",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Source Name",
            "minLength": 1
        },
        "description": {
            "type": "string",
            "title": "Description"
        },
        "data_credentials_id": {
            "type": ["integer", "null"],
            "title": "Credential ID",
            "description": "Select the Amazon S3 credential to use."
        },
        "source_type": {
            "type": "string",
            "const": "s3"
        },
        "source_config": {
            "type": "object",
            "title": "Configuration",
            "properties": {
                "start.cron": {
                    "type": "string",
                    "title": "Check for files",
                    "description": "Cron expression that defines how frequently Nexla scans S3.",
                    "minLength": 1
                },
                "path": {
                    "type": "string",
                    "title": "Root Folder / Bucket",
                    "description": "Bucket or folder path to scan.",
                    "minLength": 1
                },
                "advanced_settings": {
                    "type": "string",
                    "title": "File Processor",
                    "enum": ["Auto Detect", "Custom Text Format", "XML", "JSON"],
                    "default": "Auto Detect"
                }
            },
            "required": ["start.cron", "path"]
        }
    },
    "required": ["name", "data_credentials_id", "source_type", "source_config"]
}

sink_configuration_json_schema

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Amazon S3 Sink Configuration",
    "description": "Configure a destination to write data to Amazon S3.",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Sink Name",
            "minLength": 1
        },
        "description": {
            "type": "string",
            "title": "Description"
        },
        "data_credentials_id": {
            "type": ["integer", "null"],
            "title": "Credential ID",
            "description": "Select the Amazon S3 credential to use."
        },
        "data_set_id": {
            "type": ["integer", "null"],
            "title": "Nexset ID"
        },
        "sink_type": {
            "type": "string",
            "const": "s3"
        },
        "sink_config": {
            "type": "object",
            "title": "Configuration",
            "properties": {
                "path": {
                    "type": "string",
                    "title": "Path to Write",
                    "minLength": 1
                },
                "data_format": {
                    "type": "string",
                    "title": "Data Format",
                    "enum": ["csv", "tsv", "json", "xml", "xlsx", "parquet"],
                    "default": "json"
                },
                "max.file.size.mb": {
                    "type": "integer",
                    "title": "Maximum File Size (MB)",
                    "default": 4096,
                    "minimum": 1
                }
            },
            "required": ["path", "data_format"]
        }
    },
    "required": ["name", "data_credentials_id", "data_set_id", "sink_type", "sink_config"]
}

Public Python API (proposed)

from typing import Literal, Dict, Any, List, Optional
from nexla import connectors, validation, entities, exceptions

# Discovery
connectors.list_connectors() -> List[connectors.ConnectorMeta]
connectors.get_connector(slug: str) -> connectors.Connector

# Schema access
Connector.get_schema(kind: Literal["data_credentials", "source_configuration", "sink_configuration"]) -> Dict[str, Any]
Connector.get_metadata() -> Dict[str, Any]

# Validation helpers
validation.validate_config(
    connector: str,
    kind: Literal["data_credentials", "source_configuration", "sink_configuration"],
    config: Dict[str, Any],
) -> None  # raises exceptions.ConfigValidationError

# CRUD (these already exist or are planned; validation happens first)
connectors.create_credential(connector: str, config: Dict[str, Any]) -> entities.Credential
connectors.create_source(connector: str, config: Dict[str, Any]) -> entities.Source
connectors.create_sink(connector: str, config: Dict[str, Any]) -> entities.Sink

# Listing by connector (names match the requested helpers)
connectors.get_all_sources(connector: Optional[str] = None) -> List[entities.Source]
connectors.get_all_sinks(connector: Optional[str] = None) -> List[entities.Sink]

Behavior: create_* calls run validate_config(...) first. If the config is invalid, they raise ConfigValidationError with:

  • connector, kind, path (JSON Pointer), message, and context (if available).

Dependency: use the jsonschema package for validation (optionally with referencing for $ref resolution).
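One way validate_config could sit on top of jsonschema is sketched below. The schema is passed in explicitly to keep the snippet self-contained (the real helper would resolve it from the connector slug and kind), and the ConfigValidationError shape is an assumption matching the fields listed above.

```python
# Hedged sketch of validate_config on top of the jsonschema package.
from typing import Any, Dict
from jsonschema import Draft7Validator

class ConfigValidationError(Exception):
    """Carries connector, kind, a JSON-Pointer-style path, and the message."""
    def __init__(self, connector: str, kind: str, path: str, message: str):
        super().__init__(f"{message} at {path}")
        self.connector = connector
        self.kind = kind
        self.path = path
        self.message = message

def validate_config(connector: str, kind: str,
                    config: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Raise ConfigValidationError for the first schema violation, if any."""
    validator = Draft7Validator(schema)
    errors = sorted(validator.iter_errors(config),
                    key=lambda e: list(e.absolute_path))
    if errors:
        err = errors[0]
        # Build a "$.a.b" style path from the error's location in the config.
        parts = [str(p) for p in err.absolute_path]
        pointer = "$." + ".".join(parts) if parts else "$"
        raise ConfigValidationError(connector, kind, pointer, err.message)
```

Sorting by path and reporting the first error keeps messages deterministic; a richer variant could collect all errors, or use jsonschema's best_match heuristic to pick the most relevant one.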

Extensibility: future schemas can be added without breaking the API.


Example usage

from nexla import connectors, validation, exceptions

c = connectors.get_connector("s3")
cred_schema = c.get_schema("data_credentials")

user_cred = {
    "name": "My S3 Credentials",
    "credentials_type": "s3",
    "credentials": {
        "s3_auth_type": "Access Key",
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1"
    }
}

try:
    validation.validate_config("s3", "data_credentials", user_cred)
    cred = connectors.create_credential("s3", user_cred)
except exceptions.ConfigValidationError as e:
    print("Validation failed:", e.message, "at", e.path)

Example error

ConfigValidationError: 'access_key_id' is required when s3_auth_type is 'Access Key' at $.credentials.access_key_id

MCP client flow (target)

  1. nexla.list_connectors → show available connectors.
  2. nexla.get_connector_schema(connector=<slug>, kind="data_credentials") → prompt user for required fields.
  3. nexla.validate_config(...) → show precise errors if any.
  4. nexla.create_credential(...).
  5. Repeat for source and then sink using the respective schemas.
  6. Optionally, nexla.get_all_sources / nexla.get_all_sinks to confirm setup.
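Steps 2–4 of this flow can be sketched as plain Python, with each nexla.* tool injected as a callable so the orchestration is testable outside MCP. Every name here is a hypothetical placeholder for the eventual tool surface.

```python
# Hedged sketch of the guided create flow (schema -> inputs -> validate -> create).
from typing import Any, Callable, Dict

def guided_create(
    get_schema: Callable[[str, str], Dict[str, Any]],
    collect_inputs: Callable[[Dict[str, Any]], Dict[str, Any]],
    validate: Callable[[str, str, Dict[str, Any]], None],
    create: Callable[[str, Dict[str, Any]], Any],
    connector: str,
    kind: str,
) -> Any:
    schema = get_schema(connector, kind)   # step 2: fetch the schema
    config = collect_inputs(schema)        # the agent prompts the user here
    validate(connector, kind, config)      # step 3: raises on invalid input
    return create(connector, config)       # step 4: call the Nexla API
```

Running the same loop with the source and then the sink schemas covers step 5.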

Docs & Generation

  • Auto-generate Markdown reference pages from each schema: titles, descriptions, required/optional tables, enums, examples.
  • Link docs from connector metadata (e.g., logos, display_name).
  • Add a README section describing the schema architecture and how to extend a connector.

Backwards compatibility

  • Default behavior is non-breaking. Validation runs before API calls.
  • If a user relies on previously permissive behavior, allow validate=False overrides on create_* (discouraged, but possible).

Testing & CI

Unit tests:

  • Iterate all connector schema files and meta-validate against the JSON Schema meta-schema.
  • Happy-path and failure-path tests for each connector.
  • Test conditional validation logic (allOf, if/then constructs).
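The meta-validation bullet could look roughly like the helper below, using jsonschema's check_schema to validate each embedded schema against the draft-07 meta-schema. The file layout and key names follow the format shown earlier; the function name is illustrative.

```python
# Hedged sketch of a meta-validation check for one connector schema file.
import json
from pathlib import Path
from jsonschema import Draft7Validator

SCHEMA_KEYS = (
    "data_credentials_json_schema",
    "source_configuration_json_schema",
    "sink_configuration_json_schema",
)

def check_connector_file(path: Path) -> None:
    """Raise jsonschema.SchemaError if any embedded schema is malformed."""
    doc = json.loads(path.read_text(encoding="utf-8"))
    for key in SCHEMA_KEYS:
        # check_schema validates the schema itself against the meta-schema.
        Draft7Validator.check_schema(doc[key])
```

A pytest suite would parametrize this over every *_connector_schema.json file so that a malformed schema fails CI before it ships.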

CI:

  • Lint JSON.
  • Run jsonschema validation.
  • Ensure schemas are importable and versioned.
  • (Optional) Contract tests against sandbox connectors.

Tasks

  • Define schema conventions and author guidelines in CONTRIBUTING.md.
  • Implement schema loader (by connector slug + schema kind).
  • Implement validation.validate_config(...) with rich error types.
  • Wire validation into create_credential, create_source, create_sink.
  • Implement discovery helpers: list_connectors, get_connector, get_all_sources, get_all_sinks.
  • Add first batch of connector schemas (s3_connector_schema.json + 2–3 real connectors).
  • Docs generation from schemas.
  • CLI (optional): nexla connectors list|show|schema|validate.
  • Tests + CI checks for all the above.

Definition of Done

  • Users can discover connectors, view schemas, and validate configs locally.
  • Creating credentials/sources/sinks fails fast with clear errors if configs are invalid.
  • At least one real connector (S3) fully documented/validated by shipped schemas.
  • Docs page(s) generated from schemas and linked in the SDK README.
  • Tests and CI green.
