Add a schema-driven connector catalog to the Nexla Python SDK
Each supported connector ships with a JSON file whose embedded JSON Schemas describe:
- Data credentials configuration
- Source configuration
- Sink configuration (destinations)
plus lightweight metadata.
The SDK will expose discovery + validation helpers (e.g., get_all_sources, get_all_sinks, get_schema(kind), and validate_config(...)). When users create credentials/sources/sinks, the SDK will validate the provided config against the corresponding schema before calling Nexla APIs—surfacing actionable errors early and doubling as human-readable docs. The same schemas will power an MCP client flow to guide users interactively.
Motivation
- Strong validation before API calls: Fail fast with precise, field-level errors (missing required keys, wrong types, bad enums).
- Self-documenting connectors: Schemas serve as the single source of truth for required/optional fields.
- Better UX for IDEs/agents: Clear prompts and autocompletion become possible from structured definitions.
- MCP integration: Enable a generic agent flow: list connectors → fetch schema → collect inputs → validate → create.
Scope
- Ship versioned JSON Schemas per connector for data credentials, source, and sink configs, plus metadata.
- Provide Python APIs to discover connectors, fetch schemas, validate configs, and list entities by connector.
- Enforce validation in create_credential, create_source, create_sink (opt-out flag for power users if needed).
- Include tests that meta-validate every schema and exercise typical/edge configs.
- Optional (nice-to-have): simple CLI to list connectors, show schemas, and validate a config file.
Proposed file layout (SDK package)
nexla_sdk/
  connectors/
    s3_connector_schema.json
    example_connector_schema.json
    <connector_slug>_connector_schema.json
    ...
Connector schema file format:
{
  "name": "s3",
  "display_name": "Amazon S3",
  "config": {
    "isSource": true,
    "isSink": true,
    "connectionCategory": "file",
    "industryCategory": "File Systems"
  },
  "small_logo": "https://cdn.nexla.io/ui/assets/data-sources/s3.png",
  "logo": "https://cdn.nexla.io/ui/assets/data-sources/s3.png",
  "connection_type": "file",
  "data_credentials_json_schema": { /* JSON Schema for credentials */ },
  "source_configuration_json_schema": { /* JSON Schema for source config */ },
  "sink_configuration_json_schema": { /* JSON Schema for sink/destination config */ }
}
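A minimal sketch of the schema loader implied by this file format, assuming the files sit in a directory passed in by the caller. The names load_connector, get_schema, and SCHEMA_KEYS are illustrative and not part of any existing SDK API; a packaged SDK would more likely read the files through importlib.resources.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Maps the public "kind" names to the top-level keys of a connector file.
SCHEMA_KEYS = {
    "data_credentials": "data_credentials_json_schema",
    "source_configuration": "source_configuration_json_schema",
    "sink_configuration": "sink_configuration_json_schema",
}


def load_connector(slug: str, base_dir: Path) -> Dict[str, Any]:
    """Read <slug>_connector_schema.json from base_dir and return the parsed document."""
    path = Path(base_dir) / f"{slug}_connector_schema.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def get_schema(connector_doc: Dict[str, Any], kind: str) -> Dict[str, Any]:
    """Return the embedded JSON Schema for one kind, or raise ValueError."""
    if kind not in SCHEMA_KEYS:
        raise ValueError(f"unknown schema kind: {kind!r}")
    key = SCHEMA_KEYS[kind]
    if key not in connector_doc:
        raise ValueError(f"connector {connector_doc.get('name')!r} has no {key!r}")
    return connector_doc[key]
```

Keeping the kind-to-key mapping in one table means a new schema kind only needs a new entry, which fits the extensibility goal.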
Schema conventions
- Use JSON Schema draft-07 ("$schema": "http://json-schema.org/draft-07/schema#").
- Include "$id" per schema; keep title, description, and examples.
- Prefer "additionalProperties": false for strictness.
- Document enums, formats, and constraints (regex for IDs, ranges, etc.).
- Use conditional validation with allOf, if, then for complex requirements.
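Some of these conventions could be checked mechanically. The sketch below is one possible lint pass, not an agreed design: check_conventions is a hypothetical helper, and which conventions count as hard failures is an assumption.

```python
from typing import Any, Dict, List

DRAFT_07 = "http://json-schema.org/draft-07/schema#"


def check_conventions(schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return human-readable convention violations (an empty list if none)."""
    problems: List[str] = []
    where = path or "<root>"
    if path == "":
        # Root-level expectations: dialect, identifier, and documentation fields.
        if schema.get("$schema") != DRAFT_07:
            problems.append(f"{where}: '$schema' should be {DRAFT_07}")
        for key in ("$id", "title", "description"):
            if key not in schema:
                problems.append(f"{where}: missing {key!r}")
    if schema.get("type") == "object" and "properties" in schema:
        if schema.get("additionalProperties") is not False:
            problems.append(f"{where}: consider 'additionalProperties': false")
        # Recurse into nested property schemas.
        for name, sub in schema["properties"].items():
            if isinstance(sub, dict):
                problems.extend(check_conventions(sub, f"{path}/properties/{name}"))
    return problems
```

The function returns messages instead of raising so that a CI job can report every violation in one run.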
Example schema snippets
data_credentials_json_schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Amazon S3 Data Credentials",
  "description": "Provide authentication information to allow Nexla to access your Amazon S3 buckets.",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "title": "Credential Name",
      "minLength": 1
    },
    "description": {
      "type": "string",
      "title": "Description"
    },
    "credentials_type": {
      "type": "string",
      "const": "s3"
    },
    "credentials": {
      "type": "object",
      "title": "Credential Details",
      "properties": {
        "s3_auth_type": {
          "type": "string",
          "title": "Authenticate Using",
          "description": "Select the AWS authentication mechanism you want to use.",
          "enum": ["Access Key", "ARN", "Instance Role"],
          "default": "Access Key"
        },
        "access_key_id": {
          "type": "string",
          "title": "AWS Access Key"
        },
        "secret_key": {
          "type": "string",
          "title": "AWS Secret Key",
          "minLength": 1
        },
        "region": {
          "type": "string",
          "title": "AWS Region",
          "default": "us-east-1",
          "enum": ["us-east-2", "us-east-1", "us-west-1", "us-west-2"]
        }
      },
      "required": ["s3_auth_type"],
      "allOf": [
        {
          "if": {
            "properties": {
              "s3_auth_type": { "const": "Access Key" }
            }
          },
          "then": {
            "required": ["access_key_id", "secret_key"]
          }
        }
      ]
    }
  },
  "required": ["name", "credentials_type", "credentials"]
}
source_configuration_json_schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Amazon S3 Source Configuration",
  "description": "Configure a source to read data from Amazon S3.",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "title": "Source Name",
      "minLength": 1
    },
    "description": {
      "type": "string",
      "title": "Description"
    },
    "data_credentials_id": {
      "type": ["integer", "null"],
      "title": "Credential ID",
      "description": "Select the Amazon S3 credential to use."
    },
    "source_type": {
      "type": "string",
      "const": "s3"
    },
    "source_config": {
      "type": "object",
      "title": "Configuration",
      "properties": {
        "start.cron": {
          "type": "string",
          "title": "Check for files",
          "description": "Cron expression that defines how frequently Nexla scans S3.",
          "minLength": 1
        },
        "path": {
          "type": "string",
          "title": "Root Folder / Bucket",
          "description": "Bucket or folder path to scan.",
          "minLength": 1
        },
        "advanced_settings": {
          "type": "string",
          "title": "File Processor",
          "enum": ["Auto Detect", "Custom Text Format", "XML", "JSON"],
          "default": "Auto Detect"
        }
      },
      "required": ["start.cron", "path"]
    }
  },
  "required": ["name", "data_credentials_id", "source_type", "source_config"]
}
sink_configuration_json_schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Amazon S3 Sink Configuration",
  "description": "Configure a destination to write data to Amazon S3.",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "title": "Sink Name",
      "minLength": 1
    },
    "description": {
      "type": "string",
      "title": "Description"
    },
    "data_credentials_id": {
      "type": ["integer", "null"],
      "title": "Credential ID",
      "description": "Select the Amazon S3 credential to use."
    },
    "data_set_id": {
      "type": ["integer", "null"],
      "title": "Nexset ID"
    },
    "sink_type": {
      "type": "string",
      "const": "s3"
    },
    "sink_config": {
      "type": "object",
      "title": "Configuration",
      "properties": {
        "path": {
          "type": "string",
          "title": "Path to Write",
          "minLength": 1
        },
        "data_format": {
          "type": "string",
          "title": "Data Format",
          "enum": ["csv", "tsv", "json", "xml", "xlsx", "parquet"],
          "default": "json"
        },
        "max.file.size.mb": {
          "type": "integer",
          "title": "Maximum File Size (MB)",
          "default": 4096,
          "minimum": 1
        }
      },
      "required": ["path", "data_format"]
    }
  },
  "required": ["name", "data_credentials_id", "data_set_id", "sink_type", "sink_config"]
}
Public Python API (proposed)
from typing import Literal, Dict, Any, List, Optional
from nexla_sdk import connectors, validation, entities, exceptions

# Discovery
connectors.list_connectors() -> List[connectors.ConnectorMeta]
connectors.get_connector(slug: str) -> connectors.Connector

# Schema access (methods on connectors.Connector)
Connector.get_schema(kind: Literal["data_credentials", "source_configuration", "sink_configuration"]) -> Dict[str, Any]
Connector.get_metadata() -> Dict[str, Any]

# Validation helpers
validation.validate_config(
    connector: str,
    kind: Literal["data_credentials", "source_configuration", "sink_configuration"],
    config: Dict[str, Any],
) -> None  # raises exceptions.ConfigValidationError

# CRUD (these already exist or are planned; validation happens first)
connectors.create_credential(connector: str, config: Dict[str, Any]) -> entities.Credential
connectors.create_source(connector: str, config: Dict[str, Any]) -> entities.Source
connectors.create_sink(connector: str, config: Dict[str, Any]) -> entities.Sink

# Listing by connector (names aligned with user ask)
connectors.get_all_sources(connector: Optional[str] = None) -> List[entities.Source]
connectors.get_all_sinks(connector: Optional[str] = None) -> List[entities.Sink]
Behavior: create_* calls first do validate_config(...). If invalid, raise ConfigValidationError with:
- connector, kind, path (JSON Pointer), message, and context (if available).
Dependency: use jsonschema for validation (with its companion package referencing for $ref resolution, if schemas start sharing definitions).
Extensibility: future schemas can be added without breaking the API.
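The behavior described above could be sketched with the jsonschema package roughly as follows. This is an illustration under assumptions, not the final design. The schema is passed in explicitly, whereas the proposed validate_config would look it up by connector and kind. The error class is a stand-in for exceptions.ConfigValidationError, and its context field here simply carries any further error messages.

```python
from typing import Any, Dict, List

from jsonschema import Draft7Validator


class ConfigValidationError(Exception):
    """Stand-in error type carrying connector, kind, path, message, and context."""

    def __init__(self, connector: str, kind: str, path: str, message: str,
                 context: List[str]) -> None:
        super().__init__(f"{message} at {path or '/'}")
        self.connector = connector
        self.kind = kind
        self.path = path
        self.message = message
        self.context = context


def validate_config(connector: str, kind: str, config: Dict[str, Any],
                    schema: Dict[str, Any]) -> None:
    """Raise ConfigValidationError for the first schema violation, if any."""
    errors = list(Draft7Validator(schema).iter_errors(config))
    if not errors:
        return
    first = errors[0]
    # absolute_path holds the keys/indexes leading to the offending value
    # (no escaping of special characters in this sketch).
    pointer = "".join(f"/{part}" for part in first.absolute_path)
    raise ConfigValidationError(
        connector=connector,
        kind=kind,
        path=pointer,
        message=first.message,
        context=[e.message for e in errors[1:]],
    )


# Trimmed-down version of the S3 credentials schema, for demonstration only.
demo_schema = {
    "type": "object",
    "properties": {
        "credentials": {
            "type": "object",
            "properties": {
                "s3_auth_type": {"enum": ["Access Key", "ARN", "Instance Role"]}
            },
            "required": ["s3_auth_type"],
            "allOf": [{
                "if": {"properties": {"s3_auth_type": {"const": "Access Key"}}},
                "then": {"required": ["access_key_id", "secret_key"]},
            }],
        }
    },
    "required": ["credentials"],
}

try:
    validate_config("s3", "data_credentials",
                    {"credentials": {"s3_auth_type": "Access Key"}}, demo_schema)
except ConfigValidationError as err:
    print(err.path, "|", err.message)
```

For a missing required key, jsonschema reports the path of the enclosing object rather than of the absent key, so an SDK wanting field-level paths would need to append the key name itself.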
Example usage
from nexla_sdk import connectors, validation, exceptions

c = connectors.get_connector("s3")
cred_schema = c.get_schema("data_credentials")

user_cred = {
    "name": "My S3 Credentials",
    "credentials_type": "s3",
    "credentials": {
        "s3_auth_type": "Access Key",
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1"
    }
}

try:
    # Explicit pre-check; create_credential runs the same validation again.
    validation.validate_config("s3", "data_credentials", user_cred)
    cred = connectors.create_credential("s3", user_cred)
except exceptions.ConfigValidationError as e:
    print("Validation failed:", e.message, "at", e.path)
Example error
ConfigValidationError: 'access_key_id' is required when s3_auth_type is 'Access Key' at /credentials/access_key_id
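Since the error path is specified as a JSON Pointer, and several config keys in the example schemas contain dots (start.cron, max.file.size.mb), a dotted path notation would be ambiguous. A small helper along these lines could build the pointer per RFC 6901; the function name is illustrative only.

```python
from typing import Iterable, Union


def to_json_pointer(parts: Iterable[Union[str, int]]) -> str:
    """Build an RFC 6901 JSON Pointer from a sequence of keys and array indexes."""
    tokens = []
    for part in parts:
        # '~' must be escaped before '/', otherwise the '~' introduced by
        # the '/' -> '~1' replacement would itself be escaped again.
        token = str(part).replace("~", "~0").replace("/", "~1")
        tokens.append("/" + token)
    return "".join(tokens)


print(to_json_pointer(["source_config", "start.cron"]))  # /source_config/start.cron
print(to_json_pointer(["a/b", 0, "c~d"]))                 # /a~1b/0/c~0d
```

An empty sequence yields the empty string, which in RFC 6901 refers to the whole document.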
MCP client flow (target)
1. nexla.list_connectors → show available connectors.
2. nexla.get_connector_schema(connector=<slug>, kind="data_credentials") → prompt user for required fields.
3. nexla.validate_config(...) → show precise errors if any.
4. nexla.create_credential(...).
5. Repeat for source and then sink using the respective schemas.
6. Optionally, call nexla.get_all_sources / nexla.get_all_sinks to confirm setup.
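The "prompt user for required fields" step could be driven directly from a schema. The sketch below flattens required fields into prompt descriptors. required_prompts and the descriptor keys are hypothetical. It ignores conditional requirements declared under allOf/if/then, which a real agent flow would have to re-check after each answer.

```python
from typing import Any, Dict, List


def required_prompts(schema: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten the required fields of an object schema into prompt descriptors."""
    prompts: List[Dict[str, Any]] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        sub = properties.get(name, {})
        path = f"{prefix}/{name}"
        if sub.get("type") == "object":
            # Nested objects contribute their own required fields.
            prompts.extend(required_prompts(sub, path))
        elif "const" in sub:
            # Constants need no user input; the agent can fill them in.
            continue
        else:
            prompts.append({
                "path": path,
                "label": sub.get("title", name),
                "choices": sub.get("enum"),
                "default": sub.get("default"),
            })
    return prompts
```

Applied to the S3 credentials schema, this would ask for the credential name and the authentication type, and skip credentials_type because it is a constant.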
Docs & Generation
- Auto-generate Markdown reference pages from each schema: titles, descriptions, required/optional tables, enums, examples.
- Link docs from connector metadata (e.g., logos, display_name).
- Add a README section describing the schema architecture and how to extend a connector.
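As a rough illustration of the doc generation idea, a schema's top-level properties can be rendered into a required/optional table with a few lines of code. The function name and column layout are assumptions; nested objects and examples are left out of this sketch.

```python
from typing import Any, Dict


def schema_to_markdown(schema: Dict[str, Any]) -> str:
    """Render the top-level properties of an object schema as a Markdown table."""
    required = set(schema.get("required", []))
    lines = [
        f"## {schema.get('title', 'Untitled schema')}",
        "",
        schema.get("description", ""),
        "",
        "| Field | Type | Required | Allowed values | Default |",
        "| --- | --- | --- | --- | --- |",
    ]
    for name, sub in schema.get("properties", {}).items():
        type_ = sub.get("type", "")
        if isinstance(type_, list):  # e.g. ["integer", "null"]
            type_ = " or ".join(type_)
        enum = ", ".join(str(v) for v in sub.get("enum", []))
        default = sub.get("default", "")
        lines.append(
            f"| `{name}` | {type_} | {'yes' if name in required else 'no'} "
            f"| {enum} | {default} |"
        )
    return "\n".join(lines)
```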
Backwards compatibility
- Default behavior is non-breaking. Validation runs before API calls.
- If a user relies on previously permissive behavior, allow validate=False overrides on create_* (discouraged, but possible).
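One way the opt-out could be wired is sketched below. It assumes validation and the API call are injected as callables. create_with_validation is a hypothetical helper, not an existing SDK function.

```python
from typing import Any, Callable, Dict


def create_with_validation(
    config: Dict[str, Any],
    validate_fn: Callable[[Dict[str, Any]], None],
    api_call: Callable[[Dict[str, Any]], Dict[str, Any]],
    validate: bool = True,
) -> Dict[str, Any]:
    """Run local validation (unless disabled) and only then call the API."""
    if validate:
        # validate_fn is expected to raise on an invalid config,
        # so api_call is never reached in that case.
        validate_fn(config)
    return api_call(config)


def reject_empty(config: Dict[str, Any]) -> None:
    """Toy validator standing in for schema validation."""
    if not config:
        raise ValueError("config must not be empty")


def fake_api(config: Dict[str, Any]) -> Dict[str, Any]:
    """Toy API call that echoes the config back with an id."""
    return {"id": 1, **config}


print(create_with_validation({"name": "demo"}, reject_empty, fake_api))
print(create_with_validation({}, reject_empty, fake_api, validate=False))
```

Keeping validate=True as the default means the permissive path is always an explicit choice at the call site.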
Testing & CI
Unit tests:
- Iterate all connector schema files and meta-validate against the JSON Schema meta-schema.
- Happy-path and failure-path tests for each connector.
- Test conditional validation logic (allOf, if/then constructs).
CI:
- Lint JSON.
- Run jsonschema validation.
- Ensure schemas are importable and versioned.
- (Optional) Contract tests against sandbox connectors.
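A catalog-wide check in the spirit of the items above might look like the following sketch. The rule that isSource/isSink flags imply the presence of the matching schema is an assumption drawn from the example file format, and find_catalog_problems is a hypothetical name.

```python
import json
from pathlib import Path
from typing import List

# Assumed link between metadata flags and the schema each one implies.
ROLE_SCHEMAS = {
    "isSource": "source_configuration_json_schema",
    "isSink": "sink_configuration_json_schema",
}


def find_catalog_problems(directory: Path) -> List[str]:
    """Check every *_connector_schema.json file and return a list of problems."""
    problems: List[str] = []
    for path in sorted(Path(directory).glob("*_connector_schema.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(doc, dict):
            problems.append(f"{path.name}: top level must be an object")
            continue
        for key in ("name", "display_name", "data_credentials_json_schema"):
            if key not in doc:
                problems.append(f"{path.name}: missing {key!r}")
        flags = doc.get("config", {})
        for flag, key in ROLE_SCHEMAS.items():
            if flags.get(flag) and key not in doc:
                problems.append(f"{path.name}: {flag} is true but {key!r} is missing")
    return problems
```

A unit test could assert that this returns an empty list for the packaged directory. Meta-validation proper could then call jsonschema's Draft7Validator.check_schema on each embedded schema.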
Tasks
- Define schema conventions and author guidelines in CONTRIBUTING.md.
- Implement schema loader (by connector slug + schema kind).
- Implement validation.validate_config(...) with rich error types.
- Wire validation into create_credential, create_source, create_sink.
- Implement discovery helpers: list_connectors, get_connector, get_all_sources, get_all_sinks.
- Add first batch of connector schemas (s3_connector_schema.json + 2–3 real connectors).
- Docs generation from schemas.
- CLI (optional): nexla connectors list|show|schema|validate.
- Tests + CI checks for all the above.
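For the optional CLI task, an argparse skeleton could look like the sketch below. It covers only the list and schema subcommands and uses an in-memory stand-in catalog. The --kind flag and the exit codes are assumptions, not a settled interface.

```python
import argparse
import json
from typing import Any, Dict, List, Optional

# Stand-in catalog; a real CLI would load the packaged connector files.
CATALOG: Dict[str, Dict[str, Any]] = {
    "s3": {
        "display_name": "Amazon S3",
        "data_credentials_json_schema": {"type": "object", "required": ["name"]},
    }
}


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for `nexla connectors list` and `nexla connectors schema`."""
    parser = argparse.ArgumentParser(prog="nexla")
    groups = parser.add_subparsers(dest="group", required=True)
    connectors = groups.add_parser("connectors")
    commands = connectors.add_subparsers(dest="command", required=True)
    commands.add_parser("list")
    schema = commands.add_parser("schema")
    schema.add_argument("slug")
    schema.add_argument("--kind", default="data_credentials")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Run the CLI and return a process exit code."""
    args = build_parser().parse_args(argv)
    if args.command == "list":
        for slug, doc in sorted(CATALOG.items()):
            print(f"{slug}\t{doc['display_name']}")
        return 0
    doc = CATALOG.get(args.slug)
    key = f"{args.kind}_json_schema"
    if doc is None or key not in doc:
        print(f"no {args.kind} schema for connector {args.slug!r}")
        return 1
    print(json.dumps(doc[key], indent=2))
    return 0
```

Returning an exit code from main, rather than calling sys.exit inside it, keeps the commands easy to exercise from unit tests.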
Definition of Done
- Users can discover connectors, view schemas, and validate configs locally.
- Creating credentials/sources/sinks fails fast with clear errors if configs are invalid.
- At least one real connector (S3) fully documented/validated by shipped schemas.
- Docs page(s) generated from schemas and linked in the SDK README.
- Tests and CI green.