refactor(wren): drop ibis for canner/postgres/mysql/mssql/trino/clickhouse/athena (combined) by goldmedal · Pull Request #2313 · Canner/WrenAI

goldmedal · 2026-05-21T05:23:41Z

Summary

Consolidates seven open PRs that each migrate one core/wren connector off ibis-framework to its native driver. They were merged into a single feature branch so conflicts on shared files (data_source.py, pyproject.toml, uv.lock, factory.py, tests/conftest.py, justfile) only had to be resolved once.

Bundled PRs:

refactor(canner): use psycopg native driver, drop ibis dependency (#2269)
refactor(trino): use trino native driver, drop ibis dependency (#2270)
refactor(athena): use pyathena native driver, drop ibis dependency (#2271)
refactor(postgres): use psycopg native driver, drop ibis dependency (#2272)
refactor(mysql): use MySQLdb native driver, drop ibis dependency (#2273)
refactor(mssql): use pyodbc native driver, drop ibis dependency (#2274)
refactor(clickhouse): use clickhouse-connect native driver, drop ibis dependency (#2275)

All seven were merged with --no-ff to preserve their individual commit history.

Conflict resolutions

core/wren/src/wren/model/data_source.py — unified the return-type alias to BackendOrConnection = Union[BaseBackend, "PyAthenaConnection", "MySQLdb.Connection", "psycopg.Connection"]; consolidated TYPE_CHECKING imports; kept the URL-dispatch fast-paths added by mysql/doris and trino PRs; dropped the now-orphaned import ssl, SSLMode, and _create_ssl_context helper (mysql PR moved SSL handling into wren.connector.mysql); dropped the orphaned import boto3 (athena PR moved STS into wren.connector.athena).
core/wren/src/wren/connector/factory.py — every connector now resolves to its own module (athena/clickhouse/trino out of wren.connector.ibis); _NEEDS_DATA_SOURCE trimmed to {mysql, doris, trino}.
core/wren/src/wren/connector/ibis.py — deleted; no remaining datasource resolves to it after the migrations.
core/wren/pyproject.toml — optional-deps union: psycopg[binary], mysqlclient, pyodbc, clickhouse-connect, trino, pyathena, snowflake-connector-python (no ibis-framework[<x>] extras for the migrated connectors).
core/wren/uv.lock — regenerated once at the end.
core/wren/tests/conftest.py and core/wren/justfile — additive concatenation of new pytest markers and just test-<connector> recipes.

Test plan

cd core/wren && uv lock — clean resolve
uv run ruff check src tests — same 32 pre-existing errors as origin/main, no regressions
uv run pytest tests/unit/test_athena_connector.py tests/unit/test_mssql_connection.py tests/unit/test_mysql_helpers.py — 54 passed, 1 skipped
Per-connector integration suites (Docker required): just test-canner, test-postgres, test-mysql, test-mssql, test-clickhouse, test-trino, test-athena

Notes

Pre-existing unit-test failures unrelated to this branch: tests/unit/test_context_cli.py::test_validate_strict_warns and 5 tests/unit/test_cube_cli.py cases that need a rebuilt wren-core-py wheel.
The seven source PRs (#2269 to #2275) can be closed once this PR lands.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Native connectors added for Athena, ClickHouse, Trino, and Canner; improved native MySQL, Postgres, and MSSQL connectors.
Improvements
- Better type conversion, timeout/error mapping, and more robust connection handling across backends.
- Connector factory updated to prefer dedicated native connectors.
Tests
- Expanded integration and unit tests for all connectors and new pytest targets for added backends.

Canner Enterprise speaks the Postgres wire protocol, so the connector now uses psycopg directly instead of the ibis postgres backend.

Changes
- `connector/canner.py`: native psycopg cursor with a self-contained PG OID -> Arrow type map covering the Canner-flavoured types (VARCHAR/CHAR -> string, DECIMAL -> decimal128, BIGINT/INT/SMALLINT -> int, BOOLEAN -> bool, DATE/TIMESTAMP/TIMESTAMPTZ -> date/timestamp, ROW/ARRAY/MAP serialised as JSON strings). Errors are wrapped as WrenError with the dialect SQL attached, mirroring the existing postgres connector contract.
- `model/data_source.py::get_canner_connection`: returns a `psycopg.Connection` (autocommit) instead of an ibis backend.

Tests
- `tests/connectors/test_canner.py` exercises the type-mapping helpers and runs the connector against a PostgresContainer with the common Canner result types (incl. JSON/JSONB and arrays). Marker `canner` is registered in `tests/conftest.py` and a `just test-canner` target is added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The trino connector now uses the `trino` Python client directly, parsing Trino type strings via sqlglot to build PyArrow schemas. The trino extra installs `trino>=0.333,<1` instead of `ibis-framework[trino]`.

Highlights
- `connector/trino.py`: native cursor execution; type-string -> Arrow via sqlglot, including row(...) / array(...) / map(...) / decimal(p,s) / timestamp with time zone.
- `model/data_source.py::get_trino_connection`: returns `trino.dbapi.Connection`; the native code path no longer routes through ibis.
- `pyproject.toml`: trino extra -> `trino>=0.333,<1`.

Tests
- `tests/connectors/test_trino.py` covers ~36 Trino type categories, including nested row/array/map, plus a testcontainer-backed query suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The athena connector now uses pyathena directly with a Trino-style type lexer to materialise cursor results into PyArrow tables, removing the ibis-framework[athena] dependency from the athena extra.

Highlights
- connector/athena.py: native pyathena cursor; type strings parsed via sqlglot (varchar, decimal(p,s), array<T>, row(...), map<K,V>, etc.).
- model/data_source.py::get_athena_connection: preserves the Web-Identity-Token (OIDC -> AssumeRoleWithWebIdentity) and access-key auth flows; returns a pyathena.connection.Connection.
- pyproject.toml: athena extra -> pyathena[pandas]>=3.

Tests
- tests/unit/test_athena_connector.py mocks the pyathena cursor and boto3 STS to verify the type lexer, cursor -> Arrow materialisation, error mapping, and all three credential resolution paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The postgres connector now uses psycopg directly instead of going through ibis. This removes ibis-framework[postgres] from the postgres extra and gives us direct control over Arrow conversion for pg-specific types (numeric scale, arrays, intervals, jsonb).

Changes
- connector/postgres.py: native psycopg3 cursor execution + OID-driven Arrow type mapping (bool, int{2,4,8}, float{4,8}, decimal128(p,s), text, bytea, uuid, jsonb, timestamp[tz], date, time, array-of-T).
- model/data_source.py::get_postgres_connection: returns psycopg.Connection.
- pyproject.toml: the postgres extra now installs psycopg[binary]>=3 instead of ibis-framework[postgres].
- connector/base.py and model/data_source.py: ibis imports made lazy so importing wren.connector.postgres no longer pulls in ibis.

Tests
- New direct-connector tests in tests/connectors/test_postgres.py exercise 10+ pg types (int4, int8, numeric(38,9), text, bool, bytea, uuid, jsonb, timestamp, timestamptz, int4[], text[], numeric[]) against a testcontainer postgres, asserting Arrow schema and value round-trip.
- All existing tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The mysql / doris connector now uses mysqlclient (MySQLdb) directly instead of going through ibis. The mysql extra installs `mysqlclient` only.

Changes
- `connector/mysql.py`: native MySQLdb cursor + field-type-code → Arrow type mapping covering integer (signed/unsigned variants), decimal, float, char/varchar/text, json, blob, datetime/timestamp, date, time, bit. SSL kwargs are derived from MySqlConnectionInfo.
- `model/data_source.py`: `get_mysql_connection` / `get_doris_connection` return `MySQLdb.Connection`; SSL context construction is moved into the connector module.
- `pyproject.toml`: mysql extra → `mysqlclient>=2.2`.

Tests
- `tests/connectors/test_mysql_connector.py` exercises 15+ MySQL field types via a testcontainer mysql.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The mssql connector now uses pyodbc directly, with custom Arrow type inference and a DATETIMEOFFSET output converter. The mssql extra installs `pyodbc` instead of `ibis-framework[mssql]`.

Highlights
- `connector/mssql.py`: raw pyodbc cursor; sqlglot-based LIMIT/OFFSET rewrite (`OFFSET 0 ROWS FETCH NEXT n ROWS ONLY`); Arrow schema built from `cursor.description` + value sampling; dry_run falls back to `sys.dm_exec_describe_first_result_set` for precise error messages.
- `model/data_source.py`: `_connect_mssql_pyodbc` builds an ODBC connection string with proper `{...}` escaping; `mssql://` URL parsing added; an output converter for DATETIMEOFFSET (type code -155) decodes the 20-byte payload into a tz-aware datetime.
- `pyproject.toml`: mssql extra -> `pyodbc>=5,<6`.

Tests
- `tests/connectors/test_mssql.py` covers int sizes (tinyint/smallint/int/bigint), bit, varchar, decimal, datetime/datetime2, datetimeoffset (UTC + non-UTC), uniqueidentifier and varbinary; it also exercises the URL connection path and the pagination rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… dependency

The ClickHouse connector now uses ``clickhouse-connect`` directly, parsing ClickHouse type strings via sqlglot to build PyArrow schemas rather than going through the ibis-project clickhouse backend.

Highlights
- ``connector/clickhouse.py``: native client; the type lexer covers ``Nullable(T)`` / ``LowCardinality(T)`` / ``Array(T)`` / ``Tuple(...)`` / ``Map(K,V)`` / ``DateTime64(p, 'TZ')`` / ``Decimal(p,s)``, plus ``Int128/256`` and ``UInt128/256`` (surfaced as string to avoid silent truncation past 64-bit Arrow widths).
- ``model/data_source.py::get_clickhouse_connection`` returns a ``clickhouse_connect.Client``; ``_handle_clickhouse_url`` now also accepts ``clickhouse+http://`` / ``clickhouse+https://`` URLs.
- ``pyproject.toml``: the clickhouse extra now pulls ``clickhouse-connect>=0.8`` instead of ``ibis-framework[clickhouse]``.

Tests
- ``tests/connectors/test_clickhouse.py`` exercises the full query path against a ClickHouse testcontainer (TPCH sf=0.01) and parametrises 35+ type strings through ``_parse_clickhouse_type``, including ``DateTime64`` with timezone and nested ``Tuple``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
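A rough, hand-rolled illustration of how wrapper types unwrap recursively follows. It is not the connector's sqlglot-based ``_parse_clickhouse_type``: `parse_clickhouse_type` and its type table are hypothetical, only a subset of types is handled (no ``Tuple``, ``Map`` or ``DateTime64``), and Arrow types are returned as plain name strings so the sketch runs without pyarrow.

```python
import re

# Hypothetical subset: ClickHouse scalar names -> Arrow type names (as strings).
_SCALARS = {
    "Int8": "int8", "Int16": "int16", "Int32": "int32", "Int64": "int64",
    "UInt8": "uint8", "UInt16": "uint16", "UInt32": "uint32", "UInt64": "uint64",
    # 128/256-bit integers exceed Arrow's 64-bit ints, so surface them as strings.
    "Int128": "string", "Int256": "string", "UInt128": "string", "UInt256": "string",
    "Float32": "float32", "Float64": "float64",
    "String": "string", "Bool": "bool", "Date": "date32",
}

# Name(inner): greedy .* runs to the last ')' so nested wrappers stay intact.
_WRAPPER = re.compile(r"^(\w+)\((.*)\)$")


def parse_clickhouse_type(type_str: str) -> tuple[str, bool]:
    """Return (arrow_type_name, nullable) for a ClickHouse type string."""
    type_str = type_str.strip()
    match = _WRAPPER.match(type_str)
    if match is None:
        return _SCALARS.get(type_str, "string"), False  # unknown scalar -> string
    name, inner = match.groups()
    if name == "Nullable":
        arrow_type, _ = parse_clickhouse_type(inner)
        return arrow_type, True
    if name == "LowCardinality":
        # Dictionary encoding is a storage detail; keep the inner type as-is.
        return parse_clickhouse_type(inner)
    if name == "Array":
        element, _ = parse_clickhouse_type(inner)
        return f"list<{element}>", False
    if name == "Decimal":
        precision, scale = (int(part) for part in inner.split(","))
        return f"decimal128({precision}, {scale})", False
    return "string", False  # Tuple, Map, DateTime64, ... are not covered here
```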

…bles

SQL NULL in json/jsonb columns was being coerced into the string "null", breaking NULL semantics downstream. Drop the oid-114/3802 special case so None passes through unchanged.

Arrow tables were built via dict(zip(names, arrays)), which silently drops duplicate column names (e.g. self-joins projecting two `id` columns). Switch to pa.Table.from_arrays(..., schema=schema) so positional construction keeps duplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace ``pa.table(dict(zip(...)), schema=schema)`` with positional ``pa.Table.from_arrays``. The dict-based form silently drops duplicate column names, which corrupts join results like ``SELECT a.id, b.id FROM t a, t b``. Add a regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The postgres branch returns a raw ``psycopg.Connection`` rather than an ibis ``BaseBackend``. Introduce a ``ConnectionLike`` alias (under TYPE_CHECKING) covering both shapes, apply it to ``get_connection``, and annotate ``get_postgres_connection`` with its concrete return type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CodeRabbit flagged that data_source.get_athena_connection re-implemented the pyathena connect-kwargs logic that also lives in connector/athena.py, letting them drift on schema_name / kill_on_interrupt / info.kwargs, and the get_connection signature claimed BaseBackend even though the Athena path returned a raw pyathena Connection. Route data_source.get_athena_connection through the connector's shared _build_connect_kwargs builder, and widen the return-type annotation to a BackendOrConnection Union so the type matches reality without forcing a runtime pyathena import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Reject Trino connection URLs without a username instead of silently falling back to "test"; raise a clear INVALID_CONNECTION_INFO error.
- Wrap lazy ``import trino`` calls so users without the trino extra get a "pip install wren-engine[trino]" hint instead of a confusing ImportError.
- Strip trailing whitespace and a single trailing semicolon before wrapping user SQL in ``SELECT * FROM (...) AS _sub LIMIT N`` for both ``query()`` and ``dry_run()``; previously a trailing ``;`` produced invalid SQL.
- Route ``ConnectionUrl`` for trino through the native URL handler in ``DataSourceExtension.get_connection()`` so it no longer hits the removed generic ``ibis.connect()`` path.

Adds regression unit tests covering each fix (semicolon stripping, missing-username URL, bad-scheme URL, and the ImportError path via monkeypatched ``__import__``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
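A minimal sketch of the missing-username check and the install-hint import, assuming a ``trino://user@host:port/catalog/schema`` URL layout with ``trino+https`` selecting HTTPS. `parse_trino_url`, `import_trino`, the default port 8080, and `InvalidConnectionInfo` (standing in for the ``INVALID_CONNECTION_INFO`` WrenError) are illustrative, not the connector's actual names.

```python
from urllib.parse import unquote, urlsplit


class InvalidConnectionInfo(ValueError):
    """Hypothetical stand-in for WrenError(INVALID_CONNECTION_INFO)."""


def parse_trino_url(url: str) -> dict:
    parts = urlsplit(url)
    if parts.scheme not in ("trino", "trino+https"):
        raise InvalidConnectionInfo(f"unsupported scheme: {parts.scheme!r}")
    if not parts.username:
        # Fail loudly instead of silently connecting as a default user.
        raise InvalidConnectionInfo("Trino URL must include a username")
    path = [unquote(p) for p in parts.path.split("/") if p]
    return {
        "host": parts.hostname,
        "port": parts.port or 8080,  # assumed default port
        "user": unquote(parts.username),
        "http_scheme": "https" if parts.scheme == "trino+https" else "http",
        "catalog": path[0] if len(path) > 0 else None,
        "schema": path[1] if len(path) > 1 else None,
    }


def import_trino():
    """Import the optional driver, replacing a bare ImportError with a hint."""
    try:
        import trino  # only present when the trino extra is installed
    except ImportError as exc:
        raise ImportError(
            "The trino driver is not installed; run: pip install wren-engine[trino]"
        ) from exc
    return trino
```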

Previously only the password went through unquote_plus(); the username, the database path, and the query-string values were left percent-encoded, so URL-encoded special characters (e.g. '@' in a username, spaces in a database name) reached the ODBC driver still encoded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
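A sketch of decoding each URL component, assuming a ``mssql://user:password@host:port/database?options`` layout. `parse_mssql_url` is hypothetical, and applying ``unquote_plus`` to the password but plain ``unquote`` to the username and database (so a literal ``+`` in those survives) is an assumption, not necessarily what the commit does.

```python
from urllib.parse import parse_qsl, unquote, unquote_plus, urlsplit


def parse_mssql_url(url: str) -> dict:
    """Decode every URL component, not just the password (sketch)."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 1433,
        # urlsplit leaves percent-escapes in place, so each piece is decoded here.
        "user": unquote(parts.username) if parts.username else None,
        # unquote_plus also turns '+' into a space, matching the existing password handling.
        "password": unquote_plus(parts.password) if parts.password else None,
        "database": unquote(parts.path.lstrip("/")) or None,
        # parse_qsl already decodes %XX escapes and '+' in query values.
        "options": dict(parse_qsl(parts.query)),
    }
```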

The previous branch silently mishandled asymmetric credentials: user=None with password set crashed in password.get_secret_value(), and user set with password=None emitted UID= without PWD=, producing an incomplete ODBC connection string. Now: both absent falls back to Trusted_Connection=yes, exactly one absent raises INVALID_CONNECTION_INFO, both present uses normal SQL auth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
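The three credential branches, plus ODBC brace-quoting of values, can be sketched as below. `build_mssql_conn_str`, `_odbc_value`, and the default driver name are illustrative assumptions; the real code reads the password from a secret wrapper (``get_secret_value()``) and raises ``INVALID_CONNECTION_INFO`` rather than ``ValueError``.

```python
from __future__ import annotations


def _odbc_value(value: str) -> str:
    # ODBC brace-quoting: wrap in {} and double any closing brace inside.
    return "{" + value.replace("}", "}}") + "}"


def build_mssql_conn_str(
    host: str,
    port: int,
    database: str,
    user: str | None,
    password: str | None,
    driver: str = "ODBC Driver 18 for SQL Server",  # assumed default
) -> str:
    parts = [
        f"DRIVER={_odbc_value(driver)}",
        f"SERVER={host},{port}",  # SQL Server takes host,port in SERVER=
        f"DATABASE={_odbc_value(database)}",
    ]
    if user is None and password is None:
        parts.append("Trusted_Connection=yes")  # both absent: integrated auth
    elif user is None or password is None:
        # Exactly one absent would yield UID= without PWD= (or crash): reject it.
        raise ValueError("user and password must be provided together")
    else:
        parts.append(f"UID={_odbc_value(user)}")
        parts.append(f"PWD={_odbc_value(password)}")
    return ";".join(parts)
```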

statement_timeout was cast to int only after pyodbc.connect() returned, so a non-numeric value (e.g. "5s") leaked the just-opened connection when int() raised ValueError. Validate and cast up-front; raise INVALID_CONNECTION_INFO before any connect call. Also adds unit tests with mocked pyodbc covering all three review findings: URL component decoding, asymmetric auth rejection, and the pre-connect timeout validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
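A minimal sketch of validating before connecting. The ``connect`` callable is injected so the example runs without pyodbc; applying the timeout through pyodbc's ``Connection.timeout`` attribute and the `parse_statement_timeout` / `connect_mssql` names are assumptions about how the connector wires it.

```python
from __future__ import annotations

from typing import Any, Callable


def parse_statement_timeout(raw: Any) -> int | None:
    """Validate statement_timeout before any connection is opened."""
    if raw is None:
        return None
    try:
        seconds = int(raw)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"statement_timeout must be an integer, got {raw!r}") from exc
    if seconds < 0:
        raise ValueError("statement_timeout must be non-negative")
    return seconds


def connect_mssql(connect: Callable[[str], Any], conn_str: str, raw_timeout: Any) -> Any:
    timeout = parse_statement_timeout(raw_timeout)  # may raise; nothing is open yet
    conn = connect(conn_str)
    if timeout is not None:
        conn.timeout = timeout  # pyodbc's per-connection query timeout, in seconds
    return conn
```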

The native MySQL/Doris path returns a raw MySQLdb.Connection rather than an ibis BaseBackend. Reflect that in the return type via a ConnectionHandle union so static analysis matches runtime behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…type

Address CodeRabbit major findings on the native MySQL connector:
- Initialize ``_closed`` before opening the cursor and close the connection if the ``ANSI_QUOTES`` init query raises, so a failed init never leaks a live socket.
- Coerce the ``limit`` argument via ``int()`` and reject negative values before interpolating it into SQL; the previous f-string was a SQL-injection vector when callers passed an attacker-controlled value.
- Append ``LIMIT n`` to the user's SQL (stripping a trailing semicolon) instead of wrapping it in ``SELECT * FROM (...) AS _sub``; the subquery approach failed with ``ER_DUP_FIELDNAME`` whenever the inner SELECT projected two columns with the same name (e.g. ``SELECT a.id, b.id`` in a join). ``dry_run`` switches to ``EXPLAIN <sql>`` for the same reason and to remain side-effect-free.
- Derive ``DECIMAL`` precision/scale from ``cursor.description`` (PEP 249 ``precision``/``scale`` fields) instead of hard-coding ``decimal128(38, 9)``, which silently truncated MySQL columns with ``D > 9``. Precision is recovered from MySQLdb's display-length convention and clamped to Arrow's ``decimal128`` maximum (38) and to MySQL's scale ceiling (30).

Adds regression tests for duplicate-column SELECT, SQL-injection limit rejection, trailing semicolon, large-scale decimal round-trip, and unit tests for the new helpers. ``just test-mysql`` now runs both the engine suite and the type-coverage connector suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
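The limit coercion and the append-instead-of-wrap rewrite could look roughly like this. `coerce_limit` and `apply_limit` are hypothetical names; like the commit, the sketch strips only a single trailing semicolon, and it does not try to detect a query that already carries its own ``LIMIT``.

```python
from __future__ import annotations


def coerce_limit(limit: object) -> int:
    """int()-coerce the limit and reject negatives before it reaches SQL."""
    try:
        value = int(limit)  # e.g. "5; DROP TABLE t" raises ValueError here
    except (TypeError, ValueError) as exc:
        raise ValueError(f"invalid limit: {limit!r}") from exc
    if value < 0:
        raise ValueError(f"limit must be non-negative, got {value}")
    return value


def apply_limit(sql: str, limit: object | None) -> str:
    if limit is None:
        return sql
    body = sql.rstrip()
    if body.endswith(";"):
        # A trailing semicolon would terminate the statement before LIMIT.
        body = body[:-1].rstrip()
    # Appending, rather than wrapping in SELECT * FROM (...), keeps duplicate
    # column names legal in MySQL.
    return f"{body} LIMIT {coerce_limit(limit)}"
```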

Round-2 review found _decimal_type defaulted scale=9 when the column typmod was missing, so Decimal.quantize silently rounded high-precision values (e.g. 18-significant-figure NUMERIC). Fall back to pa.string() for unconstrained NUMERIC and NUMERIC[] columns so the exact textual value round-trips. Same approach Trino's connector takes for dynamic-decimal casts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
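Postgres packs a constrained NUMERIC's precision and scale into the column's type modifier (``((precision << 16) | (scale & 0x7ff)) + 4``, or ``-1`` when unconstrained). The sketch below decodes it and falls back to a string type. Whether the connector decodes the typmod itself or reads psycopg's ``Column.precision``/``Column.scale`` is not stated, and `numeric_arrow_type`, plus the fallback for precision above 38 or a negative scale, are added assumptions.

```python
_VARHDRSZ = 4  # offset Postgres adds to the packed (precision, scale) typmod
_DECIMAL128_MAX_PRECISION = 38


def numeric_arrow_type(typmod: int) -> str:
    """Map a NUMERIC column's typmod to an Arrow type name (sketch)."""
    if typmod < _VARHDRSZ:
        # -1 marks an unconstrained NUMERIC: keep the exact text, never quantize.
        return "string"
    packed = typmod - _VARHDRSZ
    precision = (packed >> 16) & 0xFFFF
    scale = ((packed & 0x7FF) ^ 1024) - 1024  # sign-extend the 11-bit scale field
    if precision > _DECIMAL128_MAX_PRECISION or scale < 0:
        return "string"  # would not round-trip faithfully through decimal128
    return f"decimal128({precision}, {scale})"
```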

Round-2 review found that query() and dry_run() wrap user SQL as "SELECT * FROM ({sql}) AS _t LIMIT N", which Postgres/Canner reject when the inner SQL ends in a semicolon. Add a _strip_trailing_semicolon helper that only strips the terminating run of semicolons and whitespace (so semicolons inside string literals are preserved) and apply it on both call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
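Stripping only the terminating run of semicolons and whitespace fits in ``str.rstrip`` with a character set, which works from the right edge only. This is a sketch: `strip_trailing_semicolon` mirrors the helper named in the commit but is not its code, and `wrap_with_limit` is a hypothetical helper for the ``SELECT * FROM (...) AS _t LIMIT N`` wrapper that assumes ``limit`` was validated earlier.

```python
import string

# Only characters from this set are removed, and only at the end of the string.
_TRAILING = string.whitespace + ";"


def strip_trailing_semicolon(sql: str) -> str:
    # rstrip stops at the first character outside the set, so a ';' inside a
    # string literal earlier in the statement (e.g. SELECT ';' AS s) survives.
    return sql.rstrip(_TRAILING)


def wrap_with_limit(sql: str, limit: int) -> str:
    return f"SELECT * FROM ({strip_trailing_semicolon(sql)}) AS _t LIMIT {int(limit)}"
```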

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lues

MySQL ``TIME`` ranges from ``-838:59:59`` to ``838:59:59``. PyArrow ``time64("us")`` only accepts values from 0 up to 24h, so the previous mapping silently truncated or corrupted any TIME value outside that window. Map to ``duration("us")`` instead so the full MySQL range round-trips, and convert MySQLdb's ``datetime.timedelta`` to signed microseconds explicitly (``timedelta.days`` is signed for negative values, while ``seconds``/``microseconds`` are always non-negative; combining all three recovers the signed total).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
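The signed conversion described above fits in one expression. `timedelta_to_signed_us` is a hypothetical name, and the MySQLdb-specific wiring is omitted.

```python
from datetime import timedelta


def timedelta_to_signed_us(value: timedelta) -> int:
    # timedelta normalises to days (signed) plus seconds and microseconds
    # (both non-negative), so all three fields are needed for the signed total.
    return (value.days * 86_400 + value.seconds) * 1_000_000 + value.microseconds
```

Floor division, ``value // timedelta(microseconds=1)``, produces the same integer and may read more clearly.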

SQL Server TINYINT is an unsigned 8-bit integer (0..255), but the arrow-type helper was branching on the sampled value sign and could fall back to int8. Map internal_size == 1 directly to pa.uint8() so the schema reflects the driver-declared type regardless of the rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The DATETIMEOFFSET output converter assumed pyodbc would always hand it a 20-byte payload. Truncated or malformed buffers fell through and surfaced as a cryptic "month must be in 1..12" ValueError from datetime(). Reject non-20-byte payloads up front with a clear message that points at the actual length mismatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
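A sketch of the length-checked converter, assuming the payload follows the ODBC ``SQL_SS_TIMESTAMPOFFSET_STRUCT`` layout (six 16-bit date/time fields, a 32-bit fraction in nanoseconds, then the offset hours and minutes), which is the common pyodbc recipe; `decode_datetimeoffset` is a hypothetical name.

```python
import struct
from datetime import datetime, timedelta, timezone

# Little-endian, no padding: 6 shorts + 1 unsigned int + 2 shorts = 20 bytes.
_DTO_STRUCT = struct.Struct("<6hI2h")


def decode_datetimeoffset(raw: bytes) -> datetime:
    if len(raw) != _DTO_STRUCT.size:
        # Fail with the real cause instead of a confusing datetime() range error.
        raise ValueError(
            f"DATETIMEOFFSET payload must be {_DTO_STRUCT.size} bytes, got {len(raw)}"
        )
    year, month, day, hour, minute, second, nanos, tz_h, tz_m = _DTO_STRUCT.unpack(raw)
    offset = timezone(timedelta(hours=tz_h, minutes=tz_m))
    # Python datetimes hold microseconds, so sub-microsecond digits are dropped.
    return datetime(year, month, day, hour, minute, second, nanos // 1000, tzinfo=offset)
```

With pyodbc, a converter like this would be registered via ``conn.add_output_converter(-155, decode_datetimeoffset)``.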

…ettings

In the ClickHouse client-kwargs assembly, ``out.update(kwargs)`` / ``client_kwargs.update(kwargs)`` would clobber the merged ``settings`` dict (carrying ``max_execution_time`` from ``statement_timeout``) whenever the caller also passed their own ``settings`` via ``kwargs``. Pop ``settings`` from the incoming ``kwargs`` first and merge it into the local dict so the timeout survives.

Also wrap driver ``TIMEOUT_EXCEEDED`` errors as the existing ``DatabaseTimeoutError`` instead of re-raising the raw driver exception, for consistency with the typed error model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
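A minimal sketch of the merge order, assuming a plain-dict kwargs builder. `build_client_kwargs` is hypothetical, and letting ``statement_timeout`` override a caller-supplied ``max_execution_time`` is an assumption; the commit only guarantees that the timeout is no longer dropped.

```python
from __future__ import annotations

from typing import Any


def build_client_kwargs(
    host: str, port: int, statement_timeout: int | None = None, **kwargs: Any
) -> dict[str, Any]:
    # Pop caller settings first so the later update() cannot replace the dict.
    settings = dict(kwargs.pop("settings", None) or {})
    if statement_timeout is not None:
        # Assumption: the connection-level timeout wins over a caller value.
        settings["max_execution_time"] = statement_timeout
    out: dict[str, Any] = {"host": host, "port": port}
    out.update(kwargs)  # safe now: "settings" is no longer in kwargs
    if settings:
        out["settings"] = settings
    return out
```

The resulting dict would then be passed as keyword arguments to ``clickhouse_connect.get_client``.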

Wrap the readiness-loop client in try/finally so we close on failed attempts, not only on success. Replace the DuckDB TPCH extension fixture (which pulled the extension over the network on every run) with inline-fabricated rows so the test stays hermetic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
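The close-on-every-attempt pattern, sketched with an injected client factory so it runs without a container. `wait_until_ready` is hypothetical and assumes a client whose ``ping()`` returns a truthy value once the server is up and raises or returns a falsy value before that.

```python
import time
from typing import Any, Callable, Optional


def wait_until_ready(
    make_client: Callable[[], Any], attempts: int = 30, delay: float = 1.0
) -> None:
    """Poll fresh clients until one pings successfully; close every client."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        client = None
        try:
            client = make_client()
            if client.ping():
                return  # finally still runs, so the successful client is closed too
        except Exception as exc:  # server not accepting connections yet
            last_error = exc
        finally:
            if client is not None:
                client.close()  # runs on success, on a falsy ping, and on errors
        time.sleep(delay)
    raise TimeoutError(f"not ready after {attempts} attempts") from last_error
```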

The previous lazy initializers mutated module-level dicts/sets in place, so a thread that raced the first call could observe a partially-populated map. Switch each initializer to a @functools.cache'd accessor that returns a fully-built local container — the cache slot is GIL-protected and only published once initialization runs to completion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
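A sketch of the cached-accessor pattern. The field-type codes follow MySQL's protocol numbering (as exposed by ``MySQLdb.constants.FIELD_TYPE``), but the subset, the accessor name, and the Arrow type names (as strings) are illustrative. ``functools.cache`` takes no lock: two threads racing the very first call may each build the map, yet each builds a complete local dict, so no caller can observe a partially populated one.

```python
import functools

# Illustrative subset of MySQL protocol field-type codes.
_TINY, _SHORT, _LONG, _LONGLONG = 1, 2, 3, 8
_FLOAT, _DOUBLE = 4, 5
_DATE, _TIME, _DATETIME = 10, 11, 12
_NEWDECIMAL, _VAR_STRING = 246, 253


@functools.cache
def field_type_to_arrow() -> dict[int, str]:
    # Built as a local object; other threads only see it after the function
    # returns and functools.cache stores the finished dict.
    mapping = {
        _TINY: "int8",
        _SHORT: "int16",
        _LONG: "int32",
        _LONGLONG: "int64",
        _FLOAT: "float32",
        _DOUBLE: "float64",
        _DATE: "date32",
        _TIME: "duration[us]",  # MySQL TIME can be negative or exceed 24h
        _DATETIME: "timestamp[us]",
        _NEWDECIMAL: "decimal128",
        _VAR_STRING: "string",
    }
    return mapping
```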

…n sites

``pa.table(dict(zip(names, arrays)), schema=...)`` silently drops one of two columns that share a name because the intermediate dict collapses the key. Queries like ``SELECT a.id, b.id FROM t a JOIN t b`` would return a one-column table even though the cursor description carries both fields. Switch to ``pa.Table.from_arrays(arrays, schema=schema)`` so arrays are matched to schema fields by position and both columns survive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
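The failure mode needs nothing more than a dict, so the sketch below reproduces it with plain lists to stay runnable without pyarrow. The two helpers are hypothetical stand-ins for ``pa.table(dict(zip(...)))`` and ``pa.Table.from_arrays(arrays, schema=schema)``.

```python
def columns_by_name(names: list[str], columns: list[list[int]]) -> dict[str, list[int]]:
    # Keyed construction: a later duplicate name overwrites the earlier column.
    return dict(zip(names, columns))


def columns_by_position(names: list[str], columns: list[list[int]]) -> list[tuple[str, list[int]]]:
    # Positional construction: every (name, column) slot is kept in order.
    return list(zip(names, columns))
```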

``mysql:8`` is a moving tag — when Docker Hub publishes a new ``8.x`` the CI run can pick up an image with different default flags (e.g. the deprecated-features warning behaviour changed between 8.0 and 8.4). Pin to ``mysql:8.0.36`` so the connector test suite stays reproducible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The unit-test job runs without the mysql extra, so importing MySQLdb fails. Guard the test with pytest.importorskip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-21T05:23:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ca0462eb-5ee1-47e5-a648-b788315dd0b7

📥 Commits

Reviewing files that changed from the base of the PR and between c1ba926 and ab79d14.

📒 Files selected for processing (4)

core/wren/src/wren/connector/clickhouse.py
core/wren/src/wren/connector/mssql.py
core/wren/tests/unit/test_clickhouse_helpers.py
core/wren/tests/unit/test_mssql_connection.py

Walkthrough

Replaces ibis-backed connectors with native DB-API implementations, updates factory routing and dependency groups, and adds extensive unit and integration tests to validate type parsing, value coercion, connection creation, and query/dry_run/close behavior.

Changes

Multi-Backend Native Connector Migration

Layer / File(s)	Summary
Configuration & test tasks `core/wren/justfile`, `core/wren/pyproject.toml`, `core/wren/tests/conftest.py`	Add just tasks for connector tests, register new pytest markers, and replace `ibis-framework[...]` extras with native driver optional deps (psycopg[binary], mysqlclient, clickhouse-connect, trino, pyodbc, pyathena).
Connector factory & base `core/wren/src/wren/connector/base.py`, `core/wren/src/wren/connector/factory.py`	Lazy-load ibis type imports under TYPE_CHECKING and update registry to route several DataSource values to dedicated native connector modules instead of the generic ibis module.
Athena native connector `core/wren/src/wren/connector/athena.py`, `core/wren/tests/unit/test_athena_connector.py`	New pyathena-backed connector with sqlglot-based type parsing, value coercion to PyArrow, credential resolution (static keys / web identity / boto3 chain), query/dry_run/close, and unit tests covering parsing, table building, credential flows, and error wrapping.
ClickHouse native connector `core/wren/src/wren/connector/clickhouse.py`, `core/wren/tests/connectors/test_clickhouse.py`, `core/wren/tests/unit/test_clickhouse_helpers.py`	New clickhouse-connect-backed connector with type descriptor parsing, client kwargs building, statement_timeout→settings mapping, SQL wrapping/stripping, error mapping, idempotent close, plus integration tests (container) and unit tests for parsing and settings merging.
Trino native connector `core/wren/src/wren/connector/trino.py`, `core/wren/tests/connectors/test_trino.py`	New trino.dbapi connector: sqlglot type parsing, value coercion, URL parsing/support for `trino+https`, lazy import with install hint, auth selection (token/basic), query/dry_run/close, and unit+integration TPCH tests.
Postgres native refactor `core/wren/src/wren/connector/postgres.py`, `core/wren/tests/connectors/test_postgres.py`	Postgres connector rewritten to psycopg3: OID→Arrow mapping, decimal handling, subquery-based LIMIT wrapping, dry_run via LIMIT 0, idempotent close, plus integration tests validating types, round-trips, NULLs, limits, and duplicate columns.
MySQL/Doris native refactor `core/wren/src/wren/connector/mysql.py`, `core/wren/tests/connectors/test_mysql.py`, `core/wren/tests/connectors/test_mysql_connector.py`, `core/wren/tests/unit/test_mysql_helpers.py`	MySQL/Doris connectors reimplemented with mysqlclient: LIMIT helpers, cached FIELD_TYPE→Arrow mappings, DECIMAL precision/scale derivation and clamping, connection kwargs (including SSL), plus extensive integration/unit tests (type coverage, limit coercion/injection resistance, duplicate columns, TIME→duration handling, and concurrency regression for caches).
Canner native refactor `core/wren/src/wren/connector/canner.py`, `core/wren/tests/connectors/test_canner.py`	Canner connector now uses psycopg cursor execution with OID→Arrow mapping, JSON/JSONB handling, decimal quantization, semicolon-stripping, idempotent close, and tests verifying internals and end-to-end behavior.
MSSQL native refactor `core/wren/src/wren/connector/mssql.py`, `core/wren/tests/connectors/test_mssql.py`, `core/wren/tests/unit/test_mssql_connection.py`	MSSQL connector rewritten to pyodbc with cursor execution, SQL rewrite for pagination, sys.dm_exec_describe_first_result_set probing for dry_run errors, MSSQL-specific Arrow type inference (TINYINT→uint8, datetimeoffset decoding), close handling, and unit/integration tests for URL parsing, credential validation, statement_timeout, and decoding.
DataSource integration `core/wren/src/wren/model/data_source.py`	Introduce BackendOrConnection union, update DataSource/DataSourceExtension return types, and wire native connection builders for Athena, Canner, ClickHouse, MSSQL, MySQL/Doris, Postgres, and Trino; expand ClickHouse URL schemes and enforce secure=True for clickhouse+https.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Canner/WrenAI#2268: Concurrent refactor of Snowflake connector from ibis to native implementation with factory routing changes.

Suggested reviewers

douenergy
andreashimin

🐰 I hopped through code, from ibis to wire,
Native drivers now make queries sing and fly,
Types parsed, tables built, tests spin up with cheer,
Close calls idempotent — no more hidden tears,
A carrot for reviewers, speedy patches nigh!


coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (7)

core/wren/tests/unit/test_athena_connector.py (2)
35-37: 💤 Low value

Consider using direct assignment for more reliable stubbing.

sys.modules.setdefault won't override if pyathena is already imported (e.g., in a test environment where the package is installed and another test imported it first). If test isolation is critical, consider explicit assignment or a fixture that saves/restores the original.

That said, for unit test suites with controlled import order, the current approach works.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_athena_connector.py` around lines 35 - 37, Replace
the non-overriding stub registration using sys.modules.setdefault with an
explicit assignment so the fake module reliably replaces any installed pyathena
during the test: save the original sys.modules.get("pyathena") into a temp (so
it can be restored), assign sys.modules["pyathena"] = _pyathena_module (which
already sets _pyathena_module.connect = _fake_pyathena_connect), and restore the
saved original after the test (or use a fixture/monkeypatch to set and undo the
assignment) to ensure test isolation.
312-328: 💤 Low value

The object.__setattr__ bypass is fragile but acceptable for this test.

The comment explains the intent—testing that the kwargs builder defensively reads an optional kwargs field. If AthenaConnectionInfo later adds a proper kwargs field, this test should be updated to use the model's constructor. For now, this adequately tests the connector's defensive handling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_athena_connector.py` around lines 312 - 328, Test
uses object.__setattr__ to inject a kwargs dict which is fragile; keep the
current defensive injection for now but change the test to construct
AthenaConnectionInfo with a real kwargs argument if/when AthenaConnectionInfo
gains a kwargs field. Update the test function
test_data_source_get_athena_connection_propagates_schema_and_kwargs to prefer
calling the AthenaConnectionInfo constructor with kwargs (instead of
object.__setattr__) when the model defines a kwargs attribute, otherwise fall
back to the current object.__setattr__ injection; reference AthenaConnectionInfo
and DataSourceExtension.get_athena_connection to locate the relevant code.
core/wren/tests/unit/test_mssql_connection.py (1)
57-77: ⚡ Quick win

Add a password decode case that covers + as space.

This test currently validates %20 decoding but not + handling. Adding a + case will lock in the expected unquote_plus behavior and prevent subtle credential parsing regressions.

Based on learnings: decode URL-derived passwords using urllib.parse.unquote_plus so + is treated as a space, and keep parsed.username as-is unless a verified mismatch exists.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/unit/test_mssql_connection.py` around lines 57 - 77, Add a
test case that asserts plus-signs in URL-encoded passwords decode to spaces by
updating test_mssql_url_decodes_user_database_and_password to include a password
containing '+' (e.g., "p+ss+word") and assert
DataSourceExtension.get_mssql_connection_from_url yields PWD "p ss word"; if the
implementation does not already use urllib.parse.unquote_plus for decoding
credentials, update the decoding in
DataSourceExtension.get_mssql_connection_from_url (and any helpers used by
_FakePyodbc/_parse_conn_str) to call urllib.parse.unquote_plus for the password
(and other credential fields that should treat '+' as space) while leaving
parsed.username unchanged unless you find a verified mismatch.
core/wren/src/wren/connector/postgres.py (2)
239-249: ⚖️ Poor tradeoff

Transaction left in aborted state after query errors.

When psycopg3 encounters an error during query execution, the transaction enters an aborted state requiring rollback() to recover. The exception handlers wrap the error but don't rollback, leaving the connection unusable for subsequent queries. The test file confirms this by explicitly calling connector.connection.rollback() as a workaround.

If the connector is intended for single-use, this may be acceptable. Otherwise, consider rolling back on exceptions or enabling autocommit mode.
♻️ Option: Rollback on exception
         except Exception as e:
+            try:
+                self.connection.rollback()
+            except Exception:
+                pass
             raise WrenError(
                 ErrorCode.GENERIC_USER_ERROR,
                 str(e),
                 phase=ErrorPhase.SQL_EXECUTION,
                 metadata={DIALECT_SQL: sql},
             ) from e
Also applies to: 256-266
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/connector/postgres.py` around lines 239 - 249, The
exception handlers that wrap DB errors (the except blocks that raise WrenError
with ErrorPhase.SQL_EXECUTION and metadata {DIALECT_SQL: sql}) do not roll back
the psycopg3 transaction, leaving the connection aborted; update those handlers
to call connection.rollback() (e.g., self.connection.rollback() or
connector.connection.rollback()) before raising so the connection is recovered
for further use, and apply the same change to the other symmetric handler block
referenced (the one around lines 256-266).
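Below is a minimal, driver-agnostic sketch of the rollback-before-raise pattern. It assumes only that the connection exposes `execute()` and `rollback()`, as psycopg 3's `Connection` does. `QueryError` is a hypothetical stand-in for `WrenError`; the actual error codes and metadata used in `postgres.py` are not reproduced here.

```python
from typing import Any, Protocol


class DBConnection(Protocol):
    """Minimal connection surface this sketch relies on (assumed)."""

    def execute(self, sql: str) -> Any: ...
    def rollback(self) -> None: ...


class QueryError(Exception):
    """Hypothetical stand-in for WrenError(ErrorCode.GENERIC_USER_ERROR, ...)."""

    def __init__(self, message: str, sql: str) -> None:
        super().__init__(message)
        self.sql = sql


def execute_with_rollback(conn: DBConnection, sql: str) -> Any:
    try:
        return conn.execute(sql)
    except Exception as e:
        # Clear the aborted transaction so the connection stays usable.
        try:
            conn.rollback()
        except Exception:
            # A failed rollback must not mask the original query error.
            pass
        raise QueryError(str(e), sql) from e
```

The inner `try` stops a failing `rollback()`, for example on a dropped connection, from hiding the original error.

The review's other option is opening the connection with `psycopg.connect(..., autocommit=True)`. That avoids the aborted-transaction state entirely, at the cost of running each statement in its own transaction.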
231-233: ⚡ Quick win

Consider validating limit defensively to prevent SQL injection.

While limit is typed as int | None, Python doesn't enforce this at runtime. A caller violating the type contract could inject arbitrary SQL. Given the PR objectives mention "limit sanitization to avoid SQL injection," consider adding explicit validation.
🛡️ Proposed defensive validation
     def query(self, sql: str, limit: int | None = None) -> pa.Table:
         if limit is not None:
+            if not isinstance(limit, int) or limit < 0:
+                raise ValueError("limit must be a non-negative integer")
             sql = f"SELECT * FROM ({sql}) AS _sub LIMIT {limit}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/connector/postgres.py` around lines 231 - 233, The query
method builds a LIMIT clause by interpolating the limit value directly; add
defensive validation in query(sql: str, limit: int | None = None) to ensure
limit is an actual non-negative integer (e.g., check isinstance(limit, int) and
limit >= 0 or attempt safe int() conversion and reject on failure) and raise
TypeError/ValueError for invalid values before constructing sql; keep using the
local variable limit only after validation so the f"SELECT * FROM ({sql}) AS
_sub LIMIT {limit}" interpolation cannot be exploited by a non-integer caller.
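The proposed check misses one case. `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`, and `limit=True` would be interpolated as `LIMIT True`. The minimal sketch below rejects it explicitly. `wrap_with_limit` is a hypothetical name for the wrapping step inside `query()`.

```python
from typing import Optional


def wrap_with_limit(sql: str, limit: Optional[int]) -> str:
    """Wrap sql in a LIMIT subquery after validating limit (hypothetical helper)."""
    if limit is None:
        return sql
    # bool subclasses int, so exclude it before the int check.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise TypeError(f"limit must be an int, got {type(limit).__name__}")
    if limit < 0:
        raise ValueError("limit must be a non-negative integer")
    return f"SELECT * FROM ({sql}) AS _sub LIMIT {limit}"
```

Raising `TypeError` for the wrong type and `ValueError` for a negative value matches the split suggested in the prompt above.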
core/wren/src/wren/model/data_source.py (1)
57-59: 💤 Low value

Incomplete BackendOrConnection type alias.

The union is missing types returned by other connectors: clickhouse_connect.driver.client.Client (ClickHouse), pyodbc.Connection (MSSQL), and trino.dbapi.Connection (Trino). This causes type-checker mismatches for callers.
Suggested fix
 if TYPE_CHECKING:
     import MySQLdb
     import psycopg
+    import pyodbc
     from pyathena.connection import Connection as PyAthenaConnection
+    from clickhouse_connect.driver.client import Client as ClickHouseClient
+    from trino.dbapi import Connection as TrinoConnection
 
 BackendOrConnection = Union[
-    BaseBackend, "PyAthenaConnection", "MySQLdb.Connection", "psycopg.Connection"
+    BaseBackend,
+    "PyAthenaConnection",
+    "MySQLdb.Connection",
+    "psycopg.Connection",
+    "pyodbc.Connection",
+    "ClickHouseClient",
+    "TrinoConnection",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/model/data_source.py` around lines 57 - 59, The
BackendOrConnection type alias in data_source.py is missing connector return
types causing mypy failures; update the BackendOrConnection Union (symbol:
BackendOrConnection) to include clickhouse_connect.driver.client.Client,
pyodbc.Connection, and trino.dbapi.Connection, and ensure imports are added
safely (use string literals or guard imports with TYPE_CHECKING) so the new
types are available to the type-checker without causing runtime import
side-effects.
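The quoted names in the alias are safe at runtime for two reasons:
- The `TYPE_CHECKING` block is skipped when the code runs.
- String members of a `Union` become `typing.ForwardRef` objects, so building the alias never imports a driver.

Each quoted name still has to be imported inside that block for a type-checker to resolve it. The sketch below is trimmed and dependency-free: `BaseBackend` is left out only so it runs without ibis installed.

```python
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Seen only by type-checkers; none of these imports runs at runtime.
    import MySQLdb
    import psycopg
    import pyodbc
    from clickhouse_connect.driver.client import Client as ClickHouseClient
    from pyathena.connection import Connection as PyAthenaConnection
    from trino.dbapi import Connection as TrinoConnection

# String members become ForwardRef objects, so this works even when
# none of the driver packages is installed.
BackendOrConnection = Union[
    "PyAthenaConnection",
    "MySQLdb.Connection",
    "psycopg.Connection",
    "pyodbc.Connection",
    "ClickHouseClient",
    "TrinoConnection",
]
```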
core/wren/tests/connectors/test_trino.py (1)
275-297: 💤 Low value

Consider retry logic instead of fixed sleep for container readiness.

The time.sleep(5) is a pragmatic workaround for coordinator readiness, but it could either be too short (causing flaky tests) or too long (slowing CI). A retry loop with exponential backoff would be more robust.

That said, this is test code and the current approach is documented. This is a minor improvement that can be deferred.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/connectors/test_trino.py` around lines 275 - 297, The test
fixture engine currently uses a fixed time.sleep(5) after starting
TrinoContainer which can be flaky or slow; replace that fixed sleep with a retry
loop that polls the coordinator readiness (e.g., attempting a lightweight query
or connection) with exponential backoff and a max timeout before calling
_create_tpch_tables, referencing the engine fixture, TrinoContainer, and
_create_tpch_tables to locate where to implement the retry; ensure the loop
aborts with a clear test failure if readiness isn't achieved within the timeout.
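Below is a minimal sketch of the suggested polling loop. `wait_until_ready` is a hypothetical helper. In the fixture, `probe` could be a function that opens a `trino.dbapi` connection to the container and runs `SELECT 1`. The container host/port accessors are omitted because they depend on the testcontainers version.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
) -> None:
    """Poll probe() with exponential backoff until it returns True.

    Raises TimeoutError once the deadline passes. Exceptions from probe()
    (e.g. connection refused while the coordinator starts) count as
    "not ready yet".
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            if probe():
                return
        except Exception:
            pass  # not ready yet; retry after the backoff delay
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"service not ready after {timeout:.1f}s")
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Capping each sleep at the remaining time keeps the total wait close to `timeout`. An uncaught `TimeoutError` in the fixture fails the test with a clear message.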

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/connector/clickhouse.py`:
- Around line 332-343: In dry_run, the input SQL isn't stripped of trailing
semicolons before being wrapped in the subquery; update the dry_run method
(function dry_run) to strip whitespace and trailing semicolons from the sql
argument (e.g., sql = sql.strip().rstrip(";")) before building the f"SELECT *
FROM ({sql}) AS _wren_sub LIMIT 0" query, preserving the existing exception
handling that maps TIMEOUT_EXCEEDED to DatabaseTimeoutError and other errors to
WrenError (ErrorCode.INVALID_SQL, phase=ErrorPhase.SQL_DRY_RUN,
metadata={DIALECT_SQL: sql}).
- Around line 264-265: The username/password extracted from the URL
(parsed.username and parsed.password) must be URL-decoded before use; update the
ClickHouse connector code that builds the auth dict to pass parsed.username and
parsed.password through urllib.parse.unquote_plus (or the repo's URL-decoding
helper) so percent-encoded characters (e.g. %40) are converted to their literal
characters before being sent to ClickHouse.
- Around line 315-330: The query method currently wraps the incoming sql into a
subquery without removing trailing semicolons, causing wrapped statements like
"SELECT * FROM (SELECT 1;) ..." to be invalid; fix by trimming trailing
semicolons and whitespace from the input sql before building the wrapped
statement (e.g., compute a cleaned_sql from the sql parameter and use that when
constructing statement when limit is set), but keep the original sql in
metadata={DIALECT_SQL: sql} if you want to preserve the caller's input; update
the logic in query (refer to the statement variable and the sql parameter) to
use the cleaned SQL for execution.

In `@core/wren/src/wren/connector/mssql.py`:
- Around line 41-50: The result construction currently zips names into a dict
which collapses duplicate column names; replace the dict(zip(names, arrays))
approach in the return of the method with building the pyarrow Table directly
from the arrays and the arrow_schema while preserving the original names (e.g.
use pa.Table.from_arrays or equivalent with names=names and
schema=arrow_schema). Locate the call site that uses cursor.description,
_build_mssql_arrow_schema, and _build_mssql_column and change the final return
to construct the table from arrays + schema + names so duplicate projected
columns (e.g. SELECT a, a) are preserved.

---

Nitpick comments:
In `@core/wren/src/wren/connector/postgres.py`:
- Around line 239-249: The exception handlers that wrap DB errors (the except
blocks that raise WrenError with ErrorPhase.SQL_EXECUTION and metadata
{DIALECT_SQL: sql}) do not roll back the psycopg3 transaction, leaving the
connection aborted; update those handlers to call connection.rollback() (e.g.,
self.connection.rollback() or connector.connection.rollback()) before raising so
the connection is recovered for further use, and apply the same change to the
other symmetric handler block referenced (the one around lines
256-266).
- Around line 231-233: The query method builds a LIMIT clause by interpolating
the limit value directly; add defensive validation in query(sql: str, limit: int
| None = None) to ensure limit is an actual non-negative integer (e.g., check
isinstance(limit, int) and limit >= 0 or attempt safe int() conversion and
reject on failure) and raise TypeError/ValueError for invalid values before
constructing sql; keep using the local variable limit only after validation so
the f"SELECT * FROM ({sql}) AS _sub LIMIT {limit}" interpolation cannot be
exploited by a non-integer caller.

In `@core/wren/src/wren/model/data_source.py`:
- Around line 57-59: The BackendOrConnection type alias in data_source.py is
missing connector return types causing mypy failures; update the
BackendOrConnection Union (symbol: BackendOrConnection) to include
clickhouse_connect.driver.client.Client, pyodbc.Connection, and
trino.dbapi.Connection, and ensure imports are added safely (use string literals
or guard imports with TYPE_CHECKING) so the new types are available to the
type-checker without causing runtime import side-effects.

In `@core/wren/tests/connectors/test_trino.py`:
- Around line 275-297: The test fixture engine currently uses a fixed
time.sleep(5) after starting TrinoContainer which can be flaky or slow; replace
that fixed sleep with a retry loop that polls the coordinator readiness (e.g.,
attempting a lightweight query or connection) with exponential backoff and a max
timeout before calling _create_tpch_tables, referencing the engine fixture,
TrinoContainer, and _create_tpch_tables to locate where to implement the retry;
ensure the loop aborts with a clear test failure if readiness isn't achieved
within the timeout.

In `@core/wren/tests/unit/test_athena_connector.py`:
- Around line 35-37: Replace the non-overriding stub registration using
sys.modules.setdefault with an explicit assignment so the fake module reliably
replaces any installed pyathena during the test: save the original
sys.modules.get("pyathena") into a temp (so it can be restored), assign
sys.modules["pyathena"] = _pyathena_module (which already sets
_pyathena_module.connect = _fake_pyathena_connect), and restore the saved
original after the test (or use a fixture/monkeypatch to set and undo the
assignment) to ensure test isolation.
- Around line 312-328: Test uses object.__setattr__ to inject a kwargs dict
which is fragile; keep the current defensive injection for now but change the
test to construct AthenaConnectionInfo with a real kwargs argument if/when
AthenaConnectionInfo gains a kwargs field. Update the test function
test_data_source_get_athena_connection_propagates_schema_and_kwargs to prefer
calling the AthenaConnectionInfo constructor with kwargs (instead of
object.__setattr__) when the model defines a kwargs attribute, otherwise fall
back to the current object.__setattr__ injection; reference AthenaConnectionInfo
and DataSourceExtension.get_athena_connection to locate the relevant code.

In `@core/wren/tests/unit/test_mssql_connection.py`:
- Around line 57-77: Add a test case that asserts plus-signs in URL-encoded
passwords decode to spaces by updating
test_mssql_url_decodes_user_database_and_password to include a password
containing '+' (e.g., "p+ss+word") and assert
DataSourceExtension.get_mssql_connection_from_url yields PWD "p ss word"; if the
implementation does not already use urllib.parse.unquote_plus for decoding
credentials, update the decoding in
DataSourceExtension.get_mssql_connection_from_url (and any helpers used by
_FakePyodbc/_parse_conn_str) to call urllib.parse.unquote_plus for the password
(and other credential fields that should treat '+' as space) while leaving
parsed.username unchanged unless you find a verified mismatch.
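For the first athena item, the standard library already provides a set-and-restore for `sys.modules`: `unittest.mock.patch.dict`. The sketch below is minimal. `_fake_connect` is a hypothetical stand-in for the test's `_fake_pyathena_connect`. Inside pytest, `monkeypatch.setitem(sys.modules, "pyathena", fake)` does the same thing with fixture-scoped cleanup.

```python
import sys
import types
from unittest import mock


def _fake_connect(**kwargs):
    # Hypothetical stand-in for the test's _fake_pyathena_connect.
    return ("fake-connection", kwargs)


fake_pyathena = types.ModuleType("pyathena")
fake_pyathena.connect = _fake_connect

# Unlike setdefault, patch.dict overrides an already-loaded pyathena and
# restores the original sys.modules contents when the block exits.
with mock.patch.dict(sys.modules, {"pyathena": fake_pyathena}):
    import pyathena

    conn = pyathena.connect(region_name="us-east-1")
```

On exit, `patch.dict` also removes modules that were first imported inside the block. Code that ran `from pyathena import connect` before the patch keeps the original function.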


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6d880d85-dd2d-463e-a089-800df247ba4e

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae4eea and c1ba926.

⛔ Files ignored due to path filters (1)

core/wren/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (24)

core/wren/justfile
core/wren/pyproject.toml
core/wren/src/wren/connector/athena.py
core/wren/src/wren/connector/base.py
core/wren/src/wren/connector/canner.py
core/wren/src/wren/connector/clickhouse.py
core/wren/src/wren/connector/factory.py
core/wren/src/wren/connector/ibis.py
core/wren/src/wren/connector/mssql.py
core/wren/src/wren/connector/mysql.py
core/wren/src/wren/connector/postgres.py
core/wren/src/wren/connector/trino.py
core/wren/src/wren/model/data_source.py
core/wren/tests/conftest.py
core/wren/tests/connectors/test_canner.py
core/wren/tests/connectors/test_clickhouse.py
core/wren/tests/connectors/test_mssql.py
core/wren/tests/connectors/test_mysql.py
core/wren/tests/connectors/test_mysql_connector.py
core/wren/tests/connectors/test_postgres.py
core/wren/tests/connectors/test_trino.py
core/wren/tests/unit/test_athena_connector.py
core/wren/tests/unit/test_mssql_connection.py
core/wren/tests/unit/test_mysql_helpers.py

💤 Files with no reviewable changes (1)

core/wren/src/wren/connector/ibis.py

- clickhouse._build_clickhouse_client_kwargs: unquote_plus the username/password parsed from a connection URL, so percent-encoded credentials (e.g. %40 → @) don't reach ClickHouse verbatim and fail auth.
- clickhouse.ClickHouseConnector.{query,dry_run}: strip the terminating run of `;`/whitespace before wrapping in `SELECT * FROM (...) AS _wren_sub LIMIT N`; otherwise `SELECT 1;` becomes invalid SQL. Reuse canner's `[;\s]+\Z` regex so semicolons inside string literals are preserved.
- mssql.MSSqlConnector.query and clickhouse._build_clickhouse_arrow_table: build the result table via `pa.Table.from_arrays(arrays, schema=...)` instead of `dict(zip(names, arrays))`, which silently collapses duplicate column names from projections like `SELECT a, a`.

Adds pure-Python regression tests under tests/unit/ for all four behaviours.
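The trailing-terminator fix described in this commit can be sketched with the same `[;\s]+\Z` pattern. `strip_terminator` and `wrap_for_limit` are hypothetical names, not the connector's actual helpers.

```python
import re

# Matches only the final run of semicolons/whitespace at the end of the string.
_TRAILING_TERMINATOR = re.compile(r"[;\s]+\Z")


def strip_terminator(sql: str) -> str:
    return _TRAILING_TERMINATOR.sub("", sql)


def wrap_for_limit(sql: str, limit: int) -> str:
    # Stripping first keeps "SELECT 1;" from becoming "(SELECT 1;)".
    return f"SELECT * FROM ({strip_terminator(sql)}) AS _wren_sub LIMIT {limit}"
```

Because the pattern is anchored at `\Z`, a `;` inside a string literal such as `SELECT ';' AS s` is left alone. The pattern does not handle a trailing line comment: once wrapped, `SELECT 1 -- note` would still comment out the closing parenthesis.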

Ibis is no longer a load-bearing dependency for the bulk of connectors after the recent direct-connector refactors (canner/postgres/mysql/mssql/trino/clickhouse/athena dropped ibis in #2313, snowflake in #2268); the intro now only credits Apache DataFusion as the engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

goldmedal and others added 30 commits May 14, 2026 11:18

fix(canner): dry_run must return None per ConnectorABC contract

43ebe33

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test(mysql): skip lazy-init test when MySQLdb not installed

59e1843

The unit-test job runs without the mysql extra, so importing MySQLdb fails. Guard the test with pytest.importorskip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

merge: refactor/athena-native-driver (PR #2271)

f83d59a

goldmedal added 7 commits May 21, 2026 13:09

merge: refactor/canner-native-driver (PR #2269)

ca58d8e

merge: refactor/clickhouse-native-driver (PR #2275)

7ad3492

merge: refactor/mssql-native-driver (PR #2274)

e1fba71

merge: refactor/mysql-native-driver (PR #2273)

15269c3

merge: refactor/postgres-native-driver (PR #2272)

ab3b2a0

merge: refactor/trino-native-driver (PR #2270)

9ad7642

chore: regenerate uv.lock for combined native-driver refactors

c1ba926

github-actions Bot added the dependencies, python, and core labels May 21, 2026

goldmedal changed the title "refactor: drop ibis for canner/postgres/mysql/mssql/trino/clickhouse/athena (combined)" to "refactor(wren): drop ibis for canner/postgres/mysql/mssql/trino/clickhouse/athena (combined)" May 21, 2026

coderabbitai Bot reviewed May 21, 2026


Comment thread core/wren/src/wren/connector/clickhouse.py Outdated

Comment thread core/wren/src/wren/connector/clickhouse.py

Comment thread core/wren/src/wren/connector/clickhouse.py

Comment thread core/wren/src/wren/connector/mssql.py Outdated

goldmedal requested a review from douenergy May 21, 2026 06:19

douenergy approved these changes May 21, 2026


douenergy merged commit 6f67124 into main May 21, 2026
10 checks passed

goldmedal mentioned this pull request May 22, 2026

feat(wren): add YTsaurus (CHYT) connector #2258 (Open, 5 tasks)

This was referenced May 24, 2026

docs: add support for Aurora PostgreSQL and MySQL #2320 (Merged)

refactor(wren): drop ibis-framework and remove DataSource.get_connection #2327 (Open)


refactor(wren): drop ibis for canner/postgres/mysql/mssql/trino/clickhouse/athena (combined) #2313

douenergy merged 38 commits into main from refactor/native-drivers-combined

goldmedal commented May 21, 2026 (edited by coderabbitai Bot)

coderabbitai Bot commented May 21, 2026 (edited)

coderabbitai Bot left a comment


2 participants
