Skip to content

refactor(wren): drop ibis-framework and remove DataSource.get_connection#2327

Merged
douenergy merged 3 commits into
mainfrom
refactor/remove-ibis-framework
May 27, 2026
Merged

refactor(wren): drop ibis-framework and remove DataSource.get_connection#2327
douenergy merged 3 commits into
mainfrom
refactor/remove-ibis-framework

Conversation

@goldmedal
Copy link
Copy Markdown
Collaborator

@goldmedal goldmedal commented May 26, 2026

Summary

Two related cleanups that collapse the legacy connection-construction layer in wren.model.data_source:

  1. Remove ibis-framework as a wrenai dependency. All concrete connectors under wren.connector.* already build database connections natively (psycopg, MySQLdb, google-cloud-bigquery, oracledb, databricks-sql, snowflake-connector-python, clickhouse-connect, pyathena, trino, etc.). The remaining ibis touchpoints in data_source.py were unreachable dead code.
  2. Remove DataSource.get_connection entirely. Connectors now build their own driver connections directly in __init__. The dispatcher (DataSourceExtension.get_connection) and the per-source get_<name>_connection helpers it routed to were a thin pass-through that no concrete connector actually depended on.

After these two commits, wren.model.data_source only carries data-shape helpers (DataSource enum, get_connection_info, _build_connection_info, _handle_clickhouse_url). All driver imports and ODBC string assembly live next to the connectors that own them.

Commit 1 — drop ibis-framework

  • wren/connector/base.py: delete the unused IbisConnector base class and its ibis.expr.* imports.
  • wren/connector/__init__.py: drop IbisConnector from re-exports / __all__.
  • wren/model/data_source.py:
    • Remove module-level import ibis / from ibis import BaseBackend.
    • Replace the generic ibis.connect(connection_url) fallback with NotImplementedError directing callers to typed ConnectionInfo.
  • pyproject.toml:
    • Drop ibis-framework>=10 from base dependencies.
    • Switch the bigquery extra from ibis-framework[bigquery] to google-cloud-bigquery>=3.40,<4.
  • uv.lock: regenerated; ibis-framework and ibis-only transitive deps (atpublic, parsy, toolz, db-dtypes, google-cloud-bigquery-storage, pandas-gbq, pydata-google-auth, google-auth-oauthlib, requests-oauthlib) are gone.

Commit 2 — remove DataSource.get_connection

  • PostgresConnector.__init__: inline psycopg.connect(...). ConnectionUrl form raises WrenError(INVALID_CONNECTION_INFO) with guidance.
  • CannerConnector.__init__: inline psycopg.connect(...) with dbname=info.workspace.
  • MySqlConnector / DorisConnector: call MySQLdb.connect(**_build_mysql_connect_kwargs(info)) directly (the helpers already supported both typed and URL forms).
  • MSSqlConnector.__init__: calls a new module-local _build_mssql_connection(info) that owns URL-vs-typed branching plus pyodbc connection-string assembly. The _connect_mssql_pyodbc, _connect_mssql_from_url, _escape_odbc_value, _register_mssql_output_converters, _decode_mssql_datetimeoffset, and MSSQL_DATETIMEOFFSET_TYPE_CODE previously on DataSourceExtension move to wren/connector/mssql.py.
  • wren/model/data_source.py: delete DataSource.get_connection, the entire DataSourceExtension Enum (all get_<name>_connection helpers across athena/bigquery/canner/clickhouse/databricks/mssql/mysql/doris/oracle/postgres/snowflake/trino), BackendOrConnection Union, and the now-unused top-level imports.

Breaking changes

  • DataSource.<name>.get_connection(info) is removed. Use wren.connector.factory.get_connector(data_source, info) and read .connection on the returned connector if you need the raw driver handle.
  • DataSourceExtension is removed entirely.
  • IbisConnector is removed from wren.connector.
  • DataSourceExtension.get_bigquery_connection, get_oracle_connection, get_databricks_connection, get_athena_connection, get_postgres_connection, get_mysql_connection, get_doris_connection, get_canner_connection, get_clickhouse_connection, get_snowflake_connection, get_trino_connection, get_mssql_connection, get_mssql_connection_from_url, _connect_mssql_pyodbc, _decode_mssql_datetimeoffset are all gone.

Migration: instantiate the connector directly.

# before
from wren.model.data_source import DataSource
conn = DataSource.postgres.get_connection(info)
# after
from wren.connector.factory import get_connector
from wren.model.data_source import DataSource
connector = get_connector(DataSource.postgres, info)
conn = connector.connection  # raw psycopg.Connection

Verification

  • ruff format --check src/ and ruff check src/ pass.
  • uv sync --extra dev uninstalls ibis-framework from the project venv.
  • pytest tests/unit/530 passed, 38 skipped. The 5 tests/unit/test_cube_cli.py failures pre-date this PR; they trace to a missing cube_query_to_sql export in the locally installed wren_core wheel and are unrelated.
  • pytest tests/unit/test_mssql_connection.py tests/unit/test_athena_connector.py — 36/36 pass after migrating helpers to wren.connector.mssql and dropping the now-redundant dispatcher-parity tests.

Test plan

  • CI: full pytest matrix on linux + macos
  • CI: connector tests behind each pip extra still pass (bigquery, oracle, databricks, mssql especially)
  • Downstream: pip install wrenai in a fresh venv → ibis-framework is not pulled in
  • Downstream: anyone calling DataSource.<name>.get_connection(info) migrates to get_connector(...) per the snippet above

Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Streamlined connector architecture with direct native driver integration across PostgreSQL, MySQL, MSSQL, and other database connections
    • Removed ibis-framework as a dependency requirement
    • Updated BigQuery connector dependencies to use google-cloud-bigquery directly

Review Change Stack

All concrete connectors in wren.connector.* are native (psycopg, MySQLdb,
google-cloud-bigquery, oracledb, databricks-sql, snowflake-connector-python,
clickhouse-connect, pyathena, trino, etc.), so the runtime ibis paths in
data_source.get_*_connection() are dead code.

- Delete the unused IbisConnector base class and the ibis.expr.* imports it
  pulled in.
- Drop module-level ``import ibis`` / ``from ibis import BaseBackend`` in
  wren.model.data_source; tighten BackendOrConnection to the native types
  the remaining get_*_connection methods actually return.
- Remove get_bigquery_connection / get_oracle_connection /
  get_databricks_connection — their native connectors construct connections
  directly. Route the three through the unsupported-source NotImplementedError
  list in DataSourceExtension.get_connection so callers get a clear message
  to use wren.connector.factory.get_connector() instead.
- Replace the generic ibis.connect(connection_url) fallback with
  NotImplementedError telling callers to use a typed ConnectionInfo.
- pyproject: drop ``ibis-framework>=10`` from base deps; switch the bigquery
  extra from ``ibis-framework[bigquery]`` to ``google-cloud-bigquery``.

uv.lock no longer pins ibis-framework or its bigquery-side transitive deps
(db-dtypes, google-cloud-bigquery-storage, pandas-gbq, pydata-google-auth,
google-auth-oauthlib, requests-oauthlib, atpublic, parsy, toolz).

BREAKING CHANGE: DataSourceExtension.get_{bigquery,oracle,databricks}_connection
are removed. Use wren.connector.factory.get_connector(data_source, info)
to obtain a native connector; calling DataSource.<name>.get_connection() on
those three sources now raises NotImplementedError. IbisConnector is also
removed from wren.connector.
@github-actions github-actions Bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code core labels May 26, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bff52e6c-f5c3-4edc-80c9-1d0442be70d4

📥 Commits

Reviewing files that changed from the base of the PR and between ac7c3e0 and bae4ba7.

📒 Files selected for processing (2)
  • core/wren/src/wren/connector/mssql.py
  • core/wren/tests/unit/test_mssql_connection.py

Walkthrough

This PR removes ibis-backed connection paths, eliminates the unconditional ibis dependency (replaces BigQuery extra with explicit google-cloud packages), removes IbisConnector from exports, tightens DataSource connection types to native drivers, moves connection construction into connector modules, and updates tests to call connector helpers.

Changes

Ibis Removal and Native Connector Migration

Layer / File(s) Summary
Dependency and package export cleanup
core/wren/pyproject.toml, core/wren/src/wren/connector/__init__.py, core/wren/src/wren/connector/base.py
Removes ibis-framework>=10 from main dependencies, replaces BigQuery extra with google-cloud-bigquery>=3.40,<4 and google-auth, removes IbisConnector from public exports, and simplifies the TYPE_CHECKING import pattern.
DataSource types and routing updates
core/wren/src/wren/model/data_source.py
Drops ibis/BaseBackend imports, narrows BackendOrConnection to native-driver connections, removes ibis-backed builders, and updates DataSourceExtension.get_connection() routing to disallow unsupported connection_url usage and route callers to connector factories.
Connector-level native connection construction
core/wren/src/wren/connector/mssql.py, core/wren/src/wren/connector/postgres.py, core/wren/src/wren/connector/mysql.py, core/wren/src/wren/connector/canner.py
Connectors now construct native-driver connections directly (psycopg, pyodbc, MySQLdb). MSSQL adds _build_mssql_connection, URL parsing helpers, ODBC escaping, converter registration, and DATETIMEOFFSET decoding. Canner/Postgres/MySQL/Doris constructors build connections without DataSource delegation.
Tests and import adjustments
core/wren/tests/...
Unit and connector tests updated to import and call connector-level helpers (MSSQL helpers, adjusted Athena and MSSQL assertions), patch connector module dependencies, and strengthen credential/assertion checks.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Canner/WrenAI#2313: Overlaps with this PR's ibis dependency removal and native connector routing refactor.

Suggested reviewers

  • douenergy
  • PaulChen79

Poem

🐰 I hopped through imports, nibbled ibis away,
Native drivers wake and greet the day.
BigQuery now speaks via google-cloud clear,
Connectors build close, tests call helpers near —
A cheerful rabbit gives a happy hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective: removing ibis-framework and the DataSource.get_connection method, which are the primary refactoring goals across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/remove-ibis-framework

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/model/data_source.py`:
- Around line 283-305: The code currently raises NotImplementedError inside
DataSource.get_connection (for names like "bigquery", "oracle", etc.) but the
subsequent broad except Exception block wraps all exceptions into
WrenError(GET_CONNECTION_ERROR); change the exception handling so that if the
caught exception is a NotImplementedError it is re-raised unchanged (e.g., if
isinstance(e, NotImplementedError): raise) and only other exceptions are wrapped
into WrenError(GET_CONNECTION_ERROR); apply the same pattern for the other
similar branch (lines handling the other unsupported sources) so
NotImplementedError escapes as intended.
- Around line 292-301: The guard in DataSource.get_connection()
(core/wren/src/wren/model/data_source.py) omits file-backed sources that exist
in DataSourceExtension, causing calls for s3_file, minio_file, and gcs_file to
fall through to the generic missing-method path; update the set checked in the
if-statement inside get_connection() (the one currently listing "bigquery",
"databricks", "datafusion", "duckdb", "local_file", "oracle", "redshift",
"spark") to also include "s3_file", "minio_file", and "gcs_file" so those types
are explicitly blocked and return the intended migration guidance. Ensure the
change references the same set literal used in get_connection() to keep behavior
consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dc1fe14d-cbf1-406e-b868-5f3edda167b6

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc2064 and cab6d6e.

⛔ Files ignored due to path filters (1)
  • core/wren/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • core/wren/pyproject.toml
  • core/wren/src/wren/connector/__init__.py
  • core/wren/src/wren/connector/base.py
  • core/wren/src/wren/model/data_source.py
💤 Files with no reviewable changes (1)
  • core/wren/src/wren/connector/base.py

Comment thread core/wren/src/wren/model/data_source.py Outdated
Comment thread core/wren/src/wren/model/data_source.py Outdated
…heir own

Every native connector under wren.connector.* now constructs its own
driver connection directly in __init__. The DataSource dispatcher and the
get_<name>_connection helpers it routed to had become a thin layer that
just unpacked typed ConnectionInfo into the driver's connect() call — for
data sources whose connectors already build connections natively (oracle,
bigquery, databricks, trino, clickhouse, snowflake, athena, redshift,
spark, duckdb, datafusion), the dispatcher was never reached.

Inline the remaining four:

- ``PostgresConnector.__init__`` calls ``psycopg.connect(host=..., port=...,
  dbname=..., user=..., password=..., **kwargs)`` directly. Passing a
  ``ConnectionUrl`` form raises ``WrenError(INVALID_CONNECTION_INFO)`` with
  guidance to use ``PostgresConnectionInfo`` instead.
- ``CannerConnector.__init__`` mirrors the same pattern with
  ``dbname=info.workspace``.
- ``MySqlConnector`` / ``DorisConnector`` call ``MySQLdb.connect(
  **_build_mysql_connect_kwargs(info))`` (already supported both typed and
  URL forms via the local helper).
- ``MSSqlConnector.__init__`` calls ``_build_mssql_connection(info)``, a
  new module-local helper that owns the URL-vs-typed branching plus the
  pyodbc connection-string assembly previously living on
  ``DataSourceExtension``.

Drop from ``wren/model/data_source.py``:

- ``DataSource.get_connection`` method.
- The whole ``DataSourceExtension`` Enum, including
  ``_connect_mssql_pyodbc``, ``_escape_odbc_value``,
  ``_register_mssql_output_converters``, ``_decode_mssql_datetimeoffset``,
  ``MSSQL_DATETIMEOFFSET_TYPE_CODE`` (moved to ``wren/connector/mssql.py``)
  and every ``get_<name>_connection`` static method (athena, canner,
  clickhouse, mssql, mysql, doris, postgres, snowflake, trino).
- ``BackendOrConnection`` Union alias and the TYPE_CHECKING imports it
  carried (``MySQLdb``, ``psycopg``, ``PyAthenaConnection``,
  ``TrinoDbApiConnection``).
- Top-level ``datetime`` / ``pyodbc`` / ``urlparse`` / ``unquote_plus`` /
  ``Enum`` imports that became unused once the class was gone.

After this change, ``DataSource`` only carries ``get_connection_info``
(which builds typed ``ConnectionInfo`` objects from request dicts +
applies header-driven timeouts) and ``_build_connection_info`` /
``_handle_clickhouse_url`` — pure data-shape helpers, no driver
involvement.

Tests are updated to import the relocated helpers from
``wren.connector.mssql`` and to use ``AthenaConnector(info)`` directly
where they previously asserted parity between the dispatcher and the
native connector path (that parity is now structurally enforced).

BREAKING CHANGE: ``DataSource.<name>.get_connection(info)`` no longer
exists. Callers should instantiate the connector directly:
``wren.connector.factory.get_connector(data_source, info)`` returns a
``ConnectorABC`` whose ``.connection`` attribute exposes the native
driver handle. ``DataSourceExtension`` is removed from public re-exports.
@goldmedal goldmedal changed the title refactor(wren): drop ibis-framework dependency refactor(wren): drop ibis-framework and remove DataSource.get_connection May 26, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/connector/mssql.py`:
- Around line 352-356: The current validation block in connector/mssql.py
incorrectly rejects credential-less MSSQL URLs by checking parsed.username;
update the validation so it only requires parsed.hostname and parsed.path
(database) and does not force a username, allowing the _connect_mssql_pyodbc()
path to handle the user is None and password is None case
(Trusted_Connection=yes). Keep raising
WrenError(ErrorCode.INVALID_CONNECTION_INFO, ...) when host or database is
missing, but remove the username presence check so integrated auth can proceed
to _connect_mssql_pyodbc().
- Around line 365-370: The username is being URL-decoded with unquote_plus which
turns literal '+' into spaces (causing usernames like "svc+etl" to become "svc
etl"); change the _connect_mssql_pyodbc call to pass parsed.username directly
(no unquote_plus) and only URL-decode the password (keep the existing
conditional unquote_plus(parsed.password) if parsed.password else None), leaving
database handling unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2c4b34fc-cbf3-4a35-a67b-5b909daf55fa

📥 Commits

Reviewing files that changed from the base of the PR and between cab6d6e and ac7c3e0.

📒 Files selected for processing (8)
  • core/wren/src/wren/connector/canner.py
  • core/wren/src/wren/connector/mssql.py
  • core/wren/src/wren/connector/mysql.py
  • core/wren/src/wren/connector/postgres.py
  • core/wren/src/wren/model/data_source.py
  • core/wren/tests/connectors/test_mssql.py
  • core/wren/tests/unit/test_athena_connector.py
  • core/wren/tests/unit/test_mssql_connection.py
💤 Files with no reviewable changes (1)
  • core/wren/tests/unit/test_athena_connector.py

Comment thread core/wren/src/wren/connector/mssql.py Outdated
Comment thread core/wren/src/wren/connector/mssql.py Outdated
…credentials

Two related URL-handling bugs in ``_connect_mssql_from_url`` (the helper
moved into ``wren/connector/mssql.py`` earlier in this branch):

1. **Trusted_Connection is now reachable via URL.** The validator
   rejected any URL without a username, but ``_connect_mssql_pyodbc``
   explicitly handles ``user is None and password is None`` by emitting
   ``Trusted_Connection=yes``. Drop the username check from the URL
   guard — only host + database are mandatory — and let the downstream
   validator own the user/password symmetry rule.

2. **``+`` in userinfo is now preserved.** ``unquote_plus`` treats ``+``
   as a space, which silently corrupted service-account names like
   ``svc+etl`` into ``svc etl``. Per RFC 3986, ``+`` only has form-
   encoded semantics inside query strings; userinfo and path treat it
   literally. Switch ``user`` / ``password`` / ``database`` from
   ``unquote_plus`` to ``unquote`` so ``%XX`` escapes still decode
   (e.g. ``us%40er`` → ``us@er``) while a literal ``+`` survives.

Add regression tests:

- ``test_mssql_url_preserves_literal_plus_in_user_and_password`` —
  ``svc+etl`` / ``p+wd`` survive unchanged.
- ``test_mssql_url_without_credentials_uses_trusted_connection`` — a
  credential-less ``mssql://host:port/db`` opens via
  ``Trusted_Connection=yes``.
- ``test_mssql_url_without_host_raises`` /
  ``test_mssql_url_without_database_raises`` — confirm the remaining
  guards still reject missing components.
@goldmedal goldmedal requested a review from douenergy May 26, 2026 10:14
@douenergy douenergy merged commit 2812d13 into main May 27, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core dependencies Pull requests that update a dependency file python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants