refactor(wren): drop ibis-framework and remove DataSource.get_connection#2327
Conversation
All concrete connectors in wren.connector.* are native (psycopg, MySQLdb,
google-cloud-bigquery, oracledb, databricks-sql, snowflake-connector-python,
clickhouse-connect, pyathena, trino, etc.), so the runtime ibis paths in
data_source.get_*_connection() are dead code.
- Delete the unused IbisConnector base class and the ibis.expr.* imports it
pulled in.
- Drop module-level ``import ibis`` / ``from ibis import BaseBackend`` in
wren.model.data_source; tighten BackendOrConnection to the native types
the remaining get_*_connection methods actually return.
- Remove get_bigquery_connection / get_oracle_connection /
get_databricks_connection — their native connectors construct connections
directly. Route the three through the unsupported-source NotImplementedError
list in DataSourceExtension.get_connection so callers get a clear message
to use wren.connector.factory.get_connector() instead.
- Replace the generic ibis.connect(connection_url) fallback with
NotImplementedError telling callers to use a typed ConnectionInfo.
- pyproject: drop ``ibis-framework>=10`` from base deps; switch the bigquery
extra from ``ibis-framework[bigquery]`` to ``google-cloud-bigquery``.
uv.lock no longer pins ibis-framework or its bigquery-side transitive deps
(db-dtypes, google-cloud-bigquery-storage, pandas-gbq, pydata-google-auth,
google-auth-oauthlib, requests-oauthlib, atpublic, parsy, toolz).
BREAKING CHANGE: DataSourceExtension.get_{bigquery,oracle,databricks}_connection
are removed. Use wren.connector.factory.get_connector(data_source, info)
to obtain a native connector; calling DataSource.<name>.get_connection() on
those three sources now raises NotImplementedError. IbisConnector is also
removed from wren.connector.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR removes ibis-backed connection paths, eliminates the unconditional ibis dependency (replaces BigQuery extra with explicit google-cloud packages), removes IbisConnector from exports, tightens DataSource connection types to native drivers, moves connection construction into connector modules, and updates tests to call connector helpers. ChangesIbis Removal and Native Connector Migration
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/model/data_source.py`:
- Around line 283-305: The code currently raises NotImplementedError inside
DataSource.get_connection (for names like "bigquery", "oracle", etc.) but the
subsequent broad except Exception block wraps all exceptions into
WrenError(GET_CONNECTION_ERROR); change the exception handling so that if the
caught exception is a NotImplementedError it is re-raised unchanged (e.g., if
isinstance(e, NotImplementedError): raise) and only other exceptions are wrapped
into WrenError(GET_CONNECTION_ERROR); apply the same pattern for the other
similar branch (lines handling the other unsupported sources) so
NotImplementedError escapes as intended.
- Around line 292-301: The guard in DataSource.get_connection()
(core/wren/src/wren/model/data_source.py) omits file-backed sources that exist
in DataSourceExtension, causing calls for s3_file, minio_file, and gcs_file to
fall through to the generic missing-method path; update the set checked in the
if-statement inside get_connection() (the one currently listing "bigquery",
"databricks", "datafusion", "duckdb", "local_file", "oracle", "redshift",
"spark") to also include "s3_file", "minio_file", and "gcs_file" so those types
are explicitly blocked and return the intended migration guidance. Ensure the
change references the same set literal used in get_connection() to keep behavior
consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dc1fe14d-cbf1-406e-b868-5f3edda167b6
⛔ Files ignored due to path filters (1)
core/wren/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
core/wren/pyproject.tomlcore/wren/src/wren/connector/__init__.pycore/wren/src/wren/connector/base.pycore/wren/src/wren/model/data_source.py
💤 Files with no reviewable changes (1)
- core/wren/src/wren/connector/base.py
…heir own Every native connector under wren.connector.* now constructs its own driver connection directly in __init__. The DataSource dispatcher and the get_<name>_connection helpers it routed to had become a thin layer that just unpacked typed ConnectionInfo into the driver's connect() call — for data sources whose connectors already build connections natively (oracle, bigquery, databricks, trino, clickhouse, snowflake, athena, redshift, spark, duckdb, datafusion), the dispatcher was never reached. Inline the remaining four: - ``PostgresConnector.__init__`` calls ``psycopg.connect(host=..., port=..., dbname=..., user=..., password=..., **kwargs)`` directly. Passing a ``ConnectionUrl`` form raises ``WrenError(INVALID_CONNECTION_INFO)`` with guidance to use ``PostgresConnectionInfo`` instead. - ``CannerConnector.__init__`` mirrors the same pattern with ``dbname=info.workspace``. - ``MySqlConnector`` / ``DorisConnector`` call ``MySQLdb.connect( **_build_mysql_connect_kwargs(info))`` (already supported both typed and URL forms via the local helper). - ``MSSqlConnector.__init__`` calls ``_build_mssql_connection(info)``, a new module-local helper that owns the URL-vs-typed branching plus the pyodbc connection-string assembly previously living on ``DataSourceExtension``. Drop from ``wren/model/data_source.py``: - ``DataSource.get_connection`` method. - The whole ``DataSourceExtension`` Enum, including ``_connect_mssql_pyodbc``, ``_escape_odbc_value``, ``_register_mssql_output_converters``, ``_decode_mssql_datetimeoffset``, ``MSSQL_DATETIMEOFFSET_TYPE_CODE`` (moved to ``wren/connector/mssql.py``) and every ``get_<name>_connection`` static method (athena, canner, clickhouse, mssql, mysql, doris, postgres, snowflake, trino). - ``BackendOrConnection`` Union alias and the TYPE_CHECKING imports it carried (``MySQLdb``, ``psycopg``, ``PyAthenaConnection``, ``TrinoDbApiConnection``). - Top-level ``datetime`` / ``pyodbc`` / ``urlparse`` / ``unquote_plus`` / ``Enum`` imports that became unused once the class was gone. After this change, ``DataSource`` only carries ``get_connection_info`` (which builds typed ``ConnectionInfo`` objects from request dicts + applies header-driven timeouts) and ``_build_connection_info`` / ``_handle_clickhouse_url`` — pure data-shape helpers, no driver involvement. Tests are updated to import the relocated helpers from ``wren.connector.mssql`` and to use ``AthenaConnector(info)`` directly where they previously asserted parity between the dispatcher and the native connector path (that parity is now structurally enforced). BREAKING CHANGE: ``DataSource.<name>.get_connection(info)`` no longer exists. Callers should instantiate the connector directly: ``wren.connector.factory.get_connector(data_source, info)`` returns a ``ConnectorABC`` whose ``.connection`` attribute exposes the native driver handle. ``DataSourceExtension`` is removed from public re-exports.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/connector/mssql.py`:
- Around line 352-356: The current validation block in connector/mssql.py
incorrectly rejects credential-less MSSQL URLs by checking parsed.username;
update the validation so it only requires parsed.hostname and parsed.path
(database) and does not force a username, allowing the _connect_mssql_pyodbc()
path to handle the user is None and password is None case
(Trusted_Connection=yes). Keep raising
WrenError(ErrorCode.INVALID_CONNECTION_INFO, ...) when host or database is
missing, but remove the username presence check so integrated auth can proceed
to _connect_mssql_pyodbc().
- Around line 365-370: The username is being URL-decoded with unquote_plus which
turns literal '+' into spaces (causing usernames like "svc+etl" to become "svc
etl"); change the _connect_mssql_pyodbc call to pass parsed.username directly
(no unquote_plus) and only URL-decode the password (keep the existing
conditional unquote_plus(parsed.password) if parsed.password else None), leaving
database handling unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2c4b34fc-cbf3-4a35-a67b-5b909daf55fa
📒 Files selected for processing (8)
core/wren/src/wren/connector/canner.pycore/wren/src/wren/connector/mssql.pycore/wren/src/wren/connector/mysql.pycore/wren/src/wren/connector/postgres.pycore/wren/src/wren/model/data_source.pycore/wren/tests/connectors/test_mssql.pycore/wren/tests/unit/test_athena_connector.pycore/wren/tests/unit/test_mssql_connection.py
💤 Files with no reviewable changes (1)
- core/wren/tests/unit/test_athena_connector.py
…credentials Two related URL-handling bugs in ``_connect_mssql_from_url`` (the helper moved into ``wren/connector/mssql.py`` earlier in this branch): 1. **Trusted_Connection is now reachable via URL.** The validator rejected any URL without a username, but ``_connect_mssql_pyodbc`` explicitly handles ``user is None and password is None`` by emitting ``Trusted_Connection=yes``. Drop the username check from the URL guard — only host + database are mandatory — and let the downstream validator own the user/password symmetry rule. 2. **``+`` in userinfo is now preserved.** ``unquote_plus`` treats ``+`` as a space, which silently corrupted service-account names like ``svc+etl`` into ``svc etl``. Per RFC 3986, ``+`` only has form- encoded semantics inside query strings; userinfo and path treat it literally. Switch ``user`` / ``password`` / ``database`` from ``unquote_plus`` to ``unquote`` so ``%XX`` escapes still decode (e.g. ``us%40er`` → ``us@er``) while a literal ``+`` survives. Add regression tests: - ``test_mssql_url_preserves_literal_plus_in_user_and_password`` — ``svc+etl`` / ``p+wd`` survive unchanged. - ``test_mssql_url_without_credentials_uses_trusted_connection`` — a credential-less ``mssql://host:port/db`` opens via ``Trusted_Connection=yes``. - ``test_mssql_url_without_host_raises`` / ``test_mssql_url_without_database_raises`` — confirm the remaining guards still reject missing components.
Summary
Two related cleanups that collapse the legacy connection-construction layer in
wren.model.data_source:ibis-frameworkas a wrenai dependency. All concrete connectors underwren.connector.*already build database connections natively (psycopg, MySQLdb, google-cloud-bigquery, oracledb, databricks-sql, snowflake-connector-python, clickhouse-connect, pyathena, trino, etc.). The remaining ibis touchpoints indata_source.pywere unreachable dead code.DataSource.get_connectionentirely. Connectors now build their own driver connections directly in__init__. The dispatcher (DataSourceExtension.get_connection) and the per-sourceget_<name>_connectionhelpers it routed to were a thin pass-through that no concrete connector actually depended on.After these two commits,
wren.model.data_sourceonly carries data-shape helpers (DataSourceenum,get_connection_info,_build_connection_info,_handle_clickhouse_url). All driver imports and ODBC string assembly live next to the connectors that own them.Commit 1 — drop
ibis-frameworkwren/connector/base.py: delete the unusedIbisConnectorbase class and itsibis.expr.*imports.wren/connector/__init__.py: dropIbisConnectorfrom re-exports /__all__.wren/model/data_source.py:import ibis/from ibis import BaseBackend.ibis.connect(connection_url)fallback withNotImplementedErrordirecting callers to typedConnectionInfo.pyproject.toml:ibis-framework>=10from base dependencies.bigqueryextra fromibis-framework[bigquery]togoogle-cloud-bigquery>=3.40,<4.uv.lock: regenerated;ibis-frameworkand ibis-only transitive deps (atpublic,parsy,toolz,db-dtypes,google-cloud-bigquery-storage,pandas-gbq,pydata-google-auth,google-auth-oauthlib,requests-oauthlib) are gone.Commit 2 — remove
DataSource.get_connectionPostgresConnector.__init__: inlinepsycopg.connect(...).ConnectionUrlform raisesWrenError(INVALID_CONNECTION_INFO)with guidance.CannerConnector.__init__: inlinepsycopg.connect(...)withdbname=info.workspace.MySqlConnector/DorisConnector: callMySQLdb.connect(**_build_mysql_connect_kwargs(info))directly (the helpers already supported both typed and URL forms).MSSqlConnector.__init__: calls a new module-local_build_mssql_connection(info)that owns URL-vs-typed branching plus pyodbc connection-string assembly. The_connect_mssql_pyodbc,_connect_mssql_from_url,_escape_odbc_value,_register_mssql_output_converters,_decode_mssql_datetimeoffset, andMSSQL_DATETIMEOFFSET_TYPE_CODEpreviously onDataSourceExtensionmove towren/connector/mssql.py.wren/model/data_source.py: deleteDataSource.get_connection, the entireDataSourceExtensionEnum (allget_<name>_connectionhelpers across athena/bigquery/canner/clickhouse/databricks/mssql/mysql/doris/oracle/postgres/snowflake/trino),BackendOrConnectionUnion, and the now-unused top-level imports.Breaking changes
DataSource.<name>.get_connection(info)is removed. Usewren.connector.factory.get_connector(data_source, info)and read.connectionon the returned connector if you need the raw driver handle.DataSourceExtensionis removed entirely.IbisConnectoris removed fromwren.connector.DataSourceExtension.get_bigquery_connection,get_oracle_connection,get_databricks_connection,get_athena_connection,get_postgres_connection,get_mysql_connection,get_doris_connection,get_canner_connection,get_clickhouse_connection,get_snowflake_connection,get_trino_connection,get_mssql_connection,get_mssql_connection_from_url,_connect_mssql_pyodbc,_decode_mssql_datetimeoffsetare all gone.Migration: instantiate the connector directly.
Verification
ruff format --check src/andruff check src/pass.uv sync --extra devuninstallsibis-frameworkfrom the project venv.pytest tests/unit/— 530 passed, 38 skipped. The 5tests/unit/test_cube_cli.pyfailures pre-date this PR; they trace to a missingcube_query_to_sqlexport in the locally installedwren_corewheel and are unrelated.pytest tests/unit/test_mssql_connection.py tests/unit/test_athena_connector.py— 36/36 pass after migrating helpers towren.connector.mssqland dropping the now-redundant dispatcher-parity tests.Test plan
pip install wrenaiin a fresh venv →ibis-frameworkis not pulled inDataSource.<name>.get_connection(info)migrates toget_connector(...)per the snippet aboveGenerated with Claude Code
Summary by CodeRabbit