refactor(trino): use trino native driver, drop ibis dependency#2270
refactor(trino): use trino native driver, drop ibis dependency#2270goldmedal wants to merge 2 commits into
Conversation
The trino connector now uses the `trino` python client directly, parsing Trino type strings via sqlglot to build PyArrow schemas. The trino extra installs `trino>=0.333,<1` instead of `ibis-framework[trino]`. Highlights - `connector/trino.py`: native cursor execution; type-string -> Arrow via sqlglot, including row(...) / array(...) / map(...) / decimal(p,s) / timestamp with time zone. - `model/data_source.py::get_trino_connection`: returns `trino.dbapi.Connection`; native code path no longer routes through ibis. - `pyproject.toml`: trino extra -> `trino>=0.333,<1`. Tests - `tests/connectors/test_trino.py` covers ~36 Trino type categories including nested row/array/map plus testcontainer-backed query suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughThis PR replaces the Trino connector's dependency on ChangesTrino Native DB-API Connector Migration
Sequence Diagram(s)sequenceDiagram
participant TrinoConnector
participant trino.dbapi
participant cursor
participant sqlglot
participant PyArrow
TrinoConnector->>trino.dbapi: connect(host, port, auth, ...)
TrinoConnector->>cursor: execute(sql)
cursor-->>TrinoConnector: cursor.description (type strings)
TrinoConnector->>sqlglot: parse Trino type strings
sqlglot-->>TrinoConnector: Trino AST nodes
TrinoConnector->>TrinoConnector: _trino_data_type_to_arrow()
TrinoConnector-->>PyArrow: PyArrow DataType schema
cursor-->>TrinoConnector: rows with Python values
TrinoConnector->>TrinoConnector: _build_trino_column (normalize)
TrinoConnector-->>PyArrow: PyArrow Table
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
core/wren/tests/connectors/test_trino.py (1)
213-216: ⚡ Quick winPoll for readiness instead of sleeping a fixed 5 seconds.
This is still flaky on slow runners and always adds delay on fast ones. A short retry loop against a lightweight Trino query would make the integration suite much more stable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/connectors/test_trino.py` around lines 213 - 216, Replace the fixed sleep with a short retry loop that polls Trino readiness by executing a lightweight query (e.g., "SELECT 1") against the same host and port used by _create_tpch_tables(host, port); attempt the query every 0.5–1s, catch transient failures (including the "nodes is empty" error), and stop retrying when the query succeeds or a short timeout (e.g., 30s) elapses, then call _create_tpch_tables; update the test in test_trino.py to use this polling logic so fast runners don't wait and slow runners retry until the coordinator is ready.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/connector/trino.py`:
- Around line 355-356: Trailing semicolons on the SQL cause wrapping to produce
invalid SQL; before you build the wrapper f-string that uses the local variable
sql (the line that does sql = f"SELECT * FROM ({sql}) AS _sub LIMIT {limit}"),
strip any trailing semicolons and surrounding whitespace from sql (e.g., remove
any trailing ';' characters and trailing whitespace) so the subquery inside
parentheses is clean; apply the same change to the other occurrence around line
376 that wraps sql in a subquery.
- Around line 314-321: The current code silently defaults parsed.username to
"test" when building the connection dict (the "out" dict and its "user" key),
which can mask malformed URLs; instead validate parsed.username before using it
and fail fast: if parsed.username is None or empty raise a clear exception
(e.g., ValueError or custom error) with a message indicating the URL must
include a username, and remove the "test" default so the "user" value is only
set from parsed.username; update any callers of this code/path to handle the
raised error as needed.
- Around line 332-335: The Trino lazy imports inside TrinoConnector.__init__ can
raise ImportError after importlib.import_module() succeeds, so update
create_connector() to catch ImportError (or ModuleNotFoundError) thrown during
instantiation of TrinoConnector and re-raise a new ImportError that includes the
existing install hint (e.g., "pip install wren[trino]"); specifically, wrap the
TrinoConnector(...) call in a try/except that captures
ImportError/ModuleNotFoundError, preserve the original exception message or
chain it, and raise a clear ImportError with the install guidance so users see
the correct action when the optional trino dependency is missing.
In `@core/wren/src/wren/model/data_source.py`:
- Around line 427-454: The get_connection flow is routing Trino ConnectionUrl
inputs through the generic connection_url branch and into ibis.connect (which is
now removed), causing runtime failures; update the code that inspects connection
info (the logic around _build_connection_info and get_connection) to detect
Trino earlier and dispatch to get_trino_connection when the dialect/platform is
"trino" or when the parsed info is TrinoConnectionInfo, or alternatively
validate and reject ConnectionUrl for Trino at parse time so ConnectionUrl never
reaches the generic branch; ensure references to get_trino_connection,
TrinoConnectionInfo, ConnectionUrl and _build_connection_info are used to locate
and change the branching order or add the validation.
---
Nitpick comments:
In `@core/wren/tests/connectors/test_trino.py`:
- Around line 213-216: Replace the fixed sleep with a short retry loop that
polls Trino readiness by executing a lightweight query (e.g., "SELECT 1")
against the same host and port used by _create_tpch_tables(host, port); attempt
the query every 0.5–1s, catch transient failures (including the "nodes is empty"
error), and stop retrying when the query succeeds or a short timeout (e.g., 30s)
elapses, then call _create_tpch_tables; update the test in test_trino.py to use
this polling logic so fast runners don't wait and slow runners retry until the
coordinator is ready.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 53bdfb77-5f5d-45dd-8692-b3ebc674179e
⛔ Files ignored due to path filters (1)
core/wren/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
core/wren/justfilecore/wren/pyproject.tomlcore/wren/src/wren/connector/factory.pycore/wren/src/wren/connector/ibis.pycore/wren/src/wren/connector/trino.pycore/wren/src/wren/model/data_source.pycore/wren/tests/conftest.pycore/wren/tests/connectors/test_trino.py
💤 Files with no reviewable changes (1)
- core/wren/src/wren/connector/ibis.py
- Reject Trino connection URLs without a username instead of silently falling back to "test"; raise a clear INVALID_CONNECTION_INFO error. - Wrap lazy ``import trino`` calls so users without the trino extra get a "pip install wren-engine[trino]" hint instead of a confusing ImportError. - Strip trailing whitespace and a single trailing semicolon before wrapping user SQL in ``SELECT * FROM (...) AS _sub LIMIT N`` for both ``query()`` and ``dry_run()``; previously a trailing ``;`` produced invalid SQL. - Route ``ConnectionUrl`` for trino through the native URL handler in ``DataSourceExtension.get_connection()`` so it no longer hits the removed generic ``ibis.connect()`` path. Adds regression unit tests covering each fix (semicolon stripping, missing-username URL, bad-scheme URL, and the ImportError path via monkeypatched ``__import__``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Re: |
|
Superseded by #2313 — these seven native-driver refactors were consolidated into a single feature branch to resolve shared-file conflicts (data_source.py, pyproject.toml, uv.lock, factory.py, etc.) once instead of seven times. |
Summary
The trino connector now uses the
trinopython client directly, parsing Trino type strings via sqlglot to build PyArrow schemas. The trino extra installstrino>=0.333,<1instead ofibis-framework[trino].Highlights
connector/trino.py: native cursor execution; type-string -> Arrow via sqlglot, includingrow(...)/array(...)/map(...)/decimal(p,s)/timestamp with time zone.model/data_source.py::get_trino_connection: returnstrino.dbapi.Connection; the native code path no longer routes through ibis.pyproject.toml: trino extra ->trino>=0.333,<1.Test plan
_build_trino_columnunit tests cover map dict -> pairs, decimal string coercion, and row tuple -> dict conversion.wren.connector.trinodoes not pullibisintosys.modules.WrenQueryTestSuite(TPCHorders/customer).just lintpasses.Run locally:
just install-extra trino --dev just test-trino # requires Docker for the live coordinatorSummary by CodeRabbit
New Features
Dependencies
Tests