
refactor(trino): use trino native driver, drop ibis dependency#2270

Closed
goldmedal wants to merge 2 commits into
mainfrom
refactor/trino-native-driver

@goldmedal
Collaborator

@goldmedal goldmedal commented May 14, 2026

Summary

The trino connector now uses the trino python client directly, parsing Trino type strings via sqlglot to build PyArrow schemas. The trino extra installs trino>=0.333,<1 instead of ibis-framework[trino].

Highlights

  • connector/trino.py: native cursor execution; type-string -> Arrow via sqlglot, including row(...) / array(...) / map(...) / decimal(p,s) / timestamp with time zone.
  • model/data_source.py::get_trino_connection: returns trino.dbapi.Connection; the native code path no longer routes through ibis.
  • pyproject.toml: trino extra -> trino>=0.333,<1.
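The type-string -> Arrow step is the core of the new connector. The PR does this with sqlglot. The minimal sketch below instead uses a small hand-written recursive-descent parser, so it runs with only the standard library, and it returns informal Arrow-style type names as strings rather than real pyarrow.DataType objects. Several details are assumptions, not the connector's actual behavior: the names TrinoType and trino_type_to_arrow, the f0, f1, ... names for anonymous row fields, the microsecond/UTC timestamp mapping, and the fallback to "string" for unparseable or interval types.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical AST node; the PR builds sqlglot DataType nodes instead.
@dataclass
class TrinoType:
    name: str                                                        # e.g. "array", "decimal"
    params: List[Union["TrinoType", int]] = field(default_factory=list)
    field_names: List[Optional[str]] = field(default_factory=list)  # row fields only

_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|[(),])")

def _tokenize(text: str) -> List[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize {text!r} at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class _Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.toks, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok.lower() != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def parse_type(self) -> TrinoType:
        name = self.take().lower()
        node = TrinoType(name)
        if self.peek() == "(":
            self.take("(")
            while True:
                if name == "row":
                    # Named field "a integer" vs anonymous field "integer".
                    nxt = self.toks[self.i + 1] if self.i + 1 < len(self.toks) else None
                    named = nxt is not None and nxt not in ("(", ",", ")") and nxt.lower() != "with"
                    node.field_names.append(self.take() if named else None)
                    node.params.append(self.parse_type())
                elif (self.peek() or "").isdigit():
                    node.params.append(int(self.take()))  # e.g. decimal(10,2), varchar(5)
                else:
                    node.params.append(self.parse_type())  # nested type, e.g. array(...)
                if self.peek() == ",":
                    self.take(",")
                    continue
                self.take(")")
                break
        if (self.peek() or "").lower() == "with":
            for word in ("with", "time", "zone"):
                self.take(word)
            node.name += " with time zone"
        return node

# Assumed scalar mapping; timestamps assumed to be microseconds, zoned ones UTC.
_SCALARS = {
    "boolean": "bool", "tinyint": "int8", "smallint": "int16", "integer": "int32",
    "bigint": "int64", "real": "float32", "double": "float64",
    "varchar": "string", "char": "string", "json": "string", "uuid": "string",
    "ipaddress": "string", "varbinary": "binary", "date": "date32",
    "time": "time64[us]", "timestamp": "timestamp[us]",
    "timestamp with time zone": "timestamp[us, tz=UTC]",
}

def _to_arrow_name(t: TrinoType) -> str:
    if t.name == "array":
        return f"list<{_to_arrow_name(t.params[0])}>"
    if t.name == "map":
        key, value = t.params
        return f"map<{_to_arrow_name(key)}, {_to_arrow_name(value)}>"
    if t.name == "row":
        parts = []
        for i, (fname, child) in enumerate(zip(t.field_names, t.params)):
            parts.append(f"{fname or 'f' + str(i)}: {_to_arrow_name(child)}")
        return "struct<" + ", ".join(parts) + ">"
    if t.name == "decimal":
        precision, scale = t.params
        return f"decimal128({precision}, {scale})"
    return _SCALARS.get(t.name, "string")

def trino_type_to_arrow(type_string: str) -> str:
    try:
        parser = _Parser(_tokenize(type_string))
        node = parser.parse_type()
        if parser.peek() is not None:          # e.g. "interval day to second"
            raise ValueError(f"trailing tokens in {type_string!r}")
        return _to_arrow_name(node)
    except (ValueError, IndexError):
        return "string"                         # hypothetical fallback
```

For example, trino_type_to_arrow("row(a integer, b array(double))") yields "struct<a: int32, b: list<float64>>". A real AST library like sqlglot avoids maintaining this grammar by hand, which is presumably why the PR uses it.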

Test plan

  • Native type-parser unit tests cover boolean, tinyint, smallint, integer, bigint, real, double, varchar, char, varbinary, json, uuid, ipaddress, date, time, timestamp, timestamp with time zone, array, map, named/anonymous row, nested containers, decimal(p,s), and the unparseable / interval fallbacks.
  • _build_trino_column unit tests cover map dict -> pairs, decimal string coercion, and row tuple -> dict conversion.
  • Import-check test asserts wren.connector.trino does not pull ibis into sys.modules.
  • Trino testcontainer integration test exercises every supported type end-to-end against a live coordinator, plus the shared WrenQueryTestSuite (TPCH orders / customer).
  • just lint passes.
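The three conversions exercised by the _build_trino_column tests can be illustrated with a hypothetical helper. This sketch assumes, as the test names suggest, that the client hands back maps as dicts, decimals as strings, and rows as tuples. normalize_column and its kind argument are invented here and are not the PR's function. Maps become (key, value) pairs because PyArrow map arrays can be built from lists of such tuples, and rows become dicts because PyArrow struct arrays can be built from dicts keyed by field name.

```python
from decimal import Decimal
from typing import Any, List, Optional, Sequence

def normalize_column(
    kind: str, values: Sequence[Any], field_names: Optional[List[str]] = None
) -> List[Any]:
    """Hypothetical helper: reshape client values into forms PyArrow accepts."""
    out: List[Any] = []
    for value in values:
        if value is None:
            out.append(None)                            # keep SQL NULL as None
        elif kind == "map":
            out.append(list(value.items()))             # {"k": 1} -> [("k", 1)]
        elif kind == "decimal":
            out.append(Decimal(value))                  # "12.34" -> Decimal("12.34")
        elif kind == "row":
            if field_names is None:
                raise ValueError("row columns need field names")
            out.append(dict(zip(field_names, value)))   # (1, "x") -> {"id": 1, "name": "x"}
        else:
            out.append(value)                           # scalars pass through unchanged
    return out
```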

Run locally:

just install-extra trino --dev
just test-trino     # requires Docker for the live coordinator

Summary by CodeRabbit

  • New Features

    • Implemented native Trino connector with direct database access, improving type conversion and error handling.
    • Added support for authentication methods and timezone normalization.
  • Dependencies

    • Replaced the ibis-framework[trino] extra with a direct trino dependency constrained to >=0.333,<1.
  • Tests

    • Added comprehensive unit and integration test coverage for Trino connector functionality.


The trino connector now uses the `trino` python client directly,
parsing Trino type strings via sqlglot to build PyArrow schemas.
The trino extra installs `trino>=0.333,<1` instead of
`ibis-framework[trino]`.

Highlights
- `connector/trino.py`: native cursor execution; type-string -> Arrow
  via sqlglot, including row(...) / array(...) / map(...) /
  decimal(p,s) / timestamp with time zone.
- `model/data_source.py::get_trino_connection`: returns
  `trino.dbapi.Connection`; native code path no longer routes through
  ibis.
- `pyproject.toml`: trino extra -> `trino>=0.333,<1`.

Tests
- `tests/connectors/test_trino.py` covers ~36 Trino type categories
  including nested row/array/map plus testcontainer-backed query
  suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 14, 2026

Warning

Rate limit exceeded

@goldmedal has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 30 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c69745a7-9016-43cc-9471-498a6135dd94

📥 Commits

Reviewing files that changed from the base of the PR and between 7b3f67b and aa2c1b6.

📒 Files selected for processing (3)
  • core/wren/src/wren/connector/trino.py
  • core/wren/src/wren/model/data_source.py
  • core/wren/tests/connectors/test_trino.py

Walkthrough

This PR replaces the Trino connector's dependency on ibis-framework[trino] with a native Trino Python DB-API implementation. The new connector directly executes SQL via the Trino client, parses cursor type descriptions using sqlglot, maps Trino types to PyArrow schemas, and normalizes rows to Arrow arrays with full support for scalars, temporals, maps, and structs.

Changes

Trino Native DB-API Connector Migration

Layer (file(s)): summary
  • Native Trino Connector Implementation (core/wren/src/wren/connector/trino.py): New module implementing _parse_trino_data_type() and _trino_data_type_to_arrow() for type parsing, _build_trino_column() and _build_trino_arrow_table() for row normalization, the connection helpers _build_trino_connect_kwargs() and _parse_trino_url() for URL/parameter construction, and a TrinoConnector class with query/dry_run/close methods plus a create_connector() factory.
  • Connection Setup Integration in DataSource (core/wren/src/wren/model/data_source.py): DataSourceExtension.get_trino_connection() switched from ibis.trino.connect() to trino.dbapi.connect(), now explicitly managing BasicAuthentication, forcing timezone="UTC", and setting http_scheme="https".
  • Connector Registry and Dependency Wiring (core/wren/src/wren/connector/ibis.py, core/wren/src/wren/connector/factory.py, core/wren/pyproject.toml): Removed the ibis TrinoConnector class and its mapping from _DATA_SOURCE_TO_CLASS, updated the factory to route DataSource.trino to wren.connector.trino, and replaced the ibis-framework[trino] optional dependency with trino>=0.333,<1.
  • Unit and Integration Tests (core/wren/tests/connectors/test_trino.py): Parameterized unit tests for type parsing, decimal/struct/map handling, and ibis import isolation; integration tests with TrinoContainer that validate scalar, temporal, array, map, and row type mappings via live Trino queries against TPCH tables.
  • Build Configuration (core/wren/justfile, core/wren/tests/conftest.py): Added a test-trino Just recipe and a trino pytest marker indicating the Docker requirement.
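An import-isolation check like the one listed above is most reliable in a fresh interpreter, because other tests in the same process may already have imported ibis. The following is a minimal sketch that assumes nothing about how the PR's own test is written; imports_pull_in is a hypothetical helper.

```python
import subprocess
import sys

def imports_pull_in(module: str, forbidden: str) -> bool:
    """Return True if importing `module` in a fresh interpreter loads `forbidden`."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"print(any(m == {forbidden!r} or m.startswith({forbidden!r} + '.') for m in sys.modules))"
    )
    # A subprocess starts with a clean sys.modules, unlike the pytest process.
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"
```

In the wren suite the call would look like assert not imports_pull_in("wren.connector.trino", "ibis").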

Sequence Diagram(s)

sequenceDiagram
  participant TrinoConnector
  participant trino.dbapi
  participant cursor
  participant sqlglot
  participant PyArrow
  TrinoConnector->>trino.dbapi: connect(host, port, auth, ...)
  TrinoConnector->>cursor: execute(sql)
  cursor-->>TrinoConnector: cursor.description (type strings)
  TrinoConnector->>sqlglot: parse Trino type strings
  sqlglot-->>TrinoConnector: Trino AST nodes
  TrinoConnector->>TrinoConnector: _trino_data_type_to_arrow()
  TrinoConnector-->>PyArrow: PyArrow DataType schema
  cursor-->>TrinoConnector: rows with Python values
  TrinoConnector->>TrinoConnector: _build_trino_column (normalize)
  TrinoConnector-->>PyArrow: PyArrow Table
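The first half of the diagram turns the cursor's row-major results into per-column values paired with their Trino type strings. It might look like the sketch below. The sketch assumes, per PEP 249, that each cursor.description entry has the column name at index 0 and the type code at index 1, and that the trino client reports the Trino type string as that type code. rows_to_columns is hypothetical. It returns a list of tuples rather than a dict so that duplicate column names, which are legal in SQL results, are not silently merged.

```python
from typing import Any, List, Sequence, Tuple

Column = Tuple[str, str, List[Any]]  # (name, Trino type string, values)

def rows_to_columns(
    description: Sequence[Sequence[Any]], rows: Sequence[Sequence[Any]]
) -> List[Column]:
    """Transpose DB-API rows into per-column value lists tagged with type strings."""
    columns: List[Column] = []
    for idx, entry in enumerate(description):
        name, type_code = entry[0], entry[1]  # PEP 249: (name, type_code, ...)
        columns.append((name, str(type_code), [row[idx] for row in rows]))
    return columns
```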

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • onlyjackfrost
  • andreashimin
  • paopa

Poem

🐰 A rabbit hops through Trino's new DB-API path,
No ibis in sight, just arrows and math!
Types parse swift with sqlglot's keen sight,
Rows convert cleanly to schemas so bright—
Native and nimble, the connector takes flight! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 22.58%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title clearly summarizes the main change: refactoring the Trino connector to use the native driver instead of ibis, which is the primary objective across all modified files.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


@github-actions github-actions Bot added the dependencies, python, and core labels May 14, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
core/wren/tests/connectors/test_trino.py (1)

213-216: ⚡ Quick win

Poll for readiness instead of sleeping a fixed 5 seconds.

This is still flaky on slow runners and always adds delay on fast ones. A short retry loop against a lightweight Trino query would make the integration suite much more stable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/connectors/test_trino.py` around lines 213 - 216, Replace the
fixed sleep with a short retry loop that polls Trino readiness by executing a
lightweight query (e.g., "SELECT 1") against the same host and port used by
_create_tpch_tables(host, port); attempt the query every 0.5–1s, catch transient
failures (including the "nodes is empty" error), and stop retrying when the
query succeeds or a short timeout (e.g., 30s) elapses, then call
_create_tpch_tables; update the test in test_trino.py to use this polling logic
so fast runners don't wait and slow runners retry until the coordinator is
ready.
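The polling approach this nitpick suggests, which the author later declines in the thread, can be sketched as a small retry helper. wait_until_ready is hypothetical. In the integration test, the probe would be a callable that connects to the container's host and port and runs SELECT 1. That probe is not shown here, which keeps the sketch dependency-free.

```python
import time
from typing import Callable, Tuple, Type

def wait_until_ready(
    probe: Callable[[], object],
    timeout: float = 30.0,
    interval: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> int:
    """Call `probe` until it succeeds and return the number of attempts used.

    Re-raises the last error once another retry would overrun `timeout`.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()  # e.g. a hypothetical lambda that runs "SELECT 1" against Trino
            return attempts
        except retry_on:
            if time.monotonic() + interval > deadline:
                raise
            time.sleep(interval)
```

Fast runners return on the first successful probe, while slow runners keep retrying transient errors such as "nodes is empty" until the deadline.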
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/connector/trino.py`:
- Around line 355-356: Trailing semicolons on the SQL cause wrapping to produce
invalid SQL; before you build the wrapper f-string that uses the local variable
sql (the line that does sql = f"SELECT * FROM ({sql}) AS _sub LIMIT {limit}"),
strip any trailing semicolons and surrounding whitespace from sql (e.g., remove
any trailing ';' characters and trailing whitespace) so the subquery inside
parentheses is clean; apply the same change to the other occurrence around line
376 that wraps sql in a subquery.
- Around line 314-321: The current code silently defaults parsed.username to
"test" when building the connection dict (the "out" dict and its "user" key),
which can mask malformed URLs; instead validate parsed.username before using it
and fail fast: if parsed.username is None or empty raise a clear exception
(e.g., ValueError or custom error) with a message indicating the URL must
include a username, and remove the "test" default so the "user" value is only
set from parsed.username; update any callers of this code/path to handle the
raised error as needed.
- Around line 332-335: The Trino lazy imports inside TrinoConnector.__init__ can
raise ImportError after importlib.import_module() succeeds, so update
create_connector() to catch ImportError (or ModuleNotFoundError) thrown during
instantiation of TrinoConnector and re-raise a new ImportError that includes the
existing install hint (e.g., "pip install wren[trino]"); specifically, wrap the
TrinoConnector(...) call in a try/except that captures
ImportError/ModuleNotFoundError, preserve the original exception message or
chain it, and raise a clear ImportError with the install guidance so users see
the correct action when the optional trino dependency is missing.
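The semicolon fix requested in the first comment above takes only a few lines. The follow-up commit describes it as stripping trailing whitespace and a single trailing semicolon. Below is a sketch with a hypothetical wrap_with_limit helper; the wrapper format matches the one quoted in the comment.

```python
def wrap_with_limit(sql: str, limit: int) -> str:
    """Wrap user SQL in a LIMIT subquery after removing one trailing semicolon."""
    cleaned = sql.rstrip()
    if cleaned.endswith(";"):
        cleaned = cleaned[:-1].rstrip()  # "SELECT 1 ;" -> "SELECT 1"
    # int() keeps a non-integer limit from being spliced into the SQL text.
    return f"SELECT * FROM ({cleaned}) AS _sub LIMIT {int(limit)}"
```

This sketch does not handle a trailing -- line comment, which would still swallow the closing parenthesis of the wrapper.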

In `@core/wren/src/wren/model/data_source.py`:
- Around line 427-454: The get_connection flow is routing Trino ConnectionUrl
inputs through the generic connection_url branch and into ibis.connect (which is
now removed), causing runtime failures; update the code that inspects connection
info (the logic around _build_connection_info and get_connection) to detect
Trino earlier and dispatch to get_trino_connection when the dialect/platform is
"trino" or when the parsed info is TrinoConnectionInfo, or alternatively
validate and reject ConnectionUrl for Trino at parse time so ConnectionUrl never
reaches the generic branch; ensure references to get_trino_connection,
TrinoConnectionInfo, ConnectionUrl and _build_connection_info are used to locate
and change the branching order or add the validation.
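A Trino ConnectionUrl can be handled natively by parsing the trino:// URL and failing fast on a missing username, as the comments above request. This can be sketched with urllib.parse. Several details are assumptions:

  • the URL shape trino://user[:password]@host[:port]/catalog[/schema]
  • the default port 8080
  • the InvalidConnectionInfo class, standing in for the project's INVALID_CONNECTION_INFO error
  • returning a plain kwargs dict (in the real connector a password would presumably be wrapped in trino's BasicAuthentication)

timezone="UTC" mirrors the walkthrough's description of get_trino_connection.

```python
from typing import Any, Dict
from urllib.parse import unquote, urlparse

class InvalidConnectionInfo(ValueError):
    """Hypothetical stand-in for the INVALID_CONNECTION_INFO error."""

def parse_trino_url(url: str) -> Dict[str, Any]:
    parsed = urlparse(url)
    if parsed.scheme != "trino":
        raise InvalidConnectionInfo(f"expected a trino:// URL, got scheme {parsed.scheme!r}")
    if not parsed.username:
        # Fail fast instead of silently defaulting the user.
        raise InvalidConnectionInfo("Trino connection URL must include a username")
    parts = [unquote(p) for p in parsed.path.split("/") if p]
    kwargs: Dict[str, Any] = {
        "host": parsed.hostname,
        "port": parsed.port or 8080,  # assumed default port
        "user": unquote(parsed.username),
        "catalog": parts[0] if parts else None,
        "schema": parts[1] if len(parts) > 1 else None,
        "timezone": "UTC",
    }
    if parsed.password is not None:
        kwargs["password"] = unquote(parsed.password)
    return kwargs
```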


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 53bdfb77-5f5d-45dd-8692-b3ebc674179e

📥 Commits

Reviewing files that changed from the base of the PR and between 544ecab and 7b3f67b.

⛔ Files ignored due to path filters (1)
  • core/wren/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • core/wren/justfile
  • core/wren/pyproject.toml
  • core/wren/src/wren/connector/factory.py
  • core/wren/src/wren/connector/ibis.py
  • core/wren/src/wren/connector/trino.py
  • core/wren/src/wren/model/data_source.py
  • core/wren/tests/conftest.py
  • core/wren/tests/connectors/test_trino.py
💤 Files with no reviewable changes (1)
  • core/wren/src/wren/connector/ibis.py

Comment thread core/wren/src/wren/connector/trino.py
Comment thread core/wren/src/wren/connector/trino.py
Comment thread core/wren/src/wren/connector/trino.py Outdated
Comment thread core/wren/src/wren/model/data_source.py
- Reject Trino connection URLs without a username instead of silently
  falling back to "test"; raise a clear INVALID_CONNECTION_INFO error.
- Wrap lazy ``import trino`` calls so users without the trino extra
  get a "pip install wren-engine[trino]" hint instead of a confusing
  ImportError.
- Strip trailing whitespace and a single trailing semicolon before
  wrapping user SQL in ``SELECT * FROM (...) AS _sub LIMIT N`` for
  both ``query()`` and ``dry_run()``; previously a trailing ``;``
  produced invalid SQL.
- Route ``ConnectionUrl`` for trino through the native URL handler in
  ``DataSourceExtension.get_connection()`` so it no longer hits the
  removed generic ``ibis.connect()`` path.

Adds regression unit tests covering each fix (semicolon stripping,
missing-username URL, bad-scheme URL, and the ImportError path via
monkeypatched ``__import__``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
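The commit's lazy-import hint and its regression test via a monkeypatched __import__ can be sketched as follows. The function name require_trino and the message wording are hypothetical. The install command is the one quoted in the commit message. A plain import statement is used rather than importlib.import_module, because only the statement goes through builtins.__import__, which is what the test patches. The test is written for pytest's built-in monkeypatch fixture.

```python
import builtins
from types import ModuleType

INSTALL_HINT = 'pip install "wren-engine[trino]"'

def require_trino() -> ModuleType:
    """Lazily import the trino DB-API module, with an install hint on failure."""
    try:
        import trino.dbapi  # deferred so the extra is only needed for Trino
    except ImportError as exc:
        raise ImportError(
            f"The Trino connector needs the optional 'trino' extra; install it with: {INSTALL_HINT}"
        ) from exc
    return trino.dbapi

def test_missing_trino_gives_install_hint(monkeypatch):
    real_import = builtins.__import__

    def fake_import(name, *args, **kwargs):
        # Simulate an environment without the trino package.
        if name == "trino" or name.startswith("trino."):
            raise ModuleNotFoundError(f"No module named {name!r}")
        return real_import(name, *args, **kwargs)

    monkeypatch.setattr(builtins, "__import__", fake_import)
    try:
        require_trino()
    except ImportError as exc:
        assert "wren-engine[trino]" in str(exc)
    else:
        raise AssertionError("expected ImportError")
```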
@goldmedal
Collaborator Author

Re: test_trino.py:213-216 fixed-5s sleep nitpick — skipping for now. This matches existing testcontainer setup patterns elsewhere in the suite; test-infra reliability is being tracked separately and we'd prefer to roll the polling-loop change across all connectors together.

@goldmedal
Collaborator Author

Superseded by #2313 — these seven native-driver refactors were consolidated into a single feature branch to resolve shared-file conflicts (data_source.py, pyproject.toml, uv.lock, factory.py, etc.) once instead of seven times.

@goldmedal goldmedal closed this May 21, 2026

Labels

core, dependencies, python

1 participant