refactor(canner): use psycopg native driver, drop ibis dependency by goldmedal · Pull Request #2269 · Canner/WrenAI

goldmedal · 2026-05-14T03:19:05Z

Summary

Canner Enterprise speaks the Postgres wire protocol; the connector now uses
psycopg directly instead of the ibis postgres backend.

Changes

connector/canner.py: native psycopg cursor with a self-contained PG OID -> Arrow type map covering the canner-flavoured types (VARCHAR/CHAR -> string, DECIMAL -> decimal128, BIGINT/INT/SMALLINT -> int, BOOLEAN -> bool, DATE/TIMESTAMP/TIMESTAMPTZ -> date/timestamp, ROW/ARRAY/MAP serialised as JSON strings). Errors are wrapped as WrenError with the dialect SQL attached, mirroring the existing postgres connector contract.
model/data_source.py::get_canner_connection: returns a psycopg.Connection (autocommit) instead of an ibis backend.
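
As a rough illustration of the OID-driven lookup described above, the sketch below maps a handful of well-known Postgres type OIDs to Arrow type names. `PG_OID_TO_ARROW` and `arrow_type_name` are hypothetical names, not the PR's code. The Arrow types are written as plain strings rather than pyarrow objects so the snippet runs on the standard library alone. The integer widths, the timestamp unit, and the string fallback for unknown OIDs are assumptions.

```python
# Well-known Postgres type OIDs (as listed in pg_type) mapped to Arrow type names.
# The integer widths and timestamp units below are assumptions for illustration.
PG_OID_TO_ARROW = {
    16: "bool",                     # boolean
    21: "int16",                    # smallint
    23: "int32",                    # integer
    20: "int64",                    # bigint
    25: "string",                   # text
    1042: "string",                 # char(n) / bpchar
    1043: "string",                 # varchar
    1082: "date32",                 # date
    1114: "timestamp[us]",          # timestamp without time zone
    1184: "timestamp[us, tz=UTC]",  # timestamp with time zone
    114: "string",                  # json, carried as a JSON string
    3802: "string",                 # jsonb, carried as a JSON string
}
# NUMERIC (OID 1700) is left out on purpose: its Arrow type depends on the
# column's precision and scale, so it cannot be a fixed table entry.


def arrow_type_name(pg_type_oid: int) -> str:
    """Return the Arrow type name for an OID, falling back to string (assumed)."""
    return PG_OID_TO_ARROW.get(pg_type_oid, "string")
```

Keeping the table self-contained means the connector does not need ibis to interpret `cursor.description`; each column's `type_code` can be looked up directly.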

Tests

tests/connectors/test_canner.py exercises the type-mapping helpers and runs the connector against a PostgresContainer with the common canner result types (incl. JSON/JSONB and arrays).
New canner marker registered in tests/conftest.py; just test-canner target added.

Test plan

just test-canner (7 passed)
just lint
Verify against a live canner endpoint in a downstream environment

Wren Engine PR template acceptance criteria:

No ibis import path remains through wren.connector.canner.
Tests green.

Summary by CodeRabbit

New Features
- Connector now runs queries directly against Postgres with improved type mapping and more accurate handling of numerics, JSON, arrays, and limits; dry-run and connection-close behavior improved.
Tests
- Added comprehensive unit and integration tests covering types, query execution, limits, dry-run, error handling, and regressions; pytest marker for connector tests added.
Chores
- New test-canner recipe to run the connector test suite.

coderabbitai · 2026-05-14T03:19:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c2f35e11-3805-4f24-a496-d8347a70a0e5

📥 Commits

Reviewing files that changed from the base of the PR and between bb18b0f and 43ebe33.

📒 Files selected for processing (2)

core/wren/src/wren/connector/canner.py
core/wren/tests/connectors/test_canner.py

🚧 Files skipped from review as they are similar to previous changes (2)

core/wren/src/wren/connector/canner.py
core/wren/tests/connectors/test_canner.py

Walkthrough

Replaces ibis-backed Canner connector with direct psycopg execution, adds Postgres OID→PyArrow type mapping and Arrow table construction, updates connection creation to use psycopg with autocommit, registers a canner pytest marker, and adds unit and testcontainer-based integration tests plus a justfile recipe.

Changes

Canner psycopg migration

Layer / File(s)	Summary
Type mapping and conversion utilities `core/wren/src/wren/connector/canner.py`	Postgres OID→PyArrow mapping, decimal precision/scale selection, decimal quantization/coercion, and Arrow array building with JSON/binary/list handling; table construction preserving duplicate names and typed empty columns.
Connector query execution `core/wren/src/wren/connector/canner.py`	CannerConnector executes SQL via psycopg cursor (optional LIMIT wrapper), constructs pyarrow.Table from cursor results, maps SQL errors to WrenError with SQL metadata, implements dry_run as LIMIT 0, and adds guarded close.
Connection factory `core/wren/src/wren/model/data_source.py`	DataSourceExtension.get_canner_connection now returns a psycopg.connect(...) with `dbname=info.workspace`, user/password from connection info, host/port, and `autocommit=True`.
Test setup and unit tests `core/wren/tests/conftest.py`, `core/wren/tests/connectors/test_canner.py`	Registered `canner` pytest marker; unit tests validate _arrow_type mappings, _build_column serialization/quantization, semicolon stripping; dry_run unit test; fixture DDL and rows cover scalar, numeric, temporal, array, and JSON types.
Integration/e2e tests `core/wren/tests/connectors/test_canner.py`	Testcontainers Postgres fixture and end-to-end tests asserting query() returns expected Arrow schema/values, limit enforcement, duplicate column-name preservation, dry_run success/failure, numeric/string round-trip regressions, and semicolon-wrapping behavior.
Test automation recipe `core/wren/justfile`	Added `test-canner` recipe to run the connector tests with pytest marker `canner`.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant CannerConnector
  participant psycopg
  participant PyArrow

  Caller->>CannerConnector: query(sql, limit)
  CannerConnector->>CannerConnector: wrap SQL with LIMIT if provided
  CannerConnector->>psycopg: cursor.execute(wrapped_sql)
  psycopg-->>CannerConnector: cursor.description + rows
  CannerConnector->>PyArrow: _build_arrow_table(cursor)
  PyArrow-->>CannerConnector: pyarrow.Table
  CannerConnector-->>Caller: return Table or raise WrenError (with metadata[DIALECT_SQL])
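
The flow in the diagram can be sketched without a Postgres server. The example below uses the standard-library `sqlite3` module as a stand-in for psycopg, which is a deliberate substitution and not what the PR does. `WrenError` and `DIALECT_SQL` here are simplified local stand-ins for the names in `wren.model.error`, and the metadata key's value is made up. Attaching the caller's original SQL (rather than the wrapped SQL) is an assumption based on the test proposed later in this thread.

```python
import sqlite3
from typing import Optional

DIALECT_SQL = "dialectSql"  # hypothetical key; the real constant lives in wren.model.error


class WrenError(Exception):
    """Simplified stand-in for wren.model.error.WrenError."""

    def __init__(self, message: str, metadata: Optional[dict] = None) -> None:
        super().__init__(message)
        self.metadata = metadata or {}


def query(conn: sqlite3.Connection, sql: str, limit: Optional[int] = None) -> list:
    """Run sql, optionally wrapped in a LIMIT subquery, and wrap driver errors."""
    wrapped = sql if limit is None else f"SELECT * FROM ({sql}) AS _t LIMIT {int(limit)}"
    try:
        return conn.execute(wrapped).fetchall()
    except sqlite3.Error as exc:
        # Attach the caller's SQL so the failure can be traced back to the query.
        raise WrenError(str(exc), metadata={DIALECT_SQL: sql}) from exc


def dry_run(conn: sqlite3.Connection, sql: str) -> None:
    """Validate sql by running it with LIMIT 0; returns None on success."""
    query(conn, sql, limit=0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
rows = query(conn, "SELECT id FROM t", limit=2)
```

In the real connector the rows would be converted into a `pyarrow.Table` instead of being returned as a list of tuples.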

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped from ibis to psycopg land,
Rows became Arrows placed by hand,
Decimals trimmed, JSON twined as strings,
Containers hummed and the test bell rings,
A closed connection, and a recipe to stand.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: refactoring the Canner connector to use psycopg native driver instead of ibis, which is the primary objective across all modified files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

core/wren/tests/connectors/test_canner.py (1)

208-212: ⚡ Quick win

Assert the SQL payload in WrenError for invalid dry-run

Line 211-212 verifies only the exception type. Please also assert the failing SQL is attached to the raised WrenError so the connector error-contract is locked by tests.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/connectors/test_canner.py` around lines 208 - 212, The test
only checks the exception type; update
test_canner_connector_dry_run_raises_for_invalid_sql to capture the raised
WrenError via pytest.raises as excinfo when calling
canner_connector.dry_run("SELECT * FROM no_such_table") and assert the failing
SQL is attached to the error—e.g., assert getattr(excinfo.value, "sql", None) ==
"SELECT * FROM no_such_table" and/or assert "SELECT * FROM no_such_table" in
str(excinfo.value) so the connector's error contract (WrenError carrying the SQL
payload) is enforced.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/connector/canner.py`:
- Line 210: The current return uses dict(zip([field.name for field in schema],
arrays)) which will drop duplicate column names; instead pass the arrays and
schema directly to PyArrow so duplicate field names are preserved—replace the
dict-based construction with a call like pa.table(arrays, schema=schema) or
pa.Table.from_arrays(arrays, schema=schema), keeping the existing schema and
arrays variables and the pa.table / pa.Table API to avoid name-based
overwriting.
- Around line 124-133: The code currently maps SQL NULLs for JSON/JSONB
(pg_type_oid in {114, 3802}) to the string "null" inside the arrow_type ==
pa.string() handling, which changes NULL semantics; instead, stop coercing None
to "null" — treat None like any other NULL by appending None to processed.
Locate the block handling arrow_type == pa.string(), remove the special-case
that appends "null" for pg_type_oid 114/3802, and ensure the loop over values
appends None for SQL NULLs (value is None) so downstream logic receives actual
nulls rather than the literal string; keep existing behavior for non-None values
unchanged.
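
The NULL-semantics point in the second comment can be shown with the standard library alone. This is a minimal sketch, assuming the column values have already been decoded into Python objects; `serialize_json_column` is a hypothetical helper name, not a function from the PR.

```python
import json


def serialize_json_column(values):
    """Serialise JSON/JSONB values to strings while keeping SQL NULL as None."""
    return [None if value is None else json.dumps(value) for value in values]


# The middle row stands for a SQL NULL in a json/jsonb column.
column = serialize_json_column([{"a": 1}, None, [1, 2]])

# Coercing None to the literal "null" would make the row look non-null downstream.
coerced = ["null" if value is None else value for value in column]
```

With `column`, a downstream `is None` check still identifies the missing row; with `coerced`, the same check silently fails because the row now holds a four-character string.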



ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b5353458-9c2d-4703-843a-8f6eb0320e40

📥 Commits

Reviewing files that changed from the base of the PR and between 544ecab and a372ba4.

📒 Files selected for processing (5)

core/wren/justfile
core/wren/src/wren/connector/canner.py
core/wren/src/wren/model/data_source.py
core/wren/tests/conftest.py
core/wren/tests/connectors/test_canner.py

…bles

SQL NULL in json/jsonb columns was being coerced into the string "null", breaking NULL semantics downstream. Drop the oid-114/3802 special case so None passes through unchanged.

Arrow tables were built via dict(zip(names, arrays)), which silently drops duplicate column names (e.g. self-joins projecting two `id` columns). Switch to pa.Table.from_arrays(..., schema=schema) so positional construction keeps duplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
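
The duplicate-name problem described in this commit message is a property of Python dictionaries and can be reproduced without pyarrow. The column names and values below are made up for illustration.

```python
# Two projected columns share the name "id", as in a self-join.
names = ["id", "id", "name"]
arrays = [[1, 2], [10, 20], ["a", "b"]]

# Name-keyed construction: the second "id" overwrites the first.
by_name = dict(zip(names, arrays))

# Positional construction: every column survives alongside its name.
by_position = list(zip(names, arrays))
```

`by_name` ends up with two entries while `by_position` keeps all three, which is why the commit switches to a positional constructor (`pa.Table.from_arrays` with an explicit schema).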

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/connector/canner.py`:
- Around line 91-100: The current _decimal_type function defaults unconstrained
NUMERIC columns to scale=9 which causes silent rounding later where values are
quantized; change the logic so that when column.scale is None you do NOT assign
a fixed scale (remove scale=9 default) and instead either (a) infer
precision/scale from observed row values before constructing pa.decimal128, or
(b) preserve values as strings or decimal.Decimal until an explicit
precision/scale is decided; update any downstream quantization code that
currently forces values to the _decimal_type scale (the quantization block
referenced around lines 114–116 and the type-resolution paths handling NUMERIC
and NUMERIC[] arrays) to respect the "unspecified scale" case and avoid rounding
unless an explicit target scale is chosen. Ensure references to _decimal_type,
the NUMERIC handling branch, and the array NUMERIC[] branch are updated so
unconstrained NUMERIC stays lossless until an explicit precision/scale is
determined.
- Line 221: The SQL string is embedded into subqueries at the points where sql
is wrapped (e.g., in the dry_run() path and the limited query wrapper that
builds "SELECT * FROM ({sql}) AS _t LIMIT {limit}"), and trailing semicolons
cause PostgreSQL to reject the subselect; before wrapping the user-supplied sql
variable, strip any trailing semicolons and whitespace (e.g., sql =
sql.rstrip().rstrip(';')) so the constructed subquery is valid—apply this
normalization wherever the code builds subqueries from the sql variable
(referencing the sql variable and the dry_run/limited-query construction sites).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2b93b718-223d-4ae1-884a-9809f3cb7bff

📥 Commits

Reviewing files that changed from the base of the PR and between a372ba4 and 793fdda.

📒 Files selected for processing (2)

core/wren/src/wren/connector/canner.py
core/wren/tests/connectors/test_canner.py

🚧 Files skipped from review as they are similar to previous changes (1)

core/wren/tests/connectors/test_canner.py

Round-2 review found _decimal_type defaulted scale=9 when the column typmod was missing, so Decimal.quantize silently rounded high-precision values (e.g. 18-significant-figure NUMERIC). Fall back to pa.string() for unconstrained NUMERIC and NUMERIC[] columns so the exact textual value round-trips. Same approach Trino's connector takes for dynamic-decimal casts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
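
The rounding described above can be reproduced with the standard-library `decimal` module. This is only a sketch of the failure mode and of the string fallback; the sample value is invented and the PR's actual `_decimal_type` code is not shown here.

```python
from decimal import Decimal

value = Decimal("1.23456789012345678")  # 18 significant figures

# Quantizing to a defaulted scale of 9 rounds away the trailing digits.
rounded = value.quantize(Decimal("1e-9"))

# Carrying the value as text keeps every digit, at the cost of a string column.
as_text = str(value)
```

The trade-off is that consumers receive a string column for unconstrained NUMERIC and have to parse it themselves if they need arithmetic.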

Round-2 review found that query() and dry_run() wrap user SQL as "SELECT * FROM ({sql}) AS _t LIMIT N", which Postgres/Canner reject when the inner SQL ends in a semicolon. Add a _strip_trailing_semicolon helper that only strips the terminating run of semicolons and whitespace (so semicolons inside string literals are preserved) and apply it on both call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
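
One way such a helper could look is sketched below. The function names are hypothetical and the body is an assumption, since the PR's `_strip_trailing_semicolon` implementation is not quoted in this thread. It relies on `str.rstrip` with a character set, which stops at the first trailing character that is neither a semicolon nor whitespace.

```python
def strip_trailing_semicolon(sql: str) -> str:
    """Remove only the terminating run of semicolons and whitespace."""
    return sql.rstrip("; \t\r\n")


def wrap_with_limit(sql: str, limit: int) -> str:
    """Embed the normalised SQL as a subquery so a LIMIT can be applied."""
    return f"SELECT * FROM ({strip_trailing_semicolon(sql)}) AS _t LIMIT {int(limit)}"


wrapped = wrap_with_limit("SELECT 1;", 5)
literal_kept = strip_trailing_semicolon("SELECT ';' AS s ; ;\n")
```

Because stripping only works from the right-hand end, a semicolon inside a string literal is left alone as long as the literal's closing quote follows it. A trailing SQL comment containing a semicolon is not handled by this sketch.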

coderabbitai

🧹 Nitpick comments (1)

core/wren/tests/connectors/test_canner.py (1)

273-277: ⚡ Quick win

Assert the WrenError metadata contract in this failure-path test.

Line 276 currently verifies only exception type. The connector contract also requires dialect SQL to be attached; assert it here so regressions are caught.

Proposed test tightening

 def test_canner_connector_dry_run_raises_for_invalid_sql(canner_connector) -> None:
-    from wren.model.error import WrenError  # noqa: PLC0415
+    from wren.model.error import DIALECT_SQL, WrenError  # noqa: PLC0415

-    with pytest.raises(WrenError):
-        canner_connector.dry_run("SELECT * FROM no_such_table")
+    sql = "SELECT * FROM no_such_table"
+    with pytest.raises(WrenError) as exc:
+        canner_connector.dry_run(sql)
+    assert exc.value.metadata[DIALECT_SQL] == sql

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/connectors/test_canner.py` around lines 273 - 277, Update the
test_canner_connector_dry_run_raises_for_invalid_sql to capture the raised
WrenError (use pytest.raises as excinfo) and assert the connector metadata
contract: verify excinfo.value.metadata contains the "dialect" key (e.g., equals
"sql") and that the "sql" (or "query") metadata includes the attempted SQL
string (e.g., contains "SELECT * FROM no_such_table"); locate this change around
the test_canner_connector_dry_run_raises_for_invalid_sql function and the
canner_connector.dry_run call.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0860f7a5-42d4-46b2-a4c2-59ae128f2e10

📥 Commits

Reviewing files that changed from the base of the PR and between 793fdda and bb18b0f.

📒 Files selected for processing (2)

core/wren/src/wren/connector/canner.py
core/wren/tests/connectors/test_canner.py

🚧 Files skipped from review as they are similar to previous changes (1)

core/wren/src/wren/connector/canner.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

goldmedal · 2026-05-14T04:49:56Z

Re: CodeRabbit nitpicks on test_canner.py:208-212 and test_canner.py:273-277 (tighten WrenError contract on dry-run failure paths) — skipping. The existing pytest.raises(WrenError) checks are sufficient for the error-type contract; the DIALECT_SQL metadata is already covered by other tests in the file, and we keep the assertion style consistent across the suite.

goldmedal · 2026-05-21T05:29:24Z

Superseded by #2313 — these seven native-driver refactors were consolidated into a single feature branch to resolve shared-file conflicts (data_source.py, pyproject.toml, uv.lock, factory.py, etc.) once instead of seven times.

github-actions Bot added python Pull requests that update Python code core labels May 14, 2026

coderabbitai Bot reviewed May 14, 2026

coderabbitai Bot reviewed May 14, 2026

goldmedal and others added 2 commits May 14, 2026 11:58

coderabbitai Bot reviewed May 14, 2026

goldmedal mentioned this pull request May 14, 2026

refactor(postgres): use psycopg native driver, drop ibis dependency #2272

Closed

4 tasks

fix(canner): dry_run must return None per ConnectorABC contract

43ebe33

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

goldmedal requested a review from douenergy May 14, 2026 05:49

goldmedal mentioned this pull request May 21, 2026

refactor(wren): drop ibis for canner/postgres/mysql/mssql/trino/clickhouse/athena (combined) #2313

Merged

4 tasks

goldmedal closed this May 21, 2026

refactor(canner): use psycopg native driver, drop ibis dependency #2269
goldmedal wants to merge 5 commits into main from refactor/canner-native-driver
