refactor(canner): use psycopg native driver, drop ibis dependency#2269
refactor(canner): use psycopg native driver, drop ibis dependency#2269goldmedal wants to merge 5 commits into
Conversation
Canner Enterprise speaks the Postgres wire protocol; the connector now uses psycopg directly instead of the ibis postgres backend. Changes - `connector/canner.py`: native psycopg cursor with a self-contained PG OID -> Arrow type map covering the canner-flavoured types (VARCHAR/CHAR -> string, DECIMAL -> decimal128, BIGINT/INT/SMALLINT -> int, BOOLEAN -> bool, DATE/TIMESTAMP/TIMESTAMPTZ -> date/timestamp, ROW/ARRAY/MAP serialised as JSON strings). Errors are wrapped as WrenError with the dialect SQL attached, mirroring the existing postgres connector contract. - `model/data_source.py::get_canner_connection`: returns a `psycopg.Connection` (autocommit) instead of an ibis backend. Tests - `tests/connectors/test_canner.py` exercises the type-mapping helpers and runs the connector against a PostgresContainer with the common canner result types (incl. JSON/JSONB and arrays). Marker `canner` is registered in `tests/conftest.py` and a `just test-canner` target is added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughReplaces ibis-backed Canner connector with direct psycopg execution, adds Postgres OID→PyArrow type mapping and Arrow table construction, updates connection creation to use psycopg with autocommit, registers a ChangesCanner psycopg migration
Sequence Diagram(s)sequenceDiagram
participant Caller
participant CannerConnector
participant psycopg
participant PyArrow
Caller->>CannerConnector: query(sql, limit)
CannerConnector->>CannerConnector: wrap SQL with LIMIT if provided
CannerConnector->>psycopg: cursor.execute(wrapped_sql)
psycopg-->>CannerConnector: cursor.description + rows
CannerConnector->>PyArrow: _build_arrow_table(cursor)
PyArrow-->>CannerConnector: pyarrow.Table
CannerConnector-->>Caller: return Table or raise WrenError (with metadata[DIALECT_SQL])
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
core/wren/tests/connectors/test_canner.py (1)
208-212: ⚡ Quick winAssert the SQL payload in
WrenErrorfor invalid dry-runLine 211-212 verifies only the exception type. Please also assert the failing SQL is attached to the raised
WrenErrorso the connector error-contract is locked by tests.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/connectors/test_canner.py` around lines 208 - 212, The test only checks the exception type; update test_canner_connector_dry_run_raises_for_invalid_sql to capture the raised WrenError via pytest.raises as excinfo when calling canner_connector.dry_run("SELECT * FROM no_such_table") and assert the failing SQL is attached to the error—e.g., assert getattr(excinfo.value, "sql", None) == "SELECT * FROM no_such_table" and/or assert "SELECT * FROM no_such_table" in str(excinfo.value) so the connector's error contract (WrenError carrying the SQL payload) is enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/connector/canner.py`:
- Line 210: The current return uses dict(zip([field.name for field in schema],
arrays)) which will drop duplicate column names; instead pass the arrays and
schema directly to PyArrow so duplicate field names are preserved—replace the
dict-based construction with a call like pa.table(arrays, schema=schema) or
pa.Table.from_arrays(arrays, schema=schema), keeping the existing schema and
arrays variables and the pa.table / pa.Table API to avoid name-based
overwriting.
- Around line 124-133: The code currently maps SQL NULLs for JSON/JSONB
(pg_type_oid in {114, 3802}) to the string "null" inside the arrow_type ==
pa.string() handling, which changes NULL semantics; instead, stop coercing None
to "null" — treat None like any other NULL by appending None to processed.
Locate the block handling arrow_type == pa.string(), remove the special-case
that appends "null" for pg_type_oid 114/3802, and ensure the loop over values
appends None for SQL NULLs (value is None) so downstream logic receives actual
nulls rather than the literal string; keep existing behavior for non-None values
unchanged.
---
Nitpick comments:
In `@core/wren/tests/connectors/test_canner.py`:
- Around line 208-212: The test only checks the exception type; update
test_canner_connector_dry_run_raises_for_invalid_sql to capture the raised
WrenError via pytest.raises as excinfo when calling
canner_connector.dry_run("SELECT * FROM no_such_table") and assert the failing
SQL is attached to the error—e.g., assert getattr(excinfo.value, "sql", None) ==
"SELECT * FROM no_such_table" and/or assert "SELECT * FROM no_such_table" in
str(excinfo.value) so the connector's error contract (WrenError carrying the SQL
payload) is enforced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b5353458-9c2d-4703-843a-8f6eb0320e40
📒 Files selected for processing (5)
core/wren/justfilecore/wren/src/wren/connector/canner.pycore/wren/src/wren/model/data_source.pycore/wren/tests/conftest.pycore/wren/tests/connectors/test_canner.py
…bles SQL NULL in json/jsonb columns was being coerced into the string "null", breaking NULL semantics downstream. Drop the oid-114/3802 special case so None passes through unchanged. Arrow tables were built via dict(zip(names, arrays)), which silently drops duplicate column names (e.g. self-joins projecting two `id` columns). Switch to pa.Table.from_arrays(..., schema=schema) so positional construction keeps duplicates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren/src/wren/connector/canner.py`:
- Around line 91-100: The current _decimal_type function defaults unconstrained
NUMERIC columns to scale=9 which causes silent rounding later where values are
quantized; change the logic so that when column.scale is None you do NOT assign
a fixed scale (remove scale=9 default) and instead either (a) infer
precision/scale from observed row values before constructing pa.decimal128, or
(b) preserve values as strings or decimal.Decimal until an explicit
precision/scale is decided; update any downstream quantization code that
currently forces values to the _decimal_type scale (the quantization block
referenced around lines 114–116 and the type-resolution paths handling NUMERIC
and NUMERIC[] arrays) to respect the "unspecified scale" case and avoid rounding
unless an explicit target scale is chosen. Ensure references to _decimal_type,
the NUMERIC handling branch, and the array NUMERIC[] branch are updated so
unconstrained NUMERIC stays lossless until an explicit precision/scale is
determined.
- Line 221: The SQL string is embedded into subqueries at the points where sql
is wrapped (e.g., in the dry_run() path and the limited query wrapper that
builds "SELECT * FROM ({sql}) AS _t LIMIT {limit}"), and trailing semicolons
cause PostgreSQL to reject the subselect; before wrapping the user-supplied sql
variable, strip any trailing semicolons and whitespace (e.g., sql =
sql.rstrip().rstrip(';')) so the constructed subquery is valid—apply this
normalization wherever the code builds subqueries from the sql variable
(referencing the sql variable and the dry_run/limited-query construction sites).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2b93b718-223d-4ae1-884a-9809f3cb7bff
📒 Files selected for processing (2)
core/wren/src/wren/connector/canner.pycore/wren/tests/connectors/test_canner.py
🚧 Files skipped from review as they are similar to previous changes (1)
- core/wren/tests/connectors/test_canner.py
Round-2 review found _decimal_type defaulted scale=9 when the column typmod was missing, so Decimal.quantize silently rounded high-precision values (e.g. 18-significant-figure NUMERIC). Fall back to pa.string() for unconstrained NUMERIC and NUMERIC[] columns so the exact textual value round-trips. Same approach Trino's connector takes for dynamic-decimal casts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 review found that query() and dry_run() wrap user SQL as
"SELECT * FROM ({sql}) AS _t LIMIT N", which Postgres/Canner reject
when the inner SQL ends in a semicolon. Add a _strip_trailing_semicolon
helper that only strips the terminating run of semicolons and
whitespace (so semicolons inside string literals are preserved) and
apply it on both call sites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
core/wren/tests/connectors/test_canner.py (1)
273-277: ⚡ Quick winAssert the
WrenErrormetadata contract in this failure-path test.Line 276 currently verifies only exception type. The connector contract also requires dialect SQL to be attached; assert it here so regressions are caught.
Proposed test tightening
def test_canner_connector_dry_run_raises_for_invalid_sql(canner_connector) -> None: - from wren.model.error import WrenError # noqa: PLC0415 + from wren.model.error import DIALECT_SQL, WrenError # noqa: PLC0415 - with pytest.raises(WrenError): - canner_connector.dry_run("SELECT * FROM no_such_table") + sql = "SELECT * FROM no_such_table" + with pytest.raises(WrenError) as exc: + canner_connector.dry_run(sql) + assert exc.value.metadata[DIALECT_SQL] == sql🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/tests/connectors/test_canner.py` around lines 273 - 277, Update the test_canner_connector_dry_run_raises_for_invalid_sql to capture the raised WrenError (use pytest.raises as excinfo) and assert the connector metadata contract: verify excinfo.value.metadata contains the "dialect" key (e.g., equals "sql") and that the "sql" (or "query") metadata includes the attempted SQL string (e.g., contains "SELECT * FROM no_such_table"); locate this change around the test_canner_connector_dry_run_raises_for_invalid_sql function and the canner_connector.dry_run call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@core/wren/tests/connectors/test_canner.py`:
- Around line 273-277: Update the
test_canner_connector_dry_run_raises_for_invalid_sql to capture the raised
WrenError (use pytest.raises as excinfo) and assert the connector metadata
contract: verify excinfo.value.metadata contains the "dialect" key (e.g., equals
"sql") and that the "sql" (or "query") metadata includes the attempted SQL
string (e.g., contains "SELECT * FROM no_such_table"); locate this change around
the test_canner_connector_dry_run_raises_for_invalid_sql function and the
canner_connector.dry_run call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0860f7a5-42d4-46b2-a4c2-59ae128f2e10
📒 Files selected for processing (2)
core/wren/src/wren/connector/canner.pycore/wren/tests/connectors/test_canner.py
🚧 Files skipped from review as they are similar to previous changes (1)
- core/wren/src/wren/connector/canner.py
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Re: CodeRabbit nitpicks on |
|
Superseded by #2313 — these seven native-driver refactors were consolidated into a single feature branch to resolve shared-file conflicts (data_source.py, pyproject.toml, uv.lock, factory.py, etc.) once instead of seven times. |
Summary
Canner Enterprise speaks the Postgres wire protocol; the connector now uses
psycopgdirectly instead of the ibis postgres backend.Changes
connector/canner.py: native psycopg cursor with a self-contained PG OID -> Arrow type map covering the canner-flavoured types (VARCHAR/CHAR -> string, DECIMAL -> decimal128, BIGINT/INT/SMALLINT -> int, BOOLEAN -> bool, DATE/TIMESTAMP/TIMESTAMPTZ -> date/timestamp, ROW/ARRAY/MAP serialised as JSON strings). Errors are wrapped asWrenErrorwith the dialect SQL attached, mirroring the existing postgres connector contract.model/data_source.py::get_canner_connection: returns apsycopg.Connection(autocommit) instead of an ibis backend.Tests
tests/connectors/test_canner.pyexercises the type-mapping helpers and runs the connector against aPostgresContainerwith the common canner result types (incl. JSON/JSONB and arrays).cannermarker registered intests/conftest.py;just test-cannertarget added.Test plan
just test-canner(7 passed)just lintWren Engine PR template acceptance criteria:
wren.connector.canner.Summary by CodeRabbit
New Features
Tests
Chores