Skip to content

refactor(mssql): use pyodbc native driver, drop ibis dependency#2274

Closed
goldmedal wants to merge 6 commits into
mainfrom
refactor/mssql-native-driver
Closed

refactor(mssql): use pyodbc native driver, drop ibis dependency#2274
goldmedal wants to merge 6 commits into
mainfrom
refactor/mssql-native-driver

Conversation

@goldmedal
Copy link
Copy Markdown
Collaborator

@goldmedal goldmedal commented May 14, 2026

Summary

The mssql connector now uses pyodbc directly with custom Arrow type inference and a datetimeoffset output converter. The mssql extra installs pyodbc instead of ibis-framework[mssql].

Highlights

  • connector/mssql.py: raw pyodbc cursor; sqlglot-based LIMIT/OFFSET rewrite (OFFSET 0 ROWS FETCH NEXT n ROWS ONLY); Arrow schema built from cursor.description + value sampling; dry_run falls back to sys.dm_exec_describe_first_result_set for precise error messages.
  • model/data_source.py: _connect_mssql_pyodbc builds an ODBC connection string with proper {...} escaping; mssql:// URL parsing added; output converter for DATETIMEOFFSET (type code -155) decodes the 20-byte little-endian payload into a tz-aware datetime.
  • pyproject.toml: mssql extra -> pyodbc>=5,<6.

Test plan

  • pip install -e ".[mssql]" no longer pulls ibis-framework
  • DATETIMEOFFSET round-trips correctly (utc + non-utc offsets)
  • dry_run returns a helpful error via sys.dm_exec_describe_first_result_set when a column is unknown
  • tests/connectors/test_mssql.py covers int sizes (tinyint/smallint/int/bigint), bit, varchar, decimal, datetime/datetime2, datetimeoffset, uniqueidentifier, varbinary; URL connection path; pagination rewrite
  • just lint clean; existing unit tests unaffected

Summary by CodeRabbit

  • New Features

    • MSSQL connector now uses a native ODBC driver path (pyodbc) for connections and richer type handling (timezone-aware timestamps, decimals, binary/UUID).
  • Bug Fixes

    • Improved dry-run error reporting for invalid MSSQL queries and safer connection shutdown.
  • Tests

    • Added extensive unit and integration tests covering MSSQL connection URLs, type mappings, pagination SQL, and error cases.
  • Chores

    • Updated optional MSSQL dependency to pyodbc>=5,<6.

Review Change Stack

The mssql connector now uses pyodbc directly with custom Arrow type
inference and a datetimeoffset output converter. The mssql extra
installs `pyodbc` instead of `ibis-framework[mssql]`.

Highlights
- `connector/mssql.py`: raw pyodbc cursor; sqlglot-based LIMIT/OFFSET
  rewrite (`OFFSET 0 ROWS FETCH NEXT n ROWS ONLY`); Arrow schema built
  from `cursor.description` + value sampling; dry_run falls back to
  `sys.dm_exec_describe_first_result_set` for precise error messages.
- `model/data_source.py`: `_connect_mssql_pyodbc` builds an ODBC
  connection string with proper `{...}` escaping; `mssql://` URL
  parsing added; output converter for DATETIMEOFFSET (type code -155)
  decodes the 20-byte payload into a tz-aware datetime.
- `pyproject.toml`: mssql extra -> `pyodbc>=5,<6`.

Tests
- `tests/connectors/test_mssql.py` covers int sizes (tinyint/smallint/
  int/bigint), bit, varchar, decimal, datetime/datetime2,
  datetimeoffset (utc + non-utc), uniqueidentifier and varbinary;
  also exercises the URL connection path and pagination rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code core labels May 14, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: abf8134b-badf-4557-965a-5c1c2510cca4

📥 Commits

Reviewing files that changed from the base of the PR and between bf2edbe and f069519.

📒 Files selected for processing (3)
  • core/wren/src/wren/connector/mssql.py
  • core/wren/src/wren/model/data_source.py
  • core/wren/tests/unit/test_mssql_connection.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/wren/src/wren/connector/mssql.py
  • core/wren/src/wren/model/data_source.py

Walkthrough

Refactors MSSQL from an Ibis-backed connector to a pyodbc-based implementation: updates optional dependency, adds URL/ODBC connection plumbing with datetimeoffset decoding, implements cursor execution with SQL pagination and Arrow schema/array construction, adds connection lifecycle handling, and adds unit and integration tests covering types, pagination, and error reporting.

Changes

MSSQL Connector Refactor: Ibis to Native Pyodbc

Layer / File(s) Summary
Dependency update to pyodbc
core/wren/pyproject.toml
Replaces ibis-framework[mssql] with pyodbc>=5,<6 in optional dependencies.
DataSource MSSQL pyodbc connection builder
core/wren/src/wren/model/data_source.py
Adds optional pyodbc import and MSSQL_DATETIMEOFFSET_TYPE_CODE. Routes URL-based MSSQL connections to get_mssql_connection_from_url(). Implements _connect_mssql_pyodbc() to build ODBC connection strings (trusted vs UID/PWD), apply connection kwargs, validate/apply statement_timeout, create the connection, and register an output converter to decode MSSQL datetimeoffset.
MSSqlConnector cursor execution and Arrow schema inference
core/wren/src/wren/connector/mssql.py
Refactors MSSqlConnector from IbisConnector to ConnectorABC with DB-API cursor execution, SQL pagination rewriting (FETCH NEXT/OFFSET injection), describe-first error extraction for dry_run(), Arrow schema inference from cursor.description and rows, type-aware column builders (timestamps with tz, integer sizing/unsigned, decimals, float/boolean coercions, string fallback with JSON), and adds an idempotent close() method.
Test framework setup and fixtures
core/wren/tests/conftest.py, core/wren/tests/connectors/test_mssql.py
Registers an mssql pytest marker requiring Docker. Adds integration test wiring that conditionally skips when pyodbc/driver absent, containerized SQL Server startup, TPCH data load helpers, mssql_container and conn_info fixtures, TestMSSqlEngine test-suite wiring, types_table fixture creating many MSSQL types, and a connector fixture ensuring connector cleanup.
Type mapping, error handling, and integration tests
core/wren/tests/connectors/test_mssql.py
Integration tests asserting Arrow type mappings (TINYINT→uint8, SMALLINT/INT/BIGINT, BIT→bool, VARCHAR→string, DECIMAL→string, DATETIME/DATETIME2→timestamp[ns], DATETIMEOFFSET→timestamp[ns,tz], UNIQUEIDENTIFIER→string, VARBINARY→binary), verifies dry_run() error extraction on invalid SQL, pagination SQL rewriting behavior, and URL-based connection execution.
Unit tests for pyodbc connection plumbing
core/wren/tests/unit/test_mssql_connection.py
Unit tests mock pyodbc to verify mssql:// URL → ODBC string parsing, credential validation (Trusted_Connection vs UID/PWD), statement_timeout validation/application, TINYINT mapping and round-trip, and _decode_mssql_datetimeoffset error handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • douenergy
  • wwwy3y3

🐰 From Ibis chains we hop so free,
We stitch pyodbc, cursor, and key,
Arrow rows sing types so bright,
Timezones dance in timestamp light,
Tests and fixtures keep hops tidy.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: refactoring the MSSQL connector to use pyodbc native driver instead of ibis dependency.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/mssql-native-driver

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (8)
core/wren/tests/connectors/test_mssql.py (2)

272-282: ⚡ Quick win

Pagination test misses the SQL Server ORDER BY requirement.

test_raw_cursor_sql_injects_fetch_next asserts only that FETCH NEXT 10 ROWS ONLY and OFFSET 0 ROWS appear in the rewritten string. SQL Server rejects OFFSET … FETCH NEXT … unless an ORDER BY clause is present in the same SELECT, so the real invariant under test should be: "the rewritten SQL is executable on SQL Server".

Strengthen the test to either (a) assert that an ORDER BY is present in the rewrite when none was in the input, or (b) execute the rewritten SQL against the mssql_container fixture (e.g., cursor.execute(rewritten)) to prove the server accepts it. Without one of these, this unit test cannot catch the regression where sqlglot or the rewrite drops ORDER BY synthesis. See the related concern on MSSqlConnector._raw_cursor_sql in mssql.py.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/connectors/test_mssql.py` around lines 272 - 282, Update the
test test_raw_cursor_sql_injects_fetch_next to ensure the rewritten SQL is
executable on SQL Server by verifying an ORDER BY is present or by executing the
rewritten statement against the mssql_container; specifically call
MSSqlConnector._raw_cursor_sql("SELECT * FROM orders", 10) (as currently done),
then either assert that the returned string contains an ORDER BY clause (e.g.,
"order by") or use the mssql_container fixture to obtain a cursor and run
cursor.execute(rewritten) to confirm the server accepts the OFFSET...FETCH NEXT
syntax; this ensures MSSqlConnector._raw_cursor_sql (and any sqlglot rewrite)
synthesizes ORDER BY when needed.

264-269: 💤 Low value

Tighten the dry-run assertion so a generic WrenError doesn't trivially satisfy it.

"dry run failed" in str(exc.value).lower() will match any WrenError whose message starts with the canned "The sql dry run failed. {error_message}." prefix — including the case where _describe_sql_for_error_message returned "Unknown reason" and the connector therefore did not wrap the original exception (Line 58-64 of mssql.py: if error_message == "Unknown reason" we raise the original Exception and WrenError is not raised at all).

If the describe path ever silently returns "Unknown reason" due to a regression (e.g., DMV permission change), this test still passes because pytest catches the un-wrapped pyodbc exception — except that pytest.raises(WrenError) will actually fail in that case. So the assertion is OK on the happy path. Consider also asserting on the metadata to lock in the describe-path output, e.g. assert "not_a_column" in str(exc.value) or assert exc.value.metadata.get(DIALECT_SQL).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/tests/connectors/test_mssql.py` around lines 264 - 269, The
assertion in test_dry_run_invalid_column_returns_describe_error is too generic;
tighten it so the test verifies the describe-path output rather than any
WrenError prefix by asserting the error message contains the offending
identifier (e.g., "not_a_column") and/or that exc.value.metadata contains the
SQL under the DIALECT_SQL key; update the with pytest.raises(WrenError) block to
assert "not_a_column" in str(exc.value).lower() and/or assert
exc.value.metadata.get(DIALECT_SQL) is not None to ensure the
connector.dry_run() raised the wrapped describe error.
core/wren/src/wren/connector/mssql.py (5)

223-245: 💤 Low value

TINYINT (internal_size == 1) should always be uint8 in MSSQL.

SQL Server TINYINT is unsigned 0–255 by definition; it cannot store negative values. Falling back to int8 when the sampled values happen to all be negative is impossible for a real TINYINT column and produces a confusing type only when the cursor's internal_size==1 comes from something else (which shouldn't happen for SQL Server). Consider dropping the non_negative branch here:

-        if internal_size == 1:
-            return pa.uint8() if non_negative else pa.int8()
+        if internal_size == 1:
+            return pa.uint8()

Otherwise this is fine.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/connector/mssql.py` around lines 223 - 245, In
_mssql_integer_arrow_type, remove the non_negative branch for the TINYINT case
so that when internal_size == 1 the function always returns pa.uint8() (SQL
Server TINYINT is unsigned); update the conditional handling in the function to
unconditionally return pa.uint8() for internal_size == 1 and leave the rest of
the size/precision logic unchanged (i.e., remove the pa.int8() fallback and any
dependency on the values/non_negative check for the internal_size==1 path).

138-154: ⚡ Quick win

_describe_sql_for_error_message uses the same connection that just failed and silently swallows the describe error.

Two concerns:

  1. Reusing self.connection after a query error is generally fine with pyodbc, but if autocommit=False is ever in effect, the prior failure may have left an open implicit transaction that affects this describe call. Worth confirming the connection is in autocommit=True by default for this connector, or explicitly rolling back before describing.
  2. The bare except Exception: return "Unknown reason" hides real failures of the describe path (driver error, permission error on the DMV, etc.), which then surfaces to the user as the original exception with no diagnostic. Logging the describe-path exception via loguru.logger.debug (or .warning) would preserve the existing UX while making misconfiguration debuggable.
♻️ Suggested logging tweak
-        except Exception:
+        except Exception as describe_error:
+            logger.debug(
+                f"sys.dm_exec_describe_first_result_set failed: {describe_error}"
+            )
             return "Unknown reason"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/connector/mssql.py` around lines 138 - 154, The
_describe_sql_for_error_message function currently reuses self.connection after
a failed query and swallows any exception; change it to ensure the connection is
safe for the describe call by either calling self.connection.rollback() (or
verifying/setting self.connection.autocommit = True) before creating a cursor,
and do not silence errors—catch exceptions from the describe path and log them
with loguru.logger.debug or .warning (including the exception text and the SQL
being described) before returning "Unknown reason"; keep references to
sge.convert(sql).sql("mssql"), describe_sql, and
closing(self.connection.cursor()) so you update only the transaction handling
and add the logging instead of removing the existing describe logic.

247-285: ⚡ Quick win

Decimal coercion path is unreachable.

_mssql_arrow_type always returns pa.string() (not pa.decimal*) for PyDecimal columns (Line 202–203). As a result, the pa.types.is_decimal(arrow_type) branch at Line 261–270 of _build_mssql_column is dead code — every decimal value gets routed through the is_string branch and is serialized via str(value).

Either (a) flip _mssql_arrow_type to return pa.decimal128(...)/pa.decimal256(...) when precision/scale are known and keep this branch, or (b) drop the unreachable decimal branch to make the contract obvious. The PR's stated rationale ("Decimals serialise as strings to avoid arrow decimal precision pitfalls" in the test) suggests option (b).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/connector/mssql.py` around lines 247 - 285, The
pa.types.is_decimal branch in _build_mssql_column is unreachable because
_mssql_arrow_type returns pa.string() for PyDecimal columns; remove the
decimal-specific branch and any PyDecimal-only coercion in _build_mssql_column
so decimals are routed through the pa.types.is_string handling (serialised via
str or json as the tests expect), and delete or stop referencing PyDecimal in
that function (or remove the import if unused). This keeps the contract
consistent with _mssql_arrow_type and the test expectation that decimals
serialize as strings.

33-49: ⚖️ Poor tradeoff

Schema inferred only from rows may mis-type unsampled columns.

_build_mssql_arrow_schema infers Arrow types from cursor.description plus the fetched rows. For query(sql, limit=N) with N small (e.g., 1), or when all sampled values in a column are None, _mssql_arrow_type falls back through every isinstance(sample, ...) branch and only matches on type_code is X — which only works for the handful of Python types pyodbc actually returns (bool, int, float, bytes/bytearray, datetime.datetime, datetime.date, datetime.time). Columns whose pyodbc type code is none of those (e.g., Decimal, uuid.UUID, str) and whose sampled values are all None will silently fall through to the pa.string() default at Line 207, which is acceptable for decimals/UUIDs but will coerce all-null numeric/temporal columns to string.

Consider keying the type decision primarily off type_code (with isinstance as a tie-breaker), since cursor.description[i][1] is authoritative and independent of which rows happened to be sampled.


174-207: 💤 Low value

Type-dispatch order and type_code is X checks need a closer look.

A couple of subtle issues in _mssql_arrow_type:

  • Line 181: type_code is bool works only if pyodbc returns the actual bool class for BIT columns. In practice it does, but if any future driver/version returns int for BIT, an all-null BIT column would be classified as int (and the value-based isinstance(sample, bool) short-circuit only catches non-null samples). Worth noting in a comment.
  • Lines 189/192: dtlib.datetime is checked before dtlib.date, which is correct because datetime is a subclass of date. Good — please keep this ordering invariant explicit with a brief comment so a future refactor doesn't reorder them.
  • Lines 196-201: isinstance(sample, float) or type_code is float is checked before isinstance(sample, int). Since bool ⊂ int, the earlier bool guard at Line 181 is required to protect against True/False falling into the integer branch — also worth a brief comment.

These are robustness/maintainability nits; the current behavior is correct for the tested matrix.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/connector/mssql.py` around lines 174 - 207, Add brief
explanatory comments inside MSSqlConnector._mssql_arrow_type around the
type-dispatch checks to make the current fragile assumptions explicit: note that
type_code checks (the local variable type_code) rely on driver-specific returned
classes and may vary across drivers/versions, that the bool check
(isinstance(sample, bool) or type_code is bool) must come before integer
handling to prevent bool (a subclass of int) from being classified as integer,
that datetime must remain before date (dtlib.datetime before dtlib.date) to
preserve correct dispatch and reference MSSqlConnector._mssql_timezone_name
usage, and that float is intentionally checked before int to preserve numeric
precedence used by MSSqlConnector._mssql_integer_arrow_type; these comments
should live next to the corresponding checks for sample/type_code to prevent
accidental reordering.
core/wren/src/wren/model/data_source.py (1)

444-471: 💤 Low value

Add length validation to prevent silent garbage values on malformed DATETIMEOFFSET payloads.

The byte slicing here does not raise on short input—int.from_bytes(b"", "little") returns 0—so malformed payloads silently yield datetime(0, 0, 0, …) which fails at the constructor with a cryptic ValueError: month must be in 1..12 rather than a clear message about the payload size.

Per pyodbc's DATETIMEOFFSET handling, the payload is always 20 bytes (SQL_SS_TIMESTAMPOFFSET_STRUCT), and the fraction field is documented as nanoseconds (0–999999999), so the nanoseconds // 1000 conversion is correct. Add a length check to catch future mismatches:

Suggested fix
     if value is None:
         return None
+    if len(value) != 20:
+        raise ValueError(
+            f"Unexpected MSSQL DATETIMEOFFSET payload length: {len(value)} (expected 20)"
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/model/data_source.py` around lines 444 - 471, The
_decode_mssql_datetimeoffset function should validate the incoming bytes length
before parsing to avoid silent garbage values; add an explicit check that value
is either None or has length exactly 20 (SQL_SS_TIMESTAMPOFFSET_STRUCT) and
raise a clear ValueError (e.g., "invalid DATETIMEOFFSET payload length: expected
20, got {len(value)}") when it is not, then proceed with the existing
byte-slicing and conversion logic (including nanoseconds // 1000) so malformed
payloads fail with a clear message rather than cryptic constructor errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren/src/wren/model/data_source.py`:
- Around line 386-431: In _connect_mssql_pyodbc: the current authentication
branch treats Trusted_Connection only when both user and password are None and
will call DataSourceExtension._escape_odbc_value(None) if one side is missing,
and it validates statement_timeout after opening the pyodbc connection which can
leak connections; fix by adding pre-flight validation: if both user and password
are None use Trusted_Connection=yes, if exactly one is None raise
WrenError(ErrorCode.INVALID_CONNECTION_INFO, "...") rejecting partial
credentials, and parse/cast/validate statement_timeout (convert to int or raise
WrenError) before calling pyodbc.connect so the connection is opened only after
inputs are validated; update construction to never call _escape_odbc_value on
None (use the validated user/password values).
- Around line 355-384: The URL components are not decoded consistently: only
password is unquoted. In get_mssql_connection_from_url decode parsed.username
and the database string (parsed.path.lstrip("/")) with
urllib.parse.unquote_plus, and apply unquote_plus to each value in kwargs after
building it from parse_qsl so query params are consistently decoded; then pass
those decoded user, database, password, driver and kwargs into
_connect_mssql_pyodbc as before.

In `@core/wren/tests/connectors/test_mssql.py`:
- Around line 285-305: The test constructs an mssql URL without encoding
credentials so special characters in the SqlServerContainer default
password/user can break parsing; update test_url_connection to URL-encode the
user and password when building the f"mssql://..." string (e.g., use
urllib.parse.quote or quote_plus) before passing the URL into
DataSource.mssql.get_connection_info/get_connection so the URL is unambiguous
and pairs with the decoding change in get_mssql_connection_from_url.

---

Nitpick comments:
In `@core/wren/src/wren/connector/mssql.py`:
- Around line 223-245: In _mssql_integer_arrow_type, remove the non_negative
branch for the TINYINT case so that when internal_size == 1 the function always
returns pa.uint8() (SQL Server TINYINT is unsigned); update the conditional
handling in the function to unconditionally return pa.uint8() for internal_size
== 1 and leave the rest of the size/precision logic unchanged (i.e., remove the
pa.int8() fallback and any dependency on the values/non_negative check for the
internal_size==1 path).
- Around line 138-154: The _describe_sql_for_error_message function currently
reuses self.connection after a failed query and swallows any exception; change
it to ensure the connection is safe for the describe call by either calling
self.connection.rollback() (or verifying/setting self.connection.autocommit =
True) before creating a cursor, and do not silence errors—catch exceptions from
the describe path and log them with loguru.logger.debug or .warning (including
the exception text and the SQL being described) before returning "Unknown
reason"; keep references to sge.convert(sql).sql("mssql"), describe_sql, and
closing(self.connection.cursor()) so you update only the transaction handling
and add the logging instead of removing the existing describe logic.
- Around line 247-285: The pa.types.is_decimal branch in _build_mssql_column is
unreachable because _mssql_arrow_type returns pa.string() for PyDecimal columns;
remove the decimal-specific branch and any PyDecimal-only coercion in
_build_mssql_column so decimals are routed through the pa.types.is_string
handling (serialised via str or json as the tests expect), and delete or stop
referencing PyDecimal in that function (or remove the import if unused). This
keeps the contract consistent with _mssql_arrow_type and the test expectation
that decimals serialize as strings.
- Around line 174-207: Add brief explanatory comments inside
MSSqlConnector._mssql_arrow_type around the type-dispatch checks to make the
current fragile assumptions explicit: note that type_code checks (the local
variable type_code) rely on driver-specific returned classes and may vary across
drivers/versions, that the bool check (isinstance(sample, bool) or type_code is
bool) must come before integer handling to prevent bool (a subclass of int) from
being classified as integer, that datetime must remain before date
(dtlib.datetime before dtlib.date) to preserve correct dispatch and reference
MSSqlConnector._mssql_timezone_name usage, and that float is intentionally
checked before int to preserve numeric precedence used by
MSSqlConnector._mssql_integer_arrow_type; these comments should live next to the
corresponding checks for sample/type_code to prevent accidental reordering.

In `@core/wren/src/wren/model/data_source.py`:
- Around line 444-471: The _decode_mssql_datetimeoffset function should validate
the incoming bytes length before parsing to avoid silent garbage values; add an
explicit check that value is either None or has length exactly 20
(SQL_SS_TIMESTAMPOFFSET_STRUCT) and raise a clear ValueError (e.g., "invalid
DATETIMEOFFSET payload length: expected 20, got {len(value)}") when it is not,
then proceed with the existing byte-slicing and conversion logic (including
nanoseconds // 1000) so malformed payloads fail with a clear message rather than
cryptic constructor errors.

In `@core/wren/tests/connectors/test_mssql.py`:
- Around line 272-282: Update the test test_raw_cursor_sql_injects_fetch_next to
ensure the rewritten SQL is executable on SQL Server by verifying an ORDER BY is
present or by executing the rewritten statement against the mssql_container;
specifically call MSSqlConnector._raw_cursor_sql("SELECT * FROM orders", 10) (as
currently done), then either assert that the returned string contains an ORDER
BY clause (e.g., "order by") or use the mssql_container fixture to obtain a
cursor and run cursor.execute(rewritten) to confirm the server accepts the
OFFSET...FETCH NEXT syntax; this ensures MSSqlConnector._raw_cursor_sql (and any
sqlglot rewrite) synthesizes ORDER BY when needed.
- Around line 264-269: The assertion in
test_dry_run_invalid_column_returns_describe_error is too generic; tighten it so
the test verifies the describe-path output rather than any WrenError prefix by
asserting the error message contains the offending identifier (e.g.,
"not_a_column") and/or that exc.value.metadata contains the SQL under the
DIALECT_SQL key; update the with pytest.raises(WrenError) block to assert
"not_a_column" in str(exc.value).lower() and/or assert
exc.value.metadata.get(DIALECT_SQL) is not None to ensure the
connector.dry_run() raised the wrapped describe error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c5b0cee5-91ff-4890-9a17-6ed87aa18eca

📥 Commits

Reviewing files that changed from the base of the PR and between 544ecab and bf2edbe.

⛔ Files ignored due to path filters (1)
  • core/wren/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • core/wren/pyproject.toml
  • core/wren/src/wren/connector/mssql.py
  • core/wren/src/wren/model/data_source.py
  • core/wren/tests/conftest.py
  • core/wren/tests/connectors/test_mssql.py

Comment thread core/wren/src/wren/model/data_source.py
Comment thread core/wren/src/wren/model/data_source.py
Comment thread core/wren/tests/connectors/test_mssql.py
goldmedal and others added 5 commits May 14, 2026 11:38
Previously only the password got unquote_plus() — username, the
database path, and the query-string handling left other components
encoded, so URL-encoded specials (e.g. '@' in a username, spaces in a
database name) reached the ODBC driver still percent-encoded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous branch silently mishandled asymmetric credentials:
user=None with password set crashed in password.get_secret_value(),
and user set with password=None emitted UID= without PWD=, producing
an incomplete ODBC connection string. Now: both absent falls back to
Trusted_Connection=yes, exactly one absent raises INVALID_CONNECTION_INFO,
both present uses normal SQL auth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
statement_timeout was cast to int only after pyodbc.connect() returned,
so a non-numeric value (e.g. "5s") leaked the just-opened connection
when int() raised ValueError. Validate and cast up-front; raise
INVALID_CONNECTION_INFO before any connect call.

Also adds unit tests with mocked pyodbc covering all three review
findings: URL component decoding, asymmetric auth rejection, and the
pre-connect timeout validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQL Server TINYINT is an unsigned 8-bit integer (0..255), but the
arrow-type helper was branching on the sampled value sign and could
fall back to int8. Map internal_size == 1 directly to pa.uint8() so
the schema reflects the driver-declared type regardless of the rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The DATETIMEOFFSET output converter assumed pyodbc would always hand it
a 20-byte payload. Truncated or malformed buffers fell through and
surfaced as a cryptic "month must be in 1..12" ValueError from
datetime(). Reject non-20-byte payloads up front with a clear message
that points at the actual length mismatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@goldmedal
Copy link
Copy Markdown
Collaborator Author

Follow-up on the CodeRabbit nitpicks bundled in review 4287043625:

Addressed

  • mssql.py:~223-245 (TINYINT → uint8): SQL Server's TINYINT is unconditionally unsigned 0-255, so the sign-sampling branch was wrong. _mssql_integer_arrow_type now maps internal_size == 1 directly to pa.uint8(). Covered by unit tests in tests/unit/test_mssql_connection.py. Addressed in 60f94e7.
  • data_source.py:~444-471 (_decode_mssql_datetimeoffset length check): added an explicit len(value) != 20 guard so truncated payloads raise a clear "expected 20, got N" ValueError instead of the cryptic "month must be in 1..12" that bubbled up from datetime(). Addressed in f069519.

Skipped (with rationale)

  • mssql.py:~138-154 (describe swallows the inner exception): the connection is opened with autocommit=True, so there's no transaction state to roll back after a failed describe. The describe path is best-effort error enrichment; swallowing the inner exception preserves the original error message for the caller.
  • mssql.py:~247-270 (unreachable decimal branch): leaving the branch as defensive. pyodbc's row-value type for DECIMAL varies across driver versions (Decimal on some, string on others). The branch costs nothing and protects against regressions.
  • mssql.py:~174-207 (type-dispatch ordering comments): the isinstance ordering follows Python's built-in subclass hierarchy (bool is int, datetime is date). Restating language semantics in comments would just rot.
  • test_mssql.py:~272-282 (FETCH NEXT ORDER BY assertion): the test asserts the rewriter inserts the right keywords; full SQL execution is covered by the testcontainer suite in the same file.
  • test_mssql.py:~264-269 (dry-run assertion tighten): existing assertion verifies the error surface (WrenError with the expected error code); tightening to substring match would couple the test to error message wording.
  • test_mssql.py:~305 (URL-encode user/password in test): testcontainer password is generated alphanumerically without URL-special chars; encoding would add noise without behavioural benefit. The URL-decoding path itself is covered by tests/unit/test_mssql_connection.py with deliberately-encoded fixtures.

@goldmedal
Copy link
Copy Markdown
Collaborator Author

Superseded by #2313 — these seven native-driver refactors were consolidated into a single feature branch to resolve shared-file conflicts (data_source.py, pyproject.toml, uv.lock, factory.py, etc.) once instead of seven times.

@goldmedal goldmedal closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core dependencies Pull requests that update a dependency file python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant