FIX: Improvement to parameter type inference and handling #215
gargsaumya merged 9 commits into main
Pull Request Overview
This pull request improves parameter type inference and handling in the executemany method by replacing the static _select_best_sample_value method with a more comprehensive _compute_column_type method. The changes enhance accuracy for batch operations with varied data types, particularly for integer range detection and Data At Execution (DAE) handling.
- Replaced static sample value selection with comprehensive column analysis that computes representative values, DAE flags, and min/max ranges
- Enhanced integer type mapping to use min/max values for more accurate SQL type selection
- Updated parameter creation pipeline to pass computed range information through the type mapping process
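The column analysis described above could be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the helper name analyze_column and its return shape are hypothetical, and DAE flag handling is left out entirely.

```python
from typing import Any, Optional, Sequence, Tuple


def analyze_column(
    rows: Sequence[Sequence[Any]], col: int
) -> Tuple[Any, Optional[int], Optional[int]]:
    """Hypothetical helper: return (sample, min_val, max_val) for one column.

    sample is the first non-None value seen; min_val/max_val cover only the
    integer values in the column and stay None when there are none.
    """
    sample: Any = None
    min_val: Optional[int] = None
    max_val: Optional[int] = None
    for row in rows:
        value = row[col]
        if value is None:
            continue  # NULLs carry no type information
        if sample is None:
            sample = value
        # bool is a subclass of int in Python, so exclude it from the range.
        if isinstance(value, int) and not isinstance(value, bool):
            min_val = value if min_val is None else min(min_val, value)
            max_val = value if max_val is None else max(max_val, value)
    return sample, min_val, max_val
```

Scanning every row rather than a single sample means a large value in a later row still widens the range, which is the case a one-sample approach can miss.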
sumitmsft left a comment
Got this from Copilot:
Missing test coverage for:
- Batch decimal precision/scale aggregation
- Large string threshold (NVARCHAR vs NVARCHAR(MAX)) including astral (UTF‑16 surrogate pair) edge cases
- executemany sampling when a later row crosses the 4000 UTF‑16 unit boundary
- Mixed Unicode/non-Unicode large strings (type chosen, DAE flag)
- All‑NULL column inference (no crash, consistent default type)
- Mixed bool and int values (BIT vs widen to INT)
- Mixed numeric & string column fallback (string coercion, no silent truncation)
- Large negative integers (e.g. −2^63+1) and 64‑bit boundary extremes (±2^63−1, ±2^31, etc.)
- Mixed bool + very large int (widening logic)
- Astral-heavy string where len(s) < 4000 but UTF‑16 units > 4000 (correct MAX path)
- Large binary (>8000) bytes vs bytearray parity (same SQL type + isDAE)
- Binary exactly at 8000 and just over (boundary behavior)
- All parameters None in a column (executemany robustness)
Some of these tests may not be worth adding, so use your judgment and pick the relevant ones.
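To make the astral-string case in the list concrete, here is a small sketch of why len(s) and the UTF-16 unit count can disagree. The function names and the way the 4000-unit boundary is applied are assumptions for illustration; the driver's actual check may be written differently.

```python
ASSUMED_NVARCHAR_UNIT_LIMIT = 4000  # boundary taken from the review comment


def utf16_units(s: str) -> int:
    """Count UTF-16 code units; astral characters take two units each."""
    return len(s.encode("utf-16-le")) // 2


def exceeds_nvarchar_limit(s: str) -> bool:
    """Hypothetical check: True when the string would need the MAX path."""
    return utf16_units(s) > ASSUMED_NVARCHAR_UNIT_LIMIT


# 2001 emoji: well under 4000 Python characters, but 4002 UTF-16 units.
astral = "\U0001F600" * 2001
```

A test built on a string like astral would fail if the implementation compared len(s) against the threshold instead of the UTF-16 unit count.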
bewithgaurav left a comment
need to merge to latest main
Work Item / Issue Reference
Summary
This pull request refactors the logic for inferring SQL types from Python parameters in the mssql_python/cursor.py module, especially for batch operations using executemany. The main improvements are more accurate type detection for integer columns, achieved by considering the minimum and maximum values in the data, and a cleaner separation of concerns in the codebase.
Improvements to type inference and parameter handling:
- Updated _map_sql_type to use min_val and max_val for integer columns, allowing for more accurate type selection based on the actual range of values in the data.
- Updated _create_parameter_types_list to accept and forward min_val and max_val, supporting the improved type inference for batch operations.
- Replaced _select_best_sample_value with a new _compute_column_type method, which determines a representative sample value and computes min/max for integer columns, enhancing how types are inferred for each parameter column in executemany.
- Updated the executemany method to use _compute_column_type for each parameter column, passing the computed min/max values to _create_parameter_types_list for better type assignment.
Code cleanup:
- Removed the _select_best_sample_value static method, consolidating logic and reducing code duplication.
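As an illustration of range-based integer typing, the sketch below picks the narrowest SQL Server integer type that holds both the minimum and the maximum of a column. The function name, the INT default for columns with no integers, and the overflow behavior are assumptions; the thresholds _map_sql_type actually uses may differ.

```python
from typing import Optional


def pick_integer_type(min_val: Optional[int], max_val: Optional[int]) -> str:
    """Hypothetical mapping from a column's integer range to a SQL type name."""
    if min_val is None or max_val is None:
        return "INT"  # assumed default when no integer values were seen
    if min_val >= 0 and max_val <= 255:
        return "TINYINT"  # SQL Server TINYINT is unsigned: 0..255
    if min_val >= -(2**15) and max_val <= 2**15 - 1:
        return "SMALLINT"
    if min_val >= -(2**31) and max_val <= 2**31 - 1:
        return "INT"
    if min_val >= -(2**63) and max_val <= 2**63 - 1:
        return "BIGINT"
    # Assumption: values outside 64 bits are rejected rather than coerced.
    raise OverflowError("integer range does not fit in BIGINT")
```

Using both ends of the range matters: a column of small positive values with a single -1 no longer fits TINYINT, and a single value of 2^31 pushes the whole column to BIGINT.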