FEAT: streaming support in fetchone for nvarcharmax data type #220
Merged
gargsaumya merged 6 commits into saumya/streaming-fetchone on Sep 15, 2025
sumitmsft
reviewed
Sep 9, 2025
sumitmsft
reviewed
Sep 9, 2025
sumitmsft
reviewed
Sep 9, 2025
sumitmsft
reviewed
Sep 9, 2025
f6b7389 to
e21b47e
Compare
Collaborator
bewithgaurav
left a comment
will wait for latest main branch merge, will refactor this PR to a great extent
598a6be to
7f67326
Compare
63834f3 to
d0ccd44
Compare
Contributor
Author
The conflicts are now resolved. You can go ahead and re-review.
sumitmsft
approved these changes
Sep 12, 2025
bewithgaurav
approved these changes
Sep 15, 2025
### Work Item / Issue Reference

> [AB#34162](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/34162)
> [AB#38110](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/38110)

> GitHub Issue: #<ISSUE_NUMBER>

-------------------------------------------------------------------

### Summary

This pull request improves the handling of large object (LOB) columns when fetching data from SQL Server in the `mssql_python/pybind/ddbc_bindings.cpp` file. The main change is to detect LOB columns (such as large strings or binary data) and switch to a per-row fetching strategy using `SQLGetData_wrap` for those columns, ensuring correct streaming and memory usage. For non-LOB columns, the batch fetching logic remains unchanged, but now includes logic to handle LOBs appropriately during batch fetches.

**LOB detection and fetch strategy:**

* Added logic in both `FetchMany_wrap` and `FetchAll_wrap` to detect LOB columns by checking column data types and sizes, and to fall back to per-row fetch using `SQLGetData_wrap` when LOBs are present. This avoids buffer overflows and streams LOBs correctly. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2647-R2675) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2769-R2797)
* Updated calls to `FetchBatchData` in both wrappers to pass the list of detected LOB columns, so batch fetches can handle them appropriately. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2677-R2690) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2770-R2812)

**Batch fetch improvements:**

* Modified the signature of `FetchBatchData` to accept the LOB columns list and updated its logic to handle LOB columns differently: instead of throwing exceptions when buffer sizes are insufficient, it now streams LOB data using `FetchLobColumnData`. [[1]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2345-R2345) [[2]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2388-R2396) [[3]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1R2408-R2410) [[4]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2428-R2424) [[5]](diffhunk://#diff-dde2297345718ec449a14e7dff91b7bb2342b008ecc071f562233646d71144a1L2520-R2518)
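The detection step described above (check each column's data type and size, then pick a fetch strategy) could look roughly like the sketch below. It is written in Python for illustration only; the actual logic lives in C++ in `ddbc_bindings.cpp`. The names `ColumnInfo`, `detect_lob_columns`, `choose_fetch_strategy`, and the cutoff `ASSUMED_LOB_THRESHOLD` are hypothetical, and treating a reported column size of 0 as "unbounded" is an assumption about how MAX types are described. Only the numeric SQL type codes are taken from the ODBC headers.

```python
from dataclasses import dataclass

# ODBC SQL type codes (values as defined in sql.h / sqlext.h).
SQL_VARCHAR = 12
SQL_LONGVARCHAR = -1
SQL_WVARCHAR = -9
SQL_WLONGVARCHAR = -10
SQL_VARBINARY = -3
SQL_LONGVARBINARY = -4

# Types that can carry large string or binary payloads.
LOB_CAPABLE_TYPES = {
    SQL_VARCHAR, SQL_LONGVARCHAR,
    SQL_WVARCHAR, SQL_WLONGVARCHAR,
    SQL_VARBINARY, SQL_LONGVARBINARY,
}

# Assumption: an illustrative cutoff, not a value quoted from the PR.
ASSUMED_LOB_THRESHOLD = 8000


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical per-column metadata, as a describe call might report it."""
    name: str
    sql_type: int
    column_size: int  # assumption: 0 means unknown / unbounded (MAX)


def detect_lob_columns(columns, threshold=ASSUMED_LOB_THRESHOLD):
    """Return the 1-based ordinals of columns that should be streamed."""
    lob_ordinals = []
    for ordinal, col in enumerate(columns, start=1):
        if col.sql_type not in LOB_CAPABLE_TYPES:
            continue
        if col.column_size == 0 or col.column_size > threshold:
            lob_ordinals.append(ordinal)
    return lob_ordinals


def choose_fetch_strategy(columns, threshold=ASSUMED_LOB_THRESHOLD):
    """Pick per-row fetching when any LOB column is present, else batch."""
    lob_ordinals = detect_lob_columns(columns, threshold)
    strategy = "per_row" if lob_ordinals else "batch"
    return strategy, lob_ordinals
```

Returning the list of LOB ordinals alongside the strategy mirrors the description's point that the detected columns are also passed on to the batch path.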
Work Item / Issue Reference
Summary
This pull request improves NVARCHAR data handling in the SQL Server Python bindings and adds comprehensive tests for NVARCHAR(MAX) scenarios. The main changes include switching to streaming for large NVARCHAR values, optimizing direct fetch for smaller values, and adding tests for edge cases and boundaries to ensure correctness.
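As a rough illustration of the streaming-versus-direct decision, the sketch below simulates it in pure Python. The 4000-character threshold comes from this PR description; everything else is an assumption made for the example: `FakeColumnSource` is a hypothetical stand-in for a driver column read in pieces (in the spirit of repeated `SQLGetData` calls), `CHUNK_CHARS` is an arbitrary chunk size, and a column size of 0 is used to represent "unknown size". The real implementation is C++ and is not reproduced here.

```python
STREAM_THRESHOLD_CHARS = 4000  # threshold stated in the PR description
CHUNK_CHARS = 1024             # assumption: arbitrary chunk size for the demo


class FakeColumnSource:
    """Hypothetical stand-in for one column value that is read in pieces."""

    def __init__(self, value):
        self._value = value
        self._pos = 0

    def get_data(self, max_chars):
        """Return (chunk, more_remaining), at most max_chars per call."""
        chunk = self._value[self._pos:self._pos + max_chars]
        self._pos += len(chunk)
        return chunk, self._pos < len(self._value)


def should_stream(column_size):
    """Stream when the size is unknown (0 here) or exceeds the threshold."""
    return column_size == 0 or column_size > STREAM_THRESHOLD_CHARS


def fetch_value(source, column_size):
    """Fetch one value, directly if small, otherwise chunk by chunk."""
    if not should_stream(column_size):
        # Direct fetch: a single read sized to the declared column width.
        chunk, _ = source.get_data(column_size)
        return chunk
    # Streaming: keep reading fixed-size chunks until the source is drained.
    parts = []
    while True:
        chunk, more = source.get_data(CHUNK_CHARS)
        parts.append(chunk)
        if not more:
            break
    return "".join(parts)
```

For example, `fetch_value(FakeColumnSource("x" * 100_000), 0)` takes the streaming path and reassembles the full string, while a column declared with 50 characters is read in one call.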
NVARCHAR data handling improvements:
* Updated `ddbc_bindings.cpp` to use streaming for large NVARCHAR/NCHAR columns (over 4000 characters or unknown size) and direct fetch for smaller values, optimizing performance and reliability.
* Uses `std::wstring` for conversion, simplifying platform-specific handling for both macOS/Linux and Windows.

Testing enhancements:
* Added tests to `test_004_cursor.py` for NVARCHAR(MAX) covering short strings, boundary conditions (4000 chars), streaming (4100+ chars), large values (100,000 chars), empty strings, NULLs, and transaction rollback scenarios to verify correct behavior across all edge cases.

VARCHAR/CHAR fetch improvements: