Add Multidimensional array support #329
Pull request overview
This PR adds end-to-end support for nested ClickHouse arrays (Array(Array(T)) and deeper) across HTTP-parameter formatting, binary/bulk insert writing, and type inference, including accepting rectangular CLR multidimensional arrays (T[,], T[,,], …) on the write path and enabling GetFieldValue<T> materialization back into multidimensional CLR arrays when the data is rectangular.
Changes:
- Add `MultiDimArrayHelper` to slice rank>1 CLR arrays into jagged “rows” for formatting/writing, and to materialize jagged results back into rectangular multidimensional arrays on read.
- Make `TypeConverter.ToClickHouseType` rank-aware for both type-based and value-based inference to preserve array nesting depth for multidimensional CLR arrays.
- Update HTTP formatter and `ArrayType.Write` to avoid flattening multidimensional CLR arrays; expand diagnostics and add extensive unit/integration test coverage.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| RELEASENOTES.md | Documents nested/multidimensional array support and improved HTTP mismatch diagnostics. |
| CHANGELOG.md | Mirrors release notes for nested/multidimensional array support and type-inference fix. |
| ClickHouse.Driver/Types/TypeConverter.cs | Makes array type inference rank-aware (type- and value-based). |
| ClickHouse.Driver/Types/MultiDimArrayHelper.cs | New helper for slicing multidimensional arrays and materializing jagged → multidimensional. |
| ClickHouse.Driver/Types/ArrayType.cs | Adds rank>1 CLR array handling on the binary write path via slicing. |
| ClickHouse.Driver/Formats/HttpParameterFormatter.cs | Adds rank>1 CLR array formatting path and improves/rehydrates nested mismatch error messages. |
| ClickHouse.Driver/ADO/Readers/ClickHouseDataReader.cs | Extends GetFieldValue<T> to materialize jagged nested arrays into multidimensional CLR arrays for rank>=2 T. |
| ClickHouse.Driver.Tests/Types/TypeMappingTests.cs | Adds unit tests for deeper nested arrays and multidimensional CLR array type mapping/inference + formatter coverage. |
| ClickHouse.Driver.Tests/Types/MultiDimArrayHelperTests.cs | Adds focused unit tests for slicing and jagged→multidimensional materialization (incl. ragged rejection). |
| ClickHouse.Driver.Tests/SQL/NestedArrayParameterTests.cs | Adds end-to-end integration tests covering nested arrays across select/insert/bulk/client APIs and multidimensional read materialization. |
| ClickHouse.Driver.Tests/SQL/NestedArrayParameterFormDataTests.cs | Adds multipart/form-data parameter-path coverage for nested arrays. |
| ClickHouse.Driver.Tests/BulkCopy/BulkCopyTests.cs | Adds bulk-copy test cases for nested arrays. |
alex-clickhouse left a comment:
Thanks for this! Broadly looks good, but a few points before we can merge.
Switches MultiDimArrayHelper to attempt a single-sweep allocation. Tidies up tests. Adds support for non-zero-bound arrays. Updates changelog and release notes.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 877041e.

Summary
#320 raised an issue about multidimensional-array support across the driver's write path. The reported exception is thrown when the user tries to send a parameter typed as `Array(Array(UInt8))`. Tracing it revealed three layered problems, not one:

1. `HttpParameterFormatter.cs:127`: the formatter flattens multidim arrays. The `Array(Array(T))` branch did `foreach element in value`, then recursed to format each element as another inner array. That works for jagged arrays (`byte[][]`), where each element is itself a `byte[]`. But `byte[,]` is different: `foreach` over a 2D array walks every scalar cell in row-major order, not row by row. So for `byte[,] { {1,2,3}, {4,5,6} }`, the formatter saw `1,2,3,4,5,6` (six scalars) and tried to format each one as if it were a whole inner array. The first call (`Format(Array(UInt8), 1, ...)`) fell off the end of the type-dispatch switch and threw `"Cannot convert 1 to Array(UInt8)"`.
2. `ArrayType.Write`: same problem on the binary insert path. The binary writer cast the value to `IList`, wrote `collection.Count` as the outer length, then indexed into it row by row. That's fine for `byte[][]`: `Count` is the number of rows and `value[0]` returns the first row as `byte[]`. But `byte[,]` also implements `IList` via the `Array` base class, and on a 2D array those operations behave nothing like row access: `IList.Count` returns the total flattened length (6 for a 2×3, not 2), and `value[i]` throws `ArgumentException` ("Array was not a one-dimensional array.") on the very first access. So the writer first encoded the wrong outer length, then crashed on the first indexer call.
3. `TypeConverter.ToClickHouseType`: the type-inference helpers only understood one layer of array. Both overloads (the one taking a `Type`, used when you ask "what ClickHouse type matches `typeof(byte[,])`?", and the one taking a value, used when you pass a parameter with no explicit type hint) had a single `if (type.IsArray) return new ArrayType { ... }` branch. For `byte[,]`, the inner call asked `byte[,].GetElementType()` and got back `byte`: the rank-2 information was thrown away, producing `Array(UInt8)` instead of `Array(Array(UInt8))`. The value-based path was even worse: it tried to peek at the first element via `Array.GetValue(0)`, which throws on any array of rank > 1 (you have to pass an index per dimension), so the inference crashed instead of just producing the wrong type.

The error message was also lossy: by the time recursion bottomed out, the formatter only knew it had a scalar where an `Array(UInt8)` was expected. The original `Array(Array(UInt8))` context was gone, as was the parameter name.

The fix
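The CLR behavior behind all three problems can be reproduced without the driver. The following is a minimal standalone sketch, not code from this PR; the variable names are illustrative, and it only exercises `System.Array` members on the `byte[,]` example used above.

```csharp
using System;
using System.Collections;

byte[,] grid = { { 1, 2, 3 }, { 4, 5, 6 } };

// Problem 1: foreach over a rank-2 array visits every scalar cell
// in row-major order; it never yields a "row".
int visited = 0;
foreach (byte cell in grid)
{
    visited++;
}
Console.WriteLine(visited);                             // 6

// Problem 2: through IList, Count is the total element count...
IList asList = grid;
Console.WriteLine(asList.Count);                        // 6

// ...and the indexer only works for rank-1 arrays.
string indexerError = "";
try
{
    _ = asList[0];
}
catch (ArgumentException ex)
{
    indexerError = ex.GetType().Name;
}
Console.WriteLine(indexerError);                        // ArgumentException

// Problem 3: GetElementType drops the rank; GetArrayRank keeps it.
Console.WriteLine(typeof(byte[,]).GetElementType());    // System.Byte
Console.WriteLine(typeof(byte[,]).GetArrayRank());      // 2

// GetValue needs one index per dimension on a rank-2 array.
Console.WriteLine(grid.GetValue(new[] { 1, 2 }));       // 6
```

Each numbered change below replaces one of these rank-blind operations with a rank-aware one.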
1. `Types/MultiDimArrayHelper.cs` (new)

The central abstraction. Two public methods:

- `EnumerateOutermostRank(Array array)`: given a rank-N array, yields rank-(N-1) slices along axis 0. For rank 1 it just yields elements. The slicing copies each row into a freshly allocated lower-rank `Array` so that downstream code can recurse without worrying about rank.
- `ToMultidimensional<T>(object jagged)`: the inverse direction. Takes a jagged value (`T[][]` or deeper), validates rectangularity, and copies it into a rank-N `T[,,...]`. Throws `InvalidOperationException` on ragged data or null intermediate rows, with a message that names the depth and the mismatched length.

Both operations are conceptually trivial, but the index bookkeeping is fiddly enough that they belong in a single, unit-testable home.
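To make the two directions concrete, here is a simplified sketch of the same ideas. It is not the PR's implementation: `EnumerateOuter`, `CopySlice`, and `ToRectangular` are hypothetical names, the slicing assumes zero lower bounds, and the read direction is restricted to rank 2 to keep the example short.

```csharp
using System;
using System.Collections.Generic;

int[,] grid = { { 1, 2, 3 }, { 4, 5, 6 } };

// Write direction: a rank-2 array yields two rank-1 rows.
foreach (object row in EnumerateOuter(grid))
{
    Console.WriteLine(string.Join<int>(",", (int[])row));   // 1,2,3 then 4,5,6
}

// Read direction: rectangular jagged data becomes int[,].
int[,] back = ToRectangular(new[] { new[] { 1, 2 }, new[] { 3, 4 } });
Console.WriteLine(back[1, 0]);                               // 3

// Ragged data is rejected rather than padded.
string raggedError = "";
try { ToRectangular(new[] { new[] { 1, 2 }, new[] { 3 } }); }
catch (InvalidOperationException ex) { raggedError = ex.Message; }
Console.WriteLine(raggedError);

// Yields rank-(N-1) copies along axis 0; rank-1 input yields its elements.
static IEnumerable<object> EnumerateOuter(Array array)
{
    if (array.Rank == 1)
    {
        foreach (object item in array) yield return item;
        yield break;
    }
    Type elementType = array.GetType().GetElementType()
        ?? throw new InvalidOperationException("Array has no element type.");
    var innerLengths = new int[array.Rank - 1];
    for (int d = 1; d < array.Rank; d++) innerLengths[d - 1] = array.GetLength(d);

    for (int i = 0; i < array.GetLength(0); i++)
    {
        Array slice = Array.CreateInstance(elementType, innerLengths);
        CopySlice(array, i, slice);
        yield return slice;
    }
}

// Copies source[outerIndex, ...] into slice using an odometer-style index.
static void CopySlice(Array source, int outerIndex, Array slice)
{
    if (slice.Length == 0) return;
    var inner = new int[slice.Rank];
    var full = new int[source.Rank];
    full[0] = outerIndex;
    while (true)
    {
        Array.Copy(inner, 0, full, 1, inner.Length);
        slice.SetValue(source.GetValue(full), inner);
        int d = inner.Length - 1;                 // last dimension moves fastest
        while (d >= 0 && ++inner[d] == slice.GetLength(d))
        {
            inner[d] = 0;
            d--;
        }
        if (d < 0) return;
    }
}

// Rank-2 only: validates rectangularity, then copies cell by cell.
static T[,] ToRectangular<T>(T[][] jagged)
{
    int rows = jagged.Length;
    int cols = rows > 0 && jagged[0] != null ? jagged[0].Length : 0;
    var result = new T[rows, cols];
    for (int r = 0; r < rows; r++)
    {
        T[] row = jagged[r] ?? throw new InvalidOperationException($"Row {r} is null.");
        if (row.Length != cols)
            throw new InvalidOperationException($"Row {r} has length {row.Length}, expected {cols}.");
        for (int c = 0; c < cols; c++) result[r, c] = row[c];
    }
    return result;
}
```

Because each slice is itself an `Array` of lower rank, a caller can feed it back into the same enumeration until it reaches rank 1, which is the recursion the write paths rely on.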
2. `Types/TypeConverter.cs`: rank-aware loops

The same loop appears in both overloads: `ToClickHouseType(Type)` and `ToClickHouseType(object)`. The value-based overload also no longer crashes on multidim arrays, because it uses `Array.GetValue(int[] indices)` instead of `Array.GetValue(int)`.

3. `Formats/HttpParameterFormatter.cs`

A new pattern-match arm is placed before the existing `IEnumerable` arm. The ordering matters: multidimensional arrays would otherwise be intercepted by the `IEnumerable` arm and flattened.

The default branch's error message also got a name and full-type rewrite, and the public entry point wraps recursive throws with the outer type so a leaf-level mismatch surfaces the user-visible parameter type.
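The rank-aware inference and the arm ordering can be illustrated with a small standalone sketch. It is an assumption-laden stand-in, not the driver's code: `NestedTypeName` and `Describe` are hypothetical helpers, the scalar mapping is a toy that only knows `byte`, and the result is a type-name string instead of a ClickHouse type object.

```csharp
using System;
using System.Collections;
using System.Linq;

Console.WriteLine(NestedTypeName(typeof(byte[,])));    // Array(Array(UInt8))
Console.WriteLine(NestedTypeName(typeof(byte[][])));   // Array(Array(UInt8))

// Value-based peek: one index per dimension instead of GetValue(0).
byte[,] sample = { { 7, 8 } };
Console.WriteLine(sample.GetValue(new int[sample.Rank]));   // 7

Console.WriteLine(Describe(new byte[2, 3]));   // multidimensional, rank 2
Console.WriteLine(Describe(new byte[2][]));    // enumerable
Console.WriteLine(Describe(5));                // scalar

// Adds one Array(...) layer per dimension, so byte[,] and byte[][]
// both map to a nesting depth of 2.
static string NestedTypeName(Type type)
{
    int depth = 0;
    while (type.IsArray)
    {
        depth += type.GetArrayRank();
        type = type.GetElementType()
            ?? throw new InvalidOperationException("Array has no element type.");
    }
    string scalar = type == typeof(byte) ? "UInt8" : type.Name;   // toy mapping
    return string.Concat(Enumerable.Repeat("Array(", depth)) + scalar + new string(')', depth);
}

// The rank > 1 arm must come first: every array is also IEnumerable,
// so the later arm would otherwise capture it and enumerate flat cells.
static string Describe(object value) => value switch
{
    Array { Rank: > 1 } multi => $"multidimensional, rank {multi.Rank}",
    IEnumerable => "enumerable",
    _ => "scalar",
};
```

Swapping the first two arms of `Describe` would make the rank-2 input report `enumerable`, which is the flattening described in the summary.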
4. `Types/ArrayType.Write`: same multidim arm on the binary path

Each slice is a rank-(N-1) array; the recursive `UnderlyingType.Write` call sees a lower-rank value and either recurses again (rank-3 input > rank-2 slice > still multidim) or drops into the legacy `(IList)value` path (rank-1 slice > ordinary array).

5. Reading multidimensional shapes... the part I went back on (see Choosing the reader API)
`ClickHouseDataReader.GetFieldValue<T>`, an override that previously did a one-line cast, now detects multidimensional `T` and routes through `MultiDimArrayHelper.ToMultidimensional<T>`.

The `FieldValueDispatcher<T>` cache reduces the hot-path overhead to a single static `bool` load and branch. Each closed generic instantiation computes the predicate exactly once on first use.

On the standalone `MultiDimArrayHelper` type

The slicing and rectangularity logic is small enough that it could plausibly live as private methods on either `ArrayType` (write side) or `ClickHouseDataReader` (read side). I deliberately put it in its own internal static class for a few reasons:

- `MultiDimArrayHelperTests` cover 22 cases without needing to spin up a server or even a `ClickHouseDataReader`.
- `MultiDimArrayHelper.EnumerateOutermostRank(value)` reads clearly, whereas a private `SliceOuter(value)` on `HttpParameterFormatter` would only make sense if you remembered the formatter happened to have it.

It's `internal` rather than `public` because the helper isn't part of the user-facing contract. The only public surface that touches it is `GetFieldValue<T>`'s new special-case branch.

Choosing the reader API
Two reasonable designs presented themselves. I weighed them against the existing architecture and the cost on the hot read path, and picked the one that slotted into the codebase's existing conventions.
Option A: a new public method `GetMultidimensional<T>(int ordinal)` on `ClickHouseDataReader` that delegates to the helper. I feel this conversion is qualitatively different from the existing typed accessors (`GetIPAddress`, `GetTuple`, `GetBigInteger`), which are zero-cost casts of `GetValue`: this one allocates, copies, and can throw on ragged data. An explicit method name would make that visible at the call site, and keep the cost out of any unrelated read.

Option B: extending the existing `GetFieldValue<T>` override so it detects multidim `T` and routes through the helper. `ClickHouseDataReader` already overrides `GetFieldValue<T>`. A user wanting to read a multidim column would naturally try `reader.GetFieldValue<int[,]>(0)` first, and today that path already exists and fails with `InvalidCastException`. Adding a separate method would leave two paths to one goal, with the obvious one broken.

I picked B. It fits the codebase's typed-read convention, fixes a latent UX issue in the same stroke, and doesn't require users to discover a new method through the changelog.

The cost was a hot-path concern: every `GetFieldValue<T>` call now checks `typeof(T).IsArray`, which I've mitigated with a `FieldValueDispatcher<T>` static-cached predicate. The hot path becomes a single bool load and branch. For value-type `T` the JIT can fold the static initialiser to a constant and eliminate the check entirely; for reference-type `T` it should be nanoseconds.
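The static-cached predicate described for `GetFieldValue<T>` relies on a general C# property: static fields of a generic class are stored once per closed type. The sketch below shows that pattern in isolation. `FieldValueDispatcherSketch<T>` and `ReaderSketch` are hypothetical names, the rank check is an assumption about what the predicate tests, and the multidimensional conversion itself is deliberately left out.

```csharp
using System;

Console.WriteLine(FieldValueDispatcherSketch<int[,]>.IsMultidimensional);   // True
Console.WriteLine(FieldValueDispatcherSketch<int[][]>.IsMultidimensional);  // False
Console.WriteLine(FieldValueDispatcherSketch<int>.IsMultidimensional);      // False
Console.WriteLine(ReaderSketch.GetFieldValue<int>(42));                     // 42

static class FieldValueDispatcherSketch<T>
{
    // Initialised once per closed generic type (once for int[,], once for
    // int, ...); later reads are a plain static field load.
    public static readonly bool IsMultidimensional =
        typeof(T).IsArray && typeof(T).GetArrayRank() > 1;
}

static class ReaderSketch
{
    // Ordinary reads take the cast branch; only multidimensional T would
    // pay for the allocate-and-copy conversion (omitted in this sketch).
    public static T GetFieldValue<T>(object raw) =>
        FieldValueDispatcherSketch<T>.IsMultidimensional
            ? throw new NotSupportedException("Conversion omitted in this sketch.")
            : (T)raw;
}
```

The design point is that the reflection calls run in the static initialiser, not on each read, so callers that never request a multidimensional `T` see only the extra branch.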
Note

Medium Risk

Touches core serialization (HTTP, binary, type inference) and `GetFieldValue` behavior for multidimensional targets; behavior changes are user-visible but heavily tested and scoped to nested-array scenarios.

Overview

Adds end-to-end support for nested ClickHouse arrays (`Array(Array(T))` and deeper) across HTTP parameters, binary/bulk inserts, and type inference (#320).

Write path: Jagged shapes (`T[][]`, `List<List<T>>`) keep working; rectangular CLR arrays (`T[,]`, `T[,,]`, …) are now serialized correctly instead of being flattened or mis-handled via `IEnumerable`/`IList`. A new internal `MultiDimArrayHelper` walks multidimensional arrays in place for HTTP and binary encoding, validates CLR rank against nested `Array` depth, and handles non-zero lower bounds. `TypeConverter` maps multidimensional CLR types to the right nested `Array(...)` types and no longer crashes on rank > 1 when peeking the first element.

Read path: `GetValue` still returns jagged arrays; `GetFieldValue<T[,]>` (and higher rank) materializes rectangular arrays when data is rectangular, with clearer `InvalidCastException` vs `InvalidOperationException` semantics and column ordinal in messages.

Diagnostics: HTTP parameter errors include parameter name, full ClickHouse type, and CLR value type; outer type is preserved when formatting fails on nested parameters.

Release notes and broad integration/unit tests cover bulk copy, form-data parameters, `ClickHouseClient` binary insert, and `GetFieldValue` edge cases.

Reviewed by Cursor Bugbot for commit 73ab6f8. Bugbot is set up for automated code reviews on this repo.