Skip to content

fix: array_concat widens container variant for mixed List/LargeList inputs#21704

Merged
Jefffrey merged 1 commit intoapache:mainfrom
hcrosse:fix/array-concat-mixed-large-list
Apr 21, 2026
Merged

fix: array_concat widens container variant for mixed List/LargeList inputs#21704
Jefffrey merged 1 commit intoapache:mainfrom
hcrosse:fix/array-concat-mixed-large-list

Conversation

@hcrosse
Copy link
Copy Markdown
Contributor

@hcrosse hcrosse commented Apr 17, 2026

Which issue does this PR close?

Rationale for this change

array_concat hit an internal cast error when given a mix of List and LargeList (or FixedSizeList and LargeList) arguments:

> select array_concat(make_array(1, 2), arrow_cast([3, 4], 'LargeList(Int64)'));
DataFusion error: Internal error: could not cast array of type List(Int64) to arrow_array::array::list_array::GenericListArray<i64>.

ArrayConcat::coerce_types was coercing only the base element type, leaving the outer container alone. When the resolved return type is LargeList, array_concat_inner later tries to downcast each arg to GenericListArray<i64>, which fails for any List argument that slipped through.

What changes are included in this PR?

In ArrayConcat::coerce_types, after coercing the base type, also promote each input's outermost List to LargeList when the return type is a LargeList. FixedSizeList inputs already go through FixedSizedListToList first and then get promoted too. Per-arg dimensionality is preserved, so nested cases keep working with align_array_dimensions.

Are these changes tested?

Yes, added sqllogictests in array_concat.slt covering:

  • List + LargeList
  • LargeList + List
  • FixedSizeList + LargeList
  • Three-way mix List, LargeList, List

Each one also asserts arrow_typeof(...) = LargeList(Int64).

Are there any user-facing changes?

Queries that previously returned an internal cast error now return the concatenated LargeList as expected. No API changes.

…ist/LargeList inputs

`ArrayConcat::coerce_types` only coerced the base element type, leaving the
outer container variant unchanged. With a mix of `List` and `LargeList` (or
`FixedSizeList` and `LargeList`) inputs, `array_concat_inner` would later try
to downcast the `List` array to `GenericListArray<i64>` and fail with an
internal cast error.

When the resolved return type is `LargeList`, also promote each input's
outermost `List` to `LargeList` so the cast targets match what
`array_concat_inner` expects.
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 17, 2026
Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

}

fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
let base_type = base_type(&self.return_type(arg_types)?);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized something funny the existing implementation does; it calls return_type within coerce_types even though the docstring for return_type states that the input types are already coerced (i.e. coerce_types would have already been called) 😅

crm26 added a commit to crm26/datafusion that referenced this pull request Apr 20, 2026
Addresses round-2 review comments on apache#21542:
- Widen container variant in coerce_types when inputs mix List and
  LargeList (or FixedSizeList), so mixed-type calls like
  `cosine_distance([1.0, 0.0], arrow_cast([0.0, 1.0], 'LargeList(Float64)'))`
  succeed. Follows the pattern from PR apache#21704 (ArrayConcat).
- Coerce bare NULL inputs to a matching list variant so
  `cosine_distance(NULL, [1.0, 2.0])` returns NULL instead of erroring.
- Drop the `list_cosine_distance` alias — the base name is not
  `array_cosine_distance`, so the `array_X` -> `list_X` convention does
  not apply.
- Expand SLT coverage: mixed-type variants, FixedSizeList inputs,
  Float32 and Int64 inner types, bare NULL in each position, NULL row
  in a multi-row VALUES, and an unsupported-type plan error case.

Dispatch fallthrough in cosine_distance_inner is now unreachable after
the coerce_types widening, changed from exec_err! to internal_err!.

Part of apache#21536.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Jefffrey Jefffrey added this pull request to the merge queue Apr 21, 2026
@Jefffrey
Copy link
Copy Markdown
Contributor

Thanks @hcrosse

Merged via the queue into apache:main with commit e524f49 Apr 21, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

array_concat fails with internal error on mixed List + LargeList inputs

2 participants