Skip to content

[VARIANT] Add support for DataType::Struct for cast_to_variant#8090

Merged
alamb merged 7 commits intoapache:mainfrom
cmu-db:struct-casting
Aug 14, 2025
Merged

[VARIANT] Add support for DataType::Struct for cast_to_variant#8090
alamb merged 7 commits intoapache:mainfrom
cmu-db:struct-casting

Conversation

@carpecodeum
Copy link
Copy Markdown
Contributor

@carpecodeum carpecodeum commented Aug 7, 2025

Which issue does this PR close?

Rationale for this change

Add support for DataType::Struct for cast_to_variant

What changes are included in this PR?

Adds support for casting and adds tests as well

Are there any user-facing changes?

yes casting to variant is a user facing issue

Props to @mprammer!!

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @carpecodeum -- I think this implementation may have some non trivial performance issues, but we can also address that as a follow on PR

Just let me know how you want to proceed:

  1. Merge this PR and file a follow on ticket (my preference)
  2. Try and improve the performance in this PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as written this will cast the entire input array for each row, and then only read a single row each time, which will likely perform pretty poorly

One way to avoid this might be to slice the field array and only convert the slice (e.g.

let field_array = field_array.slice(i, 1);
let field_variant_array = cast_to_variant(field_array)?;
let field_variant = field_variant_array.value(0); // index zero

Another way that might be even faster could be to recursively cast each field once at the start of the function in a pre-traversal order, and then traverse the input fields the same way (so you always had access to the current field as a variant)

@carpecodeum
Copy link
Copy Markdown
Contributor Author

Thanks @carpecodeum -- I think this implementation may have some non trivial performance issues, but we can also address that as a follow on PR

Just let me know how you want to proceed:

  1. Merge this PR and file a follow on ticket (my preference)
  2. Try and improve the performance in this PR

Thank you for reviewing, I will make the changes here

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Aug 14, 2025

Screenshot 2025-08-14 at 1 01 33 PM

Unfortunately this branch seems to have conflicts again

@alamb alamb merged commit f248da3 into apache:main Aug 14, 2025
12 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Aug 14, 2025

🚀

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Aug 14, 2025

Thanks @carpecodeum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant]: Implement DataType::Struct support for cast_to_variant kernel

2 participants