feat(spark): add base64 and unbase64 functions #19968

Merged
comphead merged 11 commits into apache:main from cht42:spark-base64
Jan 26, 2026

@cht42 cht42 commented Jan 24, 2026

Which issue does this PR close?

Rationale for this change

Add Spark-compatible base64/unbase64 functions

What changes are included in this PR?

  • New encoding mode in the DataFusion (DF) encoding UDF for padded base64
  • Spark UDFs for base64/unbase64
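
For readers unfamiliar with the distinction, the difference between padded and unpadded base64 can be sketched as follows. This is a std-only illustration, not the code from this PR: the function name `base64_encode` and the `pad` flag are hypothetical stand-ins for the choice of encoding mode, and the alphabet is the standard one from RFC 4648.

```rust
// Illustrative std-only sketch; NOT the DataFusion implementation.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode `input` as standard-alphabet base64.
/// `pad` is a hypothetical flag standing in for the encoding mode:
/// `true` appends `=` so the output length is a multiple of four.
fn base64_encode(input: &[u8], pad: bool) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // A chunk of k bytes yields k + 1 significant characters.
        let significant = chunk.len() + 1;
        for i in 0..4 {
            if i < significant {
                let idx = ((n >> (18 - 6 * i)) & 0x3f) as usize;
                out.push(ALPHABET[idx] as char);
            } else if pad {
                out.push('=');
            }
        }
    }
    out
}
```

With this sketch, `base64_encode(b"Spark", true)` gives `U3Bhcms=` and `base64_encode(b"Spark", false)` gives `U3Bhcms`; the two forms differ only in the trailing `=`.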

Are these changes tested?

Yes, in SLT (sqllogictest) files

Are there any user-facing changes?

yes

@github-actions github-actions Bot added the sqllogictest, functions, and spark labels Jan 24, 2026
@@ -19,7 +19,7 @@ extern crate criterion;

Contributor Author

this benchmark was broken because of an input/output type mismatch


 match self {
-    Self::Base64 => {
+    Self::Base64 | Self::Base64Padded => {

Contributor Author

decoding is the same for Base64 and Base64Padded
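
One way a single decode path can serve both variants is to ignore trailing `=` before decoding. The sketch below illustrates that idea with the standard library only. It is an assumption about the general technique, not the decoder used in this PR, and the names `sextet` and `base64_decode_lenient` are hypothetical.

```rust
// Illustrative std-only sketch; NOT the DataFusion implementation.

/// Map one base64 character to its 6-bit value, or None if it is not
/// in the standard alphabet.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode standard-alphabet base64, accepting input with or without
/// trailing `=`. Returns None on invalid characters or an impossible length.
/// This sketch is lenient: it does not validate the amount of padding.
fn base64_decode_lenient(input: &str) -> Option<Vec<u8>> {
    // Drop trailing padding so both forms take the same path.
    let data = input.trim_end_matches('=').as_bytes();
    if data.len() % 4 == 1 {
        return None; // a lone trailing character cannot encode a whole byte
    }
    let mut out = Vec::with_capacity(data.len() * 3 / 4);
    for chunk in data.chunks(4) {
        let mut n: u32 = 0;
        for &c in chunk {
            n = (n << 6) | sextet(c)?;
        }
        // Left-align a short final chunk to 24 bits.
        n <<= 6 * (4 - chunk.len());
        // k characters carry k - 1 whole bytes.
        for i in 0..chunk.len() - 1 {
            out.push((n >> (16 - 8 * i)) as u8);
        }
    }
    Some(out)
}
```

Under these assumptions, `base64_decode_lenient("U3Bhcms=")` and `base64_decode_lenient("U3Bhcms")` both return the bytes of `Spark`, which is why the padded and unpadded variants can share one match arm when decoding.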


fn return_field_from_args(&self, args: ReturnFieldArgs<'_>) -> Result<FieldRef> {
    let [bin] = take_function_args(self.name(), args.arg_fields)?;
    let return_type = match bin.data_type() {

Contributor Author

matching return type with DF encode function


fn return_field_from_args(&self, args: ReturnFieldArgs<'_>) -> Result<FieldRef> {
    let [str] = take_function_args(self.name(), args.arg_fields)?;
    let return_type = match str.data_type() {

Contributor Author

matching return type with DF decode function

@github-actions github-actions Bot added the documentation label Jan 24, 2026
}

fn invoke_with_args(&self, _args: ScalarFunctionArgs) -> Result<ColumnarValue> {
    exec_err!("{} should have been simplified", self.name())

Contributor

if it fires, the error message would be highly confusing IMO

Contributor

@comphead comphead left a comment

Thanks @cht42, the PR is great. PTAL at the CI failures

Contributor

@comphead comphead left a comment

Thanks @cht42 and @Jefffrey for review

@comphead comphead added this pull request to the merge queue Jan 26, 2026
Merged via the queue into apache:main with commit 8efc2b6 Jan 26, 2026
29 checks passed
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
## Which issue does this PR close?

- Closes apache#19967
- Part of apache#15914 

## Rationale for this change

Add spark compatible base64/unbase64 functions

## What changes are included in this PR?

- new encoding mode in DF encoding UDF for padded base64
- spark udfs for base64/unbase64

## Are these changes tested?

yes in SLT

## Are there any user-facing changes?

yes
Development

Successfully merging this pull request may close these issues.

[datafusion-spark] Add base64 and unbase64 function

3 participants