Skip to content

fix: Cast string to boolean not compatible with Spark#107

Merged
sunchao merged 5 commits intoapache:mainfrom
erenavsarogullari:comet-issue-17
Feb 26, 2024
Merged

fix: Cast string to boolean not compatible with Spark#107
sunchao merged 5 commits intoapache:mainfrom
erenavsarogullari:comet-issue-17

Conversation

@erenavsarogullari
Copy link
Copy Markdown
Member

@erenavsarogullari erenavsarogullari commented Feb 24, 2024

Which issue does this PR close?

Closes #17.

Rationale for this change

Comet needs to be compatible with Spark on string to boolean casting results.

What changes are included in this PR?

Currently, Spark supports following boolean values and Comet also needs to compatible with Spark.

trueStrings => "t", "true", "y", "yes", "1"
falseStrings => "f", "false", "n", "no", "0"

Spark Reference API:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L74
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L585
Note: I will also add spark.sql.ansi.enabled case support.

How are these changes tested?

Added new UT.

Comment thread spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala Outdated
@viirya
Copy link
Copy Markdown
Member

viirya commented Feb 24, 2024

Note: I will also add spark.sql.ansi.enabled case support.

Comet doesn't support ansi enabled case for now. I think we can ignore them first.

@erenavsarogullari erenavsarogullari changed the title [WIP] fix: Cast string to boolean not compatible with Spark fix: Cast string to boolean not compatible with Spark Feb 24, 2024
Comment thread spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala Outdated
Comment thread core/src/execution/datafusion/expressions/cast.rs Outdated
Comment thread core/src/execution/datafusion/expressions/cast.rs Outdated
Comment on lines +80 to +83
(DataType::Utf8, DataType::Boolean) => Self::spark_cast_utf8_to_boolean::<i32>(&array),
(DataType::LargeUtf8, DataType::Boolean) => {
Self::spark_cast_utf8_to_boolean::<i64>(&array)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to handle dictionary types too. Maybe as a follow up.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for casting from Dictionary.value_type to Boolean?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e., Dictionary(_, DataType::Utf8) and Dictionary(_, DataType:: LargeUtf8)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, please feel free to create an issue so i can have a look after this PR. Thanks

@sunchao sunchao merged commit 96dfccf into apache:main Feb 26, 2024
@sunchao
Copy link
Copy Markdown
Member

sunchao commented Feb 26, 2024

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cast string to boolean not compatible with Spark

3 participants