[SPARK-17205] Literal.sql should handle Infinity and NaN #14777

Changes from all commits
@@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with
     case (v: Short, ShortType) => v + "S"
     case (v: Long, LongType) => v + "L"
     // Float type doesn't have a suffix
-    case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})"
-    case (v: Double, DoubleType) => v + "D"
+    case (v: Float, FloatType) =>
+      val castedValue = v match {
+        case _ if v.isNaN => "'NaN'"
+        case Float.PositiveInfinity => "'Infinity'"
+        case Float.NegativeInfinity => "'-Infinity'"
+        case _ => v
+      }
+      s"CAST($castedValue AS ${FloatType.sql})"
+    case (v: Double, DoubleType) =>
+      v match {
+        case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})"
+        case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})"
+        case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})"
+        case _ => v + "D"
+      }
     case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})"
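To make the effect concrete, here is a rough sketch of the SQL strings the patched branches should produce (not the PR's own tests; it assumes FloatType.sql and DoubleType.sql render as FLOAT and DOUBLE):

import org.apache.spark.sql.catalyst.expressions.Literal

// Sketch of the expected Literal.sql output after this patch: special values
// become quoted strings inside a CAST so the result is re-parseable SQL,
// while ordinary values keep their old rendering.
assert(Literal(Float.NaN).sql == "CAST('NaN' AS FLOAT)")
assert(Literal(Float.PositiveInfinity).sql == "CAST('Infinity' AS FLOAT)")
assert(Literal(Double.NaN).sql == "CAST('NaN' AS DOUBLE)")
assert(Literal(Double.NegativeInfinity).sql == "CAST('-Infinity' AS DOUBLE)")
assert(Literal(1.5f).sql == "CAST(1.5 AS FLOAT)") // unchanged form for ordinary floats
assert(Literal(1.5).sql == "1.5D")                // unchanged "D" suffix for ordinary doubles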
Contributor:

Should we also prevent a Decimal from being written in scientific notation, or should we do that in a different PR?
Contributor (Author):

Hmmm... as discussed, that's going to look very ugly, but it might be more compatible with Postgres and won't be lossy for very precise decimals. I say we defer that to a follow-up for now.
Contributor (Author):

Actually, let me go ahead and quickly confirm whether Hive will support full expansion...
Contributor (Author):

According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-FloatingPointTypes, scientific notation is not supported for floating-point literals. However, that professed lack of support seems to be contradicted by https://issues.apache.org/jira/browse/HIVE-2536 and by manual tests.

A test query demonstrates the precision issues in decimal literals: in Hive, the expanded and scientific-notation spellings behave equivalently; both forms of the number are interpreted as double, so we lose precision in both cases. In Spark 2.0, the expanded form is parsed as a decimal literal while the scientific-notation form is parsed as a double, so the expanded form correctly preserves the decimal while the scientific notation causes precision loss (as in Hive).

I think there are two possible fixes here: we could either emit the fully-expanded form, or we could update Spark's parser to treat scientific-notation floating-point literals as decimals. From a consistency standpoint, I'm in favor of the latter approach, because I don't think it makes sense for two spellings of the same number to parse to different types.

Given all of this, I think it would certainly be safe to emit fully-expanded forms of the decimal, but I'm not sure this is the optimal fix because it doesn't resolve the inconsistencies between Spark and Hive and results in really ugly, hard-to-read expressions.
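A hypothetical repro in the spirit of the comment above (not the original test query; it assumes the Spark 2.0-era literal parsing described there):

import org.apache.spark.sql.SparkSession

object DecimalLiteralPrecision {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()

    // Expanded form: parsed as an exact decimal literal, all digits preserved.
    spark.sql("SELECT 1.00000000000000000001 AS expanded").show(truncate = false)

    // Scientific notation: parsed as a double, so the trailing digits are lost.
    spark.sql("SELECT 1.00000000000000000001E0 AS scientific").show(truncate = false)

    spark.stop()
  }
}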
Contributor:

A third option would be to add support for Hive's BigDecimal literals: any number ending with a BD suffix is interpreted as a BigDecimal.
Contributor (Author):

BigDecimal literals are a good idea. Given that there are multiple overlapping / complementary approaches here, I think we should fork this discussion and defer any decimal changes to a separate PR.
Contributor:

Ok, let's do that. I have created SPARK-17246 to track this.
     case (v: Int, DateType) => s"DATE '${DateTimeUtils.toJavaDate(v)}'"
     case (v: Long, TimestampType) => s"TIMESTAMP('${DateTimeUtils.toJavaTimestamp(v)}')"
Contributor (Author):

As in the original code, this is intended to work with Spark / Hive; Postgres would use a slightly different form.
Contributor:

What would PostgreSQL use? I don't think it would be bad to increase compatibility with PostgreSQL.
Contributor (Author):

I think you'd have to use CAST(x AS DOUBLE PRECISION), but Spark doesn't seem to support DOUBLE PRECISION and neither does Hive (AFAIK). Postgres doesn't understand the D suffix and instead treats it as a column name.
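For comparison, an illustrative sketch of the dialect difference (the Spark / Hive form is taken from the diff above; the Postgres form uses PostgreSQL's DOUBLE PRECISION type name, which accepts the quoted special values 'NaN', 'Infinity', and '-Infinity'):

// Illustrative dialect comparison for the same NaN literal.
val sparkHiveForm = "CAST('NaN' AS DOUBLE)"            // emitted by this patch
val postgresForm  = "CAST('NaN' AS DOUBLE PRECISION)"  // PostgreSQL spelling
// Note: a bare D suffix such as 1.5D parses in Spark/Hive but not in Postgres.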