@@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with
     case (v: Short, ShortType) => v + "S"
     case (v: Long, LongType) => v + "L"
     // Float type doesn't have a suffix
-    case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})"
-    case (v: Double, DoubleType) => v + "D"
+    case (v: Float, FloatType) =>
+      val castedValue = v match {
+        case _ if v.isNaN => "'NaN'"
+        case Float.PositiveInfinity => "'Infinity'"
+        case Float.NegativeInfinity => "'-Infinity'"
+        case _ => v
+      }
+      s"CAST($castedValue AS ${FloatType.sql})"
+    case (v: Double, DoubleType) =>
+      v match {
+        case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})"
+        case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})"
+        case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})"
+        case _ => v + "D"

Comment (Contributor Author):
As in the original code, this is intended to work with Spark / Hive; Postgres would use a slightly different form.

Comment (Contributor):
What would PostgreSQL use? I don't think it would be bad to increase compatibility with PostgreSQL.

Comment (Contributor Author):
I think you'd have to use CAST(x AS DOUBLE PRECISION), but Spark doesn't seem to support DOUBLE PRECISION, and neither does Hive (AFAIK).

Postgres doesn't understand the D suffix and instead treats it as a column name.
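
A hedged illustration of the dialect difference (not part of the patch; the Postgres behaviour is as described above):

    -- Spark / Hive: bare DOUBLE is a valid cast target, and a D suffix marks a double literal
    SELECT CAST('NaN' AS DOUBLE), 2.5D;

    -- Postgres: there is no bare DOUBLE type, and a trailing D is read as a column alias,
    -- so the equivalent form would be
    SELECT CAST('NaN' AS DOUBLE PRECISION), CAST(2.5 AS DOUBLE PRECISION);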

+      }
     case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})"

Comment (Contributor @hvanhovell, Aug 25, 2016):
Should we also prevent a Decimal from being written in scientific notation? Or should we do that in a different PR?

Comment (Contributor Author):
Hmmm... as discussed, that's going to look very ugly, but it might be more compatible with Postgres and won't be lossy for very precise decimals. I say we defer this to a follow-up for now.

Comment (Contributor Author):
Actually, let me go ahead and quickly confirm whether Hive will support full expansion...

Comment (Contributor Author):
According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-FloatingPointTypes:

    "Floating point literals are assumed to be DOUBLE. Scientific notation is not yet supported."

However, the professed lack of support for scientific notation seems to be contradicted by https://issues.apache.org/jira/browse/HIVE-2536 and manual tests.

Here's a test query which demonstrates the precision issues in decimal literals:

SELECT
    CAST(-0.000000000000000006688467811848818630 AS DECIMAL(38, 36)),
    CAST(-6.688467811848818630E-18 AS DECIMAL(38, 36))

In Hive, these behave equivalently: both forms of the number are interpreted as double, so we lose precision, and both wind up as 0.000000000000000006688467811848818 (with the final three digits lost).

In Spark 2.0, the first, expanded form is parsed as a decimal literal, while the scientific-notation form is parsed as a double; the expanded form therefore preserves the value exactly, while the scientific notation causes precision loss (as in Hive).

I think there are two possible fixes here: we could either emit the fully-expanded form or update Spark's parser to treat scientific-notation floating-point literals as decimals.

From a consistency standpoint, I'm in favor of the latter approach, because I don't think it makes sense for 1.1 and 1.1e0 to be treated differently.

Given all of this, I think it would certainly be safe to emit fully-expanded forms of the decimal, but I'm not sure this is the optimal fix: it doesn't resolve the inconsistencies between Spark and Hive, and it results in really ugly, hard-to-read expressions.
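
A hedged sketch of that inconsistency (Spark 2.0 behaviour as described above):

    SELECT 1.1;      -- parsed as an exact DECIMAL literal
    SELECT 1.1E0;    -- parsed as a DOUBLE literal, subject to binary rounding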

Comment (Contributor):
A third option would be to add support for Hive's BigDecimal literals: any number ending with BD would be treated as a BigDecimal literal.
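
A hedged example of that syntax (Hive-style BD suffix, per the comment above):

    -- The trailing BD would mark the literal as a decimal, preserving the exact value
    -- without a CAST wrapper:
    SELECT -0.000000000000000006688467811848818630BD;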

Comment (Contributor Author):
BigDecimal literals are a good idea.

Given that there are multiple overlapping / complementary approaches here, I think we should fork this discussion and defer any decimal changes to a separate PR.

Comment (Contributor):
OK, let's do that. I created SPARK-17246 to track this.


     case (v: Int, DateType) => s"DATE '${DateTimeUtils.toJavaDate(v)}'"
     case (v: Long, TimestampType) => s"TIMESTAMP('${DateTimeUtils.toJavaTimestamp(v)}')"
@@ -32,7 +32,13 @@ class ExpressionSQLBuilderSuite extends SQLBuilderTest {
     checkSQL(Literal(4: Int), "4")
     checkSQL(Literal(8: Long), "8L")
     checkSQL(Literal(1.5F), "CAST(1.5 AS FLOAT)")
+    checkSQL(Literal(Float.PositiveInfinity), "CAST('Infinity' AS FLOAT)")
+    checkSQL(Literal(Float.NegativeInfinity), "CAST('-Infinity' AS FLOAT)")
+    checkSQL(Literal(Float.NaN), "CAST('NaN' AS FLOAT)")
     checkSQL(Literal(2.5D), "2.5D")
+    checkSQL(Literal(Double.PositiveInfinity), "CAST('Infinity' AS DOUBLE)")
+    checkSQL(Literal(Double.NegativeInfinity), "CAST('-Infinity' AS DOUBLE)")
+    checkSQL(Literal(Double.NaN), "CAST('NaN' AS DOUBLE)")
     checkSQL(
       Literal(Timestamp.valueOf("2016-01-01 00:00:00")), "TIMESTAMP('2016-01-01 00:00:00.0')")
     // TODO tests for decimals