add values list expression #1165

jimexist · 2021-10-22T13:53:22Z

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

alamb

This is awesome @jimexist -- it will make creating small examples / demos much easier 🏅 🏅 🏅

The only thing I think needs to be fixed prior to merge is from_plan in utils.rs, where values is not recreated correctly from the input exprs.

The test cases are very thorough as is the implementation.

I took it for a spin locally:

> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three'));
Plan("subquery in FROM must have an alias")
> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) as t;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | one     |
| 2       | two     |
| 3       | three   |
+---------+---------+
3 rows in set. Query took 0.007 seconds.

> SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) as t(num, letter);
+-----+--------+
| num | letter |
+-----+--------+
| 1   | one    |
| 2   | two    |
| 3   | three  |
+-----+--------+
3 rows in set. Query took 0.005 seconds.

Which is very cool (it is working the same as Postgres!)

alamb · 2021-10-23T10:52:05Z

datafusion/tests/sql.rs

+        assert!(plan.is_err());
+    }
+    {
+        let sql = "VALUES (1),('2')";


👍 for testing the negative case

datafusion/src/logical_plan/builder.rs

datafusion/src/logical_plan/plan.rs

alamb · 2021-10-23T11:09:03Z

datafusion/src/physical_plan/values.rs

+        if data.is_empty() {
+            return Err(DataFusionError::Plan("Values list cannot be empty".into()));
+        }
+        // we have this empty batch as a placeholder to satisfy evaluation argument


I wonder what you think about moving the creation of the actual RecordBatch from ValuesExec::try_new to execute. The rationale would be to make PhysicalPlan creation faster and push the actual work into execute, where it can potentially be run concurrently with other parts.

Given the size of data in a VALUES statement, this is not likely to make any real difference, so I am fine with leaving the creation in the same place too -- I just wanted to mention it.

I can address this in a subsequent PR where more expr types are supported (e.g. CAST)
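
To make the trade-off concrete, here is a minimal sketch of deferring the materialization step. The types `LazyValues` and `Batch` are hypothetical stand-ins using plain integers; they are not DataFusion's ValuesExec or RecordBatch, and the real code path may look quite different.

```rust
/// Hypothetical stand-in for a materialized, column-oriented batch.
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

/// Hypothetical stand-in for a VALUES physical node (not DataFusion's ValuesExec).
struct LazyValues {
    rows: Vec<Vec<i64>>,
}

impl LazyValues {
    /// Plan-time work: cheap validation only, no batch is built here.
    fn try_new(rows: Vec<Vec<i64>>) -> Result<Self, String> {
        if rows.is_empty() {
            return Err("Values list cannot be empty".to_string());
        }
        let width = rows[0].len();
        if rows.iter().any(|row| row.len() != width) {
            return Err("all rows must have the same number of values".to_string());
        }
        Ok(Self { rows })
    }

    /// Execution-time work: transpose the rows into columns.
    fn execute(&self) -> Batch {
        let width = self.rows[0].len();
        let columns: Vec<Vec<i64>> = (0..width)
            .map(|c| self.rows.iter().map(|row| row[c]).collect())
            .collect();
        Batch { columns }
    }
}

fn main() {
    let node = LazyValues::try_new(vec![vec![1, 10], vec![2, 20]]).unwrap();
    // The batch only exists once execute is called.
    assert_eq!(node.execute().columns, vec![vec![1, 2], vec![10, 20]]);
    assert!(LazyValues::try_new(vec![]).is_err());
}
```

Validation stays in the constructor so that planning still fails early on an empty list, which matches the check in the hunk above.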

datafusion/src/logical_plan/plan.rs

alamb · 2021-10-23T11:21:39Z

datafusion/src/optimizer/constant_folding.rs

            | LogicalPlan::Aggregate { .. }
            | LogicalPlan::Repartition { .. }
            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Values { .. }


I suspect it is not likely to matter, but constant folding could be applied to the Exprs in values. As written, this code will not apply constant folding to those expressions.

address in #1170
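
As an illustration of what folding the expressions inside a values list could mean, here is a small sketch over a toy expression type. The `Expr` enum below is a hypothetical three-variant stand-in, not DataFusion's Expr, and `fold_values` is an illustrative name rather than anything in the optimizer.

```rust
/// Toy expression type (hypothetical; far smaller than DataFusion's Expr).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Fold constant sub-expressions bottom-up; anything non-constant is kept as is.
fn fold(expr: &Expr) -> Expr {
    match expr {
        Expr::Add(left, right) => match (fold(left), fold(right)) {
            (Expr::Literal(a), Expr::Literal(b)) => match a.checked_add(b) {
                Some(sum) => Expr::Literal(sum),
                // Leave the expression unfolded rather than overflow.
                None => Expr::Add(Box::new(Expr::Literal(a)), Box::new(Expr::Literal(b))),
            },
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other.clone(),
    }
}

/// Apply folding to every expression in every row of a values list.
fn fold_values(rows: &[Vec<Expr>]) -> Vec<Vec<Expr>> {
    rows.iter()
        .map(|row| row.iter().map(fold).collect())
        .collect()
}

fn main() {
    let one_plus_two = Expr::Add(Box::new(Expr::Literal(1)), Box::new(Expr::Literal(2)));
    let rows = vec![vec![one_plus_two, Expr::Literal(5)]];
    assert_eq!(fold_values(&rows), vec![vec![Expr::Literal(3), Expr::Literal(5)]]);
}
```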

alamb · 2021-10-23T11:25:06Z

datafusion/src/optimizer/utils.rs

        }),
+        LogicalPlan::Values { schema, values } => Ok(LogicalPlan::Values {
+            schema: schema.clone(),
+            values: values.to_vec(),


I think the values here should be derived from expr - the various optimizers call from_plan to create a new logical plan after potentially rewriting expressions returned from LogicalPlan::expressions.

something like (untested)

values: exprs.chunks(values[0].len()).map(|row| row.to_vec()).collect()

I'll add some comments to make that clearer
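
Rebuilding the rows from a flat expression list needs a non-overlapping split. The sketch below uses strings as stand-ins for Expr, and the helper names `flatten` and `rebuild` are made up for illustration. It round-trips three two-column rows and also shows why an overlapping split such as `windows` would not reproduce them.

```rust
/// Flatten rows into one list, the shape in which rewritten expressions come back.
fn flatten(rows: &[Vec<String>]) -> Vec<String> {
    rows.iter().flatten().cloned().collect()
}

/// Rebuild rows of `width` items each from the flat list.
/// `chunks_exact` yields non-overlapping slices; `width` must be non-zero.
fn rebuild(flat: &[String], width: usize) -> Vec<Vec<String>> {
    flat.chunks_exact(width).map(|row| row.to_vec()).collect()
}

fn main() {
    let rows: Vec<Vec<String>> = vec![
        vec!["1".to_string(), "'one'".to_string()],
        vec!["2".to_string(), "'two'".to_string()],
        vec!["3".to_string(), "'three'".to_string()],
    ];
    let flat = flatten(&rows);

    // The non-overlapping split recovers the original three rows.
    assert_eq!(rebuild(&flat, 2), rows);

    // An overlapping split over six items yields five pairs, not three rows.
    assert_eq!(flat.windows(2).count(), 5);
}
```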

alamb · 2021-10-23T11:32:59Z

datafusion/tests/sql.rs

+        assert_batches_eq!(expected, &actual);
+    }
+    {
+        let sql = "VALUES (NULL,'a'),(NULL,'b'),(3,'c')";


LOL every case I could come up with to test you have already covered
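
For readers wondering how a NULL-bearing list like the one in this test can still produce a typed column while a list such as VALUES (1),('2') is rejected, here is one possible sketch. It assumes each column takes the type of its first non-NULL value and that any later disagreement is an error; the PR only shows that the mixed case fails to plan, so the rule and the names `Value`, `ColType` and `infer_column_types` are assumptions for illustration.

```rust
/// Hypothetical literal values that may appear in a VALUES row.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value<'a> {
    Null,
    Int(i64),
    Str(&'a str),
}

/// Hypothetical column types; `Null` means no non-NULL value has been seen yet.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Null,
    Int,
    Str,
}

/// Infer one type per column, skipping NULLs and rejecting mixed types.
fn infer_column_types(rows: &[Vec<Value<'_>>]) -> Result<Vec<ColType>, String> {
    let width = rows
        .first()
        .map(|row| row.len())
        .ok_or("Values list cannot be empty")?;
    let mut types = vec![ColType::Null; width];
    for row in rows {
        if row.len() != width {
            return Err("inconsistent number of values per row".to_string());
        }
        for (slot, value) in types.iter_mut().zip(row) {
            let seen = match value {
                Value::Null => continue, // NULL is compatible with any type
                Value::Int(_) => ColType::Int,
                Value::Str(_) => ColType::Str,
            };
            let current = *slot;
            if current == ColType::Null {
                *slot = seen;
            } else if current != seen {
                return Err(format!("type mismatch: {:?} vs {:?}", current, seen));
            }
        }
    }
    Ok(types)
}

fn main() {
    // VALUES (NULL,'a'),(NULL,'b'),(3,'c')
    let rows = vec![
        vec![Value::Null, Value::Str("a")],
        vec![Value::Null, Value::Str("b")],
        vec![Value::Int(3), Value::Str("c")],
    ];
    assert_eq!(infer_column_types(&rows), Ok(vec![ColType::Int, ColType::Str]));

    // VALUES (1),('2')
    let mixed = vec![vec![Value::Int(1)], vec![Value::Str("2")]];
    assert!(infer_column_types(&mixed).is_err());
}
```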

alamb · 2021-10-23T11:43:30Z

It also would be cool to add VALUES to the list of supported features on https://github.com/apache/arrow-datafusion#sql-support

jimexist · 2021-10-23T11:53:12Z

It also would be cool to add VALUES to the list of supported features on https://github.com/apache/arrow-datafusion#sql-support

let me track that here

alamb · 2021-10-23T21:19:23Z

🎉

github-actions bot added ballista sql SQL Planner labels Oct 22, 2021

jimexist force-pushed the add-values-list-impl branch 2 times, most recently from bb74f4d to 1216110 Compare October 23, 2021 06:53

jimexist changed the title from "WIP add values list expression" to "add values list expression" Oct 23, 2021

jimexist requested review from alamb and houqp and removed request for alamb October 23, 2021 06:54

jimexist force-pushed the add-values-list-impl branch 2 times, most recently from 71c04fc to 7f2170e Compare October 23, 2021 08:21

alamb approved these changes Oct 23, 2021

alamb mentioned this pull request Oct 23, 2021

Add additional docstring comments to from_plan #1168

Merged

This was referenced Oct 23, 2021

optimize values list execution plan by moving the evaluation part to execution phase #1169

Open

apply constant folding to LogicalPlan::Values #1170

Closed

update the homepage README to include values, approx_distinct, etc. #1171

Closed

jimexist force-pushed the add-values-list-impl branch from 3ee8066 to b5d8f04 Compare October 23, 2021 11:53

add values list expression

ba42411

jimexist force-pushed the add-values-list-impl branch from 8f0dcf7 to ba42411 Compare October 23, 2021 11:57

apply formatting

364b574

jimexist merged commit 3c1b807 into apache:master Oct 23, 2021

jimexist deleted the add-values-list-impl branch October 23, 2021 13:10

jimexist mentioned this pull request Oct 23, 2021

add support for unary and binary values in values list, update docs #1172

Merged

houqp added the enhancement New feature or request label Nov 6, 2021
