Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

allow the sql test files to specify different dimensions of config sets during testing. For example,

--CONFIG_DIM1 a=1
--CONFIG_DIM1 b=2,c=3

--CONFIG_DIM2 x=1
--CONFIG_DIM2 y=1,z=2

This example defines 2 config dimensions, and each dimension defines 2 config sets. We will run the queries 4 times:

  1. a=1, x=1
  2. a=1, y=1, z=2
  3. b=2, c=3, x=1
  4. b=2, c=3, y=1, z=2

Why are the changes needed?

Currently SQLQueryTestSuite takes a long time. This is because we run each test at least 3 times, to check with different codegen modes. This is not necessary for most of the tests, e.g. DESC TABLE. We should only check these codegen modes for certain tests.

With the --CONFIG_DIM directive, we can do things like: test different join operator(broadcast or shuffle join) X different codegen modes.

After reducing testing time, we should be able to run thrifter server SQL tests with config settings.

Does this PR introduce any user-facing change?

no

How was this patch tested?

test only

@cloud-fan
Copy link
Contributor Author

@cloud-fan cloud-fan added the SQL label Nov 20, 2019
@@ -1,8 +1,3 @@
-- List of configuration the test suite is run against:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to test the optimizer, don't need to run it with different join operators

@@ -1,8 +1,3 @@
-- List of configuration the test suite is run against:
Copy link
Contributor Author

@cloud-fan cloud-fan Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to test the analyzer/optimizer. Natural join will be rewritten to other normal joins, no need to un it with different join operators


-- Set the cross join enabled flag for the LEFT JOIN test since there's no join condition.
-- Ultimately the join should be optimized away.
set spark.sql.crossJoin.enabled = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true by default now.

@@ -1,8 +1,3 @@
-- List of configuration the test suite is run against:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are testing UDFs, and the join operator doesn't matter.

@SparkQA
Copy link

SparkQA commented Nov 20, 2019

Test build #114175 has finished for PR 26612 at commit 3be4734.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -104,9 +101,6 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite {
"subquery/in-subquery/simple-in.sql",
"subquery/in-subquery/in-order-by.sql",
"subquery/in-subquery/in-set-operations.sql",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @cloud-fan .
This causes linter failure. Please remove the ending ,.

$ dev/lint-scala
Scalastyle checks failed at following occurrences:
[error] /Users/dongjoon/PRS/PR-26612/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala: illegal start of simple expression: Token(RPAREN,),3495,))
[error] Total time: 20 s, completed Nov 20, 2019 11:24:55 AM

@SparkQA
Copy link

SparkQA commented Nov 21, 2019

Test build #114198 has finished for PR 26612 at commit 9562cae.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Nov 21, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Nov 21, 2019

Test build #114216 has finished for PR 26612 at commit 9562cae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 21, 2019

Test build #114218 has finished for PR 26612 at commit 9562cae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

After this PR:

  • SQLQueryTestSuite: 25min
  • ThriftServerQueryTestSuite: 18min

Before this PR: (use the result of #26214)

  • SQLQueryTestSuite: 47min
  • ThriftServerQueryTestSuite: 8min (but can't apply settings)

@wangyum
Copy link
Member

wangyum commented Nov 21, 2019

Yes. Much faster than before.
SQLQueryTestSuite:

[info] Run completed in 15 minutes, 43 seconds.
[info] Total number of tests run: 202

ThriftServerQueryTestSuite:

[info] Run completed in 13 minutes, 13 seconds.
[info] Total number of tests run: 137

@HyukjinKwon
Copy link
Member

Merged to master.

-- Test aggregate operator with codegen on and off.
--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry to be late) In the one dimension case, CONFIG_DIM1 is the same with SET?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's different. We still run this test 3 times as there are 3 config sets in this dimension. It's only the same with SET if there is only one dimension and one config set.

// - config: (String, String))
// We need to do cartesian product for all the config dimensions, to get a list of
// config sets, and run the query once for each config set.
val configDimLines = comments.filter(_.startsWith("--CONFIG_DIM")).map(_.substring(12))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to update how-to use CONFIG_DIM in

Copy link
Member

@HyukjinKwon HyukjinKwon Nov 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we should document this there ...

maropu pushed a commit that referenced this pull request Nov 25, 2019
### What changes were proposed in this pull request?

add document to address #26612 (comment)

### Why are the changes needed?

help people understand how to use --CONFIG_DIM

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #26661 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants