-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31388][SQL][TESTS] org.apache.spark.sql.hive.thriftserver.CliSuite doesn't match results correctly #28156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #120981 has finished for PR 28156 at commit
|
|
Test build #120982 has finished for PR 28156 at commit
|
|
cc @wangyum too. |
|
Test build #120983 has finished for PR 28156 at commit
|
|
Test build #120984 has finished for PR 28156 at commit
|
| } finally { | ||
| process.destroy() | ||
| if (!process.waitFor(1, MINUTES)) { | ||
| log.warn("spark-sql did not exit. Killing process.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the SQL repl keeps waiting for the next input this will always happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are closing the stdin of the process, so it will exit.
Nevermind, then
|
@wangyum I pushed one more change: wait for the CLI process instead of killing it. Fail the test if it doesn't end gracefully. We are closing the stdin of the process, so it will exit. at the end. |
|
Hm. I still have some problems with stdout lines sometimes cut in half: |
|
Test build #121023 has finished for PR 28156 at commit
|
|
Test build #121021 has finished for PR 28156 at commit
|
|
@wangyum I pushed another change to cut some longer lines in half, and to match for first 60 bytes of query echo lines. Running multiple times it was still sometimes flaky for me because some long lines would be cut by the output processor... |
|
Test build #121024 has finished for PR 28156 at commit
|
|
Test build #121025 has finished for PR 28156 at commit
|
|
Retest this please. |
|
Test build #121152 has finished for PR 28156 at commit
|
|
@HeartSaVioR Thanks! |
|
Looks like two patches are orthogonal. Personally I feel safer if we try to isolate the env. (sure, including metastore) per test (not per suite), but yes it would bring longer test duration and possibly reach a timeout. If both patches could make sure the test suite becomes stabilized, then this patch is the thing we can merge first (as it brings less impact on test), and revisit other if the test suite becomes flaky again. |
|
This is a huge pain on existing PRs as it fails by high chance. Could some committer take a look again and deal with this? Thanks! |
|
@dongjoon-hyun @wangyum do you have extra comments, or would you think this could be merged? |
|
retest this please |
|
Test build #121751 has finished for PR 28156 at commit
|
|
thanks, merging to master/3.0! |
…uite doesn't match results correctly
### What changes were proposed in this pull request?
`CliSuite.runCliWithin` was not matching for expected results correctly. It was matching for expected lines anywhere in stdout or stderr.
On the example of `Single command with --database` test:
In
```
runCliWithin(2.minute)(
"CREATE DATABASE hive_db_test;"
-> "",
"USE hive_test;"
-> "",
"CREATE TABLE hive_test(key INT, val STRING);"
-> "",
"SHOW TABLES;"
-> "hive_test"
)
```
It was looking for lines containing "", "", "" and then "hive_test".
However, the string "hive_test" was contained in "hive_test_db", and hence:
```
2020-04-08 17:53:12,752 INFO CliSuite - 2020-04-08 17:53:12.752 - stderr> Spark master: local, Application Id: local-1586368384172
2020-04-08 17:53:12,765 INFO CliSuite - stderr> found expected output line 0: ""
2020-04-08 17:53:12,765 INFO CliSuite - 2020-04-08 17:53:12.765 - stdout> spark-sql> CREATE DATABASE hive_db_test;
2020-04-08 17:53:12,765 INFO CliSuite - stdout> found expected output line 1: ""
2020-04-08 17:53:17,688 INFO CliSuite - 2020-04-08 17:53:17.688 - stderr> chgrp: changing ownership of 'file:///tmp/spark-8811f069-4cba-4c71-a5d6-62dd925fb5ff': chown: changing group of '/tmp/spark-8811f069-4cba-4c71-a5d6-62dd925fb5ff': Operation not permitted
2020-04-08 17:53:12,765 INFO CliSuite - stderr> found expected output line 2: ""
2020-04-08 17:53:18,069 INFO CliSuite - 2020-04-08 17:53:18.069 - stderr> Time taken: 5.265 seconds
2020-04-08 17:53:18,087 INFO CliSuite - 2020-04-08 17:53:18.087 - stdout> spark-sql> USE hive_test;
2020-04-08 17:53:12,765 INFO CliSuite - stdout> found expected output line 3: "hive_test"
2020-04-08 17:53:21,742 INFO CliSuite - Found all expected output.
```
Because of that, it could kill the CLI process without really even creating the table. This was not expected. The test could be flaky depending on whether process.destroy() in the finally block managed to kill it before it actually creates the table.
I make the output checking more robust to not match on unexpected output, by making it check the echo of query output on the CLI. Also, wait for the CLI process to finish gracefully (triggered by closing its stdin), instead of killing it forcibly.
### Why are the changes needed?
org.apache.spark.sql.hive.thriftserver.CliSuite was flaky, and didn't test outputs as expected.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests in CLISuite. Tested several times with no flakiness. Was getting flaky results almost on every run before.
```
[info] CliSuite:
[info] - load warehouse dir from hive-site.xml (12 seconds, 568 milliseconds)
[info] - load warehouse dir from --hiveconf (10 seconds, 648 milliseconds)
[info] - load warehouse dir from --conf spark(.hadoop).hive.* (20 seconds, 653 milliseconds)
[info] - load warehouse dir from spark.sql.warehouse.dir (9 seconds, 763 milliseconds)
[info] - Simple commands (16 seconds, 238 milliseconds)
[info] - Single command with -e (9 seconds, 967 milliseconds)
[info] - Single command with --database (21 seconds, 205 milliseconds)
[info] - Commands using SerDe provided in --jars (15 seconds, 51 milliseconds)
[info] - SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path (14 seconds, 625 milliseconds)
[info] - SPARK-11188 Analysis error reporting (7 seconds, 960 milliseconds)
[info] - SPARK-11624 Spark SQL CLI should set sessionState only once (7 seconds, 424 milliseconds)
[info] - list jars (9 seconds, 520 milliseconds)
[info] - list jar <jarfile> (9 seconds, 277 milliseconds)
[info] - list files (9 seconds, 828 milliseconds)
[info] - list file <filepath> (9 seconds, 646 milliseconds)
[info] - apply hiveconf from cli command (9 seconds, 469 milliseconds)
[info] - Support hive.aux.jars.path (10 seconds, 676 milliseconds)
[info] - SPARK-28840 test --jars command (10 seconds, 921 milliseconds)
[info] - SPARK-28840 test --jars and hive.aux.jars.path command (11 seconds, 49 milliseconds)
[info] - SPARK-29022 Commands using SerDe provided in ADD JAR sql (14 seconds, 210 milliseconds)
[info] - SPARK-26321 Should not split semicolon within quoted string literals (12 seconds, 729 milliseconds)
[info] - Pad Decimal numbers with trailing zeros to the scale of the column (10 seconds, 381 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented lines (10 seconds, 935 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented with multi-lines (20 seconds, 731 milliseconds)
```
Closes #28156 from juliuszsompolski/SPARK-31388.
Authored-by: Juliusz Sompolski <julek@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 560bd54)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
CliSuite.runCliWithinwas not matching for expected results correctly. It was matching for expected lines anywhere in stdout or stderr.On the example of
Single command with --databasetest:In
It was looking for lines containing "", "", "" and then "hive_test".
However, the string "hive_test" was contained in "hive_test_db", and hence:
Because of that, it could kill the CLI process without really even creating the table. This was not expected. The test could be flaky depending on whether process.destroy() in the finally block managed to kill it before it actually creates the table.
I make the output checking more robust to not match on unexpected output, by making it check the echo of query output on the CLI. Also, wait for the CLI process to finish gracefully (triggered by closing its stdin), instead of killing it forcibly.
Why are the changes needed?
org.apache.spark.sql.hive.thriftserver.CliSuite was flaky, and didn't test outputs as expected.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests in CLISuite. Tested several times with no flakiness. Was getting flaky results almost on every run before.