@alamb
Contributor

@alamb alamb commented May 19, 2023

Which issue does this PR close?

Closes #6392
Closes #4734

Rationale for this change

I want to run sqllogictests faster locally to increase development velocity.

What changes are included in this PR?

  1. Run tests in parallel (up to the number of cores on the machine)
  2. Reduce output in normal runs
  3. Add docs on logging
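As an illustration of item 1, here is a minimal sketch of running a list of test files on up to one worker per core and reporting every failure at the end. It is not the code in this PR: it uses plain `std::thread` rather than an async runtime, and `run_all_in_parallel` and the `run_one` callback are hypothetical names made up for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: runs `run_one` on every file using at most one worker
/// thread per available core, and returns all failures instead of stopping
/// at the first one.
fn run_all_in_parallel<F>(files: &[&str], run_one: F) -> Vec<String>
where
    F: Fn(&str) -> Result<(), String> + Sync,
{
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let next = AtomicUsize::new(0); // index of the next file to claim

    let mut failures: Vec<String> = thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..workers.min(files.len()) {
            handles.push(scope.spawn(|| {
                let mut errors = Vec::new();
                loop {
                    // Each worker claims the next unprocessed file.
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    let Some(&file) = files.get(i) else { break };
                    if let Err(e) = run_one(file) {
                        errors.push(format!("{file}: {e}"));
                    }
                }
                errors
            }));
        }
        // Join every worker and merge the per-worker error lists.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    failures.sort(); // deterministic report order
    failures
}
```

Because a failing file only records an error and the loop continues, one mismatch does not prevent the remaining files from running.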

Are these changes tested?

They are all tested and I tested manually in a few other scenarios:

Timings

On my 8 core laptop:

Main (10s)

$ time /Users/alamb/Software/target-df/debug/deps/sqllogictests-99ee8d9f48c22ddb 2>&1 > /dev/null

real	0m10.113s
user	0m11.810s
sys	0m0.460s

This PR (2.5s)

$ time /Users/alamb/Software/target-df2/debug/deps/sqllogictests-650a1844c5cdf284 2>&1  > /dev/null

real	0m2.578s
user	0m15.471s
sys	0m0.519s

Successful Case

cargo test -p datafusion --test sqllogictests --features=avro 

Running "cte.slt"
Running "math.slt"
Running "wildcard.slt"
Running "subquery.slt"
Running "strings.slt"
...
Running "dates.slt"
Running "errors.slt"
Running "arrow_typeof.slt"
Running "prepare.slt"
Running "limit.slt"
Running "information_schema.slt"
Running "predicates.slt"
Running "nullif.slt"
Running "type_coercion.slt"

Example output with a single test diff:

cargo test -p datafusion --features=avro --test sqllogictests
   Compiling datafusion v24.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/core)
    Finished test [unoptimized + debuginfo] target(s) in 4.05s
     Running tests/sqllogictests/src/main.rs (/Users/alamb/Software/target-df2/debug/deps/sqllogictests-650a1844c5cdf284)
External error: query result mismatch:
[SQL] SELECT bit_and(c5), bit_and(c6), bit_and(c7), bit_and(c8), bit_and(c9) FROM aggregate_test_100
[Diff] (-expected|+actual)
-   0 0 0 0 5
+   0 0 0 0 0
at tests/sqllogictests/test_files/aggregate.slt:85

Example output with two files that error:

External error: query result mismatch:
[SQL] SELECT bit_and(c5), bit_and(c6), bit_and(c7), bit_and(c8), bit_and(c9) FROM aggregate_test_100
[Diff] (-expected|+actual)
-   0 0 0 0 5
+   0 0 0 0 0
at tests/sqllogictests/test_files/aggregate.slt:85

External error: query result mismatch:
[SQL] EXPLAIN
INSERT INTO table_without_values SELECT
SUM(c4) OVER(PARTITION BY c1 ORDER BY c9 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING),
COUNT(*) OVER(PARTITION BY c1 ORDER BY c9 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM aggregate_test_100
ORDER by c1
[Diff] (-expected|+actual)
    logical_plan
    Dml: op=[Insert] table=[table_without_values]
    --Projection: SUM(aggregate_test_100.c4) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING AS field1, COUNT(UInt8(1)) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING AS field2
    ----Sort: aggregate_test_100.c1 ASC NULLS LAST
    ------Projection: SUM(aggregate_test_100.c4) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING, COUNT(UInt8(1)) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING, aggregate_test_100.c1
    --------WindowAggr: windowExpr=[[SUM(aggregate_test_100.c4) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING, COUNT(UInt8(1)) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING]]
    ----------TableScan: aggregate_test_100 projection=[c1, c4, c9]
    physical_plan
-   InsertExec: sink=MemoryTable (partitions=1)
+   MemoryWriteExec: partitions=1, input_partition=1
    --ProjectionExec: expr=[SUM(aggregate_test_100.c4) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING@0 as field1, COUNT(UInt8(1)) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING@1 as field2]
    ----SortPreservingMergeExec: [c1@2 ASC NULLS LAST]
    ------ProjectionExec: expr=[SUM(aggregate_test_100.c4) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING@3 as SUM(aggregate_test_100.c4), COUNT(UInt8(1)) PARTITION BY [aggregate_test_100.c1] ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING@4 as COUNT(UInt8(1)), c1@0 as c1]
    --------BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c4): Ok(Field { name: "SUM(aggregate_test_100.c4)", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(1)) }, COUNT(UInt8(1)): Ok(Field { name: "COUNT(UInt8(1))", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(1)) }], mode=[Sorted]
    ----------SortExec: expr=[c1@0 ASC NULLS LAST,c9@2 ASC NULLS LAST]
    ------------CoalesceBatchesExec: target_batch_size=8192
    --------------RepartitionExec: partitioning=Hash([Column { name: "c1", index: 0 }], 8), input_partitions=8
    ----------------RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
    ------------------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c4, c9], has_header=true
at tests/sqllogictests/test_files/insert.slt:51

Are there any user-facing changes?

@github-actions github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels May 19, 2023
@alamb alamb force-pushed the alamb/sqllogictest_parallel branch from dcc1f49 to e4b7ff7 Compare May 19, 2023 14:57
@alamb alamb marked this pull request as ready for review May 19, 2023 18:54
run_test_file(&path, relative_path).await?;
let test_files: Vec<_> = read_test_files(&options).collect();

// Run all tests in parallel, reporting failures at the end
Contributor

@comphead comphead May 19, 2023

Just wondering: could there be a race condition if multiple tests work with the same table t1 and the table is dropped by whichever test finishes first? The other tests could then fail.

Contributor Author

Each slt file runs with its own SessionContext so I think they should be isolated from one another.

I suppose if they all shared a temporary directory or something else that could be changed, that would be a problem.

Perhaps I can add some comments in various places explaining why it is important to keep the tests from having externally visible side effects, so they can be parallelized.
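To make the isolation argument concrete, here is a small sketch in which each "file" builds its own private state, so two files can both create and drop a table named t1 at the same time without interfering. The `Context` struct below is a hypothetical stand-in for a per-file session, not DataFusion's SessionContext, and `run_file` and `run_two_files` are names invented for this example.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a per-file session: a private table catalog.
#[derive(Default)]
struct Context {
    tables: HashMap<String, Vec<i64>>,
}

/// Simulates one test file: create `t1`, read it back, then drop it.
fn run_file(rows: Vec<i64>) -> Vec<i64> {
    let mut ctx = Context::default(); // fresh state, never shared
    ctx.tables.insert("t1".to_string(), rows);
    let seen = ctx.tables["t1"].clone();
    ctx.tables.remove("t1"); // dropping t1 here cannot affect other files
    seen
}

/// Runs two "files" concurrently; both use a table named `t1`.
fn run_two_files() -> (Vec<i64>, Vec<i64>) {
    let a = thread::spawn(|| run_file(vec![1, 2, 3]));
    let b = thread::spawn(|| run_file(vec![10, 20]));
    (a.join().unwrap(), b.join().unwrap())
}
```

The same reasoning stops applying as soon as files share something outside their own context, such as a common temporary directory, which is the caveat raised above.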

Contributor Author

Added in deb9a4e and 139b594.

@alamb
Contributor Author

alamb commented May 23, 2023

FYI @melgenek and @xudong963 -- I am not sure if you are interested in this PR, but if you have time to review it I would be most appreciative.

Contributor

@melgenek melgenek left a comment

This looks very neat. Thank you!

.join()
.unwrap();
}

Contributor

The main function above for Windows creates a single-threaded executor. I made it this way by mistake.
It should be tokio::runtime::Builder::new_multi_thread().

Contributor Author

Fixed in 19f145c 👍

Member

@xudong963 xudong963 left a comment

I didn't find time to implement this feature myself; thanks @alamb for making it happen!

@alamb alamb merged commit 3214450 into apache:main May 24, 2023
@alamb
Contributor Author

alamb commented May 24, 2023

Thanks everyone!

@alamb alamb deleted the alamb/sqllogictest_parallel branch May 24, 2023 21:47
Labels

core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Faster sqllogictest runs
Run all sqllogictest, even if there are test failures in between

4 participants