Cancellation benchmark #14818
Let's see if this works too... /benchmark
Connects to apache#14036. This benchmark loads multiple files into an in-memory object store, starts a datafusion query in a new tokio runtime, lets the query run for an amount of time, cancels the query, and measures how long it takes to drop the tokio runtime. This demonstrates datafusion is likely not yielding often enough to allow for timely query cancellation and freeing up of all resources.
It doesn't, sadly. We removed this due to potential security concerns.
OK, I'll remove the documentation about it in my benchmark docs PR.
Thanks @carols10cents
I tested this PR with
./benchmarks/bench.sh run cancellation
And it looked like this (👍):
benchmarks/data/cancellation -o /Users/andrewlamb/Software/datafusion/benchmarks/results/cancellation-test-case/cancellation.json`
No data files found, generating (this will take a bit)
Generating file 1 of 7
Generating file 2 of 7
Generating file 3 of 7
Generating file 4 of 7
Generating file 5 of 7
Generating file 6 of 7
Generating file 7 of 7
Done generating files
Using 7 files now on disk
Starting to load data into in-memory object store
Done loading data into in-memory object store
in main, sleeping
Starting spawned
Creating logical plan...
Creating physical plan...
Executing physical plan...
Getting results...
cancelling thread
done dropping runtime in 32.72425ms
Iteration 0 cancelled in 32.724250000000005 ms
in main, sleeping
...
I made a small comment suggestion, but I think we can do it as a follow-on PR.
/// Test performance of cancelling queries
///
/// The queries are executed on a synthetic dataset generated during
Suggested change:
- /// The queries are executed on a synthetic dataset generated during
+ /// Queries in DataFusion should stop executing "quickly" after they are
+ /// cancelled (the output stream is dropped).
+ ///
+ /// The queries are executed on a synthetic dataset generated during
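To illustrate what "cancelled (the output stream is dropped)" means, here is a minimal std-only sketch. It is an analogy, not DataFusion code: a bounded `std::sync::mpsc` channel stands in for the output stream, a plain thread stands in for the query, and the function name `cancel_by_dropping_receiver` is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Reads `batches_to_read` batches, then "cancels" by dropping the receiver.
/// Returns (batches received, batches successfully sent, time the producer
/// took to stop after the drop).
fn cancel_by_dropping_receiver(batches_to_read: usize) -> (usize, u64, Duration) {
    // Bounded channel (assumption for this sketch) so the producer cannot
    // run far ahead of the consumer.
    let (tx, rx) = mpsc::sync_channel::<u64>(1);

    let producer = thread::spawn(move || {
        let mut sent = 0u64;
        // `send` returns Err once the receiving side has been dropped,
        // which is the producer's only cancellation signal here.
        while tx.send(sent).is_ok() {
            sent += 1;
        }
        sent
    });

    // Consume a few batches, as a client reading query results would.
    let received = rx.iter().take(batches_to_read).count();

    // Cancel: drop the "output stream" and time how long the producer
    // needs to notice and exit.
    let start = Instant::now();
    drop(rx);
    let sent = producer.join().expect("producer panicked");
    (received, sent, start.elapsed())
}
```

In this sketch the producer notices the drop on its next `send`, so it stops almost immediately. A producer that does a lot of work between sends would take correspondingly longer, which is the kind of delay the benchmark is meant to expose.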
Thanks again @carols10cents
Which issue does this PR close?
Rationale for this change
The behavior observed in #14036 was hard to reproduce and quantify; having a benchmark makes that easier!
What changes are included in this PR?
This benchmark loads multiple files into an in-memory object store, starts a DataFusion query in a new Tokio runtime, lets the query run for an amount of time, cancels the query, and measures how long it takes to drop the Tokio runtime.
This demonstrates that DataFusion is likely not yielding often enough to allow for timely query cancellation and freeing up of all resources.
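The measurement idea can be sketched without any dependencies. This is not the benchmark's code: it assumes a plain thread and an `AtomicBool` in place of the Tokio runtime and the query, and the names `measure_cancel_latency`, `check_every`, `unit`, and `run_for` are hypothetical. It only shows why work that rarely checks for cancellation takes longer to stop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Starts a worker that performs `unit`-long steps and looks at the
/// cancellation flag only once every `check_every` steps. Lets it run for
/// `run_for`, then cancels and returns how long the worker took to stop.
fn measure_cancel_latency(check_every: u64, unit: Duration, run_for: Duration) -> Duration {
    let cancelled = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancelled);

    let worker = thread::spawn(move || {
        let mut steps: u64 = 0;
        loop {
            // One unit of simulated work (stand-in for processing a batch).
            thread::sleep(unit);
            steps += 1;
            // The worker can only stop at these "yield points".
            if steps % check_every == 0 && flag.load(Ordering::Relaxed) {
                return steps;
            }
        }
    });

    // Let the "query" run for a while before cancelling it.
    thread::sleep(run_for);

    // Cancel and measure the time until the worker has actually stopped,
    // analogous to timing how long dropping the runtime takes.
    let start = Instant::now();
    cancelled.store(true, Ordering::Relaxed);
    worker.join().expect("worker panicked");
    start.elapsed()
}
```

Calling it with `check_every = 1` and then with a large `check_every` (same `unit` and `run_for`) should show the second call taking noticeably longer to stop, since the worker has to finish many more steps before it next looks at the flag.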
Are these changes tested?
This PR is only tests :)
Are there any user-facing changes?
Nope!