Set DataFusion runtime configurations through SQL interface#15594
Conversation
3d743e5 to
bfb1d0e
Compare
bfb1d0e to
8d894be
Compare
|
Hello @kumarlokesh. Thank you for working on this. I have 2 questions/concerns. Let's discuss on them a bit to get a future-proof design
|
8d894be to
fa34e62
Compare
bae6d2f to
08d038c
Compare
@berkaysynnada Thank you for the feedback!
|
| let ctx = SessionContext::new(); | ||
|
|
||
| // Set memory limit to 100MB using SQL - note the quotes around the value | ||
| ctx.sql("SET datafusion.runtime.memory_limit = '100M'") |
There was a problem hiding this comment.
Running this test can't ensure this config is set, because it can run without setting the memory limit and use the default UnboundedMemoryPool, instead I think it should assert the spill count for a query that has spilled some intermediate data.
SET datafusion.runtime.memory_limit = '1M'
set datafusion.execution.sort_spill_reservation_bytes = 0;
select * from generate_series(1, 100000) as t1(v1) order by v1;
-- And assert spill-count from the query is > 0
You can check this
for how to assert the spill file count.Later after all configurations are added, I think we should make this test case stronger by setting more runtime configs, and do some property test to ensure all of them are properly set.
BTW, I tried the above test locally, and the generate_series UDTF seems not registered in the PR branch, but it works in the main branch.
There was a problem hiding this comment.
Running this test can't ensure this config is set, because it can run without setting the memory limit and use the default
UnboundedMemoryPool, instead I think it should assert the spill count for a query that has spilled some intermediate data.SET datafusion.runtime.memory_limit = '1M' set datafusion.execution.sort_spill_reservation_bytes = 0; select * from generate_series(1, 100000) as t1(v1) order by v1; -- And assert spill-count from the query is > 0You can check this
for how to assert the spill file count.
Later after all configurations are added, I think we should make this test case stronger by setting more runtime configs, and do some property test to ensure all of them are properly set.BTW, I tried the above test locally, and the
generate_seriesUDTF seems not registered in the PR branch, but it works in the main branch.
@2010YOUY01 @berkaysynnada addressed above in 6501f21.
|
|
||
| /// Parse memory limit from string to number of bytes | ||
| /// e.g. '1.5G', '100M' | ||
| fn parse_memory_limit(&self, limit: &str) -> Result<usize> { |
There was a problem hiding this comment.
Maybe we can open this function to external uses?
|
Thank you @kumarlokesh for addressing my comments. I don't have further suggestions or concern other than @2010YOUY01 shared |
08d038c to
6501f21
Compare
2010YOUY01
left a comment
There was a problem hiding this comment.
Thank you. I plan to leave this PR open for 2 days more before merging, in case anyone has additional comments.
|
|
||
| let mut state = self.state.write(); | ||
| let mut builder = | ||
| RuntimeEnvBuilder::from_runtime_env(state.runtime_env()); |
There was a problem hiding this comment.
This implementation is all cloning Arcs inside RuntimeEnv, I think it shouldn't have any issue.
|
Thanks -- I recommend we merge this after the 47.0.0 release is created (likely tomorrow) |
alamb
left a comment
There was a problem hiding this comment.
This looks really nice -- thank you @kumarlokesh and @2010YOUY01
|
We made |
…-configs-through-sql
|
I merged up from main to rerun CI to be sure all tests pass |
|
🚀 so exciting -- thank you |
…5594) * Set DataFusion runtime configurations through SQL interface * fix clippy warnings * use spill count based tests for checking applied memory limit --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
SET datafusion.runtime.memory_limit = '100M'.Are these changes tested?
Are there any user-facing changes?