-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Structify ConfigOptions (#4517) #4771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| fn enable_page_index(&self, config_options: &ConfigOptions) -> bool { | ||
| self.enable_page_index | ||
| .or_else(|| config_options.get_bool(OPT_PARQUET_ENABLE_PAGE_INDEX)) | ||
| // default to false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer have implicit defaults all over the place 😄
| .finish() | ||
| /// A type-safe container for [`ConfigExtension`] | ||
| #[derive(Debug, Default, Clone)] | ||
| pub struct Extensions(BTreeMap<&'static str, ExtensionBox>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formulation is specifically designed to be able to replace the AnyMap currently on SessionConfig. As an added bonus it no longer requires types to be interior-mutable
| pub round_robin_repartition: bool, default = true | ||
|
|
||
| /// Parquet options | ||
| pub parquet: ParquetOptions, default = Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This nesting is a source of non-trivial additional complexity, but I made it work so at least we have support for this in the future.
| } | ||
| } | ||
|
|
||
| impl From<ConfigOptions> for SessionConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lays the ground-word for eventually removing SessionConfig altogether #3887
| self.config_options().get(variable) | ||
| // TOOD: Move ConfigOptions into common crate | ||
| match variable { | ||
| "datafusion.execution.time_zone" => self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nasty hack, I hope to move ConfigOptions into core in a subsequent PR so that it can be used directly instead of via ContextProvider
| ---- | ||
| datafusion.catalog.create_default_catalog_and_schema true | ||
| datafusion.catalog.format NULL | ||
| datafusion.catalog.has_header false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option was missing from the definitions
| datafusion.catalog.has_header false | ||
| datafusion.catalog.information_schema true | ||
| datafusion.catalog.location NULL | ||
| datafusion.catalog.type NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type is a reserved keyword in rust, and the name was pretty overloaded anyway, so I opted to rename this to format. Amusingly this was already what the local variables were named as
| let config = SessionConfig::new() | ||
| .with_batch_size(TEST_BATCH_SIZE) | ||
| .set_u64( | ||
| "datafusion.execution.coalesce_target_batch_size", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missed in #4757 the new config returns an error for invalid keys and so caught this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two were added in #4757. They're no longer necessary and so I just opted to remove them
| }; | ||
| let mut config = ConfigOptions::new(); | ||
| for (k, v) in task_props { | ||
| let _ = config.set(&k, &v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shows deserializing from a HashMap, as an added bonus this will now work for all keys and not just a specific subset
Longer-term we probably want to change this signature to accept ConfigOptions directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to #4758
| self.set(key, ScalarValue::Utf8(Some(value.into()))) | ||
| } | ||
| /// Returns the [`ConfigEntry`] stored within this [`ConfigOptions`] | ||
| pub fn entries(&self) -> Vec<ConfigEntry> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the mechanism by which ConfigOptions can be both serialized and documentation generated
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how the options are now typed / type checked at compile time rather than runtime.
benchmarks/src/bin/h2o.rs
Outdated
| let path = opt.path.to_str().unwrap(); | ||
| let config = SessionConfig::from_env().with_batch_size(65535); | ||
| let mut config = ConfigOptions::from_env()?; | ||
| config.built_in.execution.batch_size = 65535; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is kind of cool that it is a struct format that is checked by the compiler
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it
The created structure works well in my editor.
Thank you @tustvold . This is the one missing
It think we should leave this open at least a full work day after approval (aka don't merge until next Monday)
| default_schema: String, | ||
| /// Configuration options | ||
| config_options: ConfigOptions, | ||
| options: ConfigOptions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👎 for the change to the more overloaded options name, even though I realize it is more concise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fustfmt was being obtuse and formatting things over 7 lines and this made it happy. SessionConfig isn't long for this world so I think it is fine
| /// replaced by [`config_options`]. | ||
| /// | ||
| /// [`config_options`]: SessionContext::config_option | ||
| pub fn to_props(&self) -> HashMap<String, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be related to #4758 from @yahoNanJing
| }; | ||
| let mut config = ConfigOptions::new(); | ||
| for (k, v) in task_props { | ||
| let _ = config.set(&k, &v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to #4758
| | datafusion.optimizer.repartition_windows | Boolean | true | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level | | ||
| | datafusion.optimizer.skip_failed_rules | Boolean | true | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail. | | ||
| | datafusion.optimizer.top_down_join_key_reordering | Boolean | true | When set to true, the physical plan optimizer will run a top down process to reorder the join keys. Defaults to true | | ||
| | key | default | description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The major difference here seems to be the type column is no longer present. Was that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the fields are now Rust typed not datatype. I could include this, but it seemed unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably clear from the names if something is boolean or numeric. I don't feel strongly
| use std::collections::BTreeMap; | ||
| use std::fmt::Display; | ||
|
|
||
| macro_rules! config_namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I think some introduction doc comments here would help -- specifically some examples of what this macro is creating would be nice
|
I plan to merge this tomorrow morning (GMT) unless anyone needs more time to review |
|
Benchmark runs are scheduled for baseline = 6c81c10 and contender = 21169b9. 21169b9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |

Which issue does this PR close?
Closes #4517
Closes #3500
Closes #4752
Relates to #3887
Rationale for this change
The current approach is:
What changes are included in this PR?
This reworks ConfigOptions to instead be a struct, using a declarative macro to auto-generate serialization logic
Are these changes tested?
Are there any user-facing changes?
Yes