refactor: CascadingReindexingTemplate Refactoring#19106
refactor: CascadingReindexingTemplate Refactoring#19106capistrant merged 10 commits intoapache:masterfrom
Conversation
| .maxRowsInMemory(getMaxRowsInMemory()) | ||
| .appendableIndexSpec(getAppendableIndexSpec()) | ||
| .maxBytesInMemory(getMaxBytesInMemory()) | ||
| .maxTotalRows(getMaxTotalRows()) |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note
| this.defaultPartitioningRule = ReindexingPartitioningRule.syntheticRule( | ||
| defaultSegmentGranularity, | ||
| defaultPartitionsSpec, | ||
| defaultPartitioningVirtualColumns | ||
| ); |
There was a problem hiding this comment.
should this check if it is set on tuningConfig? or forbid defining it on the tuningConfig?
There was a problem hiding this comment.
by it do you mean partitionsSpec? I had planned on documenting it as being blown away by whatever your
- applicable rule defines
- default top level partitions spec is if no rule applies
alternatives to the above:
- force a tuningConfig to be defined top level with at least a partitions spec and use that as the default
- instead of silently blowing away the static tuningConfig partitions spec, throw an error at submit time saying you can't define it there if someone tries
There was a problem hiding this comment.
by it do you mean partitionsSpec?
yes
I had planned on documenting it as being blown away ...
it is probably fine if we document that whatever you set there is ignored. Maybe perhaps a log.warn or something if it is set on the tuningconfig but a defaultPartitionsSpec nor rules are set?
There was a problem hiding this comment.
after reflecting, I think failing with clear error on trying to use unsupported field in the static tuningConfig is ok and right way to head off operator confusion
| List<VirtualColumn> allVCs = new ArrayList<>(deletionVCs); | ||
|
|
||
| if (partitioningVCs != null && !partitioningVCs.isEmpty()) { | ||
| // Check for name collisions |
There was a problem hiding this comment.
VirtualColumns.create does a check like this too, though i guess you lose your more specific exception messaging here. I suppose you could catch the IAE thrown by that and make the nicer message that way, or leave a comment about why validation is being done externally here
There was a problem hiding this comment.
huh, I hadn't realized that create could handle this for me. I like delegating to that and enriching the message if thrown
There was a problem hiding this comment.
lgtm 👍
I have some minor concern that there might be a lot of duplication managing virtual columns separately for partitioning and deletion in cases where there is deletion on the same columns used for partitioning, but i'm ok if we see how this works in practice and if we need to revisit it because of lots of redundancy or difficulty in managing things then we can consider ways to consolidate the definitions
| this.defaultPartitioningRule = ReindexingPartitioningRule.syntheticRule( | ||
| defaultSegmentGranularity, | ||
| defaultPartitionsSpec, | ||
| defaultPartitioningVirtualColumns | ||
| ); |
There was a problem hiding this comment.
by it do you mean partitionsSpec?
yes
I had planned on documenting it as being blown away ...
it is probably fine if we document that whatever you set there is ignored. Maybe perhaps a log.warn or something if it is set on the tuningconfig but a defaultPartitionsSpec nor rules are set?
…titionsSpec in tuning config is submitted
Description
relates to #19092
After taking the initial implementation of the CascadingReindexingTemplate and associated RuleProvider concepts for a spin, some refactoring opportunities have been identified. I hope that these will make more conceptual sense to operators trying to configure this as well as make things generally easier to reason about.
Rule Type Refactoring
ReindexingSegmentGranularityRuleandReindexingTuningConfigRulehave been deleted.ReindexingPartitioningRule
This new rule type consolidates partitioning details into a single rule. This rule takes the place of
ReindexingSegmentGranularityRulein that it has special consideration for creating the base timeline of reindexing search intervals and that all search intervals created align to natural Segment Granularity boundaries being applied.The components of this rule are:
ReindexingIndexSpecRule
A new rule type that is scoped to the
IndexSpecthat goes in a task tuning config.CascadingReindexingTemplatenew top level fieldsI have considered combining the below 2 + defaultSegmentGranularity into a defaultPartitioningSomething top level field but did not do this as of now.
defaultPartitionsSpecRequired partitionsspec that will be applied in cases where a synthetic
ReindexingPartitioningRuleneeds to be generated to complete the reindexing timelinedefaultPartitioningVirtualColumnsOptional default partitioning virtual column expressions that would be used if your default partitionspec utilized the new functionality in #19061
tuningConfigCascadingReindexingTemplatenow has an optional top leveltuningConfigfield where an operator can provide aUserCompactionTaskQueryTuningConfigthat will apply to all tasks generated by the supervisor. Note that any partitions spec here is useless and will be blown away by either the defaultPartitionsSpec or the applicable ReindexingPartitioningRule's partition spec. IndexSpec however can take a default here that will apply in any case where there is not an IndexSpecRule that applies to the task being generatedRelease note
N/A for now but we will have a larger release note I can share encompassing all of the new reindexing rule provider stuff for druid 37.
-->
Key changed/added classes in this PR
CascadingReindexingTemplateReindexingConfigBuilderReindexingDeletionRuleOptimizerReindexingPartitioningRuleReindexingIndexSpecRuleThis PR has: