Conversation
Rename partition spec fields to be consistent across the various types (hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment and maxRowsPerSegment in favor of targetPartitionSize and maxSegmentSize. Consistent and clearer names are easier for users to understand and use. Also fix various IntelliJ inspection warnings and doc spelling mistakes.
!PartitionsSpec.isEffectivelyNull(targetPartitionSize) || !PartitionsSpec.isEffectivelyNull(maxRowsPerSegment),
"Either targetPartitionSize or maxRowsPerSegment must be specified"
(target.value == null) != (max.value == null),
"Exactly one of " + target.name + " or " + max.name + " must be present"
You can also use StringUtils here, like
StringUtils.format("Exactly one of %s or %s must be present", target.name, max.name)
What is the advantage of using StringUtils.format()? This string does not need anything locale-specific.
That's the style I have mostly seen used in the codebase, and it looks better in my opinion.
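To make the two options in this thread concrete, here is a minimal sketch of the "exactly one of" check using a formatted message. It is not the PR's code: the `Property` holder and `checkExactlyOne` are hypothetical names, and `String.format` with an explicit `Locale.ENGLISH` stands in for the codebase's `StringUtils.format` on the assumption that the helper's main purpose is to pin the locale.

```java
import java.util.Locale;

final class ExactlyOneCheck {
    /** Hypothetical holder pairing a property name with its nullable value. */
    static final class Property {
        final String name;
        final Integer value;

        Property(String name, Integer value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Throws unless exactly one of the two properties has a non-null value. */
    static void checkExactlyOne(Property target, Property max) {
        // Comparing the two null checks with != acts as XOR:
        // true only when one value is set and the other is not.
        boolean exactlyOne = (target.value == null) != (max.value == null);
        if (!exactlyOne) {
            // Explicit locale: a stand-in (assumption) for StringUtils.format.
            throw new IllegalArgumentException(
                String.format(
                    Locale.ENGLISH,
                    "Exactly one of %s or %s must be present",
                    target.name,
                    max.name));
        }
    }
}
```

For a message with only `%s` placeholders the formatted and concatenated versions produce the same text, so the choice here is mostly about matching the prevailing style.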
|maxPartitionSize|Maximum number of rows to include in a partition. Defaults to 50% larger than the targetPartitionSize.|no|
|targetRowsPerSegment|Target number of rows to include in a partition, should be a number that targets segments of 500MB\~1GB.|yes|
|targetPartitionSize|Deprecated. Use `targetRowsPerSegment` instead. Target number of rows to include in a partition, should be a number that targets segments of 500MB\~1GB.|no|
|maxRowsPerSegment|Maximum number of rows to include in a partition. Defaults to 50% larger than the `targetPartitionSize`.|no|
I think the docs here could be clearer that the *PartitionSize and *RowsPerSegment parameters are equivalent, just with different names.
I'll explicitly call out the rename/equivalence.
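One way to picture the equivalence discussed above is a small resolver that treats the deprecated name as an alias for the new one and applies the documented "50% larger than the target" default. This is a sketch under assumptions, not the PR's implementation: `SingleDimSizes`, `resolveAlias`, and `defaultMaxRows` are hypothetical names, and rejecting a spec that sets both names is an assumed policy.

```java
final class SingleDimSizes {
    /**
     * Returns whichever of the deprecated and new values is set.
     * Assumption: setting both at once is treated as an error.
     */
    static Integer resolveAlias(Integer deprecatedValue, Integer newValue) {
        if (deprecatedValue != null && newValue != null) {
            throw new IllegalArgumentException(
                "Specify only one of the deprecated and the new property");
        }
        return newValue != null ? newValue : deprecatedValue;
    }

    /**
     * Default maximum: 50% larger than the target, as the docs describe.
     * Integer division truncates for odd targets; addExact guards overflow.
     */
    static int defaultMaxRows(int targetRows) {
        return Math.addExact(targetRows, targetRows / 2);
    }
}
```

With a target of 5,000,000 rows, `defaultMaxRows` yields 7,500,000, whether the target arrived as `targetPartitionSize` or `targetRowsPerSegment`.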
public HashedPartitionsSpec(
    @JsonProperty("targetPartitionSize") @Deprecated @Nullable Integer targetPartitionSize,
    @JsonProperty("maxRowsPerSegment") @Nullable Integer maxRowsPerSegment,
    @JsonProperty(PartitionsSpec.MAX_ROWS_PER_SEGMENT) @Nullable Integer maxRowsPerSegment,
For consistency with the single dim spec, it seems like it would be better to call this targetRowsPerSegment to match targetPartitionSize, but that would mean deprecating yet another property
Hm, or it may be better to have a different rename for maxPartitionSize in the single dim spec, since targetPartitionSize and maxRowsPerSegment were equivalent before.
You mean to rename "targetPartitionSize" in HashedPartitionsSpec to "targetRowsPerSegment"? My understanding is that hashed partitions do not have a "target" size as it is really a "max" size (which is why the former is deprecated).
@jihoonson described to me that when #8141 did some refactoring, the behavior of "targetPartitionSize" in SingleDimensionPartitionsSpec was wrongly made to match that of "maxRowsPerSegment". This PR separates the two so that single dimension partitioning can be made to honor a "target" size.
Ah, got it, thanks for clarifying
You mean to rename "targetPartitionSize" in HashedPartitionsSpec to "targetRowsPerSegment"? My understanding is that hashed partitions do not have a "target" size as it is really a "max" size (which is why the former is deprecated).
Ah sorry, there was probably some misunderstanding. targetPartitionSize in HashedPartitionsSpec (which I deprecated in favor of maxRowsPerSegment) is actually the target number of rows per segment. You can see how to compute numShards based on targetPartitionSize here. So, IMO, targetRowsPerSegment makes the most sense for it too.
Got it. I'll update HashedPartitionsSpec to deprecate maxRowsPerSegment and to add targetRowsPerSegment.
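The point about deriving numShards from a target row count can be illustrated with a short sketch. It assumes the shard count is the total row count divided by the target, rounded up; the class and method names are hypothetical, and the linked code may differ in detail.

```java
final class ShardCountSketch {
    /**
     * Number of hash shards needed so that the average shard holds
     * at most targetRowsPerSegment rows (assumed formula: ceiling division).
     */
    static int numShards(long numRows, int targetRowsPerSegment) {
        if (targetRowsPerSegment <= 0) {
            throw new IllegalArgumentException("targetRowsPerSegment must be positive");
        }
        return (int) Math.ceil((double) numRows / targetRowsPerSegment);
    }
}
```

For 12,000,000 rows and a target of 5,000,000, this gives 3 shards, or about 4,000,000 rows each on average. Because rows are assigned by hash, an individual shard can still land above the configured number, which fits reading the value as a target rather than a strict maximum.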
Description
Rename partition spec fields to be consistent across the various types (hashed, single_dim, dynamic). Specifically, use `targetNumRowsPerSegment` and `maxRowsPerSegment` instead of `targetPartitionSize` and `maxSegmentSize`. Consistent and clearer names are easier for users to understand and use.

Also fix various IntelliJ inspection warnings and doc spelling mistakes.
This PR has: