Motivation
In auto compaction, the coordinator searches for segments to compact based on their byte size. This algorithm is currently stateless: at each coordinator run, it traverses all segments of all datasources from the latest to the oldest, compares their size against targetCompactionSizeBytes (this check is currently missing, which causes #8481), and issues a compaction task if it finds segments smaller than targetCompactionSizeBytes.
However, comparing the segment size against targetCompactionSizeBytes alone is not enough to tell whether a given segment requires further compaction, because the segment could have been created with various types of partitionsSpec and later compacted with one of them. (As of now, auto compaction supports only maxRowsPerSegment, maxTotalRows, and targetCompactionSizeBytes, but it should support all partitionsSpec types in the future.)
As of now, we have 3 partitionsSpec types: DynamicPartitionsSpec, HashedPartitionsSpec, and SingleDimensionPartitionsSpec.
- DynamicPartitionsSpec has maxRowsPerSegment and maxTotalRows.
- HashedPartitionsSpec has targetPartitionSize (target number of rows per segment), numShards, and partitionDimensions.
- SingleDimensionPartitionsSpec has targetPartitionSize (target number of rows per segment), maxPartitionSize (max number of rows per segment), and partitionDimensions.
In the coordinator, most of these configurations are not easy to use to search for segments which need compaction, because the number of rows in each segment is not available in the coordinator. And even if we had that in the coordinator, hash- or range-partitioned segments could have a totally different number of rows from targetPartitionSize, which makes it hard to tell whether a given segment needs compaction. Note that a segment doesn't need further compaction if it was already compacted with the same partitionsSpec. Auto compaction based on parallel indexing can also complicate things: in parallel indexing with DynamicPartitionsSpec, the last segment created by each task can have fewer rows than maxRowsPerSegment.
Proposed changes
To address this issue, I propose to store the state of auto compaction in the metadata store, so that auto compaction can search for compaction candidates from the segments which are not compacted yet.
DataSegment change
A new compactionPartitionsSpec field will be added to DataSegment. compactionPartitionsSpec will be populated only in the coordinator.
```java
public class DataSegment
{
  private final Integer binaryVersion;
  private final SegmentId id;
  @Nullable
  private final Map<String, Object> loadSpec;
  private final List<String> dimensions;
  private final List<String> metrics;
  private final ShardSpec shardSpec;
  @Nullable
  private final PartitionsSpec compactionPartitionsSpec;
  private final long size;
}
```
Changes in publishing segments
When a compaction task creates segments, it adds its partitionsSpec to them (in Appenderator.add()). Other task types don't add it.
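The publish-time behavior can be sketched as below. This is a minimal, self-contained illustration, not Druid's actual Appenderator API; SegmentMeta and publish() are hypothetical stand-ins.

```java
// Hypothetical sketch: only the compaction task carries its partitionsSpec
// through to the published segment; other task types leave the field null,
// which later marks the segment as a compaction candidate for the coordinator.
public class PublishSketch
{
  // Stand-in for the segment metadata written to the metadata store.
  record SegmentMeta(String id, String compactionPartitionsSpec) {}

  static SegmentMeta publish(String segmentId, boolean isCompactionTask, String taskPartitionsSpec)
  {
    // Non-compaction tasks publish segments with a null compactionPartitionsSpec.
    return new SegmentMeta(segmentId, isCompactionTask ? taskPartitionsSpec : null);
  }

  public static void main(String[] args)
  {
    System.out.println(publish("seg1", true, "dynamic").compactionPartitionsSpec());  // prints "dynamic"
    System.out.println(publish("seg2", false, "dynamic").compactionPartitionsSpec()); // prints "null"
  }
}
```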
Changes in the coordinator
SQLMetadataSegmentManager loads DataSegments and keeps them in memory, as it does now. Since PartitionsSpec values are not very diverse in general, interning them would be useful to reduce memory usage.
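The interning idea can be sketched with a canonicalizing map. DynamicSpec and SpecInterner below are hypothetical stand-ins for the real PartitionsSpec types; in practice, equal specs loaded for many segments would all point at one canonical instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal stand-in for one PartitionsSpec type; records give us
// the value-based equals/hashCode that interning relies on.
record DynamicSpec(int maxRowsPerSegment, long maxTotalRows) {}

class SpecInterner
{
  private final Map<DynamicSpec, DynamicSpec> canonical = new ConcurrentHashMap<>();

  // Returns the canonical instance for any spec equal to a previously seen one.
  DynamicSpec intern(DynamicSpec spec)
  {
    return canonical.computeIfAbsent(spec, s -> s);
  }
}

public class InternExample
{
  public static void main(String[] args)
  {
    SpecInterner interner = new SpecInterner();
    DynamicSpec a = interner.intern(new DynamicSpec(5_000_000, 20_000_000L));
    DynamicSpec b = interner.intern(new DynamicSpec(5_000_000, 20_000_000L));
    // Equal specs resolve to the same canonical object, so memory is shared.
    System.out.println(a == b); // prints "true"
  }
}
```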
DruidCoordinatorSegmentCompactor checks whether the compactionPartitionsSpec is available for a given segment. If it's missing, the segment must be a new segment created by a non-compaction task, so it is a compaction candidate. If it's present, the coordinator compares the compactionPartitionsSpec with the partitionsSpec in the auto compaction configuration; the segment is a compaction candidate if the two differ.
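The candidate check described above reduces to a small predicate. The sketch below uses plain Objects for the specs rather than Druid's PartitionsSpec interface, and needsCompaction is a hypothetical name, not an actual coordinator method.

```java
import java.util.Objects;

public class CompactionCandidateCheck
{
  /**
   * A segment is a compaction candidate when it carries no
   * compactionPartitionsSpec (it was written by a non-compaction task), or
   * when the spec it was compacted with differs from the configured one.
   */
  static boolean needsCompaction(Object compactionPartitionsSpec, Object configuredSpec)
  {
    return compactionPartitionsSpec == null
           || !Objects.equals(compactionPartitionsSpec, configuredSpec);
  }

  public static void main(String[] args)
  {
    String configured = "dynamic(maxRowsPerSegment=5000000)";
    System.out.println(needsCompaction(null, configured));                   // prints "true": fresh segment
    System.out.println(needsCompaction("hashed(numShards=10)", configured)); // prints "true": different spec
    System.out.println(needsCompaction(configured, configured));             // prints "false": already compacted
  }
}
```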
Rationale
Dropped alternative
The coordinator could get the number of rows of each segment from the system schema to find compaction candidates. However, as mentioned in the motivation section, the number of rows alone is not enough to determine whether a given segment needs compaction.
Operational impact
There should be no operational impact.
Test plan
- Will add unit tests
- Will test in our internal cluster
Future work
In DataSegment, similar to compactionPartitionsSpec, there are a couple of fields which are loaded only in particular places and are null or some default value elsewhere. But even when they are null or default, each still costs 8 more bytes per segment. It would probably be worth splitting DataSegment into several classes, so that only the necessary fields are loaded and memory usage is reduced.