You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
From IncrementalIndex generally overestimates theta sketch size #6743
"Theta sketches have a very large max size by default, relative to typical row sizes (about 250KB with "size" set to the default of 16384). The ingestion-time row size estimator (getMaxBytesPerRowForAggregators in OnheapIncrementalIndex) uses this figure to estimate row sizes when theta sketches are used at ingestion time, leading to way more spills than is reasonable. It would be better to use an estimate based more on actual current size. I'm not sure how to get this, though."
From [Proposal] Resizable Buffer in BufferAggregator #2963
"In case of complex aggregators like thetaSketch, more than 80% of the time sketches don't grow to full capacity but query processing still reserves the full max size. The idea is to support use of a resizable aggregator by BufferAggregator so that maximum space is not reserved but BufferAggregator should be able to re-allocate the buffer on demand."
Some sketches e.g. doublesSketch do not have an upper bound on size and can't provide a correct number for AggregatorFactory.getMaxIntermediateSize() . Current workaround, they use, is to use some number and then fall back to on-heap objects if sketch grows bigger than that.
Proposed changes
Add following methods to AggregatorFactory
/**
* Does BufferAggregator support handling of varying ByteBuffer sizes by overriding
* {@link BufferAggregator#aggregate(ByteBuffer, int, int)}
* @return
*/
public boolean isDynamicallyResizable()
{
return getMinIntermediateSize() < getMaxIntermediateSize();
}
/**
* Start size of ByteBuffer to be used with BufferAggregator.
* @return
*/
public int getMinIntermediateSize()
{
return getMaxIntermediateSize();
}
Add following methods to BufferAggregator
/**
* Returns false if aggregation requires a bigger buffer than capacity arg or true.
* Return status must be used exclusively to signal "low memory in buffer" condition and
* nothing else.
*/
default boolean aggregate(ByteBuffer buff, int position, int capacity)
{
aggregate(buff, position);
return true;
}
// Following methods are equivalent of old methods with same name except they provide access to capacity
// of ByteBuffer which is assumed to be AggregatorFactory.getMaxIntermediateSize() by older methods.
default void init(ByteBuffer buff, int position, int capacity)
{
init(buff, position);
}
default void relocate(ByteBuffer oldBuff, int oldPosition, int oldCapacity, ByteBuffer newwBuff, int newwPosition, int newwCapacity)
{
relocate(oldPosition, newwPosition, oldBuff, newwBuff);
}
default Object get(ByteBuffer buff, int position, int capacity)
{
return get(buff, position);
}
default float getFloat(ByteBuffer buff, int position, int capacity)
{
return getFloat(buff, position);
}
default double getDouble(ByteBuffer buff, int position, int capacity)
{
return getDouble(buff, position);
}
default long getLong(ByteBuffer buff, int position, int capacity)
{
return getLong(buff, position);
}
default boolean isNull(ByteBuffer buff, int position, int capacity)
{
return isNull(buff, position);
}
Update all of Druid core code to remove usage of Aggregator and use BufferAggregator in all places using the newly introduced methods. For example, see the changes made to OnheapIncrementalIndex in #8127 . Once that is done in all places, Aggregator interface and AggregatorFactory.factorize(ColumnSelectorFactory) and AggregatorFactory.getMaxIntermediateSize() can be removed.
At least BufferAggregator that work on top of sketches of variable sizes should be updated to implement newly introduced methods. In those extensions, AggregatorFactory.getMinIntermediateSize() should return a value that can be overridden by user per query/indexing to allow fine tuning.
Following interfaces and classes are introduced to aid usage of new methods in aggregator extension.
public interface MemoryAllocator
{
BufferHolder allocate(int capacity);
void free(BufferHolder bh);
}
public interface BufferHolder
{
int position();
int capacity();
ByteBuffer get();
}
public class SimpleOnHeapMemoryAllocator implements MemoryAllocator
{
@Override
public BufferHolder allocate(int capacity)
{
return new SimplerBufferHolder(ByteBuffer.allocate(capacity));
}
@Override
public void free(BufferHolder ignored)
{
}
// This would be used in OnHeapIncrementalIndex
private static class SimplerBufferHolder implements BufferHolder
{
private final ByteBuffer bb;
public SimplerBufferHolder(ByteBuffer bb)
{
this.bb = bb;
}
@Override
public int position()
{
return 0;
}
@Override
public int capacity()
{
return bb.capacity();
}
@Override
public ByteBuffer get()
{
return bb;
}
}
}
I think MemoryAllocator interface might change a bit when we write code to use it in off-heap use cases in querying.
With these changes we can possibly use off-heap memory for aggregators in IncrementalIndex too.
This would also enable removal of v1 groupBy implementation that is kept around due to its usage of Aggregator that allowed onheap growable sketches.
Motivation
From IncrementalIndex generally overestimates theta sketch size #6743
"Theta sketches have a very large max size by default, relative to typical row sizes (about 250KB with "size" set to the default of 16384). The ingestion-time row size estimator (getMaxBytesPerRowForAggregators in OnheapIncrementalIndex) uses this figure to estimate row sizes when theta sketches are used at ingestion time, leading to way more spills than is reasonable. It would be better to use an estimate based more on actual current size. I'm not sure how to get this, though."
From [Proposal] Resizable Buffer in BufferAggregator #2963
"In case of complex aggregators like thetaSketch, more than 80% of the time sketches don't grow to full capacity but query processing still reserves the full max size. The idea is to support use of a resizable aggregator by BufferAggregator so that maximum space is not reserved but BufferAggregator should be able to re-allocate the buffer on demand."
Some sketches e.g.
doublesSketchdo not have an upper bound on size and can't provide a correct number forAggregatorFactory.getMaxIntermediateSize(). Current workaround, they use, is to use some number and then fall back to on-heap objects if sketch grows bigger than that.Proposed changes
Add following methods to
AggregatorFactoryAdd following methods to
BufferAggregatorUpdate all of Druid core code to remove usage of
Aggregatorand useBufferAggregatorin all places using the newly introduced methods. For example, see the changes made toOnheapIncrementalIndexin #8127 . Once that is done in all places,Aggregatorinterface andAggregatorFactory.factorize(ColumnSelectorFactory)andAggregatorFactory.getMaxIntermediateSize()can be removed.At least
BufferAggregatorthat work on top of sketches of variable sizes should be updated to implement newly introduced methods. In those extensions,AggregatorFactory.getMinIntermediateSize()should return a value that can be overridden by user per query/indexing to allow fine tuning.Following interfaces and classes are introduced to aid usage of new methods in aggregator extension.
Rationale
#2963
Operational impact
None
Test plan (optional)
Unit Tests should cover the changes.
Future Work
I think
MemoryAllocatorinterface might change a bit when we write code to use it in off-heap use cases in querying.With these changes we can possibly use off-heap memory for aggregators in IncrementalIndex too.
This would also enable removal of v1 groupBy implementation that is kept around due to its usage of
Aggregatorthat allowed onheap growable sketches.