locking in Theta sketch buffer aggregator#7938
locking in Theta sketch buffer aggregator#7938AlexanderSaydakov wants to merge 1 commit intoapache:masterfrom
Conversation
himanshug
left a comment
There was a problem hiding this comment.
technically speaking, yes this locking is desired.
realistically speaking, BufferAggregator is not used by anyone while indexing and that is why it hasn't been a problem so far. so it is a problem but not that someone is really gonna encounter for now. At the same time I don't think locking in no/little contention is a major problem, so +1 for adding the locks.
noting some thoughts that popped up in my head....
striping is great but then we create 64 ReadWriteLock object array instead of single object per BufferAggregator and there are num_row_groupings * num_sketch_column of those . it may or may not be a problem, but only data can tell.
Even if BufferAggregator was used in indexing situation , it will be a single-writer-multiple-reader context, so striping might not be that beneficial which makes sense in ConcurrentHashMap due to expected multiple-writer situation. but again, only data can tell :)
| } | ||
|
|
||
| /** | ||
| * see https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/Striped.java#L536-L548 |
There was a problem hiding this comment.
maybe copy the comment instead , as that file on those line numbers might change.
| * see https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/Striped.java#L536-L548 | |
| * This method was written by Doug Lea with assistance from members of JCP JSR-166 Expert Group | |
| * and released to the public domain, as explained at | |
| * http://creativecommons.org/licenses/publicdomain | |
| * | |
| * As of 2010/06/11, this method is identical to the (package private) hash method in OpenJDK 7's | |
| * java.util.HashMap class. |
|
adding to previous comment... I would personally err on the side of choosing less heap usage over unlikely/potential performance increase and that choice would be "no striping" in this case. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
How do we want to proceed (if we do)? |
|
This issue is no longer marked as stale. |
| { | ||
|
|
||
| /** for locking per buffer position (power of 2 to make index computation faster) */ | ||
| private static final int NUM_STRIPES = 64; |
There was a problem hiding this comment.
Please make it at least Runtime.getRuntime().availableProcessors() * 2
| /** for locking per buffer position (power of 2 to make index computation faster) */ | ||
| private static final int NUM_STRIPES = 64; | ||
|
|
||
| public static Striped<ReadWriteLock> getReadWriteLock() |
There was a problem hiding this comment.
It's misleading that a method that returns a new Striped object (e. g. a factory method) starts with "get". It's not a getter.
| * @param position | ||
| * @return index | ||
| */ | ||
| public static int lockIndex(final int position) |
There was a problem hiding this comment.
lockIndex() is poorly linked to getReadWriteLock() and in general this construction is error-prone. I suggest adding a proper wrapper class with a Striped delegate and instance lockIndex() method.
| */ | ||
| public static int lockIndex(final int position) | ||
| { | ||
| return smear(position) % NUM_STRIPES; |
There was a problem hiding this comment.
If NUM_STRIPES will not be a compile-time constant, it's better to use & (NUM_STRIPES - 1) for performance
| } | ||
|
|
||
| /** | ||
| * This method uses locks because it can be used during indexing, |
There was a problem hiding this comment.
Please make proper sentences with punctuation.
|
|
||
| /** | ||
| * This method uses locks because it can be used during indexing, | ||
| * and Druid can call aggregate() and get() concurrently |
| private final TgtHllType tgtHllType; | ||
| private final int size; | ||
| private final IdentityHashMap<ByteBuffer, WritableMemory> memCache = new IdentityHashMap<>(); | ||
| private final IdentityHashMap<ByteBuffer, Int2ObjectMap<HllSketch>> sketchCache = new IdentityHashMap<>(); |
There was a problem hiding this comment.
It seems that this field is not really a "cache". The corresponding field in SketchBufferAggregator is called "unions". Maybe call it "sketches".
| final Lock lock = stripedLock.getAt(StripedLockHelper.lockIndex(position)).writeLock(); | ||
| lock.lock(); | ||
| try { | ||
| final Union union = Union.writableWrap(mem); |
There was a problem hiding this comment.
If this statement can update the memory, please add a comment like // Union.writableWrap(mem) can update the memory, therefore it must be inside the critical section.. Otherwise, please move the statement outside of the critical section and add a comment like // Union.writableWrap(mem) cannot update the memory, therefore it can be outside the critical section.
| final Lock lock = stripedLock.getAt(StripedLockHelper.lockIndex(position)).writeLock(); | ||
| lock.lock(); | ||
| try { | ||
| final ArrayOfDoublesUpdatableSketch sketch = ArrayOfDoublesSketches.wrapUpdatableSketch(region); |
There was a problem hiding this comment.
If this statement can update the memory, please add a comment like // ArrayOfDoublesSketches.wrapUpdatableSketch() can update the memory, therefore it must be inside the critical section.. Otherwise, please move the statement outside of the critical section and add a comment like // ArrayOfDoublesSketches.wrapUpdatableSketch() cannot update the memory, therefore it can be outside the critical section.`.
| final Lock lock = stripedLock.getAt(StripedLockHelper.lockIndex(position)).writeLock(); | ||
| lock.lock(); | ||
| try { | ||
| final ArrayOfDoublesUnion union = ArrayOfDoublesSketches.wrapUnion(region); |
There was a problem hiding this comment.
Same as in ArrayOfDoublesSketchBuildBufferAggregator.aggregate()
|
relooking at it after a while and a bit of fresh rethinking, TBH for now, any locking in |
|
Let us close this request If this functionality is not wanted. |
Added locking to theta buffer aggregator, factored out common locking code.
It seems to me that this locking was missing from the very beginning, but was not quite necessary so far.
As I understand it might become important in the future or is already needed in some particular use cases. This was discussed in several pull requests.
For instance:
#6581
#5002
#5148