locking in Theta sketch buffer aggregator by AlexanderSaydakov · Pull Request #7938 · apache/druid

AlexanderSaydakov · 2019-06-21T01:21:21Z

Added locking to theta buffer aggregator, factored out common locking code.

It seems to me that this locking was missing from the very beginning, but was not quite necessary so far.
As I understand it, this might become important in the future, or may already be needed in some particular use cases. This was discussed in several pull requests.
For instance:
#6581
#5002
#5148

… code

himanshug

Technically speaking, yes, this locking is desired.
Realistically speaking, BufferAggregator is not used by anyone while indexing, and that is why it hasn't been a problem so far. So it is a problem, but not one that anyone is really going to encounter for now. At the same time, I don't think locking under no or little contention is a major problem, so +1 for adding the locks.

Noting some thoughts that popped up in my head...
Striping is great, but then we create an array of 64 ReadWriteLock objects instead of a single object per BufferAggregator, and there are num_row_groupings * num_sketch_column of those. It may or may not be a problem, but only data can tell.
Even if BufferAggregator were used in an indexing situation, it would be a single-writer-multiple-reader context, so striping might not be that beneficial. Striping makes sense in ConcurrentHashMap because of the expected multiple-writer situation. But again, only data can tell :)

himanshug · 2019-06-21T07:02:57Z

+  }
+
+  /**
+   * see https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/Striped.java#L536-L548


Maybe copy the comment instead, as the content of that file at those line numbers might change.

Suggested change

-   * see https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/Striped.java#L536-L548
+   * This method was written by Doug Lea with assistance from members of JCP JSR-166 Expert Group
+   * and released to the public domain, as explained at
+   * http://creativecommons.org/licenses/publicdomain
+   *
+   * As of 2010/06/11, this method is identical to the (package private) hash method in OpenJDK 7's
+   * java.util.HashMap class.

himanshug · 2019-06-21T07:31:43Z

adding to previous comment... I would personally err on the side of choosing less heap usage over unlikely/potential performance increase and that choice would be "no striping" in this case.
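
For comparison, the "no striping" option mentioned here could look roughly like the sketch below: a single ReentrantReadWriteLock per aggregator instance guards every buffer position. The class name `SingleLockAggregatorSketch`, its methods, and the `long[]` standing in for the sketch buffer are hypothetical and are not Druid code.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a buffer aggregator: a long[] plays the role of the buffer.
class SingleLockAggregatorSketch
{
  // One lock object per aggregator instead of an array of stripes.
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final long[] buffer;

  SingleLockAggregatorSketch(final int numPositions)
  {
    this.buffer = new long[numPositions];
  }

  // Writer path: all positions share the same write lock.
  void aggregate(final int position, final long value)
  {
    lock.writeLock().lock();
    try {
      buffer[position] += value;
    }
    finally {
      lock.writeLock().unlock();
    }
  }

  // Reader path: concurrent readers do not block each other.
  long get(final int position)
  {
    lock.readLock().lock();
    try {
      return buffer[position];
    }
    finally {
      lock.readLock().unlock();
    }
  }
}
```

This keeps heap usage at one lock per aggregator, at the cost of writers to different positions contending with each other, which matters little in a single-writer context.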

stale · 2019-08-20T08:11:22Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

AlexanderSaydakov · 2019-08-27T00:51:22Z

How do we want to proceed (if we do)?

stale · 2019-08-27T00:51:24Z

This issue is no longer marked as stale.

leventov · 2019-08-27T09:22:19Z

+{
+
+  /** for locking per buffer position (power of 2 to make index computation faster) */
+  private static final int NUM_STRIPES = 64;


Please make it at least Runtime.getRuntime().availableProcessors() * 2
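
One way to satisfy both this request and the power-of-2 requirement in the existing comment might be to round the processor-based count up to the next power of two. The helper below is a hypothetical sketch (`StripeCount` is not part of the PR).

```java
// Hypothetical helper, not part of the PR.
final class StripeCount
{
  private StripeCount() {}

  // Smallest power of two that is >= max(2 * processors, 1).
  static int forProcessors(final int processors)
  {
    final int wanted = Math.max(1, processors * 2);
    final int floor = Integer.highestOneBit(wanted);
    return floor == wanted ? wanted : floor << 1;
  }

  // Stripe count for the machine the JVM is running on.
  static int forThisMachine()
  {
    return forProcessors(Runtime.getRuntime().availableProcessors());
  }
}
```

A count computed this way is no longer a compile-time constant, which affects how the stripe index is best computed.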

leventov · 2019-08-27T09:23:51Z

+  /** for locking per buffer position (power of 2 to make index computation faster) */
+  private static final int NUM_STRIPES = 64;
+
+  public static Striped<ReadWriteLock> getReadWriteLock()


It's misleading that a method that returns a new Striped object (i.e., a factory method) starts with "get". It's not a getter.

leventov · 2019-08-27T09:25:24Z

+   * @param position
+   * @return index
+   */
+  public static int lockIndex(final int position)


lockIndex() is poorly linked to getReadWriteLock() and in general this construction is error-prone. I suggest adding a proper wrapper class with a Striped delegate and instance lockIndex() method.
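
A minimal sketch of what such a wrapper might look like follows. To stay self-contained it delegates to a plain array of ReentrantReadWriteLock rather than Guava's `Striped<ReadWriteLock>`, and the class name `PositionStripedLock` and its methods are hypothetical. The bit-spreading function has the same shape as the hash method referenced in the PR's comment.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper; in the PR the delegate would be Guava's Striped<ReadWriteLock>.
final class PositionStripedLock
{
  private final ReadWriteLock[] stripes;
  private final int mask;

  // numStripes must be a power of two so that masking yields a valid index.
  PositionStripedLock(final int numStripes)
  {
    if (numStripes <= 0 || Integer.bitCount(numStripes) != 1) {
      throw new IllegalArgumentException("numStripes must be a power of two: " + numStripes);
    }
    this.stripes = new ReadWriteLock[numStripes];
    for (int i = 0; i < numStripes; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
    this.mask = numStripes - 1;
  }

  // Instance method, so the index cannot get out of step with the stripe count.
  int lockIndex(final int position)
  {
    return smear(position) & mask;
  }

  Lock readLock(final int position)
  {
    return stripes[lockIndex(position)].readLock();
  }

  Lock writeLock(final int position)
  {
    return stripes[lockIndex(position)].writeLock();
  }

  // Spreads the bits of the position before masking.
  private static int smear(int hashCode)
  {
    hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
    return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
  }
}
```

Callers would then write `locks.writeLock(position)` and never compute an index themselves, which removes the loose coupling between the two static methods.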

leventov · 2019-08-27T09:26:15Z

+   */
+  public static int lockIndex(final int position)
+  {
+    return smear(position) % NUM_STRIPES;


If NUM_STRIPES is no longer a compile-time constant, it's better to use & (NUM_STRIPES - 1) for performance.
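
The two index computations can be compared with the small sketch below. It assumes the stripe count is a power of two, which the mask form requires; `StripeIndex` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper used only to compare the two index computations.
final class StripeIndex
{
  private StripeIndex() {}

  // Remainder-based index; the result is negative when the hash is negative.
  static int byModulo(final int hash, final int numStripes)
  {
    return hash % numStripes;
  }

  // Mask-based index; valid only when numStripes is a power of two.
  static int byMask(final int hash, final int numStripes)
  {
    return hash & (numStripes - 1);
  }
}
```

For non-negative hashes both forms agree. The mask form also stays within `[0, numStripes)` for negative input, whereas the remainder form does not.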

leventov · 2019-08-27T09:28:18Z

  }

+  /**
+   * This method uses locks because it can be used during indexing,


Please make proper sentences with punctuation.

leventov · 2019-08-27T09:54:02Z


+  /**
+   * This method uses locks because it can be used during indexing,
+   * and Druid can call aggregate() and get() concurrently


Please make {@link #aggregate} a Javadoc link
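
As an illustration of the two Javadoc requests (full sentences with punctuation, and `{@link}` references instead of bare method names), a comment might be written as in the sketch below. The class `LockedAggregatorDocSketch` is hypothetical and uses a plain monitor in place of the PR's striped lock.

```java
// Hypothetical class used only to illustrate the requested Javadoc style.
class LockedAggregatorDocSketch
{
  private final Object lock = new Object();
  private long total;

  /**
   * Adds a value to the running total. This method takes the lock because it can be used during
   * indexing, where Druid can call it concurrently with {@link #get()}.
   */
  void aggregate(final long value)
  {
    synchronized (lock) {
      total += value;
    }
  }

  /**
   * Returns the running total. This method takes the lock because it can be used during indexing,
   * where Druid can call {@link #aggregate(long)} and this method concurrently.
   */
  long get()
  {
    synchronized (lock) {
      return total;
    }
  }
}
```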

leventov · 2019-08-27T09:57:31Z

  private final TgtHllType tgtHllType;
  private final int size;
  private final IdentityHashMap<ByteBuffer, WritableMemory> memCache = new IdentityHashMap<>();
  private final IdentityHashMap<ByteBuffer, Int2ObjectMap<HllSketch>> sketchCache = new IdentityHashMap<>();


It seems that this field is not really a "cache". The corresponding field in SketchBufferAggregator is called "unions". Maybe call it "sketches".

leventov · 2019-08-27T10:03:32Z

+    final Lock lock = stripedLock.getAt(StripedLockHelper.lockIndex(position)).writeLock();
    lock.lock();
    try {
      final Union union = Union.writableWrap(mem);


If this statement can update the memory, please add a comment like // Union.writableWrap(mem) can update the memory, therefore it must be inside the critical section. Otherwise, please move the statement outside of the critical section and add a comment like // Union.writableWrap(mem) cannot update the memory, therefore it can be outside the critical section.
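
The second variant could be shaped as in the sketch below. Whether the real `Union.writableWrap(mem)` writes to memory is not settled in this thread, so the sketch uses a hypothetical stand-in, `CounterView`, whose constructor by construction only stores references. All names here are assumptions made for illustration.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a wrapper object; its constructor never touches the shared memory.
final class CounterView
{
  private final long[] mem;
  private final int offset;

  CounterView(final long[] mem, final int offset)
  {
    this.mem = mem;
    this.offset = offset;
  }

  void update(final long value)
  {
    mem[offset] += value;
  }
}

final class CriticalSectionSketch
{
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final long[] mem = new long[8];

  void aggregate(final int position, final long value)
  {
    // new CounterView(...) cannot update the memory, therefore it can be outside the critical section.
    final CounterView view = new CounterView(mem, position);
    final Lock lock = rwLock.writeLock();
    lock.lock();
    try {
      view.update(value);
    }
    finally {
      lock.unlock();
    }
  }

  long get(final int position)
  {
    final Lock lock = rwLock.readLock();
    lock.lock();
    try {
      return mem[position];
    }
    finally {
      lock.unlock();
    }
  }
}
```

If wrapping did write to the memory, the construction would have to move inside the `try` block, with the comment changed accordingly.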

leventov · 2019-08-27T10:05:51Z

+    final Lock lock = stripedLock.getAt(StripedLockHelper.lockIndex(position)).writeLock();
    lock.lock();
    try {
      final ArrayOfDoublesUpdatableSketch sketch = ArrayOfDoublesSketches.wrapUpdatableSketch(region);


If this statement can update the memory, please add a comment like // ArrayOfDoublesSketches.wrapUpdatableSketch() can update the memory, therefore it must be inside the critical section. Otherwise, please move the statement outside of the critical section and add a comment like // ArrayOfDoublesSketches.wrapUpdatableSketch() cannot update the memory, therefore it can be outside the critical section.

leventov · 2019-08-27T10:06:35Z

+    final Lock lock = stripedLock.getAt(StripedLockHelper.lockIndex(position)).writeLock();
    lock.lock();
    try {
      final ArrayOfDoublesUnion union = ArrayOfDoublesSketches.wrapUnion(region);


Same as in ArrayOfDoublesSketchBuildBufferAggregator.aggregate()

himanshug · 2019-08-27T17:20:20Z

Looking at this again after a while and with a bit of fresh thinking: TBH, for now, any locking in BufferAggregator can only slow things down without any benefit. None of the other BufferAggregators have any locking, and I guess we could/would fix all of them when the need arises.

AlexanderSaydakov · 2019-08-28T18:44:54Z

Let us close this request if this functionality is not wanted.

added locking to theta buffer aggregator, factored out common locking…

3f861a0

… code

himanshug reviewed Jun 21, 2019

stale Bot added the stale label Aug 20, 2019

stale Bot removed the stale label Aug 27, 2019

leventov requested changes Aug 27, 2019

AlexanderSaydakov closed this Aug 28, 2019

3 participants