'maxBytesInMemory' tuningConfig introduced for ingestion tasks #5583
gianm merged 42 commits into apache:master
Conversation
…for ingestion tasks

Currently, a config called 'maxRowsInMemory' controls how much memory gets used for indexing. If this value is not optimal for your JVM heap size, it can lead to an OutOfMemoryError. A lower value leads to frequent persists, which can hurt query performance, while a higher value limits the number of persists but requires more JVM heap space and can lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem: it limits the total number of bytes kept in memory before persisting.

* The default value is Runtime.maxMemory() / 3
* To maintain the current behaviour, set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are set, both are respected, i.e. the first one to go above its threshold triggers a persist
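The dual-threshold behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Druid code; the class and method names here are made up for the example.

```java
// Minimal sketch of the dual-threshold persist check described above.
// Names are illustrative; the real logic lives in Druid's incremental
// index and appenderator classes.
public class PersistTrigger
{
  private final long maxRowsInMemory;
  private final long maxBytesInMemory; // a non-positive value disables the byte limit

  public PersistTrigger(long maxRowsInMemory, long maxBytesInMemory)
  {
    this.maxRowsInMemory = maxRowsInMemory;
    this.maxBytesInMemory = maxBytesInMemory;
  }

  // Both limits are respected: the first one crossed triggers a persist.
  public boolean shouldPersist(long rowsInMemory, long bytesInMemory)
  {
    final boolean rowLimitHit = rowsInMemory >= maxRowsInMemory;
    final boolean byteLimitHit = maxBytesInMemory > 0 && bytesInMemory >= maxBytesInMemory;
    return rowLimitHit || byteLimitHit;
  }
}
```

With maxBytesInMemory set to -1 only the row count matters, which matches the "maintain the current behaviour" option above.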
jihoonson left a comment

@surekhasaharan thanks for the nice work! Please consider my comments. Also, would you add a doc for the cool new configuration?
| KafkaTuningConfig that = (KafkaTuningConfig) o; | ||
| return maxRowsInMemory == that.maxRowsInMemory && | ||
| maxRowsPerSegment == that.maxRowsPerSegment && | ||
| maxBytesInMemory == this.maxBytesInMemory && |
Should be that.maxBytesInMemory.

@surekhasaharan Also, these methods can be generated automatically; try cmd-N in IntelliJ -> Generate -> equals/hashCode.
| return "KafkaTuningConfig{" + | ||
| "maxRowsInMemory=" + maxRowsInMemory + | ||
| ", maxRowsPerSegment=" + maxRowsPerSegment + | ||
| ",maxBytesInMemory=" + maxBytesInMemory + |
nit: ", maxBytesInMemory=" as other variables.

This one can also be generated automatically.

Hmm, for every code change, I did reformat the code according to the druid_intelliJ_formatting profile, but it seems that did not correct the format.

I was pointing out a different thing from the code formatting: the formatting is just about adjusting whitespace and such, not editing the actual logic of the toString method. Generating is about generating the actual logic: you can trigger that by deleting the toString method and then generating a new one (cmd-N => Generate).

ah ok, will use that in future.
| { | ||
| final KafkaTuningConfig tuningConfig = new KafkaTuningConfig( | ||
| 1000, | ||
| 1000, null, |
nit: Druid format convention is like

1000,
null,

again reformat does not seem to fix this in IntelliJ; also, should this not be flagged by 'mvn clean -Pstrict -pl '!benchmarks' compile test-compile -B' as a style error? Anyway, will fix.

This is something the formatter probably should do, but it's not perfect. And also, we don't have a checkstyle rule for it. Ideally we should have such a rule - it'd be another good contribution!

Unfortunately, our code format profile doesn't handle every code convention. I usually first do Reformat Code in IntelliJ, and then check whether my code format differs from that of other code.

got it, will remember this and look more closely at the rest of the code to follow the convention.
| { | ||
| final KafkaTuningConfig tuningConfig = new KafkaTuningConfig( | ||
| 1000, | ||
| 1000, null, |
| { | ||
| KafkaTuningConfig original = new KafkaTuningConfig( | ||
| 1, | ||
| 1, null, |
| { | ||
| //timestamp + dims length + dimensionDescsList shared pointer | ||
| long sizeInBytes = Long.BYTES + Integer.BYTES * dims.length + Long.BYTES + Long.BYTES; | ||
| sizeInBytes += dimsKeySize; |
Would you elaborate more on how sizeInBytes is calculated?

I attempted to add more docs, hope it will make it clear.
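For reference, the arithmetic in the snippet above can be sketched as below. This is a hedged reading of the estimate, not the actual patch: the interpretation of the two trailing Long.BYTES terms as reference-sized pointers (the dims array and the shared dimensionDescsList) is an assumption, and the class name is hypothetical.

```java
// Sketch of the per-key size estimate from the diff above: a long timestamp,
// one int index per dimension, two reference-sized (8-byte) pointers, plus
// the variable-length dimension key bytes.
public class KeySizeEstimate
{
  public static long estimate(int numDims, long dimsKeySize)
  {
    long sizeInBytes = Long.BYTES                      // timestamp
                       + (long) Integer.BYTES * numDims // one index per dimension
                       + Long.BYTES                     // dims array pointer (assumed)
                       + Long.BYTES;                    // dimensionDescsList shared pointer
    return sizeInBytes + dimsKeySize;
  }
}
```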
| (sum, aggregator) -> sum += aggregator.getMaxIntermediateSize(), | ||
| (sum1, sum2) -> sum1 + sum2 | ||
| ); | ||
| return maxAggregatorIntermediateSize; |
Would you elaborate more on this? Looks like the actual size might be bigger than the calculated one.

@jihoonson how could it be bigger, if the calculated size is the sum of max intermediate sizes?

(Putting aside the wrinkle that the max intermediate size is for the BufferAggregator, but IncrementalIndex actually uses the Aggregator. I think that should generally be OK since it would be strange for the Aggregator to take up a lot more space than the BufferAggregator.)

IIRC, max intermediate size represents only the size of the intermediate aggregate. However, it doesn't mean that an aggregator uses only that amount of memory. For example, LongSumAggregator keeps two variables like below.

private final BaseLongColumnValueSelector selector;
private long sum;

Ah I see. Yeah, it is definitely inexact in that regard. There are probably a few other overheads we're missing. I think it's safe to assume that every aggregator will have enough overhead for its own object header, and for a pointer to a selector. We could add a factor for that. If you believe this page, it's 128 bits per object: https://gist.github.com/arturmkrtchyan/43d6135e8a15798cc46c

It is a bit unsatisfying how inexact the memory usage approximations are, but I am hopeful they will be good enough to make the system run better out of the box, and that's what matters.

So, should I add 16 bytes per aggregator?
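The 16-byte figure under discussion (an object header plus a selector reference, per the 128-bits-per-object estimate linked above) would roughly be applied like this. Illustrative sketch only; the class and method names are hypothetical, not the patch's actual code.

```java
// Rough per-row aggregator footprint: each aggregator's max intermediate
// size plus ~16 bytes of overhead for its object header and selector
// reference, as discussed in the thread above.
public class AggregatorFootprint
{
  static final long ROUGH_OVERHEAD_PER_AGGREGATOR = 16;

  public static long maxBytesPerRow(long[] maxIntermediateSizes)
  {
    long total = 0;
    for (long size : maxIntermediateSizes) {
      total += size + ROUGH_OVERHEAD_PER_AGGREGATOR;
    }
    return total;
  }
}
```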
| outOfRowsReason = StringUtils.format("Maximum number of rows [%d] reached", maxRowCount); | ||
| } | ||
| if (!sizeCheck) { | ||
| outOfRowsReason = StringUtils.format("Maximum size in bytes [%d] reached", maxBytesInMemory); |
This might replace outOfRowsReason. It should be checked that both countCheck and sizeCheck are false.

Also, please add a null check for outOfRowsReason before allocating it and throw an exception if it's not null.

How about reporting both reasons in the case where both checks trip at the same time?

I think this is the same as my first comment.

yeah, this was fuzzy and incomplete; fixed it.
| } | ||
| @VisibleForTesting | ||
| long getRowSizeInMemory(SegmentIdentifier identifier) |
Would you rename this method to a more intuitive one? Looks like it returns a row size.
| .setIndexSchema(indexSchema) | ||
| .setReportParseExceptions(reportParseExceptions) | ||
| .setMaxRowCount(maxRowsInMemory) | ||
| .setMaxRowCount(maxRowsInMemory).setMaxBytesInMemory(maxBytesInMemory) |
Please break the line before .setMaxBytesInMemory().
| @@ -39,6 +39,7 @@ public class RealtimeAppenderatorTuningConfig implements TuningConfig, Appendera | |||
| { | |||
| private static final int defaultMaxRowsInMemory = 75000; | |||
Hmm, maybe we should make the default max rows in memory higher now? I feel like we should, since that way with the default settings, maxBytesInMemory is going to be the one that takes effect.

How about 1000000?

Yeah, we would need to change the default maxRows; I just wasn't sure if I should do this with the current patch. Will change.
| * Setting deserializeComplexMetrics to false is necessary for intermediate aggregation such as groupBy that | ||
| * should not deserialize input columns using ComplexMetricSerde for aggregators that return complex metrics. | ||
| * | ||
| * <p> |
Please don't add these where they don't already exist. You can get IntelliJ to stop adding them by unchecking "Generate '&lt;p&gt;' on empty lines" in the JavaDoc code style preferences.

ok, changed the javadoc setting and removed this.
| { | ||
| private static final int DEFAULT_MAX_ROWS_IN_MEMORY = 75_000; | ||
| private static final int DEFAULT_MAX_TOTAL_ROWS = 20_000_000; | ||
| private static final long DEFAULT_MAX_BYTES_IN_MEMORY = Runtime.getRuntime().maxMemory() / 3; |
I wish we didn't have to have this in two places. Although I guess the default maxRowsInMemory is also in two places.

Yeah, it is being defined in all the implementations of AppenderatorConfig. What do you think if I define a static final in this interface itself and every implementation uses that? I just tried to follow the same convention as for maxRowsInMemory. The upside will be one place to change the default later; the downside is that every implementation is tied to the same default value, which is the same anyway even now. What do you suggest?

A recent patch created an IndexTaskUtils utility class. Maybe it would make sense to put these defaults there, like IndexTaskUtils.DEFAULT_MAX_ROWS_IN_MEMORY and IndexTaskUtils.DEFAULT_MAX_BYTES_IN_MEMORY.

I have put these 2 defaults there, but there are others. Those can be moved to IndexTaskUtils in later commits.
| if (!canAdd) { | ||
| final boolean countCheck = size() < maxRowCount; | ||
| boolean sizeCheck = true; | ||
| if (maxBytesInMemory != -1) { |
Suggest doing > 0 rather than != -1. That way, any negative number (or zero) means unlimited.
| if (TimeAndDims.EMPTY_ROW_INDEX == prev) { | ||
| numEntries.incrementAndGet(); | ||
| if (maxBytesInMemory != -1) { | ||
| long estimatedRowSize = estimateRowSizeInBytes(key) + maxBytesPerRowForAggregators; |
The method estimateRowSizeInBytes should really be named estimateKeySizeInBytes. Or, it should keep the name estimateRowSizeInBytes, but in that case it should do the + maxBytesPerRowForAggregators part. The point being that the row includes both the timeAndDims key and the aggregators.

Thinking about it a bit more, I think it makes sense to keep the method named estimateRowSizeInBytes and to add the + maxBytesPerRowForAggregators into the method. That way, there's a clear entry point for the logic for estimating a row size, and people can look there if they want to improve it later.

changed per the second comment
| } | ||
| @Test | ||
| public void testMaxBytesInMemory() throws Exception |
This test is good, but please add another one verifying that the limit is applied across more than one sink in the same appenderator.
Thanks @surekhasaharan! I left some review comments. One of them suggested changing the default maxRowsInMemory to 1000000 (the idea being we'll rely a lot more on maxBytesInMemory), although we can also discuss whether or not that is a good idea. There are some pros and cons. I guess the biggest con would be that there is potential for a well-tuned setup with the current scheme to be thrown out of whack somehow. But I think it's worth it if these defaults are better for the majority of cases. Marked this "release notes" due to the potential change in behavior.
…ce (apache#5579) * Add overlord unsecured paths to coordinator when using combined service * PR comment
* Add more indexing task status and error reporting * PR comments, add support in AppenderatorDriverRealtimeIndexTask * Use TaskReport instead of metrics/context * Fix tests * Use TaskReport uploads * Refactor fire department metrics retrieval * Refactor input row serde in hadoop task * Refactor hadoop task loader names * Truncate error message in TaskStatus, add errorMsg to task report * PR comments
* Allow getDomain to return disjointed intervals * Indentation issues
apache#5551) * Adding feature thetaSketchConstant to do some set operation in PostAggregator * Updated review comments for PR apache#5551 - Adding thetaSketchConstant * Fixed CI build issue * Updated review comments 2 for PR apache#5551 - Adding thetaSketchConstant
* With incremental handoff the changed line is no longer true.
* Add missing doc for automatic pendingSegments * address comments
* Fix indexTask to respect forceExtendableShardSpecs * add comments
Deprecated due to apache#5382
…#5586) Also switch various firehoses to the new method. Fixes apache#5585.
…for ingestion tasks

Currently, a config called 'maxRowsInMemory' controls how much memory gets used for indexing. If this value is not optimal for your JVM heap size, it can lead to an OutOfMemoryError. A lower value leads to frequent persists, which can hurt query performance, while a higher value limits the number of persists but requires more JVM heap space and can lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem: it limits the total number of bytes kept in memory before persisting.

* The default value is Runtime.maxMemory() / 3
* To maintain the current behaviour, set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are set, both are respected, i.e. the first one to go above its threshold triggers a persist
* Fix the coding style according to druid conventions * Add more javadocs * Rename some variables/methods * Other minor issues
* Some refactoring to put defaults in IndexTaskUtils * Added check for maxBytesInMemory in AppenderatorImpl * Decrement bytes in abandonSegment * Test unit test for multiple sinks in single appenderator * Fix some merge conflicts after rebase
Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex
gianm left a comment
Thanks for the updates @surekhasaharan!
I wrote some more comments. In addition to the comments on the diff, a couple of general ones:
- Please add doc entries for the new config: a good way to find where to add this is to search the docs for "maxRowsInMemory".
- Please update our tutorials and sample ingestion specs to not have a maxRowsInMemory set. I think some of them have one set explicitly, and that will mess with the attempt here to determine it automatically.
- Please also add some maxBytesInMemory handling to Hadoop indexing, by way of HadoopTuningConfig + IndexGeneratorJob.
| public class IndexTaskUtils | ||
| { | ||
| public static final int DEFAULT_MAX_ROWS_IN_MEMORY = 75_000; |
Now that I think about it some more, this class isn't going to work for these constants.

I'm hoping we can just write them in one place and have other stuff reference them. They are both magical and it's good for magic to be rare. But, one of the things (for example) that has to reference them is HadoopTuningConfig, which is in the hadoop-indexer package.

Maybe the right place to put these is in TuningConfig itself (i.e. io.druid.segment.indexing.TuningConfig).

ok, yeah, when I added the defaults to IndexTaskUtils it didn't cover all the places I had defined this default. Defining those in TuningConfig seems correct. But I am also checking for maxBytesInMemory == 0 in OnheapIncrementalIndex and setting it to the default if 0, and now that seems wrong. Maybe I should instead throw an exception in IncrementalIndex.buildOnHeap. Perhaps now I understand what @jihoonson was talking about.
| public class RealtimeAppenderatorTuningConfig implements TuningConfig, AppenderatorConfig | ||
| { | ||
| private static final int defaultMaxRowsInMemory = 75000; | ||
| private static final int defaultMaxRowsInMemory = 1000000; |
This should use IndexTaskUtils.DEFAULT_MAX_ROWS_IN_MEMORY too.

will change to TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY
| private static final int defaultMaxRowsInMemory = 75000; | ||
| private static final int defaultMaxRowsInMemory = 1000000; | ||
| private static final int defaultMaxRowsPerSegment = 5_000_000; | ||
| private static final long defaultMaxBytesInMemory = getDefaultMaxBytesInMemory(); |
This should use IndexTaskUtils.DEFAULT_MAX_BYTES_IN_MEMORY too.

will change to TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY now.
| @Override | ||
| public long estimateEncodedKeyComponentSize(int[] key) | ||
| { | ||
| /** |
The javadoc-style /** comment isn't appropriate here (those are meant for documenting classes, methods, and fields). It could just be a // style comment since it's one line. And usually we use that style within methods even for multiline comments (although sometimes we use /*).
| this.parseExceptionMessages = parseExceptionMessages; | ||
| } | ||
| public AddToFactsResult( |
I'd suggest getting rid of the other constructor. Having too many legacy constructors makes it easy to accidentally call the wrong constructor and leave out an important parameter.

IMO - in this case, if addToFacts isn't able to generate a bytesInMemory number, it would be better to fess up and explicitly pass in a zero so it's obvious to readers what is going on.

Removed the other constructor and passed 0 for bytesInMemory from OffheapIncrementalIndex.
| // This variable updated in add(), persist(), and drop() | ||
| private final AtomicInteger rowsCurrentlyInMemory = new AtomicInteger(); | ||
| private final AtomicInteger totalRows = new AtomicInteger(); | ||
| private final AtomicLong currentBytesInMemory = new AtomicLong(); |
Would prefer to see this named bytesCurrentlyInMemory for a clear analogy with rowsCurrentlyInMemory.

ok, that name is more suitable here.
| private final List<DimensionDesc> dimensionDescsList; | ||
| private final Map<String, ColumnCapabilitiesImpl> columnCapabilities; | ||
| private final AtomicInteger numEntries = new AtomicInteger(); | ||
| private final AtomicLong sizeInBytes = new AtomicLong(); |
Consider calling this bytesInMemory for consistency with the config name, so it's clear that they're related.
| || (tuningConfig.getMaxBytesInMemory() > 0 && currentBytesInMemory.get() >= tuningConfig.getMaxBytesInMemory())) { | ||
| if (allowIncrementalPersists) { | ||
| // persistAll clears rowsCurrentlyInMemory, no need to update it. | ||
| persistAll(committerSupplier == null ? null : committerSupplier.get()); |
Please log the reason we're persisting here. There are starting to be enough conditions that it's going to be useful to see the specific one that got triggered.

Refactored this part a bit; please check the format of the log, if it looks okay.

It looks good, although in the nittiest of nit picks, ", " would be nicer than ",".
| } | ||
| } | ||
| //add methods for byte mem checks |
This looks like a stray comment that you meant to delete?
| return numEntries.get(); | ||
| } | ||
| public long sizeInBytes() |
Consider naming this getBytesInMemory() so it's clear it's related to all the other bytes-in-memory stuff.
* Put defaults for maxRows and maxBytes in TuningConfig * Change/add javadocs * Refactoring and renaming some variables/methods
* Added maxBytesInMemory config in docs * Removed references to maxRowsInMemory under tuningConfig in examples
@surekhasaharan Could you please resolve the conflicts, and we can do another round of review? Thanks!
| int DEFAULT_MAX_ROWS_IN_MEMORY = 1_000_000; | ||
| // We initially estimated this to be 1/3(max jvm memory), but bytesCurrentlyInMemory only | ||
| // tracks active index and not the index being flushed to disk, to account for that | ||
| // we doubled default to 1/6(max jvm memory) |
Nit: it's halved, not doubled.
| |`type`|String|The indexing task type, this should always be `kafka`.|yes| | ||
| |`maxRowsInMemory`|Integer|The number of rows to aggregate before persisting. This number is the post-aggregation rows, so it is not equivalent to the number of input events, but the number of aggregated rows that those events result in. This is used to manage the required JVM heap size. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).|no (default == 75000)| | ||
| |`maxRowsInMemory`|Integer|The number of rows to aggregate before persisting. This number is the post-aggregation rows, so it is not equivalent to the number of input events, but the number of aggregated rows that those events result in. This is used to manage the required JVM heap size. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).|no (default == 1000000)| | ||
| |`maxBytesInMemory`|Long|The maximum number of bytes to keep in memory to aggregate before persisting. This is used to manage the required JVM heap size. |no (default == One-sixth of max JVM memory)| |
This should include a comment like maxRowsInMemory's that tells people that the actual max is going to be double (or more, if you set maxPendingPersists higher). It would also be nice to warn people that this is approximate. Maybe something like:

The number of bytes to aggregate in-heap before persisting. This is based on a rough estimate of memory usage, not actual usage. The maximum heap memory usage for indexing is maxBytesInMemory * (2 + maxPendingPersists).

Similar comments at other locations where this parameter is documented.
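As a worked example of the heap bound quoted in that suggested wording (the class name and the numbers below are hypothetical, chosen only to show the arithmetic):

```java
// Worked example of the documented heap bound: maximum indexing heap usage
// is maxBytesInMemory * (2 + maxPendingPersists). With maxBytesInMemory at
// one sixth of the heap and maxPendingPersists at 0, indexing is bounded at
// roughly one third of the heap.
public class HeapBound
{
  public static long maxIndexingHeap(long maxBytesInMemory, int maxPendingPersists)
  {
    return maxBytesInMemory * (2 + maxPendingPersists);
  }
}
```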
gianm left a comment
LGTM 👍 thanks @surekhasaharan!
This is tagged design review so someone else should take a look too.
I'll finish my review tomorrow.
| |type|The task type, this should always be "index".|none|yes| | ||
| |targetPartitionSize|Used in sharding. Determines how many rows are in each segment.|5000000|no| | ||
| |maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|75000|no| | ||
| |maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|1000000|no| |
Looks like some docs on maxRowsInMemory have gone stale. Can we sync all of them?

I did not understand this; what do you mean by "sync all of them"?

I meant, we might make all documents for maxRowsInMemory of all tuningConfigs the same.
| |targetPartitionSize|Used in sharding. Determines how many rows are in each segment.|5000000|no| | ||
| |maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|75000|no| | ||
| |maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|1000000|no| | ||
| |maxBytesInMemory|Used in determining when intermediate persists to disk should occur. This value represents number of bytes to aggregate in heap memory before persisting. This is based on a rough estimate of memory usage and not actual usage. The maximum heap memory usage for indexing is maxBytesInMemory * (2 + maxPendingPersists)|1/6 of max JVM memory|no| |
I think some people might be confused by two similar but different configurations. I think it's worthwhile to roughly describe their proper usages.

Agree, it can be confusing, I'll try to add more explanation.
| private static final int defaultMaxRowsInMemory = 75000; | ||
| private static final int defaultMaxRowsInMemory = TuningConfig.DEFAULT_MAX_ROWS_IN_MEMORY; | ||
| private static final int defaultMaxRowsPerSegment = 5_000_000; | ||
| private static final long defaultMaxBytesInMemory = TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY; |
| { | ||
| return new RealtimeTuningConfig( | ||
| defaultMaxRowsInMemory, | ||
| defaultMaxBytesInMemory, |
Is this intentional? Probably it should be null or 0 according to the comment on TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY.
| DEFAULT_SHARD_SPECS, | ||
| DEFAULT_INDEX_SPEC, | ||
| DEFAULT_ROW_FLUSH_BOUNDARY, | ||
| DEFAULT_MAX_BYTES_IN_MEMORY, |
Same here. Is this intentional? Probably it should be null or 0 according to the comment on TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY.
| /** | ||
| * Merge segment, push to deep storage. Should only be used on segments that have been fully persisted. Must only | ||
| * be run in the single-threaded pushExecutor. | ||
| <<<<<<< HEAD |
| public class Sink implements Iterable<FireHydrant> | ||
| { | ||
| private static final IncrementalIndexAddResult ADD_FAILED = new IncrementalIndexAddResult(-1, null); | ||
| } | ||
| } | ||
| tuningConfig.getMaxRowsInMemory() | ||
| )); | ||
| } | ||
| if (tuningConfig.getMaxBytesInMemory() != -1 |
Better to be tuningConfig.getMaxBytesInMemory() > 0.
| // This variable updated in add(), persist(), and drop() | ||
| private final AtomicInteger rowsCurrentlyInMemory = new AtomicInteger(); | ||
| private final AtomicInteger totalRows = new AtomicInteger(); | ||
| private final AtomicLong bytesCurrentlyInMemory = new AtomicLong(); |
rowsCurrentlyInMemory, bytesCurrentlyInMemory, and totalRows should be updated in sync because different threads can read/update them, and independent updates might lead to unexpected behavior. For example, rowsCurrentlyInMemory might be already updated, but bytesCurrentlyInMemory is not when another thread reads them.

This should have been done when I added totalRows, but I missed it. Would you add a class to update all these stats atomically? Like

class MemoryStats
{
  Object lock;
  int rowsInMemory;
  long bytesInMemory;
  int rowsInMemoryAndDisk;

  void add(int rowsInMemory, long bytesInMemory, int rowsInMemoryAndDisk)
  {
    synchronized (lock) {
      this.rowsInMemory += rowsInMemory;
      this.bytesInMemory += bytesInMemory;
      this.rowsInMemoryAndDisk += rowsInMemoryAndDisk;
    }
  }
}

There's no point updating these in sync if they are not checked in sync, so for this change to be useful, the check should be moved into MemoryStats as well. But is it really necessary to check them in sync? The non-add threads can only decrease these values. I don't think anything bad will happen if the values decrease while add is in the middle of checking them.

What do you think?

Hmm, thinking about it some more, if drop is called concurrently with add for the same segment then there might be some problems. However, this never happens in practice, since appenderator drivers only call drop after pushing/handing off segments that they are done writing to. Maybe we can just firm up the contract by adding to the javadoc for drop that callers must not call drop concurrently with add for the same segment. I think there isn't much reason to write the code for this possibility, since it isn't meant to actually happen.

What do you think, once again?

Ah, yes. The checking part is missing in the above snippet and it should also be in there.

And yes, it doesn't cause anything bad, but synchronized updates will lead to better behavior like less persisting. Do you have any other concerns?

I want to sort out whether we would be making this change for correctness reasons or aesthetic reasons. If it's for aesthetic reasons, maybe it's not necessary. And if it's for correctness reasons, then we need to make sure we are synchronizing the appropriate amount of things to fix the bug we are trying to fix.

I guess what I'm saying is that it doesn't look to me like "synchronized updates will lead to better behavior like less persisting" is true in practice, since the updates won't be happening simultaneously if the appenderator is being used properly.

It looks to me like this synchronization is only useful if drop and add are called simultaneously [1]. But this shouldn't happen anyway. If it does happen, it will cause worse problems (like data loss, since drop is meant to be used after segments have been pushed/handed off. If they haven't been handed off yet then we will lose data).

[1] The counters are updated in threads that call add, persist, drop, and clear. But add, persist, and clear already have javadoc saying they must be called from the same thread, so there is no synchronization issue. drop is the only one that is allowed to be called from a different thread than the other three.

Ok, I got your point. Thanks.

@surekhasaharan I think probably it would be ok to leave these alone until we figure out what the right approach is here; it might be a separate PR to either clean up the contract or synchronize some more stuff.

@jihoonson let us know what you think too.

@surekhasaharan I talked with @gianm offline. Please see the new comment. If this is the case, rowsCurrentlyInMemory and bytesCurrentlyInMemory no longer have to be AtomicInteger because they are used by the same thread.

If we end up doing the atomic-to-nonatomic change, I think it makes sense to do it in a separate PR. I think we should at least address the KafkaIndexTask publishing being in a separate thread first (#5729). And get rid of persist too.

So my vote is to not worry about it for this particular patch.

Agree. @surekhasaharan if you don't want to do it in this PR, please raise an issue about it.
| // Decrement this sink's rows from rowsCurrentlyInMemory (we only count active sinks). | ||
| rowsCurrentlyInMemory.addAndGet(-sink.getNumRowsInMemory()); | ||
| totalRows.addAndGet(-sink.getNumRows()); | ||
| bytesCurrentlyInMemory.addAndGet(-sink.getBytesInMemory()); |
This and rowsCurrentlyInMemory.addAndGet(-sink.getNumRowsInMemory()); are not necessary because sink should be pushed before calling this method and thus there should be no data in memory. Instead, we need to add a sanity check that sink.getNumRowsInMemory() == 0. Same for getBytesInMemory().
Not clear: where do we want to add this sanity check?
We can add a sanity check instead of rowsCurrentlyInMemory.addAndGet(-sink.getNumRowsInMemory()) because sink.getNumRowsInMemory() should always return 0 here. But, as commented #5583 (comment), you don't have to do this in this PR.
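A minimal sketch of the suggested sanity check (hypothetical names; Druid's actual abandonSegment code differs): instead of decrementing the counters for a sink that should already be empty, assert that an abandoned sink holds nothing in memory.

```java
class SinkSanityCheck {
    // By the time a sink is abandoned it should already have been pushed,
    // so its in-memory row and byte counts must both be zero.
    static void checkSinkFullyPersisted(int numRowsInMemory, long bytesInMemory) {
        if (numRowsInMemory != 0 || bytesInMemory != 0) {
            throw new IllegalStateException(
                "Sink still holds " + numRowsInMemory + " rows / " + bytesInMemory
                + " bytes in memory; expected it to be fully persisted before abandon");
        }
    }
}
```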
| { | ||
| private static final int defaultMaxRowsInMemory = 75000; | ||
| private static final int defaultMaxRowsInMemory = TuningConfig.DEFAULT_MAX_ROWS_IN_MEMORY; | ||
| private static final long defaultMaxBytesInMemory = TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY; |
This variable is not used anymore.
| private static final Logger log = new Logger(OnheapIncrementalIndex.class); | ||
| /** | ||
| * overhead per {@link ConcurrentHashMap.Node} object |
This should be "Overhead per {@link ConcurrentHashMap.Node} or {@link ConcurrentSkipListMap.Node} object" because the facts map can be either one of them. Related code:
| private final ConcurrentHashMap<Integer, Aggregator[]> aggregators = new ConcurrentHashMap<>(); | ||
| private final FactsHolder facts; | ||
| private final AtomicInteger indexIncrement = new AtomicInteger(0); | ||
| private long maxBytesPerRowForAggregators = 0; |
The initialization to 0 is unnecessary and this can be final.
| |type|The task type, this should always be "index".|none|yes| | ||
| |targetPartitionSize|Used in sharding. Determines how many rows are in each segment.|5000000|no| | ||
| |maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|75000|no| | ||
| |maxRowsInMemory|Used in determining when intermediate persists to disk should occur. Normally this does not need to be set, but depending on the nature of the data, if rows are short in terms of bytes the user may not want to store a million rows in memory, and this value should be set accordingly.|1000000|no|
Please add this and the below description on maxBytesInMemory to all other places as well.
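For illustration, a native index task tuningConfig using both thresholds documented above might look like the following (the row and byte values are illustrative, not recommendations):

```json
"tuningConfig": {
  "type": "index",
  "maxRowsInMemory": 1000000,
  "maxBytesInMemory": 100000000
}
```

Per the behavior described in this PR, leaving maxBytesInMemory unset makes the task derive a default of one sixth of the JVM's max heap at start time, while setting it to -1 disables the byte-based limit and preserves the old rows-only behaviour.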
jihoonson
left a comment
@surekhasaharan thanks! Looks good to me now.
jon-wei
left a comment
Design LGTM, had a minor comment
| this.maxRowsInMemory = maxRowsInMemory == null ? defaults.getMaxRowsInMemory() : maxRowsInMemory; | ||
| this.maxRowsPerSegment = maxRowsPerSegment == null ? DEFAULT_MAX_ROWS_PER_SEGMENT : maxRowsPerSegment; | ||
| // initializing this to 0, it will be lazily initialized to a value |
Ah, it's in more than one place. Will fix.
…e#5583)

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks. Currently a config called 'maxRowsInMemory' is present which affects how much memory gets used for indexing. If this value is not optimal for your JVM heap size, it could lead to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might be bad for query performance and a higher value will limit number of persists but require more jvm heap space and could lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes kept in memory before persisting.
  * The default value is 1/3(Runtime.maxMemory())
  * To maintain the current behaviour set 'maxBytesInMemory' to -1
  * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist
* Fix check style and remove a comment
* Add overlord unsecured paths to coordinator when using combined service (apache#5579)
* Add overlord unsecured paths to coordinator when using combined service
* PR comment
* More error reporting and stats for ingestion tasks (apache#5418)
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* Allow getDomain to return disjointed intervals (apache#5570)
* Allow getDomain to return disjointed intervals
* Indentation issues
* Adding feature thetaSketchConstant to do some set operation in PostAgg (apache#5551)
* Adding feature thetaSketchConstant to do some set operation in PostAggregator
* Updated review comments for PR apache#5551 - Adding thetaSketchConstant
* Fixed CI build issue
* Updated review comments 2 for PR apache#5551 - Adding thetaSketchConstant
* Fix taskDuration docs for KafkaIndexingService (apache#5572). With incremental handoff the changed line is no longer true.
* Add doc for automatic pendingSegments (apache#5565)
* Add missing doc for automatic pendingSegments
* address comments
* Fix indexTask to respect forceExtendableShardSpecs (apache#5509)
* Fix indexTask to respect forceExtendableShardSpecs
* add comments
* Deprecate spark2 profile in pom.xml (apache#5581). Deprecated due to apache#5382
* CompressionUtils: Add support for decompressing xz, bz2, zip. (apache#5586) Also switch various firehoses to the new method. Fixes apache#5585.
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks. Currently a config called 'maxRowsInMemory' is present which affects how much memory gets used for indexing. If this value is not optimal for your JVM heap size, it could lead to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might be bad for query performance and a higher value will limit number of persists but require more jvm heap space and could lead to OOM. 'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes kept in memory before persisting.
  * The default value is 1/3(Runtime.maxMemory())
  * To maintain the current behaviour set 'maxBytesInMemory' to -1
  * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist
* Address code review comments
* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues
* Address more code review comments
* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase
* Fix some style checks
* Merge conflicts
* Fix failing tests. Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex
* Address PR comments
* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods
* Fix TeamCity inspection warnings
* Added maxBytesInMemory config to HadoopTuningConfig
* Updated the docs and examples
* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples
* Set maxBytesInMemory to 0 until used. Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfig, and set it to part of max jvm memory when the ingestion task starts
* Update toString in KafkaSupervisorTuningConfig
* Use correct maxBytesInMemory value in AppenderatorImpl
* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory. Experimenting with various defaults, 1/3 jvm memory causes OOM
* Update docs to correct maxBytesInMemory default value
* Minor to rename and add comment
* Add more details in docs
* Address new PR comments
* Address PR comments
* Fix spelling typo
Why is it 1/6 of Runtime.maxMemory?
@hellobabygogo there is a comment in the code about this here. Some more details: while analyzing the heap dump on OOMEs, we found that two
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them will be respected i.e. the first one to go above threshold will trigger persist
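The interaction of the two limits and the heap-derived default can be sketched as follows (hypothetical helper class; Druid's real logic lives in the appenderator and incremental index). Note the default was later changed from 1/3 to 1/6 of the max JVM memory, per the discussion above.

```java
class PersistThresholds {
    // Resolve the effective byte limit from the configured value:
    // -1 disables the byte-based limit (old rows-only behaviour),
    // 0 (unset) falls back to one sixth of the JVM's max heap.
    static long effectiveMaxBytes(long configuredMaxBytes, long jvmMaxMemory) {
        if (configuredMaxBytes == -1) {
            return Long.MAX_VALUE;
        }
        if (configuredMaxBytes == 0) {
            return jvmMaxMemory / 6;
        }
        return configuredMaxBytes;
    }

    // Whichever threshold is crossed first triggers an intermediate persist.
    static boolean shouldPersist(long rowsInMemory, long bytesInMemory,
                                 long maxRows, long maxBytes) {
        return rowsInMemory >= maxRows || bytesInMemory >= maxBytes;
    }
}
```

For example, with maxRowsInMemory at its default of 1,000,000 and a small byte limit, the byte threshold trips first even though the row count is far below its limit.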