Granularity interval materialization#10742

Merged
suneet-s merged 32 commits into apache:master from loquisgon:granularity-interval-materialization on Jan 29, 2021

Conversation


@loquisgon loquisgon commented Jan 11, 2021

Description

The UniformGranularitySpec currently materializes all intervals when it is constructed. This can cause OOMs in the Overlord and other services whenever the resulting interval list is very large. The changes here avoid that materialization in all places where the Overlord uses the UniformGranularitySpec. One method of the uniform granularity spec, public Optional<Interval> bucketInterval(DateTime dt), still materializes all the intervals, but care is taken that they are not materialized unless that particular method is invoked. Since this method is only called outside the Overlord, the OOM issues should be less severe than they are today.


Key changed/added classes in this PR

A public API change was made to the interface GranularitySpec: the method bucketIntervals() now returns an Iterable<Interval> rather than the materialized set of intervals it returned before this change. The other significant change avoids materializing the intervals in the constructor of UniformGranularitySpec.

Most of the work is delegated to a new helper class, IntervalsByGranularity, which takes a Collection of intervals and a Granularity in its constructor and builds an Iterator that avoids materializing the intervals. This class also ensures that the intervals returned by the iterator are appropriately sorted. It is then used in the critical places where materialization was taking place. The class ArbitraryGranularitySpec still materializes all intervals, so use it with care.
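The lazy-bucketing idea can be sketched in plain Java. This is a hypothetical illustration using java.time and a fixed bucket Duration, not Druid's actual IntervalsByGranularity (which works with Joda-Time Intervals and Granularity objects); the point is that buckets are produced one at a time on demand instead of being materialized up front:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: given one input interval [start, end) and a bucket
// duration, emit aligned buckets lazily instead of building the whole list.
class LazyBuckets implements Iterator<Instant[]> {
  private final Instant end;
  private final Duration bucket;
  private Instant cursor;

  LazyBuckets(Instant start, Instant end, Duration bucket) {
    long millis = bucket.toMillis();
    // align the cursor down to the enclosing bucket boundary
    this.cursor = Instant.ofEpochMilli(Math.floorDiv(start.toEpochMilli(), millis) * millis);
    this.end = end;
    this.bucket = bucket;
  }

  @Override
  public boolean hasNext() {
    return cursor.isBefore(end);
  }

  @Override
  public Instant[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    Instant bucketStart = cursor;
    cursor = cursor.plus(bucket); // only one bucket exists in memory at a time
    return new Instant[]{bucketStart, cursor};
  }
}
```

Iterating a year of daily buckets this way holds a single cursor rather than 365 Interval objects, which is the memory behavior the PR is after.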


This PR has:

  • [x] been self-reviewed.
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • [ ] been tested in a test Druid cluster.

@loquisgon loquisgon force-pushed the granularity-interval-materialization branch from e866686 to 48b49a6 on January 11, 2021 23:11
Author:

I will clean up the javadoc in the next commit.

Author:

done

Author:

The contract indicates that inputIntervals must be left "as is"; I will remove the sorting in the next commit.

Author:

done

@loquisgon loquisgon marked this pull request as ready for review January 12, 2021 18:37
@loquisgon loquisgon force-pushed the granularity-interval-materialization branch from a1a334c to 9f81cf7 on January 13, 2021 00:45
@loquisgon loquisgon force-pushed the granularity-interval-materialization branch from 9f81cf7 to 86c3f54 on January 13, 2021 00:52
ArrayList<Interval> condensedIntervals = JodaUtils.condenseIntervals(() -> uniqueIntervals.iterator());
intervalIterator = condensedIntervals.iterator();
} else {
IntervalsByGranularity intervalsByGranularity = new IntervalsByGranularity(intervals, segmentGranularity);
Contributor:

Are we no longer condensing the intervals if segmentGranularity != null?

Author (Jan 15, 2021):

Fixed

private final Iterable<Interval> intervalIterable;
private final TreeSet<Interval> intervals;

public LookupIntervalBuckets(Iterable<Interval> intervalIterable)
Contributor:

Is this class basically something like a lazy load of the TreeSet?

Author (Jan 14, 2021):

I added this class to more cleanly re-use the TreeSet logic for bucket extraction between ArbitraryGranularitySpec and UniformGranularitySpec. The goal of the code in this PR is to avoid materialization of intervals in the Overlord only; we will deal with materialization issues elsewhere later. Thus, when code outside the Overlord needs to get a bucket for a particular DateTime, it will actually materialize all the intervals and store them in a fast lookup data structure (like a TreeSet) to return the bucket. In UniformGranularitySpec we could still iterate through all the intervals to find the bucket (since the intervals are sorted), but that would be linear in the number of intervals on every call, which is not acceptable given how frequently the bucket-lookup method is invoked. We feel we can improve on this, but we decided to divide the work and, at least for now, prevent the materialization of intervals in the Overlord (see the PR description).
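The lazy-lookup idea described here can be sketched as follows. This is a hypothetical illustration using primitive {start, end} millisecond pairs and a TreeMap in place of Joda Intervals and Druid's LookupIntervalBuckets; the key point is that the sorted structure is only built on the first bucket query, so callers that never ask for a bucket never pay the memory cost:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: defer building the sorted lookup structure until the
// first bucket query (names are illustrative, not Druid's).
class LazyBucketLookup {
  private final Iterable<long[]> intervalSource; // each entry is {start, end}
  private TreeMap<Long, long[]> byStart;         // built lazily, on first use

  LazyBucketLookup(Iterable<long[]> intervalSource) {
    this.intervalSource = intervalSource;
  }

  // Returns the interval containing t, or null if none does.
  long[] bucketFor(long t) {
    if (byStart == null) {
      byStart = new TreeMap<>();
      for (long[] iv : intervalSource) {
        byStart.put(iv[0], iv); // materialize once; O(log n) lookups after
      }
    }
    Map.Entry<Long, long[]> e = byStart.floorEntry(t);
    return (e != null && t < e.getValue()[1]) ? e.getValue() : null;
  }
}
```

The floorEntry lookup is the same trick a TreeSet of Intervals enables: find the greatest bucket start not exceeding the timestamp, then verify the timestamp falls before that bucket's end.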

return Optional.of(JodaUtils.condenseIntervals(setOptional.get()));
Iterable<Interval> bucketIntervals = schema.getDataSchema().getGranularitySpec().bucketIntervals();
if (bucketIntervals.iterator().hasNext()) {
return Optional.of(JodaUtils.condenseIntervals(schema.getDataSchema().getGranularitySpec().bucketIntervals()));
Contributor:

nit: can reuse bucketIntervals variable here

Author:

Fixed.

IntervalsByGranularity intervalsByGranularity = new IntervalsByGranularity(intervals, segmentGranularity);
intervalIterator = intervalsByGranularity.granularityIntervalsIterator();
// the following is calling a condense that does not materialize the intervals:
uniqueCondensedIntervals.addAll(JodaUtils.condenseIntervals(intervalsByGranularity.granularityIntervalsIterator()));
Contributor:

doesn't addAll materialize all the intervals? Is materializing the condensed intervals fine as long as we are not materializing the original intervals?

Author:

Yes, addAll is using the materialized intervals returned by the condenseIntervals call. The assumption here is that the condensed intervals are much fewer than the bucket intervals, which I think will hold in the majority of real cases. In the rare corner case where the original intervals cannot be condensed we would still have issues, but not in the majority of production cases.

Contributor:

Yes, addAll is using the materialized intervals being returned by the condenseIntervals call. The assumption here is that the condensed intervals are much fewer than the bucket intervals. I think this assumption will hold in the majority of real cases. I think that in a rare corner case where the original intervals cannot be condensed then we still would have issues, but not in the majority of production cases.

I agree but it doesn't seem hard to avoid materialization. How about adding a new method condensedIntervalsIterator() which returns an iterator that lazily computes condensed intervals? It could be something like this:

  public static Iterator<Interval> condensedIntervalsIterator(Iterator<Interval> sortedIntervals)
  {
    if (!sortedIntervals.hasNext()) {
      return Iterators.emptyIterator();
    }

    final PeekingIterator<Interval> peekingIterator = Iterators.peekingIterator(sortedIntervals);
    return new Iterator<Interval>()
    {
      @Override
      public boolean hasNext()
      {
        return peekingIterator.hasNext();
      }

      @Override
      public Interval next()
      {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        Interval currInterval = peekingIterator.next();
        while (peekingIterator.hasNext()) {
          Interval next = peekingIterator.peek();

          if (currInterval.abuts(next)) {
            currInterval = new Interval(currInterval.getStart(), next.getEnd());
            peekingIterator.next();
          } else if (currInterval.overlaps(next)) {
            DateTime nextEnd = next.getEnd();
            DateTime currEnd = currInterval.getEnd();
            currInterval = new Interval(
                currInterval.getStart(),
                nextEnd.isAfter(currEnd) ? nextEnd : currEnd
            );
            peekingIterator.next();
          } else {
            break;
          }
        }
        return currInterval;
      }
    };
  }

Then, tryTimeChunkLock() doesn't have to materialize intervals at all after condensing.

  protected boolean tryTimeChunkLock(TaskActionClient client, List<Interval> intervals) throws IOException
  {
    // The given intervals are first converted to align with segment granularity. This is because,
    // when an overwriting task finds a version for a given input row, it expects the interval
    // associated to each version to be equal or larger than the time bucket where the input row falls in.
    // See ParallelIndexSupervisorTask.findVersion().
    final Iterator<Interval> intervalIterator;
    final Granularity segmentGranularity = getSegmentGranularity();
    if (segmentGranularity == null) {
      intervalIterator = JodaUtils.condenseIntervals(intervals).iterator();
    } else {
      IntervalsByGranularity intervalsByGranularity = new IntervalsByGranularity(intervals, segmentGranularity);
      // the following is calling a condense that does not materialize the intervals:
      intervalIterator = JodaUtils.condensedIntervalsIterator(intervalsByGranularity.granularityIntervalsIterator());
    }

    // Intervals are already condensed to avoid creating too many locks.
    // Intervals are also sorted and thus it's safe to compare only the previous interval and current one for dedup.
    Interval prev = null;
    while (intervalIterator.hasNext()) {
      final Interval cur = intervalIterator.next();
      if (prev != null && cur.equals(prev)) {
        continue;
      }
      prev = cur;
      final TaskLock lock = client.submit(new TimeChunkLockTryAcquireAction(TaskLockType.EXCLUSIVE, cur));
      if (lock == null) {
        return false;
      }
    }
    return true;
  }

Contributor:

Hmm, on second thought, I think the current code is good enough. However, please add a comment explaining why it is OK to materialize intervals here; your comment above would be a nice fit.

Author (Jan 26, 2021):

I like your suggestion so I followed it and will avoid materializing them.

Comment thread: core/src/main/java/org/apache/druid/java/util/common/JodaUtils.java (Outdated)
/**
* This method does not materialize the intervals represented by the
* sortedIntervals iterator. However, the caller needs to ensure that sortedIntervals
* is already sorted in ascending order.
Contributor:

Since Interval is not Comparable (it's not because there could be multiple ways to compare them), it would be nice to explicitly state the order should be Comparators.intervalsByStartThenEnd() to make it more clear.

Contributor:

How about adding a sanity check in the method, so that it can fail fast if this expectation isn't met? Otherwise, it will be hard to debug if something goes wrong in the future.

Author:

Done.

intervalSet.addAll(intervals);
this.sortedIntervals = new ArrayList<>(intervals.size());
this.sortedIntervals.addAll(intervalSet);
this.sortedIntervals.sort(Comparators.intervalsByStartThenEnd());
Contributor:

It seems that IntervalIterator would work only when sortedIntervals don't overlap each other because the intervals returned from IntervalIterator may not be sorted otherwise. Is this correct? Then, please add a sanity check after dedup to make sure there is no overlap.

Author (Jan 27, 2021):

Great catch! I will add the sanity check

Contributor:

@jihoonson @loquisgon How can we be sure that the intervals don't overlap each other?
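The sanity check discussed in this thread could look roughly like the sketch below, with {start, end} millisecond pairs standing in for Joda Intervals (the method name and representation are illustrative, not the PR's actual code):

```java
import java.util.List;

// Hypothetical sketch: verify a sorted list of {start, end} intervals is
// disjoint before relying on the "merged iterator stays sorted" invariant.
class IntervalChecks {
  static void checkSortedAndDisjoint(List<long[]> intervals) {
    long[] prev = null;
    for (long[] cur : intervals) {
      // With intervals sorted by start (then end), an overlap shows up as the
      // current start falling before the previous end.
      if (prev != null && cur[0] < prev[1]) {
        throw new IllegalArgumentException(
            "intervals overlap: [" + prev[0] + "," + prev[1] + ") and ["
            + cur[0] + "," + cur[1] + ")");
      }
      prev = cur;
    }
  }
}
```

Because the input is already sorted, a single linear pass comparing each interval only against its predecessor is sufficient; abutting intervals (cur start == prev end) pass the check.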

if (currentIterator.hasNext()) {

// drop all subsequent intervals that are the same as the previous...
while (previous != null && previous.equals(currentIterator.peek())) {
Contributor:

Does the iterator created from granularity.getIterable() ever return the same interval? It doesn't seem like so and this check seems unnecessary.

Author:

I put comments in the code to illustrate when this can happen (also I have unit tests to catch this)

if (sortedIntervals.isEmpty()) {
ite = Collections.emptyIterator();
} else {
ite = new IntervalIterator(sortedIntervals);
Contributor:

Does IntervalIterator just flatten the nested iterators and concatenate them? If so, I suggest using FluentIterable.from(sortedIntervals).transformAndConcat(interval -> granularity.getIterable(interval)) instead, because it's better to use a well-tested library than to reinvent the wheel.

Author:

Done
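For reference, the JDK analog of Guava's transformAndConcat is Stream.flatMap. Below is a minimal sketch of the same flatten-and-concat shape, with intervals as {start, end} pairs and unit-length buckets standing in for granularity buckets (all names here are illustrative, not Druid's):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: expand each sorted interval into its per-unit buckets and
// concatenate the results, using flatMap for the lazy concatenation.
class FlattenSketch {
  static List<long[]> buckets(List<long[]> sortedIntervals) {
    return sortedIntervals.stream()
        .flatMap(iv -> Stream.iterate(iv[0], s -> s + 1) // lazy per-interval expansion
            .limit(iv[1] - iv[0])
            .map(s -> new long[]{s, s + 1}))
        .collect(Collectors.toList()); // collected here only to inspect the result
  }
}
```

The stream pipeline itself is lazy; only the final collect materializes anything, which mirrors why transformAndConcat fit the PR's goal of deferring materialization.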

groupByJob.setOutputFormatClass(SequenceFileOutputFormat.class);
groupByJob.setPartitionerClass(DetermineHashedPartitionsPartitioner.class);
if (!config.getSegmentGranularIntervals().isPresent()) {
if (!config.getSegmentGranularIntervals().iterator().hasNext()) {
Contributor:

This code just wants to know whether inputIntervals are set or not. I suggest adding a new method hasInputIntervals() to GranularitySpec for better readability.

Contributor:

Same comment for the other places that call iterator.hasNext() for this purpose.

Author (Jan 28, 2021):

Rather than adding a new method I decided to use GranularitySpec::inputIntervals().isEmpty(). Done.


interval = maybeInterval.get();
if (!bucketIntervals.get().contains(interval)) {
if (!Iterators.contains(bucketIntervals.iterator(), interval)) {
Contributor:

This method is called whenever subtasks need to allocate a new segment via the supervisor task; as a result, this code is never called in the Overlord. It might be better to do this check without materialization for less memory pressure, but that would degrade ingestion performance. I suggest keeping the current behaviour of materializing all bucket intervals here because I'm not sure what the best way to handle this is yet. We need to think more about how to fix the OOM error in the task as a follow-up.

Author:

I will materialize them, then.

import java.util.Iterator;
import java.util.TreeSet;

public class LookupIntervalBuckets
Contributor:

Please add some javadoc explaining what this class does and what it is for.

Author:

Done

* @return Iterable of all time groups
*/
Optional<SortedSet<Interval>> bucketIntervals();
Iterable<Interval> bucketIntervals();
Contributor:

Suggest sortedBucketIntervals() to make it clear the results are sorted.

Author:

Done

ingestionSchema,
getContext(),
intervalToPartitions
toolbox,
Contributor:

nit: I think this PR is OK, but just FYI, it's usually recommended not to fix code style unless you are modifying that area, because it makes it hard to find the real changes in the PR.

Author:

That was probably an automatic IntelliJ change.

@jihoonson (Contributor):

Forgot to mention one more thing. I think this PR won't fix all OOM errors in the Overlord and the next place where we should fix is likely TaskLockbox that manages all task locks per interval. However, I would say this PR is a good start to look into the OOM errors in the Overlord 🙂

@jihoonson left a review comment:

LGTM. +1 after CI. @loquisgon thank you!

@suneet-s suneet-s merged commit 0e4750b into apache:master Jan 29, 2021
@loquisgon loquisgon deleted the granularity-interval-materialization branch January 29, 2021 16:55
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021