Skip to content

Sequences refactorings and removed unused code (part of #3798)#3693

Merged
fjy merged 6 commits intoapache:masterfrom
metamx:sequences-refactorings
Jan 20, 2017
Merged

Sequences refactorings and removed unused code (part of #3798)#3693
fjy merged 6 commits intoapache:masterfrom
metamx:sequences-refactorings

Conversation

@leventov
Copy link
Copy Markdown
Member

This PR includes several related fixes/refactorings around Sequences subsystem, in the importance order:

  1. Make Sequences.withBaggage() and Sequences.withEffect() to consistently close the baggage and execute the effect after base sequence is closed. Before, it was so in accumulate() path but was the opposite in Yielder path. This in mentioned in Review resource handling in Sequence/Yielder subsystem #3563.
  2. Fixed potential defect in SpecificSegmentQueryRunner, which catches MissingSegmentsException in toYielder() but doesn't catch exception in Yielder.next(). This fix develops the idea of SpecificSegmentQueryRunner misses missing segments from toYielder() #3617. Probably it could never be an issue in practice, because if the segment is missing, MissingSegmentsException should always be thrown from toYielder(). But I'm not sure.
  3. Add Sequences.wrap() which allows to simplify and remove repetitive code from MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner.
  4. Review exception handling in Sequences subsystem
  5. Remove unused classes from io.druid.java.util.common.guava package.
  6. Try to create sequences via factory methods from Sequences class rather than constructing XxxSequence classes directory; reduce visibility of the latter somewhere.

…apache#3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on apache#3617)
@drcrallen drcrallen added this to the 0.9.3 milestone Nov 28, 2016
@drcrallen drcrallen self-assigned this Nov 28, 2016
@gianm gianm added the Discuss label Nov 29, 2016
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 29, 2016

Marking Discuss as confusion was evident on the dev sync about what this is actually fixing or improving in practice. @leventov could you please elaborate on whether this is fixing/improving actual observed issues, actual issues that were unobserved, or things that are not known to be actual issues?

@leventov
Copy link
Copy Markdown
Member Author

@gianm

  1. Negative effects vary from inconsistent CPUTime/Query time measurement (including/not "closing time" of the wrapped sequence) to memory leaks and who knows what effects, because some "sub-resources" are closed after "super-resources", i. e. the resources the former are "derived" from. Taking accumulate() or Yielder path is hardly trackable and undetermined in some sense, because applying e. g. Sequences.limit() on upper level turns accumulate() into Yielder iteration underneath (because it's a YieldingSequenceBase)

  2. It's just breaks general Yielder implementation contract, when "the same thing" should be done in toYielder() and next(). It's hard to say if it is a bug or not right now, (if MissingSegmentsException is always thrown in toYielder(), it is not), but it could become a bug at any point in future.

  3. At the moment it removes some repetitive code. I originally did this because I added more to those code as part of my project of increasing amount and precision of query metrics. And repeating 30 lines of code in three places felt much more dangerous and boilerplate than repeating 10 lines. (This change is not yet in the PR itself, because it's blocked by this PR).

  4. This addresses Review resource handling in Sequence/Yielder subsystem #3563. Attempt to match try-with-resources safety in Sequences. Try-with-resources always catch Throwables and use addSuppressed() when needed. Current code catches either Exception or RuntimeException and logs exceptions which should better be suppressed. Suppression allows to see more actual exceptions in query response, for example. Generally it guarantees that two exceptions ("main" and suppressed) are logged at the same place, wherever it is done. Logging one exception and propagating the another may "unlink" those two in the unified log.

  5. Just remove unused code (merging java-util, extendedset, bytebuffer-collections, etc. opens a lot of such opportunities)

  6. Eases Sequences refactoring in future, and generally static factory methods are considered better than constructors, but really a minor thing (and it's only a couple of lines in this PR)

leventov added a commit to metamx/java-util that referenced this pull request Dec 15, 2016
leventov added a commit to metamx/java-util that referenced this pull request Dec 15, 2016
leventov added a commit to metamx/druid that referenced this pull request Dec 15, 2016
….common.guava package; fix apache#3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on apache#3617)
@leventov leventov changed the title Sequences refactorings and removed unused code Sequences refactorings and removed unused code (part of #3798) Dec 22, 2016
@leventov
Copy link
Copy Markdown
Member Author

@fjy @gianm this is a part of #3798, let it be the justification for this PR

{
if (isDone()) {
boolean done = isDone();
baseYielder.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior here is different if baseYielder.close() throws an exception – the runnable will no longer execute. Is that intentional?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not executing effect if exception is thrown in close() was aligned with behaviour in accumulate(), where exec.execute(effect) was just called after subSequence.accumulate(), without finally. However it seems reasonable to me to execute effect if exception is thrown in close(). To distinguish between exception thrown in close() and during the processing (before the sequence "is done") I had to use YieldingSequenceBase.

Also this whole thing of not executing the effect if the subsequence is not done is questionable. We use Sequences.withEffect() in two places in production. It seems to me that it makes sense to execute the effect in those places even if the subsequence is not "done" (especially considering that Sequences.limit() makes the sequence never done). What do you think?

Copy link
Copy Markdown
Contributor

@gianm gianm Jan 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Sequences.withEffect is only used for populating cache (in CachingClusteredClient and CachingQueryRunner). In that case, I think it's important to avoid populating the cache if the sequence isn't done, since the cache should only contain full result sets.

I'm not sure if there's a good reason that limited sequences have that isDone behavior though. I would think a limited sequence that reaches its limit is "done". But maybe there's something I'm missing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's also a problem with caching whereby serializing for cache is very expensive, but if you allow the cache to be calculated / populated in the background, then things like HLL buffers get recycled too soon. Not sure if such a thing is solvable at this point, but it would be nice if background caching worked right with how query resources are freed.

if (thrown != null) {
builder.setDimension(DruidMetrics.STATUS, "failed");
} else if (!isDone) {
builder.setDimension(DruidMetrics.STATUS, "short");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sketchy to call builder.setDimension here, after emitter.emit(builder.build(...)) has already been called. This will be going in and editing a hashmap that has already been handed off to the emitter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
}
// Use finally block because emitting query time (in `try {}`, above) and the status (in `finally {}`,
// below) are unrelated and we don't want the latter to be skipped if the former thrown any exception.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the Emitter contract is that emitter.emit does not throw unless something is really wrong (like OOME or link error or something). So this shouldn't be a worry.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Dec 24, 2016

Other than those 3 comments, the rest looks good to me.

@gianm gianm removed the Discuss label Dec 24, 2016
@leventov
Copy link
Copy Markdown
Member Author

@gianm thanks, addressed comments.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@leventov leventov changed the title Sequences refactorings and removed unused code (part of #3798) Sequences refactorings and removed unused code (part of #3798) [WIP] Jan 12, 2017
@leventov
Copy link
Copy Markdown
Member Author

@gianm I want also resolve the question with LimitedSequence and isDone() as part of this PR, so marked as WIP

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 12, 2017

ok @leventov. @cheddar do you have insight into this thread: #3693 (comment), & if there's a good reason for why things are the way they are?

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jan 12, 2017 via email

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 12, 2017

@cheddar Just wondering if there's a good reason that limited sequences aren't done when they finish, or if that's a bug. It seems like a bug but maybe there's a reason that they behave the way they do.

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jan 13, 2017

@gianm I'm not sure why you think it wouldn't be done. I see Yielders.done being called, which should be done. Also, remember that the call to close is the user of the Yielder's responsibility, so once the final Yielder is yielded, close() should get called and resources cleaned up.

That said, I also realize you might be talking abotu the isDone() call. A cursory glance at that makes me wonder if it is implemented correctly. I'm not quite sure. It looks like at some point in time interruptYield and the other check were added on top of just "within bounds". I'm not sure those belong in isDone. That said, though, there's nothing like a unit test or two to show the current behavior and see if it lines up or doesn't line up with what we expect. Fwiw, I would expect that the yielder returned on the next() call after the final element to be done().

Also, all of the "baggage" related stuff I would expect to be done in the close() method. I'm not quite sure what the semantic difference is between withEffect and withBaggage, but I would question the semantics of anything that is not done in the close() method or on each iteration.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 13, 2017

Ahhh, I think I see what's going on. I misunderstood the behavior that exists now. I thought that @leventov was saying that limited sequences don't act as done when they hit their limit. They do and that's good. What does happen is that if you do this:

Sequences.limit(
  Sequences.withEffect(baseSequence, runnable, exec),
  threshold
)

Then the "runnable" will not run if the "threshold" is reached before "baseSequence" is exhausted. This is good behavior for what we use withEffect for in production, which is populating the cache. We don't want the cache to populate with an un-exhausted result set.

So I think the behavior is fine as-is and just needs to be documented (since it's not obvious what is going on and why).

@leventov
Copy link
Copy Markdown
Member Author

@gianm @cheddar the problem is #3849. I don't see how it could be easily fixed, so I'm not going to fix this in this PR. At least, this PR shouldn't make the things worse than they used to be. So I'm removing [WIP]

@leventov leventov changed the title Sequences refactorings and removed unused code (part of #3798) [WIP] Sequences refactorings and removed unused code (part of #3798) Jan 13, 2017
@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Jan 13, 2017 via email

@leventov
Copy link
Copy Markdown
Member Author

Could someone please review this? @cheddar @himanshug @jon-wei

@jihoonson
Copy link
Copy Markdown
Contributor

This patch looks good to me.

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Jan 20, 2017

LGTM, 👍

@fjy fjy merged commit af93a8d into apache:master Jan 20, 2017
@leventov leventov deleted the sequences-refactorings branch January 20, 2017 04:09
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
…pache#3693)

* Removing unused code from io.druid.java.util.common.guava package; fix apache#3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on apache#3617)

* Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close()

* Fix strange code in MetricsEmittingQueryRunner

* Add comment on why YieldingSequenceBase is used in Sequences.withEffect()

* Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants