Skip to content

Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798)#3889

Merged
himanshug merged 30 commits intoapache:masterfrom
metamx:monomorphic-processing
Mar 17, 2017
Merged

Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798)#3889
himanshug merged 30 commits intoapache:masterfrom
metamx:monomorphic-processing

Conversation

@leventov
Copy link
Copy Markdown
Member

@leventov leventov commented Jan 28, 2017

  • Monomorphic processing: add HotLoopCallee, @CalledFromHotLoop, RuntimeShapeInspector, StringRuntimeShape, SpecializationService
  • Specialized topN queries with 1 or 2 aggregators
  • Added Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing, to avoid deoptimization of compiled code.
  • Made some stateless classes singletons, such as NoopAggregator and NoopBufferAggregator
  • Optimization of DistinctCountBufferAggregator: use Int2ObjectOpenHashMap instead of HashMap<Integer, ..>
  • Removed several unused classes

The key part of #3798.

This PR has minor intersection with #3883 (EmptyIntIterator added), but they can be merged in any order.

…ShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing.
@leventov leventov force-pushed the monomorphic-processing branch from f239fb0 to 76093b6 Compare February 1, 2017 22:57
@jihoonson
Copy link
Copy Markdown
Contributor

@leventov, nice work. I looked over the patch and am still reviewing.
So far, the patch looks good and I have only one question. How large is the runtime inspection overhead?

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Feb 2, 2017

@jihoonson thank you.

Runtime inspection overhead is about 150 ns per PooledTopNAlgorithm.scanAndAggregate() call.

public IntSet.IntIterator clone()
{
return invertedIndex.get(currRow);
return new EmptyIntIterator();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should clone return this, rather than allocating a new instance? It seems like the original intent was for EmptyIntIterator.INSTANCE to be the only object of its class.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it because Object.clone() contract implies that a new object is returned. I think this is not very harmful. This could be fixed later, if IntIterator is refactored to have a custom copy() method instead of inheriting Object.clone().

{
private final String name;
private final List<ColumnSelectorPlus<CardinalityAggregatorColumnSelectorStrategy>> selectorPlusList;
private final ColumnSelectorPlus<CardinalityAggregatorColumnSelectorStrategy>[] selectorPluses;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some places, you replace a selectorPlusList with an array (for performance?). Would it make sense to make the same change elsewhere? E.g., it looks like CardinalityAggregatorFactory specifically converts an array to a List, then passes the List to CardinalityAggregator, which just copies the List into a new array.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced List with array because RuntimeShapeInspector accepts only arrays as series of objects, not lists. Partially this is made because RuntimeShapeInspector may need to accept List as a "singular" field (e. g. to distinguish ArrayList from Collections.singletonList() or Guava's ImmutableList). Partially to enforce the conversion like made in CardinalityAggregator and CardinalityBufferAggregator, yes, for performance.

Removed unnecessary array -> list -> array conversion in CardinalityAggregatorFactory.

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Feb 7, 2017

@jeffs thanks for review. Also regarding @CalledFromHotLoop - decided to annotate only interface methods, not implementation methods, because IntelliJ doesn't copy the annotation automatically and it's too tedious to insert it manually everywhere.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leventov, 150 ns per call sounds reasonable.
I left some comments, and it would be great if the class specialization is tested as well. Maybe TopNQueryRunnerTest can be run with/without the class specialization.

{
if (prototypeClassBytecode == null) {
ClassLoader cl = prototypeClass.getClassLoader();
InputStream prototypeClassBytecodeStream = cl.getResourceAsStream(prototypeClassBytecodeName + ".class");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to close the stream.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
UNSAFE = (Unsafe) theUnsafe.get(null);
} catch (Exception e) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch block needs to be moved to one line below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, a catch block should start on a new line. Please find <option name="CATCH_ON_NEW_LINE" value="true" /> at here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson thanks, fixed

@leventov leventov added the WIP label Feb 7, 2017
@leventov
Copy link
Copy Markdown
Member Author

leventov commented Feb 7, 2017

Going to add some tests.

@leventov leventov removed the WIP label Feb 8, 2017
@leventov
Copy link
Copy Markdown
Member Author

leventov commented Feb 8, 2017

@jihoonson addressed comments, testing specialization now.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leventov, thanks for the update. I left one more comment. Everything other looks good.

.aggregators(Lists.<AggregatorFactory>newArrayList(QueryRunnerTestHelper.rowsCount))
.aggregators(duplicateAggregators(
QueryRunnerTestHelper.rowsCount,
new CountAggregatorFactory("rows1", "rows")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think simply changing its name from rows to rows1 should work. The changes in CountAggregatorFactory are not necessary.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation, reverted back.

* to specialize class for the specific runtimeShape. The default value is chosen to be so that the specialized
* class will likely be compiled with C2 HotSpot compiler with the default values of *BackEdgeThreshold options.
*/
private static final long triggerSpecializationIterationsThreshold =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is type long when we are reading an int for sure?

also generally system properties are not used for configuration in druid? can you explain why they were necessary e.g. fakeSpecialize too ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or, more importantly, do users need to tune it?

Copy link
Copy Markdown
Member Author

@leventov leventov Mar 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed type of triggerSpecializationIterationsThreshold to int.

SpecializationService is not an established thing, so it's quite possible that when this code is battle-tested more, the design of SpecializationService will change so that triggerSpecializationIterationsThreshold doesn't make sense anymore or should have different semantics. So I don't want to expose it as an "official" configuration yet, because it will impose compatibility constrains.

Copy link
Copy Markdown
Member Author

@leventov leventov Mar 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fakeSpecialize is needed exclusively in Druid development, it allows to analyze generated assembly with JITWatch. Users don't need to touch it.

}
}

private long addAndGetTotalIterations(long newIterations)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add a comment, that it does so for last one hour... took me a bit to figure out

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

!Boolean.getBoolean("dontSpecializeGeneric1AggPooledTopN");
@VisibleForTesting
static boolean specializeGeneric2AggPooledTopN =
!Boolean.getBoolean("dontSpecializeGeneric2AggPooledTopN");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you intend these to be user configurable, in that case they should be part of TopNQueryConfig really

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not for users, it's for performance comparison in benchmarks

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Mar 9, 2017

@himanshug any other comments?

@himanshug
Copy link
Copy Markdown
Contributor

himanshug commented Mar 10, 2017

@leventov LGTM aside from #3889 (review)

However, this PR updates BufferAggregator to implement HotLoopCallee , BufferAggregator is a druid extension point and many users have custom aggregator extensions which will break with this. Given that change, this can only be released in 0.10.0 or 0.11.0
given that, we haven't released 0.10.0 yet, so we can try and pull this one into 0.10.0 milestone. @gianm any objections ?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 10, 2017

I think this is too big to pull into 0.10.0, we branched it off already and the branch should get bug fixes only. We could jump directly to 0.11.0 if we need to I guess.

Although, would this really be incompatible with existing extension jars? HotLoopCallee has no methods, so does adding it to BufferAggregator require extensions to be recompiled?

@leventov
Copy link
Copy Markdown
Member Author

@gianm HotLoopCallee has method inspectRuntimeShape().

Another way is to move to Java 8 in 0.10.1 and provide a default implementation of inspectRuntimeShape() in BufferAggregator.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 11, 2017

Oh, I missed that. A default implementation sounds good if that means an extension compiled for 0.10.0 would work in 0.10.1 (sorry, I'm not that familiar with what interface changes will and will not require recompiles).

@leventov
Copy link
Copy Markdown
Member Author

@gianm

https://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html#jls-13.5.6

In other words, adding a default method is a binary-compatible change because it does not introduce errors at link time

@drcrallen
Copy link
Copy Markdown
Contributor

@leventov prior conversations were about requiring a java8 JVM, not about compiling all of druid for java 8 target. Requiring a java8 target for all code would need community discussion.

@leventov
Copy link
Copy Markdown
Member Author

@drcrallen why compiling Druid for Java 8 target is more risky, than requiring Java 8 JVM?

@leventov
Copy link
Copy Markdown
Member Author

@himanshug now BufferAggregator.inspectRuntimeShape() has default implementation.

@leventov
Copy link
Copy Markdown
Member Author

@himanshug could this PR be merged now?

@himanshug
Copy link
Copy Markdown
Contributor

👍

if (existingValue != null) {
existingValue.addAndGet(newIterations);
}
perMinuteIterations.computeIfAbsent(currentMinute, AtomicLong::new).addAndGet(newIterations);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line actually contains bug, fixed here: 37b04c3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants