add CachingClusteredClient benchmark, refactor some stuff #8089
Conversation
LGTM overall. Would you fix line 198 of BrokerServerView? It should be
…-client-benchmark
…rStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select
new TopNQueryRunnerFactory(
    new StupidPool<>(
        "TopNQueryRunnerFactory-bufferPool",
        () -> ByteBuffer.allocate(10 * 1024 * 1024)
Please extract this 10 * 1024 * 1024 together with the one above into a single constant if they have to be equal, or into different constants if they don't have to be equal, and add a comment explaining that.
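As a sketch of the suggested extraction (the class and constant names here are hypothetical, not from the Druid codebase):

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;

public class BufferConstants
{
  // Extracted so both pools visibly share the same buffer size; if the pools
  // ever need different sizes, split this into two documented constants.
  public static final int PROCESSING_BUFFER_SIZE = 10 * 1024 * 1024;

  public static Supplier<ByteBuffer> bufferSupplier()
  {
    return () -> ByteBuffer.allocate(PROCESSING_BUFFER_SIZE);
  }
}
```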
.aggregators(new LongSumAggregatorFactory("sumLongSequential", "sumLongSequential"))
.granularity(Granularity.fromString(queryGranularity))
.metric("sumLongSequential")
.threshold(20480)
Please add a comment explaining something about this number, e.g. why it has to be 20480.
I talked to @jihoonson (who wrote the benchmark originally) and we don't believe the number is actually significant; it was just an arbitrarily large number to make the result meaningful. I changed it to 10000 and added a comment about it. This will likely be moved into a benchmark parameter in the future to compare merges for different thresholds, but I will save that until I actually have different merges to compare.
I think I set it to 20480 in #6629, but don't remember where this number came from. I think the threshold could be any number as long as it's not too small, so that parallel merge can still help with merging results.
*/
public interface BinaryFn<Type1, Type2, OutType>
@FunctionalInterface
public interface CombiningFunction<T>
Please use the standard BinaryOperator or add a Javadoc comment justifying the existence of CombiningFunction.
Ah, good point. All of our custom functions gave me tunnel vision and I forgot that Java exists 😅. I don't think the existence of CombiningFunction is really justified since it's just for vanity/cosmetic reasons to clarify what it's doing, which I don't think are necessary.
I have removed this interface with the latest commit to this PR.
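For reference, the same combining shape written against the standard interface (a toy example; the names are illustrative):

```java
import java.util.function.BinaryOperator;

public class MergeFns
{
  // BinaryOperator<T> has the same (T, T) -> T shape as the removed
  // CombiningFunction<T>, so no bespoke interface is needed.
  public static final BinaryOperator<Long> SUM = Long::sum;

  public static <T> T combine(BinaryOperator<T> fn, T lhs, T rhs)
  {
    return fn.apply(lhs, rhs);
  }
}
```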
@@ -40,9 +40,11 @@ public BaseSequence(
public <OutType> OutType accumulate(OutType initValue, final Accumulator<OutType, T> fn)
* Creates an ordering comparator that is used to order results. This ordering function is used in the default
* {@link ResultMergeQueryRunner} provided by {@link QueryToolChest#mergeResults(QueryRunner)}
*/
public Ordering<ResultType> createOrderingFn(Query<ResultType> query)
Please use Comparator or explain in the Javadoc comment why Ordering is used here instead of Comparator.
It's currently producing an Ordering because the CombiningSequence created by ResultMergeQueryRunner takes that instead of a Comparator. However, it looks like the Ordering in CombiningSequence can be swapped to use a regular Comparator, so I think this could likely be safely changed.
I have swapped uses of Ordering with Comparator in CombiningSequence, ResultMergeQueryRunner, the QueryToolChest implementations, and the GroupByStrategy implementations.
public int compare(Result<Object> r1, Result<Object> r2)
{
  return r1.getTimestamp().compareTo(r2.getTimestamp());
(r1, r2) -> r1.getTimestamp().compareTo(r2.getTimestamp()),
Please use Comparator.comparing(Result::getTimestamp)
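A toy version of that suggestion, with a minimal stand-in for Druid's Result type (only the timestamp accessor is modeled):

```java
import java.util.Comparator;

public class TimestampOrdering
{
  // Minimal stand-in for Druid's Result<T>; only the timestamp matters here.
  public static class Result<T>
  {
    private final Long timestamp;

    public Result(Long timestamp)
    {
      this.timestamp = timestamp;
    }

    public Long getTimestamp()
    {
      return timestamp;
    }
  }

  // Equivalent to (r1, r2) -> r1.getTimestamp().compareTo(r2.getTimestamp()),
  // but clearer about what is being compared.
  public static final Comparator<Result<Object>> BY_TIMESTAMP =
      Comparator.comparing(Result::getTimestamp);
}
```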
public class TopNMetricSpecOptimizationsTest
{
  private static final List<AggregatorFactory> aggs = Lists.newArrayList(
Please name static final constants with all caps.
&& !segmentWatcherConfig.getWatchedDataSources().contains(input.rhs.getDataSource())) {
  return false;
}
this.segmentFilter = metadataAndSegment -> {
The type of metadataAndSegment is not obvious; please use (Pair<DruidServerMetadata, DataSegment> metadataAndSegment) -> ...
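A self-contained sketch of that suggestion (Pair, DruidServerMetadata, and DataSegment are simplified stand-ins for the real Druid classes):

```java
import java.util.Set;
import java.util.function.Predicate;

public class SegmentFilters
{
  public static class Pair<L, R>
  {
    public final L lhs;
    public final R rhs;

    public Pair(L lhs, R rhs)
    {
      this.lhs = lhs;
      this.rhs = rhs;
    }
  }

  public static class DruidServerMetadata {}

  public static class DataSegment
  {
    private final String dataSource;

    public DataSegment(String dataSource)
    {
      this.dataSource = dataSource;
    }

    public String getDataSource()
    {
      return dataSource;
    }
  }

  // Spelling out the parameter type makes the lambda self-documenting at the
  // call site, at the cost of a little verbosity.
  public static Predicate<Pair<DruidServerMetadata, DataSegment>> watchedFilter(Set<String> watchedDataSources)
  {
    return (Pair<DruidServerMetadata, DataSegment> metadataAndSegment) ->
        watchedDataSources.contains(metadataAndSegment.rhs.getDataSource());
  }
}
```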
if (arg2 == null) {
  return arg1;
}
private final CombiningFunction<Integer> plus = (arg1, arg2) -> {
Please make this function static. I would call it PLUS_NULLABLE to highlight why it's not just Integer::sum.
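A sketch of what that would look like (the holding class name is hypothetical; the standard BinaryOperator is used for the function type):

```java
import java.util.function.BinaryOperator;

public class NullableMath
{
  // Static because it captures no state; named PLUS_NULLABLE to signal that,
  // unlike Integer::sum, it tolerates a null on either side.
  public static final BinaryOperator<Integer> PLUS_NULLABLE = (arg1, arg2) -> {
    if (arg1 == null) {
      return arg2;
    }
    if (arg2 == null) {
      return arg1;
    }
    return arg1 + arg2;
  };
}
```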
  return arg2;
}
return arg1;
(arg1, arg2) -> {
Could you please extract GuavaUtils.firstNonNull and use a method reference here? The doc for that method should note that Objects.firstNonNull() cannot be used itself because it's one of those methods causing Guava incompatibility (see #6948). Please also add Guava's Objects.firstNonNull() to forbidden-apis.
This pattern is in a lot of places, but this is the only place specifically where it's done to just return one or the other, not as a mechanism to bail early before doing some sort of combine. Is it still worth moving to GuavaUtils, or should it just be a private static method of this class?
Additionally, this method seems to be @Nullable, which gives it a different contract than Objects.firstNonNull, which requires that one of the two arguments not be null. Prohibiting it on this function's behalf doesn't seem necessary, so are we doing it for a reason in #6948?
I went ahead and did this ^ with the latest commits, extracting to GuavaUtils.firstNonNull and adding it to forbidden-apis.
…of Ordering, other review adjustments
 * null.
 */
@Nullable
public static <T> T firstNonNull(T arg1, T arg2)
Please annotate both parameters @Nullable
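A sketch of the extracted helper with that contract spelled out (the @Nullable annotations are shown as comments to keep the example dependency-free):

```java
public class GuavaUtilsSketch
{
  /**
   * Unlike Guava's Objects.firstNonNull, both arguments may be null; the
   * result is null only when both arguments are null.
   */
  /* @Nullable */
  public static <T> T firstNonNull(/* @Nullable */ T arg1, /* @Nullable */ T arg2)
  {
    return arg1 != null ? arg1 : arg2;
  }
}
```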
/**
 * Creates an ordering comparator that is used to order results. This ordering function is used in the default
 * Creates an ordering comparator that is used to order results. This comparator is used in the default
 * {@link ResultMergeQueryRunner} provided by {@link QueryToolChest#mergeResults(QueryRunner)}
 */
public Ordering<ResultType> createOrderingFn(Query<ResultType> query)
public Comparator<ResultType> createComparator(Query<ResultType> query)
Optional: maybe call it createResultComparator.
*/
@Nullable
default Ordering<Row> createOrderingFn(Query<Row> queryParam)
default Comparator<Row> createComparator(Query<Row> queryParam)
I think createResultComparator would be a clearer name for this method
…Null nullable parameters
Thanks for the review @leventov and @jihoonson!
Description
This PR adds a benchmark for CachingClusteredClient and some refactoring of the query processing pipeline to provide the foundation for testing approaches to parallel broker merges. Benchmarks can be run with a command like the following:
Substitute the benchmark cache directory as appropriate. This benchmark could potentially be improved in the future to precompute the results to merge and strictly measure the merge, but for now I am retaining the overall approach of the original from #6629.
Background
I'm having a go at parallel broker merges, making another attempt to achieve the goals of #5913 and #6629, eventually planning to attempt the ForkJoinPool in asyncMode approach suggested by @leventov in this thread. Before that, in order to untangle things a bit, I've taken the benchmarks from #6629 (credit to @jihoonson) and updated/simplified them to take advantage of some of the changes to SegmentGenerator from #6794, allowing a persistent cache for the generated benchmark segments and much faster benchmarking. I've also extracted some of the useful refactorings and got a bit more adventurous. This should help isolate these supporting changes from any future PR which adds parallel merging, reducing review overhead.

Refactoring
CombiningFunction<T>

Added CombiningFunction<T>, a new @FunctionalInterface to replace BinaryFn<Type1, Type2, OutType>, since all actual usages were of the form BinaryFn<T, T, T> and were strictly used in merging sequences/iterators/iterables, etc.

QueryToolChest and ResultMergeQueryRunner

In order to split out the mechanisms useful during merge from the merge implementation, QueryToolChest now has 2 additional functions: one that creates a CombiningFunction and one that creates an Ordering.
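Based on the surrounding review thread, a rough sketch of those two hooks is shown below with simplified types and a toy toolchest for usage; this is an illustration, not the actual Druid API:

```java
import java.util.Comparator;
import java.util.function.BinaryOperator;

public class ToolChestSketch
{
  // Marker interface standing in for Druid's Query<T>.
  public interface Query<T> {}

  public abstract static class QueryToolChest<ResultType>
  {
    // Produces the function used to combine two results during a merge.
    public abstract BinaryOperator<ResultType> createMergeFn(Query<ResultType> query);

    // Produces the comparator used to order results during a merge.
    public abstract Comparator<ResultType> createResultComparator(Query<ResultType> query);
  }

  // Toy toolchest over Long results, for demonstration.
  public static final QueryToolChest<Long> SUMMING = new QueryToolChest<Long>()
  {
    @Override
    public BinaryOperator<Long> createMergeFn(Query<Long> query)
    {
      return Long::sum;
    }

    @Override
    public Comparator<Long> createResultComparator(Query<Long> query)
    {
      return Comparator.naturalOrder();
    }
  };
}
```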
For group-by queries, GroupByStrategy also has these method signatures, since GroupByQueryToolchest delegates these things to the strategy.

These methods are passed into a refactored, non-abstract ResultMergeQueryRunner as function generators that, given a Query, produce either a CombiningFunction or an Ordering, respectively.

ConnectionCountServerSelectorStrategy is now WeightedServerSelectorStrategy (reverted): I did not refactor QueryableDruidServer in quite the same manner as #6629, but I did still modify QueryableDruidServer and QueryRunner to add a getWeight method, as suggested by @drcrallen in this comment thread, to make the selector strategy a bit more generic instead of hard casting QueryRunner to a DirectDruidClient to get the number of connections. I'm sort of unsure about this one; this refactor might have made a bit more sense in the context of the changes to QueryableDruidServer in #6629, but I still think it's maybe worth doing? I reverted this change since it felt sort of artificial at this point, in favor of doing something like this when we actually need it.

Removed
OrderedMergingIterator, OrderedMergingSequence, and SortingMergeIterator have been removed, since they were used strictly by their tests.

This PR has: