jon-wei left a comment
Had a couple of comments, otherwise code LGTM.
Do you have any examples (e.g., in another branch) of how these combiners would be used?
@Override
public AggregateCombiner makeAggregateCombiner()
{
Is it possible to implement this, or is there something preventing this from being supported?
It doesn't make sense and would be useless: only metric columns are aggregated during indexing, not the timestamp column.
The timestamp min/max aggs can be used to aggregate columns other than __time:
To use this feature, a "timeMin" or "timeMax" aggregator must be included at indexing time. They can apply to any columns that can be converted to timestamp, which include Long, DateTime, Timestamp, and String types.
OK, I added it, though I'm not sure there is currently a way to use it. If anything, because of #4658 I doubt that anybody has used it. (I also fixed that bug.)
 * object returned from {@link io.druid.segment.ObjectColumnSelector#get()} must not be modified, and must not become
 * a subject for modification during subsequent combine() calls.
 *
 * Since the state of AggregateCombiner is undefined before {@link #reset} is ever called on it, the effects of calling
import javax.annotation.Nullable;

public interface ObjectColumnSelector<T> extends ColumnValueSelector
Could we make ObjectColumnSelector<T extends Number>?
Object in ObjectColumnSelector is usually not a Number. It's a number only when numeric columns (long, double, float) are presented as object columns, but there are also "genuinely" object columns (thetaSketch, histogram, etc.)
…r() implementation
@jon-wei could you please merge this PR?
drcrallen left a comment
Had some comments about the use of combine vs fold and its consistency of intention in the druid code.
 *
 * @see AggregatorFactory#combine
 */
void combine(ColumnValueSelector selector);
Elsewhere this functionality is called fold, which to me indicates you are eliminating the prior state of this object, which is the behavior here. combine seems to indicate that it may or may not return a new object instead of folding in the other values. Would fold be more appropriate here?
Specifically the contract for combine is very different compared to https://github.com/druid-io/druid/pull/4676/files#diff-90e2d51a725b4d59f09e2f8b740b7f37R51, where modifications to lhs and rhs are allowed.
 * @see io.druid.segment.IndexMerger
 */
public AggregateCombiner makeAggregateCombiner()
{
Is there a way to have a default implementation that prevents proprietary aggregators from hard-failing when they upgrade?
Marked this as Incompatible until "how to make it not incompatible" is clarified some more.
It's not possible to provide a safe default implementation: delegating to combine() may modify the source objects, which would keep the #4672 bug alive and is the reason AggregateCombiner prohibits mutation. On the other hand, we don't know how to clone an input object of an arbitrary type.
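To make the non-mutation point concrete, here is a minimal sketch. `Counts`, `CountsCombiner`, and the `Supplier`-based selector are hypothetical stand-ins invented for illustration, not Druid's actual interfaces. The combiner takes a defensive copy in reset() and afterwards mutates only its own copy, which works only because the concrete type (and how to copy it) is known.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical stand-in for a mutable aggregation object (e.g. a sketch or histogram).
final class Counts {
    final long[] buckets;

    Counts(long... buckets) {
        this.buckets = buckets.clone();
    }

    // Copy constructor: the new object shares no state with the original.
    Counts(Counts other) {
        this(other.buckets);
    }

    // Mutating merge, analogous to a combine(lhs, rhs) that is allowed to modify lhs.
    Counts addInPlace(Counts other) {
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] += other.buckets[i];
        }
        return this;
    }
}

// Hypothetical combiner: it owns its state and never mutates what the selector returns.
final class CountsCombiner {
    private Counts state;

    void reset(Supplier<Counts> selector) {
        // Defensive copy; only possible because the concrete type is known here.
        state = new Counts(selector.get());
    }

    void combine(Supplier<Counts> selector) {
        // Mutates only the combiner's own copy, never the selector's object.
        state.addInPlace(selector.get());
    }

    Counts get() {
        return state;
    }
}

final class CountsCombinerDemo {
    public static void main(String[] args) {
        Counts first = new Counts(1, 2);
        Counts second = new Counts(10, 20);

        CountsCombiner combiner = new CountsCombiner();
        combiner.reset(() -> first);
        combiner.combine(() -> second);

        System.out.println(Arrays.toString(combiner.get().buckets));
        System.out.println(Arrays.toString(first.buckets)); // input is left untouched
    }
}
```

A generic default could not do the copy in reset(), since it would not know how to clone an arbitrary object, which is the limitation described above.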
The combiners ONLY apply at indexing time currently (or in the very near future), right? All custom QUERY aggregators would be expected to still work?
Yes, AggregateCombiners exist specifically for metric rollup during ingestion.
This is still incompatible, because aggregators that could have been used during ingestion will not be able to be used after this. The aggregators in question arguably only make sense at query time, but it is still an incompatibility.
Can you update the master PR comment with information on what will throw a UOE after this patch?
Updated the comment. BTW, nothing is broken right after this PR, because it just adds functionality but doesn't use it yet. Actual breakage will happen after N more PRs. However, they all target Druid 0.11.
}
}

public Histogram(Histogram other)
(minor) Other complex aggregators rely solely on copyFrom. Is there an explicit need for a copy constructor?
As opposed to, for example, implementing clone or just using copyFrom?
The use of this constructor could be replaced by copyFrom(), but not without adding ugly and questionable code such as creating an empty, nonsensical Histogram object first. Replacing it with clone() is possible, but I don't see why that would be better. "Effective Java" advocates preferring copy constructors over clone().
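A rough sketch of the trade-off, using a hypothetical `BucketHistogram` whose fields and methods are assumptions made for illustration and are not taken from Druid's Histogram class. The copy constructor produces an independent copy in one step, while copyFrom() first needs a throwaway instance to copy into.

```java
import java.util.Arrays;

// Hypothetical, simplified histogram; not Druid's Histogram.
final class BucketHistogram {
    private final float[] breaks;
    private final long[] bins;

    BucketHistogram(float[] breaks) {
        this.breaks = breaks.clone();
        this.bins = new long[breaks.length + 1];
    }

    // Copy constructor: one step, and final fields are assigned directly.
    BucketHistogram(BucketHistogram other) {
        this.breaks = other.breaks.clone();
        this.bins = other.bins.clone();
    }

    // copyFrom needs an already-constructed target with the same breaks.
    void copyFrom(BucketHistogram other) {
        if (!Arrays.equals(breaks, other.breaks)) {
            throw new IllegalArgumentException("breaks differ");
        }
        System.arraycopy(other.bins, 0, bins, 0, bins.length);
    }

    // Counts the value in the bin delimited by the surrounding breaks.
    void offer(float value) {
        int i = 0;
        while (i < breaks.length && value >= breaks[i]) {
            i++;
        }
        bins[i]++;
    }

    long[] bins() {
        return bins.clone();
    }

    float[] breaks() {
        return breaks.clone();
    }
}

final class BucketHistogramDemo {
    public static void main(String[] args) {
        BucketHistogram original = new BucketHistogram(new float[]{0f, 10f});
        original.offer(5f);

        // Copy constructor: no intermediate object.
        BucketHistogram viaConstructor = new BucketHistogram(original);

        // copyFrom: an empty placeholder instance has to exist first.
        BucketHistogram viaCopyFrom = new BucketHistogram(original.breaks());
        viaCopyFrom.copyFrom(original);

        original.offer(20f); // later changes do not leak into either copy
        System.out.println(Arrays.toString(viaConstructor.bins()));
        System.out.println(Arrays.toString(viaCopyFrom.bins()));
    }
}
```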
Just waiting for TravisCI.
The AggregateCombiner interface is a replacement for AggregatorFactory.combine() during index merging, in the grand plan of #4622.
The incompatibility of this PR is that it adds AggregateCombiner and AggregatorFactory.makeAggregateCombiner(), which throws UnsupportedOperationException by default. Authors of custom aggregators in private extensions need to override this method with an actual implementation to keep their aggregator working for metric rollup during ingestion.
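For extension authors, the shape of the required change might look roughly like the sketch below. `LongCombiner`, `FactorySketch`, `QueryOnlyFactory`, and `LongMaxFactory` are simplified, hypothetical stand-ins; the real AggregateCombiner and AggregatorFactory have different signatures (the combiner reads from a ColumnValueSelector, as in the diff above).

```java
// Hypothetical, simplified stand-in for AggregateCombiner.
interface LongCombiner {
    void reset(long value);

    void combine(long value);

    long getLong();
}

// Hypothetical, simplified stand-in for AggregatorFactory.
abstract class FactorySketch {
    // Mirrors the default described above: fail loudly unless overridden.
    LongCombiner makeAggregateCombiner() {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not implement makeAggregateCombiner()");
    }
}

// A custom aggregator that was not updated: it cannot take part in rollup.
final class QueryOnlyFactory extends FactorySketch {
}

// A custom aggregator that overrides the method and keeps working for rollup.
final class LongMaxFactory extends FactorySketch {
    @Override
    LongCombiner makeAggregateCombiner() {
        return new LongCombiner() {
            private long max;

            @Override
            public void reset(long value) {
                max = value;
            }

            @Override
            public void combine(long value) {
                max = Math.max(max, value);
            }

            @Override
            public long getLong() {
                return max;
            }
        };
    }
}

final class FactorySketchDemo {
    public static void main(String[] args) {
        LongCombiner combiner = new LongMaxFactory().makeAggregateCombiner();
        combiner.reset(3);
        combiner.combine(7);
        combiner.combine(5);
        System.out.println(combiner.getLong());

        try {
            new QueryOnlyFactory().makeAggregateCombiner();
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```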