Skip to content

Add AggregateCombiners#4676

Merged
drcrallen merged 6 commits intoapache:masterfrom
metamx:metric-combiners
Aug 21, 2017
Merged

Add AggregateCombiners#4676
drcrallen merged 6 commits intoapache:masterfrom
metamx:metric-combiners

Conversation

@leventov
Copy link
Copy Markdown
Member

@leventov leventov commented Aug 10, 2017

AggregateCombiner interface is a replacement of AggregatorFactory.combine() during index merging, in the grand plan of #4622.

Incompatibility of this PR is that it adds AggregateCombiner and AggregatorFactory.makeAggregateCombiner() that throws UnsupportedOperationException by default, and authors of custom aggregators in private extensions need to override this method with an actual implementation to keep their aggregator working for metric rollup during ingestion.

@leventov leventov added this to the 0.11.0 milestone Aug 10, 2017
@leventov leventov changed the title Add MetricCombiners Add AggregateCombiners Aug 10, 2017
@jon-wei jon-wei self-assigned this Aug 11, 2017
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a couple of comments, otherwise code LGTM.

Do you have any examples (e.g., in another branch) of how these combiners would be used?


@Override
public AggregateCombiner makeAggregateCombiner()
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to implement this, or is there something preventing this from being supported?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense, and useless: only metric columns are aggregated during indexing, not the timestamp column.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the timestamp min/max aggs can be used to aggregate columns other than _time:

https://github.com/druid-io/druid/blob/master/docs/content/development/extensions-contrib/time-min-max.md

To use this feature, a "timeMin" or "timeMax" aggregator must be included at indexing time. They can apply to any columns that can be converted to timestamp, which include Long, DateTime, Timestamp, and String types.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added. Though I'm not sure there is a way currently to use it. And if anything, because of #4658 I doubt that anybody used it. (Also fixed that bug)

* object returned from {@link io.druid.segment.ObjectColumnSelector#get()} must not be modified, and must not become
* a subject for modification during subsequent combine() calls.
*
* Since the state of AggregateCombiner is underfined before {@link #reset} is ever called on it, the effects of calling
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

underfined -> undefined

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed


import javax.annotation.Nullable;

public interface ObjectColumnSelector<T> extends ColumnValueSelector
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make ObjectColumnSelector<T extends Number>?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object in ObjectColumnSelector is usually not a Number. It's a number only when numeric columns (long, double, float) are presented as object columns, but there are also "genuinely" object columns (thetaSketch, histogram, etc.)

@leventov
Copy link
Copy Markdown
Member Author

@jon-wei could you please merge this PR?

Copy link
Copy Markdown
Contributor

@drcrallen drcrallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some comments about the use of combine vs fold and its consistency of intention in the druid code.

*
* @see AggregatorFactory#combine
*/
void combine(ColumnValueSelector selector);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere this functionality is called fold, which to me indicates you are eliminating the prior state of this object, which is the behavior here. combine seems to indicate that it may or may not return a new object instead of folding in the other values. Would fold be more appropriate here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically the contract for combine is very different compared to https://github.com/druid-io/druid/pull/4676/files#diff-90e2d51a725b4d59f09e2f8b740b7f37R51, where modifications to lhs and rhs are allowed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to fold()

* @see io.druid.segment.IndexMerger
*/
public AggregateCombiner makeAggregateCombiner()
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to have a default implementation that prevents proprietary aggregators from hard-failing when they upgrade?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marked as Incompatible for this until "how to make it not incompatible" is highlighted some more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to provide a safe default implementation, because delegating to combine() may modify source objects, that keeps the #4672 bug alive and it's why AggregateCombiner prohibits mutation. OTOH we don't know how to clone the input object for any kind of object.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combiners ONLY apply at indexing time currently (or very near future) right? all custom QUERY aggregators would expect to still work?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, AggregateCombiners exist specifically for metric rollup during ingestion

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still incompatible, because aggregators that could have been used during ingestion will not be able to be used after this. The aggregators in question arguably only make sense at query time, but it is still an incompatibility.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the master PR comment with information on what will throw a UOE after this patch?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment. BTW nothing is broken right after this PR, because it just adds functionality, but doesn't use it yet. Actual breakage will happen after N more PRs. However they all target for Druid 0.11.

}
}

public Histogram(Histogram other)
Copy link
Copy Markdown
Contributor

@drcrallen drcrallen Aug 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Other complex aggregators rely solely on copyFrom is there an explicit need for having a constructor-copy method?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As opposed to, for example, implementing clone or just using copyFrom?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of this constructor could be replaced by copyFrom() without adding ugly and questionable code like empty non-sense Histogram object. Replacing with clone() is possible, but I don't see why it's better. "Effective Java" advocates preferring copy constructors over clone().

@drcrallen
Copy link
Copy Markdown
Contributor

Just waiting for TravisCI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants