QueryMetrics: abstraction layer of query metrics emitting (part of #3798) #3954
drcrallen merged 20 commits into apache:master from
Conversation
| @Override | ||
| public void remoteAddress(String remoteAddress) | ||
| { | ||
| builder.setDimension("remoteAddress", remoteAddress); |
| public class DefaultQueryMetricsFactory implements QueryMetricsFactory | ||
| { | ||
| private static final DefaultQueryMetricsFactory INSTANCE = new DefaultQueryMetricsFactory(new ObjectMapper()); |
@nishantmonu51 could you please review this?
drcrallen left a comment
Approach looks great overall. Needs unit tests for the new classes.
| * to emit new dimension or metric, and the user handles them manually: if the new dimension or metric is useful and not | ||
| * very expensive to process and store then emit, skip (see above Goals, 1.) otherwise. | ||
| * | ||
| * <p>QueryMetrics is designed for use from a single thread, implementations shouldn't care about thread-safety. |
How true is this for something like an emitter with a queue/actor kind of setup? If the object reference is passed to another thread, without thread safety mechanisms, how can you guarantee the fields are set?
QueryMetrics objects exist within query execution by design. All queries are executed in a single thread, except GroupBy AFAIK. If GroupBy wants to report metrics from code that is executed in parallel, it should transmit this information to the primary thread assigned to execute the GroupBy query, gather it there, and report via the QueryMetrics object which is used only within that thread. I.e. it should be the headache of GroupBy query implementors, not QueryMetrics implementors. Methods of GroupByQueryMetrics should be designed accordingly.
I don't understand how this is related to the nature of the emitter.
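The gather-on-the-primary-thread pattern described in this comment can be sketched roughly as follows. All names here are hypothetical (this is not Druid code); the point is that workers write through a thread-safe accumulator and only the primary thread, after joining the workers, touches the single-threaded metrics object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: worker threads of a parallel GroupBy-style query
// accumulate a counter through a thread-safe channel; only the primary query
// thread reads the total and would report it via its QueryMetrics object.
public class PrimaryThreadReporting
{
  static long gatherRowsScanned(int workers, long rowsPerWorker) throws Exception
  {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    LongAdder rowsScanned = new LongAdder(); // thread-safe accumulator
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < workers; i++) {
      futures.add(pool.submit(() -> rowsScanned.add(rowsPerWorker)));
    }
    for (Future<?> f : futures) {
      f.get(); // Future.get() establishes happens-before: worker writes are visible
    }
    pool.shutdown();
    // From here on, only the primary thread touches the (non-thread-safe)
    // metrics object, so no synchronization is needed inside QueryMetrics.
    return rowsScanned.sum();
  }

  public static void main(String[] args) throws Exception
  {
    System.out.println("rows scanned: " + gatherRowsScanned(4, 1000));
  }
}
```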
I admit this is getting into the realm where my JVM memory consistency guarantees get a little funny.
Here a queue is appended to, to be operated on by a runnable in a parallel thread.
Object value consistency is kept because the emit method runs in the same thread as the serialization, at https://github.com/metamx/emitter/blob/master/src/main/java/com/metamx/emitter/core/HttpPostEmitter.java#L156
The emitting is then done in a separate thread, as per
https://github.com/metamx/emitter/blob/master/src/main/java/com/metamx/emitter/core/HttpPostEmitter.java#L259
If the emitting object were passed instead of the byte array, I think the synchronization would be enough to keep memory consistency between the thread calling emit and the thread executing EmittingRunnable.
Such a thing could be very easy to screw up, though.
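The serialize-then-hand-off discipline discussed in this thread can be illustrated with a small sketch. This is not the metamx emitter code, just a toy shaped like it: the mutable event is turned into an immutable byte[] on the calling thread, and only the bytes cross the thread boundary; the queue's put/take pair supplies the happens-before edge.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy illustration of the HttpPostEmitter pattern described above: serialize
// on the caller's thread, hand only immutable bytes to the emitting thread.
public class SerializeThenHandOff
{
  static final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);

  static void emit(StringBuilder mutableEvent)
  {
    // Serialization happens on the caller's thread (like HttpPostEmitter.emit),
    // so the consumer can never observe a half-built event.
    queue.add(mutableEvent.toString().getBytes(StandardCharsets.UTF_8));
  }

  static String consumeOnOtherThread() throws InterruptedException
  {
    final String[] result = new String[1];
    Thread consumer = new Thread(() -> {
      try {
        // BlockingQueue.take() happens-after the corresponding add().
        result[0] = new String(queue.take(), StandardCharsets.UTF_8);
      } catch (InterruptedException ignored) {
      }
    });
    consumer.start();
    consumer.join();
    return result[0];
  }

  public static void main(String[] args) throws InterruptedException
  {
    StringBuilder event = new StringBuilder("query/time=42");
    emit(event);
    event.setLength(0); // the caller may freely mutate the object afterwards
    System.out.println(consumeOnOtherThread());
  }
}
```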
@drcrallen added tests
| * metrics are actually emitted, what dimensions do they have, etc. | ||
| * | ||
| * | ||
| * Goals of QueryMetrics |
When reading these docs, I wondered, who is supposed to have the kinds of control you mention?
- The Druid project itself -- is this a tool to help us write code more easily?
- The operator of a Druid cluster -- is this a tool to help Druid cluster operators control metrics emitted by built-in query types? If so, how is this done: through config, through writing a site-specific extension, or through forking Druid?
- Authors of extensions involving new Druid query types -- is this a tool to help implement metrics emission for new queries?
As far as I can tell from reading the rest, it's intended to provide control to the operator of a Druid cluster, who would exercise that control through a site-specific extension adding a QueryMetricsFactory impl.
| * | ||
| * Implementors expectations | ||
| * ------------------------- | ||
| * QueryMetrics is expected to be changed often, in every Druid release. Users who create their custom implementations |
I guess this means even a patch release could change QueryMetrics, so this isn't a "stable" extension point. That's fine by me, just want to be clear.
| * a topN, groupBy or timeseries query). This method must call {@link QueryMetrics#query(Query)} with the given query | ||
| * on the created QueryMetrics object before returning. | ||
| */ | ||
| QueryMetrics<Query<?>> makeMetrics(Query<?> query); |
I guess this means extensions that add new query types (like scan for example) can't define a custom QueryMetrics implementation. What would they do if they want to emit new dimensions? For example what if scan wanted to do one for the number of columns scanned?
They can return their custom QueryMetrics subclass for the specific query type and cast the returned object to that subclass in the corresponding QueryToolChest. It is not as beautiful as for core query types (topN, groupBy and timeseries), but should still be possible.
Changed Javadoc a bit to make it "possible".
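The return-generic-then-cast approach suggested here can be sketched with minimal stand-in types (none of these are the real Druid classes): the factory returns the generic QueryMetrics type, and the query's QueryToolChest casts it back to the known subclass to record extension-specific dimensions.

```java
// Minimal stubs shaped like the Druid interfaces, for illustration only.
interface Query {}

class ScanQuery implements Query {}

interface QueryMetrics<T extends Query>
{
  void query(T query);
}

class ScanQueryMetrics implements QueryMetrics<ScanQuery>
{
  int numColumns;

  @Override public void query(ScanQuery query) {}

  // Hypothetical extension-specific dimension, e.g. number of columns scanned.
  void numColumns(int n) { this.numColumns = n; }
}

class ScanQueryMetricsFactoryExample
{
  // The factory's signature only mentions the generic type...
  static QueryMetrics<ScanQuery> makeMetrics(ScanQuery query)
  {
    ScanQueryMetrics metrics = new ScanQueryMetrics();
    metrics.query(query);
    return metrics;
  }

  public static void main(String[] args)
  {
    QueryMetrics<ScanQuery> generic = makeMetrics(new ScanQuery());
    // ...and the toolchest casts back to the subclass it knows it created.
    ScanQueryMetrics scanMetrics = (ScanQueryMetrics) generic;
    scanMetrics.numColumns(7);
    System.out.println("numColumns=" + scanMetrics.numColumns);
  }
}
```

As the comment notes, this works but is less elegant than the dedicated interfaces the core query types get.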
@leventov how would that work, in the scan query example (druid-scan extension)? The javadoc now says:
Public extensions (belonging to the Druid source tree) should use the injected {@link QueryMetricsFactory} object for creating the QueryMetrics object returned from this method.
I guess this means they need to call QueryMetrics.makeMetrics(query) at some point. Then, to add some custom dimensions (like numColumns for example) would they call userDimensions(...) on that QueryMetrics object, and return it?
If someone wants to use the same dimensions for the scan query but emit them differently, i.e. more or less verbosely than the default, they should implement their own QueryMetricsFactory with makeMetrics() similar to:

```java
QueryMetrics makeMetrics(Query q)
{
  QueryMetrics metrics = q instanceof ScanQuery ? new CustomScanQueryMetrics() : new DefaultQueryMetrics();
  metrics.query(q);
  return metrics;
}
```
If ScanQuery wants to emit unique dimensions, I think it needs to make its own subclass of QueryMetrics, the same way topN, groupBy and timeseries already do. Since it's an extension, I don't think it should add a new method to QueryMetricsFactory.
On second thought, requiring extensions to inject through the general QueryMetricsFactory seems unnecessarily complex and creates undesired dependencies between components. We may require extensions to provide their own means of injecting specialized implementations of their QueryMetrics subclasses. I.e. if ScanQuery needs ScanQueryMetrics, it should also make an injectable ScanQueryMetricsFactory and allow users to override it. What do you think?
Regarding userDimensions(), it was actually an artifact of the old string-based approach; removed it.
I guess the challenge here is that we want:
- Query-providing extension authors should be able to write new query types and emit metrics with custom dimensions.
- Operators should be able to customize emitted metrics, and their dimensions, even for query types that come from extensions.
- Query-providing extensions can't depend on QueryMetricsFactory-providing extensions, since the latter is site specific.
How about getting rid of QueryMetricsFactory and instead adding one for each query type, like TopNQueryMetricsFactory, GroupByQueryMetricsFactory, ScanQueryMetricsFactory, etc.?
Then Query-providing extensions in public Druid should also provide an appropriate XQueryMetricsFactory extension point.
And site-specific extensions that customize metrics would provide impls for any kinds of QueryMetricsFactories they want to override.
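The per-query-type factory idea proposed here can be sketched with hypothetical stand-in types (not the real Druid interfaces): each query type ships its own factory interface plus a default implementation, and a site-specific extension re-binds the factory to its own implementation.

```java
// Hypothetical stand-ins for the proposed per-query-type extension point.
interface TopNQueryMetrics
{
  TopNQueryMetrics queryTime(long timeNs); // chainable, as in the PR
}

interface TopNQueryMetricsFactory
{
  TopNQueryMetrics makeMetrics();
}

// Default impl shipped with the query-providing code.
class DefaultTopNQueryMetrics implements TopNQueryMetrics
{
  long timeNs = -1;

  @Override
  public TopNQueryMetrics queryTime(long timeNs)
  {
    this.timeNs = timeNs;
    return this;
  }
}

// A site-specific extension that drops the query-time metric entirely.
class SilentTopNQueryMetrics implements TopNQueryMetrics
{
  @Override
  public TopNQueryMetrics queryTime(long timeNs)
  {
    return this; // deliberately empty body: the metric is never emitted
  }
}

class FactoryBindingExample
{
  public static void main(String[] args)
  {
    // In real Druid this binding would happen via Guice in the extension module.
    TopNQueryMetricsFactory factory = SilentTopNQueryMetrics::new;
    TopNQueryMetrics metrics = factory.makeMetrics().queryTime(1_000_000L);
    System.out.println("bound impl: " + metrics.getClass().getSimpleName());
  }
}
```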
I don't like the idea of adding XxxQueryMetricsFactory for each existing query type up front, while they don't yet need a custom QueryMetrics subtype. Added dedicated factories for topN, groupBy and timeseries.
For total compatibility in all directions a complex plan for moving from generic QueryMetrics to a query-specific subtype should be followed. I've put this plan in QueryMetrics javadocs, "Making subinterfaces of QueryMetrics" section. Please review this plan.
| * @return A QueryMetrics that can be used to make metrics for the provided query | ||
| */ | ||
| public abstract ServiceMetricEvent.Builder makeMetricBuilder(QueryType query); | ||
| public abstract QueryMetrics<? super QueryType> makeMetrics(QueryType query); |
Is it expected that any extensions adding query types should use an injected QueryMetricsFactory to create this instance, rather than do it from scratch?
I think the answer is yes for public extensions (either "core" or "contrib"). Private extensions could do whatever they want. Reflected that in Javadocs.
| /** | ||
| * Registers "query time" metric. | ||
| */ | ||
| QueryMetrics<QueryType> queryTime(long timeNs); |
I feel like these should have some extra word on them so they don't "look" the same as the other metrics. I recognize they can chain, and that's a difference, but they still look similar to me. Maybe reportQueryTime instead of queryTime?
@gianm addressed comments, thanks for review
@leventov, that justification makes sense. This patch looks good to me.
@drcrallen do you wish to review the changes that I made since you approved this PR?
| public class DefaultGenericQueryMetricsFactory implements GenericQueryMetricsFactory | ||
| { | ||
| private static final DefaultGenericQueryMetricsFactory INSTANCE = new DefaultGenericQueryMetricsFactory(new DefaultObjectMapper()); |
Is it possible to inject the jsonMapper as done in QueryResource?
It probably doesn't matter much, since the ObjectMapper is only used to serialize the query context as JSON. There won't be anything in there that requires fancy Jackson modules to be loaded.
Maybe worth doing if it's easy, just for cleanliness. But I think it's not essential.
Agreed with @gianm, not a blocker, though it should be straightforward to make it injectable.
Will leave it to the author to handle this or not.
@nishantmonu51 I think I inject the ObjectMapper a few lines below. This static final instance is used only in tests, directly or indirectly: it fills the QueryMetricsFactory constructor parameters of some QueryToolChests, and those shorter constructors of QueryToolChests are used in tests.
Or should I somehow inject it into the static final field?
Cool. No, we don't need to inject that in tests.
If it's only used in tests, can we move it to test code then, some Util class?
If not, adding a comment here that this is supposed to be used only in tests would also be good.
@nishantmonu51 added comments and @VisibleForTesting annotations.
nishantmonu51 left a comment
Looks good to me; one minor comment about injecting the jsonMapper.
@nishantmonu51 reviewed it and @gianm reviewed it, so I'm good
DirectDruidClient: Fix division by zero. Introduced in apache#3954 when some floating math was changed to integer math. This patch restores the old math.
* Added comment.
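The class of regression fixed here is worth illustrating. The formula below is a toy, not DirectDruidClient's actual code: switching from floating-point to integer math can truncate a sub-second duration to a literal zero denominator.

```java
// Toy reproduction of a floating-to-integer-math regression: a sub-second
// nanosecond duration truncates to 0 seconds in integer math, so the
// subsequent division throws ArithmeticException.
public class IntegerDivisionPitfall
{
  static long bytesPerSecondInteger(long bytes, long nanos)
  {
    // nanos / 1_000_000_000 truncates to 0 for any duration under one second.
    return bytes / (nanos / 1_000_000_000L);
  }

  static long bytesPerSecondFloating(long bytes, long nanos)
  {
    // The floating-point version keeps the fractional seconds, so it is safe.
    return (long) (bytes / (nanos / 1_000_000_000.0));
  }

  public static void main(String[] args)
  {
    long bytes = 1024;
    long nanos = 500_000_000; // half a second
    System.out.println("floating: " + bytesPerSecondFloating(bytes, nanos));
    try {
      bytesPerSecondInteger(bytes, nanos);
    } catch (ArithmeticException e) {
      System.out.println("integer: " + e.getMessage());
    }
  }
}
```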
@leventov Getting this exception while running
Got similar exception with
@pjain1 could you try with the attached patch?
Problems
The metrics which Druid emits for each individual query are a sensitive area. If a lot of queries are run, emitting more metrics or adding dimensions to existing metrics could exhaust the capacity of the system which consumes the metrics of the Druid cluster (Graphite, another smaller Druid cluster, etc.). Or it could simply be too expensive to store a lot of (unused) metrics data.
Evolution of metrics (changing dimension names, the time unit of emitted time metrics, etc.) is hard or impossible, because existing users may depend on specific dimension names and expect specific time units for time metrics in their monitoring systems. In fact, this is part of Druid's public interface.
Solution
QueryMetrics is an interface which abstracts accumulating dimensions in ServiceMetricEvent.Builder and emitting to ServiceEmitter. There are separate methods for each dimension added and each metric emitted. Custom implementations of QueryMetrics may drop unused dimensions or skip emitting unused metrics by overriding the corresponding methods with empty bodies. QueryMetrics also allows control over the actual dimension and metric names, and the precision (time units) of metric values. See the Javadoc of QueryMetrics for more details.

There are specializations for topN, groupBy and timeseries queries: TopNQueryMetrics, GroupByQueryMetrics and TimeseriesQueryMetrics. More query types could have specialized versions of QueryMetrics, if needed.

The generic QueryMetrics and the specializations are injected using QueryMetricsFactory in QueryToolChestModule.

Evolution of metrics

Druid core may add hooks for adding more dimensions to existing query metrics and for emitting more metrics (i.e. add more methods to the QueryMetrics interface or one of its specializations), but leave the bodies of those methods empty in the default implementations of QueryMetrics, to ensure that when users update Druid they don't suddenly get more unexpected metrics. Users who need those additional dimensions or metrics implement their custom QueryMetrics and QueryMetricsFactory and inject them in their private Druid extension.

When more methods are added to the QueryMetrics interface in future releases, the build of private extensions (custom QueryMetrics impls) breaks because of the unimplemented methods. This forces users to resolve the problem manually and handle all new potential dimensions and metrics emitted by Druid core.

A part of #3798
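The customization mechanism described in the Solution section can be sketched with simplified stand-in classes (not the real DefaultQueryMetrics or QueryMetricsFactory): a site extension subclasses the default metrics implementation and blanks out the dimensions it does not want to pay for.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a default QueryMetrics impl: one chainable method
// per dimension, accumulating into a map instead of ServiceMetricEvent.Builder.
class SimpleQueryMetrics
{
  final Map<String, String> dims = new HashMap<>();

  SimpleQueryMetrics remoteAddress(String remoteAddress)
  {
    dims.put("remoteAddress", remoteAddress);
    return this;
  }

  SimpleQueryMetrics queryId(String queryId)
  {
    dims.put("id", queryId);
    return this;
  }
}

// Operator's site-specific extension: drop the high-cardinality
// remoteAddress dimension by overriding the method with an empty body.
class TrimmedQueryMetrics extends SimpleQueryMetrics
{
  @Override
  SimpleQueryMetrics remoteAddress(String remoteAddress)
  {
    return this; // deliberately empty: the dimension is never recorded
  }
}

class TrimmedMetricsDemo
{
  public static void main(String[] args)
  {
    SimpleQueryMetrics metrics = new TrimmedQueryMetrics()
        .remoteAddress("10.0.0.1")
        .queryId("abc");
    System.out.println(metrics.dims); // only the "id" dimension survives
  }
}
```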