granularity method in QueryMetrics. #4570
leventov
left a comment
Make this method similar to QueryMetrics.interval() and QueryMetrics.duration():
- Move its declaration(s) next to those methods
- Call it next to the calls of those methods
- Accept QueryType and return void
- No default implementation in the QueryMetrics interface itself. It's the point of QueryMetrics to not have default methods - see its Javadoc.
Also, Granularity.toString() should be more readable than JSON form
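A minimal sketch of the shape requested above. It uses simplified stand-in types: `SketchGranularity`, `SketchQuery`, `SketchQueryMetrics` and `RecordingQueryMetrics` are hypothetical names for illustration, not Druid's real classes or signatures. It shows `granularity()` declared next to `interval()` and `duration()`, taking the query and returning `void`, with no `default` body in the interface, and a granularity whose `toString()` is readable rather than JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a granularity with a readable toString().
final class SketchGranularity {
  private final String name;   // readable form, e.g. "day"
  private final String period; // ISO-8601 period, e.g. "P1D"

  SketchGranularity(String name, String period) {
    this.name = name;
    this.period = period;
  }

  // JSON-like form, shown only for contrast with toString().
  String toJson() {
    return "{\"type\":\"period\",\"period\":\"" + period + "\"}";
  }

  @Override
  public String toString() {
    return name;
  }
}

// Hypothetical stand-in for a query that carries a granularity.
final class SketchQuery {
  final String interval;
  final long durationMillis;
  final SketchGranularity granularity;

  SketchQuery(String interval, long durationMillis, SketchGranularity granularity) {
    this.interval = interval;
    this.durationMillis = durationMillis;
    this.granularity = granularity;
  }
}

// The three methods sit together; none has a default body in the interface.
interface SketchQueryMetrics {
  void interval(SketchQuery query);

  void duration(SketchQuery query);

  void granularity(SketchQuery query);
}

// Illustrative implementation that records dimensions in a map.
final class RecordingQueryMetrics implements SketchQueryMetrics {
  final Map<String, String> dimensions = new LinkedHashMap<>();

  // granularity() is called right next to interval() and duration().
  void query(SketchQuery query) {
    interval(query);
    duration(query);
    granularity(query);
  }

  @Override
  public void interval(SketchQuery query) {
    dimensions.put("interval", query.interval);
  }

  @Override
  public void duration(SketchQuery query) {
    dimensions.put("duration", Long.toString(query.durationMillis));
  }

  @Override
  public void granularity(SketchQuery query) {
    dimensions.put("granularity", query.granularity.toString());
  }
}
```

With a granularity constructed as `new SketchGranularity("day", "P1D")`, the recorded dimension value is the readable name rather than the JSON-like string.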
|
Thanks @leventov, added |
- Why haven't you added granularity() to root QueryMetrics/DefaultQueryMetrics? That would avoid duplicating it 5 times and would also remove the need to introduce SearchQueryMetrics and SelectQueryMetrics.
- Methods added to QueryMetrics after the introduction of this abstraction should have empty default implementations that emit nothing. See #3954 description, "Evolution of metrics" section. An example of such a later addition is #4284, note how it adds empty default implementations: https://github.com/druid-io/druid/pull/4284/files#diff-25e10cfecf5e9bf9ba50838f8b7cd1caR177. To actually emit granularity in your case, you should also create your private implementation of the corresponding QueryMetrics in your private extensions and configure it, see #4336.
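The evolution rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Druid's actual code: `EvolvingQueryMetrics`, `DefaultEvolvingQueryMetrics`, `GranularityEmittingQueryMetrics` and `CoreCallSite` are hypothetical names, and the real wiring of a custom implementation (see #4336) is not shown. Core code always calls `granularity()`, the default implementation emits nothing, and a private implementation overrides it to actually emit the dimension.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified metrics abstraction.
interface EvolvingQueryMetrics {
  void dataSource(String dataSource);

  // Added after the abstraction was introduced.
  void granularity(String granularity);

  Map<String, String> dimensions();
}

// Default implementation: dimensions that existed from the start are emitted,
// later additions have empty bodies.
class DefaultEvolvingQueryMetrics implements EvolvingQueryMetrics {
  protected final Map<String, String> dims = new LinkedHashMap<>();

  @Override
  public void dataSource(String dataSource) {
    dims.put("dataSource", dataSource);
  }

  @Override
  public void granularity(String granularity) {
    // Don't emit by default.
  }

  @Override
  public Map<String, String> dimensions() {
    return dims;
  }
}

// Private-extension style implementation that opts in to the new dimension.
class GranularityEmittingQueryMetrics extends DefaultEvolvingQueryMetrics {
  @Override
  public void granularity(String granularity) {
    dims.put("granularity", granularity);
  }
}

// Stand-in for core code: it calls every method and never decides what is emitted.
final class CoreCallSite {
  private CoreCallSite() {
  }

  static Map<String, String> report(EvolvingQueryMetrics metrics, String dataSource, String granularity) {
    metrics.dataSource(dataSource);
    metrics.granularity(granularity);
    return metrics.dimensions();
  }
}
```

If `granularity()` were later removed from the interface, only `GranularityEmittingQueryMetrics` would stop compiling (because of `@Override`), which is the "notification by API breakage" the review relies on.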
|
Thanks @leventov for reviewing the PR. I totally agree with you on giving users more flexibility to select which dimensions they want instead of Druid core deciding it, but IMO granularity is an important dimension and Druid core should emit it by default. I added specialized
a) add Current patch is using |
|
How is the importance of a dimension measured? Nobody needed it and nobody asked for it yet (I might be wrong about this - please point it out if so), and you are the first one who needs it and will probably stay the only one for some time. So why add extra stuff to everyone else's metrics? Also, if you no longer need this dimension, with an empty default implementation it's safe to remove this method - people who didn't use it won't notice, and people who used to override it in their custom QueryMetrics (to actually emit this dimension -- they have to override!) will be "notified" by API breakage and will have a chance to react - e. g. emit it differently, or raise an issue on Github. If the default implementation emits this dimension, somebody may depend on this fact in a non-programmatic way (e. g. parsing) and it's much more dangerous to revoke a dimension in this case. This is also a reason why the QueryMetrics interface was introduced, see #3954. And this is why dimensions that were emitted at the moment of QueryMetrics introduction are still emitted by default. So please leave the default implementation empty, and actually emit it in your private extension. |
|
On Select/Search, I'm ok either way, but just want to point out that SegmentMetadataQuery and DataSourceMetadataQuery already emit pretty nonsensical "intervals" (see their getIntervals()), which are used in QueryMetrics.interval() |
|
@leventov It doesn't make sense to extend |
|
@leventov I didn't read your last comment #4570 (comment). I think we are saying the same thing 👍 |
|
@niketh the point of adding methods to QueryMetrics is to call them from core, so that extensions don't have to touch the core. |
|
Thanks @leventov. I will also ask other Druid members to share their thought on Config based approach could be:
Config based approach has some advantages:
@gianm @himanshug @cheddar @leventov Suggestions ? |
|
@akashdw I considered the "config" approach from the very beginning instead of QueryMetrics. In fact it was already questioned in the original PR, see #3954 (comment). And I chose
So the complexity of the configs needed is very high and it's much harder to refactor them than code, while this system is still fundamentally much more restricted than the "interface" approach.
I understand that you don't want to do this but it's just what you need to do, if you want custom metrics and/or dimensions. It was clear when
How is this statement supported?
Why is the first easier? |
|
I wonder if we should change the direction of the conversation. One sad thing that we ran into and only just found out is that we've always wanted to have metrics emitted that represent the number of rows scanned. In looking into this, it looks like that has actually been implemented because there is a metric method added for that, but the actual code implementation isn't anywhere to be seen so we're not able to take advantage of the work someone else has already done. So, maybe we need to discuss when we think a metric or dimension should be in the default, core set and when it shouldn't. If we have that conversation, then the question becomes whether or not "granularity" should be a dimension included on all query metrics that have a granularity. I tend to think that including granularity isn't a big deal (you will get a lot more cardinality from the id or the segment ids) and it provides a nice level of drill down that can help when trying to understand the performance profile. What are your concerns with adding it to the core set of dimensions? |
|
@cheddar what do you think about adding a set of |
|
The problem with adding dimensions/metrics emitted by default is not that it harms e. g. us (we already use our custom QueryMetrics impls, so it makes no difference to us), but that it's almost impossible to take them away in the future. The amount of metrics emitted by default is like entropy: it never goes down. The existing set of dimensions emitted by default is already bad, e. g. intervals, segment, id make almost no practical sense. Making your own QueryMetrics has a non-trivial upfront cost (about a man-day) but Yahoo as a big user of Druid will have to do it anyway, sooner or later. But after the setup is done, the support is very easy. |
|
I'd be down with a "Full Metrics" type thing that we could turn on with a single config. Funny thing about your examples of things that aren't useful, aside from interval, we use segment and id a lot in debugging our clusters... |
|
Ok, I'll create a PR which adds "full metrics" |
|
Thanks @leventov. Should I include the "full metrics" changes in this PR, or can I do it in a separate PR? I also propose to include a blacklist metrics config. What do you think? |
|
@akashdw I can post "full metrics" PR because essentially I already have it implemented internally (minus some specifics which I can remove). After that PR is merged, you could add blacklist configs to "FullMetrics" specifically, with defined semantics, not QueryMetrics "in general". E. g. it will look like |
|
Thanks @leventov, I will also update this PR to have an empty implementation of |
|
@leventov added empty implementation for |
Add comment "Don't emit by default". Same in other places
Please follow the 6-step procedure defined in the "Making subinterfaces of QueryMetrics for emitting custom dimensions and/or metrics for specific query types" section in QueryMetrics docs. Same for SelectQueryMetrics. Also, please update that section to use another query type in the example, because "Search" won't be a query type without specific QueryMetrics any more, as assumed in that doc.
@VisibleForTesting says almost the same, I suggest removing the comment
|
Also, consider extracting |
|
@leventov Addressed comments. |
Please add Javadoc to this class like: This class is implemented with delegation to another QueryMetrics for compatibility, see "Making subinterfaces of QueryMetrics for emitting custom dimensions and/or metrics for specific query types" section in {@link QueryMetrics} javadoc
Consider naming "delegateQueryMetrics" or simply "delegate"
I think the body of this query() method should be empty, because query() should be already called on the provided delegate QueryMetrics, according to the GenericQueryMetricsFactory documentation. Please reflect this in a doc comment to the DefaultSearchQueryMetrics() constructor, like queryMetrics.query(query) must already be called on the provided queryMetrics
This also means that bodies of other methods, that are called from query(), should be empty or even throw IllegalStateException, to ensure there is no mistake
To be precise, I think query() should have empty body (with a comment), and the methods which used to be called from query() should throw ISE.
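A rough sketch of the delegation pattern discussed in the comments above, using simplified hypothetical types (`BaseMetricsSketch`, `SimpleBaseMetrics`, `SearchMetricsSketch` and `DelegatingSearchMetrics` are stand-ins, not Druid's `QueryMetrics`, `DefaultSearchQueryMetrics` or their real signatures). It assumes `query()` has already been called on the delegate, so the wrapper's `query()` has an empty body, a method that used to be called from `query()` throws `IllegalStateException`, and everything else is forwarded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified base metrics abstraction.
interface BaseMetricsSketch {
  void query(String dataSourceOfQuery);

  void dataSource(String dataSource); // a "pre-query-execution-time" method

  void reportQueryTime(long nanos);
}

// Simple base implementation whose query() sets the pre-execution dimensions.
class SimpleBaseMetrics implements BaseMetricsSketch {
  final Map<String, String> dims = new LinkedHashMap<>();
  long lastQueryTimeNanos = -1L;

  @Override
  public void query(String dataSourceOfQuery) {
    dataSource(dataSourceOfQuery);
  }

  @Override
  public void dataSource(String dataSource) {
    dims.put("dataSource", dataSource);
  }

  @Override
  public void reportQueryTime(long nanos) {
    lastQueryTimeNanos = nanos;
  }
}

// Query-type-specific subinterface that adds the new dimension.
interface SearchMetricsSketch extends BaseMetricsSketch {
  void granularity(String granularity);
}

/**
 * Implemented with delegation to another metrics object for compatibility.
 */
class DelegatingSearchMetrics implements SearchMetricsSketch {
  private final BaseMetricsSketch delegate;

  /** delegate.query(...) must already have been called on the provided delegate. */
  DelegatingSearchMetrics(BaseMetricsSketch delegate) {
    this.delegate = delegate;
  }

  @Override
  public void query(String dataSourceOfQuery) {
    // Intentionally empty: query() was already called on the delegate.
  }

  @Override
  public void dataSource(String dataSource) {
    // Used to be called from query(); calling it here would indicate a mistake.
    throw new IllegalStateException("dataSource() must be set via query() on the delegate");
  }

  @Override
  public void granularity(String granularity) {
    // Don't emit by default.
  }

  @Override
  public void reportQueryTime(long nanos) {
    delegate.reportQueryTime(nanos);
  }
}
```

Throwing from the pre-query methods, rather than silently ignoring them, surfaces a misconfigured factory early instead of producing metrics with overwritten dimensions.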
Please place this method along with the other methods of its group, i. e. after queryId().
search pkg contains another search pkg, https://github.com/druid-io/druid/tree/master/processing/src/main/java/io/druid/query/search/search.
Indeed, but it seems like an error? I see no reason for that
Yes, seems like an error. I will create a PR either to move all the classes from the search.search package to the search package, or to rename the search.search pkg to something else?
IMO just move from search.search to search
This Javadoc has different formatting than all other similar Javadocs, including SelectQueryMetricsFactory added in the same PR
Instead of referencing TopN, GroupBy and Timeseries as something that should not be taken as examples of following this procedure, in the end of the description of this procedure, please reference Search and Select as examples of this procedure implemented. Also please add notes about empty or exception throwing of query() and other "pre-query-execution-time" methods to this procedure.
PR to emit granularity dimension for timeseries, search, groupBy, select and topN queries.
497a842 to 84c1493
71a845e to 65775e0
|
@akashdw please don't rebase commits after making a PR, it breaks the thread. See https://github.com/druid-io/druid/blob/master/CONTRIBUTING.md#if-your-pull-request-shows-conflicts-with-master |
|
Thanks @leventov, sure won't rebase the commits from next time onwards. |
|
@akashdw didn't you plan to make a PR that fixes |
|
@leventov Addressed comments and search package name correction pr got merged. Can you please review this PR. |
|
👍 |
|
@akashdw this PR broke compilation of the project. Please fix |
|
@leventov Having a look. |
|
@leventov QueryMetrics now has a new method, |