Support descending time ordering for time series query #2014
fjy merged 1 commit into apache:master from
|
Can you please provide a description of what functionality you are implementing as well as a high level overview of the approach? |
This change is going to add a branch inside this tight loop checking for descending. It's true that branch prediction will likely do a good job with this, but we can completely eliminate this branch by doing it earlier.
Let's move the Function to a variable. Then, in the conditional on lines 250:252, we can override the reference to an implementation that is specific to the descending case.
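A minimal sketch of the suggestion above, assuming a simplified scan over an in-memory array rather than the actual code under review. The class name `HoistedBranchSketch`, the method `scan`, and the variable `indexFor` are hypothetical and chosen only for illustration. The function is held in a variable and overridden once, before the loop, so the loop body never tests the descending flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; none of these names come from the Druid code base.
final class HoistedBranchSketch {

    // Reads `rows` in ascending or descending index order.
    static List<Long> scan(long[] rows, boolean descending) {
        final int n = rows.length;

        // Default mapping: the i-th step of the loop visits row i.
        IntUnaryOperator indexFor = i -> i;
        if (descending) {
            // Override the reference once, outside the loop,
            // so the loop body contains no check of `descending`.
            indexFor = i -> n - 1 - i;
        }

        List<Long> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(rows[indexFor.applyAsInt(i)]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] timestamps = {10L, 20L, 30L};
        System.out.println(scan(timestamps, false)); // prints [10, 20, 30]
        System.out.println(scan(timestamps, true));  // prints [30, 20, 10]
    }
}
```

Whether this beats a well-predicted branch depends on how the JIT treats the indirect call, so it would need measuring before being taken as a performance win.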
I agree, but can we do that later? There are so many issues to be addressed.
If it is a feature rather than a bug fix I'd rather not incur technical debt.
@cheddar @drcrallen Sorry for the delay. Addressed the comment. Thanks.
|
This looks like a great start on re-ordering data to return in reverse-time order. One big thing that this highlighted for me, however, is how the
After going through all these options and realizing that I don't like any of them, what do people think about introducing a new element to the
As we get more things related to how a specific query processes time, we could add them here. What do people think? Am I over-thinking it? |
|
One thing I wonder about is how the merge happens when we go in a different time-sort order. I notice that you haven't made any changes to the comparators used for the merges that take place. I think those will also have to be adjusted for reversing the time order, but maybe not. |
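To illustrate the concern about merge comparators, here is a hedged sketch using plain lists of timestamps rather than Druid's own sequence and ordering classes. `DescendingMergeSketch`, `merge`, and `timeOrder` are hypothetical names. It shows that a two-way merge only stays sorted when the comparator matches the order of its inputs, so inputs sorted newest-first need the reversed comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; this is not the merge code used by Druid.
final class DescendingMergeSketch {

    // Merges two lists that are each already sorted according to `order`.
    static List<Long> merge(List<Long> left, List<Long> right, Comparator<Long> order) {
        List<Long> out = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (order.compare(left.get(i), right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));
        }
        while (j < right.size()) {
            out.add(right.get(j++));
        }
        return out;
    }

    // Picks the comparator that matches the requested time direction.
    static Comparator<Long> timeOrder(boolean descending) {
        Comparator<Long> ascending = Comparator.naturalOrder();
        return descending ? ascending.reversed() : ascending;
    }

    public static void main(String[] args) {
        // Both inputs are sorted newest-first, as descending per-segment results would be.
        List<Long> left = Arrays.asList(50L, 30L, 10L);
        List<Long> right = Arrays.asList(40L, 20L);

        // Matching comparator: the merged output stays sorted.
        System.out.println(merge(left, right, timeOrder(true)));  // prints [50, 40, 30, 20, 10]

        // Mismatched comparator: the merge silently interleaves incorrectly.
        System.out.println(merge(left, right, timeOrder(false))); // prints [40, 20, 50, 30, 10]
    }
}
```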
|
Implemented |
|
@cheddar I've also thought about the position of For the merging work, I think I've changed the related code (comparators, etc.) but I'm not sure I've done it right. I'll add more tests for the currently supported query types and will also make group-by and select queries support descending. Thanks. |
|
@cheddar I'm not sure how I feel about a "timeSpec" as the grouping. "intervals" is technically a filter and "granularity" and "direction" both define the formatting of the result. I think a "resultSpec" or "applySpec" is more intuitive to users. At some point we should have a "filterSpec" with "intervals" and "dimensionFilters" in it. Would love to get some feedback from @vogievetsky as well. |
|
I think of all operations in terms of (filter-)split-apply-combine where for timeseries: Filter - self explanatory ( The proposed |
|
For example, processing time-series queries in descending order produces descending-ordered results (good). But for group-by queries, the order of processing has no meaning, and they already have a separate ordering spec for result ordering. For search queries, because Druid just looks up the index and dictionary, the order of processing has no meaning either. I think time-series and search queries can behave differently with this, but for the others, I don't know. |
|
Ok, I kinda like the I wonder if maybe we should look at creating a query with the chunks as Vadim has them laid out "split/apply/combine/filter". We could rewrite that into whatever queries we have right now as an initial implementation and then eventually implement it to actually run against segments too? If we were to take this approach, I think that the way you are doing it now would still make sense for the long-term ('cause eventually maybe those queries would be hidden behind the split/apply/combine/filter query?). What do you guys think? |
|
@cheddar +1. The way I always think about Druid's current query API is that it is the low-level API, and over time we should migrate to a higher-level API that is much easier to reason about and extend. However, what do you want to do about the changes required in this current PR? /druid/v3 would be cool :). What was /druid/v1? |
|
If we want to take the approach of trying to do a split/apply/combine/filter query to replace them "all", then I think having it at the query level like it is can make sense. So let's just leave it there and maybe try to get @vogievetsky to propose how he would prefer to specify his split/apply/combine/filter queries? |
|
Yes, |
Technically, this interface is something that someone can extend in an extension, so this change is going to mean that this can't go out until 0.9.0 (which is our next planned release, so no big deal, really). I say this just to make sure that we tag this PR as 0.9.0 and include something in the release notes about the compatibility change.
|
This generally makes sense to me. I think that the way you went about adding the |
|
Ok, yeah, I was right in that there was a simpler way to do things. It required a bit of butchering of interfaces. There were methods on QueryToolChest that shouldn't've been there. Those methods were breaking the abstraction and needed to be eliminated in order to make the simplified changes. It probably wasn't readily apparent that the problem was the bad methods on the interface, but once they are cleaned up, the code cleans up quite a bit. I did a PR against your PR branch, you can see the changes here: Let me know what you think. |
|
Merged @cheddar's patch and rebased on master. Let's see the test results. |
|
There are still two comments I'd like to see addressed, but once they are in I'll be 👍 |
|
@navis I have one comment: please update the documentation so people know how to get results in reverse order. |
802401a to 3c8bdb3
|
@nishantmonu51 do you have any more comments? |
All those methods are marked as @deprecated in the Query interface; should we mark them deprecated here too?
All those methods in the Query interface have been changed to static methods, which can be regarded as following through on the deprecation, because it's not backward compatible anymore.
I see what you mean. We deprecated them because we planned to avoid using string parsing going forward, but that may warrant a separate discussion. I'm fine leaving as is.
Given that we technically only need to check one of those two conditions based on whether the query is descending or not, is it faster to do a check based on the descending flag, to always check both, or is there maybe a benefit to doing the branching outside of the loop, i.e. having something like a DescendingTimestampCheckingOffset?
Made Ascending/DescendingTimestampCheckingOffset
@navis we don't necessarily need to separate it out if it doesn't make a difference. My question was mainly whether branch prediction would help us more than doing both checks, or maybe it doesn't make a difference at all, in which case we should leave the simplest code.
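For readers following the thread, this is a rough sketch of what splitting the check into two offset types could look like. It borrows only the Ascending/Descending naming from the comments above; the `Offset` interface, the constructors, and the `create` and `drain` helpers are assumptions made for this example and are not the classes in the PR. Each variant checks only the one time bound it can actually cross, and the direction is chosen once when the offset is created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the offset classes from the PR.
final class TimestampOffsetSketch {

    interface Offset {
        boolean withinBounds();
        void increment();
        long currentTime();
    }

    // Walks forward; only the upper time bound can be crossed.
    static final class AscendingOffset implements Offset {
        private final long[] timestamps;
        private final long endExclusive;
        private int position;

        AscendingOffset(long[] timestamps, int startPosition, long endExclusive) {
            this.timestamps = timestamps;
            this.position = startPosition;
            this.endExclusive = endExclusive;
        }

        @Override public boolean withinBounds() {
            return position < timestamps.length && timestamps[position] < endExclusive;
        }
        @Override public void increment() { position++; }
        @Override public long currentTime() { return timestamps[position]; }
    }

    // Walks backward; only the lower time bound can be crossed.
    static final class DescendingOffset implements Offset {
        private final long[] timestamps;
        private final long startInclusive;
        private int position;

        DescendingOffset(long[] timestamps, int startPosition, long startInclusive) {
            this.timestamps = timestamps;
            this.position = startPosition;
            this.startInclusive = startInclusive;
        }

        @Override public boolean withinBounds() {
            return position >= 0 && timestamps[position] >= startInclusive;
        }
        @Override public void increment() { position--; }
        @Override public long currentTime() { return timestamps[position]; }
    }

    // Chooses the direction once. `timestamps` must be sorted ascending.
    // A linear scan finds the starting row here purely to keep the sketch short.
    static Offset create(long[] timestamps, long startInclusive, long endExclusive, boolean descending) {
        if (descending) {
            int pos = timestamps.length - 1;
            while (pos >= 0 && timestamps[pos] >= endExclusive) {
                pos--;
            }
            return new DescendingOffset(timestamps, pos, startInclusive);
        }
        int pos = 0;
        while (pos < timestamps.length && timestamps[pos] < startInclusive) {
            pos++;
        }
        return new AscendingOffset(timestamps, pos, endExclusive);
    }

    // Collects every timestamp the offset visits while it stays within bounds.
    static List<Long> drain(Offset offset) {
        List<Long> out = new ArrayList<>();
        while (offset.withinBounds()) {
            out.add(offset.currentTime());
            offset.increment();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] timestamps = {10L, 20L, 30L, 40L, 50L};
        System.out.println(drain(create(timestamps, 20L, 50L, false))); // prints [20, 30, 40]
        System.out.println(drain(create(timestamps, 20L, 50L, true)));  // prints [40, 30, 20]
    }
}
```

As the comment above notes, a single class that checks both bounds may perform just as well, in which case the simpler code is preferable.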
|
@navis there are some merge conflicts now, hopefully they are small |
|
@navis any chance of resolving merge conflicts and finishing this one up? |
|
@fjy fixed conflict and addressed comments. |
|
I'll squash commits when @cheddar approves. |
why is index transient?
It's just a habit of mine. I'll remove that.
|
I have verified that all of navis's changes since I last looked are good on the basic functionality. I did not verify that the caching looks good, but I'm happy with everyone else's eyes on that, so I'm 👍 |
|
Rebased on trunk & squashed. |
Support descending time ordering for time series query
|
@fjy my comment here was not addressed https://github.com/druid-io/druid/pull/2014/files#r49129298 |