Add "uncoveredIntervals" to responseContext#2058
Conversation
|
Minor comments but 👍. This is a really useful feature IMO |
bb2866f to
f33e082
Compare
|
@fjy updated. Is it ok to merge into master? |
|
I'm cool but need another +1 |
|
This seems handy but I want to make sure I understand what its trying to communicate. It currently only evaluates if a result has ANY data within an interval, regardless of the completeness of the data. I suppose I'm not quite sure what the intended problem to solve here is. If it is simply that perhaps there is a hole in the data, then I can understand this does communicate "the data in the query is supposed to be indexed and loaded" But if it is trying to communicate the completeness of the data, it is not immediately obvious if that is covered by this. For example, if a historical node is in the process of loading a date range(say a rolling restart with no replication), but it is no all available, then this would return that the interval is present, when in fact it might not be "complete". Could you expand a little more (and put in a code comment) what this is supposed to communicate? |
There was a problem hiding this comment.
Header in other files is
/*
* Druid - a distributed column store.
* Copyright 2012 - 2015 Metamarkets Group Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
There was a problem hiding this comment.
This code is not copyright Metamarkets, so I removed that.
There was a problem hiding this comment.
I'm less concerned about the lack of MMX and more concerned about the lack of ANY copyright owner, which seems odd
There was a problem hiding this comment.
we've been using this one on all new source files, recently @himanshug updated old ones too in #2050
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
There was a problem hiding this comment.
Oh, cool. Ok, I just grabbed the license from one of the old files and it was still the old thing. Hadn't been keeping up on the language, will update to this.
|
I'm 👍 once its documented exactly what its supposed to communicate, and the verbiage of the copyright notice is verified |
|
When you issue a query to a broker, you say, "give me results for this time range". When the broker looks to the segments that exist on the server, sometimes it won't actually have a complete set of segments for the entire time range specified. This will populate the "uncoveredIntervals" that did not have a segment loaded. I do not believe there are any races with historicals reloading or anything like that. The broker rewrites its queries into per-segment specifiers before sending to historicals, so if the broker is unaware of the segment, the result won't include results from that segment. |
|
@cheddar if a query interval is used which extends beyond the scope of data (example: " But if an interval is well within the scope of the data, then this should only return intervals which definitely have "missing" data. which is a different statement than "should have all data". So I was just hoping you could clarify as a comment that this is not intended as an indicator that all data in the interval is fully loaded. |
|
@drcrallen It is telling you which intervals were covered by segments and which were not. That seems very succinct and technically accurate. I'm not understanding what distinction you are trying to draw, perhaps provide a counter proposal for a comment in the code (I'm guessing that's what you are wanting?) |
|
Proposed code comment:
|
|
@cheddar one more comment to please add some docs on how this can be used. |
|
@fjy do we already have response headers documented somewhere in the docs, or will a new section need to be created? |
|
@cheddar Hmmm, they are not documented. We need a query response page similar to query context. It might make sense to follow up those docs in another PR |
|
@fjy agree with separate PR for docs since they don't exist yet |
This change will cause the CachingClusteredClient to populate the "uncoveredIntervals" key in the responseContext map. The value will be any intervals that were requested in the query but are not actually covered by the segments underlying the data source. For unit testing, CachingClisteredClientTest is testing the caching behavior of the object and it is pretty hard to adjust it to only test this new behavior, so I created a new, parallel "CachingClusteredClientFunctionalityTest" to simplify testing just basic functionality.
f33e082 to
a361859
Compare
|
@drcrallen I think I understand what you are getting at now. I added a code comment that hopefully covers what you are trying to say and provides a bit more color as to why? Please verify that it's clarifying the thing that you were hoping to clarify. @fjy Put in the updated text on the header. Sorry for the confusion. Given the statement that we should have another PR for the docs, is this g2g? |
|
Cool, thanks. 👍 after travis |
Add "uncoveredIntervals" to responseContext
This change will cause the CachingClusteredClient to populate the "uncoveredIntervals"
key in the responseContext map. The value will be any intervals that were requested in
the query but are not actually covered by the segments underlying the data source.
For unit testing, CachingClisteredClientTest is testing the caching behavior of the
object and it is pretty hard to adjust it to only test this new behavior, so I created
a new, parallel "CachingClusteredClientFunctionalityTest" to simplify testing just
basic functionality.