Conversation
*/
UNCOVERED_INTERVALS(
"uncoveredIntervals",
(oldValue, newValue) -> {
I think this lambda argument of Key() constructor should be indented at the same level as the first argument. The same about other constants in this enum.
Right, we need to fix checkstyle config
Looks like this cannot be fixed just by updating the config, as the checkstyle plugin has some known issues with indentation of lambdas as arguments:
checkstyle/checkstyle#4638
checkstyle/checkstyle#3342
- responseContext.put(ResponseContext.CTX_UNCOVERED_INTERVALS, uncoveredIntervals);
- responseContext.put(ResponseContext.CTX_UNCOVERED_INTERVALS_OVERFLOWED, uncoveredIntervalsOverflowed);
+ responseContext.merge(ResponseContext.Key.UNCOVERED_INTERVALS, uncoveredIntervals);
+ responseContext.merge(ResponseContext.Key.UNCOVERED_INTERVALS_OVERFLOWED, uncoveredIntervalsOverflowed);
uncoveredIntervalsOverflowed should be based on the post-merge size of the list.
It is so right now. The merge is applied only to the resulting values.
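The merge semantics under discussion can be sketched with stdlib-only Java. `UncoveredIntervalsMerge` and both method names are hypothetical illustrations, not the actual Druid code: interval lists from several Historicals are concatenated, and the overflow flags are OR-ed, so the post-merge values preserve information from every response.

```java
import java.util.ArrayList;
import java.util.List;

final class UncoveredIntervalsMerge
{
  // Merge function for UNCOVERED_INTERVALS: concatenate the two lists.
  static List<String> mergeIntervals(List<String> oldValue, List<String> newValue)
  {
    final List<String> merged = new ArrayList<>(oldValue);
    merged.addAll(newValue);
    return merged;
  }

  // Merge function for UNCOVERED_INTERVALS_OVERFLOWED: true if any response overflowed.
  static boolean mergeOverflowed(boolean oldValue, boolean newValue)
  {
    return oldValue || newValue;
  }
}
```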
* The number of scanned rows.
*/
- public static final String CTX_COUNT = "count";
+ public enum Key
I think the design should be more extension-friendly. Some ideas for extensible enums are presented here: #6823 (comment) (completely unrelated to ResponseContext, but may be useful).
Developed extension-friendly enum with an example
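A minimal sketch of what such an extensible enum can look like, assuming a hypothetical `BaseKey` interface and registry map (names are illustrative and need not match the PR's actual API): core keys live in the enum, while extensions register additional `BaseKey` implementations through the same registry.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Common interface so both enum constants and extension keys can be treated uniformly.
interface BaseKey
{
  String getName();

  BiFunction<Object, Object, Object> getMergeFunction();
}

enum Key implements BaseKey
{
  // The merge function defines how values from two contexts are combined.
  COUNT("count", (oldValue, newValue) -> (long) oldValue + (long) newValue);

  // TreeMap keeps the natural ordering of key names.
  private static final Map<String, BaseKey> registeredKeys = new TreeMap<>();

  static {
    for (BaseKey key : values()) {
      registerKey(key);
    }
  }

  private final String name;
  private final BiFunction<Object, Object, Object> mergeFunction;

  Key(String name, BiFunction<Object, Object, Object> mergeFunction)
  {
    this.name = name;
    this.mergeFunction = mergeFunction;
  }

  // Extensions call this for their own BaseKey implementations.
  public static void registerKey(BaseKey key)
  {
    registeredKeys.put(key.getName(), key);
  }

  public static Collection<BaseKey> getAllRegisteredKeys()
  {
    return registeredKeys.values();
  }

  @Override
  public String getName()
  {
    return name;
  }

  @Override
  public BiFunction<Object, Object, Object> getMergeFunction()
  {
    return mergeFunction;
  }
}
```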
responseContext,
JacksonUtils.TYPE_REFERENCE_MAP_STRING_OBJECT
);
return new ResponseContext()
Please comment on why it creates an inner class instead of creating a DefaultResponseContext.
The resulting ResponseContext depends on a TypeReference, so in general, if the TypeReference changes, the resulting context should also be updated. To eliminate that possible update I used the inner class. If that fits, I can add this description as a comment; if not, I can remove the inner class usage and use DefaultResponseContext as the resulting map.
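A stdlib-only sketch of the anonymous-inner-class approach described above (`ResponseContextSketch` and `deserialize` are hypothetical names standing in for the real classes): the returned anonymous subclass captures exactly the map produced by deserialization as its delegate.

```java
import java.util.Map;

abstract class ResponseContextSketch
{
  protected abstract Map<String, Object> getDelegate();

  Object get(String key)
  {
    return getDelegate().get(key);
  }

  static ResponseContextSketch deserialize(Map<String, Object> deserialized)
  {
    // An anonymous inner class ties the returned context to the map produced
    // by deserialization, so changing the TypeReference used for deserialization
    // does not require touching a named subclass like DefaultResponseContext.
    return new ResponseContextSketch()
    {
      @Override
      protected Map<String, Object> getDelegate()
      {
        return deserialized;
      }
    };
  }
}
```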
* The method removes max-length fields one by one if the resulting string length is greater than the limit.
* The resulting string might be correctly deserialized as a {@link ResponseContext}.
*/
public SerializationResult serializeWith(ObjectMapper objectMapper, int maxLength) throws JsonProcessingException
Please specify units (chars/bytes)
Renamed the argument (units are also mentioned in the method description).
- return query.getLimit() - (long) responseContext.get(ResponseContext.CTX_COUNT);
+ return query.getLimit() - (long) responseContext.get(ResponseContext.Key.COUNT);
}
return query.getLimit();
Please rename this property to "scanRowsLimit" for clarity.
@@ -358,8 +358,8 @@ private void computeUncoveredIntervals(TimelineLookup<String, ServerSelector> ti
// Which is not necessarily an indication that the data doesn't exist or is
The phrase above "This returns intervals..." is strange. I would say "Record in the response context the intervals..."
- if (responseContext.get(ResponseContext.CTX_ETAG) != null) {
- builder.header(HEADER_ETAG, responseContext.get(ResponseContext.CTX_ETAG));
- responseContext.remove(ResponseContext.CTX_ETAG);
+ if (responseContext.get(ResponseContext.Key.ETAG) != null) {
Double get looks awkward. It could be
Object entityTag = responseContext.remove(ResponseContext.Key.ETAG);
if (entityTag != null) {
builder.header(HEADER_ETAG, entityTag);
}
Nice catch, updated
- builder.header(HEADER_ETAG, responseContext.get(ResponseContext.CTX_ETAG));
- responseContext.remove(ResponseContext.CTX_ETAG);
+ if (responseContext.get(ResponseContext.Key.ETAG) != null) {
+ builder.header(HEADER_ETAG, responseContext.get(ResponseContext.Key.ETAG));
I think would be clearer to call this variable responseBuilder
RESPONSE_CTX_HEADER_LEN_LIMIT
);
if (serializationResult.isReduced()) {
log.warn(
Should Druid cluster operators monitor these messages? Can they do anything about them? If not, this should probably be info(). See #7362.
I'm not sure about that; I even left the log message as is, although the context is not "truncated" anymore but "reduced". According to the corresponding PR discussion, it was important to have a log message with the full context; it's likely someone has a filter in a log aggregator for this kind of message.
BTW, I see your point and started a discussion proposing to only mention backward compatibility for log filters.
Please add a comment like "Whether or not this logging statement should properly be on the WARN level (which is unclear), it's kept on the warn level for backward compatibility: see #2336".
(If I understood your comment correctly.)
Since the change will be tagged as incompatible I decided to update the log level to info
Labelled

- Renamed an argument
- Updated comparator
- Replaced Pair usage with Map.Entry
- Added a comment about quadratic complexity
- Removed boolean field with an expression
- Renamed SerializationResult field
- Renamed the method merge to add and renamed several context keys
- Renamed field and method related to scanRowsLimit
- Updated a comment
- Simplified a block of code
- Renamed a variable
* Merge function associated with a key: Object (Object oldValue, Object newValue)
* TreeMap is used to have the natural ordering of its keys
*/
private static Map<String, BaseKey> map = new TreeMap<>();
Suggested to call it "registeredKeys".
I think this static variable and the associated methods don't need to be nested in Key. They might as well be in the higher-level ResponseContext.
Yes, they might be there. But I think we may leave them inside the enum, as this static variable and the associated methods are part of the enum "extension" and might be helpful for understanding how this "extension" is implemented. Since there is no built-in enum extension support, this implementation may be used as an example in some cases, so I would prefer to keep the enum and this static field and methods together.
/**
 * The primary way of registering context keys.
 * Only the keys registered this way are considered during the context merge.
 */
public static void addKey(BaseKey key)
Please note what happens if a context has an unregistered key. (I think, ideally, it should throw ISE.)
Updated exceptions and related comments
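A sketch of the fail-fast behavior the reviewer asked for, with hypothetical names (`KeyLookup`, `keyOf`): looking up an unregistered key throws an IllegalStateException instead of silently accepting an arbitrary string.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyLookup
{
  private static final Map<String, String> registeredKeys = new HashMap<>();

  static void register(String name)
  {
    registeredKeys.put(name, name);
  }

  // Fails fast on unregistered keys, e.g. during context deserialization.
  static String keyOf(String name)
  {
    final String key = registeredKeys.get(name);
    if (key == null) {
      throw new IllegalStateException("Unknown response context key [" + name + "]");
    }
    return key;
  }
}
```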
ETAG("ETag"),
/**
 * Query fail time (current time + timeout).
 * The final value in comparison to continuously updated TIMEOUT_AT.
I failed to understand this sentence after several readings. Please reword.
* @Override public BiFunction<Object, Object, Object> getMergeFunction() { return mergeFunction; }
* }
* }</pre>
* Make sure all extension enum values added with Key.addKey method.
Please make Key.addKey a {@link }
}

/**
 * Keys associated with objects in the context. The enum is extension-friendly.
I think it doesn't make a lot of sense to say that "The enum is extension-friendly." Enum is not extension-friendly, but the key system (based on BaseKey) is. So I would just remove this sentence.
/**
 * Returns all keys the enum contains and the added via addKey method
 */
public static Collection<BaseKey> getKeys()
Suggested "getAllRegisteredKeys"
}

- public Object get(String key)
+ protected abstract Map<String, Object> getDelegate();
Could you please add a comment like we are mapping from Strings rather than {@link BaseKey}s because ...?
/**
 * Serializes the context given that the resulting string length is less than the provided limit.
 * The method removes max-length fields one by one if the resulting string length is greater than the limit.
 * The resulting string might be correctly deserialized as a {@link ResponseContext}.
- Please put this discussion in code comments.
- I see a regression scenario: before this PR, the UNCOVERED_INTERVALS and MISSING_SEGMENTS keys were always reasonably short. Now they may grow very large at the Broker, and the Broker will prune them altogether. I suggest hard-coding reduction logic specifically for UNCOVERED_INTERVALS and MISSING_SEGMENTS.
", resultFormat='" + resultFormat + '\'' +
", batchSize=" + batchSize +
- ", limit=" + limit +
+ ", limit=" + scanRowsLimit +
There are no backward compatibility concerns in toString(), please change the key.
));
}
// quadratic complexity: while loop with map serialization on each iteration
while (!copiedMap.isEmpty() && !serializedValueEntries.isEmpty()) {
I think we can get away with just one empty check, as both copiedMap and serializedValueEntries have the same number of entries, and entries are removed from both in the loop.
Updated the whole method
/**
 * Serializes the context given that the resulting string length is less than the provided limit.
 * The method removes max-length fields one by one if the resulting string length is greater than the limit.
 * The resulting string might be correctly deserialized as a {@link ResponseContext}.
I don't remember exactly, but some of the systems relied on having the MISSING_SEGMENTS key in the header to do something, so removing the entire entry would break the logic for them. @will-lauer can confirm?
I agree a good solution would be to have a reduce_length function in the enum itself which would reduce the length step by step (for example, removing segment information one by one for the missing segments key) until the header length is in bounds. Probably it can be skipped because, as per my understanding, the truncation previously was random, without any guarantees on which keys would be present or truncated. @gianm @himanshug any thoughts on this?
@pjain1 While we certainly talked about using MISSING_SEGMENTS, I don't believe we ever actually implemented it in production, so while removing it completely probably won't break anything we have, it is less than desirable. I'd prefer to have a partial list than no list at all. Or at least some other indication that the list was non-empty.

I haven't looked at the code, but this is definitely an incompatibility with previous behavior, so it should be tagged as such. It should also be mentioned in the release notes. I don't think most customers would care about it.
…tions Reducing serialized context length by removing some of its collection elements
Thanks, everyone. I updated the truncation logic and kept it as general as possible without mentioning custom context keys. The new algorithm removes some values from the resulting (serialized) JSON arrays (no matter if it's
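The reduction idea can be sketched with a stdlib-only helper. `ArrayReducer` and its length model are illustrative assumptions (the real code operates on Jackson ArrayNodes): elements are dropped from the end of an array until enough serialized characters have been removed.

```java
import java.util.List;

final class ArrayReducer
{
  /**
   * Removes elements from the end of {@code elements} until at least
   * {@code charsToRemove} characters of serialized length are gone, or the
   * list is empty. Returns the number of characters actually removed.
   */
  static int removeElementsToSatisfyCharsLimit(List<String> elements, int charsToRemove)
  {
    final int lengthBefore = serializedLength(elements);
    while (!elements.isEmpty() && lengthBefore - serializedLength(elements) < charsToRemove) {
      elements.remove(elements.size() - 1);
    }
    return lengthBefore - serializedLength(elements);
  }

  // Rough JSON array length: ["aa","bb"] -> brackets + quoted elements + commas.
  static int serializedLength(List<String> elements)
  {
    int length = 2; // the surrounding brackets
    for (String e : elements) {
      length += e.length() + 3; // two quotes and a separating comma
    }
    return elements.isEmpty() ? 2 : length - 1; // the last element has no trailing comma
  }
}
```

For example, reducing `["aa","bb","cc"]` (16 chars) by at least 5 chars drops the last element, leaving `["aa","bb"]` (11 chars).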
…enum

# Conflicts:
# server/src/main/java/org/apache/druid/server/QueryResource.java
* @throws IllegalArgumentException if the key has already been registered.
*/
- public static void addKey(BaseKey key)
+ public static void registerKey(BaseKey key)
Just in case, please make this method synchronized
added synchronized
public static Collection<BaseKey> getAllRegisteredKeys()
{
- return map.values();
+ return registeredKeys.values();
Just in case, please wrap with Collections.unmodifiableCollection()
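Both "just in case" suggestions (synchronized registration and an unmodifiable view of the registered keys) can be sketched together; `KeyRegistry` is a hypothetical simplification of the reviewed code:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

final class KeyRegistry
{
  private static final Map<String, String> registeredKeys = new TreeMap<>();

  // synchronized so registration is safe if extensions load concurrently.
  static synchronized void registerKey(String name)
  {
    if (registeredKeys.putIfAbsent(name, name) != null) {
      throw new IllegalArgumentException("Key [" + name + "] has already been registered");
    }
  }

  static Collection<String> getAllRegisteredKeys()
  {
    // Mutation attempts on the returned view throw UnsupportedOperationException.
    return Collections.unmodifiableCollection(registeredKeys.values());
  }
}
```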
- * The method removes max-length fields one by one if the resulting string length is greater than the limit.
- * The resulting string might be correctly deserialized as a {@link ResponseContext}.
+ * This method tries to remove some elements from context collections if it's needed to satisfy the limit.
+ * The resulting string might be correctly deserialized to {@link ResponseContext}.
Please comment on why explicit priorities of keys are not implemented.
for (Map.Entry<String, JsonNode> e : sortedNodesByLength) {
final String fieldName = e.getKey();
final JsonNode node = e.getValue();
if (node.isArray()) {
If this block aims for MISSING_SEGMENTS and UNCOVERED_INTERVALS, please comment on that with an example.
commented in the javadoc
if (node.isArray()) {
if (needToRemoveCharsNumber >= node.toString().length()) {
final int lengthBeforeRemove = node.toString().length();
// Empty array could be correctly deserialized so we remove only its elements.
I think the logic of this block should avoid producing empty array because it may be misleading.
Removed empty arrays
add(Key.TRUNCATED, true);
final ObjectNode contextJsonNode = objectMapper.valueToTree(getDelegate());
final ArrayList<Map.Entry<String, JsonNode>> sortedNodesByLength = Lists.newArrayList(contextJsonNode.fields());
final Comparator<Map.Entry<String, JsonNode>> valueLengthReversedComparator =
Please extract this comparator as a constant.
final int lengthAfterRemove = node.toString().length();
needToRemoveCharsNumber -= lengthBeforeRemove - lengthAfterRemove;
} else {
final ArrayNode arrNode = (ArrayNode) node;
This block needs a comment. It's not obvious what and why is going on here. Please extract as a method (or the upper block) if possible.
added a comment and extracted
added some comments
protected abstract Map<BaseKey, Object> getDelegate();

private final Comparator<Map.Entry<String, JsonNode>> valueLengthReversedComparator =
It can be a static final constant.
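Extracting the comparator as a constant can look like this sketch; `EntryComparators` is a hypothetical name, and String values stand in for JsonNode so the example stays stdlib-only. It orders map entries by the length of their serialized values, longest first.

```java
import java.util.Comparator;
import java.util.Map;

final class EntryComparators
{
  // static final: built once instead of on every serialization attempt.
  static final Comparator<Map.Entry<String, String>> VALUE_LENGTH_REVERSED =
      Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length()).reversed();
}
```

Sorting a list of entries with this comparator puts the entry with the longest value first, which is the order the reduction loop wants to process fields in.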
- * This method tries to remove some elements from context collections if it's needed to satisfy the limit.
+ * This method removes some elements from context collections if it's needed to satisfy the limit.
+ * There is no explicit priorities of keys which values are being truncated because for now there are only
+ * two potential limit breaking keys (UNCOVERED_INTERVALS and MISSING_SEGMENTS) and their values are arrays.
Please wrap UNCOVERED_INTERVALS and MISSING_SEGMENTS with {@link }
((ArrayNode) node).removeAll();
final int lengthAfterRemove = node.toString().length();
needToRemoveCharsNumber -= lengthBeforeRemove - lengthAfterRemove;
// We need to remove more chars than the field's lenght so removing it completely
Typo, "length". There is one other instance of this typo in the repository, in StringDimensionHandler - please fix it too.
fixed both typos
}

/**
 * Removes {@code node}'s elements which total lenght of serialized values is greater or equal to the passed limit.
* @param needToRemoveCharsNumber the number of chars need to be removed.
* @return the number of removed chars.
*/
private int removeNodeElementsToSatisfyCharsLimit(ArrayNode node, int needToRemoveCharsNumber)
Looks like this method can be static.
final ArrayNode arrayNode = (ArrayNode) node;
needToRemoveCharsNumber -= removeNodeElementsToSatisfyCharsLimit(arrayNode, needToRemoveCharsNumber);
if (arrayNode.size() == 0) {
// The field is empty, removing it.
Please extend the comment like The field is empty, removing it because an empty array field may be misleading for the recipients of the truncated response context.
ETAG("ETag"),
/**
 * Query fail time (current time + timeout).
 * It is not updated continuously as TIMEOUT_AT.
Please wrap TIMEOUT_AT with {@link }.
Description

Aggregated ResponseContext keys into an enum as the next step of ResponseContext refactoring. Previously the keys were just static strings, so theoretically there was no obstacle to using any string as a key of ResponseContext. This refactoring eliminates that possibility by introducing an enum of ResponseContext keys and exposing only methods that require an enum instance as a key.

Fixed the issue of merging ResponseContext instances returned by Historicals to the Broker. There was no rule for merging different response contexts, but from my point of view, the previous solution of overwriting existing values with the last ones is incorrect because valuable information is lost. For example, a value associated with the key UNCOVERED_INTERVALS contains a list of uncovered intervals, and the result value is not the last returned list but the concatenation of all returned lists. The same issue applies to the keys MISSING_SEGMENTS (a list of missing segments) and COUNT (the number of scanned rows). Thus I decided to provide every key with a merge function, so merging response contexts became a simple procedure.

Also improved ResponseContext serialization. Previously the result of serialization was truncated if its length was greater than the limit. I believe it is better to keep the context structure and make it deserializable, so the serialization process removes max-length fields from the context until the final result length no longer exceeds the limit.

This PR has:

For reviewers: the key changed class is ResponseContext.