Conversation
*/
UNCOVERED_INTERVALS(
"uncoveredIntervals",
(oldValue, newValue) -> {
I think this lambda argument of Key() constructor should be indented at the same level as the first argument. The same about other constants in this enum.
Right, we need to fix checkstyle config
Looks like this cannot be fixed just by updating the config, as the checkstyle plugin has some known issues with indentation of lambdas as arguments:
checkstyle/checkstyle#4638
checkstyle/checkstyle#3342
- responseContext.put(ResponseContext.CTX_UNCOVERED_INTERVALS, uncoveredIntervals);
- responseContext.put(ResponseContext.CTX_UNCOVERED_INTERVALS_OVERFLOWED, uncoveredIntervalsOverflowed);
+ responseContext.merge(ResponseContext.Key.UNCOVERED_INTERVALS, uncoveredIntervals);
+ responseContext.merge(ResponseContext.Key.UNCOVERED_INTERVALS_OVERFLOWED, uncoveredIntervalsOverflowed);
uncoveredIntervalsOverflowed should be based on the post-merge size of the list.
It is so right now. The merge is applied only to the resulting values.
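The merge semantics under discussion can be sketched with stdlib-only Java. `UncoveredIntervalsMerge` and both method names are hypothetical illustrations, not the actual Druid code: interval lists from several Historicals are concatenated, and the overflow flags are OR-ed, so the post-merge values preserve information from every response.

```java
import java.util.ArrayList;
import java.util.List;

final class UncoveredIntervalsMerge
{
  // Merge function for UNCOVERED_INTERVALS: concatenate the two lists.
  static List<String> mergeIntervals(List<String> oldValue, List<String> newValue)
  {
    final List<String> merged = new ArrayList<>(oldValue);
    merged.addAll(newValue);
    return merged;
  }

  // Merge function for UNCOVERED_INTERVALS_OVERFLOWED: true if any response overflowed.
  static boolean mergeOverflowed(boolean oldValue, boolean newValue)
  {
    return oldValue || newValue;
  }
}
```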
* The number of scanned rows.
*/
- public static final String CTX_COUNT = "count";
+ public enum Key
I think the design should be more extension-friendly. Some ideas for extensible enums are presented here: #6823 (comment) (completely unrelated to ResponseContext, but may be useful).
Developed extension-friendly enum with an example
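A minimal sketch of what such an extensible enum can look like, assuming a hypothetical `BaseKey` interface and registry map (names are illustrative and need not match the PR's actual API): core keys live in the enum, while extensions register additional `BaseKey` implementations through the same registry.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Common interface so both enum constants and extension keys can be treated uniformly.
interface BaseKey
{
  String getName();

  BiFunction<Object, Object, Object> getMergeFunction();
}

enum Key implements BaseKey
{
  // The merge function defines how values from two contexts are combined.
  COUNT("count", (oldValue, newValue) -> (long) oldValue + (long) newValue);

  // TreeMap keeps the natural ordering of key names.
  private static final Map<String, BaseKey> registeredKeys = new TreeMap<>();

  static {
    for (BaseKey key : values()) {
      registerKey(key);
    }
  }

  private final String name;
  private final BiFunction<Object, Object, Object> mergeFunction;

  Key(String name, BiFunction<Object, Object, Object> mergeFunction)
  {
    this.name = name;
    this.mergeFunction = mergeFunction;
  }

  // Extensions call this for their own BaseKey implementations.
  public static void registerKey(BaseKey key)
  {
    registeredKeys.put(key.getName(), key);
  }

  public static Collection<BaseKey> getAllRegisteredKeys()
  {
    return registeredKeys.values();
  }

  @Override
  public String getName()
  {
    return name;
  }

  @Override
  public BiFunction<Object, Object, Object> getMergeFunction()
  {
    return mergeFunction;
  }
}
```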
responseContext,
JacksonUtils.TYPE_REFERENCE_MAP_STRING_OBJECT
);
return new ResponseContext()
Please comment on why it creates an inner class instead of creating a DefaultResponseContext.
The resulting ResponseContext depends on a TypeReference, so in general, if the TypeReference changes, the resulting context should also be updated. To eliminate that possible update I used the inner class. If that fits, I can add this description as a comment; if not, I can remove the inner class usage and use DefaultResponseContext as the resulting map.
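A stdlib-only sketch of the anonymous-inner-class approach described above (`ResponseContextSketch` and `deserialize` are hypothetical names standing in for the real classes): the returned anonymous subclass captures exactly the map produced by deserialization as its delegate.

```java
import java.util.Map;

abstract class ResponseContextSketch
{
  protected abstract Map<String, Object> getDelegate();

  Object get(String key)
  {
    return getDelegate().get(key);
  }

  static ResponseContextSketch deserialize(Map<String, Object> deserialized)
  {
    // An anonymous inner class ties the returned context to the map produced
    // by deserialization, so changing the TypeReference used for deserialization
    // does not require touching a named subclass like DefaultResponseContext.
    return new ResponseContextSketch()
    {
      @Override
      protected Map<String, Object> getDelegate()
      {
        return deserialized;
      }
    };
  }
}
```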
* The method removes max-length fields one by one if the resulting string length is greater than the limit.
* The resulting string might be correctly deserialized as a {@link ResponseContext}.
*/
public SerializationResult serializeWith(ObjectMapper objectMapper, int maxLength) throws JsonProcessingException
Please specify units (chars/bytes)
Renamed the argument (units are also mentioned in the method description).
- return query.getLimit() - (long) responseContext.get(ResponseContext.CTX_COUNT);
+ return query.getLimit() - (long) responseContext.get(ResponseContext.Key.COUNT);
}
return query.getLimit();
Please rename this property to "scanRowsLimit" for clarity.
@@ -358,8 +358,8 @@ private void computeUncoveredIntervals(TimelineLookup<String, ServerSelector> ti
// Which is not necessarily an indication that the data doesn't exist or is
The phrase above "This returns intervals..." is strange. I would say "Record in the response context the intervals..."
- if (responseContext.get(ResponseContext.CTX_ETAG) != null) {
- builder.header(HEADER_ETAG, responseContext.get(ResponseContext.CTX_ETAG));
- responseContext.remove(ResponseContext.CTX_ETAG);
+ if (responseContext.get(ResponseContext.Key.ETAG) != null) {
Double get looks awkward. It could be
Object entityTag = responseContext.remove(ResponseContext.Key.ETAG);
if (entityTag != null) {
builder.header(HEADER_ETAG, entityTag);
}
Nice catch, updated
- builder.header(HEADER_ETAG, responseContext.get(ResponseContext.CTX_ETAG));
- responseContext.remove(ResponseContext.CTX_ETAG);
+ if (responseContext.get(ResponseContext.Key.ETAG) != null) {
+ builder.header(HEADER_ETAG, responseContext.get(ResponseContext.Key.ETAG));
I think would be clearer to call this variable responseBuilder
RESPONSE_CTX_HEADER_LEN_LIMIT
);
if (serializationResult.isReduced()) {
log.warn(
Should Druid cluster operators monitor these messages? Can they do anything about them? If not, this should probably be info(). See #7362.
I'm not sure about that; I even left the log message as is, although the context is not "truncated" anymore but "reduced". According to the corresponding PR discussion, it was important to have a log message with the full context; it's likely someone has a filter in a log aggregator for this kind of message.
BTW, I see your point and started a discussion proposing to only mention backward compatibility for log filters.
Please add a comment like "Whether or not this logging statement should properly be on the WARN level (which is unclear), it's kept on the warn level for backward compatibility: see #2336".
(If I understood your comment correctly.)
Since the change will be tagged as incompatible I decided to update the log level to info
Labelled

- Renamed an argument
- Updated comparator
- Replaced Pair usage with Map.Entry
- Added a comment about quadratic complexity
- Removed boolean field with an expression
- Renamed SerializationResult field
- Renamed the method merge to add and renamed several context keys
- Renamed field and method related to scanRowsLimit
- Updated a comment
- Simplified a block of code
- Renamed a variable
* Merge function associated with a key: Object (Object oldValue, Object newValue)
* TreeMap is used to have the natural ordering of its keys
*/
private static Map<String, BaseKey> map = new TreeMap<>();
Suggested to call it "registeredKeys".
I think this static variable and the associated methods don't need to be nested in Key. They might as well be in the higher-level ResponseContext.
Yes, they might be there. But I think we may leave them inside the enum, as this static variable and the associated methods are part of the enum "extension" and might be helpful for understanding how this "extension" is implemented. Since there is no built-in enum extension support, this implementation may be used as an example in some cases, so I would prefer to keep the enum and this static field and methods together.
/**
 * The primary way of registering context keys.
 * Only the keys registered this way are considered during the context merge.
 */
public static void addKey(BaseKey key)
Please note what happens if a context has an unregistered key. (I think, ideally, it should throw ISE.)
Updated exceptions and related comments
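A sketch of the fail-fast behavior the reviewer asked for, with hypothetical names (`KeyLookup`, `keyOf`): looking up an unregistered key throws an IllegalStateException instead of silently accepting an arbitrary string.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyLookup
{
  private static final Map<String, String> registeredKeys = new HashMap<>();

  static void register(String name)
  {
    registeredKeys.put(name, name);
  }

  // Fails fast on unregistered keys, e.g. during context deserialization.
  static String keyOf(String name)
  {
    final String key = registeredKeys.get(name);
    if (key == null) {
      throw new IllegalStateException("Unknown response context key [" + name + "]");
    }
    return key;
  }
}
```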
ETAG("ETag"),
/**
 * Query fail time (current time + timeout).
 * The final value in comparison to continuously updated TIMEOUT_AT.
I failed to understand this sentence after several readings. Please reword.
* @Override public BiFunction<Object, Object, Object> getMergeFunction() { return mergeFunction; }
* }
* }</pre>
* Make sure all extension enum values added with Key.addKey method.
Please make Key.addKey a {@link }
}

/**
 * Keys associated with objects in the context. The enum is extension-friendly.
I think it doesn't make a lot of sense to say that "The enum is extension-friendly." Enum is not extension-friendly, but the key system (based on BaseKey) is. So I would just remove this sentence.
/**
 * Returns all keys the enum contains and the added via addKey method
 */
public static Collection<BaseKey> getKeys()
Suggested "getAllRegisteredKeys"
}

- public Object get(String key)
+ protected abstract Map<String, Object> getDelegate();
Could you please add a comment like we are mapping from Strings rather than {@link BaseKey}s because ...?
/**
 * Serializes the context given that the resulting string length is less than the provided limit.
 * The method removes max-length fields one by one if the resulting string length is greater than the limit.
 * The resulting string might be correctly deserialized as a {@link ResponseContext}.
- Please put this discussion in code comments.
- I see a regression scenario: before this PR, the UNCOVERED_INTERVALS and MISSING_SEGMENTS keys were always reasonably short. Now they may grow very large at the Broker, and the Broker will prune them altogether. I suggest hard-coding reduction logic specifically for UNCOVERED_INTERVALS and MISSING_SEGMENTS.
", resultFormat='" + resultFormat + '\'' +
", batchSize=" + batchSize +
- ", limit=" + limit +
+ ", limit=" + scanRowsLimit +
There are no backward compatibility concerns in toString(), please change the key.
));
}
// quadratic complexity: while loop with map serialization on each iteration
while (!copiedMap.isEmpty() && !serializedValueEntries.isEmpty()) {
I think we can get away with just one empty check, as both copiedMap and serializedValueEntries have the same number of entries, and entries are removed from both in the loop.
Updated the whole method
/**
 * Serializes the context given that the resulting string length is less than the provided limit.
 * The method removes max-length fields one by one if the resulting string length is greater than the limit.
 * The resulting string might be correctly deserialized as a {@link ResponseContext}.
I don't remember exactly, but some of the systems relied on having the MISSING_SEGMENTS key in the header to do something, so removing the entire entry would break the logic for them. @will-lauer can confirm?
I agree a good solution would be to have a reduce_length function in the enum itself which would reduce the length step by step (for example, removing segment information one by one for the missing segments key) until the header length is in bounds. Probably it can be skipped because, as per my understanding, the truncation previously was random, without any guarantees on which keys would be present or truncated. @gianm @himanshug any thoughts on this?
@pjain1 While we certainly talked about using MISSING_SEGMENTS, I don't believe we ever actually implemented it in production, so while removing it completely probably won't break anything we have, it is less than desirable. I'd prefer to have a partial list than no list at all. Or at least some other indication that the list was non-empty.

I haven't looked at the code, but this is definitely an incompatibility with previous behavior, so it should be tagged as such. It should also be mentioned in the release notes. I don't think most customers would care about it.
…tions Reducing serialized context length by removing some of its collection elements
Thanks, everyone. I updated the truncation logic and kept it as general as possible without mentioning custom context keys. The new algorithm removes some values from the resulting (serialized) JSON arrays (no matter if it's
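The reduction idea can be sketched with a stdlib-only helper. `ArrayReducer` and its length model are illustrative assumptions (the real code operates on Jackson ArrayNodes): elements are dropped from the end of an array until enough serialized characters have been removed.

```java
import java.util.List;

final class ArrayReducer
{
  /**
   * Removes elements from the end of {@code elements} until at least
   * {@code charsToRemove} characters of serialized length are gone, or the
   * list is empty. Returns the number of characters actually removed.
   */
  static int removeElementsToSatisfyCharsLimit(List<String> elements, int charsToRemove)
  {
    final int lengthBefore = serializedLength(elements);
    while (!elements.isEmpty() && lengthBefore - serializedLength(elements) < charsToRemove) {
      elements.remove(elements.size() - 1);
    }
    return lengthBefore - serializedLength(elements);
  }

  // Rough JSON array length: ["aa","bb"] -> brackets + quoted elements + commas.
  static int serializedLength(List<String> elements)
  {
    int length = 2; // the surrounding brackets
    for (String e : elements) {
      length += e.length() + 3; // two quotes and a separating comma
    }
    return elements.isEmpty() ? 2 : length - 1; // the last element has no trailing comma
  }
}
```

For example, reducing `["aa","bb","cc"]` (16 chars) by at least 5 chars drops the last element, leaving `["aa","bb"]` (11 chars).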
…enum

# Conflicts:
# server/src/main/java/org/apache/druid/server/QueryResource.java
* @throws IllegalArgumentException if the key has already been registered.
*/
- public static void addKey(BaseKey key)
+ public static void registerKey(BaseKey key)
Just in case, please make this method synchronized
added synchronized
public static Collection<BaseKey> getAllRegisteredKeys()
{
- return map.values();
+ return registeredKeys.values();
Just in case, please wrap with Collections.unmodifiableCollection()
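Both "just in case" suggestions (synchronized registration and an unmodifiable view of the registered keys) can be sketched together; `KeyRegistry` is a hypothetical simplification of the reviewed code:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

final class KeyRegistry
{
  private static final Map<String, String> registeredKeys = new TreeMap<>();

  // synchronized so registration is safe if extensions load concurrently.
  static synchronized void registerKey(String name)
  {
    if (registeredKeys.putIfAbsent(name, name) != null) {
      throw new IllegalArgumentException("Key [" + name + "] has already been registered");
    }
  }

  static Collection<String> getAllRegisteredKeys()
  {
    // Mutation attempts on the returned view throw UnsupportedOperationException.
    return Collections.unmodifiableCollection(registeredKeys.values());
  }
}
```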
- * The method removes max-length fields one by one if the resulting string length is greater than the limit.
- * The resulting string might be correctly deserialized as a {@link ResponseContext}.
+ * This method tries to remove some elements from context collections if it's needed to satisfy the limit.
+ * The resulting string might be correctly deserialized to {@link ResponseContext}.
Please comment on why explicit priorities of keys are not implemented.
for (Map.Entry<String, JsonNode> e : sortedNodesByLength) {
final String fieldName = e.getKey();
final JsonNode node = e.getValue();
if (node.isArray()) {
If this block aims for MISSING_SEGMENTS and UNCOVERED_INTERVALS, please comment on that with an example.
commented in the javadoc
if (node.isArray()) {
if (needToRemoveCharsNumber >= node.toString().length()) {
final int lengthBeforeRemove = node.toString().length();
// Empty array could be correctly deserialized so we remove only its elements.
I think the logic of this block should avoid producing empty array because it may be misleading.
Removed empty arrays
add(Key.TRUNCATED, true);
final ObjectNode contextJsonNode = objectMapper.valueToTree(getDelegate());
final ArrayList<Map.Entry<String, JsonNode>> sortedNodesByLength = Lists.newArrayList(contextJsonNode.fields());
final Comparator<Map.Entry<String, JsonNode>> valueLengthReversedComparator =
Please extract this comparator as a constant.
final int lengthAfterRemove = node.toString().length();
needToRemoveCharsNumber -= lengthBeforeRemove - lengthAfterRemove;
} else {
final ArrayNode arrNode = (ArrayNode) node;
This block needs a comment. It's not obvious what and why is going on here. Please extract as a method (or the upper block) if possible.
added a comment and extracted
added some comments
protected abstract Map<BaseKey, Object> getDelegate();

private final Comparator<Map.Entry<String, JsonNode>> valueLengthReversedComparator =
It can be a static final constant.
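Extracting the comparator as a constant can look like this sketch; `EntryComparators` is a hypothetical name, and String values stand in for JsonNode so the example stays stdlib-only. It orders map entries by the length of their serialized values, longest first.

```java
import java.util.Comparator;
import java.util.Map;

final class EntryComparators
{
  // static final: built once instead of on every serialization attempt.
  static final Comparator<Map.Entry<String, String>> VALUE_LENGTH_REVERSED =
      Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length()).reversed();
}
```

Sorting a list of entries with this comparator puts the entry with the longest value first, which is the order the reduction loop wants to process fields in.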
- * This method tries to remove some elements from context collections if it's needed to satisfy the limit.
+ * This method removes some elements from context collections if it's needed to satisfy the limit.
+ * There is no explicit priorities of keys which values are being truncated because for now there are only
+ * two potential limit breaking keys (UNCOVERED_INTERVALS and MISSING_SEGMENTS) and their values are arrays.
Please wrap UNCOVERED_INTERVALS and MISSING_SEGMENTS with {@link }
((ArrayNode) node).removeAll();
final int lengthAfterRemove = node.toString().length();
needToRemoveCharsNumber -= lengthBeforeRemove - lengthAfterRemove;
// We need to remove more chars than the field's lenght so removing it completely
Typo, "length". There is one other instance of this typo in the repository, in StringDimensionHandler - please fix it too.
fixed both typos
}

/**
 * Removes {@code node}'s elements which total lenght of serialized values is greater or equal to the passed limit.
* @param needToRemoveCharsNumber the number of chars need to be removed.
* @return the number of removed chars.
*/
private int removeNodeElementsToSatisfyCharsLimit(ArrayNode node, int needToRemoveCharsNumber)
Looks like this method can be static.
final ArrayNode arrayNode = (ArrayNode) node;
needToRemoveCharsNumber -= removeNodeElementsToSatisfyCharsLimit(arrayNode, needToRemoveCharsNumber);
if (arrayNode.size() == 0) {
// The field is empty, removing it.
Please extend the comment like The field is empty, removing it because an empty array field may be misleading for the recipients of the truncated response context.
ETAG("ETag"),
/**
 * Query fail time (current time + timeout).
 * It is not updated continuously as TIMEOUT_AT.
Please wrap TIMEOUT_AT with {@link }.
Description

Aggregated ResponseContext keys into an enum as the next step of ResponseContext refactoring. Previously the keys were just static strings, so theoretically there was no obstacle to using any string as a key of ResponseContext. This refactoring eliminates that possibility by introducing an enum of ResponseContext keys and exposing only methods that require an enum instance as a key.

Fixed the issue of merging ResponseContext instances returned by Historicals to the Broker. There was no rule for merging different response contexts, but from my point of view, the previous solution of overwriting existing values with the last ones is incorrect because valuable information is lost. For example, a value associated with the key UNCOVERED_INTERVALS contains a list of uncovered intervals, and the result value is not the last returned list but the concatenation of all returned lists. The same issue applies to the keys MISSING_SEGMENTS (a list of missing segments) and COUNT (the number of scanned rows). Thus I decided to provide every key with a merge function, so merging response contexts became a simple procedure.

Also improved ResponseContext serialization. Previously the result of serialization was truncated if its length was greater than the limit. I believe it is better to keep the context structure and make it deserializable, so the serialization process removes max-length fields from the context until the final result length no longer exceeds the limit.

This PR has:

For reviewers: the key changed class is ResponseContext.