
Fix time-ordered scan queries on realtime segments #7546

Merged
jon-wei merged 7 commits into apache:master from justinborromeo:fix-multihydrant-realtime-query on Apr 26, 2019

justinborromeo commented Apr 25, 2019

This patch fixes the following 2 bugs introduced in #7133:

  1. ClassCastException is thrown when a scan query with ASCENDING or DESCENDING time-ordering encounters a timestamp small enough to fit in an Integer (i.e. less than Integer.MAX_VALUE). The original timestamp comparator assumed that every timestamp is a Long. However, the broker parses such a timestamp as an Integer, and because a boxed Integer can't be cast to Long, the comparator throws. This patch adds a type check that converts the timestamp correctly whether it is an Integer or a Long.

  2. UnsupportedOperationException is thrown when a time-ordered scan query hits a realtime segment. For published segments, there is a 1:1 mapping between query runners and segment descriptors, and the existing implementation relied on this mapping to determine which query runners correspond to each interval. That mapping does not hold for realtime segments: a query runner is created for each hydrant, but only one segment descriptor is generated for the segment. This patch introduces a SinkQueryRunners class that implements Iterable<QueryRunner> and holds the mappings between intervals and query runners.
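The interval-to-runner bookkeeping described in item 2 can be sketched without any Druid dependencies. This is a minimal illustration under stated assumptions, not the actual SinkQueryRunners code: SimpleInterval stands in for a Joda Interval, the type parameter R stands in for QueryRunner, and both class names are hypothetical. What it shows is that several runners (one per hydrant) can share a single interval, so the relationship is many-to-one rather than 1:1.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Joda Interval: a half-open [start, end) range in millis.
final class SimpleInterval {
  final long startMillis;
  final long endMillis;

  SimpleInterval(long startMillis, long endMillis) {
    this.startMillis = startMillis;
    this.endMillis = endMillis;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SimpleInterval)) {
      return false;
    }
    SimpleInterval other = (SimpleInterval) o;
    return startMillis == other.startMillis && endMillis == other.endMillis;
  }

  @Override
  public int hashCode() {
    return 31 * Long.hashCode(startMillis) + Long.hashCode(endMillis);
  }
}

// Hypothetical sketch: keeps one (interval, runner) pair per hydrant, so several
// runners may share the same interval. R stands in for a query runner type.
final class IntervalRunnerPairs<R> implements Iterable<R> {
  private final List<Map.Entry<SimpleInterval, R>> pairs = new ArrayList<>();

  void add(SimpleInterval interval, R runner) {
    pairs.add(new AbstractMap.SimpleImmutableEntry<>(interval, runner));
  }

  // Iterating yields only the runners, in insertion order.
  @Override
  public Iterator<R> iterator() {
    List<R> runners = new ArrayList<>();
    for (Map.Entry<SimpleInterval, R> pair : pairs) {
      runners.add(pair.getValue());
    }
    return runners.iterator();
  }

  // Groups runners by interval, preserving the order in which intervals were first seen.
  Map<SimpleInterval, List<R>> runnersByInterval() {
    Map<SimpleInterval, List<R>> grouped = new LinkedHashMap<>();
    for (Map.Entry<SimpleInterval, R> pair : pairs) {
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    return grouped;
  }
}
```

Iterating over the object yields only the runners, so code that just needs an Iterable of runners keeps working, while runnersByInterval() lets time-ordering logic find every hydrant runner that belongs to one interval instead of assuming one runner per descriptor.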

This patch was validated on an in-house cluster.

The following types of queries have been tested for regressions:

  • Scan
  • GroupBy
  • Timeseries
  • TopN
  • Search
  • Segment Metadata

gianm added this to the 0.15.0 milestone on Apr 26, 2019

gianm commented Apr 26, 2019

Tagged 0.15.0 since it fixes bugs that were introduced in patches that are new in 0.15.0.


// If timestamp is < Integer.MAX_VALUE, it'll be an Integer object which can't be cast to a Long. This method
// checks the type of the timestamp object and converts to a long value
private long convertTimestampObjectToLong(Object timestampObj)

This could use DimensionHandlerUtils.convertObjectToLong() instead.
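The diff excerpt above shows only the helper's signature. Below is a minimal sketch of the type check described in item 1 of the PR description. The class name TimestampConversion and the ASCENDING comparator are hypothetical, and the helper deliberately handles only Integer and Long, matching the PR description. Following the reviewer's suggestion would instead mean delegating to Druid's DimensionHandlerUtils.convertObjectToLong(), whose handling of other input types is not shown here.

```java
import java.util.Comparator;

// Hypothetical sketch of the PR's idea: accept either boxed integral type
// instead of casting the timestamp object straight to Long.
final class TimestampConversion {
  private TimestampConversion() {}

  static long convertTimestampObjectToLong(Object timestampObj) {
    if (timestampObj instanceof Long) {
      return (Long) timestampObj;
    } else if (timestampObj instanceof Integer) {
      // Widening the int value is safe; casting a boxed Integer to Long is not.
      return ((Integer) timestampObj).longValue();
    }
    throw new IllegalArgumentException(
        "Unsupported timestamp type: "
            + (timestampObj == null ? "null" : timestampObj.getClass().getName()));
  }

  // Orders raw timestamp objects from earliest to latest; use reversed() for DESCENDING.
  static final Comparator<Object> ASCENDING =
      Comparator.comparingLong(TimestampConversion::convertTimestampObjectToLong);
}
```

Comparing on the converted primitive long means an Integer timestamp and a Long timestamp can be ordered against each other, whereas a `(Long) obj` cast on a boxed Integer fails at runtime with the ClassCastException the PR describes.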

final List<ScanResultValue> results4 =
QueryPlus.wrap(query4).run(appenderator, new HashMap<>()).toList();
Assert.assertEquals(2, results4.size()); // 1 per segment
// Should return the 2 rows with `met` values of 8 and 64 (based on the segment spec provided)

Can you add checks for the values within the results?
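One way to address this request, checking the values and not just the count, is to extract the met column from every returned row and compare it with the expected list. The sketch below is hypothetical: it models each per-segment result as a list of row maps rather than Druid's ScanResultValue (whose events would first need to be unpacked), and the ScanValueChecks name and row() helper are invented for illustration. The expected values 8 and 64 come from the test comment above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers: each inner list models the rows returned for one segment.
final class ScanValueChecks {
  private ScanValueChecks() {}

  // Collects the "met" column from every row across all result batches, in order.
  static List<Long> extractMetValues(List<List<Map<String, Object>>> resultBatches) {
    List<Long> values = new ArrayList<>();
    for (List<Map<String, Object>> batch : resultBatches) {
      for (Map<String, Object> row : batch) {
        // Assumes "met" is numeric; Number covers both Integer and Long.
        values.add(((Number) row.get("met")).longValue());
      }
    }
    return values;
  }

  // Builds a single-column row, for use in tests.
  static Map<String, Object> row(long met) {
    Map<String, Object> r = new HashMap<>();
    r.put("met", met);
    return r;
  }
}
```

In the JUnit test itself, the check could then be a single Assert.assertEquals(Arrays.asList(8L, 64L), extracted) call. If the order in which segments are returned is not guaranteed, sort the extracted list before comparing.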

jon-wei merged commit 07dd742 into apache:master on Apr 26, 2019
justinborromeo deleted the fix-multihydrant-realtime-query branch on Apr 26, 2019 at 23:13
4 participants