Allow IncrementalIndex to store Long/Float dimensions#2263
Conversation
There was a problem hiding this comment.
static variable name in UPPERCASE.
94c8b13 to
7de6bdc
Compare
There was a problem hiding this comment.
Just curious, do we have any parser in Druid currently that returns [] type for multi-valued dimension ?
There was a problem hiding this comment.
@pjain1 Do you mean if the multi-value array can have subitems with an array[] type? If so, I don't think that will occur, the values would be single numerics or Strings for the JSON parsers and all single strings for the delimited parsers, I believe.
There was a problem hiding this comment.
then why there is a need for [] types here as TYPE_MAP is used like TYPE_MAP.get(singleVal.getClass().getSimpleName()) where singleValue as you said "would be single numerics or Strings for the JSON parsers and all single strings for the delimited parsers, I believe."
There was a problem hiding this comment.
@pjain1 It's not strictly necessary now, I think I had it in there from an older implementation where I used the TYPE_MAP to get a ValueType from a Comparable[].
I kept it there in case, but I can remove it if needed.
There was a problem hiding this comment.
@jon-wei I was just curious to know why it was there. No need to remove if it does not hurt anything.
|
closed for now to rebase to master, will reopen |
|
dont' force-push your branch until the PR is open or it will render this unable to re-open |
|
@drcrallen yeah, forgot about that yesterday =\ I was able to re-open it though by doing a "git reset --hard" to the SHA at the time the PR was closed |
|
@jon-wei we need to rebase this |
6850331 to
7a92273
Compare
|
Updated this to rebase on top of changes introduced by: The PR above changed TimeAndDims to use a dictionary encoded int[] array instead of String[]; this patch uses dictionary encoding for Long/Float values as well (unnecessary). I plan to change TimeAndDims to use Comparable[] instead of int[] in a later PR, still WIP in my fork: Strings will store a dictionary encoding in TimeAndDims as in PR #2085, numerics will store their values directly in TimeAndDims. |
f2aa86d to
8e51849
Compare
| @@ -0,0 +1,178 @@ | |||
| package io.druid.benchmark; | |||
|
👍 |
| dims[dimIndex][i] = dimLookups[dimIndex].idToIndex(dimValues[dimIndex][i]); | ||
| dims[dimIndex][i] = sortedDimLookups[dimIndex].getSortedIdFromUnsortedId(dimValues[dimIndex][i]); | ||
| //TODO: in later PR, Rowboat will use Comparable[][] instead of int[][] | ||
| // Can remove dictionary encoding for numeric dims then. |
There was a problem hiding this comment.
please file an issue for this todo.
|
What is the behavior of null for Long/Float dimensions ? can we also add tests for null values in Long/Float dimension to make sure it behaves as expected. |
|
Also need to add docs for the added functionality. |
|
RE: null handling, the nulls are being converted to 0. It's not in this PR, but a follow on PR that I am finishing up: That PR reuses the existing single-value long/float column formats for the numeric dims, representing null without changes to the storage format (need some mechanism to distinguish null from normal values) or using dictionary encoding (have a dictionary entry for null) seems problematic. Throwing an exception sounds okay to me too (maybe let user configure? choice of "throw exception" or "convert null to 0") RE: docs and tests, this open PR doesn't actually enable the use of long/float dimensions, it's just preparing IncrementalIndex for later changes:
I address those points and have tests in the follow-on patch (not ready yet), I think it would make more sense to include docs in the later PR when the numeric dims are actually usable |
|
@fjy @nishantmonu51 will file issues for TODOs |
|
+1, I think this is ready to be merged. docs can be added in the follow up PR. |
|
@jon-wei do you mind resolving merge conflicts? |
|
@fjy rebased |
Allow IncrementalIndex to store Long/Float dimensions
This PR updates IncrementalIndex such that it can ingest and store Long/Float dimensions in addition to Strings. It is intended as a step towards the larger goals being discussed in the following topic:
https://groups.google.com/forum/#!topic/druid-development/obtfNJnXPDg/discussion
I ran the row-adding benchmark with type discovery enabled for testing purposes and the numbers are shown below:
With patch, non-String dims:
Without patch, String dims only: