add truncated timestamp column to row #3438
|
Instead of ingesting an extra dimension, would it suffice to add a time format string parameter to the IntervalDimFilter (PR for that here, for reference: #3315), so that it accepts more than ISO-8601 strings? |
|
IMO, if we need to have two time columns with basically the same data, that seems like a workaround to an underlying deficiency in the query language. Ideally we could improve the query language to make things work on a single time column. #3180 was meant to solve (2) without need for a separate column. @kaijianding could you double check if that functionality satisfies your needs? If not, do you think it'd be possible to go down the road of using the __time column directly through some additional options (like @jon-wei had suggested)? |
|
For a sql like |
|
@kaijianding does something like this work?
"filter" : {
"type": "bound",
"dimension" : "__time",
"lower" : "201608",
"upper" : "201609",
"extractionFn" : {
"type" : "timeFormat",
"format" : "yyyyMMdd HHmmss"
}
} |
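To illustrate why the filter above can match a month-level range, here is a minimal sketch of its semantics, assuming the bound filter compares strings lexicographically with inclusive bounds and that the extraction function simply formats each __time value. The helper name `matches` and the Python format string (standing in for 'yyyyMMdd HHmmss') are hypothetical, not Druid code.

```python
from datetime import datetime

# Python stand-in for the Joda-style pattern 'yyyyMMdd HHmmss' (assumption).
FMT = "%Y%m%d %H%M%S"

def matches(ts, lower="201608", upper="201609", fmt=FMT):
    """Format the timestamp, then compare it as a plain string.

    Mirrors a bound filter with non-strict (inclusive) lower and upper
    bounds and lexicographic ordering.
    """
    formatted = ts.strftime(fmt)
    return lower <= formatted <= upper

print(matches(datetime(2016, 8, 15, 12, 0, 0)))   # inside August 2016
print(matches(datetime(2016, 9, 1, 0, 0, 0)))     # first instant of September
print(matches(datetime(2016, 7, 31, 23, 59, 59)))  # last second of July
```

Because "20160815 120000" sorts between "201608" and "201609" as a string, the short bounds work even though they do not match the full pattern.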
|
It seems to be just what I need! Will try this solution, @gianm. |
|
Does this PR still need review, or was the solution proposed by @gianm sufficient? |
|
@kaijianding does this PR still require review? |
|
@kaijianding do you still need this PR, or is @gianm's suggestion not enough? |
|
Closing this PR for now; will reopen it if I still need it in the future.
We're working on SQL on Druid. When we transform a SQL statement such as
select * from someDataSource where (time > '20160801 010000' and time < '20160801 100000' and dimA='xxx') or (time > '20160801 110000' and time <= '20160801 160000' and dimA='yyy')
to a Druid query, we need two things from Druid:
1: The timestampSpec info (done in another PR), i.e. the timestamp column and its format, so that we know the 'time' ranges can be transformed to 'intervals': ['2016-08-01 01:00:00/2016-08-01 10:00:00', '2016-08-01 11:00:00/2016-08-01 16:00:00']
2: The ability to apply complex filters on time.
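The conversion described in item 1 could be sketched roughly as follows. This assumes the timestampSpec format is 'yyyyMMdd HHmmss' (written here as a Python format string), that intervals are half-open at the end, and that values have second resolution. The function name `to_interval` and the one-second padding for an inclusive upper bound are illustrative assumptions, not part of this patch.

```python
from datetime import datetime, timedelta

# Python stand-in for the timestampSpec format 'yyyyMMdd HHmmss' (assumption).
SRC_FORMAT = "%Y%m%d %H%M%S"

def to_interval(lower, upper, upper_inclusive=False):
    """Turn two source-format time strings into an ISO-8601 interval string."""
    start = datetime.strptime(lower, SRC_FORMAT)
    end = datetime.strptime(upper, SRC_FORMAT)
    if upper_inclusive:
        # Pad by one second so a half-open interval still covers 'time <= upper'.
        end += timedelta(seconds=1)
    return f"{start.isoformat()}/{end.isoformat()}"

intervals = [
    to_interval("20160801 010000", "20160801 100000"),
    to_interval("20160801 110000", "20160801 160000", upper_inclusive=True),
]
print(intervals)
```

The intervals only narrow the query down to the relevant segments; the exact strict or inclusive bounds and the dimA conditions still have to be enforced by the filter.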
During ingestion, the timestamp column defined in timestampSpec is explicitly excluded from the dimension list because rollup requires it. This patch adds the timestamp column back, but with its value truncated according to queryGranularity and kept in the original format. In other words, it is the dimension version of __time.
If queryGranularity is set to 'HOUR' and the original value is '20160801 013556', the final value in the row is '20160801 010000'.
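The truncation can be illustrated with a small sketch. It is not the ingestion code from this patch: the `truncate` helper, the `_RESET` table, and the three granularities covered are assumptions made for the example, with 'yyyyMMdd HHmmss' written as a Python format string.

```python
from datetime import datetime

# Python stand-in for the original format 'yyyyMMdd HHmmss' (assumption).
SRC_FORMAT = "%Y%m%d %H%M%S"

# Fields to zero out per granularity; only a few granularities are sketched.
_RESET = {
    "MINUTE": {"second": 0},
    "HOUR": {"minute": 0, "second": 0},
    "DAY": {"hour": 0, "minute": 0, "second": 0},
}

def truncate(value, granularity):
    """Parse the original string, truncate it, and re-emit it in the same format."""
    ts = datetime.strptime(value, SRC_FORMAT)
    truncated = ts.replace(microsecond=0, **_RESET[granularity])
    return truncated.strftime(SRC_FORMAT)

print(truncate("20160801 013556", "HOUR"))  # 20160801 010000
```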
This feature is very useful when filtering by a complex time range that 'intervals' can't represent (intervals can still be used to narrow the query down to fewer segments), as in the example above.
The above query can be transformed to a BoundDimFilter on 'time' or an IntervalDimFilter on __time within an OrDimFilter.
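As a rough sketch of that transformation, the example SQL could map to the filter JSON built below. It assumes the truncated dimension is named 'time' (the column from timestampSpec) and uses the standard "or", "and", "bound" and "selector" filter types; the Python helper functions are hypothetical and only assemble the JSON.

```python
import json

def bound(dimension, lower, upper, lower_strict, upper_strict):
    """Range filter on a string dimension; strict flags map to '>' and '<'."""
    return {
        "type": "bound",
        "dimension": dimension,
        "lower": lower,
        "upper": upper,
        "lowerStrict": lower_strict,
        "upperStrict": upper_strict,
    }

def selector(dimension, value):
    """Equality filter, e.g. dimA='xxx'."""
    return {"type": "selector", "dimension": dimension, "value": value}

def and_(*fields):
    return {"type": "and", "fields": list(fields)}

def or_(*fields):
    return {"type": "or", "fields": list(fields)}

druid_filter = or_(
    # time > '20160801 010000' and time < '20160801 100000' and dimA='xxx'
    and_(bound("time", "20160801 010000", "20160801 100000", True, True),
         selector("dimA", "xxx")),
    # time > '20160801 110000' and time <= '20160801 160000' and dimA='yyy'
    and_(bound("time", "20160801 110000", "20160801 160000", True, False),
         selector("dimA", "yyy")),
)
print(json.dumps(druid_filter, indent=2))
```

The bound comparison is a plain string comparison here, which is safe because the fixed-width 'yyyyMMdd HHmmss' values sort in chronological order.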
But for a SQL statement like
select * from someDataSource where time > '201608' and time < '201609'
we can't use IntervalDimFilter, because '201608' does not match the pattern 'yyyyMMdd HHmmss' and we have no other pattern to parse the '201608' string into a representation that the Interval class can understand.
To use this feature, just set the option includeTruncatedTimestampColumnAsDimension=true in dimensionsSpec in DataSchema: