add truncated timestamp column to row #3438

Closed
kaijianding wants to merge 1 commit into apache:master from kaijianding:truncate

@kaijianding
Contributor

kaijianding commented Sep 7, 2016

We're working on SQL on Druid. When we transform a SQL statement such as
select * from someDataSource where (time > '20160801 010000' and time < '20160801 100000' and dimA='xxx') or (time > '20160801 110000' and time <= '20160801 160000' and dimA='yyy')
into a Druid query, we need two things from Druid:
1: the timestampSpec info (done in another PR): the timestamp column and format. With these we know that the 'time' range can be transformed to 'intervals':['2016-08-01 01:00:00/2016-08-01 10:00:00', '2016-08-01 11:00:00/2016-08-01 16:00:00']

"timestampSpec": {
  "column": "time",
  "format": "yyyyMMdd HHmmss"
}
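
As a rough illustration of step 1, the sketch below turns the SQL time literals into ISO-8601 interval strings. It is written in Python rather than Druid's Java, and both the `to_interval` helper and the strptime pattern `%Y%m%d %H%M%S` (standing in for the Joda pattern `yyyyMMdd HHmmss`) are assumptions made for this example, not part of the patch.

```python
from datetime import datetime

# Assumed strptime equivalent of the Joda pattern "yyyyMMdd HHmmss".
TIME_FORMAT = "%Y%m%d %H%M%S"


def to_interval(lower: str, upper: str) -> str:
    """Convert two literals in the timestampSpec format to an ISO-8601 interval."""
    start = datetime.strptime(lower.strip(), TIME_FORMAT)
    end = datetime.strptime(upper.strip(), TIME_FORMAT)
    return f"{start.isoformat()}/{end.isoformat()}"


# The two time ranges from the SQL statement above.
intervals = [
    to_interval("20160801 010000", "20160801 100000"),
    to_interval("20160801 110000", "20160801 160000"),
]
print(intervals)
```

Druid intervals are half-open (start inclusive, end exclusive), so the exact strictness of each SQL bound would still have to be enforced by a filter.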

2: the ability to apply complex filters on time.

At ingestion, the timestamp column defined in timestampSpec is explicitly excluded from the dimension list because rollup requires it. This patch adds the timestamp column back, but with its value truncated according to queryGranularity and kept in the original format. In other words, it is the dimension version of __time.

If queryGranularity is set to 'HOUR' and the original value is '20160801 013556', the final value in the row is '20160801 010000'.
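
The truncation can be sketched roughly as follows. This is a Python illustration, not the patch's Java code: the `truncate_timestamp` helper, the granularity table, and the strptime pattern standing in for `yyyyMMdd HHmmss` are all assumptions made for the example.

```python
from datetime import datetime

# Assumed strptime equivalent of the Joda pattern "yyyyMMdd HHmmss".
TIME_FORMAT = "%Y%m%d %H%M%S"

# Fields reset to zero for each granularity (illustrative subset only).
_RESET = {
    "MINUTE": {"second": 0},
    "HOUR": {"minute": 0, "second": 0},
    "DAY": {"hour": 0, "minute": 0, "second": 0},
}


def truncate_timestamp(value: str, granularity: str) -> str:
    """Parse the value, truncate it to the granularity, and re-emit it in the original format."""
    parsed = datetime.strptime(value, TIME_FORMAT)
    truncated = parsed.replace(microsecond=0, **_RESET[granularity])
    return truncated.strftime(TIME_FORMAT)


print(truncate_timestamp("20160801 013556", "HOUR"))  # 20160801 010000
```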

This feature is very useful when filtering by a complex time range that 'intervals' can't represent (intervals can still be used to narrow the query down to fewer segments), as in the example above.

The above query can be transformed to a BoundDimFilter on 'time' or an IntervalDimFilter on __time within an OrDimFilter.
But for a SQL statement like
select * from someDataSource where time > '201608' and time < '201609', we can't use IntervalDimFilter because '201608' does not match the pattern 'yyyyMMdd HHmmss', and we don't have another pattern to parse the '201608' string into a representation that the Interval class can understand.
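
For the first query in this description (the one with two OR-ed time ranges), the filter might look roughly like the JSON built below. This is only a sketch: the `bound` and `selector` helpers are hypothetical, 'time' is assumed to be the truncated timestamp dimension this patch adds, and the field names follow Druid's bound, selector, and/or filters as I understand them.

```python
import json


def bound(dimension, lower, upper, lower_strict, upper_strict):
    """Build a bound filter; the strict flags map SQL '>' and '<' to exclusive bounds."""
    return {
        "type": "bound",
        "dimension": dimension,
        "lower": lower,
        "upper": upper,
        "lowerStrict": lower_strict,
        "upperStrict": upper_strict,
    }


def selector(dimension, value):
    """Build an equality filter on a dimension."""
    return {"type": "selector", "dimension": dimension, "value": value}


query_filter = {
    "type": "or",
    "fields": [
        {"type": "and", "fields": [
            # time > '20160801 010000' and time < '20160801 100000'
            bound("time", "20160801 010000", "20160801 100000", True, True),
            selector("dimA", "xxx"),
        ]},
        {"type": "and", "fields": [
            # time > '20160801 110000' and time <= '20160801 160000'
            bound("time", "20160801 110000", "20160801 160000", True, False),
            selector("dimA", "yyy"),
        ]},
    ],
}
print(json.dumps(query_filter, indent=2))
```

Because the dimension holds truncated values, the comparison is only as fine-grained as queryGranularity.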

To use this feature, just set the option includeTruncatedTimestampColumnAsDimension=true in dimensionsSpec in DataSchema:

"parser": {
  "type": "string",
  "parseSpec": {
    "timestampSpec": {
      "column": "time",
      "format": "iso"
    },
    "dimensionsSpec": {
      "includeTruncatedTimestampColumnAsDimension": true,
      "dimensions": [
        "appId",
        "containerId",
        "cluster"
      ]
    },
    "format": "json"
  }
}

@jon-wei
Contributor

jon-wei commented Sep 7, 2016

Instead of ingesting an extra dimension, would it suffice to add a time format string parameter to the IntervalDimFilter (PR for that here, for reference: #3315), to allow it to accept more than ISO-8601 strings?

fjy added this to the 0.9.3 milestone Sep 7, 2016
fjy added the Feature label Sep 7, 2016
@gianm
Contributor

gianm commented Sep 8, 2016

IMO, if we need to have two time columns with basically the same data, that seems like a workaround to an underlying deficiency in the query language. Ideally we could improve the query language to make things work on a single time column.

#3180 was meant to solve (2) without need for a separate column. @kaijianding could you double check if that functionality satisfies your needs? If not, do you think it'd be possible to go down the road of using the __time column directly through some additional options (like @jon-wei had suggested)?

@kaijianding
Contributor Author

For a SQL statement like
select * from someDataSource where time > '201608' and time < '201609', we can't just say it is in 'yyyyMM' format: '201608' could be '20160801' or '20160801 10' in other SQL statements. So IntervalDimFilter can't handle this case even if it supports a customized format, because we don't know the format in advance.
But we can still do a string comparison on a dimension. @jon-wei
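
A small Python sketch of why plain string comparison copes with literals of any precision: the dimension values are fixed-width with the most significant fields first, so lexicographic order matches chronological order and a shorter literal acts as a prefix bound. The sample rows and the `in_range` helper are made up for illustration.

```python
# Sample truncated timestamp values as they might appear in the string dimension.
rows = [
    "20160731 230000",
    "20160801 000000",
    "20160815 120000",
    "20160831 230000",
    "20160901 000000",
]


def in_range(value: str, lower: str, upper: str) -> bool:
    """Strict lexicographic comparison, as in `time > lower and time < upper`."""
    return lower < value < upper


# Month-level literals: no format needs to be known or parsed.
month = [r for r in rows if in_range(r, "201608", "201609")]

# Day-level literals work the same way.
day = [r for r in rows if in_range(r, "20160801", "20160802")]

print(month)
print(day)
```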

@gianm
Contributor

gianm commented Sep 8, 2016

@kaijianding does something like this work?

"filter" : {
  "type": "bound",
  "dimension" : "__time",
  "lower" : "201608",
  "upper" : "201609",
  "extractionFn" : {
    "type" : "timeFormat",
    "format" : "yyyyMMdd HHmmss"
  }
}
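
To see what this filter would do, here is a rough Python simulation, not Druid code. It assumes that __time holds epoch milliseconds, that the extraction formats in UTC, that the bound compares the extracted strings lexicographically, and that the bounds are inclusive because the example sets no strictness flags. The helper names and the strptime-style pattern are invented for the sketch.

```python
from datetime import datetime, timezone

# Assumed strftime equivalent of the Joda pattern "yyyyMMdd HHmmss".
STRFTIME_FORMAT = "%Y%m%d %H%M%S"


def time_format(epoch_millis: int) -> str:
    """Mimic the timeFormat extraction: epoch milliseconds (UTC) to a formatted string."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return moment.strftime(STRFTIME_FORMAT)


def bound_matches(epoch_millis: int, lower: str, upper: str) -> bool:
    """Mimic the bound filter: inclusive lexicographic comparison on the extracted string."""
    return lower <= time_format(epoch_millis) <= upper


aug_15 = int(datetime(2016, 8, 15, 12, 0, 0, tzinfo=timezone.utc).timestamp() * 1000)
sep_01 = int(datetime(2016, 9, 1, 0, 0, 0, tzinfo=timezone.utc).timestamp() * 1000)

print(time_format(aug_15), bound_matches(aug_15, "201608", "201609"))
print(time_format(sep_01), bound_matches(sep_01, "201608", "201609"))
```

No extracted value can equal '201608' or '201609' exactly, since every formatted string is longer, so inclusive and strict bounds give the same result for these literals.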

@kaijianding
Contributor Author

It seems to be just what I need! I will try this solution, @gianm.
But I think TimeFormatExtractionFn still needs to support the formats used in TimestampParser, like posix/nano/ruby/millis/iso.

@fjy
Contributor

fjy commented Nov 8, 2016

@jon-wei @gianm can you guys take a look here?

@jon-wei
Contributor

jon-wei commented Nov 8, 2016

Does this PR still need review, or was the solution proposed by @gianm sufficient?

@fjy
Contributor

fjy commented Nov 10, 2016

@kaijianding does this PR still require review?

@b-slim
Contributor

b-slim commented Nov 23, 2016

@kaijianding do you still need this PR, or is @gianm's suggestion sufficient?

@kaijianding
Contributor Author

Closing this PR for now; I will reopen it if I still need it in the future.

gianm removed this from the 0.10.0 milestone Feb 16, 2017