Skip to content

Fix groupBy regression on time extractions named __time#3684

Closed
jon-wei wants to merge 1 commit intoapache:masterfrom
jon-wei:fix3683
Closed

Fix groupBy regression on time extractions named __time#3684
jon-wei wants to merge 1 commit intoapache:masterfrom
jon-wei:fix3683

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Nov 12, 2016

Fixes #3683

@jon-wei jon-wei added the Bug label Nov 12, 2016
@jon-wei jon-wei added this to the 0.9.2 milestone Nov 12, 2016
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 12, 2016

👍

{
DimensionHandler handler = null;

if (dimensionName.equals(Column.TIME_COLUMN_NAME)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a reminder that this logic probably needs to change with new dim typing work?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 12, 2016

Hmm after thinking about this some more I wonder if it's something we really want to support. Maybe having this query fail is OK, and it should actually fail with a more descriptive error. The reason is that naming output fields __time is problematic for features like,

  • nested queries: If the inner query has granularity "hour" and also has a dimension with outputName __time, and the outer query filters or groups on __time, is that referring to the timestamp of the inner rows or is it referring to the inner dimension named __time?
  • having specs: If a query has a metric with output name __time, and a having spec filters on __time, is that referring to the timestamp of the rows or to the metric named __time?

Currently I believe for nested queries it refers to the timestamp but for having specs it refers to the metric named __time. This is inconsistent and weird and I think there's the best answer is to disallow dimensions or metrics with output names of __time.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 12, 2016

You can see all of these issues come from the fact that groupBy results have anonymous timestamps in addition to actual named fields coming from dims and metrics and postaggregators. "Higher level" things like having specs and nested queries can access the timestamp by calling it __time, but that ends up shadowing any similarly named field from a dim or metric or postaggregator.

As a separate but related issue, I think nothing is really stopping anyone from defining a dimension and metric with the same output name. We should detect and fail on that at the constructor level, since otherwise one will clobber the other.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 12, 2016

Maybe since this used to work, we can take the attitude that for 0.9.x that the behavior is that you are allowed to have a dimension or metric with name __time, and it will be returned to you just fine in the query result, but "higher level" Druid constructs like having specs and outer queries will not be able to see it. Instead, they'll see the anonymous timestamp of the row. I think this matches how 0.9.1.1 worked.

In 0.10.0 I'd like to make it so naming something __time just fails outright.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 12, 2016

I suggested an alternate fix in #3685.

I also raised #3687 for discussing what to do about the remaining issues with __time and groupBy, and #3686 for 0.9.3 with validation of the "separate but related issue" that dimensions/metrics/postaggs could collide.

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Nov 13, 2016

Closing in favor of #3685

@jon-wei jon-wei closed this Nov 13, 2016
@jon-wei jon-wei deleted the fix3683 branch October 6, 2017 22:22
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants