Skip to content

[WIP] zero-filling on uncovered intervals#2259

Closed
jaehc wants to merge 1 commit intoapache:masterfrom
jaehc:zero-filling-on-uncoveredintervals
Closed

[WIP] zero-filling on uncovered intervals#2259
jaehc wants to merge 1 commit intoapache:masterfrom
jaehc:zero-filling-on-uncoveredintervals

Conversation

@jaehc
Copy link
Copy Markdown
Contributor

@jaehc jaehc commented Jan 13, 2016

As #2106 suggests, the way how zero-filling works seems not to meet someone's expectation. some people might expect that the result with zero-filling is continuous in a timeseries even when there are absent data points here and there.

This PR makes use of uncovered intervals refering PR #2058 to find out which intervals are absent.(However, I am aware of the fact that there is still a ambiguous point about what's the meaning of absence of data segments refering #2058 conversations).

I wrote code just working right now and please check its usefulness, then I will write test code or others.

@nishantmonu51
Copy link
Copy Markdown
Member

It should also be noted that #2058 also introduced an issue - #2108
since this PR builds functionality around uncovered Intervals, It will be useful if we can figure out a way to fix #2108

@jaehc
Copy link
Copy Markdown
Contributor Author

jaehc commented Jan 13, 2016

@nishantmonu51 Cool. It would be better to handle #2108 first.

@fjy fjy added the Discuss label Jan 19, 2016
@fjy fjy added this to the 0.9.1 milestone Feb 18, 2016
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Feb 19, 2016

@CHOIJAEHONG1 can we get some tests for this behavior? Also, please fix merge conflicts

@jaehc
Copy link
Copy Markdown
Contributor Author

jaehc commented Feb 23, 2016

@fly okay, I will add some tests and solve conflicts.

@fjy fjy modified the milestones: 0.9.2, 0.9.1 Feb 27, 2016
@jaehc
Copy link
Copy Markdown
Contributor Author

jaehc commented Mar 3, 2016

About the UT.
I could not use testQueryCaching(). The reason is that it can only verify mocked results by the same mocked results using SeverExpectation. The situation where expecting and mocking results are mixed up implies someone can not add extra expecting results besides mocking ones. I tried to come up with a workaround but it was not easy.
So, I decided to make a simple test case building the context, which omits some features like testing under randomness and test the function with incrementally expanding intervals.

@jaehc
Copy link
Copy Markdown
Contributor Author

jaehc commented Mar 3, 2016

resolve merge conflicts and rebase it

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 3, 2016

@CHOIJAEHONG1 I believe this feature should have a query context flag and be disabled by default, otherwise it might confuse existing users that do not expect any results when there is no data.

Also, it should be noted that zero may or may not be the appropriate result depending on the aggregator. We should make sure it does not break any post-aggregations, especially ones that convert complex types into numbers, since zero may not be a valid results for those.

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 3, 2016

@CHOIJAEHONG1 I believe we should add a query context flag to explicitly enable filling or not.

@xvrl xvrl closed this Mar 3, 2016
@jaehc
Copy link
Copy Markdown
Contributor Author

jaehc commented Mar 4, 2016

@xvrl I think the same way about handling complex types and others. It looks hard to fill empty values of complex columns especially when we use a javascript aggregator because the broker does not know what the type of argument is in a javascript function. Although the broker gets the type information by AggregatorFactory.getTypeName(), it is not a actual type of column to be accssed. A javascript aggregators returns float as its type name;

How about showing only timestamp without any values for empty buckets like the below?

{
  "timestamp" : "2013-09-04T23:00:00.000Z",
  "result" : {
    "langHyperUnique" : 0,
    "longMin" : null,
    "longSsum" : null,
    "count" : null,
    "sum(strlen(language))" : null,
    "sum(log(added))" : null,
    "uniqLang" : 0,
    "longMax" : null
  }

By the way, as-is skipEmptyBuckets works this way. The first is empty one. longMin shows 9223372036854775807 and longMax is -9223372036854775808 which dosen't seem to be good in some cases(I am not sure). Do you think this should be improved?

{
  "timestamp" : "2013-08-31T10:00:00.000Z",
  "result" : {
    "langHyperUnique" : 0.0,
    "longMin" : 9223372036854775807,
    "count" : 0,
    "longSum" : 0,
    "sum(strlen(language))" : 10.0,
    "sum(log(added))" : 10.0,
    "uniqLang" : 0.0,
    "longMax" : -9223372036854775808
  }
}, {
  "timestamp" : "2013-08-31T11:00:00.000Z",
  "result" : {
    "langHyperUnique" : 1.0002442201269182,
    "longMin" : 905,
    "count" : 1,
    "longSsum" : 1,
    "sum(strlen(language))" : 12.0,
    "sum(log(added))" : 16.807934943699927,
    "uniqLang" : 1.0002442201269182,
    "longMax" : 905
  }
}

query

{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": [ "2013-08-28/2013-09-05" ],
  "granularity": "hour",
  "aggregations": [
     {"type": "count", "name" : "count"},
     {"type": "longSum", "fieldName": "count", "name": "longSsum"},
     {"type": "longMin", "fieldName": "added", "name": "longMin"},
     {"type": "longMax", "fieldName": "added", "name": "longMax"},
     {"type": "javascript", "name": "sum(log(added))",
                "fieldNames" : ["added"],
                "fnAggregate" : "function(current, a) { return current + Math.log(a); }",
                "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
                "fnReset" : "function() { return 10; }"},
     {"type": "javascript", "name": "sum(strlen(language))",
                "fieldNames" : ["language"],
                "fnAggregate" : "function(current, a) { return current + a.length; }",
                "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
                "fnReset" : "function() { return 10; }"},
    {"type": "cardinality", "name" : "uniqLang", "fieldNames" : ["language"] },
    {"type": "hyperUnique", "name" : "langHyperUnique", "fieldName" : "langHyperUnique" }
  ],
  "context": {
    "skipEmptyBuckets": "false",
    "uncoveredIntervalsLimit": 100,
    "useCache" : "false",
    "populateCache" : "false"
  }
}

@gianm gianm removed this from the 0.9.2 milestone Aug 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants