I'm using Druid 0.10.1.
When I run a topN query on a multi-valued dimension, Druid only returns aggregation results for non-null values.
The following topN query:
{
  "queryType": "topN",
  "dataSource": "audience-breakdown",
  "intervals": "2017-09-07T00Z/2017-09-08T00Z",
  "granularity": "all",
  "dimension": {
    "type": "default",
    "dimension": "listenerDma",
    "outputName": "DMA"
  },
  "aggregations": [
    {
      "name": "TLS",
      "type": "longSum",
      "fieldName": "tls"
    }
  ],
  "metric": "TLS",
  "threshold": 3
}
Produces this response:
[
  {
    "timestamp": "2017-09-07T00:00:00.000Z",
    "result": [
      {
        "DMA": "803",
        "TLS": 2798577
      },
      {
        "DMA": "501",
        "TLS": 2509147
      },
      {
        "DMA": "602",
        "TLS": 1779172
      }
    ]
  }
]
In contrast, here is a similar groupBy query:
{
  "queryType": "groupBy",
  "dataSource": "audience-breakdown",
  "intervals": "2017-09-07T00Z/2017-09-08T00Z",
  "granularity": "all",
  "dimensions": [
    {
      "type": "default",
      "dimension": "listenerDma",
      "outputName": "DMA"
    }
  ],
  "aggregations": [
    {
      "name": "TLS",
      "type": "longSum",
      "fieldName": "tls"
    }
  ],
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "TLS",
        "direction": "descending"
      }
    ],
    "limit": 3
  }
}
Its response shows that the largest group is actually the one with null in the listenerDma dimension:
[
  {
    "version": "v1",
    "timestamp": "2017-09-07T00:00:00.000Z",
    "event": {
      "DMA": null,
      "TLS": 8770694
    }
  },
  {
    "version": "v1",
    "timestamp": "2017-09-07T00:00:00.000Z",
    "event": {
      "DMA": "803",
      "TLS": 2798577
    }
  },
  {
    "version": "v1",
    "timestamp": "2017-09-07T00:00:00.000Z",
    "event": {
      "DMA": "501",
      "TLS": 2509147
    }
  }
]
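The gap between the two responses can also be checked mechanically. Below is a minimal Python sketch, assuming the topN and groupBy responses shown above are pasted in as literals. The helper names (topn_groups, groupby_groups) are hypothetical and not part of any Druid client; the code only inspects the JSON shapes shown in this post.

```python
# Responses copied from the post above (JSON null becomes Python None).
topn_response = [
    {
        "timestamp": "2017-09-07T00:00:00.000Z",
        "result": [
            {"DMA": "803", "TLS": 2798577},
            {"DMA": "501", "TLS": 2509147},
            {"DMA": "602", "TLS": 1779172},
        ],
    }
]

groupby_response = [
    {"version": "v1", "timestamp": "2017-09-07T00:00:00.000Z",
     "event": {"DMA": None, "TLS": 8770694}},
    {"version": "v1", "timestamp": "2017-09-07T00:00:00.000Z",
     "event": {"DMA": "803", "TLS": 2798577}},
    {"version": "v1", "timestamp": "2017-09-07T00:00:00.000Z",
     "event": {"DMA": "501", "TLS": 2509147}},
]


def topn_groups(response, dim, metric):
    """Flatten a topN response into {dimension value: metric value}."""
    return {row[dim]: row[metric]
            for bucket in response
            for row in bucket["result"]}


def groupby_groups(response, dim, metric):
    """Flatten a groupBy (v1 row format) response into {dimension value: metric value}."""
    return {row["event"][dim]: row["event"][metric] for row in response}


topn = topn_groups(topn_response, "DMA", "TLS")
groupby = groupby_groups(groupby_response, "DMA", "TLS")

# Groups that groupBy returned but topN did not.
missing = {dma: tls for dma, tls in groupby.items() if dma not in topn}
print(missing)  # {None: 8770694}
```

The only group missing from the topN result is the null one, and the totals for the groups that appear in both responses (803 and 501) are identical.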
Is this the expected behavior of topN? If so, why does it correctly return the null group when I run the query on a single-valued dimension?
For instance, if I use the city dimension (which is single-valued):
{
  "queryType": "topN",
  "dataSource": "audience-breakdown",
  "intervals": "2017-09-07T00Z/2017-09-08T00Z",
  "granularity": "all",
  "dimension": {
    "type": "default",
    "dimension": "city",
    "outputName": "City"
  },
  "aggregations": [
    {
      "name": "TLS",
      "type": "longSum",
      "fieldName": "tls"
    }
  ],
  "metric": "TLS",
  "threshold": 3
}
Druid does return an aggregated value for null:
[
  {
    "timestamp": "2017-09-07T00:00:00.000Z",
    "result": [
      {
        "TLS": 2412533,
        "City": null
      },
      {
        "TLS": 634690,
        "City": "Los Angeles"
      },
      {
        "TLS": 609145,
        "City": "Houston"
      }
    ]
  }
]
Thanks,
Nicolas