Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 14 additions & 20 deletions .idea/inspectionProfiles/Druid.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions distribution/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -308,6 +308,8 @@
<argument>org.apache.druid.extensions.contrib:druid-time-min-max</argument>
<argument>-c</argument>
<argument>org.apache.druid.extensions.contrib:druid-virtual-columns</argument>
<argument>-c</argument>
<argument>org.apache.druid.extensions.contrib:accurate-cardinality</argument>
</arguments>
</configuration>
</execution>
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
---
layout: doc_page
title: "AccurateCardinality Aggregator"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

# AccurateCardinality Aggregator

In many case, we do need exactly count distinct.
Now, Druid offers the ability by nested group by query.
Its logic can be described as follows:

For a sql like:
```sql
select count(distinct pid) from DATASOURCE where col="val"
```
the exactly query will be like:
```sql
select count(*) from (
select pid from DATASOURCE_segments_in_historical1 where col="val" group by pid
UNION ALL
select pid from DATASOURCE_segments_in_historical2 where col="val" group by pid
UNION ALL
select pid from DATASOURCE_segments_in_historical3 where col="val" group by pid
...
) group by pid
```

For high cardinality case, the size of result transfered from historical node to broker node can be really large and leads to poor performance.
So this extension try to use bitmap as the container for the result data from the historical.
The performance can be 10 times better the nested group by method.

To use this extension, make sure to [include](../../operations/including-extensions.html) the `accurate-cardinality` extension.

DSL query example:

```json
{
"queryType": "timeseries",
"dataSource": "sample_datasource",
"granularity": "day",
"aggregations": [
{
"type": "accurateCardinality",
"name": "uv",
"fieldName": "clientid"
}
],
"intervals": [
"2018-12-01T00:00:00.000/2018-12-31T00:00:00.000"
]
}
```

SQL query example:

```json
{
"query": "select ACCURATE_CARDINALITY(clientid) as uv from sample_datasource where __time >= '2018-12-01 00:00:00' and __time < '2018-12-31 00:00:00'",
"context": {
"sqlTimeZone":"Asia/Shanghai"
}
}
```

There are some limitations:

1. `accurateCardinality` aggregator can only support dimension which is long type.
2. `ACCURATE_CARDINALITY` can only have one operand
110 changes: 110 additions & 0 deletions extensions-contrib/accurate-cardinality/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>druid</artifactId>
<groupId>org.apache.druid</groupId>
<version>0.13.0-incubating-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>

<groupId>org.apache.druid.extensions.contrib</groupId>
<artifactId>accurate-cardinality</artifactId>
<name>accurate-cardinality</name>
<description>accurate-cardinality</description>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>


<dependencies>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-core</artifactId>
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-sql</artifactId>
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>

<!-- Tests -->
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-core</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.easymock</groupId>
<artifactId>easymock</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.caliper</groupId>
<artifactId>caliper</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-sql</artifactId>
<version>${project.parent.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-server</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
</dependencies>

</project>
Loading