Merged
98 changes: 98 additions & 0 deletions docs/content/development/extensions-core/test-stats.md
@@ -0,0 +1,98 @@
---
layout: doc_page
---

# Test Stats Aggregators

This extension provides aggregators for test statistics, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for the math background and details.

Make sure to include the `druid-stats` extension in order to use these aggregators.

## Z-Score for two sample ztests post aggregator

Please refer to [https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/](https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/) and [http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf](http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf) for more details.

z = (p1 - p2) / S.E. (assuming null hypothesis is true)

Here p1 and p2 are the two sample proportions (defined below), n1 and n2 are the corresponding sample sizes, and S.E. is the standard error:

S.E. = sqrt( p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 )

(p1 – p2) is the observed difference between two sample proportions.

### zscore2sample post aggregator
* **`zscore2sample`**: calculate the z-score using two-sample z-test while converting binary variables (***e.g.*** success or not) to continuous variables (***e.g.*** conversion rate).

```json
{
"type": "zscore2sample",
"name": "<output_name>",
"successCount1": <post_aggregator> success count of sample 1,
"sample1Size": <post_aggregator> sample 1 size,
"successCount2": <post_aggregator> success count of sample 2,
"sample2Size" : <post_aggregator> sample 2 size
}
```

Note that the post aggregator converts binary variables to continuous variables for the two population proportions. Specifically:

p1 = (successCount1) / (sample size 1)

p2 = (successCount2) / (sample size 2)
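
As a rough illustration of the formulas above, the following dependency-free Java sketch computes the z-score from the four inputs. The class name `TwoSampleZScoreSketch` and the method `zScore` are hypothetical and are not part of the extension; the sketch simply follows the documented S.E. formula and does not guard against zero sample sizes.

```java
// Hypothetical helper, not the extension's ZtestPostAggregator.
final class TwoSampleZScoreSketch
{
  // Computes z = (p1 - p2) / S.E. using the standard error formula documented above.
  static double zScore(double successCount1, double sample1Size, double successCount2, double sample2Size)
  {
    double p1 = successCount1 / sample1Size; // proportion of successes in sample 1
    double p2 = successCount2 / sample2Size; // proportion of successes in sample 2
    double standardError = Math.sqrt(
        p1 * (1 - p1) / sample1Size + p2 * (1 - p2) / sample2Size
    );
    return (p1 - p2) / standardError;
  }

  public static void main(String[] args)
  {
    // 40 successes out of 100 versus 50 successes out of 100:
    // p1 = 0.4, p2 = 0.5, S.E. = sqrt(0.0049) = 0.07, so z is approximately -1.4286.
    System.out.println(zScore(40, 100, 50, 100));
  }
}
```

A negative z-score here only means that sample 1 has the lower proportion; swapping the two samples flips the sign.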

### pvalue2tailedZtest post aggregator

* **`pvalue2tailedZtest`**: calculate p-value of two-sided z-test from zscore
- ***pvalue2tailedZtest(zscore)*** - the input is a z-score which can be calculated using the zscore2sample post aggregator


```json
{
"type": "pvalue2tailedZtest",
"name": "<output_name>",
"zScore": <zscore post_aggregator>
}
```
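
The two-tailed p-value is 2 * (1 - Φ(|z|)), where Φ is the standard normal cumulative distribution function. The extension obtains Φ from commons-math3's `NormalDistribution`; the sketch below instead substitutes the Abramowitz and Stegun 7.1.26 approximation of the error function so that it runs without dependencies. The class and method names are hypothetical, and because the approximation has an absolute error of about 1.5e-7 it is not suitable for very small p-values.

```java
// Hypothetical, dependency-free sketch; the extension itself uses commons-math3.
final class TwoTailedPValueSketch
{
  // Approximates erfc(x) for x >= 0 using Abramowitz and Stegun formula 7.1.26.
  static double erfcApprox(double x)
  {
    double t = 1.0 / (1.0 + 0.3275911 * x);
    double poly = t * (0.254829592
        + t * (-0.284496736
        + t * (1.421413741
        + t * (-1.453152027
        + t * 1.061405429))));
    return poly * Math.exp(-x * x);
  }

  // 2 * (1 - Phi(|z|)) simplifies to erfc(|z| / sqrt(2)).
  static double pValueTwoTailed(double zScore)
  {
    return erfcApprox(Math.abs(zScore) / Math.sqrt(2.0));
  }

  public static void main(String[] args)
  {
    // A z-score of 1.96 corresponds to a two-tailed p-value of roughly 0.05.
    System.out.println(pValueTwoTailed(1.96));
  }
}
```

Taking the absolute value first makes the result independent of the sign of the z-score, which matches the two-sided nature of the test.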

## Example Usage

In this example, we use the zscore2sample post aggregator to calculate the z-score, and then feed the z-score to the pvalue2tailedZtest post aggregator to calculate the p-value.

A JSON query example can be as follows:

```json
{
  ...
  "postAggregations" : [
    {
      "type" : "pvalue2tailedZtest",
      "name" : "pvalue",
      "zScore" : {
        "type" : "zscore2sample",
        "name" : "zscore",
        "successCount1" : {
          "type" : "constant",
          "name" : "successCountFromPopulation1Sample",
          "value" : 300
        },
        "sample1Size" : {
          "type" : "constant",
          "name" : "sampleSizeOfPopulation1",
          "value" : 500
        },
        "successCount2" : {
          "type" : "constant",
          "name" : "successCountFromPopulation2Sample",
          "value" : 450
        },
        "sample2Size" : {
          "type" : "constant",
          "name" : "sampleSizeOfPopulation2",
          "value" : 600
        }
      }
    }
  ]
}
```
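
For orientation, the constants in this query can be worked through by hand. The arithmetic below assumes the post aggregator applies the S.E. formula exactly as documented above; the rounded values are approximate.

```latex
\begin{aligned}
p_1 &= 300 / 500 = 0.6, \qquad p_2 = 450 / 600 = 0.75 \\
\mathrm{S.E.} &= \sqrt{\frac{0.6 \cdot 0.4}{500} + \frac{0.75 \cdot 0.25}{600}}
  = \sqrt{0.00048 + 0.0003125} \approx 0.02815 \\
z &= \frac{0.6 - 0.75}{0.02815} \approx -5.33
\end{aligned}
```

A z-score of this magnitude corresponds to a two-tailed p-value on the order of 10^-7, so under these assumptions the query should report a value very close to zero.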
4 changes: 4 additions & 0 deletions extensions-core/stats/pom.xml
@@ -40,6 +40,10 @@
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
</dependency>

<!-- Tests -->
<dependency>
@@ -24,6 +24,8 @@
import com.google.common.collect.ImmutableList;
import com.google.inject.Binder;
import io.druid.initialization.DruidModule;
import io.druid.query.aggregation.teststats.PvaluefromZscorePostAggregator;
import io.druid.query.aggregation.teststats.ZtestPostAggregator;
import io.druid.query.aggregation.variance.StandardDeviationPostAggregator;
import io.druid.query.aggregation.variance.VarianceAggregatorFactory;
import io.druid.query.aggregation.variance.VarianceFoldingAggregatorFactory;
@@ -43,7 +45,9 @@ public List<? extends Module> getJacksonModules()
new SimpleModule().registerSubtypes(
VarianceAggregatorFactory.class,
VarianceFoldingAggregatorFactory.class,
-StandardDeviationPostAggregator.class
+StandardDeviationPostAggregator.class,
+ZtestPostAggregator.class,
+PvaluefromZscorePostAggregator.class
)
);
}
@@ -0,0 +1,167 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.query.aggregation.teststats;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonTypeName;
import com.google.common.base.Preconditions;
import com.google.common.collect.Iterables;
import com.google.common.collect.Sets;
import io.druid.query.Queries;
import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.PostAggregator;
import io.druid.query.aggregation.post.ArithmeticPostAggregator;
import io.druid.query.aggregation.post.PostAggregatorIds;
import io.druid.query.cache.CacheKeyBuilder;
import org.apache.commons.math3.distribution.NormalDistribution;

import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

@JsonTypeName("pvalue2tailedZtest")
public class PvaluefromZscorePostAggregator implements PostAggregator
{
private final String name;
private final PostAggregator zScore;

@JsonCreator
public PvaluefromZscorePostAggregator(
@JsonProperty("name") String name,
@JsonProperty("zScore") PostAggregator zScore
)
{
Preconditions.checkNotNull(
name,
"Must have a valid, non-null post-aggregator name"
);
Preconditions.checkNotNull(
zScore,
"Must have a valid, non-null post-aggregator"
);
this.name = name;
this.zScore = zScore;
}

@Override
public Set<String> getDependentFields()
{
Set<String> dependentFields = Sets.newHashSet();

dependentFields.addAll(zScore.getDependentFields());

return dependentFields;
}

@Override
public Comparator getComparator()
{
return ArithmeticPostAggregator.DEFAULT_COMPARATOR;
}

@Override
public Object compute(Map<String, Object> combinedAggregators)
{

double zScoreValue = ((Number) zScore.compute(combinedAggregators))
.doubleValue();

zScoreValue = Math.abs(zScoreValue);
return 2 * (1 - cumulativeProbability(zScoreValue));
}

private double cumulativeProbability(double x)
{
try {
NormalDistribution normDist = new NormalDistribution();
return normDist.cumulativeProbability(x);
}
catch (IllegalArgumentException ex) {
return Double.NaN;
}
}

@Override
@JsonProperty
public String getName()
{
return name;
}

@Override
public PostAggregator decorate(Map<String, AggregatorFactory> aggregators)
{
return new PvaluefromZscorePostAggregator(
name,
Iterables.getOnlyElement(Queries.decoratePostAggregators(
Collections.singletonList(zScore), aggregators))
);
}

@JsonProperty("zScore")
public PostAggregator getZscore()
{
return zScore;
}

@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}

PvaluefromZscorePostAggregator that = (PvaluefromZscorePostAggregator) o;

if (!name.equals(that.name)) {
return false;
}

return (zScore.equals(that.zScore));
}

@Override
public int hashCode()
{
int result = name.hashCode();
result = 31 * result + zScore.hashCode();
return result;
}

@Override
public String toString()
{
return "PvaluefromZscorePostAggregator{" +
"name='" + name + '\'' +
", zScore=" + zScore + '}';
}

@Override
public byte[] getCacheKey()
{
return new CacheKeyBuilder(PostAggregatorIds.PVALUE_FROM_ZTEST)
.appendCacheable(zScore).build();
}
}