adding new post aggregators for test statistics to druid-stats extension #4532

Merged: jon-wei merged 9 commits into apache:master from chunghochen:stats-pvalue on Oct 10, 2017
@chunghochen
Contributor

@fjy fjy added this to the 0.11.0 milestone Jul 13, 2017
Contributor

@jihoonson jihoonson left a comment

Hi @chunghochen, thanks for the nice work!
I left a couple of comments. Also, please merge the master branch and check the build failure due to invalid formats.

{
"type": "zscore2sample",
"name": "<output_name>",
"fields": [<count 1 (post_aggregator1)>, <sample size 1 (post_aggregator2)>, <count 2 (post_aggregator3)>, <sample size 2 (post_aggregator4)>]
Contributor

It looks like the number of arguments is always 4 and their order matters. If so, I think it would be better to take 4 individual arguments instead of a fields list, because that is less error-prone and makes it clearer which arguments are needed. What do you think?

Contributor Author

Yes, fields in this case requires 4 arguments, and the order matters.
However, I don't think it is more error prone or less intuitive, than using individual arguments.
I think fields fits well as the input parameter at this juncture, since:

  1. It is aligned and consistent with the way fields are used in Druid. Please see http://druid.io/docs/latest/querying/aggregations.html and http://druid.io/docs/latest/querying/post-aggregations.html
  2. As we might fine-tune, or evolve the input parameters so that the stat functions can be most useful and powerful, keep them in fields at this juncture should be good, so it can be continuously evolved / fine-tuned without dramatically changing the input signature.

Contributor

However, I don't think it is more error prone or less intuitive, than using individual arguments.

With the current design, users must either remember the meaning of each argument by its position in the fields list or consult the documentation every time. Worse, it is hard for users to notice when they have supplied the arguments in the wrong order (e.g., [<count 2 (post_aggregator3)>, <sample size 2 (post_aggregator4)>, <count 1 (post_aggregator1)>, <sample size 1 (post_aggregator2)>]).

If the arguments are specified individually, users at least cannot supply the wrong number of arguments or put them in the wrong order.

It is aligned and consistent with the way fields are used in Druid.

In this case, I think usability (as stated above) matters more than consistency. There is no need to support an arbitrary number of arguments.

As we might fine-tune, or evolve the input parameters so that the stat functions can be most useful and powerful, keep them in fields at this juncture should be good, so it can be continuously evolved / fine-tuned without dramatically changing the input signature.

If you have a plan, please let us know. I think even changing the input signature is not a big problem, because arguments can easily be added and removed (thanks to JSON).

Contributor

I agree with @jihoonson here -- a fields array makes sense if the fields are all treated basically the same, and if arbitrary numbers of fields are allowed, but in this case neither is true. The fields mean different things semantically, and the number is fixed, so the API would be nicer if the four arguments are given meaningful names.

It would still be easy to evolve the API with named parameters, and in fact I think it's easier, since the parameters are named rather than positional.
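
To make the trade-off discussed above concrete, here is a minimal Java sketch contrasting the two shapes. `FieldRef`, `PositionalSpec`, and `NamedSpec` are hypothetical stand-ins invented for illustration, not Druid classes; in the real extension the inputs would be `PostAggregator` instances and the names would come from JSON property keys. The argument names `successCount1`, `sample1Size`, `successCount2`, and `sample2Size` are the ones that appear later in this thread.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a reference to a post-aggregator (not a Druid type).
final class FieldRef
{
  final String name;

  FieldRef(String name)
  {
    this.name = Objects.requireNonNull(name, "name");
  }
}

// Positional shape: the constructor can check the count, but a swapped
// order is indistinguishable from a correct one.
final class PositionalSpec
{
  final List<FieldRef> fields;

  PositionalSpec(List<FieldRef> fields)
  {
    if (fields == null || fields.size() != 4) {
      throw new IllegalArgumentException("expected exactly 4 fields");
    }
    this.fields = fields;
  }
}

// Named shape: every input has its own slot, so a missing argument fails
// fast with its name, and each value is labeled wherever it is read.
final class NamedSpec
{
  final FieldRef successCount1;
  final FieldRef sample1Size;
  final FieldRef successCount2;
  final FieldRef sample2Size;

  NamedSpec(FieldRef successCount1, FieldRef sample1Size, FieldRef successCount2, FieldRef sample2Size)
  {
    this.successCount1 = Objects.requireNonNull(successCount1, "successCount1");
    this.sample1Size = Objects.requireNonNull(sample1Size, "sample1Size");
    this.successCount2 = Objects.requireNonNull(successCount2, "successCount2");
    this.sample2Size = Objects.requireNonNull(sample2Size, "sample2Size");
  }
}
```

With the positional shape, only the element count can be validated; with the named shape, each slot can be validated and reported by name.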

Comment thread extensions-core/stats/pom.xml Outdated
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.6.1</version>
Contributor

Version is not needed here because the parent module specifies it.

Contributor Author

Great catch! I overlooked it when porting to the latest version of Druid.

@jihoonson
Contributor

Also, please add a doc for new post aggregators.

Contributor

@gianm gianm left a comment

Thank you for the contribution @chunghochen. I left some comments on the code.

I didn't review the math, just reviewed the Druid aspects of the code.

import java.util.Set;

/**
* Created by chunchen on 4/5/17.
Contributor

Please don't include author information in the code. Git will remember it.

public PvaluefromZscorePostAggregator(
@JsonProperty("name") String name,
@JsonProperty("field") PostAggregator field
) {
Contributor

Brace style isn't quite right here (and elsewhere in this file) so please apply Druid code style linked off https://github.com/druid-io/druid/blob/master/CONTRIBUTING.md.

}

@Override
public String toString() {
Contributor

Consider overriding equals and hashCode as well.
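
A minimal sketch of what overriding all three methods together could look like. `NamedFieldSketch` is a hypothetical value class with a name and a single input field, used here only for illustration; it is not the class under review, and the field is typed as `Object` to keep the example free of Druid dependencies.

```java
import java.util.Objects;

// Hypothetical value class standing in for a post-aggregator with a name and one input field.
final class NamedFieldSketch
{
  private final String name;
  private final Object field;

  NamedFieldSketch(String name, Object field)
  {
    this.name = Objects.requireNonNull(name, "name");
    this.field = Objects.requireNonNull(field, "field");
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    NamedFieldSketch that = (NamedFieldSketch) o;
    return name.equals(that.name) && field.equals(that.field);
  }

  @Override
  public int hashCode()
  {
    // Both fields are guaranteed non-null by the constructor, so no null checks are needed here.
    return Objects.hash(name, field);
  }

  @Override
  public String toString()
  {
    return "NamedFieldSketch{name='" + name + "', field=" + field + '}';
  }
}
```

Keeping `equals` and `hashCode` consistent is what lets instances behave predictably as members of a `HashSet` or keys of a `HashMap`.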


@Override
public PostAggregator decorate(Map<String, AggregatorFactory> aggregators) {
return this;
Contributor

This should recursively decorate the PostAggregator field, for example:

return new PvaluefromZscorePostAggregator(name, Iterables.getOnlyElement(Queries.decoratePostAggregator(Collections.singletonList(field), aggregators)));

That way, any nested post-aggregators that need decoration will receive it.
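
The idea can be illustrated without Druid on the classpath. In the sketch below, `DecoratableNode`, `LeafNode`, and `WrapperNode` are hypothetical, simplified stand-ins (the aggregator map holds plain strings rather than `AggregatorFactory` objects); the point is only that a wrapper returning `this` would leave its nested field undecorated.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a post-aggregator; only decoration is modeled.
interface DecoratableNode
{
  DecoratableNode decorate(Map<String, String> aggregators);
}

// A leaf that needs information from the aggregator map before it can be used.
final class LeafNode implements DecoratableNode
{
  final String fieldName;
  final String aggregatorType; // null until decorated

  LeafNode(String fieldName, String aggregatorType)
  {
    this.fieldName = fieldName;
    this.aggregatorType = aggregatorType;
  }

  @Override
  public DecoratableNode decorate(Map<String, String> aggregators)
  {
    return new LeafNode(fieldName, aggregators.get(fieldName));
  }
}

// A wrapper holding one nested node, like a post-aggregator with a single input field.
final class WrapperNode implements DecoratableNode
{
  final String name;
  final DecoratableNode field;

  WrapperNode(String name, DecoratableNode field)
  {
    this.name = name;
    this.field = field;
  }

  @Override
  public DecoratableNode decorate(Map<String, String> aggregators)
  {
    // Returning `this` here would skip the nested node; decorate it and rebuild instead.
    return new WrapperNode(name, field.decorate(aggregators));
  }
}
```

Building a new wrapper rather than mutating the existing one keeps the original instance unchanged, which matches the immutable style of the constructor shown in the review.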

* @param the success count of population 2
* @param sample size of population 2
*/
private double zScoreTwoSamples(Double s1count, Double p1count, Double s2count, Double p2count) {
Contributor

These might as well be double rather than Double as they are assumed to be non-null.

return (convertRate1 - convertRate2) /
Math.sqrt((convertRate1 * (1 - convertRate1) / p1count) +
(convertRate2 * (1 - convertRate2) / p2count));
} catch (Exception ex) {
Contributor

What kind of Exception can be thrown here? Could this be made more specific?

Also, is it possible to use some other mechanism of detecting whatever problem this exception is detecting? Exceptions are quite slow compared to other forms of flow control and so I'm concerned about using them in a post-aggregator compute function.
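
One way to address both points is to take primitive `double` arguments and guard the inputs explicitly instead of catching exceptions. The sketch below uses the same formula as the snippet under review; the class name `ZScoreSketch` and the choice to return `Double.NaN` for unusable inputs are assumptions made for illustration, not necessarily what the merged code does.

```java
// Hypothetical helper illustrating explicit guards instead of try/catch.
final class ZScoreSketch
{
  private ZScoreSketch()
  {
  }

  /**
   * Two-sample z-score for proportions:
   * (rate1 - rate2) / sqrt(rate1 * (1 - rate1) / n1 + rate2 * (1 - rate2) / n2)
   */
  static double zScoreTwoSamples(double successCount1, double sample1Size, double successCount2, double sample2Size)
  {
    // Guard against empty samples up front; no exception is involved.
    if (!(sample1Size > 0) || !(sample2Size > 0)) {
      return Double.NaN;
    }
    final double rate1 = successCount1 / sample1Size;
    final double rate2 = successCount2 / sample2Size;
    final double variance = rate1 * (1 - rate1) / sample1Size + rate2 * (1 - rate2) / sample2Size;
    // Written as a negated comparison so that zero, negative, and NaN variances are all rejected.
    if (!(variance > 0)) {
      return Double.NaN;
    }
    return (rate1 - rate2) / Math.sqrt(variance);
  }
}
```

Because floating-point division by zero yields infinity or NaN rather than throwing, plain comparisons are enough to detect the degenerate cases.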

import java.util.Map;

/**
* Created by chunchen on 4/23/17.
Contributor

Please don't include author tags.

try {
NormalDistribution normDist = new NormalDistribution();
return normDist.cumulativeProbability(x);
} catch (Exception ex) {
Contributor

What kind of Exception can be thrown here? Could this be made more specific?

Also, is it possible to use some other mechanism of detecting whatever problem this exception is detecting? Exceptions are quite slow compared to other forms of flow control and so I'm concerned about using them in a post-aggregator compute function.
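
As an illustration of exception-free flow control for this step, here is a self-contained sketch of a two-tailed p-value computation, p = 2 * (1 - Φ(|z|)). Note the swap: the code under review uses commons-math3's `NormalDistribution.cumulativeProbability`, whereas this sketch substitutes the Abramowitz and Stegun 7.1.26 polynomial approximation of the complementary error function (absolute error around 1.5e-7) so that it runs with the JDK alone. The class name `PValueSketch` and the NaN-in, NaN-out behavior are assumptions.

```java
// Hypothetical helper; an approximation used in place of commons-math3 for illustration only.
final class PValueSketch
{
  private PValueSketch()
  {
  }

  // Abramowitz-Stegun 7.1.26 approximation of erfc(x), valid for x >= 0.
  static double erfcApprox(double x)
  {
    final double t = 1.0 / (1.0 + 0.3275911 * x);
    final double poly = t * (0.254829592
                             + t * (-0.284496736
                                    + t * (1.421413741
                                           + t * (-1.453152027
                                                  + t * 1.061405429))));
    return poly * Math.exp(-x * x);
  }

  // Two-tailed p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
  static double twoTailedPValue(double zScore)
  {
    // Propagate NaN explicitly instead of relying on an exception.
    if (Double.isNaN(zScore)) {
      return Double.NaN;
    }
    return erfcApprox(Math.abs(zScore) / Math.sqrt(2.0));
  }
}
```

A production implementation would more likely keep the library call for accuracy in the far tails and add only the explicit NaN guard.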

}

@Override
public String toString() {
Contributor

Consider overriding equals and hashCode as well.

@chunghochen
Contributor Author

Thank you @gianm and @jihoonson for the reviews and comments!!
I've made changes per the review comments.
In my environment, checkstyle appears to pass, according to the mvn log:

[INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ druid ---
[INFO] Starting audit...
Audit done.

I'm not sure why it behaves differently from the Druid CI environment.
Thank you both for all the help!!

@jihoonson
Contributor

Some new rules (e.g., #4651, #4564, ...) have been added to the master branch. Please merge the master branch first and test again.

@chunghochen
Contributor Author

I can't see the errors when clicking the Details link of continuous-integration/travis-ci/pr, other than that #12696.3 was canceled.
BTW, some anecdotal data points:

@jihoonson
Contributor

@chunghochen, hmm, I'm not sure why the job was canceled. Maybe there was a problem in Travis. I restarted the job.

p1 = (successCount1) / (sample size 1)

For example,
p2 = (successCount2) / (sample size 2)
Contributor

It would be better to change sample size 1 and sample size 2 to camel case for consistency and better readability.

Contributor Author

We are doing a two-sample z-test to calculate the z-score from sample 1 and sample 2.
Therefore, sample 1 size and sample 2 size are more meaningful in this context.
I will change sample2size to sample2Size for consistency.

Contributor

Thanks.

"name": "<output_name>",
"fields": [<count 1 (post_aggregator1)>, <sample size 1 (post_aggregator2)>, <count 2 (post_aggregator3)>, <sample size 2 (post_aggregator4)>]
"successCount1": <post_aggregator> success count of sample 1,
"sample1Size": <post_aggregator> sample 1 size,
Contributor

Would sampleSize1 be better, since successCount is suffixed with a number?

Contributor Author

As described:
successCount1: success count of sample 1
sample1size: sample 1 size
In short, we are doing a two-sample z-test to calculate the z-score from sample 1 and sample 2.
It appears that sampleSize1 deviates from the original statistical meaning.

Contributor

I think people won't be confused if sampleSize is suffixed with the number and we give a proper description. Two different numbering styles for arguments are confusing and make mistakes more likely.

# Test Stats Aggregators

Incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for background and details.
Incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for math background and details, although its input spec and example are out of date.
Contributor

although its input spec and example are out of date.

I'm not sure this statement is necessary: if the linked document is really out of date and irrelevant to the description here, we would usually replace it with a more appropriate link.

Also, I think it's better to add links to the definitions of z-score and p-value, such as Wikipedia links, because that blog post describes the bigger picture of how these new post aggregators are used in your case. Even the blog post links to Wikipedia for the definitions of z-score and p-value.

Contributor Author

Hi Jihoon,

The purpose of the blog post is to explain and connect all the related concepts to derive how post aggregators do the calculations. IOW, it is to bridge the gap between what wikipedia states and what the post aggregators do.

Not sure if you're asking to remove "although its input spec and example are out of date" in test-stats.md?
Please advise. Many thanks!

Contributor

The purpose of the blog post is to explain and connect all the related concepts to derive how post aggregators do the calculations. IOW, it is to bridge the gap between what wikipedia states and what the post aggregators do.

If so, I think it's better to remove "although its input spec and example are out of date." statement.

Contributor Author

okay, if you insist.

"successCount1" :
{ "type" : "constant",
"name" : "successCountPopulation1",
"name" : "successCountFromPopulation1Sample",
Contributor

Same here. It looks better to put the number at the end rather than in the middle.

Contributor Author

We are calculating two-tailed p-value from zscore.

Contributor Author

@chunghochen chunghochen Sep 15, 2017

I think it is important for the naming to represent its statistical meaning.
In this case, zscore2sample really means we are doing a two-sample z-test to calculate the z-score.
Further, we are calculating a two-tailed p-value from the z-score.

Naming calls for compressing a few statistical concepts together. It is more than mechanically placing the number in the middle or at the end of the name. We need to first describe the name in statistical terms and then try to compress it to a name which makes sense.

Contributor

Ok, thanks for explanation.

*/
@JsonTypeName("zscore2sample")
public class ZtestPostAggregator implements PostAggregator {
public class ZtestPostAggregator implements PostAggregator
Contributor

Just out of curiosity, why is the class name different from its type name?

Contributor Author

The class name follows the naming convention of all Druid PostAggregators.
The Json Type Name just tried to be brief and meaningful.

Contributor Author

The class name tries to be brief while following the naming convention of all Druid postAggregators.
The Type Name just tries to be meaningful in the context of test statistics.

Contributor

I mean, why can't the class name simply be ZScore2SamplePostAggregator? This may be a silly question because I'm not familiar with stats.

Contributor Author

There are various test statistics that can be calculated based on the z-test. It's not clear whether we should have one class for every variation of z-test-based statistics. For example, it might make more sense to have the 1-sample and 2-sample z-scores in one class rather than in two different classes. In other words, I'm thinking less in terms of naming convention and more along the lines of extensibility and componentization.

Contributor

It sounds good to me, but if so, the json type name should also be an extendable name, right?

Contributor Author

The JsonTypeName (zscore2sample) is intended to be brief and meaningful to the user, so it differs from the class name ZtestPostAggregator. I'm not sure what you mean by "the json type name should also be an extendable name". If you mean that we should choose the type name so that new types can be added easily and consistently, then yes. The JsonTypeName is designed from the user's perspective and is exposed to the user, while the class name concerns the implementation. The rationales behind the two names are different.

private final PostAggregator successCount1;
private final PostAggregator sample1Size;
private final PostAggregator successCount2;
private final PostAggregator sample2Size;
Contributor

The get() methods for these fields are missing.

Contributor

Also please add a json serde test.

Contributor Author

These two postAggregators implement PostAggregator Interface. Get() is only needed for Aggregator Interface. I failed to see any postAggregator implementing get() method.

Contributor Author

I'm not sure why a JSON serde test is needed in this case. These two post-aggregators are not persisted into Druid per se, and there is no custom serde either.
One can think of them as glorified arithmetic post-aggregators that calculate some test statistics.

Contributor

It looks like my comments were confusing. Sorry for the lack of detail. I meant that the methods getSuccessCount1(), getSample1Size(), getSuccessCount2(), and getSample2Size() should be added and annotated with @JsonProperty, because they are needed for JSON serialization. Unit tests are needed to catch such mistakes.

Contributor Author

@chunghochen chunghochen Sep 19, 2017

Thanks for clarifying this! will do as requested.

@@ -46,79 +48,92 @@
http://facweb.cs.depaul.edu/sjost/csc423/documents/test-descriptions/indep-z.pdf
Contributor

This comment should be javadoc.

Contributor Author

will do :-)

@JsonProperty("sample1Size") PostAggregator sample1Size,
@JsonProperty("successCount2") PostAggregator successCount2,
@JsonProperty("sample2Size") PostAggregator sample2Size

Contributor

Unnecessary whitespace

Contributor Author

Do you mean to remove line 67?
I'm using Druid 11 checkstyle.xml and IntelliJ to do reformatting. And it passes the stylecheck.
According to IntelliJ's interpretation continuation indent is 4.
Just like to make sure I understand what you mean. :-)

Contributor Author

Do you mean line#67? Or if there are more unnecessary whitespace you're referring to?
Just like to make sure we are on the same page. :-)

Contributor

Yes, please remove this line.

Contributor Author

will do! 👍

+ fields
+ "}";
"name='" + name + '\'' +
", successCount1" + successCount1 +
Contributor

Please insert '=' between string and value.

Contributor Author

will do

)
{
Preconditions.checkNotNull(name, "Must have a valid, non-null post-aggregator");
Preconditions.checkNotNull(zScore, "Must have a valid, non-null post-aggregator");
Contributor

Also, it's better to check that zScore is an instance of ZtestPostAggregator.

Contributor Author

@chunghochen chunghochen Sep 14, 2017

One can calculate the z-score oneself without using ZtestPostAggregator. I didn't check whether it is an instance of ZtestPostAggregator because we should keep it flexible and customizable if the user opts for that.

Contributor Author

Actually, we want to keep it flexible and open.
For example, one can calculate one's own z-score if needed and just feed that z-score in to calculate the p-value.

Contributor

Ah, ok. It makes sense.

}

@Override
public int hashCode()
Contributor

Please implement the equals() method as well. This avoids potential problems when working with collections.

}

@Override
public int hashCode()
Contributor

Please implement the equals() method as well. This avoids potential problems when working with collections.

Contributor Author

okay

@chunghochen
Contributor Author

chunghochen commented Sep 20, 2017

Not sure why the checks failed?
My mvn regressions passed all tests in my env. I can provide the log if needed.

  • Inspections: pull requests (Druid): TeamCity build failed. I can't click into the details: it asked me to sign up, but even after signing up I can't see anything meaningful.
  • continuous-integration/travis-ci/pr: The Travis CI build failed.
    It showed a failure at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328).
    I don't think my change caused this issue. It looks like an intermittent issue I have run into before.

Also, I can't see the changes jihoonson requested: I clicked the "See review" link but did not see any new review comment.

Please help/advise.
Thanks so much!

@jon-wei
Contributor

jon-wei commented Sep 20, 2017

@chunghochen

We've been having issues with Travis CI recently; the third test there often fails with a timeout. I've reset that test.

To see the TeamCity report, you can click on "Login as Guest". I checked the error report there, and the errors don't seem related to this patch.

@jon-wei
Contributor

jon-wei commented Sep 20, 2017

Moving this to 0.11.1, as 0.11.0 is feature frozen now

@jon-wei jon-wei modified the milestones: 0.11.0, 0.11.1 Sep 20, 2017
@jon-wei
Contributor

jon-wei commented Sep 20, 2017

@chunghochen Can you try merging master to your branch? The TeamCity failures shown in the report were fixed by #4809

@chunghochen
Contributor Author

chunghochen commented Sep 21, 2017

@jon-wei
Thanks so much for your help!
Here is what I see from my terminal:
...@...<druid-...> git status
On branch stats-pvalue
Your branch is up-to-date with 'origin/stats-pvalue'.
...
...@...<druid-...> git remote -v
origin https://github.com/chunghochen/druid.git (fetch)
origin https://github.com/chunghochen/druid.git (push)
upstream https://github.com/druid-io/druid.git (fetch)
upstream https://github.com/druid-io/druid.git (push)
...@...<druid-...> git merge upstream/master
Already up-to-date.

Can you try merging master to your branch?

Not sure what git command I should do to merge master to my branch?
Thanks so much for all the help!

@jon-wei
Contributor

jon-wei commented Sep 21, 2017

@chunghochen

Not sure what git command I should do to merge master to my branch?

You can do:

git fetch upstream
git merge upstream/master

@chunghochen
Contributor Author

@jon-wei
Yep, I realized that just after sending the question.
CI has been running for about an hour now.

@chunghochen
Contributor Author

Hopefully this pull request can be merged in when the codeline is open? :-)

@chunghochen
Contributor Author

@jihoonson Not sure where we are on this pull request?
It passed all CI regressions 5 days ago.
Thanks so much for all the help!

Contributor

@jihoonson jihoonson left a comment

@chunghochen I think it's almost done. Left some trivial comments. In addition, please comment on #4532 (comment).


pvaluefromZscorePostAggregator = new PvaluefromZscorePostAggregator("pvalue", zscore);

System.out.print("zscore = " + zscore + "\n");
Contributor

Please remove debug code.

-1783.8762354220219
))).doubleValue();

System.out.print("pvalue = " + pvalue + "\n");
Contributor

Please remove debug code.

/* Assert P-value is positive and very small */
Assert.assertTrue(pvalue >= 0 && pvalue < 0.00001);

System.out.print(pvaluefromZscorePostAggregator.toString());
Contributor

Please remove debug code.

zscore, 0.0001
);

System.out.print(ztestPostAggregator.toString());
Contributor

Please remove debug code.

@Override
public int hashCode()
{
int result = name != null ? name.hashCode() : 0;
Contributor

Please remove unnecessary null check for name. It's already tested in the constructor.

@Override
public int hashCode()
{
int result = name != null ? name.hashCode() : 0;
Contributor

Please remove unnecessary null check for name. It's already checked in the constructor.

@chunghochen
Contributor Author

@jihoonson and @jon-wei, can you guys please help?

Not sure why it fails on both checks?
Inspections: pull requests (Druid) — TeamCity build failed, it shows the following
server/src/main/java/io/druid/client/cache
CaffeineCacheConfig.java (3)
server/src/main/java/io/druid/segment/realtime/firehose
ServiceAnnouncingChatHandlerProvider.java (1)
However, the above is not my code.

continuous-integration/travis-ci/pr — The Travis CI build failed
I can't tell from the log how druid-server fails.
In any event, my mvn regressions passed on my laptop both before and after merging with master.

Please help!!
BTW, my change here is very trivial - simply removing some blocks per review comments.

@jon-wei
Contributor

jon-wei commented Oct 6, 2017

@chunghochen

The CI test failed again because of an unrelated test timeout (this happens to a lot of PRs). I've restarted the build for that test suite.

I believe the TeamCity build is failing because some PRs that introduced failures were committed without fixing the TeamCity issues, so builds for new PRs are still picking up the unaddressed inspection issues. Your PR is fine, and we can ignore them since none of the files are related to your changes.

@chunghochen
Contributor Author

@jon-wei,

Thanks so much for looking into this! And confirming my PR is fine. :-)

@chunghochen
Contributor Author

chunghochen commented Oct 7, 2017

@jihoonson

@chunghochen I think it's almost done. Left some trivial comments. In addition, please comment on #4532 (comment).

Great! Hope my latest commits address your concerns? I also commented on #4532 (comment).
Not sure why this pull request still shows "Changes requested"?
Hopefully, this pull request can be quickly closed. And it won't be dragged on forever. :-)
Thanks so much for your help!!

Contributor

@jihoonson jihoonson left a comment

Thanks @chunghochen!

@jon-wei jon-wei merged commit 0614b92 into apache:master Oct 10, 2017
@chunghochen
Contributor Author

@jihoonson @jon-wei
Cool! Thanks so much for all the help!
Looking forward to seeing this commit in the 0.11.1 release. :-)

5 participants