use the latest datasketches-java-4.0.0 #14334
Conversation
|
Looks pretty clean. Thanks for the patch! Reading https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/quantilescommon/QuantileSearchCriteria.java, I don't immediately see situations where the difference between I also wonder, if the old behavior was |
|
We hesitated for some time, but finally decided that inclusive mode is a bit better. This is a major version change with some API incompatibility, so, if ever, this is the right time for the change. |
|
It seems that some tests failed, but I cannot reproduce that in my environment. I guess some rank expectations should be adjusted, but that particular case is not exercised when I run "mvn test" for some reason. |
Looks like this is probably related to SQL compatible null handling mode, you can run locally by adding The other thing that needs to be updated is the licenses.yaml file, whose entries for the updated dependencies must be changed: https://github.com/apache/druid/blob/master/licenses.yaml#L3787 |
|
@clintropolis are you sure about the parameter? Tests pass with the parameter you suggested. |
|
What is the right way to run "mvn test" in that mode manually? |
|
@clintropolis oh, I see! you misspelled the parameter. I copied and did not notice. |
oof, my mistake, sorry about that! |
|
The datasketches version 4.xx is spitting out weird splits for an Similar item sketch in 3.2.0 generates the correct boundaries when the function Pseudo code used for experiments |
Description
This updates the datasketches-java dependency to the latest 4.0.0 release, which has some API changes, and also updates to datasketches-memory-2.2.0.
Some expectations in the unit tests were changed because, in this version of datasketches-java, all quantile sketches use "inclusive" mode by default. If necessary, the sketches can be used in "exclusive" mode explicitly to get the same behavior as before, but in my view that is not necessary, and most users won't notice any difference. It might matter to someone in special circumstances, but I doubt it. Even better might be to introduce a parameter so that the user could choose the mode, but that is a lot more work.
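To make the inclusive/exclusive distinction concrete, here is a minimal plain-Java sketch. It does not use datasketches-java at all: the class `RankModes` and its `rank` method are hypothetical names for illustration, and the computation is exact over a small array rather than approximate over a sketch. It assumes the usual definitions, where the inclusive rank of a value counts items less than or equal to it and the exclusive rank counts only items strictly less than it.

```java
public class RankModes {
    /**
     * Exact normalized rank of v within values.
     * inclusive = true  counts items <= v
     * inclusive = false counts items <  v
     * (Hypothetical helper for illustration; not the library's API.)
     */
    static double rank(double[] values, double v, boolean inclusive) {
        int count = 0;
        for (double x : values) {
            if (inclusive ? x <= v : x < v) {
                count++;
            }
        }
        return (double) count / values.length;
    }

    public static void main(String[] args) {
        double[] values = {10, 20, 30, 40};
        // The two modes differ only when the queried value is present in the data.
        System.out.println(rank(values, 20, true));   // 0.5  (10 and 20 are counted)
        System.out.println(rank(values, 20, false));  // 0.25 (only 10 is counted)
        // For a value that is not in the data, both modes agree.
        System.out.println(rank(values, 25, true));   // 0.5
        System.out.println(rank(values, 25, false));  // 0.5
    }
}
```

This is the kind of shift that would explain adjusted rank expectations in unit tests: a query that lands exactly on a stored value gets a higher rank in inclusive mode than in exclusive mode, while queries between stored values are unaffected.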