Add Mean and Median to std.algorithm #3592
std/algorithm/iteration.d
This function does not find the mean lazily.
|
Fixed mentioned issues and rebased. |
std/algorithm/iteration.d
There's no check in the sig constraints for the range elements to be numeric types, yet the function freely applies + and 0.0 to the elements.
I.e., the sig constraints should at least check that range elements are convertible to floating point (which precision, though?), otherwise + and 0.0 will cause a compile error.
Would
isImplicitlyConvertible!(ElementType!R, float) || isImplicitlyConvertible!(ElementType!R, double)
work for that?
That's a lot of typing (and reading). What about just straight-up:
is(ElementType!R : double)
?
For simplicity, I'd just stick with double to minimize precision errors. But if you really wanted to, you could parametrize the floating type and use isFloatingPoint!F in the sig constraints to enforce it, then the user can specify the type to use for computing the average.
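A minimal sketch of the constraint discussed in this thread, assuming a hypothetical `mean` whose accumulator type `F` defaults to `double` and can be overridden by the caller. The empty-range behaviour (returning NaN) is an assumption for illustration, not something the thread settles.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : isFloatingPoint;

/// Hypothetical sketch: F is the floating-point type used for accumulation.
F mean(F = double, R)(R r)
if (isInputRange!R && isFloatingPoint!F && is(ElementType!R : F))
{
    if (r.empty)
        return F.nan; // assumed policy for empty input
    F sum = 0;
    size_t count = 0;
    for (; !r.empty; r.popFront())
    {
        sum += r.front; // front reads the element; popFront only advances
        ++count;
    }
    return sum / count;
}

void main()
{
    assert(mean([1, 2, 3, 4]) == 2.5);          // int elements convert to double
    assert(mean!real([1.5f, 2.5f]) == 2.0L);    // caller picks the precision
    // Elements that do not convert to F are rejected at compile time.
    static assert(!__traits(compiles, mean(["a", "b"])));
}
```

With the constraint in place, a non-numeric element type fails at the call site instead of producing an error from `+` deep inside the function body.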
|
BTW, it is possible to implement If For bidirectional ranges, you can just call Note, though, that for anything other than a random access range with |
std/algorithm/iteration.d
Hate to be nitpicking on this again, but Phobos style guide requires opening braces to be on a new line.
|
@quickfur Addressed all concerns and rebased. Please see my line note though. |
std/algorithm/iteration.d
As it states in the comment, I don't get this.
Because popFront does not return anything. Don't you mean r.front instead?
Yes, you're right.
|
As the median is only a fancy word for the 0.5 quantile, why not add a generic quantile function like: auto quantil(R)(R r, double q) { // q >= 0.0 && q <= 1.0? |
|
And then add a convenience function for the median? Yeah that sounds reasonable. |
|
yes auto median(R)(R r) { return quantil(r, 0.5); } |
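The suggestion above can be sketched as follows. This is only one possible definition, assuming the input is already sorted in ascending order, is non-empty, and that quantiles falling between two elements are linearly interpolated; the thread does not say which quantile definition the PR used. The name `quantil` follows the spelling used in the comments.

```d
import std.range.primitives : ElementType, hasLength, isRandomAccessRange;

/// Sketch only: `r` must be sorted ascending and non-empty (assumption).
double quantil(R)(R r, double q)
if (isRandomAccessRange!R && hasLength!R && is(ElementType!R : double))
{
    assert(r.length > 0 && q >= 0.0 && q <= 1.0);
    immutable double pos = q * (r.length - 1); // fractional index into r
    immutable size_t lo = cast(size_t) pos;    // pos >= 0, so truncation is floor
    immutable size_t hi = lo + 1 < r.length ? lo + 1 : lo;
    immutable double frac = pos - lo;
    // Linear interpolation between the two neighbouring elements.
    return r[lo] + frac * (r[hi] - r[lo]);
}

/// Convenience wrapper: the median is the 0.5 quantile.
double median(R)(R r)
{
    return quantil(r, 0.5);
}

void main()
{
    assert(median([1, 2, 3]) == 2.0);
    assert(median([1, 2, 3, 4]) == 2.5);
    assert(quantil([10, 20, 30, 40, 50], 0.25) == 20.0);
}
```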
|
Changed code to match @burner's requests.
Fixed |
std/algorithm/iteration.d
Are you sure you should be using casts here? What about using std.math.round or std.math.trunc?
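For reference, a small sketch of how the options in this question differ when turning a fractional position into an index. The value `2.7` is made up for illustration; the code under review is not shown in the thread.

```d
import std.math : round, trunc;

void main()
{
    immutable double pos = 2.7; // hypothetical fractional index

    // A cast to an integer type discards the fractional part.
    assert(cast(size_t) pos == 2);

    // trunc does the same but stays in floating point.
    assert(trunc(pos) == 2.0);

    // round goes to the nearest integer, with halves away from zero.
    assert(round(pos) == 3.0);
    assert(round(2.5) == 3.0);

    // For negative values, cast and trunc both move toward zero.
    assert(cast(int) -2.7 == -2);
    assert(trunc(-2.7) == -2.0);
    assert(round(-2.7) == -3.0);
}
```

A bare cast and `trunc` agree on the result, so the choice mostly affects readability; `round` selects a different element whenever the fractional part is 0.5 or more.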
|
@John-Colvin can I get your opinion on this, seeing as you're the unofficial D statistics guy. |
|
Ok, now that's three people who have given this PR an ok. Ready to merge. |
|
Ping @DmitryOlshansky @JakobOvrum @quickfur @schveiguy It has been 11 days since this has been ready to merge (6 if you count very minor doc updates). As I said before, I have another PR waiting on this, so it would be great to get this merged soon. |
|
Overall it LGTM, it's just that I'm not really qualified to spot issues in the quantile algorithm. |
|
@JakobOvrum Yes, but burner and John-Colvin, who seem confident in their knowledge of these algorithms, gave it an ok. |
|
@JackStouffer That's not really a fair representation of what I've said. To be clear: I think |
|
@John-Colvin Sorry, I took your silence as acceptance. |
std/algorithm/iteration.d
I really don't like that assert, it is the same as assert(!e.empty)
it is the same as assert(!e.empty)
Does that mean I should move it to the body? Or that you don't like its existence?
no return NaN
Are you sure this function should do that? Because having a q that's less than zero or greater than one makes no sense whatsoever, and I feel that accepting it anyway without error is wrong.
and create a function
bool isValidQuantil(real q);
What? Why? This is the type of thing that contracts are made for. Why not use them?
because the compiler can remove them, without you knowing.
and this way it is more composable
because the compiler can remove them, without you knowing.
And that's fine, because they'll be in release mode which must be explicitly activated and they should have run it with contracts on at least once while developing. This seems to be perfectly acceptable for a lot of the functions in other modules, especially std.range which makes heavy use of contracts.
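The two positions in this thread can be put side by side in a short sketch. `isValidQuantil` is the helper named above; `rankContract` and `rankOrNaN` are hypothetical names invented here to keep the example small, and neither is taken from the PR.

```d
import std.math : isNaN;

/// Helper proposed in the thread; a NaN argument fails both comparisons.
bool isValidQuantil(real q)
{
    return q >= 0.0 && q <= 1.0;
}

/// Variant A: in-contract. The checks are compiled out with -release.
size_t rankContract(size_t n, double q)
in
{
    assert(n > 0);
    assert(isValidQuantil(q));
}
do
{
    return cast(size_t)(q * (n - 1));
}

/// Variant B: always-on check that reports bad input as NaN.
double rankOrNaN(size_t n, double q)
{
    if (n == 0 || !isValidQuantil(q))
        return double.nan;
    return q * (n - 1);
}

void main()
{
    assert(rankContract(5, 0.5) == 2);
    assert(rankOrNaN(5, 0.25) == 1.0);
    assert(rankOrNaN(5, 1.5).isNaN);   // invalid q survives into the result
    assert(!isValidQuantil(double.nan));
}
```

Variant A fails loudly in non-release builds and costs nothing in release builds. Variant B keeps working in every build, but the caller has to remember to test the result for NaN.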
|
@JackStouffer On a meta note: I think you're pushing too hard. D is an "open source/no one is paid to work" project. Asking people to do work for you drives people away, and they do less. This comes from personal experience. What makes me really itchy though is:
This functionality will get in eventually, but not as fast as you like |
|
Sorry, you're right. I was frustrated that people had given an ok but it was just sitting here. I'm happy to fix things if people tell me they're broken, I just didn't want to see this die on the vine. |
|
it will not die. just use more of the number one computer science skill: |
|
Quantiles from data aren't really a trivial, done-and-dusted thing. See https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html for a serious quantile function and the Wikipedia article for an example of a simple one. A simple quantile, like the one shown in the Wikipedia example, can be done something like this: Overall, I'm not sure something like Could I suggest that if you're interested in doing a more comprehensive treatment of quantiles, you could make a pull request to https://github.com/DlangScience/dstats with it?
|
Thanks for the link.
I'll make the functions more generic by allowing any user defined type that defines
I don't think std.math is a good fit, as there are no other range-based functions in there. I think the only other module that fits is std.numeric, but that seems to be an attempt at a straight one-to-one implementation of the STL.
I'll keep quantile in there for now, until more people in this PR tell me it's no good for Phobos. If that's true, I'll open a PR on that repo. |
|
After doing more research, I have concluded that the I'll open a WIP PR on dstats and if it's good it can always be added back into Phobos later. |
2d7e3b6 to
d4d7c8b
Compare
|
I think Generally, it's a good idea to split things up into multiple PRs when there are various pieces that don't directly depend on each other, otherwise perfectly good changes will get held up unnecessarily just because of another, mostly-unrelated piece. |
|
@quickfur ok, but before I do that there was one thing that I have been thinking about with this PR. I also want to add various other statistical methods to Phobos, such as standard deviation, which fit in |
|
@JackStouffer this is relevant to discussions about rolling means etc. #2991 |
|
@John-Colvin looks interesting. I'll wait on the rolling mean/median stuff until a decision is reached on that PR. In the mean (ha) time, |
This covers issue 14034.
I couldn't find a way to make a median function for input ranges that didn't use the GC, so I just left it to random access ranges.
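One way a median restricted to random access ranges might look is sketched below. It is an assumption-laden illustration, not the code from this PR: the name `medianInPlace` is hypothetical, the function reorders its input through `topN` instead of copying it into a new buffer, and it asserts on empty input.

```d
import std.algorithm.searching : maxElement;
import std.algorithm.sorting : topN;
import std.range.primitives : ElementType, hasLength, hasSlicing,
    isRandomAccessRange;

/// Sketch only: partially reorders `r` in place; `r` must be non-empty.
double medianInPlace(R)(R r)
if (isRandomAccessRange!R && hasLength!R && hasSlicing!R
    && is(ElementType!R : double))
{
    assert(r.length > 0);
    immutable mid = r.length / 2;

    // After topN, r[mid] holds the value it would have if r were sorted,
    // and every element before it compares less than or equal to it.
    topN(r, mid);
    immutable double upper = r[mid];
    if (r.length % 2 == 1)
        return upper;

    // Even length: the lower middle value is the largest of the left part.
    immutable double lower = r[0 .. mid].maxElement;
    return (lower + upper) / 2;
}

void main()
{
    auto odd = [5, 1, 4, 2, 3];
    assert(medianInPlace(odd) == 3.0);

    auto even = [4, 1, 3, 2];
    assert(medianInPlace(even) == 2.5);
}
```

A plain input range cannot be reordered or revisited, which is why an approach like this does not carry over without first buffering the elements somewhere.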