Skip to content

Comments

Added median to std.numeric#3893

Closed
JackStouffer wants to merge 1 commit intodlang:masterfrom
JackStouffer:median
Closed

Added median to std.numeric#3893
JackStouffer wants to merge 1 commit intodlang:masterfrom
JackStouffer:median

Conversation

@JackStouffer
Copy link
Contributor

Now that #2991 has been closed, I am reintroducing this function. I believe std.numeric to be a better fit than std.algorithm.iteration, so I moved it.

See #3680 for previous discussion

@9il
Copy link
Member

9il commented Dec 31, 2015

@JackStouffer Great! But it is a little bit more complicated.

  1. Median is special case of Quantile, It is better to generalize it.
  2. Median/Quantile can be defined for user types that have not arithmetic ops. It would be significant addition to std.algorithm.sorting
  3. Median/Quantile can be found for unsorted range with algorithms complexity equals to O(n)
    1. http://cs.stackexchange.com/questions/1914/to-find-the-median-of-an-unsorted-array
    2. https://en.wikipedia.org/wiki/Median_of_medians
    3. https://en.wikipedia.org/wiki/Selection_algorithm
    4. http://dlang.org/phobos/std_algorithm_sorting.html#topN

@JackStouffer
Copy link
Contributor Author

Median is special case of Quantile, It is better to generalize it.

Originally, there was a quantile function in this commit, but as John-Colvin pointed out in the original thread, to be useful, a quantile function should enable all known types of quantile calculations like the R function does. And to be honest, that's not a project I'm willing to take on. I believe that this should forward to a quantile function should one ever exist, but in the mean (median? ha ha) time there should be a median function as it's easy to implement and add.

Median/Quantile can be defined for user types that have not arithmetic ops. It would be significant addition to std.algorithm.sorting

I suppose you could for ranges that have a odd length, but unless I am missing something, that would require that function to throw if it received an even length function. Which is fine, as only that overload would not be marked as nothrow.

Median/Quantile can be found for unsorted range with algorithms complexity equals to O(n)

Thanks for the links.

@JackStouffer
Copy link
Contributor Author

Does anyone know why the documentation for median is not included?

@9il
Copy link
Member

9il commented Jan 4, 2016

private?

@JackStouffer
Copy link
Contributor Author

Ha, yes, there was a private: about 200 lines up. Thanks.

@JackStouffer
Copy link
Contributor Author

Median/Quantile can be found for unsorted range with algorithms complexity equals to O(n)

Ok, I've run into a problem. I don't see a way of doing that algorithm without allocation. After the pivot is found, the two sub ranges can be constructed via filter, but filter is a forward range and the algorithm needs a random access range. Other than call std.array.array or malloc, I don't see a way to construct the sub arrays.

Here is what I have so far.

real median(R)(R r)
    if (!isInstanceOf!(SortedRange, R) &&
        isRandomAccessRange!R &&
        hasLength!R)
{
        import std.random : uniform;
        import std.algorithm.iteration : filter;
        import std.array : array;

        size_t i = r.length / 2;
        size_t k = uniform(0, r.length); // simple random for demonstration 
        auto pivot = r[k];
        auto less_than_range = r.filter!(a => a < pivot);
        auto greater_than_range = r.filter!(a => a > pivot);

        if (i == k)
        {
            return pivot;
        }

        if (i < k)
        {
            return less_than_range
               .array
               .median;
        }

        if (i > k)
        {
            return greater_than_range
              .array
              .median;
        }

        assert(0);
}

@PetarKirov
Copy link
Member

@JackStouffer luckily, since you are using SortedRange, you can use lowerBound and upprerBound, instead of filter. See: http://dlang.org/phobos/std_range#.SortedRange

@9il
Copy link
Member

9il commented Jan 5, 2016

@JackStouffer luckily, since you are using SortedRange, you can use lowerBound and upprerBound, instead of filter. See: http://dlang.org/phobos/std_range#.SortedRange

The last variant is for an unsorted range.

@PetarKirov
Copy link
Member

Sorry, I've missed that. If you need a RandomAccessRange partition from an unsorted range, then yes, you'll need to use dynamic allocation, unless you can make a median function that doesn't need random access.

@JackStouffer
Copy link
Contributor Author

I'm going to leave the median of medians algorithm out for now, as it's frowned upon to add more allocating code into Phobos when it can just wait for std.allocator to be moved out of experimental and be done properly. If this gets merged, I'll add a bugzilla report noting that it should be added in later.

@JackStouffer JackStouffer force-pushed the median branch 2 times, most recently from 88176bc to 7b741f2 Compare January 5, 2016 22:32
@JackStouffer
Copy link
Contributor Author

Median/Quantile can be defined for user types that have not arithmetic ops

Thinking about this more, does it really make sense? What kind user defined types that don't have these operators are going to be used in a range which someone wants to take the median of?

IMO I think it's reasonable to restrict this function to user types that are like numbers, and then we can assume that if someone is making a type to represent a numerical value, then they are going to have addition and division defined for them.

@andralex
Copy link
Member

Honest this looks a bit overkill. I don't see much use for these functions.

@JackStouffer
Copy link
Contributor Author

@andralex I disagree, I believe that the standard library should have, at least, the following basic stats functions:

  • mean
  • median
  • mode
  • quantile
  • range
  • correlation
  • covariance
  • variance
  • standard deviation

I wrote the functions this way because I don't believe that a median function should be mutating because the most common way to find the median is to sort the array and then find the middle result, so instead of making that implicit in the function, I made it explicit.

@andralex
Copy link
Member

All told I see 213 lines with a poor work/scaffolding ratio. The algorithm implemented is essentially "For empty ranges median doesn't make sense. For others, go to the kth position in the range, and if it has odd elements pick that guy, otherwise pick the average of the two adjacent guys."

I think it's unusual to compute the median for a sorted non-random range in O(n).

Computing the median of unsorted ranges is real work. There's algorithms, analyses, papers written about it etc. etc. Taking the median of an already sorted range is trivial - just go there and pick it up. I agree "picking up" does have a few details to be minded, but all in all this doesn't make the cut for inclusion in the standard library.

@JackStouffer JackStouffer deleted the median branch January 14, 2016 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants