-
Notifications
You must be signed in to change notification settings - Fork 20
Add std.random.discrete: Discrete sampler #234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
eae1274
f1ec416
ed7fb70
75d7e4e
8f62f6b
04ed13b
2c9d511
49d6506
c2f8930
7562bb4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| /** | ||
| Discrete distributions. | ||
|
|
||
| License: $(LINK2 http://boost.org/LICENSE_1_0.txt, Boost License 1.0). | ||
|
|
||
| Authors: Ilya Yaroshenko, Sebastian Wilzbach | ||
| */ | ||
|
|
||
| module mir.random.discrete; | ||
|
|
||
| import std.traits : isNumeric; | ||
|
|
||
| /** | ||
| Setup a _discrete distribution sampler. | ||
| Given an array of cumulative density points `cdPoints`, | ||
| a _discrete distribution is defined. | ||
| The cumulative density points can be calculated from probabilities or counts | ||
| using `std.algorithm.iteration.cumulativeFold`. Hence `cdPoints` is assumed to be sorted. | ||
|
|
||
| Params: | ||
| cdPoints = cumulative density points | ||
|
|
||
| Returns: | ||
| A $(LREF Discrete) sampler that can be called to sample from the distribution. | ||
| */ | ||
| Discrete!T discrete(T)(const(T)[] cdPoints) | ||
| { | ||
| return Discrete!T(cdPoints); | ||
| } | ||
|
|
||
| /// | ||
| unittest | ||
| { | ||
| // 10%, 20%, 20%, 40%, 10% | ||
| auto cdPoints = [0.1, 0.3, 0.5, 0.9, 1]; | ||
| auto ds = discrete(cdPoints); | ||
|
|
||
| // sample from the discrete distribution | ||
| auto obs = new uint[cdPoints.length]; | ||
| foreach (i; 0..1000) | ||
| obs[ds()]++; | ||
| } | ||
|
|
||
| /** | ||
| _Discrete distribution sampler that draws random values from a _discrete | ||
| distribution given an array of the respective cumulative density points. | ||
| `cdPoints` is an array of the cummulative sum over the probabilities | ||
| or counts of a _discrete distribution without a starting zero. | ||
|
|
||
| Complexity: O(log n) where n is the number of `cdPoints`. | ||
| */ | ||
| struct Discrete(T) | ||
| if (isNumeric!T) | ||
| { | ||
| import std.range : SortedRange; | ||
|
|
||
| private const(T)[] _cdPoints; | ||
| private SortedRange!(const(T)[], "a <= b") r; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand the need for the 2 entries here.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @wilzbach fix this please. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. BTW, two style notes:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing else, except keywords and version identifiers is reserved by the langauge (furthermore we have a module system, to prevent name conflicts with symbols from druntime). I'm not a fan of Otherwise +1 for meaningful names.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
https://github.com/dlang/phobos/blob/master/std/range/package.d#L8023
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Isn't that a common style to denote that's a private variables. In this case the leading underscore is used to distinguish between the
Fair point! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I believe it's still considered preferable to use I recognize that leading-underscore names are widely used in phobos, but it's still my understanding that this is technically a style violation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I see the current style guide has:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
OK, then use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This feels like using an inferior implementation for the sake of a public method that isn't strictly necessary for what If the |
||
|
|
||
| /** | ||
| The cumulative density points `cdPoints` are assumed to be sorted and given | ||
| without a starting zero. They can be calculated with | ||
| `std.algorithm.iteration.cumulativeFold` from probabilities or counts. | ||
|
|
||
| Params: | ||
| cdPoints = cumulative density points | ||
| */ | ||
| this(const(T)[] cdPoints) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason not to give the constructor the relative proportions for the different discrete element selection probabilities (as is done with I assume one motivation might be to not have the constructor do any memory allocation, but you can get round that by having a second constructor that accepts a
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. meaty code here is one line. This looks like bad style to have big API for simple things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow what you mean, can you clarify?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean that having huge API for such simple thing like this distribution is too much efforts. API review needs time from a user / reviewer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but absent any other implementation I would suggest that the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are going to implement dramatically better FP generators. Description is too large and will be published in article. But all this will be after Tinflex.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Cool, looking forward to seeing that work. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But it's nice to give people something they can use as an easy replacement.
If it's one/two UFCS calls, why not provide it here via an alternative constructor? (Actually I doubt it's necessarily quite so straightforward when it comes to details of doing sums of integral or floating-point numbers correctly, but that's a longer discussion...)
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree that such an interface would be nice - even in Tinflex we have relative |
||
| { | ||
| _cdPoints = cdPoints; | ||
| r = typeof(r)(_cdPoints); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This just wraps Note that the |
||
| } | ||
|
|
||
| /// cumulative density points | ||
| const(T)[] cdPoints() const @property | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any need to define this method at all?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just more control for user There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm, if the user needs to care, presumably they know what they provided as input ... ? I would suggest better to not provide this method unless someone has an explicit use-case. I also don't think it's a good idea to constrain the choice of cumulative density points to
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A user may accept this struct as function argument. Then he want to print intenal stuff for debug without changing the API. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would accept it wrapped in a
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok |
||
| { | ||
| return _cdPoints; | ||
| } | ||
|
|
||
| /// samples a value from the discrete distribution | ||
| size_t opCall() const | ||
| { | ||
| import std.random : rndGen; | ||
| return opCall(rndGen); | ||
| } | ||
|
|
||
| /// samples a value from the discrete distribution using a custom random generator | ||
| size_t opCall(RNG)(ref RNG gen) const | ||
| { | ||
| import std.random : uniform; | ||
| T v = uniform!("[)", T, T)(0, _cdPoints[$-1], gen); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
... if it would be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm, to me this feels like working around the lack of guarantees from upstream. Wouldn't it be safer to hold off declaring |
||
| return (cast(SortedRange!(const(T)[], "a <= b")) r).lowerBound(v).length; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can't you just use
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... actually, isn't
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
because the internal variable There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would repeat my suggestion that you look at
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but that would mean you can't use
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes I will try to fix Phobos, but that won't be released before 2.073 :/
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You have not time to fix Phobos templates, please consider on related issues There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure that it's appropriate to prioritize putting This is a case where the fix needs to be upstream, and where the guarantee wanted here can easily be added after that fix is complete. |
||
| } | ||
| } | ||
|
|
||
| unittest | ||
| { | ||
| import std.random : Random; | ||
| auto rndGen = Random(42); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would advise not shadowing the name of the standard random generator instance defined in |
||
|
|
||
| // 10%, 20%, 20%, 40%, 10% | ||
| auto cdPoints = [0.1, 0.3, 0.5, 0.9, 1]; | ||
| auto ds = discrete(cdPoints); | ||
|
|
||
| auto obs = new uint[cdPoints.length]; | ||
| foreach (i; 0..1000) | ||
| obs[ds(rndGen)]++; | ||
| } | ||
|
|
||
| unittest | ||
| { | ||
| import std.random : Random; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Random alias may change
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, this dose not matter for this tests, nevermind |
||
| auto rndGen = Random(42); | ||
|
|
||
| // 1, 2, 1 | ||
| auto cdPoints = [1, 3, 4]; | ||
| auto ds = discrete(cdPoints); | ||
| assert(ds.cdPoints == cdPoints); | ||
|
|
||
| auto obs = new uint[cdPoints.length]; | ||
| foreach (i; 0..1000) | ||
| obs[ds(rndGen)]++; | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you take at all a look at
hap.random'sDiscreteDistributionwhen creating this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why not? There is no licensing issue, and I'm happy for that code to be re-used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did not know about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if you want to take a look: https://github.com/WebDrake/hap/blob/e25fc15a17e4e439333740d2a9894fc7d3c94012/source/hap/random/distribution.d#L232
Note there are things that won't be useful here (it's a class, it's a forward range...) but it should be straightforward to take most of the good bits and use them with the functor-based approach here.