[MetaSchedule][M3a] Add Sampling Primitive SampleCategorical. #8817
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,16 +18,19 @@ | |
| */ | ||
| #include "./concrete_schedule.h" | ||
|
|
||
| #include <random> | ||
|
|
||
| namespace tvm { | ||
| namespace tir { | ||
|
|
||
| Schedule Schedule::Concrete(IRModule mod, int debug_mask, | ||
| ScheduleErrorRenderLevel error_render_level) { | ||
| Schedule Schedule::Concrete(IRModule mod, support::LinearCongruentialEngine::TRandState seed, | ||
| int debug_mask, ScheduleErrorRenderLevel error_render_level) { | ||
| ObjectPtr<ConcreteScheduleNode> n = make_object<ConcreteScheduleNode>(); | ||
| n->state_ = ScheduleState(mod, debug_mask); | ||
| n->error_render_level_ = error_render_level; | ||
| n->symbol_table_ = {}; | ||
| n->analyzer_ = std::make_unique<arith::Analyzer>(); | ||
| support::LinearCongruentialEngine(&n->rand_state_).Seed(seed); | ||
| return Schedule(std::move(n)); | ||
| } | ||
|
|
||
|
|
@@ -208,6 +211,29 @@ Schedule ConcreteScheduleNode::Copy() const { | |
| } | ||
|
|
||
| /******** Schedule: Schedule: Sampling ********/ | ||
|
|
||
| void ConcreteScheduleNode::Seed(support::LinearCongruentialEngine::TRandState seed) { | ||
| if (seed == -1) { | ||
| seed = std::random_device()(); | ||
| } | ||
| support::LinearCongruentialEngine(&rand_state_).Seed(seed); | ||
| } | ||
zxybazh marked this conversation as resolved.
|
||
|
|
||
| support::LinearCongruentialEngine::TRandState ConcreteScheduleNode::ForkSeed() { | ||
| // For reproducibility, we compute the new seed using the RNG's random state and a different | ||
| // set of parameters. Note that both 32767 and 1999999973 are prime numbers. | ||
| return (support::LinearCongruentialEngine(&rand_state_)() * 32767) % 1999999973; | ||
| } | ||
|
Comment on lines +222 to +226
Contributor
It seems like ForkSeed is analogous to what is called "splitting" in the random number generator literature. I'm not quite an expert on this, but I did do a bit of research into PRNGs for the Threefry implementation we have. Everything I read says that there are no proofs of the validity of splitting LCGs (is the method you use here from a paper?). The paper "Splittable Pseudorandom Number Generators using Cryptographic Hashing" provides some good explanations. In practice, I expect we will see some issues. If this function somehow perfectly bisects the space of random numbers generated by this PRNG, then we could expect to start seeing repeats of previous random numbers after 31 splits. Given that this splitting does not perfectly bisect the space, I'd assume that we start seeing repeats much sooner. Repeating portions of the search space may mean that we are not able to visit the entire search space during tuning, or that we bias results towards a certain section of the space. I'd suggest we adopt a splittable PRNG here, as that appears to be what we need. Maybe we can find an existing implementation online, as implementing your own PRNG can have subtle issues.
Member
LCGs are pretty easy to crack in terms of security, but our search isn't a setting where there is an adversary working against you, haha. To be clear, we don't split the RNG too many times. It is only used in multi-threaded search, where we split the RNG for each thread. In practice we didn't see repetition, or any problem caused by repetition, when running tens of real-world workloads.
Member
Don't most modern machines have at least 31 hyper-threads, i.e., we will split at least 31 times on those machines?
Member
I actually agree with Tristan's theory in general. Thank you for bringing this up! Indeed, seeding parallel PRNGs requires some really careful thought to avoid quick repetition, and an LCG may not be the best candidate to ensure such a property. Fortunately, in our particular use case it is not a practical problem. Here is a quick example, supposing we have 128 threads and 10k trials: https://gist.github.com/junrushao1994/ea986add81b01b89fd99a5a7d41d087a. The result is that there is no repetition at all. This is a harsher condition than our practical usage. To further address the issue, architecturally we have designed the PRNG interface to be generic and compliant with the STL, and easily switchable to any splittable PRNG in the future if there are new interesting use cases. Therefore, I assume it won't constitute an architecture issue :-) Thanks again for the discussion!
Member
Coming late to the discussion. I read the thread yesterday evening and wanted
Please let me try to summarize and share some thoughts
As we generate random numbers, PRNG state circles through the space of states
The following points about the PRNGs
My read is that @tkonolige is right about A0 and A1, and it seems we also agree. Because of A0 and A1, it would be helpful to consider the implication of
To make things simple, let us assume that there are two streams in the 32 threads
At a high level, we have two parameters we can tweak, the number of sampling steps n,
So in summary:
Note that the real sampling scenario is much more complicated. As junru's experiments
The end effect of A2 has a quite close analogy in parallel computing: as we start to use
Yesterday I did not think of A2 in particular, which might change our perspective. So I would
Member
@tqchen I happened to implement a BSGS discrete logarithm this morning. This is a simple but effective algorithm (though not effective enough for crypto) that we use in high school competitive programming: https://gist.github.com/junrushao1994/d32f265f5b4815d4b346d6022e95f394. I use this script to find the minimal number of trials required for a first repeat to happen given
In a word, in practice the conflict with the 0-th thread won't happen after 1407035 trials in the first 999 threads which split this way.
Contributor
@tqchen @junrushao1994 You both lay out a lot of interesting points here, but I'm not sure I have the expertise to evaluate them. The PRNGs themselves might appear simple, but analysis of their randomness is complicated and non-intuitive. Looking at the paper I linked above, you can get subtle bugs if the PRNG is used incorrectly. I've tested the LCG implemented in TVM with some PRNG test suites (you can try it yourself here: https://github.com/tkonolige/prng-tests), and it fails all of them. This result is unsurprising because LCGs aren't particularly good random number generators, but it adds a little to my concern. Given that we want to avoid any potential issues, why don't we just do things the right way and use a splittable PRNG? This page (https://www.pcg-random.org/posts/some-prng-implementations.html) lists some implementations of PRNGs, including SplitMix, which is splittable. (pcg-random appears to be a reputable source; it is run by the creator of the PCG family of PRNGs.) It seems like there is basically no overhead to just dropping this SplitMix implementation into the codebase. Then we won't have to worry about any bugs due to bad randomness.
Contributor
I don't want to block this PR on this, so I'm going to approve. But I would like us to fix this in the future. |
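For readers following the thread, the seed-forking scheme under discussion can be sketched in isolation. This is a minimal standalone sketch, not the TVM code: `ToyLCG` is a hypothetical stand-in for `support::LinearCongruentialEngine`, and its multiplier and modulus (the classic `minstd_rand` constants) are an assumption. Only the `* 32767 % 1999999973` formula is taken from the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for support::LinearCongruentialEngine.
// The multiplier and modulus below are assumed (minstd_rand style), not taken from the PR.
struct ToyLCG {
  using TRandState = int64_t;
  static constexpr TRandState kMultiplier = 48271;
  static constexpr TRandState kModulus = 2147483647;  // 2^31 - 1

  // The engine does not own its state; it mutates the state it points to.
  explicit ToyLCG(TRandState* state) : state_(state) {}

  // Normalize the seed into [1, kModulus) so the generator never gets stuck at 0.
  void Seed(TRandState seed) {
    seed %= kModulus;
    if (seed < 0) seed += kModulus;
    *state_ = (seed == 0) ? 1 : seed;
  }

  // Advance the state by one LCG step and return the new state.
  TRandState operator()() {
    *state_ = (*state_ * kMultiplier) % kModulus;
    return *state_;
  }

  TRandState* state_;
};

// Same formula as ForkSeed in the diff: draw one number from the parent stream,
// then scramble it with a different multiplier and modulus to seed a child stream.
ToyLCG::TRandState ForkSeed(ToyLCG::TRandState* parent_state) {
  assert(*parent_state > 0);  // the parent must have been seeded first
  return (ToyLCG(parent_state)() * 32767) % 1999999973;
}
```

Each call to `ForkSeed` advances the parent state by one step, so successive forks from one parent yield different child seeds, and the same parent seed always yields the same sequence of child seeds. The sketch says nothing about the statistical independence of the child streams, which is the concern raised in the thread.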
||
|
|
||
| ExprRV ConcreteScheduleNode::SampleCategorical(const Array<Integer>& candidates, | ||
| const Array<FloatImm>& probs, | ||
| Optional<Integer> decision) { | ||
| TVM_TIR_SCHEDULE_BEGIN(); | ||
| return CreateRV(tir::SampleCategorical(&this->rand_state_, candidates, probs, &decision)); | ||
| TVM_TIR_SCHEDULE_END("sample-categorical", this->error_render_level_); | ||
| throw; | ||
| } | ||
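The behaviour of a categorical sampling primitive with an optional pre-made decision can be sketched outside TVM as follows. This is an illustrative sketch only: `SampleCategoricalSketch` is a hypothetical helper, it uses `std::discrete_distribution` rather than whatever `tir::SampleCategorical` does internally, and it represents "no decision yet" with a negative index instead of `Optional<Integer>`. Treating the decision as an index into `candidates` is also an assumption.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper, not the TVM implementation.
// *decision < 0 means "not decided yet": an index is sampled and written back.
// *decision >= 0 means "replay": the stored index is reused and the RNG is untouched.
template <typename Engine>
int SampleCategoricalSketch(Engine* rng, const std::vector<int>& candidates,
                            const std::vector<double>& probs, int* decision) {
  assert(!candidates.empty() && candidates.size() == probs.size());
  if (*decision < 0) {
    // Weights do not need to sum to 1; discrete_distribution normalizes them.
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    *decision = dist(*rng);
  }
  assert(*decision < static_cast<int>(candidates.size()));
  return candidates[*decision];
}
```

Writing the sampled index back through `decision` is what makes a schedule replayable: a second run that passes the recorded index returns the same candidate without consuming random numbers.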
|
|
||
| /******** Schedule: Get blocks & loops ********/ | ||
|
|
||
| BlockRV ConcreteScheduleNode::GetBlock(const String& name, const String& func_name) { | ||
|
|
||