[WIP] Feature/support windows #142
|
This looks great. I'm concerned that things are going to segfault if we disable -DSTAN_THREADS. But we can work around this by disabling parallel sampling on Windows. (If we do this, we should definitely add a warning message for Windows users.) |
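A minimal sketch of the workaround suggested above. It assumes a hypothetical helper, effective_num_workers (not part of httpstan), that the caller consults before launching chains. On Windows (sys.platform == "win32") it caps parallelism at one chain and warns the user. Everywhere else it returns the requested value unchanged.

```python
import sys
import warnings


def effective_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Return how many chains may run in parallel on this platform.

    Hypothetical helper: on Windows, parallel sampling is disabled and a
    warning is emitted; other platforms keep the requested value.
    """
    if requested < 1:
        raise ValueError("requested must be >= 1")
    if platform == "win32" and requested > 1:
        # Warn so Windows users know why their chains run one after another.
        warnings.warn(
            "Parallel sampling is disabled on Windows; chains will run sequentially.",
            RuntimeWarning,
        )
        return 1
    return requested
```

For example, effective_num_workers(4, platform="linux") keeps four workers, while the same call with platform="win32" falls back to one and emits a RuntimeWarning.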
|
Currently the tests are probably failing because I returned some items outside the temp folder structure. Let's see if anything segfaults. It did segfault with |
|
I also think we could add a script that tries to delete old … I think this would be a suitable fix for Windows (it cannot delete imported files). |
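The comment does not say which files such a script would target. The sketch below is a hypothetical cleanup helper, remove_stale_files, where the glob pattern is supplied by the caller. Windows refuses to delete a file that a running process has loaded, such as an imported extension module, and Python reports that as PermissionError. The helper therefore skips such files and returns them so a later run can retry.

```python
from pathlib import Path
from typing import List


def remove_stale_files(directory: Path, pattern: str) -> List[Path]:
    """Try to delete files matching pattern; return those that could not be removed.

    Hypothetical helper: files that are still in use (for example an imported
    module on Windows) raise PermissionError and are left for a later retry.
    """
    leftovers: List[Path] = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        try:
            path.unlink()
        except PermissionError:
            # Still loaded by some process; skip it instead of failing the cleanup.
            leftovers.append(path)
    return leftovers
```

Running it at startup, before any new modules are imported, gives it the best chance of removing leftovers from earlier sessions.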
|
I'm virtually certain |
|
Nope, it is needed (tested with pytest + test_bernoulli). Without it, the parallel test segfaults. With it, it segfaults at exit (mingw-w64 + libwinpthread). |
|
RStan must be dealing with the segfaults (at exit). I wonder what they
are recommending.
|
|
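I think so too; don't they use RTools 4 or something? I just wonder, when they say threading works on Windows, whether it really works. So I think we should disable parallel sampling for now and read an environment variable (HTTPSTAN_THREADING) that enables parallel sampling (with a warning), because clang + libc++ could have a working implementation of thread-local. (And of course clang-cl + MSVC works.) |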
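A sketch of the opt-in described above. The variable name HTTPSTAN_THREADING is only the one proposed in this thread, not an existing setting. The set of accepted "truthy" spellings is an assumption, and parallel_sampling_allowed is a hypothetical helper. Outside Windows, parallel sampling is always allowed. On Windows, it is allowed only when the variable opts in, and a warning is emitted in that case.

```python
import os
import sys
import warnings


def parallel_sampling_allowed(environ=None, platform: str = sys.platform) -> bool:
    """Decide whether chains may be sampled in parallel (hypothetical helper)."""
    env = os.environ if environ is None else environ
    if platform != "win32":
        return True
    # Accepted spellings are an assumption, not a documented contract.
    opted_in = env.get("HTTPSTAN_THREADING", "").strip().lower() in {"1", "true", "yes"}
    if opted_in:
        warnings.warn(
            "HTTPSTAN_THREADING is set: enabling parallel sampling on Windows; "
            "this may crash if the toolchain lacks working thread-local support.",
            RuntimeWarning,
        )
    return opted_in
```

Passing environ explicitly keeps the function easy to test without touching the real process environment.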
|
Hmm, there might be hope with the latest mingw-w64. Let's see how we can install it. |
|
We could also distribute two different wheels for Windows (on GitHub,
not PyPI). One wheel would have STAN_THREADS enabled and
parallel-sampling. The other would add some sort of lock to prevent
parallel sampling.
On 2/3/19 5:03 PM, Ari Hartikainen wrote:
I think so too, don't they use RTools 4 or something? I just wonder when
they say threading works on Windows, does it really work.
So I think if we disable parallel sampling for now and let's read
environmental variable (HTTPSTAN_THREADING) which enables parallel
(throw a warning), because clang + libc++ could have a working
implementation of thread-local. (And of course clang-cl + MSVC works).
|
|
I think I see the commit/fix you're talking about on the mingw-w64
webpage. My fingers are crossed.
|
|
There's a contextlib error. Do you want help with this? |
|
We also might want to see if stan-dev/math#1135 (when merged) helps with the Windows crashes. |
|
Maybe you could try out the branch on stan-math? It would be nice to know if this cleans up your problems. It is a bit unpredictable how long this will take to merge, really. |
|
Browsing through this thread, it sounds as if you indeed spawn sub-threads yourself and then fire off new chains. So if you are going to use the PR for the faster AD TLS, then you need to slightly adjust your code (given things stay as they are), just like in https://github.com/stan-dev/math/blob/fadafd809b439fa00fb6915c8ea03a1a4a1c7461/test/unit/math/rev/mat/functor/gradient_test.cpp#L49. In child threads the AD tape is not automatically initialised; rather, you have to instantiate the object stan::math::ChainableStack to get a thread-specific AD tape initialised. The tape will stay around until the object you instantiate goes out of scope. Knowing that this PR fixes your issues on Windows would be really nice and motivating. ... ah... and things should get ~20% faster on average with threading turned on with the new approach, which is really nice. |
|
Thanks! This is all very good to know.
We probably should split things up into smaller PRs. We'll need one just
to use the most recent version of Stan(-math), with the new code in it.
I also think we want a separate PR for the appveyor code. It's fine if
it only gets through compilation and installation.
On 3/3/19 4:22 AM, wds15 wrote:
Browsing through this thread it sounds as if you indeed spawn
sub-threads yourself and then fire off new chains. So if you are going
to use the PR for the faster AD TLS, then you need to slightly adjust
your code (given things stay as they are) just like here:
https://github.com/stan-dev/math/blob/fadafd809b439fa00fb6915c8ea03a1a4a1c7461/test/unit/math/rev/mat/functor/gradient_test.cpp#L49
So in child threads the AD tape is not automatically initialised...
rather you have to instantiate the object stan::math::ChainableStack
to get a thread specific AD tape initialised. The tape will stay around
until the object you instantiate goes out of scope.
Knowing that this PR fixes your issues on windows would be really nice
and motivating.
... ah... and things should get ~20% faster on average with threading
turned on with the new approach which is really nice.
|
|
@wds15 is there documentation or a code comment which explains why we need to put |
|
There is some description of this new behavior under "side-effects" of the PR. In short, instantiating this object will ensure that the AD tape is initialized. However, this PR is yet to be reviewed and things can change, so if you test this you will probably have to change some bits and pieces on your end. Still, it would be valuable if you could confirm that Windows samples happily with multiple chains (and it will even be faster). |
|
@ahartikainen I've been copying pieces of this commit into separate small PRs. I hope you don't mind. (I'm eager to see if we can get this new threading fix in.) |
|
Hi, looks great. I have been a bit busy, so this is totally fine. |
|
Closing this in favor of #151 (conflicts are resolved). |
|
Thanks for all the work. It looks great. |
This includes minimal changes to support Windows.
/dev/null
I will fix appveyor in another PR.
It will remove -DSTAN_THREADS for now, and users will need to enable it. I don't think this has any effect on Python threading. Let's not merge before I have tested this locally.
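The PR does not say how users would turn STAN_THREADS back on. The sketch below assumes a hypothetical boolean flag, enable_threads, and a hypothetical helper, stan_define_macros, that builds the define_macros list handed to a setuptools.Extension. In that list, a (name, None) entry is how setuptools expresses a bare -DNAME definition.

```python
from typing import List, Optional, Tuple

Macro = Tuple[str, Optional[str]]


def stan_define_macros(enable_threads: bool, base: Optional[List[Macro]] = None) -> List[Macro]:
    """Build define_macros for a setuptools.Extension (hypothetical helper).

    STAN_THREADS is no longer defined by default; it is added only when the
    caller opts in. The input list is copied, never mutated.
    """
    macros = list(base or [])
    if enable_threads:
        # (name, None) makes setuptools emit "-DSTAN_THREADS" with no value.
        macros.append(("STAN_THREADS", None))
    return macros
```

Keeping the decision in one function makes it straightforward to wire the flag to whatever opt-in mechanism is eventually chosen.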
Currently, example compilation and sampling are successful.
curl on Windows doesn't accept ' for quoting, so I need to escape the inner " characters.
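This sketch shows the escaping described above. It assumes the command is typed into cmd.exe, which does not treat single quotes as quoting characters. The JSON body is wrapped in double quotes and every inner double quote becomes \", which the Microsoft C runtime's argument parser turns back into a literal quote. The helper cmd_quote_json is hypothetical. It deliberately rejects backslashes and cmd metacharacters (% < > & | ^). Those are unsafe because cmd itself does not understand \" and may treat text between escaped quotes as unquoted. Stan code such as real<lower=0> would therefore be refused rather than mangled.

```python
import json

# Characters this sketch refuses to handle: backslashes (also produced when a
# string value contains a quote) and cmd.exe metacharacters.
_UNSUPPORTED = set('\\%<>&|^')


def cmd_quote_json(payload: dict) -> str:
    """Return payload as a double-quoted, cmd.exe-friendly curl -d argument."""
    body = json.dumps(payload, separators=(",", ":"))
    if _UNSUPPORTED & set(body):
        raise ValueError("payload contains characters this sketch does not escape")
    # Escape each inner double quote so the C runtime keeps it as a literal.
    return '"' + body.replace('"', '\\"') + '"'
```

The returned string would be pasted after curl -d on the cmd.exe command line. Payloads containing angle brackets need a different approach, such as a body file.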