Skip to content

Conversation

@ahartikainen
Copy link
Contributor

This includes minimal changes to support Windows.

  • don't need uvloop
  • Let windows fail with TemporaryDirectory (removing files)
  • Fix stdout /dev/null

I will fix appveyor on another PR.

It will remove -DSTAN_THREADS for now and user will need to enable it.
I don't think this has any effect on Python threading. Let's not merge before I have tested this locally.

Currently, example compilation and sampling are successful.

curl on Windows doesn't like ' so I need to escape inner ".

@riddell-stan
Copy link
Contributor

This looks great.

I'm concerned that things are going to segfault if we disable DSTAN_THREADS. But we can work around this if we disable sampling in parallel on windows. (If we do this, we should definitely add a warning message for windows users.)

@ahartikainen
Copy link
Contributor Author

Currently testing is probably failing, because I returned some items outside the tempfolder structure.

Let's see if stuff segfaults.

It did segfault with -DSTAN_THREADS but that was due to winpthread stuff. This implemention works with current pystan Windows instructions.

@ahartikainen
Copy link
Contributor Author

I also think we could add a script that tries to delete old httpstan tempfiles (httpstan_temp_ prefix) at start.

I think this would be suitable fix for Windows (It can not delete imported files).

@riddell-stan
Copy link
Contributor

I'm virtually certain -DSTAN_THREADS is required for sampling in parallel. If I'm wrong then we could turn it off for everyone. This would speed up sampling.

@ahartikainen
Copy link
Contributor Author

Nope, it is needed. (testing with pytest + test_bernoulli)

Without it parallel test segfaults. With it, it segfaults at exit. (mingw-w64 + libwinptreads)

@riddell-stan
Copy link
Contributor

riddell-stan commented Feb 3, 2019 via email

@ahartikainen
Copy link
Contributor Author

I think so too, don't they use RTools 4 or something? I just wonder when they say threading works on Windows, does it really work.

So I think if we disable parallel sampling for now and let's read environmental variable (HTTPSTAN_THREADING) which enables parallel (throw a warning), because clang + libc++ could have a working implementation of thread-local. (And of course clang-cl + MSVC works).

@ahartikainen
Copy link
Contributor Author

Hmm, there might be hope with the latest mingw-w64. Let's see how can we install it.

@riddell-stan
Copy link
Contributor

riddell-stan commented Feb 3, 2019 via email

@riddell-stan
Copy link
Contributor

riddell-stan commented Feb 5, 2019 via email

@riddell-stan
Copy link
Contributor

There's a contextlib error. Do you want help with this?

@riddell-stan riddell-stan mentioned this pull request Mar 1, 2019
@riddell-stan
Copy link
Contributor

We also might want to see if stan-dev/math#1135 (when merged) helps with the Windows crashes.

@wds15
Copy link

wds15 commented Mar 2, 2019

Maybe you guys try out the branch on stan-math? It would be nice to know if this cleans up you problems. It is a bit unpredictable how long this will take to merge, really.

@wds15
Copy link

wds15 commented Mar 3, 2019

Browsing through this thread it sounds as if you indeed spawn sub-threads yourself and then fire off new chains. So if you are going to use the PR for the faster AD TLS, then you need to slightly adjust your code (given things stay as they are) just like here:

https://github.com/stan-dev/math/blob/fadafd809b439fa00fb6915c8ea03a1a4a1c7461/test/unit/math/rev/mat/functor/gradient_test.cpp#L49

So in child threads the AD tape is not automatically initialised... rather you have to instantiate the object stan::math::ChainableStack to get a thread specific AD tape initialised. The tape will stay around until the object you instantiate goes out of scope.

Knowing that this PR fixes your issues on windows would be really nice and motivating.

... ah... and things should get ~20% faster on average with threading turned on with the new approach which is really nice.

@riddell-stan
Copy link
Contributor

riddell-stan commented Mar 3, 2019 via email

@riddell-stan
Copy link
Contributor

@wds15 is there documentation or a code comment which explains why we need to put stan::math::ChainableStack thread_instance; in each thread? I'm looking for something I can reference when updating the code.

@wds15
Copy link

wds15 commented Mar 3, 2019

There is some description of this new behavior under "side-effects" of the PR. In short, instantiation this object will ensure that the AD tape is initialized.

However, this PR is yet to be reviewed and things can change - so if you test this then you will probably have to change on your end some bits and pieces. Still, it would be valuable if you could confirm that Windows samples happily with this multiple chains (and it will even be faster).

@riddell-stan
Copy link
Contributor

@ahartikainen I've been copying pieces of this commit into separate small PRs. I hope you don't mind. (I'm eager to see if we can get this new threading fix in.)

@ahartikainen
Copy link
Contributor Author

Hi, looks great.

I have been a bit busy, so this totally fine.

@riddell-stan
Copy link
Contributor

Closing this in favor of #151 (conflicts are resolved).

@ahartikainen
Copy link
Contributor Author

Thanks for all the work. It looks great.

@ahartikainen ahartikainen deleted the feature/support_windows branch August 2, 2019 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants