[WIP] ARROW-4418 [Plasma] replace event loop with boost::asio for plasma store #5736
Let me try to address all these conflicts first.
Force-pushed from 8f2c6c9 to 65560d2.
revert some changes replace event loop with asio update plasma store protocol fix qualifiers update plasma store client protocol Remove all native socket operations. Implement general io support Fix bugs fix all compiling bugs fix bug Fix all tests. Add license header. try to fix cmake try to make asio standalone simplify code add license update url lint lint & fix fix restore entrypoint remove unused unix headers fix Update LICENSE fix rename handle signal move the function to its original place fix doc hide classes stop installing asio headers fix doc reverse changes minor fix tiny fix fix comments minor fix resolve conflicts fix optimize cmake fix update formatter fix according to github comments lint prevent the store from dying fix ARROW_CHECK

Fix ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors).

Author: Micah Kornfield <emkornfield@gmail.com>
Author: Antoine Pitrou <antoine@python.org>

Closes apache#4484 from emkornfield/status_code_proposal and squashes the following commits:

4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module
a66f999 <Micah Kornfield> make format
040216d <Micah Kornfield> fixes for comments outside python
729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully)
ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status())
21e1b95 <Micah Kornfield> fix compilation
9c905b0 <Micah Kornfield> fix lint
74d563c <Micah Kornfield> fixes
85786ef <Micah Kornfield> change messages
3626a90 <Micah Kornfield> try removing message
a4e6a1f <Micah Kornfield> add logging for debug
4586fd1 <Micah Kornfield> fix typo
8f011b3 <Micah Kornfield> fix status propagation
317ea9c <Micah Kornfield> fix complie
9f59160 <Micah Kornfield> don't make_shared inline
484b3a2 <Micah Kornfield> style fix
14e3467 <Micah Kornfield> dont rely on rtti
cd22df6 <Micah Kornfield> format
dec4585 <Micah Kornfield> not-quite pluggable error codes

fix merge fix update update update update fix update fix update update revert some unknown comments rebase CMakeLists rebase eviction_policy.h rebase CMakeLists rebase
I just went through a terrible rebase, so something may have gotten messed up. I'll try to fix it.
Hi @suquark! I've been looking at using Arrow on Windows, but I noticed in your PR there are some Windows incompatibilities. For
Hi @mehrdadn, thanks for your comment! Yep, one purpose of this PR is to prepare for multi-platform support.
I'm going to close this for now; you can reopen it once it is ready for review.
@suquark Hey Ryans, I was wondering what the state of this PR is. Do you recall which commit was the "best" one, and whether it was ready to be merged? Were there any blocking issues (aside from rebasing and code review)? I'm considering taking a look and seeing if I can rebase it (and hopefully add Windows support to it), but I'm not sure which commit is the best one to look at or whether there are any pending issues I should keep in mind, so if you could let me know, that'd be great. Thanks!
@mehrdadn The major issue is that the PR touches too many files, and it keeps conflicting with the master branch (the Plasma API has changed heavily over time). After I left it untouched for a few months, I could no longer rebase it on master, because too many conflicts occurred across too many commits. So I later tried to squash and skip some commits while rebasing on master, but unfortunately something seems to have broken during the rebase. It will take time to figure out which part is wrong, because things are a bit messy now. I think I should still be responsible for cleaning up the mess; otherwise it would be too hard for other developers to understand. Let me see if I can divide it into smaller PRs. The first part is https://issues.apache.org/jira/browse/ARROW-8030
Ah, I see. That sounds like a lot of work. If it's only intended for Windows compatibility, do you feel it's worth the cost? I'm unclear on what's involved, but I (naively, I think) feel like merely communicating an IP address and switching to

To give you some idea of possible issues, I'll think aloud here; I don't really know what the right approach is. One complication (regardless of whether we use Asio) is a fundamental difference between UNIX sockets and TCP sockets: pre-allocating TCP/IP sockets is not a great idea. If we allocate a port and tell Plasma to use it, we have to free the port first (so Plasma can bind it), which creates an inherent race condition because someone else might grab it in the meantime. I guess this could be an issue with UNIX-domain sockets too, except that their names can be chosen to avoid accidental collisions. This means that, to use TCP sockets, we'd ideally let Plasma allocate its own port and then somehow tell us what that port number is.

That isn't entirely accurate either, because on Windows it's possible to duplicate sockets across processes, so presumably we could allocate the socket ourselves and pass it to Plasma. But doing so would not be straightforward, because it requires the owner of the socket to know the PID of the target process (

Another solution is to use Windows named pipes, which are very similar to UNIX-domain sockets as far as naming goes. The catch is that they don't interoperate with socket APIs: Boost.Asio has its own separate abstraction over them, and I have yet to find a common abstraction between those classes and the socket classes (though maybe there is one and I just haven't found it). So this would probably be nontrivial too. I can also think of more complicated solutions, like using named pipes for initial discovery and TCP for subsequent communication. That's another approach, and it may be easier?
Anyway, these are just the conceptual issues I know of; there are probably other considerations I'm not familiar with. I do see that one such issue might be the fact that duplicating handles (e.g.

Overall, I do think some of this would likely benefit from restructuring the code around Boost.Asio, but I'm much less clear on how much. If the entire goal of the PR was Windows compatibility for Plasma, I'm not sure whether it would reduce the work for that to (say) 20% of what it would otherwise be, or just (say) 80%; for all I know, it could be either. So if that's the case, since you're much more familiar with Arrow, my suggestion would be this: before putting a massive amount of effort into this, it might be worth considering how difficult it would be to make TCP sockets work (on Linux), since if they do, they can be ported rather directly to Windows. If you feel that's significantly easier (at least conceptually), maybe we should consider doing that instead; I can try to give it a shot and let you know if I run into trouble. But if you feel it might be difficult, then maybe getting this PR to work is better.
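The idea of letting Plasma allocate its own port and then report it back maps onto the standard POSIX idiom of binding to port 0, where the kernel picks a free port inside `bind()` itself, so there is no gap in which another process can grab it. The sketch below is a minimal illustration, assuming a Linux/POSIX build and loopback-only access; `Listener` and `ListenOnEphemeralPort` are hypothetical names, not Plasma APIs, and how the port number would be reported to the launcher is left open.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical result type: a listening socket plus the port it ended up on.
struct Listener {
  int fd;
  uint16_t port;
};

// Hypothetical helper: listen on 127.0.0.1 using a kernel-chosen port.
Listener ListenOnEphemeralPort() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) throw std::runtime_error("socket() failed");

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(0);  // port 0: let the kernel pick any free port

  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, SOMAXCONN) != 0) {
    close(fd);
    throw std::runtime_error("bind()/listen() failed");
  }

  // Ask the kernel which port it actually assigned to this socket.
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    close(fd);
    throw std::runtime_error("getsockname() failed");
  }
  return Listener{fd, ntohs(addr.sin_port)};
}
```

A store process built this way could print `port` for its launcher to read before any client is started. Winsock exposes the same `bind`/`getsockname` calls (with different headers and `closesocket` instead of `close`), which is part of why the TCP route looks comparatively easy to port.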
@mehrdadn Thanks for your reply! Yep, your concerns totally make sense. This PR is not a direct path to Windows support, but it does try to address some issues related to Windows support:
By using Asio we can get rid of these issues pretty easily. This PR was meant to move fast so that we can implement Windows support after it is merged. Do these two issues seem significant to you? If they do, then it would still be a good idea to merge this PR first. However, as you mentioned, this PR is not a solution for:
Regarding UNIX-only headers, porting them can be anything from trivial to impossible depending on which APIs we need. I actually already have a lot of shims for some headers in Ray (see here), but they're fundamentally incomplete. For example, I have

So the question comes down to which APIs we need from those headers. For basic file descriptor I/O like For (b), it can be worked around (I just did this for Redis by using For (2), it's a nuisance to deal with

For TCP, the port race condition isn't Windows-specific; it would occur on any OS. But I think you're right that it might not be an issue in practice, because you can just ask the server to start on a random port and, if the port can't be allocated, keep re-launching it with other random ports until one works (which is what Ray seems to do). It's not as elegant or one-shot as what I was imagining, but in practice it's probably good enough, at least as long as there's a way to verify that the server has succeeded in allocating the port before the client is launched.

As for named pipes, they are most definitely stable; they're a core part of Windows and are used in many places outside it as well (e.g. see Chromium), and they're superior to sockets at least performance-wise, if not in other ways. It's just that they're non-portable and only support local communication, so people don't bother supporting them if they don't feel they need to (maybe their bottleneck is elsewhere), or if they might need remote communication over the internet.

Oh, there's also one more issue, which is that you'd want to check
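The "pick a random port and retry on collision" approach can be illustrated with a short sketch. It is a simplification under stated assumptions: the retry happens in-process on a fresh socket rather than by re-launching the server as Ray reportedly does, candidates are drawn from the IANA dynamic range 49152 to 65535, and `BoundSocket` and `ListenOnRandomPort` are hypothetical names rather than Plasma or Ray APIs. A successful return plays the role of "the server has verified it owns the port", after which a client could be launched.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <random>
#include <stdexcept>

// Hypothetical result type: a listening socket and the port it holds.
struct BoundSocket {
  int fd;
  uint16_t port;
};

// Hypothetical helper: try random ports until one binds, or give up.
BoundSocket ListenOnRandomPort(std::mt19937& rng, int max_attempts) {
  std::uniform_int_distribution<int> dist(49152, 65535);  // IANA dynamic range
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const uint16_t port = static_cast<uint16_t>(dist(rng));
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) throw std::runtime_error("socket() failed");

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        listen(fd, SOMAXCONN) == 0) {
      return BoundSocket{fd, port};  // success: the caller now owns fd
    }
    const int err = errno;  // save errno before close() can overwrite it
    close(fd);
    if (err != EADDRINUSE) throw std::runtime_error("bind()/listen() failed");
    // The port was already taken (possibly by a racing process): try another.
  }
  throw std::runtime_error("no free port found");
}
```

The bounded attempt count keeps a misconfigured host from looping forever; the port-0 approach avoids retries entirely, but this variant matches the "choose a random port, verify, relaunch" flow described above.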
Hi Ryans, I cherry-picked your PR and did some work based on it. I appreciate it a lot :) I know you've had some trouble resolving conflicts, and I'm happy to help you with that if you don't mind. Feel free to let me know if you need help.
I created another PR that makes rebasing easier: #6579
@jikunshang Thanks for your comments! Currently I am thinking about how to divide this huge PR into smaller ones, because even after we work out how to resolve all the conflicts, the review process would be slow. Any ideas you have would be helpful. For at least the next several PRs, I am going to do some code cleanup first.
@suquark Thanks for your reply. I don't have any ideas yet about how to break it down, since importing boost::asio would add 1000+ lines of code...
I am currently waiting for #6587 to be merged.
No rush on this; I think I have workarounds for Arrow as far as Windows goes!
Reopening this PR. See #3704 for the older one.