-
Notifications
You must be signed in to change notification settings - Fork 6
feat(kernel): Handle syscall failures #520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4508691 to
1ce1511
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On first read this looks very good.
I've clicked the Github checkbox that says "Request changes" but I'm not actually requesting changes per se, but just that you take another look with the following concern in mind (which I fear requires some long winded explanation):
A key difference between our implementation and SwingSet is the way we handle the vatstore. Rather than updating the vatstore via syscalls, all vatstore changes are delivered as a single unit as part of the crank completion record that gets sent back to the kernel after the vat supervisor has processed a delivery.
The main job that SwingSet's transaction rollback machinery did was to prevent vatstore changes (caused by vatstore syscalls) from becoming permanent if there was a crank failure, but our crank completion mechanism already does that simply by not sending the changes if there's a failure. Nevertheless, we still have vat-specific kernel state changes that happen on the kernel side as consequences of both syscall handling and the the act of delivery itself, e.g., clist changes, kernel promise resolutions, run queue updates, etc. The rollback stuff implemented in this PR should take care of those.
However, this means we now have two different kinds of "abandon persistent state in the event of error" mechanism that operate side by side. It's worth double checking to make sure that they aren't doing anything to interfere with each other's correct operation (in particular, due to the order of operations at the end of a crank). I tried to follow the logic of this through myself and I think it's all OK, but the changes here are sufficiently extensive that I'm not 100% confident of my analysis. Given that you presumably have a more complete and accurate mental model of what you've done here than I'm able to readily swap into my brain, I'd appreciate it if you'd take a look back at things with the above described concern in mind.
If your analysis turns up any problems we can deal with them then. Otherwise, if after checking this over you conclude that everything is copacetic (the likely outcome imho), let me know and I'll approve (if somebody else doesn't beat me to it).
1ce1511 to
f5ad6db
Compare
@FUDCo This was a vert spot on comment. With this change we now prevent the vat kvstore from being updated on syscall failures. Let me know if you think this is sufficient. |
rekmarks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a few minor things, including some items that could / should be handled in a follow-up.
| async waitForSyscallsToComplete(): Promise<void> { | ||
| while (this.pendingSyscalls > 0) { | ||
| await delay(10); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up issue: Thinking about the possibilities of interleaving here makes me a little bit queasy. As a follow-up, we should create a device that stores a promise which resolves whenever the pending syscall count reaches 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the possibilities of interleaving? What about if there aren't any syscalls? Whatever I can think of is still relying on that pendingSyscalls value but feel free to suggest another solution.
rekmarks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Closes: #333
This change updates our syscall-handling plumbing so that any kernel-side failure during a liveslots syscall is no longer silently ignored. Instead, we: