Skip to content

Conversation

@FUDCo
Copy link
Contributor

@FUDCo FUDCo commented Jan 10, 2025

This PR implements the first phase of integration between the @agoric/swingset-liveslots package and our kernel.

This enables full vat-to-vat messaging, including serialization and deserialization of all passable value types, intervat object reference mapping, and cross-vat promise resolution. This means we now have a minimum viable multi-vat computational platform!

The word "minimal" there is important: there remain a few rough edges and a number of important things that still need to be done, which should be converted into issues of their own in our project plan.

Notable points:

  • This PR adds kernel support for the vat syscall and liveslots delivery interfaces, but while the skeleton for these is complete, the kernel does not yet implement the elements of these interfaces that realize distributed reference GC. Without this, the memory footprint of the vats and the kernel will grow over time without bound, which is, of course, bad, but it's entirely adequate for things like demos, proofs of concept, and of course as the jumping off point for actually implementing the DGC stuff. Issue Implement kernel-side support for liveslots distributed garbage collection #329 has been added for this.

  • The liveslots vat syscall interface is nominally a synchronous API, whereas the kernel<->vat message pathway is unavoidably asynchronous on account of how web APIs work. However, the only portions of this API that truly require synchronous invocation are those that pertain to vat storage access. We deal with this by implementing the vat storage portions of the syscall API directly in the vat process as part of the VatSupervisor class. Basically, each vat gets its own SQLite database for its state. One consequence of this is that, for the time being at least, vats run on top of a memory-only version of SQLite, which means the nominally persistent vat state is not actually persistent; our discussions have convinced us that for a lot of applications this ephemerality is actually acceptable. However, some applications will require genuine vat persistence. The way to achieve this is to run them inside web workers instead of iframes, where SQLite can make use of the synchronous part of the OPFS API (as the kernel does). However, the version of the code in this PR does not include support for vats in web workers; that will need to be added. Issue Implement support for running vats in web workers to enable SQLite to have synchronous OPFS access #332 has been added for this.

  • An additional point about wrapping the synchronous syscall interface around the asynchronous vat<->kernel interface: one reason we can get away with this is that, aside from the vat storage methods already discussed, the results returned by syscall methods largely don't matter -- the only thing returned is an error indication in the case of something going wrong, and none of these calls can go wrong as the result of user error. In other words, they should all always be successful except in the case of bugs in our own code. So the VatSupervisor implements these by shipping off the asynchronous syscall request to the kernel and then synchronously returning a success indicator to liveslots. This really should be OK, but probably some better logging is called for in order to catch bugs in our stuff in the event we have any (like that could ever happen, hah). Issue Make vats handle syscall failures better (or at all) #333 has been added for this.

  • In the same vein, in the interest of minimizing unanticipated implementation shear due to expectations built into liveslots, this PR implements the very serial one-crank-at-a-time model that SwingSet currently does. However, I don't think this is strictly necessary. In particular, since vats each have their own storage, they can implement internal message queues and then be allowed to run freely, with all of the side effects of a delivery being released to the kernel in a single blob upon crank completion. This is an opportunity for better parallelism and likely better performance, as well as giving us more protection from what happens if a piece of user code in a vat goes into an infinite loop. However, to support all this will require significant additions to both the vat supervisor and the kernel, so I'm kicking that can down the road for now. For the record I've added issue Enable vats to execute more independently of the kernel #334 for this.

  • Although the current vat store implementation is ephemeral, it should still support the complete virtual object and virtual collection APIs (e.g., exoclasses, baggage, et al) that liveslots makes available to vat user code. All that stuff should Just Work, modulo the fact that the bits will evaporate with their enclosing iframe. However, while I have high confidence that all this will, in fact, Just Work, none of this functionality has actually yet been tested in our environment, and you know what they say about untested code. Issue Test virtual object functionality in our environment #335 has been added for this.

  • All of the new intervat computation stuff and its supporting machinery implemented here currently lacks unit tests. This is an important gap, but I'm putting this PR up now anyway in the interest of making it all available to the whole team and of avoiding too much drift from the main branch. Issue Add tests for liveslots integration #336 as been added for this.

  • There were a number of issues with the @agoric packages as published not playing well with TypeScript. These issues are on the Agoric side, having to do with how they build and publish this stuff to npm. The Agoric folks are aware and supportive, but we will probably have to be more proactive on our side with respect to getting that stuff fixed. Everybody's expectation is that this is just some tedious build and configuration stuff that needs to be sorted out, but it's possible that we will end up having to be the ones to do it via a PR to Agoric. In the meantime, I've worked around these issues via the expedient of incorporating the necessary type declaration files into our own source tree in the kernel package. These were created by extracting the relevant declaration files from the Agoric sources in the agoric-sdk repo and then having Claude convert them from JSDoc to actual TypeScript (all these files have names beginning with ag- if you want to look at them). Once the npm packaging issues are dealt with, these can go away. Having our own type declaration files is bad mainly due to the risk of divergence if there are changes on the Agoric side; I'm not too worried about this as these APIs have been stable for some time, but still... I've added issue Arrange to get Agoric NPM packages fixed and absorb the results #337 for this.

  • Quite a lot of code has been added in a fairly short time, so there are certainly some code hygiene issues that should be addressed. I'm usually pretty obsessive about that kind of stuff, but, as with unit tests, in the interest of moving our process along I'm pushing now rather than taking additional time to make it all perfect and pretty first. In particular, I think there are some methods in Kernel, VatHandle, and VatSupervisor that should be private but aren't. Also, some additional love ought to be applied to the layout of the source files in terms of how things are ordered for the benefit of human readers. Also, more comments.

  • While we're on the subject of code hygiene, this exercise has really driven home to me just how clunky, awkward, ugly, and needlessly verbose doing things in the JS class pattern is, in contrast to the objects-as-closures pattern. I think I'd like to raise this as an internal issue in our team at some point.

  • I'd give even odds that there are one or two other notable points that I should be mentioning here but which I've lost track of in all the excitement. When and if I think of them I'll add additional comments to this PR.

Closes #29

@FUDCo FUDCo force-pushed the chip/integrate-liveslots branch from c024858 to 16e5423 Compare January 10, 2025 00:29
@socket-security
Copy link

socket-security bot commented Jan 10, 2025

👍 Dependency issues cleared. Learn more about Socket for GitHub ↗︎

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

Ignoring: npm/emoji-regex-xs@1.0.0, npm/@agoric/store@0.9.3-upgrade-18-dev-ef001c0.0, npm/@agoric/internal@0.4.0-u18.0, npm/@agoric/base-zone@0.1.1-upgrade-18-dev-ef001c0.0, npm/@agoric/swingset-liveslots@0.10.3-u18.0, npm/negotiator@0.6.4, npm/socks-proxy-agent@8.0.5, npm/@mapbox/node-pre-gyp@2.0.0-rc.0

View full report↗︎

Next steps

Take a deeper look at the dependency

Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support [AT] socket [DOT] dev.

Remove the package

If you happen to install a dependency that Socket reports as Known Malware you should immediately remove it and select a different dependency. For other alert types, you may may wish to investigate alternative packages or consider if there are other ways to mitigate the specific risk posed by the dependency.

Mark a package as acceptable risk

To ignore an alert, reply with a comment starting with @SocketSecurity ignore followed by a space separated list of ecosystem/package-name@version specifiers. e.g. @SocketSecurity ignore npm/foo@1.0.0 or ignore all packages with @SocketSecurity ignore-all

@FUDCo FUDCo force-pushed the chip/integrate-liveslots branch 2 times, most recently from 3e4a370 to b723f19 Compare January 10, 2025 18:51
@FUDCo FUDCo marked this pull request as ready for review January 10, 2025 20:19
@FUDCo FUDCo requested a review from a team as a code owner January 10, 2025 20:19
@rekmarks rekmarks changed the title Integrate kernel with liveslots feat: Integrate kernel with liveslots Jan 10, 2025
@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 10, 2025

@SocketSecurity ignore npm/@agoric/store@0.9.3-upgrade-18-dev-ef001c0.0 npm/@agoric/internal@0.4.0-u18.0 npm/@agoric/base-zone@0.1.1-upgrade-18-dev-ef001c0.0 npm/@agoric/swingset-liveslots@0.10.3-u18.0

Socket security appears to just be confused about these, per @mhofman

@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 10, 2025

@SocketSecurity ignore npm/negotiator@1.0.0 npm/unique-slug@5.0.0 npm/@npmcli/fs@4.0.0, npm/unique-filename@4.0.0 npm/negotiator@0.6.4 npm/@npmcli/fs@4.0.0 npm/@mapbox/node-pre-gyp@2.0.0-rc.0

Authors changed, yeah.

@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 10, 2025

@SocketSecurity ignore npm/socks-proxy-agent@8.0.5

Mature package that needs net access because it's fundamentally about net access.

@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 10, 2025

@SocketSecurity ignore npm/emoji-regex-xs@1.0.0

Not a typo squat, just an enhancement.

Copy link
Member

@rekmarks rekmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! I definitely haven't fully digested this work, but here I present some questions and suggestions that will hopefully bring me closer to understanding.

Copy link
Contributor

@sirtimid sirtimid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the ./packages/extension/src/ui/App.module.css

and replace

.rightPanel {
  position: relative;

with

.rightPanel {
  position: sticky;
  top: var(--spacing-lg);

so we don't lose the message panel on scroll?

Copy link
Contributor

@sirtimid sirtimid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just thinking that the const { log } = console; destructuring seems to bypass the purpose of having a structured logger utility. The original logger was created to add context (like [ocap kernel]) to log messages

Copy link
Contributor

@sirtimid sirtimid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following suggestion perhaps can be done on a follow up PR:

I think we can make the message passing between components (Vat → VatSupervisor → VatHandle → Kernel) simpler and safer by using a MessageBus pattern.

Right now we're doing a lot of type casting (like rawTarget as KRef) and passing messages through multiple components. Instead, we could have components talk through a MessageBus, which would remove the need for type casting - messages would have proper types from the start and make the flow simpler - components would just send messages to the bus instead of needing to know about each other.

Also having all messages in one place would make debugging and testing easier, since, with the MessageBus abstraction, we can move the syscall handling to the Kernel and remove direct kernel dependencies from VatHandle. some example code

// Instead of passing through multiple components:
Vat -> VatSupervisor -> VatHandle -> Kernel.deliver -> Kernel.routeMessage

// We'd just have:
Component -> MessageBus -> Subscribers

type SyscallOperation = 
  | 'send'
  | 'subscribe'
  | 'resolve'
  | 'exit'
  | 'dropImports'
  | 'retireImports'
  | 'retireExports'
  | 'abandonExports';

type MessageType = {
  type: 'syscall';
  vatId: VatId;
  operation: SyscallOperation;
  payload: unknown;
} | {
  type: 'delivery';
  target: VRef;
  message: Message;
} | {
  type: 'notification';
  resolutions: VatOneResolution[];
};

interface MessageBus {
  publish(message: MessageType): Promise<void>;
  subscribe(handler: (message: MessageType) => Promise<void>): void;
  unsubscribe(handler: (message: MessageType) => Promise<void>): void;
}

// Improved Kernel class
export class Kernel {
  #messageBus: KernelMessageBus;

  async #handleMessage(message: KernelMessage): Promise<void> {
    switch (message.type) {
      case 'send': {
        const { target, message: msg, vatId } = message.payload;
        // No type casting needed, flow is more direct
        const route = this.#routeMessage({ target, message: msg });
        if (route?.vatId) {
          const vat = this.#getVat(vatId);
          await vat.deliverMessage(
            this.#translateRefKtoE(vatId, target, false),
            this.#translateMessageKtoE(vatId, msg),
          );
        }
        break;
      }
    }
  }
}

// and we can call it like this:

this.#messageBus.publish({
  type: 'syscall',
  vatId: this.vatId,
  operation: 'send',
  payload: { target, message },
});

@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 15, 2025

I am just thinking that the const { log } = console; destructuring seems to bypass the purpose of having a structured logger utility. The original logger was created to add context (like [ocap kernel]) to log messages

Bypassing the structured logger was indeed the purpose - getting rid of visual clutter to make debugging easier.

@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 16, 2025

With respect to the message bus idea, I'm think I'm just not getting it.

It is definitely true that there's a tedious amount of casting, but I think most of it is due to impedance mismatch between our type definitions and those that come bundled with liveslots, which this proposal won't help at all. (The mismatch can largely be summarized as: they define a bunch of things as strings which we are more specific about. Once we get Agoric's NPM packaging under control I think we can fix the typedefs on their end.)

One thing about the current message setup that I do find maddening is the comingling of the vat command channel, in which the kernel sends requests, to which the vat replies, with the syscall channel, in which the vat sends requests, to which the kernel replies. But I don't see how the message bus idea helps with that; if anything it seems to continue it under another guise.

I also don't follow the motivation for moving syscall handling into the kernel. We want syscalls to be handled in VatHandle because it's the thing that holds the vat-scoped kernel state associated with the vat issuing the syscall; that's basically its job. Handling syscalls in the kernel would just add an additional parameter to a bunch of methods that would then be used to look up the appropriate VatHandle instance so it could be queried; that doesn't feel like an improvement.

Basically, I don't see what this buys us. But it's entirely possible I'm just not understanding what is actually being proposed here, so we should talk.

@FUDCo FUDCo force-pushed the chip/integrate-liveslots branch from 5e3da2d to 3d40600 Compare January 16, 2025 02:40
@FUDCo FUDCo requested review from rekmarks and sirtimid January 16, 2025 09:21
@sirtimid
Copy link
Contributor

With respect to the message bus idea, I'm think I'm just not getting it.

It is definitely true that there's a tedious amount of casting, but I think most of it is due to impedance mismatch between our type definitions and those that come bundled with liveslots, which this proposal won't help at all. (The mismatch can largely be summarized as: they define a bunch of things as strings which we are more specific about. Once we get Agoric's NPM packaging under control I think we can fix the typedefs on their end.)

One thing about the current message setup that I do find maddening is the comingling of the vat command channel, in which the kernel sends requests, to which the vat replies, with the syscall channel, in which the vat sends requests, to which the kernel replies. But I don't see how the message bus idea helps with that; if anything it seems to continue it under another guise.

I also don't follow the motivation for moving syscall handling into the kernel. We want syscalls to be handled in VatHandle because it's the thing that holds the vat-scoped kernel state associated with the vat issuing the syscall; that's basically its job. Handling syscalls in the kernel would just add an additional parameter to a bunch of methods that would then be used to look up the appropriate VatHandle instance so it could be queried; that doesn't feel like an improvement.

Basically, I don't see what this buys us.

Yeah ok, I think I was wrong about the benefits of a message bus here. I didn't think of the additional lookups. That would actually make it more complex. The current architecture actually makes more sense since VatHandle needs to keep vat-specific state and syscall handling together. Thank you for the thoughtful critique!

sirtimid
sirtimid previously approved these changes Jan 16, 2025
Copy link
Contributor

@sirtimid sirtimid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but a second approval would be great for confirmation.

@FUDCo FUDCo force-pushed the chip/integrate-liveslots branch from 63b5590 to a98115e Compare January 16, 2025 18:28
@rekmarks
Copy link
Member

Bypassing the structured logger was indeed the purpose - getting rid of visual clutter to make debugging easier.

I'm thinking about what to do about this. @FUDCo your log calls seem to add their own structure, e.g.

log(`@@@@ done ${vatId} notify ${kpid}`);

I suppose the structured logger would prepend that call with e.g. [kernel], and that adds visual clutter during debugging. Some questions:

  • Do you see your debug logs as a temporary measure, to be removed later?
  • I think the structured logger is probably good for logs that we will want to have on an ongoing basis. Is that also your view, or do you think we should get rid of it?
    • Should we add a debug logging utility?

those whinges having been introduced by the nodejs package
@socket-security
Copy link

Report too large to display inline

View full report↗︎

@FUDCo
Copy link
Contributor Author

FUDCo commented Jan 16, 2025

Bypassing the structured logger was indeed the purpose - getting rid of visual clutter to make debugging easier.

I'm thinking about what to do about this. @FUDCo your log calls seem to add their own structure, e.g.

log(`@@@@ done ${vatId} notify ${kpid}`);

I suppose the structured logger would prepend that call with e.g. [kernel], and that adds visual clutter during debugging. Some questions:

  • Do you see your debug logs as a temporary measure, to be removed later?

  • I think the structured logger is probably good for logs that we will want to have on an ongoing basis. Is that also your view, or do you think we should get rid of it?

    • Should we add a debug logging utility?

Yeah, the debug log output was intended to be ephemeral. The @@@@ helps spotting the messages in the output but it's mainly there to have an easy pattern to search for when removing the logging calls from the code before shipping it.

More generally, I've found a good log of message traffic (and very little else) is one of the most valuable tools for debugging distributed systems, and have invested a lot of effort over the years in making the output of these log entries legible. This is in direct conflict with most logging utilities which just keep adding stuff to log entries that makes the them less legible. (When I'm debugging some hairy network protocol thing I find I often end up spending ridiculous amounts of time in emacs replacing strings and reformatting things to boil log files down to the just the stuff I want to see). I suspect the long term solution is some kind of structured log thing where the entries are JSON that can be post-processed with jq and similar tools, so that log entries can accumulate all the crap that loggers inevitably end up piling on but still be quickly filtered to something usable. There are a lot of logging tools which try to do something like this but thus far I've found them all unsatisfying.

@boneskull
Copy link

worried my comment may be within a fold somewhere, so: #317 (comment)

Copy link
Member

@rekmarks rekmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@FUDCo FUDCo merged commit d989cda into main Jan 16, 2025
16 checks passed
@FUDCo FUDCo deleted the chip/integrate-liveslots branch January 16, 2025 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable message passing between objects that live in different vats

6 participants