Add upcall for delegation #556

Closed
sangho2 wants to merge 3 commits into main from sanghle/upcall

sangho2 (Contributor) commented Dec 13, 2025

This PR adds the upcall trait to let platforms delegate handling of unknown
messages/requests (e.g., OP-TEE messages from the normal-world/VTL0
kernel) to other layers of LiteBox (i.e., runner or shim).

sangho2 changed the title from "Add an upcall trait to handle OP-TEE messages" to "Add an upcall trait to delegate handling of OP-TEE messages" Dec 13, 2025
sangho2 changed the title from "Add an upcall trait to delegate handling of OP-TEE messages" to "Add an upcall trait for delegation" Dec 13, 2025
sangho2 marked this pull request as ready for review December 13, 2025 00:50
wdcui requested a review from jstarks December 13, 2025 01:20
sangho2 changed the title from "Add an upcall trait for delegation" to "Add upcall for delegation" Dec 13, 2025
@github-actions

🤖 SemverChecks 🤖 No breaking API changes detected

Note: this does not mean API is unchanged, or even that there are no breaking changes; simply, none of the detections triggered.

// Placeholder for now
let upcall = crate::UPCALL.get().expect("OP-TEE upcall not registered");
let mut ctx = litebox_common_linux::PtRegs::default();
let _ = upcall.execute(&mut ctx);

Did you intend to ignore the return value here?

Seems like this should be properly handled/propagated.

Contributor Author

As written in the comment, this is just a placeholder. We should handle the return value.

Member

Minor nit: keeping a TODO:/XXX:/... or such signifier makes it easier to grep in the future quickly
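
A minimal sketch of what handling the return value could look like once the placeholder is replaced, with a greppable TODO marker as suggested above. `PtRegs`, `UpcallError`, and `Upcall` here are simplified stand-ins, not the actual LiteBox definitions, and `handle_optee_message`, `AlwaysFails`, and `Echo` are hypothetical names used only for illustration.

```rust
// Simplified stand-ins for the PR's types (assumptions, not the LiteBox definitions).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PtRegs {
    pub rax: u64,
}

#[derive(Debug, PartialEq)]
pub enum UpcallError {
    Failure,
}

pub trait Upcall {
    fn execute(&self, ctx: &mut PtRegs) -> Result<PtRegs, UpcallError>;
}

/// Hypothetical caller that forwards the upcall result instead of dropping it.
pub fn handle_optee_message(upcall: &dyn Upcall) -> Result<PtRegs, UpcallError> {
    // TODO: build `ctx` from the real message instead of a default value.
    let mut ctx = PtRegs::default();
    // `?` hands the error to the caller rather than discarding it with `let _ =`.
    let regs = upcall.execute(&mut ctx)?;
    Ok(regs)
}

/// Illustrative handler that always fails.
pub struct AlwaysFails;

impl Upcall for AlwaysFails {
    fn execute(&self, _ctx: &mut PtRegs) -> Result<PtRegs, UpcallError> {
        Err(UpcallError::Failure)
    }
}

/// Illustrative handler that returns the registers unchanged.
pub struct Echo;

impl Upcall for Echo {
    fn execute(&self, ctx: &mut PtRegs) -> Result<PtRegs, UpcallError> {
        Ok(ctx.clone())
    }
}
```

With this shape, a failing handler surfaces as an `Err` at the call site instead of being silently ignored.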

Comment on lines +107 to +108
type Parameter = litebox_common_linux::PtRegs;
type Return = litebox_common_linux::PtRegs;

Curious to know why you need the registers here?

Do you intend to run a thread during upcall.execute?

sangho2 (Contributor Author) commented Jan 7, 2026

This is just for defining parameter and return types. We don't specify any particular data types here yet. The reason I use PtRegs here is because it is the most common data type used in many places in the LiteBox code base.

wdcui requested a review from jaybosamiya-ms January 17, 2026 00:38

wdcui (Member) commented Jan 17, 2026

@jaybosamiya-ms It would be good to have you take a look at this PR since it changes litebox.

jaybosamiya-ms (Member) left a comment

Minor comments throughout, but I have a more general comment about the design/idea itself: this is roughly equivalent to the functionality that already exists in litebox::shim::... right? This feels like adding a parallel way of handling roughly similar things, which seems suboptimal. Until we can clearly articulate when that should be used and when this should be used, I think this PR should not be merged, which is why I am marking this as "Request changes".

I am happy to discuss this IRL more if it would help tweak the design of the existing litebox::shim::... to support the use case (or maybe it already can support the use case?). Currently, this PR is only introducing the new functionality without the use case, which makes it a little hard to understand why the required use case cannot be handled by existing litebox::shim::EnterShim and related bits.

CC: @jstarks I expect you'll have some useful insights on this PR

Comment on lines +9 to +10
//! Examples of such messages or requests include HVCI/Heki requests from
//! VTL0 and OP-TEE SMC calls from the normal world.

Member

Would signals from the outside world be a similar situation (i.e., should they be handled through this mechanism)?

Contributor Author

Unlike the Linux shim case, where all signals are handled by the shim, certain exceptions/interrupts are handled by the LVBS platform whereas others are handled by the OP-TEE shim.

Member

That's fine right? The platform can just decide not to forward along things to the shim. There is nothing in the design that says that the platform must forward everything over, right? Maybe I am misunderstanding why the signals world and the upcall world are drastically different. Might be easier to chat in-person?

sangho2 (Contributor Author) commented Jan 17, 2026

Yes, I think the current explanation is misleading. It is more that there are some things the platform doesn't know how to handle, so it delegates them to other layers if applicable.
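
To make the "platform handles what it knows and delegates the rest" idea concrete, here is a rough sketch. The `Request` and `Outcome` enums and the `dispatch` function are hypothetical and only illustrate the control flow, not how the LVBS platform actually classifies exceptions or interrupts.

```rust
/// Hypothetical request kinds seen by the platform.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Request {
    /// Something the platform knows how to handle itself.
    TimerInterrupt,
    /// Something the platform has no semantics for, identified by an opaque id.
    Unknown(u32),
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    HandledByPlatform,
    /// The delegate's result for the unknown request.
    Delegated(u32),
    /// No delegate was available, so the request was dropped.
    Dropped,
}

/// The platform decides what to forward; nothing forces it to forward everything.
pub fn dispatch(req: Request, delegate: Option<&dyn Fn(u32) -> u32>) -> Outcome {
    match req {
        Request::TimerInterrupt => Outcome::HandledByPlatform,
        Request::Unknown(id) => match delegate {
            Some(handler) => Outcome::Delegated(handler(id)),
            None => Outcome::Dropped,
        },
    }
}
```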

Comment on lines +17 to +19
//! calls within LiteBox. However, care must be taken to ensure that the upcall's
//! parameters and return values are properly validated and sanitized to prevent
//! potential security vulnerabilities. This is because the parameters might be

Member

"care must be taken" -> by whom?

Contributor Author

The upcall handler. I will clarify this.
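
As an illustration of validation living in the upcall handler, a handler might reject untrusted parameters before acting on them. Everything in this sketch is an assumption made for the example: the `Params` layout, the `SHARED_BASE`/`SHARED_SIZE` window, and the `InvalidParameter` variant do not come from the PR.

```rust
/// Hypothetical parameters coming from an untrusted caller.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Params {
    pub addr: u64,
    pub len: u64,
}

#[derive(Debug, PartialEq)]
pub enum UpcallError {
    InvalidParameter,
}

// Hypothetical shared-memory window the handler is willing to touch.
const SHARED_BASE: u64 = 0x1000;
const SHARED_SIZE: u64 = 0x1000;

/// Handler-side check: non-empty, no overflow, fully inside the window.
pub fn validate(p: Params) -> Result<Params, UpcallError> {
    let end = p
        .addr
        .checked_add(p.len)
        .ok_or(UpcallError::InvalidParameter)?;
    if p.len == 0 || p.addr < SHARED_BASE || end > SHARED_BASE + SHARED_SIZE {
        return Err(UpcallError::InvalidParameter);
    }
    Ok(p)
}
```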

Comment on lines +24 to +26
//! does not have semantics to validate them). We can specify a function for early
//! validation at the platform side if needed but its advantages are not clear at
//! this moment (since there is no costly context switch within LiteBox).

Member

Nit: this sentence is surprising to see as part of the doc string. Might be better just as a comment, rather than being in the doc string.

Comment on lines +38 to +40
/// Initialize the upcall handler. Must be called by the platform exactly once.
/// Per-thread initialization is possible but all threads must share the same
/// upcall handler.

Member

This comment is confusing. What happens if the platform does not invoke this? (I might have missed something but I didn't see the platform invoke this here). I also don't understand the "per-thread initialization is possible but ..." comment here. Should each thread invoke this? Or should different threads within the platform coordinate to make sure they have not invoked this more than once?

A slightly easier-to-use interface is likely something like "get singleton", so that the platform doesn't need to store/manage this, and the runner side can choose to make it more/less performant as it needs. Alternatively, one can just eliminate initialization and set up the execute to initialize on first use?

Contributor Author

Basically, this is a callback registration. I agree that the comment is confusing. I'll fix this.
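
One way to read the "get singleton" / "initialize on first use" suggestion is sketched below with `std::sync::OnceLock`. It assumes `std` is available (the PR itself uses `alloc`), and the simplified `Upcall` trait, the `Doubler` handler, and the `upcall()` accessor are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Simplified trait for illustration; not the PR's `Upcall` definition.
pub trait Upcall: Send + Sync {
    fn execute(&self, input: u64) -> u64;
}

struct Doubler;

impl Upcall for Doubler {
    fn execute(&self, input: u64) -> u64 {
        input.wrapping_mul(2)
    }
}

// Counts how many times the handler was constructed (for demonstration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);
static UPCALL: OnceLock<Box<dyn Upcall>> = OnceLock::new();

/// Returns the shared handler, constructing it on first use.
/// Callers never need to coordinate an explicit init step.
pub fn upcall() -> &'static dyn Upcall {
    let boxed = UPCALL.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        let handler: Box<dyn Upcall> = Box::new(Doubler);
        handler
    });
    &**boxed
}

pub fn init_count() -> usize {
    INIT_COUNT.load(Ordering::SeqCst)
}
```

With this shape, "exactly once" is enforced by the cell rather than by documentation, and every thread shares the same handler.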

Comment on lines +41 to +45
fn init(
    &self,
) -> alloc::boxed::Box<
    dyn crate::upcall::Upcall<Parameter = Self::Parameter, Return = Self::Return>,
>;

Member

You need self to get essentially a Box<dyn itself>? This is a confusing signature. Where do you get the original Upcall to even invoke this initialization? Why is it returning a boxed version of itself?

sangho2 (Contributor Author) commented Jan 17, 2026

The runner has a function and registers it with the platform through this (Platform::register_upcall).
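
If the intent is plain callback registration, a signature that takes the handler as an argument may read more clearly than `init(&self)` returning a boxed copy of itself. The sketch below is a guess at that shape: `Platform`, `register_upcall`, `delegate`, and `RegistryError` are illustrative (only the name `Platform::register_upcall` appears in the thread), and it assumes `std` rather than `alloc`.

```rust
use std::sync::OnceLock;

/// Simplified trait for illustration; not the PR's `Upcall` definition.
pub trait Upcall: Send + Sync {
    fn execute(&self, input: u64) -> u64;
}

#[derive(Debug, PartialEq)]
pub enum RegistryError {
    AlreadyRegistered,
    NotRegistered,
}

/// Hypothetical platform that stores at most one handler.
pub struct Platform {
    upcall: OnceLock<Box<dyn Upcall>>,
}

impl Platform {
    pub fn new() -> Self {
        Self { upcall: OnceLock::new() }
    }

    /// Called by the runner; a second registration is rejected.
    pub fn register_upcall(&self, handler: Box<dyn Upcall>) -> Result<(), RegistryError> {
        self.upcall
            .set(handler)
            .map_err(|_| RegistryError::AlreadyRegistered)
    }

    /// Called by the platform when it meets a request it cannot handle.
    pub fn delegate(&self, input: u64) -> Result<u64, RegistryError> {
        let handler = self.upcall.get().ok_or(RegistryError::NotRegistered)?;
        Ok(handler.execute(input))
    }
}

/// Illustrative runner-side handler.
pub struct AddOne;

impl Upcall for AddOne {
    fn execute(&self, input: u64) -> u64 {
        input + 1
    }
}
```

This also gives a defined answer to "what happens if nobody registers": the platform gets an error back instead of panicking in `expect`.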

/// platform validates the parameters, the implementation of `execute` must validate
/// parameters to avoid potential security vulnerabilities. Also, it must sanitize
/// the return values before returning them to the platform.
fn execute(&self, ctx: &mut Self::Parameter) -> Result<Self::Return, UpcallError>;

Member

I'm mildly surprised by seeing the &mut here for the parameter. Due to lifetime elision and such, Self::Return's lifetime cannot depend on that mutable reference anyways. A slightly cleaner design might be to just pass the parameter as a (consumed/owned) value? Alternatively, the Return should have a generic lifetime that gets bound to the lifetime of the parameter (or even more generally, it would get bound to a new lifetime that is itself bounded by the lifetimes of &self and ctx).

Contributor Author

Yes. Technically, we don't need to struggle with lifetimes. Let me use values.
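
A sketch of the by-value variant agreed on here: `execute` consumes the parameter and returns an owned result, so no lifetime ties the two together. The `Regs` type and the `CopyRdiToRax` handler are hypothetical, and the trait is a simplified version of the one in the PR.

```rust
#[derive(Debug, PartialEq)]
pub enum UpcallError {
    Failure,
}

/// Simplified trait: the parameter is passed by value instead of `&mut`.
pub trait Upcall {
    type Parameter;
    type Return;

    fn execute(&self, param: Self::Parameter) -> Result<Self::Return, UpcallError>;
}

/// Hypothetical register set used as both parameter and return type.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Regs {
    pub rax: u64,
    pub rdi: u64,
}

/// Illustrative handler: copies `rdi` into `rax` and hands the registers back.
pub struct CopyRdiToRax;

impl Upcall for CopyRdiToRax {
    type Parameter = Regs;
    type Return = Regs;

    fn execute(&self, mut param: Regs) -> Result<Regs, UpcallError> {
        param.rax = param.rdi;
        Ok(param)
    }
}
```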

Comment on lines +58 to +59
#[error("Upcall failed")]
Failure,

Member

Failed due to what? In the current design it is impossible for the upcall to give more useful information to the layer below. It would be good to make this generic in some way (either a generic error type, or a Failure(Box<dyn Error>) might be enough).

Contributor Author

Let me do that.
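
For reference, the `Failure(Box<dyn Error>)` option could look roughly like this. The sketch uses `std::error::Error` and hand-written `Display`/`Error` impls instead of the `#[error(...)]` attributes in the PR, and `load_elf` is a hypothetical handler step included only to produce an error with a cause.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum UpcallError {
    /// Carries the underlying cause so the platform can report it.
    Failure(Box<dyn Error>),
    Retry,
}

impl fmt::Display for UpcallError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpcallError::Failure(cause) => write!(f, "Upcall failed: {cause}"),
            UpcallError::Retry => write!(f, "Upcall needs to be retried"),
        }
    }
}

impl Error for UpcallError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            UpcallError::Failure(cause) => Some(&**cause),
            UpcallError::Retry => None,
        }
    }
}

/// Hypothetical handler step: checks the ELF magic and returns the image size.
pub fn load_elf(bytes: &[u8]) -> Result<usize, UpcallError> {
    if !bytes.starts_with(b"\x7fELF") {
        return Err(UpcallError::Failure("missing ELF magic".into()));
    }
    Ok(bytes.len())
}
```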

Comment on lines +60 to +61
#[error("Upcall needs to be retried")]
Retry,

Member

If the upcall needs to be retried, it needs to mention when it needs to be retried, right? Otherwise, if "retry immediately" is allowed, might as well do the retrying internally. The failure case is that "it cannot be handled right now, please try later", but yeah, something else feels necessary here in the output so that a platform can know when to retry.

Contributor Author

Yes. Technically, there is no easy way to know when the platform can try it again. Let me think.
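
One possible direction, purely as a sketch: have the retry variant carry a hint, and let the platform bound its attempts. The `RetryAfter(Duration)` variant and the `call_with_retry` helper are hypothetical and were not proposed in the thread; whether a time-based hint is even meaningful for this platform is an open question.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum UpcallError {
    Failure,
    /// Hypothetical replacement for `Retry`: says how long to wait first.
    RetryAfter(Duration),
}

/// Runs `attempt` up to `max_attempts` times, calling `wait` between retries.
/// `wait` is injected so the platform decides how to pause (sleep, yield, ...).
pub fn call_with_retry<T>(
    max_attempts: u32,
    mut attempt: impl FnMut() -> Result<T, UpcallError>,
    mut wait: impl FnMut(Duration),
) -> Result<T, UpcallError> {
    let mut last = UpcallError::Failure;
    for _ in 0..max_attempts {
        match attempt() {
            Err(UpcallError::RetryAfter(delay)) => {
                wait(delay);
                last = UpcallError::RetryAfter(delay);
            }
            // Success or a non-retryable error ends the loop immediately.
            other => return other,
        }
    }
    Err(last)
}
```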

Comment on lines +395 to +401
upcall: &'static (
    dyn litebox::upcall::Upcall<
            Parameter = litebox_common_linux::PtRegs,
            Return = litebox_common_linux::PtRegs,
        > + Send
        + Sync
),

Member

Oof, this looks like rustfmt did a horrifyingly bad job here. Consider putting the type into a type alias (type Upcall = ...) to make this easier to maintain?
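
The type-alias suggestion might look like the following. `PtRegs` and `Upcall` are simplified stand-ins for the LiteBox types, and the alias name `UpcallRef` plus the `Identity` handler and `run` function are made up for the example.

```rust
// Simplified stand-ins (assumptions, not the LiteBox definitions).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PtRegs {
    pub rax: u64,
}

pub trait Upcall {
    type Parameter;
    type Return;

    fn execute(&self, param: Self::Parameter) -> Self::Return;
}

/// One alias instead of spelling out the trait object at every use site.
pub type UpcallRef = &'static (dyn Upcall<Parameter = PtRegs, Return = PtRegs> + Send + Sync);

/// Illustrative handler that returns its input unchanged.
pub struct Identity;

impl Upcall for Identity {
    type Parameter = PtRegs;
    type Return = PtRegs;

    fn execute(&self, param: PtRegs) -> PtRegs {
        param
    }
}

pub static IDENTITY: Identity = Identity;

/// The function signature now stays on one line.
pub fn run(upcall: UpcallRef, regs: PtRegs) -> PtRegs {
    upcall.execute(regs)
}
```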

sangho2 (Contributor Author) commented Jan 17, 2026

> Minor comments throughout, but I have a more general comment about the design/idea itself: this is roughly equivalent to the functionality that already exists in litebox::shim::... right? This feels like adding a parallel way of handling roughly similar things, which seems suboptimal. Until we can clearly articulate when that should be used and when this should be used, I think this PR should not be merged, which is why I am marking this as "Request changes".
>
> I am happy to discuss this IRL more if it would help tweak the design of the existing litebox::shim::... to support the use case (or maybe it already can support the use case?). Currently, this PR is only introducing the new functionality without the use case, which makes it a little hard to understand why the required use case cannot be handled by existing litebox::shim::EnterShim and related bits.
>
> CC: @jstarks I expect you'll have some useful insights on this PR

We can definitely discuss this later. This design is based on some discussion between John, Weidong, and me. Long story short, we need a runner or something below the shim to support this. For example, two of this upcall's main uses are loading an ELF and invoking run_thread, neither of which the shim can do, as in https://github.com/microsoft/litebox/pull/564/changes#diff-3eeb171b29f4643a455abd0bf670819afc33087f009eb4f6300fff0364f19d6bR127

In any case, what we need is a function in the runner that the platform can invoke through a simple interface.

sangho2 marked this pull request as draft January 21, 2026 00:01

sangho2 (Contributor Author) commented Jan 21, 2026

Discussion result: Instead of defining a new interface, we will add one function to EnterShim to support this feature.

sangho2 (Contributor Author) commented Jan 23, 2026

Turns out that using EnterShim is not straightforward and may not be possible. The reason is that our current implementation passes the shim to the platform via run_thread (we no longer explicitly register a shim with the platform). Perhaps we need to use litebox_common_optee for now (sounds like litebox_common_* are becoming a collection of second-class providers).

sangho2 closed this Jan 28, 2026

sangho2 (Contributor Author) commented Jan 28, 2026

We decided to move vtl_switch_loop and its sub-functions to litebox_runner_lvbs. As a result, we no longer need the upcall.

sangho2 deleted the sanghle/upcall branch February 6, 2026 04:42
4 participants