Skip to content

MSI-X support#2101

Open
cagatay-y wants to merge 2 commits intohermit-os:mainfrom
cagatay-y:cloud-hypervisor
Open

MSI-X support#2101
cagatay-y wants to merge 2 commits intohermit-os:mainfrom
cagatay-y:cloud-hypervisor

Conversation

@cagatay-y
Copy link
Contributor

@cagatay-y cagatay-y commented Dec 1, 2025

cloud-hypervisor only supports MSI-X interrupts for PCI devices, so support for MSI-X is needed to support running on it. Additionally, MSI-X can allow us to set separate interrupt handlers for the configuration change interrupts and queue updates (even per-queue handlers once we have multiple queues) in the future.

The implementation does not work on QEMU with the TAP network device (tested with cargo xtask ci rs --arch x86_64 --profile dev --package httpd --features ci,hermit/virtio-net qemu --accel --sudo --devices virtio-net-pci --tap) and possibly other platforms as well. It works with QEMU user network device in addition to cloud-hypervisor.

@mkroening mkroening self-assigned this Dec 1, 2025
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark Results

Details
Benchmark Current: 019e89f Previous: 77e74a5 Performance Ratio
startup_benchmark Build Time 87.86 s 90.63 s 0.97
startup_benchmark File Size 0.86 MB 0.86 MB 1.00
Startup Time - 1 core 0.97 s (±0.04 s) 0.97 s (±0.04 s) 1.00
Startup Time - 2 cores 0.97 s (±0.03 s) 0.96 s (±0.03 s) 1.01
Startup Time - 4 cores 0.97 s (±0.03 s) 0.98 s (±0.04 s) 0.99
multithreaded_benchmark Build Time 88.18 s 93.04 s 0.95
multithreaded_benchmark File Size 0.96 MB 0.96 MB 1.00
Multithreaded Pi Efficiency - 2 Threads 92.90 % (±9.76 %) 88.28 % (±8.28 %) 1.05
Multithreaded Pi Efficiency - 4 Threads 44.86 % (±3.60 %) 43.79 % (±3.40 %) 1.02
Multithreaded Pi Efficiency - 8 Threads 25.61 % (±2.22 %) 25.71 % (±1.71 %) 1.00
micro_benchmarks Build Time 97.84 s 100.66 s 0.97
micro_benchmarks File Size 0.96 MB 0.97 MB 1.00
Scheduling time - 1 thread 69.89 ticks (±3.46 ticks) 73.51 ticks (±4.88 ticks) 0.95
Scheduling time - 2 threads 37.18 ticks (±3.36 ticks) 40.22 ticks (±4.45 ticks) 0.92
Micro - Time for syscall (getpid) 3.05 ticks (±0.21 ticks) 3.17 ticks (±0.23 ticks) 0.96
Memcpy speed - (built_in) block size 4096 63962.20 MByte/s (±45686.66 MByte/s) 63640.01 MByte/s (±45901.68 MByte/s) 1.01
Memcpy speed - (built_in) block size 1048576 29419.88 MByte/s (±24437.97 MByte/s) 29541.88 MByte/s (±24622.61 MByte/s) 1.00
Memcpy speed - (built_in) block size 16777216 28097.21 MByte/s (±23434.97 MByte/s) 25148.89 MByte/s (±21271.92 MByte/s) 1.12
Memset speed - (built_in) block size 4096 64404.80 MByte/s (±45994.65 MByte/s) 64656.50 MByte/s (±46596.27 MByte/s) 1.00
Memset speed - (built_in) block size 1048576 30217.63 MByte/s (±24894.79 MByte/s) 30310.85 MByte/s (±25062.86 MByte/s) 1.00
Memset speed - (built_in) block size 16777216 28893.94 MByte/s (±23894.19 MByte/s) 25932.09 MByte/s (±21770.52 MByte/s) 1.11
Memcpy speed - (rust) block size 4096 58724.66 MByte/s (±43644.87 MByte/s) 55967.04 MByte/s (±41115.97 MByte/s) 1.05
Memcpy speed - (rust) block size 1048576 29501.93 MByte/s (±24478.36 MByte/s) 29456.39 MByte/s (±24599.53 MByte/s) 1.00
Memcpy speed - (rust) block size 16777216 28096.83 MByte/s (±23434.23 MByte/s) 24623.14 MByte/s (±20742.31 MByte/s) 1.14
Memset speed - (rust) block size 4096 59555.43 MByte/s (±44160.51 MByte/s) 57111.44 MByte/s (±41931.40 MByte/s) 1.04
Memset speed - (rust) block size 1048576 30242.77 MByte/s (±24911.79 MByte/s) 30255.14 MByte/s (±25048.12 MByte/s) 1.00
Memset speed - (rust) block size 16777216 28854.94 MByte/s (±23862.51 MByte/s) 25386.50 MByte/s (±21230.33 MByte/s) 1.14
alloc_benchmarks Build Time 93.72 s 94.05 s 1.00
alloc_benchmarks File Size 0.93 MB 0.93 MB 1.01
Allocations - Allocation success 100.00 % 100.00 % 1
Allocations - Deallocation success 100.00 % 100.00 % 1
Allocations - Pre-fail Allocations 100.00 % 100.00 % 1
Allocations - Average Allocation time 10895.28 Ticks (±149.39 Ticks) 10646.81 Ticks (±140.58 Ticks) 1.02
Allocations - Average Allocation time (no fail) 10895.28 Ticks (±149.39 Ticks) 10646.81 Ticks (±140.58 Ticks) 1.02
Allocations - Average Deallocation time 4707.62 Ticks (±569.39 Ticks) 949.23 Ticks (±336.69 Ticks) 4.96
mutex_benchmark Build Time 94.36 s 93.64 s 1.01
mutex_benchmark File Size 0.96 MB 0.96 MB 1.00
Mutex Stress Test Average Time per Iteration - 1 Threads 13.72 ns (±0.87 ns) 13.36 ns (±0.66 ns) 1.03
Mutex Stress Test Average Time per Iteration - 2 Threads 16.68 ns (±0.88 ns) 21.94 ns (±17.08 ns) 0.76

This comment was automatically generated by workflow using github-action-benchmark.

@cagatay-y cagatay-y force-pushed the cloud-hypervisor branch 2 times, most recently from 1bcd51c to 2f3ab43 Compare January 28, 2026 17:15
@cagatay-y cagatay-y closed this Jan 28, 2026
@cagatay-y cagatay-y reopened this Jan 28, 2026
@cagatay-y cagatay-y force-pushed the cloud-hypervisor branch 5 times, most recently from 844587c to d8017b5 Compare January 29, 2026 16:12
@cagatay-y cagatay-y marked this pull request as ready for review January 29, 2026 16:13
@mkroening
Copy link
Member

mkroening commented Feb 4, 2026

pci_types is updated as of #2220.

@cagatay-y cagatay-y force-pushed the cloud-hypervisor branch 2 times, most recently from 938d062 to 1ba6965 Compare February 5, 2026 15:50
cloud-hypervisor only supports MSI-X interrupts for PCI devices, so
support for MSI-X is needed to support running on it. Additionally,
MSI-X can allow us to set separate interrupt handlers for the
configuration change interrupts and queue updates (even per-queue
handlers once we have multiple queues) in the future.

MSI-X is not working correctly or at all on all platforms, so it is used
as a fallback.
Copy link
Member

@mkroening mkroening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! :)

Could you also add how to run this with QEMU user networking and cloud hypervisor to the PR description?

Is this tested in CI already?

Comment on lines +540 to +546
#[repr(C)]
pub(crate) struct MsixEntry {
addr_low: u32,
addr_high: u32,
data: u32,
control: u32,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need volatile access here, we could also use volatile::VolatileFieldAccess to make accessing fields more ergonomic.

Comment on lines +540 to +546
#[repr(C)]
pub(crate) struct MsixEntry {
addr_low: u32,
addr_high: u32,
data: u32,
control: u32,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to upstream this to pci_types?

That crate currently does not use any volatile operations at all, though. Is volatile necessary here? Should pci_types use volatile for some current operations? Or does this live in the PCI config region and should be accessed via pci_types::ConfigRegionAccess instead?

}
debug!("{:?}", self.checksums);

// If self.irq is some, it was filled with a legacy interrupt number which we prefer over MSI-X.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a world where MSI-X works everywhere, would we still want to prefer legacy interrupts?


let irq = device.get_irq();
if irq.is_none() {
warn!("No interrupt lanes found for virtio-net.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be an error instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the section 7.5.1.1.13 of the PCIe specification version 6.0, "A [Interrupt Pin Register] value of 00h indicates that the Function uses no legacy interrupt Message(s)," so a device may be well-behaving and still not have an interrupt pin and line, in which case we may still be able to use MSI-X. There are reserved and unknown values and values that indicate no connection to the interrupt controller but for get_irq we do not differentiate between these and return None just like we do for the no legacy interrupt message(s) case.

Comment on lines +265 to +269
#[cfg(all(
feature = "virtio-net",
not(feature = "rtl8139"),
target_arch = "x86_64"
))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why gate this on not(feature = "rtl8139")?
And why gate this on feature = "virtio-net"? Should we not enable it for all virtio devices?

Co-authored-by: Martin Kröning <mkroening@posteo.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants