flowey: split openvmm-deps into separate archives and use Linux 6.18 test kernel #3530

Open
moor-coding wants to merge 4 commits into microsoft:main from moor-coding:update-test-kernel-6.18

Conversation

@moor-coding
Contributor

@moor-coding moor-coding commented May 20, 2026

Summary

Update the flowey openvmm-deps resolver to handle the new split artifact structure from openvmm-deps 0.3.0-33, and switch the test kernel from 6.1 to 6.18.

Changes

  • resolve_openvmm_deps.rs: Add SourceArchive enum to route OpenvmmDepFile variants to the correct source archive. Generalize download/extract from hardcoded single-archive to multi-archive pattern.
  • resolve_openvmm_test_linux_kernel.rs: Add Linux6_18 variant and set as default kernel version.
  • cfg_versions.rs: Bump OPENVMM_DEPS from 0.1.0-20260427.3 to 0.3.0-33. Remove unused OPENVMM_TEST_LINUX_KERNEL constant (kernel version is now managed in resolve_openvmm_test_linux_kernel.rs).
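The routing described for resolve_openvmm_deps.rs could look roughly like the sketch below. It is only an illustration: the variant names on both enums and the `archives_needed` helper are hypothetical, chosen to mirror the file kinds named in this PR (dbgrd, shell, sysroot, petritools, test initrd, test kernel), and are not copied from the actual source.

```rust
/// Hypothetical stand-in for the files a consumer can request.
/// The real `OpenvmmDepFile` variants may be named differently.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum OpenvmmDepFile {
    Dbgrd,
    Shell,
    Sysroot,
    Petritools,
    TestInitrd,
    TestKernel,
}

/// Which release archive a requested file is extracted from
/// (variant names are assumptions for this sketch).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum SourceArchive {
    Sdk,
    TestInitrd,
    TestKernel,
}

impl OpenvmmDepFile {
    /// Route each file kind to the archive that contains it.
    pub fn source_archive(self) -> SourceArchive {
        match self {
            OpenvmmDepFile::Dbgrd
            | OpenvmmDepFile::Shell
            | OpenvmmDepFile::Sysroot
            | OpenvmmDepFile::Petritools => SourceArchive::Sdk,
            OpenvmmDepFile::TestInitrd => SourceArchive::TestInitrd,
            OpenvmmDepFile::TestKernel => SourceArchive::TestKernel,
        }
    }
}

/// Collect the distinct archives needed for a set of requests,
/// preserving first-seen order, so each archive is fetched once.
pub fn archives_needed(files: &[OpenvmmDepFile]) -> Vec<SourceArchive> {
    let mut out = Vec::new();
    for file in files {
        let archive = file.source_archive();
        if !out.contains(&archive) {
            out.push(archive);
        }
    }
    out
}
```

Keeping the mapping in one exhaustive `match` means that adding a new file kind fails to compile until it is assigned to an archive.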

Context

  • openvmm-deps PR #62 split the kernel and initrd into separate release artifacts
  • openvmm-deps PR #61 added the 6.18 kernel alongside 6.1
  • Release 0.3.0-33 contains all three archive types for both architectures

Known Issue

The openvmm_linux_x64_pcie_devices test fails on all runners with a NULL pointer dereference in the MANA/GDMA driver (mana_gd_probe) in the 6.18 kernel. The 6.18 MANA driver sends an SMC request (0x0) that the OpenVMM GDMA emulator does not handle. This is a pre-existing emulator gap exposed by the newer kernel, not a regression in this PR.

@moor-coding moor-coding requested a review from a team as a code owner May 20, 2026 15:48
Copilot AI review requested due to automatic review settings May 20, 2026 15:48
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates Flowey’s openvmm-deps resolver to support the new split release artifacts introduced in openvmm-deps 0.3.0-33, and switches the test Linux kernel selection to 6.18.

Changes:

  • Route each OpenvmmDepFile to the correct source archive (SDK tools vs test initrd vs test kernel).
  • Add configurable test_kernel_version and bump pinned OPENVMM_DEPS to 0.3.0-33 (defaulting test kernel to 6.18).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs | Adds archive selection logic and downloads/extracts multiple per-arch artifacts instead of a single bundle. |
| flowey/flowey_lib_hvlite/src/_jobs/cfg_versions.rs | Updates pinned openvmm-deps version and introduces a constant for selecting the 6.18 test kernel. |

Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs
…kernel

Update resolve_openvmm_deps to download separate archives per dep type:
- openvmm-deps.{arch}.{ver}.tar.gz (SDK: dbgrd, shell, sysroot, petritools)
- openvmm-test-initrd.{arch}.{ver}.tar.gz (shared test initrd)
- openvmm-test-linux-{kernel_ver}.{arch}.{ver}.tar.gz (test kernel)

This matches the new openvmm-deps release structure (0.3.0-29+).
Archive format changed from .tar.bz2 to .tar.gz.

Add test_kernel_version field to Config and OPENVMM_TEST_LINUX_KERNEL
constant ("6.18") to cfg_versions.rs.

The Request::Get API is unchanged — consumers are unaffected.

Version set to TODO-PLACEHOLDER-PENDING-RELEASE until the openvmm-deps
release with 6.18 kernel artifacts is cut.
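
The three archive name patterns listed in this commit message can be expressed as a small helper. This is a minimal sketch, not the resolver's real code: the `Archive` enum and `archive_filename` function are hypothetical names, and the arch string used in the example afterwards ("x86_64") is an assumption about how architectures are spelled in release artifact names.

```rust
/// Hypothetical description of the three archive kinds in a release.
pub enum Archive<'a> {
    /// SDK tools: dbgrd, shell, sysroot, petritools.
    Sdk,
    /// Shared test initrd.
    TestInitrd,
    /// Test kernel for a given kernel version, e.g. "6.18".
    TestKernel { kernel_ver: &'a str },
}

/// Build the archive file name from the patterns in the commit message:
/// - openvmm-deps.{arch}.{ver}.tar.gz
/// - openvmm-test-initrd.{arch}.{ver}.tar.gz
/// - openvmm-test-linux-{kernel_ver}.{arch}.{ver}.tar.gz
pub fn archive_filename(archive: &Archive<'_>, arch: &str, ver: &str) -> String {
    match archive {
        Archive::Sdk => format!("openvmm-deps.{}.{}.tar.gz", arch, ver),
        Archive::TestInitrd => format!("openvmm-test-initrd.{}.{}.tar.gz", arch, ver),
        Archive::TestKernel { kernel_ver } => {
            format!("openvmm-test-linux-{}.{}.{}.tar.gz", kernel_ver, arch, ver)
        }
    }
}
```

Under those assumptions, `archive_filename(&Archive::TestKernel { kernel_ver: "6.18" }, "x86_64", "0.3.0-33")` yields `openvmm-test-linux-6.18.x86_64.0.3.0-33.tar.gz`.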
@moor-coding moor-coding force-pushed the update-test-kernel-6.18 branch from b6e164b to ef10d04 Compare May 20, 2026 16:23
@moor-coding moor-coding requested a review from a team as a code owner May 20, 2026 16:23
- Add Linux6_18 variant to LinuxTestKernelVersion enum
- Update DEFAULT_LINUX_TEST_KERNEL_VERSION to Linux6_18
- Remove unused OPENVMM_TEST_LINUX_KERNEL constant from cfg_versions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 16:26
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs
Comment thread flowey/flowey_lib_hvlite/src/_jobs/cfg_versions.rs
Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs
/// follow-ups, both upstream and as new variants of this enum.
#[derive(Serialize, Deserialize, Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum LinuxTestKernelVersion {
Linux6_1,
Contributor

I guess this is more a maintainer question than a question for this PR, but do we still want to use 6.1 for anything or can we just delete it?

Contributor Author

It looks like one test does not work with the 6.18 kernel. I'm looking at how much work it would take to keep that one test on the prior kernel version.

Root cause: The Linux 6.18 MANA/GDMA driver crashes with a NULL pointer dereference in mana_gd_probe() when probing the second GDMA device. The 6.18 driver sends an SMC request (0x0) that the OpenVMM GDMA emulator doesn't support, and the driver doesn't handle the failure gracefully: it dereferences a NULL pointer at offset 0x28.

Contributor

I personally would be ok with deleting 6.1 entirely and marking that one test as unstable to get this in, but let's check with some mana folks first to see if they think it would be an easy fix. Either way I think 6.1 should go away.

Comment thread flowey/flowey_lib_hvlite/src/resolve_openvmm_deps.rs Outdated
@moor-coding
Contributor Author

CI Failure Analysis: openvmm_linux_x64_pcie_devices

Summary

All CI jobs pass except one test: multiarch::pcie::openvmm_linux_x64_pcie_devices, which fails identically across all 4 runners (x64-linux-amd-kvm, x64-linux-intel-mshv, x64-windows-amd, x64-windows-intel). This is a guest kernel bug in the MANA/GDMA driver in Linux 6.18, not an infrastructure or configuration issue.

Root Cause

The Linux 6.18 kernel crashes with a NULL pointer dereference in mana_gd_probe() during PCI device enumeration:

BUG: kernel NULL pointer dereference, address: 0000000000000028
RIP: 0010:mana_gd_probe+0x173/0x280
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.31 #1

Call trace:

mana_gd_probe+0x173/0x280
 → local_pci_probe
 → pci_device_probe
 → really_probe
 → mana_driver_init (during kernel boot)

The crash occurs when the MANA driver probes the second GDMA device (0000:07:00.0, attached to a PCIe switch downstream port). The first GDMA device (0000:02:00.0, attached directly to a root port) probes successfully, but the driver hits a NULL pointer at offset 0x28 when initializing the second instance.

Preceding Errors

Before the crash, the emulated GDMA device logs: gdma: smc error error=unsupported request 0x0, indicating the 6.18 MANA driver sends an SMC request that the OpenVMM GDMA emulator does not support. The NVMe driver also reports Failed to configure AEN (cfg 100) on both controllers, though this is non-fatal.

Why This Only Affects 6.18

The 6.18 kernel includes an updated MANA/GDMA driver that exercises code paths not present in 6.1. The mana_gd_probe function likely changed to make new assumptions about device state (possibly related to the SMC response) that the OpenVMM GDMA emulator does not satisfy, resulting in a NULL pointer dereference.
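
The failure mode described here (an unsupported request followed by an unchecked use of the result) can be illustrated with a toy model. This is a sketch only, written under loud assumptions: it is not the Linux MANA driver (which is C) and not the OpenVMM GDMA emulator, and every type, function, and request code other than the 0x0 seen in the log is hypothetical.

```rust
/// Hypothetical error returned by a toy emulator for unknown requests,
/// loosely modeled on the logged "unsupported request 0x0".
#[derive(Debug, PartialEq, Eq)]
pub enum SmcError {
    Unsupported(u32),
}

/// Toy emulator: accepts only the request codes it was told about.
pub fn handle_smc(request: u32, supported: &[u32]) -> Result<u32, SmcError> {
    if supported.contains(&request) {
        Ok(request)
    } else {
        Err(SmcError::Unsupported(request))
    }
}

/// Hypothetical per-device state that a probe would build on success.
#[derive(Debug, PartialEq, Eq)]
pub struct DeviceState {
    pub negotiated: u32,
}

/// A probe that handles failure gracefully: the error is propagated with `?`
/// instead of continuing with state that was never initialized.
pub fn probe(request: u32, supported: &[u32]) -> Result<DeviceState, SmcError> {
    let negotiated = handle_smc(request, supported)?;
    Ok(DeviceState { negotiated })
}
```

In this model, `probe(0x0, &[0x1, 0x2])` returns `Err(SmcError::Unsupported(0))` and the caller can fail that one device cleanly. The crash in the report corresponds to the opposite behavior: carrying on after the error and reading through a pointer that was never set.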

Impact

  • 56/57 Linux VMM tests pass on all runners
  • All builds, clippy, fmt, doc checks pass
  • Only this single test is affected
  • The test configures 2 NVMe + 2 MANA/GDMA devices across root ports and switch ports

Recommended Path Forward

This is a compatibility issue between the Linux 6.18 MANA driver and the OpenVMM GDMA device emulator. Options:

  1. Fix the GDMA emulator to handle the new SMC request from the 6.18 driver (proper fix, separate PR)
  2. Skip this test with a tracking issue until the emulator is updated
  3. Merge as-is if the team considers this a known limitation to address separately

@smalis-msft
Contributor

Interesting. We should confirm this with some MANA folks.

moor-coding and others added 2 commits May 20, 2026 20:00
The rewrite incorrectly used extract_tar_bz2_if_new which unnecessarily
installs the bzip2 package. Switch to extract_tar_gz_if_new consistent
with the sibling test kernel and initrd resolver modules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Inline the archive filename logic directly since there is only one
archive type (openvmm-deps). The kernel and initrd archives are handled
by their own dedicated resolver modules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 20:05
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines 4 to +9
//! Download various pre-built `openvmm-deps` dependencies, or use a local path if specified.
//!
//! The openvmm-deps release publishes separate archives:
//! - `openvmm-deps.{arch}.{ver}.tar.gz` — SDK tools (dbgrd, shell, sysroot, petritools)
//! - `openvmm-test-initrd.{arch}.{ver}.tar.gz` — shared test initrd
//! - `openvmm-test-linux-{kernel_ver}.{arch}.{ver}.tar.gz` — test kernel
/// which kernel they're using should pass this.
 pub const DEFAULT_LINUX_TEST_KERNEL_VERSION: LinuxTestKernelVersion =
-    LinuxTestKernelVersion::Linux6_1;
+    LinuxTestKernelVersion::Linux6_18;
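
For readers without the full file, the end state of this change can be sketched as follows. The enum name, its two variants, and the constant name come from this PR; the `Serialize`/`Deserialize` derives are omitted to keep the sketch dependency-free, and the `as_str` helper is hypothetical and may not exist in the real module.

```rust
/// Test kernel versions available from the openvmm-deps release.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum LinuxTestKernelVersion {
    Linux6_1,
    Linux6_18,
}

/// Default used by consumers that do not care which kernel they get.
pub const DEFAULT_LINUX_TEST_KERNEL_VERSION: LinuxTestKernelVersion =
    LinuxTestKernelVersion::Linux6_18;

impl LinuxTestKernelVersion {
    /// Hypothetical helper: the version string as it would appear in an
    /// archive name such as `openvmm-test-linux-{kernel_ver}...`.
    pub fn as_str(self) -> &'static str {
        match self {
            LinuxTestKernelVersion::Linux6_1 => "6.1",
            LinuxTestKernelVersion::Linux6_18 => "6.18",
        }
    }
}
```

Because the variants are declared in ascending order, the derived `Ord` makes `Linux6_1 < Linux6_18` hold, which would keep version comparisons simple if later variants are appended in the same order.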

4 participants