Conversation

scaronni (Member) commented Apr 10, 2025

Force the rebuild of modules when BUILD_DEPENDS_REBUILD is set and modules declared in BUILD_DEPENDS change.

When a module is updated, all modules that depend on it need to be automatically rebuilt in the correct order.

This is all gated through BUILD_DEPENDS_REBUILD, which should be set on a per-installation basis, for example:

$ cat /etc/dkms/<module1>.conf
BUILD_DEPENDS=<module2>
BUILD_DEPENDS_REBUILD=yes

Whenever <module1> is built and BUILD_DEPENDS is set, the version of each dependency listed in BUILD_DEPENDS is saved in:

/var/lib/dkms/<module>/<kernel_version>-<arch>/.dep_<dependency>
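
A minimal sketch of this recording step, assuming DKMS-style variables ($module, $kernelver, $arch) and a hypothetical installed_version helper that prints a module's current version (neither name is necessarily what the PR uses):

# Sketch only: record the version of each BUILD_DEPENDS entry after a
# successful build, so a later run can detect changes.
save_build_depends_versions() {
    local dep depdir="/var/lib/dkms/$module/$kernelver-$arch"
    [[ "$BUILD_DEPENDS_REBUILD" = yes && -n "$BUILD_DEPENDS" ]] || return 0
    for dep in $BUILD_DEPENDS; do
        installed_version "$dep" > "$depdir/.dep_$dep"
    done
}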

If this recorded version changes, a new function check_and_rebuild_dependent_modules(), called at the end of do_install(), does the following:

Makes a first pass to collect all modules that need to be rebuilt (sketched below) by:

  • Iterating through all modules
  • Checking if they have BUILD_DEPENDS specified
  • Using the previously stored dependency file to see if they need to be rebuilt
    • A missing file is treated the same as a different version (so a rebuild)
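
A minimal sketch of this first pass, assuming a hypothetical read_conf helper that sources the module's dkms.conf plus any /etc/dkms/<module>.conf fragment, and the installed_version helper from the sketch above:

# Sketch only: collect modules whose recorded dependency versions no
# longer match what is currently installed.
collect_modules_to_rebuild() {
    local mod dep depfile
    modules_to_rebuild=()
    for mod in /var/lib/dkms/*/; do
        mod=$(basename "$mod")
        read_conf "$mod"    # sets BUILD_DEPENDS / BUILD_DEPENDS_REBUILD
        [[ "$BUILD_DEPENDS_REBUILD" = yes && -n "$BUILD_DEPENDS" ]] || continue
        for dep in $BUILD_DEPENDS; do
            depfile="/var/lib/dkms/$mod/$kernelver-$arch/.dep_$dep"
            # a missing file counts the same as a changed version
            if [[ ! -r "$depfile" ]] || [[ "$(cat "$depfile")" != "$(installed_version "$dep")" ]]; then
                modules_to_rebuild+=("$mod")
                break
            fi
        done
    done
}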

Then it does a second pass to rebuild the modules in dependency order (also sketched below) by:

  • Maintaining a list of already rebuilt modules
  • For each module to rebuild, checking if all its dependencies have been rebuilt
  • Only rebuilding modules whose dependencies are satisfied
  • On failure:
    • Reporting any modules that couldn't be rebuilt due to circular dependencies
    • Continuing with the process even if some modules couldn't be rebuilt
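
A minimal sketch of this second pass, continuing the assumptions above and adding two hypothetical helpers: build_depends_of (prints a module's BUILD_DEPENDS) and rebuild_module (stands in for the actual unbuild/build/install steps):

# Sketch only: rebuild in dependency order; whatever cannot be ordered
# (e.g. a circular dependency) is reported and skipped.
rebuild_in_dependency_order() {
    local -a pending=("${modules_to_rebuild[@]}") remaining
    local progress=1 mod dep blocked
    while (( ${#pending[@]} && progress )); do
        progress=0
        remaining=()
        for mod in "${pending[@]}"; do
            blocked=0
            for dep in $(build_depends_of "$mod"); do
                # postpone this module if one of its dependencies is still queued
                [[ " ${pending[*]} " == *" $dep "* ]] && { blocked=1; break; }
            done
            if (( blocked )); then
                remaining+=("$mod")
            else
                echo "Rebuilding module $mod due to updated dependencies"
                rebuild_module "$mod"
                progress=1
            fi
        done
        pending=("${remaining[@]}")
    done
    if (( ${#pending[@]} )); then
        echo "Could not rebuild (possible circular dependency): ${pending[*]}" >&2
    fi
}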

If the user runs a build/install command with the --force parameter for a module where BUILD_DEPENDS and BUILD_DEPENDS_REBUILD are specified, the program will behave as before, bypassing the check and not saving the dependency version to the file.

Sample run:

$ dkms status
2nd/1.0.0, 6.13.9-200.fc41.x86_64, x86_64: installed
1st/1.0.0, 6.13.9-200.fc41.x86_64, x86_64: installed

$ cat /var/lib/dkms/2nd/kernel-6.13.9-200.fc41.x86_64-x86_64/.dep_1st
1.0.0

# dkms remove -m 1st/1.0.0
Module 1st/1.0.0 for kernel 6.13.9-200.fc41.x86_64 (x86_64):
Before uninstall, this module version was ACTIVE on this kernel.
Deleting /lib/modules/6.13.9-200.fc41.x86_64/extra/1st.ko.xz
Running depmod.... done.

Deleting module 1st/1.0.0 completely from the DKMS tree.

# dkms install -m 1st/2.0.0
Creating symlink /var/lib/dkms/1st/2.0.0/source -> /usr/src/1st-2.0.0

Sign command: /lib/modules/6.13.9-200.fc41.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module(s)... done.
Signing module /var/lib/dkms/1st/2.0.0/build/1st.ko
Cleaning build area... done.
Installing /lib/modules/6.13.9-200.fc41.x86_64/extra/1st.ko.xz
Running depmod.... done.

Rebuilding module 2nd/1.0.0 due to updated dependencies

Module 2nd/1.0.0 for kernel 6.13.9-200.fc41.x86_64 (x86_64):
Before uninstall, this module version was ACTIVE on this kernel.
Deleting /lib/modules/6.13.9-200.fc41.x86_64/extra/2nd.ko.xz
Running depmod.... done.
Building module(s).... done.
Signing module /var/lib/dkms/2nd/1.0.0/build/2nd.ko
Cleaning build area... done.
Installing /lib/modules/6.13.9-200.fc41.x86_64/extra/2nd.ko.xz
Running depmod.... done.

$ cat /var/lib/dkms/2nd/kernel-6.13.9-200.fc41.x86_64-x86_64/.dep_1st
2.0.0

scaronni (Member Author) commented Apr 10, 2025

@anbe42 before proceeding with implementing the tests, the man page changes, and some stress testing against my actual use case, I would like your opinion on it.

scaronni (Member Author) commented:

Simplified the versions in the example a bit.

anbe42 (Collaborator) commented Apr 11, 2025

I need a bit of time to think about that ... so here are just some things that came to mind while brainstorming about it.

First of all, that reminds me of #406, although I was looking at the orthogonal dimension: rebuild all modules (for a single kernel) if the kernel headers change (but $kernelver does not change).

The problem you want to solve effectively means we need to rebuild multiple modules for multiple kernels...

I have no idea what this means in the context of modules with multiple versions installed concurrently. That is a concept "we do not use" in Debian.

If something is going to introduce a circular dependency, can we detect and reject that at dkms add time?

scaronni (Member Author) commented:

I have no idea what this means in the context of modules with multiple versions installed concurrently. That is a concept "we do not use" in Debian.

I'm not sure I understand. I fail to find a use case for modules with multiple versions installed concurrently, and I have never seen it in the RHEL world.

The whole reasoning behind this is to be able to rebuild nvidia-peermem once the Mellanox components change. We're planning to integrate the Mellanox repositories and bits into the CUDA repository, and I need this functionality.

If something is going to introduce a circular dependency, can we detect and reject that at dkms add time?

That's actually a good idea.

Since I don't see any particular objections to the general idea behind this, I'll fix the current tests, add more tests, and then add the logic and tests for circular-dependency detection.
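
Something along these lines could work for the check at dkms add time (a sketch only; build_depends_of is an assumed helper that prints a module's BUILD_DEPENDS):

# Sketch only: walk the BUILD_DEPENDS chain of the module being added and
# refuse the add if we ever come back to the starting module.
has_dependency_cycle() {
    local start="$1" mod
    local -a queue=($(build_depends_of "$start")) seen=()
    while (( ${#queue[@]} )); do
        mod="${queue[0]}"
        queue=("${queue[@]:1}")
        [[ "$mod" == "$start" ]] && return 0            # cycle found
        [[ " ${seen[*]} " == *" $mod "* ]] && continue  # already visited
        seen+=("$mod")
        queue+=($(build_depends_of "$mod"))
    done
    return 1
}

# e.g. during "dkms add":
# has_dependency_cycle "$module" &&
#     { echo "Error: circular BUILD_DEPENDS involving $module" >&2; exit 1; }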

anbe42 (Collaborator) commented Apr 11, 2025

There is also #499 that still needs to be investigated ...

Let's for now assume we only have simple dependency graphs out in the wild (only short linear lists, no trees or DAGs), but still have the code ready for general graphs. For complicated graphs I'm afraid I'm easily able to come up with command orders where things break.

Right now, when upgrading a module A, we unbuild it (which may make a module B with BUILD_DEPENDS=A temporarily unbuildable), so if the package manager triggers a build of B, we may run into failures ...

When rebuilding modules, we should preserve the original state they were in: added/built/installed.

anbe42 (Collaborator) commented Apr 11, 2025

I'm not sure I understand. I fail to find a use case for modules with multiple versions installed concurrently, and I have never seen it in the RHEL world.

Good. Let's try not to break this functionality, without trying to understand the need for it ;-)

anbe42 (Collaborator) commented Apr 11, 2025

The whole reasoning behind this is to be able to rebuild nvidia-peermem once the Mellanox components change. We're planning to integrate the Mellanox repositories and bits into the CUDA repository, and I need this functionality.

That sounds interesting. How do you plan to model this dependency, since the driver (which contains -peermem) has no dependencies yet? Currently it builds a dummy -peermem module if the Mellanox bits are missing from the kernel (and in Debian we have a patch which at least says so when attempting to load the module). Right now this sounds more like you need an "Enhances" (a package relationship supported by the .deb format, roughly a reverse Recommends). Or an OPTIONAL_BUILD_DEPENDS, which does no harm when missing but can cause rebuilds when available ...

scaronni (Member Author) commented:

There is also #499 that still needs to be investigated ...

Never happened to me, so I can't comment.

Let's for now assume we only have simple dependency graphs out in the wild (only short linear lists, no trees or DAGs), but still have the code ready for general graphs. For complicated graphs I'm afraid I'm easily able to come up with command orders where things break.

Right now, when upgrading a module A, we unbuild it (which may make a module B with BUILD_DEPENDS=A temporarily unbuildable), so if the package manager triggers a build of B, we may run into failures ...

You mean when the package manager updates both packages? The RPM scriptlets and their deb equivalents (I don't know the correct name) get executed in sequence anyway, so in the worst case the module is built twice. Or am I missing something?

When rebuilding modules, we should preserve the original state they were in: added/built/installed.

Interesting. I never thought of the use case of having a module just "added" on the system. In my mind that's just an interim state, because you want to have the modules installed; otherwise you would not be fiddling with DKMS packages at all.

scaronni (Member Author) commented:

How do you plan to model this dependency, since the driver (which contains -peermem) has no dependencies yet? Currently it builds a dummy -peermem module if the Mellanox bits are missing from the kernel (and in Debian we have a patch which at least says so when attempting to load the module). Right now this sounds more like you need an "Enhances" (a package relationship supported by the .deb format, roughly a reverse Recommends). Or an OPTIONAL_BUILD_DEPENDS, which does no harm when missing but can cause rebuilds when available ...

I don't think we need all of this. What I was actually thinking of is way simpler: in that particular case (which seems to be only the DGX stations from NVIDIA so far), it's up to the admin (or the DGX ISO installer) to add the dependencies.

The point of the extra dkms.conf fragments is to customize the behavior for each site.

Example:

echo "BUILD_DEPENDS=<module>" > /etc/dkms/<module>.conf

Custom per-installation configuration is actually already happening for a few other modules that have different parameters for building depending on the context.
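
As a purely hypothetical illustration of such a fragment for the peermem case (the module names are placeholders for whatever the installation actually uses):

$ cat /etc/dkms/nvidia.conf
# site-specific fragment: rebuild the NVIDIA module when the Mellanox/OFED
# DKMS module on this system changes
BUILD_DEPENDS=mlnx-ofed-kernel
BUILD_DEPENDS_REBUILD=yes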

scaronni (Member Author) commented:

Let's for now assume we only have simple dependency graphs out in the wild (only short linear lists, no trees or DAGs), but still have the code ready for general graphs. For complicated graphs I'm afraid I'm easily able to come up with command orders where things break.

You have one example with a flat single BUILD_DEPENDS and I have another one now; I don't think we should extend all the logic for complicated graphs at this point. Let's defer that until we actually need it.

scaronni (Member Author) commented Apr 11, 2025

Another, completely different, approach would be an installation-specific script that is pulled in by DKMS in one of the various stages. This way nothing changes in DKMS, but you can still achieve the same result.

I'll prototype that as well internally and see which is the best/cleanest approach.

scaronni (Member Author) commented:

Another option would be to leave the logic in DKMS but gate it behind another toggle, e.g. BUILD_DEPENDS_REBUILD=true, which would go along with BUILD_DEPENDS=<module> in a site-specific configuration. Maybe that's the best approach.

scaronni changed the title from "Add logic for rebuilding modules when dependencies change" to "DRAFT: Add logic for rebuilding modules when dependencies change" on Apr 11, 2025
scaronni force-pushed the rebuild_deps branch 6 times, most recently from d61882c to b6efa51, on April 11, 2025 at 18:46
scaronni force-pushed the rebuild_deps branch 9 times, most recently from 5cf8b84 to 96612d9, on April 13, 2025 at 10:35
scaronni force-pushed the rebuild_deps branch 15 times, most recently from 4bbf196 to c69ec20, on April 13, 2025 at 19:50
scaronni (Member Author) commented Apr 13, 2025

Added tests to check:

  • Updates and rebuilds of dependent modules
  • Checking the dependency file after each state
  • Checking that with --force everything behaves as if BUILD_DEPENDS_REBUILD were not set (dependencies ignored, dependency version not saved)

Now it will get a few days of testing at NVIDIA with various use cases before I mark it ready for review.

scaronni (Member Author) commented:

Btw, I'm open to renaming BUILD_DEPENDS_REBUILD to something less of a mouthful; ideas welcome.

scaronni force-pushed the rebuild_deps branch 4 times, most recently from c6660b0 to 529a2a8, on April 17, 2025 at 08:22
scaronni (Member Author) commented:

Merging this. I added another safeguard for when --force is passed, and also added downgrade tests. Internal tests at NVIDIA on a DGX system with the Mellanox pile (https://linux.mellanox.com/public/repo/doca/DGX_latest_DOCA/ubuntu24.04/) are all good.

scaronni marked this pull request as ready for review on April 17, 2025 at 13:09
scaronni merged commit 94c4d27 into main on Apr 17, 2025
52 checks passed
scaronni deleted the rebuild_deps branch on April 17, 2025 at 13:10
scaronni changed the title from "DRAFT: Add logic for rebuilding modules when dependencies change" to "Add logic for rebuilding modules when dependencies change" on Apr 24, 2025