add support for --filter-rpath-sanity-libs to skip RPATH sanity check for designated libraries#4119
Conversation
…ch as libcuda.so.1
…ty check configurable via a command line argument. Also, use only one regular expression to immediately capture the name of the library that is missing. This simplifies the check to see if it is present in the list if ignoreable libraries.
… using --rpath-filter and 2) checks if the rpath sanity check passes if that same library isn't only filtered while setting RPATH (with --rpath-filter), but also ignored when checking in the RPATH sanity check (with -filter-rpath-sanity-libs)
|
For completeness: this is an example of what the drivers look like i.e. here, the |
…ed in a prefix that contains libtoy. Thus, make is filter the pattenr .*ltoy.* so that this always matches, regardless of ltoy.so's prefix
…d libtoy.so. There must be another reason why the test suite was failing to filter this from the linkeage before...
|
I'm puzzled by the CI failures of the unit tests... I added this to the # test sanity error when --rpath-filter is used to filter a required library
toy_ec_txt = read_file(os.path.join(test_ecs, 't', 'toy', 'toy-0.0.eb'))
toy_ec_txt += "\ndependencies = [('libtoy', '0.0', '', SYSTEM)]"
toy_ec_txt += "\nbuildopts = '-ltoy'"
toy_ec = os.path.join(self.test_prefix, 'toy.eb')
write_file(toy_ec, toy_ec_txt)
error_pattern = r"Sanity check failed\: Library libtoy\.so not found"
self.assertErrorRegex(EasyBuildError, error_pattern, self.test_toy_build, ec_file=toy_ec,
extra_args=['--rpath', '--experimental', '--rpath-filter=.*libtoy.*'],
raise_error=True, verbose=False)What this is meant to do is:
On my local machine, I run Yet, in the EasyBuild CI, this produces a failure. If I understand it correctly, this failure tells me that the last line in the above addition did not actually produce an error at all. That puzzles me: how can a |
…this change should only be temporary, to diagnose the issue
|
Hm, for some reason it doesn't seem to pick up on I guess, since the Yet, this looks as intended, which suggest it is picked up: |
|
Ok, this confuses me: It says EDIT: ah, so there is a |
… right after the compilation, and if that matches our expectations
… that could be a reason for this test failure
…r doesn't pick up on static versions
…s --as-needed by default by explicitely disabling this
…lied in any case...
… the possibilities is that it picks this up from LDFLAGS
|
Found the issue: Ubuntu passes on an Ubuntu system. Most systems will end with a Of course, I can pass the |
… This makes sure that regardless of whether --as-needed is passed or not, the library will always be linked by the linker. For now, we still print the full output of the build, so that we can check (once) of it all makes sense now. If it does, I'll do one more commit to limit the test to only the check that parses the error for a certain pattern
…ripped all debugging comments.
…e driver as default libs to be filtered from the RPATH sanity check
…r this index test
| error_pattern = r"Sanity check failed\: Library libtoy\.so not found" | ||
| self.assertErrorRegex(EasyBuildError, error_pattern, self.test_toy_build, ec_file=toy_ec, | ||
| extra_args=['--rpath', '--experimental', '--rpath-filter=.*libtoy.*'], | ||
| name='toy2', raise_error=True, verbose=False) |
There was a problem hiding this comment.
Also run again without --rpath-filter, for completeness sake?
It's generally a good idea to make sure that the test setup is done fully correctly...
| name='toy2', raise_error=True, verbose=False) | |
| args = ['--rpath', '--experimental', '--rpath-filter=.*libtoy.*'] | |
| self.assertErrorRegex(EasyBuildError, error_pattern, self.test_toy_build, ec_file=toy_ec, | |
| extra_args=args, name='toy2', raise_error=True, verbose=False) | |
| # works fine if --rpath-filter is not used (since then libtoy is RPATH'ed) | |
| self.test_toy_build(ec_file=toy_ec, name='toy2', extra_args=args, raise_error=True) |
You will need to clean up the install dir to make sure the next part of the test still works as expected.
There was a problem hiding this comment.
Why do we need to clean? A rebuild will clean the installdir anyway, doesn't it?
| # not rpath-ed. Then, we use --filter-rpath-sanity-libs to make sure the RPATH sanity checks ignores | ||
| # the fact that libtoy.so is not found. Thus, this build should complete succesfully | ||
| args = ['--rpath', '--experimental', '--rpath-filter=.*libtoy.*', '--filter-rpath-sanity-libs=libtoy.so'] | ||
| self.test_toy_build(ec_file=toy_ec, name='toy2', extra_args=args, raise_error=True) |
There was a problem hiding this comment.
Maybe here we should also do a check on the output of readelf -d on the binary, and check the RPATH section with a regex (to make sure it doesn't include libtoy)?
That mainly makes sense if my suggestion above to also run once without --rpath-filter is applied.
There was a problem hiding this comment.
Hm, good idea, but I need some help with this. How do I retrieve the installdir in a test? And what is the appropriate way to run readelf -d in a test, do we use the run_cmd function for that? (if so, I need to find where that is defined again... :))
…lter_rpath_sanity_libs, make default RPATH filter as list and have a comment explaining the default
308d40c to
d52be3b
Compare
…'ldd' and 'readelf -d' on toy-app binary
d52be3b to
fa7375a
Compare
fixes #4095
This PR will allow a (configurable) list of libraries to be ignored during the RPATH sanity check. This is e.g. needed when cross-compiling an executable that links against
libcuda.so.1. The way NVIDIA supports cross-compilation is by compiling against a stub library, provided by the CUDA toolkit. The stub library only provides the symbols, but not the imlementation of the actual function that should be used at runtime. For example,<prefix>/CUDA/11.7.0/lib/stubs/libcuda.so.1contains all the symbols of thelibcuda.so.1, but no implementations. Thus, it should never be used at runtime: thelibcuda.so.1that comes with the driver installation is supposed to be used at runtime.That's also why we don't
RPATHanything that's in astubsdirectory (see e.g. #2683 and this snippet of theeasybuild-frameworkcode).The issue is that the
sanity_check_rpath()step will still try to unsetLD_LIBRARY_PATHand see if all libraries are found correctly. On nodes that have the CUDA driver installed, that works fine, aslibcuda.so.1is picked up from a default location (/usr/lib64/libcuda.so.1). However, on CPU nodes, this fails. The fact thatlibcuda.so.1is not found on CPU nodes when cross compiling is entirely legitimate though.A point of discussion is what the default behaviour should be for EasyBuild, i.e. for which default list of libraries it will be 'accepted' if they are not found in the
sanity_check_rpath()step. In my opinion, it makes sense to includelibcuda.so,libcuda.so.1,libnvidia-ml.soandlibnvidia-ml.so.1here, as these are the libraries for which only stubs are provided in the CUDA toolkit and are thus supposed to be provided by the driver. For some other libraries (e.g.libcublas.so), there are stubs in thestubsdirectory, but also real ones in the<prefix>/CUDA/11.7.0/lib/directory. Thus, I assume one would want those RPATH-ed against the latter locations.