Switch back to static linking for coverage tests by krinkinmu · Pull Request #39030 · envoyproxy/envoy

krinkinmu · 2025-04-07T11:29:42Z

Commit Message:

This is a workaround for the LLVM issue that affects coverage (see llvm/llvm-project#32849).

TL;DR: basically dynamic linking + definitions of functions in the headers result in mismatching checksums for coverage counters which in turn results in incorrect coverage reports.

The effects of this issue are not very deterministic (e.g., when the issue happens is affected by way to many factors to reliably predict or avoid the issue), that's why we haven't noticed it before, but during clang-18 migration the issue caused some coverage failures.

Switching to static linking should workardound the issue completely. Before migration to clang-18 we couldn't do it, because coverage tests just didn't build because we hit some relocation overflows. However since version 17 LLVM lld changed the order of sections in the produced binaries, which made relocations overflows way less likely in the code model used by default (which I think is "small").

NOTE: we could aslo try and switch to large code model, but not only it will result in slower code, I also don't think it's as well tested and supported as small code model, so we probably should not.

I removed coverage exception introduced by #38898, because with this change it's not needed anymore.

Additional Description: this change was made possible by #37911
Risk Level: low
Testing: running CI coverage tests
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

repokitteh-read-only · 2025-04-07T11:29:47Z

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #39030 was opened by krinkinmu.

see: more, trace.

repokitteh-read-only · 2025-04-07T11:29:50Z

CC @envoyproxy/coverage-shephards: FYI only for changes made to (test/per_file_coverage.sh).
envoyproxy/coverage-shephards assignee is @RyanTheOptimist

🐱

Caused by: #39030 was opened by krinkinmu.

see: more, trace.

krinkinmu · 2025-04-07T12:04:59Z

/retest weird auth error, let's see if it's transient

krinkinmu · 2025-04-07T12:07:47Z

/re-test weird auth error, let's see if it's transient

phlax · 2025-04-07T12:26:03Z

its not - its an infra fail - fixing ...

phlax · 2025-04-07T12:44:32Z

should be fixed - probs easiest to repush to trigger

This is a workaround for the LLVM issue that affects coverage (see llvm/llvm-project#32849). TL;DR: basically dynamic linking + definitions of functions in the headers result in mismatching checksums for coverage counters which in turn results in incorrect coverage reports. The effects of this issue are not very deterministic (e.g., when the issue happens is affected by way to many factors to reliably predict or avoid the issue), that's why we haven't noticed it before, but during clang-18 migration the issue caused some coverage failures. Switching to static linking should workardound the issue completely. Before migration to clang-18 we couldn't do it, because coverage tests just didn't build because we hit some relocation overflows. However since version 17 LLVM lld changed the order of sections in the produced binaries, which made relocations overflows way less likely in the code model used by default (which I think is "small"). NOTE: we could aslo try and switch to large code model, but not only it will result in slower code, I also don't think it's as well tested and supported as small code model, so we probably should not. Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

krinkinmu · 2025-04-07T14:29:55Z

So I see two errors, one is related to the change I made in test/per_file_coverage.sh. I thought that the comment issue might be related to the llvm coverage bug, but it does not seem like this change actually helped. The other issue is in fuzz-coverage and seem to be some weird linking issue.

It says that the linker got killed by a signal, but does not give any information about what signal it was (is it OOM? did the worker restarted? something else?).

I'm going to rollback the changes I made in test/per_file_coverage.sh for now and re-try, just to confirm that indeed linking issue is caused by switching to the static linking. And if it's the case, I will go and spend some time figuring out what to do about it.

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

krinkinmu · 2025-04-08T14:20:40Z

I found a workaround for the missing coverage issue, but I still don't really understand what causes the issue in the first place. I started a thread on LLVM discussions, but given that I couldn't come up with a small reproducer I don't think that it will yield any results. I extracted that change into a separate PR: #39041.

As for the the fuzz coverage build failures, basically it looks like with static linking we end up with a much higher memory footprint for linking the test binary - 6g of memory just isn't enough. We can potentially shave some memory here and there, but ultimately linking memory consumption scales with the size of the inputs and for coverage tests the size of the inputs is very large with static linking.

@phlax is working on enabling flexible memory limits on CI (basically scaling up after OOMs), so once that lands I can re-try the PR (or will try to beat @phlax to the punch and change affected tests - don't have much context at this point to know what it would take).

phlax · 2025-04-08T14:39:46Z

it should grow higher than 6g if it fails - check the engflow clusters page

yanavlasov · 2025-04-16T01:07:05Z

During coverage builds linker used to fail hard with static linking because it was creating static allocation in the binary with coverage instrumentation and it exceeded 2Gb. I see coverage build now is succeeding so I guess linker have improved since then.

I'm unsure about the fuzz_coverage issue. //test/server/config_validation:config_fuzz_test is likely to link all extensions.

My take is that fuzz coverage is a very imperfect way to measure fuzz coverage. If this particular target is causing the build problem I would be ok to just disable it for fuzz_coverage build if it improves the regular coverage build.

krinkinmu · 2025-04-16T17:18:48Z

During coverage builds linker used to fail hard with static linking because it was creating static allocation in the binary with coverage instrumentation and it exceeded 2Gb. I see coverage build now is succeeding so I guess linker have improved since then.

I'm unsure about the fuzz_coverage issue. //test/server/config_validation:config_fuzz_test is likely to link all extensions.

My take is that fuzz coverage is a very imperfect way to measure fuzz coverage. If this particular target is causing the build problem I would be ok to just disable it for fuzz_coverage build if it improves the regular coverage build.

@yanavlasov and @phlax sorry for the luck of updates, I got distracted by some other work, but this is still on my radar. I would like to try and migrate the fuzz coverage to RBE if possible and see if it helps.

In the meantime, I'm not completely up-to-date with the motivation behind the fuzz coverage, @yanavlasov can you elaboreate a bit what are we trying to achieve with fuzz coverage in the first place? E.g., I do get the general idea of why fuzzing is useful, but I'm not entirely sure about fuzz coverage TBH. Are we just trying to see what bits are covered by fuzzing and what are not?

Side note, yes, starting with LLVM 17 lld changed the order of sections in the binary a bit and that's what allowed to reduce the likelyhood of the relocation overflows.

yanavlasov · 2025-04-21T22:34:42Z

The idea was have some way to track fuzz coverage and prevent from going lower than some threshold. However as designed it does not provide very good signal, since it measures just coverage from the seed corpus, not the actual coverage achieved by fuzzer. The actual coverage can be obtained from cluster fuzz, however we do not have a way to track the value and check if it dipped below some minimum.

krinkinmu · 2025-04-22T13:20:44Z

The idea was have some way to track fuzz coverage and prevent from going lower than some threshold. However as designed it does not provide very good signal, since it measures just coverage from the seed corpus, not the actual coverage achieved by fuzzer. The actual coverage can be obtained from cluster fuzz, however we do not have a way to track the value and check if it dipped below some minimum.

Aha... I checked the implementation now and it seems like the custom main function we have currently just feeds the corpus as the input to the fuzz test target without making any changes to it (https://github.com/envoyproxy/envoy/blob/main/test/fuzz/main.cc#L48-L50). If my understanding is correct, then I aggree that's not super useful for checking fuzzing coverage and maybe we are better off disabling fuzz coverage for now.

@phlax or @adisuissa are you ok if I sent a PR to disable fuzz coverage on CI until we change it to track the results actually achieved with fuzzing?

Do we have any open tickets for improving fuzzing coverage situation? I can't currently commit to it, but it would be good to have it somewhere on the radar for when our priorities/situation changes.

NOTE: @phlax refered me to @adisuissa for the fuzz coverage question above.

krinkinmu · 2025-04-28T08:51:27Z

Did search over open issues about fuzz_coverage improvement and it does not seem like we have an open issue for that, so I took a liberty to file one: #39248.

I will work on a PR to disable fuzz_coverage target on our CI for now, I will add @adisuissa in the review of the PR and we can discuss there if we want to pick another direction and try to move fuzz_coverage to EngFlow instead of current setup and maybe work around the issue this way.

@phlax

I'm trying to switch back to static linking for coverage tests in Envoy CI to work around a bug in Clang/LLVM (see llvm/llvm-project#32849). However the change that switches to static linking fails `fuzz_coverage` with what seems like some kind of resource constraint (e.g., linker gets killed, I suspect, as a result of OOM, since static linking requires more resources). @yanavalsov suggested that fuzz_coverage target might not be the best way to measure fuzzing coverage (see envoyproxy#39030 (comment)) and given that I think we should at least consider disabling it until we can make it better. That's what this PR does. NOTE: @phlax also mention that he did some work to migrate coverage to EngFlow builds in the past and there might be PR that we could finish that will migrate fuzz_coverage to EngFlow. There is a chance that it will address the issue - I'm mentioning it here to give a complete overview of the possible alternatives here. Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

adisuissa · 2025-04-28T12:39:12Z

To the best of my understanding the libFuzzer is linked and it should execute the fuzzers in fuzzing mode. Can you provide the reference that shows that only the test cases are executed?

I wonder if it is possible to change the fuzzing build arguments to eliminate the OOM. This will allow continuing the work on #39248.

krinkinmu · 2025-04-28T13:31:43Z

To the best of my understanding the libFuzzer is linked and it should execute the fuzzers in fuzzing mode. Can you provide the reference that shows that only the test cases are executed?

@adisuissa that's my understanding of what @yanavlasov said and how I read the code, but to be sure, I will double check under debugger which main is used under fuzz_coverage that CI is using - libFuzzer or https://github.com/envoyproxy/envoy/blob/main/test/fuzz/main.cc#L48-L50. Will take some time to do a build, I will update this thread once I'm done.

I will work on a PR to disable fuzz_coverage target on our CI for now, I will add @adisuissa in the review of the PR and we can discuss there if we want to pick another direction and try to move fuzz_coverage to EngFlow instead of current setup and maybe work around the issue this way.

I played with some linker flags (i.e., --gc-sections and related -fdata-sections and -ffunction-sections). However, emberassingly, I missed that:

those are already applied
while they may affect the size of the output binary, they don't actually reduce the memory footprint of the linker itself.

GNU ld does offer some command line flags that could be used to reduce its memory footprint (e.g., --hash-sizes), but we don't use that, and I don't think we should switch, since lld is generally better, not to mention that GNU toolchain is lagging behind in some aspects (i.e., fission).

My opinion that by just changing linker/compiler options we will not win much here. So if we want to preserve fuzz_coverage CI target, a more promising option is to just migrate fuzz_coverage to EngFlow which may allow us to just throw more memory at the problem and work around it this way.

adisuissa · 2025-04-28T13:43:04Z

I played with some linker flags (i.e., --gc-sections and related -fdata-sections and -ffunction-sections). However, emberassingly, I missed that:

those are already applied

while they may affect the size of the output binary, they don't actually reduce the memory footprint of the linker itself.

GNU ld does offer some command line flags that could be used to reduce its memory footprint (e.g., --hash-sizes), but we don't use that, and I don't think we should switch, since lld is generally better, not to mention that GNU toolchain is lagging behind in some aspects (i.e., fission).

My opinion that by just changing linker/compiler options we will not win much here. So if we want to preserve fuzz_coverage CI target, a more promising option is to just migrate fuzz_coverage to EngFlow which may allow us to just throw more memory at the problem and work around it this way.

TBH it's been a while since I've looked at this code, but willing to take a look if you want to supply some pointers.

We've previously seen cases where building against AFL caused the binary size to be too large, but linking against honggfuzz or libfuzzer worked.

In addition, I suggest looking at building options to prevent some of the instrumentation in the code
(see example).

krinkinmu · 2025-04-28T13:48:09Z

To the best of my understanding the libFuzzer is linked and it should execute the fuzzers in fuzzing mode. Can you provide the reference that shows that only the test cases are executed?

@adisuissa that's my understanding of what @yanavlasov said and how I read the code, but to be sure, I will double check under debugger which main is used under fuzz_coverage that CI is using - libFuzzer or https://github.com/envoyproxy/envoy/blob/main/test/fuzz/main.cc#L48-L50. Will take some time to do a build, I will update this thread once I'm done.

I didn't finish with the testing under debugger, but I think I now see how the switch to the libfuzzer main is actually done. It looks like this is enabled via the --define=FUZZING_ENGINE=libfuzzer, which is enabled by --config=plain-fuzzer which in turn is enabled by --config=fuzz-coverage. So it looks like I was wrong and https://github.com/envoyproxy/envoy/blob/main/test/fuzz/main.cc#L48-L50 is not actually being used.

krinkinmu · 2025-04-29T14:00:07Z

To the best of my understanding the libFuzzer is linked and it should execute the fuzzers in fuzzing mode. Can you provide the reference that shows that only the test cases are executed?

@adisuissa that's my understanding of what @yanavlasov said and how I read the code, but to be sure, I will double check under debugger which main is used under fuzz_coverage that CI is using - libFuzzer or https://github.com/envoyproxy/envoy/blob/main/test/fuzz/main.cc#L48-L50. Will take some time to do a build, I will update this thread once I'm done.

I didn't finish with the testing under debugger, but I think I now see how the switch to the libfuzzer main is actually done. It looks like this is enabled via the --define=FUZZING_ENGINE=libfuzzer, which is enabled by --config=plain-fuzzer which in turn is enabled by --config=fuzz-coverage. So it looks like I was wrong and https://github.com/envoyproxy/envoy/blob/main/test/fuzz/main.cc#L48-L50 is not actually being used.

Took me a while, but yes, I can see that main from libfuzz is being used in my tests, so I think we do actually collect coverage with the fuzzing applied and my understanding wasn't correct.

On the topic of reducing footprint, in my local rather rough tests, it seems like linker consumes about 15g of RAM. That's not a small amount, but it's not super outrageous amount either. Nevertheless, let's say we want to reduce that footprint, where would we start, I think the right place to start is to understand how much we want to cut in the first place?

If I just randomly cut a few things here and there to make this PR pass the tests, there is no guarantees that in a week after a bunch of changes we will not hit the same limit again.

So I asked @phlax to help figure out how large are the machines used to build (I don't really have access, I think), but we couldn't find it, so I'm not really sure what resource constraint I should target.

I'm sure we can figure out what our resource constraint is if we spend more time, but @phlax also mentioned that the general direction is to move to EngFlow backends anyways, so I will try to move fuzz coverage to EngFlow and see if that is going to help.

FWIW, we can consider reducing linker footprint after moving to EngFlow regardless, for example, it seems like we could disable wasm in our fuzz tests (quick grep suggested that there aren't any wasm fuzzing tests and I don't think that config validation test needs the whole v8 runtime to check the proto).

adisuissa · 2025-04-29T14:26:36Z

FWIW, we can consider reducing linker footprint after moving to EngFlow regardless, for example, it seems like we could disable wasm in our fuzz tests (quick grep suggested that there aren't any wasm fuzzing tests and I don't think that config validation test needs the whole v8 runtime to check the proto).

That's a great, low hanging fruit, find.
IMHO WASM linking should be disabled, at least from the fuzz-coverage targets, if it helps decreasing the amount of resources used.

krinkinmu · 2025-04-29T18:03:48Z

I think I got it working in staging repo now. Will do one more small test and will send out PRs to:

Switch coverage runs to EngFlow
Disable WASM for fuzz tests

And once those land I will update this PR and will send it for review.

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer (e.g., around 30m) with this change. This PR switches to static linking for just test coverage and does not do the same for fuzz coverage. That's because Envoy CI is hitting some resource constraints when building fuzz targets with coverage instrumentation. We will fix that by striping the fuzz targets of some unncessary dependencies and switching to EngFlow backend for coverage builds and tests, but that will require addressing a couple of hard to understand issues, so, for now, I'm just switching the coverage tests to static linking and will follow up with the fuzz tests later once we addressed all the blockers and switched to EngFlow backend for coverage. Additional Description: envoyproxy#39030 provides some context for EngFlow migration and envoyproxy#39248 is a tracking bug. Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

Commit Message: Simple grep over the codebase suggests that we don't have any WASM specific fuzz tests defined. And existing fuzz tests don't need a full WASM runtime. On top of that in general we don't really want to fuzz test our dependencies (e.g., we would like the dependencies to have their own infrastructure and be scrupulous when new dependencies are added). Disabling WASM reduces the build time and resources required for fuzz-coverage. One particular reason to try and optimize fuzz-coverage is that I want to move it to static linking to work around a bug in Clang/LLVM (see llvm/llvm-project#32849) and static linking produces much large binaries and requires a larger linker footprint, which currently hits the limits of the RBE backend used. Additional Description: Some relevant discussions can be found in envoyproxy#39030 which prompted me to work on this in the first place. And I will use envoyproxy#39248 as a tracking bug for the coverage changes. Risk Level: low Testing: running fuzz-coverage on local machine with the changes included, I also confirmed that disabling wasm + moving fuzz-coverage to EngFlow + removing explicit RBE pool attributes from fuzz targets make it possible to successfully statically link fuzz tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a --------- Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer with this change. We already did this change for the regular test coverage, but we delayed the switch for the fuzz coverage because fuzz coverage hit RBE backend resource constraints and was OOMing during linking. Since then we did a few things to mitigate the issue with resources: 1. We switched coverage runs from Google RBE backend to EngFlow which allows for somewhat larger workers 2. We started building fuzz targets without WASM - there are not fuzz targets that actually need a full WASM runtime and elimnating it speeds things up and reduces the footprint noticably. With the above two changes I think we are ready to completely switch to static linking for all coverage tests. It also has a benefit of better cache re-use since we will be linking both coverage and fuzz coverage targets statically. Additional Description: envoyproxy#39030 provides some context for EngFlow migration and envoyproxy#39248 is a tracking bug. Risk Level: low Testing: regular CI tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer (e.g., around 30m) with this change. This PR switches to static linking for just test coverage and does not do the same for fuzz coverage. That's because Envoy CI is hitting some resource constraints when building fuzz targets with coverage instrumentation. We will fix that by striping the fuzz targets of some unncessary dependencies and switching to EngFlow backend for coverage builds and tests, but that will require addressing a couple of hard to understand issues, so, for now, I'm just switching the coverage tests to static linking and will follow up with the fuzz tests later once we addressed all the blockers and switched to EngFlow backend for coverage. Additional Description: envoyproxy#39030 provides some context for EngFlow migration and envoyproxy#39248 is a tracking bug. Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Commit Message: Simple grep over the codebase suggests that we don't have any WASM specific fuzz tests defined. And existing fuzz tests don't need a full WASM runtime. On top of that in general we don't really want to fuzz test our dependencies (e.g., we would like the dependencies to have their own infrastructure and be scrupulous when new dependencies are added). Disabling WASM reduces the build time and resources required for fuzz-coverage. One particular reason to try and optimize fuzz-coverage is that I want to move it to static linking to work around a bug in Clang/LLVM (see llvm/llvm-project#32849) and static linking produces much large binaries and requires a larger linker footprint, which currently hits the limits of the RBE backend used. Additional Description: Some relevant discussions can be found in envoyproxy#39030 which prompted me to work on this in the first place. And I will use envoyproxy#39248 as a tracking bug for the coverage changes. Risk Level: low Testing: running fuzz-coverage on local machine with the changes included, I also confirmed that disabling wasm + moving fuzz-coverage to EngFlow + removing explicit RBE pool attributes from fuzz targets make it possible to successfully statically link fuzz tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a --------- Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer with this change. We already did this change for the regular test coverage, but we delayed the switch for the fuzz coverage because fuzz coverage hit RBE backend resource constraints and was OOMing during linking. Since then we did a few things to mitigate the issue with resources: 1. We switched coverage runs from Google RBE backend to EngFlow which allows for somewhat larger workers 2. We started building fuzz targets without WASM - there are not fuzz targets that actually need a full WASM runtime and elimnating it speeds things up and reduces the footprint noticably. With the above two changes I think we are ready to completely switch to static linking for all coverage tests. It also has a benefit of better cache re-use since we will be linking both coverage and fuzz coverage targets statically. Additional Description: envoyproxy#39030 provides some context for EngFlow migration and envoyproxy#39248 is a tracking bug. Risk Level: low Testing: regular CI tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Commit Message: Simple grep over the codebase suggests that we don't have any WASM specific fuzz tests defined. And existing fuzz tests don't need a full WASM runtime. On top of that in general we don't really want to fuzz test our dependencies (e.g., we would like the dependencies to have their own infrastructure and be scrupulous when new dependencies are added). Disabling WASM reduces the build time and resources required for fuzz-coverage. One particular reason to try and optimize fuzz-coverage is that I want to move it to static linking to work around a bug in Clang/LLVM (see llvm/llvm-project#32849) and static linking produces much large binaries and requires a larger linker footprint, which currently hits the limits of the RBE backend used. Additional Description: Some relevant discussions can be found in envoyproxy#39030 which prompted me to work on this in the first place. And I will use envoyproxy#39248 as a tracking bug for the coverage changes. Risk Level: low Testing: running fuzz-coverage on local machine with the changes included, I also confirmed that disabling wasm + moving fuzz-coverage to EngFlow + removing explicit RBE pool attributes from fuzz targets make it possible to successfully statically link fuzz tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a --------- Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer with this change. We already did this change for the regular test coverage, but we delayed the switch for the fuzz coverage because fuzz coverage hit RBE backend resource constraints and was OOMing during linking. Since then we did a few things to mitigate the issue with resources: 1. We switched coverage runs from Google RBE backend to EngFlow which allows for somewhat larger workers 2. We started building fuzz targets without WASM - there are not fuzz targets that actually need a full WASM runtime and elimnating it speeds things up and reduces the footprint noticably. With the above two changes I think we are ready to completely switch to static linking for all coverage tests. It also has a benefit of better cache re-use since we will be linking both coverage and fuzz coverage targets statically. Additional Description: envoyproxy#39030 provides some context for EngFlow migration and envoyproxy#39248 is a tracking bug. Risk Level: low Testing: regular CI tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer (e.g., around 30m) with this change. This PR switches to static linking for just test coverage and does not do the same for fuzz coverage. That's because Envoy CI is hitting some resource constraints when building fuzz targets with coverage instrumentation. We will fix that by striping the fuzz targets of some unncessary dependencies and switching to EngFlow backend for coverage builds and tests, but that will require addressing a couple of hard to understand issues, so, for now, I'm just switching the coverage tests to static linking and will follow up with the fuzz tests later once we addressed all the blockers and switched to EngFlow backend for coverage. Additional Description: #39030 provides some context for EngFlow migration and #39248 is a tracking bug. Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Commit Message: Simple grep over the codebase suggests that we don't have any WASM specific fuzz tests defined. And existing fuzz tests don't need a full WASM runtime. On top of that in general we don't really want to fuzz test our dependencies (e.g., we would like the dependencies to have their own infrastructure and be scrupulous when new dependencies are added). Disabling WASM reduces the build time and resources required for fuzz-coverage. One particular reason to try and optimize fuzz-coverage is that I want to move it to static linking to work around a bug in Clang/LLVM (see llvm/llvm-project#32849) and static linking produces much large binaries and requires a larger linker footprint, which currently hits the limits of the RBE backend used. Additional Description: Some relevant discussions can be found in #39030 which prompted me to work on this in the first place. And I will use #39248 as a tracking bug for the coverage changes. Risk Level: low Testing: running fuzz-coverage on local machine with the changes included, I also confirmed that disabling wasm + moving fuzz-coverage to EngFlow + removing explicit RBE pool attributes from fuzz targets make it possible to successfully statically link fuzz tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a --------- Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>

Static linking helps to work around the issue with LLVM/Clang source-based coverage (see llvm/llvm-project#32849). On the flip side, the coverage build and test run will take a bit longer with this change. We already did this change for the regular test coverage, but we delayed the switch for the fuzz coverage because fuzz coverage hit RBE backend resource constraints and was OOMing during linking. Since then we did a few things to mitigate the issue with resources: 1. We switched coverage runs from Google RBE backend to EngFlow which allows for somewhat larger workers 2. We started building fuzz targets without WASM - there are not fuzz targets that actually need a full WASM runtime and elimnating it speeds things up and reduces the footprint noticably. With the above two changes I think we are ready to completely switch to static linking for all coverage tests. It also has a benefit of better cache re-use since we will be linking both coverage and fuzz coverage targets statically. Additional Description: #39030 provides some context for EngFlow migration and #39248 is a tracking bug. Risk Level: low Testing: regular CI tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com> Signed-off-by: Ryan Northey <ryan@synca.io>