-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
So, a small story:
- Two years ago, I volunteered to implement a
fast_atanfunction, much like we hadfast_sinandfast_cos. - I opened a PR, good feedback came, and I sank a lot of time into it to validate performance on all backends (GPU, and CPU), and measure bit-exact precision.
- This revealed we had issues with strict_float that basically made some of this impossible to do correctly.
- Andrew helps out, and overhauls some of the strict_float handling and intrinsics of Halide, such that I can continue (thanks a lot!).
- Working on this, I found a bug in the CodeGen_Vulkan_Dev regarding vector shuffles.
- Before the PR can be merged, I thought, let's fix the Vulkan shuffle bug: that's an easy 4 line fix.
- Feedback was that we need a test for this, so I add a test.
- The test reveals more issues: the Simplifier rewrites shuffles of shuffles sometimes, which blows up the maximal legal vector lane count on some GPU compute backends (Metal has limit 4, OpenCL limit 8, etc).
- Discussing the best solution to this, I volunteered to make a vector legalization pass.
- I sank a lot of time into it.
- Good feedback comes and requests more changes.
- Here, Andrew steps in and helps out and writes a fuzz tester to verify my work, which is a great idea, and I appreciated the help a lot!
- My work was mostly correct, and the fuzz tester showed a few small problems, which Andrew fixed (thanks again!).
- I merge in the fuzz tester and fixes into the PR, and let the buildbots have a go at it.
- The fuzz tester reveals failures related to at least three unique bugs in the ARM backend of Halide.
Hal fixing a light bulb (from Malcolm in the Middle S03E06 - Health Scare)
I am now like 17 levels deep of fixing this just one more other thing before I can continue with the thing above it. I just wanted to contribute a fast_atan. The fast_atan works already for more than a year, but it is blocked by several levels of issues waiting to be resolved, each one revealing more issues.
The vector legalization pass works, but it is blocked now by random ARM codegen issues. The point is I'm tired of it. On top of that, the legalization pass I wrote only targets GPU backends, because LLVM does vector legalization for us on x86 and ARM.
So I'm writing this down, because I want to open a discussion: is there anything we can do to avoid these seemingly endless rabbit holes? In my opinion, the desired outcome is that we should avoid blocking simple bug fixes or new correctly functioning code because a new accompanying test reveals unrelated bugs. Perhaps there are some CI tricks we can do, to avoid having to remove the test (because the test is good and reveals a bug), but still be able to move forward with good and finished work.
