@sunfishcode suggests an interesting and seemingly useful memory access optimization for loops in #442. The idea, as I understand it, is to exploit a known loop variable stride, plus a known protected guard zone beyond the end of the linear memory, to eliminate bounds checks for a pointer that approaches the end of memory by the known stride, provided the stride is less than the size of the guard zone. I also recall @lukewagner mentioning adding a loop operator that might declare the loop variables and stride.
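To make the idea concrete, here is a minimal simulation of it as I understand it. Everything in it is an assumption for illustration: `LinearMemory`, `Trap`, `load_checked`, `load_guarded` and `sum_strided` are hypothetical names, and `load_guarded` only models what an unchecked hardware access would do when a protected guard zone follows the linear memory. The first access is bounds checked; later ones skip the check whenever the stride is smaller than the guard zone, since a pointer that was in bounds on the previous iteration cannot step past the guard zone.

```python
class Trap(Exception):
    """Models a wasm out-of-bounds trap (explicit check or guard zone fault)."""


class LinearMemory:
    """Hypothetical model: `size` bytes of memory followed by a guard zone."""

    def __init__(self, size, guard_size):
        self.size = size
        self.guard_size = guard_size
        self.data = bytearray(size)

    def load_checked(self, addr, width):
        # Explicit bounds check, as a consumer would emit by default.
        if addr < 0 or addr + width > self.size:
            raise Trap("bounds check failed")
        return int.from_bytes(self.data[addr:addr + width], "little")

    def load_guarded(self, addr, width):
        # Models an unchecked access. Starting beyond the guard zone would be
        # an undetected escape, so the model treats it as a hard error.
        if addr < 0 or addr >= self.size + self.guard_size:
            raise AssertionError("unchecked access escaped the guard zone")
        if addr + width > self.size:
            raise Trap("guard zone fault")
        return int.from_bytes(self.data[addr:addr + width], "little")


def sum_strided(mem, start, stride, count, width=4):
    # The check can be elided only for a positive stride smaller than the guard zone.
    elide = 0 < stride < mem.guard_size
    total = 0
    addr = start
    for i in range(count):
        if i == 0 or not elide:
            total += mem.load_checked(addr, width)
        else:
            total += mem.load_guarded(addr, width)
        addr += stride
    return total
```

With a 64-byte memory and a 16-byte guard zone, a stride of 8 runs without per-iteration checks and still traps on the first access past the end, while a stride of 32 falls back to checked accesses.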
While an optimizing compiler should be able to use loop analysis to compute the stride, I note that to date Odin only handles a stride of one, and does v8 optimize this at all? It seems in the spirit of wasm to offload some of the loop analysis to the producer, and adding an operation that captures a loop whose variables have a constant stride might be a useful pattern. This could give the producer more confidence that code emitted in this form would be well optimized by a good range of wasm runtimes, and the burden on the consumer might not be high.
This might also anticipate the case of a loop variable added to a base to form the access pointer. It is also not uncommon to have multiple bases, for example when copying between two arrays. For this bounds check optimization it might be necessary to be sure that these bases are constant within the loop, so it might be useful to allow them to be explicitly named in the loop operator and to have validation check that they are not written to within the loop. The rule would then be that an access pointer derived from an expression using just these known loop variables and bases could have its bounds check optimized away, a simple enough rule for the producer and consumer to have confidence in. The runtime compiler would then have a range of options for code optimization, such as adding the index to each base on each iteration, or hoisting the pointers and incrementing them all on each iteration.
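The two lowering options mentioned above can be sketched as follows for a copy between two arrays. This is only an illustration under my own assumptions: `copy_indexed` and `copy_hoisted` are hypothetical names, memory is a plain `bytearray`, and one byte is copied per iteration at a constant stride. Both bases are loop invariant, which is what makes the two forms interchangeable.

```python
def copy_indexed(mem, src_base, dst_base, count, stride):
    # Option 1: keep one loop index and add it to each base on every iteration.
    index = 0
    for _ in range(count):
        mem[dst_base + index] = mem[src_base + index]
        index += stride


def copy_hoisted(mem, src_base, dst_base, count, stride):
    # Option 2: hoist the pointers out of the loop and advance each of them
    # by the stride on every iteration.
    src = src_base
    dst = dst_base
    for _ in range(count):
        mem[dst] = mem[src]
        src += stride
        dst += stride
```

Because neither base is written inside the loop, a consumer is free to pick whichever form suits the target, and the producer does not need to care which one is chosen.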
For added utility, allow a base alignment to be specified so that the access alignment can also be known (using the stride). The low bits of the base would be masked before the loop to ensure this alignment.
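A rough sketch of the alignment reasoning, with hypothetical helper names `align_base` and `known_access_alignment`, and assuming a power-of-two alignment and a positive stride: masking the low bits of the base gives an aligned starting pointer, and every later pointer `base + i * stride` is then aligned to the smaller of the base alignment and the largest power of two dividing the stride.

```python
def align_base(base, alignment):
    # Mask off the low bits of the base before the loop.
    # Assumes `alignment` is a power of two.
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return base & ~(alignment - 1)


def known_access_alignment(base_alignment, stride):
    # Assumes a positive stride. The lowest set bit of the stride is the
    # largest power of two that divides it.
    assert stride > 0
    stride_alignment = stride & -stride
    return min(base_alignment, stride_alignment)
```

For example, a base masked to 16 bytes with a stride of 8 gives accesses known to be 8-byte aligned, while a stride of 12 only guarantees 4-byte alignment.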