Improve performance of Cartesian indexing #101

Metal GPUs suffer from the way we encode Cartesian indices. This is presumably because of the integer division needed to map a linear index to a Cartesian one, although there may be other causes. In https://github.com/JuliaGPU/Metal.jl/pull/100 and https://github.com/JuliaGPU/GPUArrays.jl/pull/454, we worked around some of the more egregious performance issues by putting the indices in the type domain so that they are known to LLVM, allowing the back-end compiler to optimize the code (again, presumably by replacing the division by a constant integer with a cheaper sequence of multiplications, shifts and other bit operations).
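To make the cost concrete, here is a minimal Python sketch of mapping a linear index to a Cartesian one. It is not the GPUArrays.jl code, and `linear_to_cartesian` is a hypothetical helper; it only mirrors the column-major, 1-based convention of Julia's `CartesianIndices`. For an N-dimensional array it performs N - 1 integer divisions per element, and when `dims` is only known at run time those cannot be turned into constant divisions by the compiler.

```python
def linear_to_cartesian(i, dims):
    """Map a 1-based linear index `i` into an array of shape `dims` to a
    1-based Cartesian index in column-major order (first dimension varies
    fastest), following the convention of Julia's CartesianIndices."""
    i -= 1                        # work with 0-based offsets
    coords = []
    for d in dims[:-1]:
        i, r = divmod(i, d)       # one integer division per dimension
        coords.append(r + 1)
    coords.append(i + 1)          # the last dimension needs no division
    return tuple(coords)


print(linear_to_cartesian(5, (3, 4)))   # (2, 2)
```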

This isn't ideal because it results in significantly more kernels being compiled, since every distinct set of indices now requires its own specialized kernel. Ideally we would find a better way to encode Cartesian indices, although avoiding the integer division altogether is obviously hard.

Alternatively, we might want to improve https://github.com/maleadt/StaticCartesian.jl, or something similar, so that we can perform this optimization ourselves instead of relying on the Metal back-end compiler. Relying on the back-end compiler for this might be fragile, as observed in https://github.com/JuliaGPU/GPUArrays.jl/pull/454, where we needed additional bounds information for the optimization to trigger.
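One possible way to perform the optimization ourselves, sketched here under assumptions rather than as a description of StaticCartesian.jl, is to precompute on the host, once per array shape, the multiplier and shift that a compiler would emit for division by a constant, and pass them to the kernel as ordinary runtime arguments. The kernel then does a multiply and a shift instead of a division, without a separate compilation per shape. The sketch uses the classic round-up (Granlund-Montgomery) method for unsigned 32-bit indices; `magic_divisor`, `fast_div` and `linear_to_cartesian_fast` are hypothetical names.

```python
def magic_divisor(d, bits=32):
    """Precompute (m, s) such that n // d == (n * m) >> s for every
    unsigned n < 2**bits (round-up method, Granlund-Montgomery)."""
    if not 1 <= d < (1 << bits):
        raise ValueError("divisor out of range")
    l = (d - 1).bit_length()            # ceil(log2(d))
    s = bits + l
    m = ((1 << s) + d - 1) // d         # ceil(2**s / d); may exceed `bits` bits
    return m, s


def fast_div(n, magic):
    """Divide n by the divisor encoded in `magic` using multiply + shift."""
    m, s = magic
    return (n * m) >> s


def linear_to_cartesian_fast(i, dims, magics):
    """Column-major, 1-based linear-to-Cartesian mapping that uses
    precomputed magic numbers instead of integer division.
    Assumes i - 1 < 2**32 so that fast_div stays exact."""
    i -= 1
    coords = []
    for d, magic in zip(dims[:-1], magics):
        q = fast_div(i, magic)
        coords.append(i - q * d + 1)    # remainder without a division
        i = q
    coords.append(i + 1)
    return tuple(coords)


dims = (3, 4, 5)
magics = [magic_divisor(d) for d in dims[:-1]]     # once per shape, on the host
print(linear_to_cartesian_fast(17, dims, magics))  # (2, 2, 2)
```

The multiplier can need more than 32 bits for 32-bit indices, so a GPU version would need a wider multiply-high than Python's arbitrary-precision integers make apparent here.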
