Improve performance of Cartesian indexing #101

@maleadt

Metal GPUs suffer from the way we encode Cartesian indices, presumably because of the integer division that happens when mapping a linear index to a Cartesian one, but there may be other causes. In #100 and JuliaGPU/GPUArrays.jl#454, we worked around some of the more egregious performance issues by putting the indices in the type domain so that they are known to LLVM, allowing the back-end compiler to optimize the code (again, presumably by replacing the division by a constant integer with a bunch of bit operations).
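To make the division concrete, here is a minimal sketch of the linear-to-Cartesian mapping, once with the dimensions as runtime values and once with them in the type domain. The names `linear_to_cartesian`, `_split`, and `StaticDims` are hypothetical and only illustrate the idea; they are not the code used in #100 or in GPUArrays.jl.

```julia
# Peel off one dimension at a time. Each step costs one integer division.
_split(rest, ::Tuple{}) = ()
function _split(rest, dims::Tuple)
    q, r = divrem(rest, first(dims))
    return (r + 1, _split(q, Base.tail(dims))...)
end

# Runtime dimensions: the divisors are only known when the kernel runs,
# so the compiler has to emit real division instructions.
linear_to_cartesian(i::Integer, dims::Tuple) = _split(i - 1, dims)

# Hypothetical marker type carrying the dimensions as a type parameter.
struct StaticDims{D} end

# Type-domain dimensions: D is a compile-time constant inside this method,
# so the compiler is free to strength-reduce every division.
linear_to_cartesian(i::Integer, ::StaticDims{D}) where {D} = _split(i - 1, D)
```

Both methods return the same indices, for example `linear_to_cartesian(5, (3, 4))` and `linear_to_cartesian(5, StaticDims{(3, 4)}())`. The difference is that every distinct `D` is a distinct type, and therefore a distinct compiled method.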

This isn't ideal because it results in significantly more kernels being compiled, since every distinct set of indices now yields a separate specialization. Ideally we would figure out a better way to encode Cartesian indices, although it's obviously hard to avoid the integer division altogether.

Alternatively, we might want to improve https://github.com/maleadt/StaticCartesian.jl, or something similar, so that we can perform this optimization ourselves instead of relying on the Metal back-end compiler. Depending on the back-end for this might be fragile, as observed in JuliaGPU/GPUArrays.jl#454, where we needed additional bounds information for the optimization to trigger.
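As a rough illustration of what performing the optimization ourselves could look like, the sketch below precomputes a "magic" multiplier for a fixed divisor and replaces the division with a multiply and a shift. This is the textbook technique, written under the assumption that both the index and the divisor are non-negative and below 2^31. `MagicDivisor` and `fastdivrem` are hypothetical names, not the StaticCartesian.jl API, and nothing here is confirmed to match what the Metal back-end compiler does.

```julia
# Hypothetical helper: precomputed constants for dividing by a fixed d.
struct MagicDivisor
    d::UInt64     # the divisor
    m::UInt64     # ceil(2^shift / d)
    shift::Int    # 31 + ceil(log2(d))
end

function MagicDivisor(d::Integer)
    1 <= d < Int64(2)^31 || throw(ArgumentError("divisor must be in 1:2^31-1"))
    l = 64 - leading_zeros(UInt64(d - 1))   # ceil(log2(d)), and 0 for d == 1
    shift = 31 + l
    m = cld(UInt64(1) << shift, UInt64(d))
    return MagicDivisor(UInt64(d), m, shift)
end

# Quotient and remainder without a division instruction.
# Assumes 0 <= n < 2^31, so that n * m fits in 64 bits.
function fastdivrem(n::Integer, md::MagicDivisor)
    x = UInt64(n)
    q = (x * md.m) >> md.shift
    r = x - q * md.d
    return Int(q), Int(r)
end
```

The constants would be computed once on the host per array size and passed to the kernel as plain values, for example `fastdivrem(i - 1, MagicDivisor(size(A, 1)))`, so the kernel itself would not need to be recompiled for each size.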

Labels: kernels (things about kernels and how they are compiled), performance (gotta go fast)