
[FEA] Investigate NvRTC / Jitify for compiling / instantiating expensive kernels at runtime. #4

@cjnolet


Two of the largest complaints about RAFT's vector search layer are that libraft 1) is huge, and 2) takes forever to compile. There's been a lot of great progress in reducing the compile time and binary size, but some of that has come simply from reducing the number of supported types.

We should investigate JIT compiling some of these things. Especially for index builds, which usually take some time anyways, the user could be willing to suffer a one-time penalty to compile the necessary kernels so that we can shrink the compile time and footprint of the binary as much as possible.

The other benefit is more subtle but extremely powerful: the user would only need to JIT compile for their local architecture, while cuVS would still be portable across compute architectures. This means that for these code paths we can avoid pre-compiling for 6+ architectures, so the binary size and compile times shrink by roughly that factor.
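To make the per-architecture argument concrete, here is a minimal sketch, not a proposed implementation. `nvrtc_arch_option` and `precompiled_images` are hypothetical helper names. The only NVRTC detail assumed is the `--gpu-architecture=compute_XX` compile option; in real code the major/minor values would come from `cudaGetDeviceProperties`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build the NVRTC architecture option for the local GPU.
// `major` and `minor` stand in for cudaDeviceProp::major / ::minor so that
// this sketch builds without the CUDA toolkit.
std::string nvrtc_arch_option(int major, int minor) {
  assert(major > 0 && minor >= 0);
  return "--gpu-architecture=compute_" + std::to_string(major) +
         std::to_string(minor);
}

// Hypothetical estimate of how many kernel images a package carries when
// every instantiation is pre-compiled for every supported architecture.
// With JIT, only the instantiations a user actually runs are built, and
// only for the one architecture they run on.
std::size_t precompiled_images(std::size_t n_archs,
                               std::size_t n_instantiations) {
  return n_archs * n_instantiations;
}
```

For example, 6 architectures times 10 instantiations gives 60 pre-compiled images, whereas a user who JIT compiles a single instantiation on an sm_80 device builds one image with `--gpu-architecture=compute_80`.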

I propose we explore JIT compiling entire code paths from end-to-end (like CAGRA index build vs search for a specific set of types). Of course, it's important that these instantiations be available for the tests, but as most users are not downloading the test packages, it gives us some leeway to transfer compile times and binary footprint to the tests while still keeping the main conda package very lightweight.
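One way to picture "JIT an entire code path for a specific set of types" is a cache keyed by code path, types, and architecture, so the one-time compile penalty is paid only once per combination. The sketch below is an assumption-laden illustration: `KernelKey`, `JitCache`, and the compile callback are hypothetical names, and the callback stands in for whatever NVRTC or Jitify call would do the real work.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical key identifying one end-to-end code path instantiation.
struct KernelKey {
  std::string code_path;   // e.g. "cagra_search"
  std::string data_type;   // e.g. "float"
  std::string index_type;  // e.g. "uint32_t"
  std::string arch;        // e.g. "compute_80"

  bool operator<(const KernelKey& o) const {
    return std::tie(code_path, data_type, index_type, arch) <
           std::tie(o.code_path, o.data_type, o.index_type, o.arch);
  }
};

// Stand-in for a compiled artifact; a real cache would hold PTX, a CUBIN,
// or a loaded module handle instead of a string.
using CompiledKernel = std::string;

class JitCache {
 public:
  using Compiler = std::function<CompiledKernel(const KernelKey&)>;

  explicit JitCache(Compiler compile) : compile_(std::move(compile)) {
    assert(static_cast<bool>(compile_));
  }

  // Compile on first request for a key, then reuse the cached result.
  const CompiledKernel& get(const KernelKey& key) {
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      ++compile_count_;
      it = cache_.emplace(key, compile_(key)).first;
    }
    return it->second;
  }

  int compile_count() const { return compile_count_; }

 private:
  Compiler compile_;
  std::map<KernelKey, CompiledKernel> cache_;
  int compile_count_ = 0;
};
```

Requesting the same key twice triggers a single compile; requesting a different data type or architecture triggers another. An in-memory map is used here only for brevity; persisting compiled kernels on disk across processes would be a separate design decision.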

Another option we can explore is providing 2 conda packages:

  1. One that is very lightweight and uses primarily JIT
  2. Another that has a lot of the expensive bits pre-compiled but leaves some of the custom type specializations to JIT

This would further allow users to determine which is more important to them.
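For the second package, the dispatch could look roughly like the sketch below: use a pre-compiled instantiation when the package ships one, and fall back to JIT otherwise. `KernelSource`, `select_source`, and the instantiation strings are hypothetical and only illustrate the lookup order.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class KernelSource { Precompiled, Jit };

// Hypothetical dispatch: `precompiled` lists the instantiations that were
// built into the package; anything else is left to JIT at runtime.
KernelSource select_source(const std::set<std::string>& precompiled,
                           const std::string& instantiation) {
  assert(!instantiation.empty());
  return precompiled.count(instantiation) ? KernelSource::Precompiled
                                          : KernelSource::Jit;
}
```

Under this scheme the lightweight package is simply the case where the pre-compiled set is empty, so every request resolves to `KernelSource::Jit`.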

A few folks have brought this up in the past and I wanted to capture this in an issue to start collecting thoughts from the team and the community.

GraphBLAS has used this pattern successfully and has gone on to fully JIT all of its kernels.

Tagging some of the folks who I've discussed this with in the past:

@jeaton32 @tfeher @divyegala @benfred @lowener @achirkin @robertmaynard @jrhemstad @leofang @vyasr @bdice @dantegd @trxcllnt
