Introduce UDF Architecture by divyegala · Pull Request #1804 · rapidsai/cuvs

divyegala · 2026-02-15T01:21:58Z

This PR introduces an architecture for user-defined functions (UDFs) in cuVS, implemented with JIT LTO. The initial example passes a metric UDF to the IVF Flat search kernels.

In tests comparing the native L2 metric with an equivalent L2 metric UDF, the UDF matches native performance.

dantegd · 2026-04-07T14:53:36Z

+// ============================================================================
+
+// Custom L2 (squared Euclidean) metric - should match built-in L2
+CUVS_METRIC(custom_l2, { acc += squared_diff(x, y); })


After looking at this again, I hadn't noticed that the test only covers float and int8_t with a custom L2 metric. We could consider adding:

1. A genuinely novel metric (e.g., Chebyshev/L-infinity) that differs from any built-in, to prove the UDF produces correct results

2. An error case test (bad CUDA code should give a clear error, not a crash)

3. uint8_t and half types

4. Testing that the cache actually works (second search with same UDF should be faster)
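One way to make the cache check deterministic is to count compilations instead of timing the second search, since wall-clock comparisons can be noisy in CI. The sketch below is only an illustration of that idea: `UdfCache` and `get_or_compile` are hypothetical stand-ins and are not part of cuVS.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a JIT cache keyed by UDF source text.
// It is NOT the cuVS cache; it only illustrates the counting approach.
class UdfCache {
 public:
  // Returns a handle for `source`, "compiling" only on the first request.
  int get_or_compile(const std::string& source)
  {
    auto it = cache_.find(source);
    if (it != cache_.end()) { return it->second; }  // cache hit, no compile
    ++compile_count_;                               // cache miss, compile once
    int handle = static_cast<int>(cache_.size());
    cache_.emplace(source, handle);
    return handle;
  }

  // Number of times a compilation was actually triggered.
  std::size_t compile_count() const { return compile_count_; }

 private:
  std::unordered_map<std::string, int> cache_;
  std::size_t compile_count_ = 0;
};
```

A test would request the same UDF twice and assert that `compile_count()` is still 1 afterwards.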

I do not know how to write case 1, as the build cannot use UDFs.

And for case 2, how do you verify that a clear error was received?

For case 1 that's true, but I was thinking about search, not build. A test could look like:

Build the index normally with L2 (no UDF needed at build time)

Search with a Chebyshev UDF (e.g., acc = max(acc, abs(x - y)); with acc initialized to 0)

To verify correctness, do a brute-force search on CPU using the same Chebyshev metric, and compare results

The tricky part is that the index clusters were built with L2, so recall won't be optimal for Chebyshev. But you can work around this by using n_probes = n_lists (search all clusters), which guarantees that you scan the entire dataset, making the search equivalent to brute force. At that point, the returned distances should match a CPU brute-force Chebyshev computation within tolerance.

The value of this test is that a custom L2 UDF can accidentally "pass" even if the UDF plumbing is partially broken (since the native L2 path might mask bugs). A genuinely different metric is the only way to prove the UDF code is actually being executed and producing correct results.
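The CPU reference for that comparison could be sketched as follows. This is a minimal illustration, not the test that was added in the PR: the function names, the row-major layout, and the relative tolerance are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Chebyshev (L-infinity) distance: the largest absolute coordinate difference.
// Mirrors the UDF body `acc = max(acc, abs(x - y))` with acc starting at 0.
inline float chebyshev(const float* a, const float* b, std::size_t dim)
{
  float acc = 0.0f;
  for (std::size_t i = 0; i < dim; ++i) {
    acc = std::max(acc, std::fabs(a[i] - b[i]));
  }
  return acc;
}

// Exhaustive CPU k-NN for one query over a row-major dataset (n_rows x dim).
// Returns (distance, row index) pairs sorted by ascending distance.
inline std::vector<std::pair<float, std::size_t>> brute_force_knn(
  const std::vector<float>& dataset,
  std::size_t n_rows,
  std::size_t dim,
  const float* query,
  std::size_t k)
{
  std::vector<std::pair<float, std::size_t>> all;
  all.reserve(n_rows);
  for (std::size_t r = 0; r < n_rows; ++r) {
    all.emplace_back(chebyshev(dataset.data() + r * dim, query, dim), r);
  }
  k = std::min(k, n_rows);
  std::partial_sort(all.begin(), all.begin() + k, all.end());
  all.resize(k);
  return all;
}

// Relative tolerance check for comparing GPU and CPU distances.
inline bool distances_match(float expected, float actual, float tol = 1e-4f)
{
  return std::fabs(expected - actual) <= tol * std::max(1.0f, std::fabs(expected));
}
```

With n_probes = n_lists, each distance returned by the UDF search would be compared against the corresponding entry from `brute_force_knn` using `distances_match`. Comparing distances rather than indices avoids spurious failures when two rows tie.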

For case 2, couldn't we do something like:

// Bad CUDA code, should produce a clear NVRTC compilation error, not a segfault
std::string bad_udf = "this is not valid CUDA;";
search_params.metric_udf = bad_udf;
EXPECT_THROW(cuvs::neighbors::ivf_flat::search(...), raft::exception);

You could also optionally catch the exception and verify the message contains something like "nvrtc compile error" to confirm the error is descriptive.
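The message check could be factored into a small helper like the one below. This is a sketch under assumptions: it presumes the thrown type derives from std::exception, and `fake_search_with_udf` is a hypothetical stand-in for the real search call, with an invented error string.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Runs `fn` and reports whether it threw an exception derived from
// std::exception whose what() text contains `needle`.
inline bool throws_with_message(const std::function<void()>& fn, const std::string& needle)
{
  try {
    fn();
  } catch (const std::exception& e) {
    return std::string(e.what()).find(needle) != std::string::npos;
  }
  return false;  // nothing was thrown
}

// Hypothetical stand-in for a search call that JIT-compiles a UDF string.
// The real error text produced by cuVS may differ.
inline void fake_search_with_udf(const std::string& udf_source)
{
  if (udf_source.find("acc") == std::string::npos) {
    throw std::runtime_error("nvrtc compile error: invalid UDF source");
  }
}
```

In a GoogleTest case this would read `EXPECT_TRUE(throws_with_message([&] { /* search call */ }, "nvrtc compile error"));`, which fails both when nothing is thrown and when the message is not descriptive.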

Done! All great ideas, thanks a lot.


KyleFromNVIDIA

Approved with a small comment

KyleFromNVIDIA · 2026-04-15T19:33:14Z

+  } else if constexpr (std::is_same_v<U, int64_t>) {
+    return "int64_t";
+  } else {
+    static_assert(type_name_always_false_v<U>, "Unsupported type to create UDF");


Ugh, I wish C++ would allow you to static_assert(false) and only have it trigger when that constexpr branch is actually hit.

Honestly, I was surprised too; I went with static_assert(false) to start with.
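For readers unfamiliar with the workaround in the diff, the dependent-false idiom looks roughly like this. The sketch is standalone C++17: `always_false_v` and this `type_name` are illustrative names, not the PR's actual definitions.

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Always false, but dependent on T, so the static_assert below is only
// evaluated when the enclosing branch is actually instantiated.
template <typename T>
inline constexpr bool always_false_v = false;

// Maps a supported type to its spelling; any other type fails to compile
// with the message given in the static_assert.
template <typename U>
constexpr std::string_view type_name()
{
  if constexpr (std::is_same_v<U, float>) {
    return "float";
  } else if constexpr (std::is_same_v<U, std::int8_t>) {
    return "int8_t";
  } else if constexpr (std::is_same_v<U, std::int64_t>) {
    return "int64_t";
  } else {
    static_assert(always_false_v<U>, "Unsupported type to create UDF");
  }
}
```

Calling `type_name<float>()` compiles and returns "float", while `type_name<double>()` stops compilation with the "Unsupported type" message.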

KyleFromNVIDIA · 2026-04-15T19:36:00Z

+inline std::string instantiate_udf(char const* data_type, char const* acc_type, int veclen)
+{
+  std::ostringstream oss;
+  oss << "\nnamespace cuvs { namespace neighbors { namespace ivf_flat { namespace detail {\n"


Aren't we using C++20? Couldn't this be flattened into namespace cuvs::neighbors::ivf_flat::detail?

Oh yes, and I think C++17 already had the nicer namespace convention.
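A flattened version of the generated preamble might look like the sketch below. `wrap_in_detail_namespace` is a hypothetical helper, not the PR's `instantiate_udf`, and it assumes the JIT compiler is run in C++17 mode or later, where nested namespace definitions are available.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: wraps generated source in a single C++17 nested
// namespace definition instead of four separate namespace blocks.
inline std::string wrap_in_detail_namespace(const std::string& body)
{
  std::ostringstream oss;
  oss << "\nnamespace cuvs::neighbors::ivf_flat::detail {\n"
      << body << "\n"
      << "}  // namespace cuvs::neighbors::ivf_flat::detail\n";
  return oss.str();
}
```

Besides being shorter, the single closing brace removes the need to keep four closing braces in sync with the opening line.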

divyegala · 2026-04-16T00:14:05Z

/merge

divyegala added 30 commits October 2, 2025 18:33

jit lto interleaved scan

a024f61

fix dependencies.yaml

45da4aa

generate files at build time, use tags to avoid compilation of types

a7c8621

passing tests

eb2d74b

update gitignore

d2318e8

separate out distance function from main kernel

5e6afcd

fix deps

6eee4da

add filters as jit device functions, rework caching logic

1de8f28

lto post lambda, cleanup files, generate cmake in build dir

84c6020

don't read hardcoded kernels, use generator properly

22680c8

random cmake changes carried over from 25.10

37f1163

cmake format

0ae5383

remove dep on kernel list

fe56aec

attempt to solve overlinking problem

40c8fd6

reorder if-else in compiler check

e87a8c7

Merge branch 'branch-25.12' into jit-lto-ivf-flat-interleaved

179d733

use cudart apis

32a67bd

merge

c27612e

attempt to link cudart

a4b48b1

revert cudart link, try all arch build of jit lto fatbin sources

d5d692e

cmake format

1c6dd94

missing shared mem setting

30f5ab6

separate cuda 12 and 13 compilation

9674969

merge upstream

24fc47d

remove bench

db9a487

c include directory

aa9294f

style check

2eb77fe

merge upstream

6c685fa

guard cuda calls and use shared_ptr

3e35b99

add AlgorithmPlanner to main target

d0ff62c

Merge branch 'main' into ivf-flat-search-udf

6d83226

dantegd requested changes Apr 7, 2026


divyegala added 6 commits April 8, 2026 17:13

Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf

569706b

address review feedback part 1

ffadc27

address reviews for tests, try to add fp16 embedding with AI

2994c62

no embedding headers

9b24f05

no embedding headers

dea79bd

update ivf pq

2756334

divyegala requested a review from a team as a code owner April 13, 2026 22:26

divyegala added 2 commits April 13, 2026 22:49

comprehensive type checks for wheel builds

852684b

exclude nvrtc

d1c5d29

divyegala requested a review from a team as a code owner April 13, 2026 23:28

divyegala added 10 commits April 14, 2026 04:44

clean up recipe

33db1d8

Merge branch 'main' into ivf-flat-search-udf

27abacc

ignore run export

56de24a

Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf

b6c1d85

Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into …

564f91f

…ivf-flat-search-udf

attempt to fix dp4a link issues; other reviews

6749933

add chebyshev

906bd98

add expect throw test

4a76e2f

try different nvrtc version get

f63d40a

brackets

f0cc886

dantegd approved these changes Apr 15, 2026


divyegala added 2 commits April 15, 2026 19:20

merge upstream

9a346c6

comments

b52ce36

KyleFromNVIDIA approved these changes Apr 15, 2026


review comments

3fde027

rapids-bot bot merged commit a626f60 into rapidsai:main Apr 16, 2026
226 of 230 checks passed

github-project-automation bot moved this from In Progress to Done in Unstructured Data Processing Apr 16, 2026

9 participants

divyegala commented Feb 15, 2026 •

edited

Loading

divyegala Apr 9, 2026 •

edited

Loading