Introduce UDF Architecture #1804

Merged
rapids-bot[bot] merged 185 commits into rapidsai:main from divyegala:ivf-flat-search-udf
Apr 16, 2026

Conversation

Member

@divyegala divyegala commented Feb 15, 2026

This PR introduces an architecture for supporting user-defined functions (UDFs) in cuVS, implemented with JIT LTO. The initial example passes a metric UDF to the IVF Flat search kernels.

When the native L2 metric and an equivalent UDF L2 metric are benchmarked side by side, the UDF path matches native performance.

Comment thread cpp/src/neighbors/ivf_flat/ivf_flat_search.cuh Outdated
Comment thread cpp/src/neighbors/ivf_flat_index.cpp Outdated
Comment thread cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cpp
Comment thread cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cpp Outdated
Comment thread cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan_jit.cuh Outdated
Comment thread cpp/src/neighbors/refine/refine_device.cuh Outdated
Comment thread cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cpp Outdated
Comment thread cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan_jit.cuh
// ============================================================================

// Custom L2 (squared Euclidean) metric - should match built-in L2
CUVS_METRIC(custom_l2, { acc += squared_diff(x, y); })
Member

Looking at this again, I hadn't noticed that the test only covers float and int8_t with a custom L2 metric. We could consider adding:

  • A genuinely novel metric (e.g., Chebyshev/L-infinity) that differs from any built-in, to prove the UDF produces correct results
  • An error case test (bad CUDA code should give a clear error, not a crash)
  • uint8_t and half types
  • Testing that the cache actually works (second search with same UDF should be faster)

Member Author

@divyegala divyegala Apr 9, 2026

I do not know how to write case 1, as the build cannot use UDFs.

And for case 2, how do you verify that a clear error was received?

Member

For case 1 that's true, but I was thinking about search, not build. A test could look like:

  • Build the index normally with L2 (no UDF needed at build time)
  • Search with a Chebyshev UDF (e.g., acc = max(acc, abs(x - y)); with acc initialized to 0)
  • To verify correctness, do a brute-force search on CPU using the same Chebyshev metric, and compare results

The tricky part is that the index clusters were built with L2, so recall won't be optimal for Chebyshev. But you can work around this by using n_probes = n_lists (search all clusters), which should guarantee that you scan the entire dataset, making the search equivalent to brute force. At that point, the returned distances should match a CPU brute-force Chebyshev computation within tolerance.

The value of this test is that a custom L2 UDF can accidentally "pass" even if the UDF plumbing is partially broken (since the native L2 path might paper over bugs). A genuinely different metric is the only way to prove the UDF code is actually being executed and producing correct results.
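The CPU reference for such a test could be sketched as below. This is only an illustration in plain host C++: the names `chebyshev` and `brute_force_chebyshev`, the row-major layout, and the (distance, index) return type are assumptions made for the example, not part of cuVS.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Chebyshev (L-infinity) distance, mirroring the UDF body
// `acc = max(acc, abs(x - y))` with acc initialized to 0.
inline float chebyshev(const float* a, const float* b, std::size_t dim)
{
  float acc = 0.0f;
  for (std::size_t i = 0; i < dim; ++i) {
    acc = std::max(acc, std::abs(a[i] - b[i]));
  }
  return acc;
}

// Exhaustive CPU search for one query (hypothetical helper).
// `dataset` is assumed row-major with n_rows * dim elements.
// Returns the k smallest (distance, row index) pairs in ascending order.
inline std::vector<std::pair<float, std::size_t>> brute_force_chebyshev(
  const std::vector<float>& dataset,
  std::size_t n_rows,
  std::size_t dim,
  const float* query,
  std::size_t k)
{
  std::vector<std::pair<float, std::size_t>> all;
  all.reserve(n_rows);
  for (std::size_t r = 0; r < n_rows; ++r) {
    all.emplace_back(chebyshev(dataset.data() + r * dim, query, dim), r);
  }
  k = std::min(k, n_rows);
  // Pairs compare by distance first, then by row index, so ties are deterministic.
  std::partial_sort(all.begin(), all.begin() + k, all.end());
  all.resize(k);
  return all;
}
```

With n_probes = n_lists on the GPU side, the distances returned by the UDF search would then be compared against the `first` members of this result within a floating-point tolerance, rather than comparing indices, which can differ on ties.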

For case 2, couldn't we do something like this:

// Bad CUDA code, should produce a clear NVRTC compilation error, not a segfault
std::string bad_udf = "this is not valid CUDA;";
search_params.metric_udf = bad_udf;
EXPECT_THROW(cuvs::neighbors::ivf_flat::search(...), raft::exception);

You could also optionally catch the exception and verify the message contains something like "nvrtc compile error" to confirm the error is descriptive.
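The optional message check could be factored into a small helper along these lines. This is a sketch under assumptions: `throws_with_message` and `fake_compile_udf` are hypothetical names, the stand-in compiler only imitates a failing JIT step, and the helper assumes the thrown type derives from `std::exception`.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Returns true only if `fn` throws a std::exception whose what() contains `needle`.
// Exceptions not derived from std::exception propagate to the caller.
inline bool throws_with_message(const std::function<void()>& fn, const std::string& needle)
{
  try {
    fn();
  } catch (const std::exception& e) {
    return std::string(e.what()).find(needle) != std::string::npos;
  }
  return false;  // no exception was thrown
}

// Hypothetical stand-in for the JIT compile step; NOT the cuVS API.
// It rejects any source that never mentions the accumulator `acc`.
inline void fake_compile_udf(const std::string& src)
{
  if (src.find("acc") == std::string::npos) {
    throw std::runtime_error("nvrtc compile error: could not parse UDF");
  }
}
```

In a real test the lambda passed to the helper would wrap the actual search call, and the needle would be whatever substring the library's error message is expected to carry.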

Member Author

Done! All great ideas, thanks a lot.

Comment thread cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cpp Outdated
@divyegala divyegala requested a review from a team as a code owner April 13, 2026 22:26
@divyegala divyegala requested a review from a team as a code owner April 13, 2026 23:28
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Approved with a small comment

} else if constexpr (std::is_same_v<U, int64_t>) {
return "int64_t";
} else {
static_assert(type_name_always_false_v<U>, "Unsupported type to create UDF");
Member

Ugh, I wish C++ would allow you to static_assert(false) and only have it trigger when that constexpr branch is actually hit.

Member Author

Honestly I was surprised too, I went with static_assert(false) to start with.
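For readers unfamiliar with the workaround being discussed, here is a minimal sketch of the dependent-false idiom. The names `always_false_v` and `type_name` and the `std::string_view` return type are chosen for illustration; the snippet under review uses its own helper, `type_name_always_false_v`.

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Always false, but the value depends on T, so the compiler cannot
// evaluate the static_assert below until the template is instantiated.
template <typename T>
inline constexpr bool always_false_v = false;

// Maps a supported type to its spelling; any other type fails to compile
// with a readable message instead of silently returning a wrong name.
template <typename U>
constexpr std::string_view type_name()
{
  if constexpr (std::is_same_v<U, float>) {
    return "float";
  } else if constexpr (std::is_same_v<U, std::int64_t>) {
    return "int64_t";
  } else {
    // Only instantiated when neither branch above matched.
    static_assert(always_false_v<U>, "Unsupported type to create UDF");
  }
}
```

Because the condition names `U`, the assertion is checked only for types that actually reach the final branch, which is the behavior a plain `static_assert(false)` was expected to have.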

Comment thread cpp/include/cuvs/neighbors/ivf_flat.hpp Outdated
inline std::string instantiate_udf(char const* data_type, char const* acc_type, int veclen)
{
std::ostringstream oss;
oss << "\nnamespace cuvs { namespace neighbors { namespace ivf_flat { namespace detail {\n"
Member

Aren't we using C++20? Couldn't this be flattened into namespace cuvs::neighbors::ivf_flat::detail?

Member Author

Oh yes, and I think C++17 already had the nicer namespace convention.
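The two spellings can be compared side by side in a short sketch. The `demo` namespace and the two functions are hypothetical and exist only to show that both forms open the same namespace.

```cpp
// Pre-C++17 spelling: one namespace block per nesting level.
namespace demo { namespace neighbors { namespace detail {
inline int veclen_old() { return 4; }
}}}  // namespace demo::neighbors::detail

// C++17 nested namespace definition: the same namespace opened in one block.
namespace demo::neighbors::detail {
inline int veclen_new() { return 4; }
}  // namespace demo::neighbors::detail
```

Since the string in question is emitted for runtime compilation, the flattened form would presumably also require that compile step to target C++17 or later.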

@divyegala
Member Author

/merge

@rapids-bot rapids-bot bot merged commit a626f60 into rapidsai:main Apr 16, 2026
226 of 230 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Unstructured Data Processing Apr 16, 2026

Labels

feature request (New feature or request)
non-breaking (Introduces a non-breaking change)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Introduce UDF Architecture and apply to interleaved_scan_kernel metric functions

9 participants