Introduce UDF Architecture#1804
// ============================================================================

// Custom L2 (squared Euclidean) metric - should match built-in L2
CUVS_METRIC(custom_l2, { acc += squared_diff(x, y); })
There was a problem hiding this comment.
Looking at this again, I hadn't noticed that the test only covers float and int8_t with a custom L2 metric. We could consider adding:
- A genuinely novel metric (e.g., Chebyshev/L-infinity) that differs from any built-in, to prove the UDF produces correct results
- An error case test (bad CUDA code should give a clear error, not a crash)
- uint8_t and half types
- Testing that the cache actually works (second search with same UDF should be faster)
There was a problem hiding this comment.
I do not know how to write case 1, as the build cannot use UDFs.
And for case 2, how do you verify that a clear error was received?
There was a problem hiding this comment.
For case 1 that's true, but I was thinking about search, not build. A test could look like:
- Build the index normally with L2 (no UDF needed at build time)
- Search with a Chebyshev UDF (e.g., acc = max(acc, abs(x - y)); with acc initialized to 0)
- To verify correctness, do a brute-force search on CPU using the same Chebyshev metric, and compare results
The tricky part is that the index clusters were built with L2, so recall won't be optimal for Chebyshev. But you can work around this by using n_probes = n_lists (search all clusters), which should guarantee that you scan the entire dataset, making the search equivalent to brute force. At that point, the returned distances should match a CPU brute-force Chebyshev computation within tolerance.
The value of this test is that a custom L2 UDF can accidentally "pass" even if the UDF plumbing is partially broken (since the native L2 path might paper over bugs). A genuinely different metric is the only way to prove the UDF code is actually being executed and producing correct results.
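The CPU reference for such a test could be sketched roughly as below. This is only an illustration: `chebyshev_distance` and `brute_force_chebyshev` are hypothetical helper names, not existing cuVS test utilities, and the dataset is assumed to be row-major float.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Chebyshev (L-infinity) distance: the largest absolute per-dimension difference.
inline float chebyshev_distance(const float* x, const float* y, std::size_t dim)
{
  float acc = 0.0f;  // mirrors "acc initialized to 0" in the suggested UDF
  for (std::size_t i = 0; i < dim; ++i) {
    acc = std::max(acc, std::fabs(x[i] - y[i]));
  }
  return acc;
}

// Hypothetical helper: exact k nearest neighbours of one query over a
// row-major dataset, returned as (distance, row index) sorted ascending.
inline std::vector<std::pair<float, std::size_t>> brute_force_chebyshev(
  const std::vector<float>& dataset,
  std::size_t n_rows,
  std::size_t dim,
  const std::vector<float>& query,
  std::size_t k)
{
  assert(dataset.size() == n_rows * dim && query.size() == dim);
  std::vector<std::pair<float, std::size_t>> all;
  all.reserve(n_rows);
  for (std::size_t r = 0; r < n_rows; ++r) {
    all.emplace_back(chebyshev_distance(dataset.data() + r * dim, query.data(), dim), r);
  }
  k = std::min(k, n_rows);
  // Pairs compare by distance first, then by row index, so ties are deterministic.
  std::partial_sort(all.begin(), all.begin() + k, all.end());
  all.resize(k);
  return all;
}
```

The GPU side would then search with n_probes = n_lists and compare its distances against this reference within a floating-point tolerance. Comparing indices exactly may be fragile when several rows tie on distance.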
For case 2, couldn't we do something like
// Bad CUDA code, should produce a clear NVRTC compilation error, not a segfault
std::string bad_udf = "this is not valid CUDA;";
search_params.metric_udf = bad_udf;
EXPECT_THROW(cuvs::neighbors::ivf_flat::search(...), raft::exception);
You could also optionally catch the exception and verify the message contains something like "nvrtc compile error" to confirm the error is descriptive.
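The message check could be sketched in plain C++ as follows. Everything here is an assumption made for illustration: `fake_jit_compile` is a hypothetical stand-in rather than a cuVS or NVRTC call, and the real exception type and message text would need to be confirmed against the implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the JIT compile step; NOT a cuVS or NVRTC API.
// It rejects any source that never mentions the accumulator.
inline void fake_jit_compile(const std::string& source)
{
  if (source.find("acc") == std::string::npos) {
    throw std::runtime_error("nvrtc compile error: could not parse UDF body");
  }
}

// Returns true only if fn() throws a std::exception whose what() contains needle.
template <typename Fn>
bool throws_with_message(Fn&& fn, const std::string& needle)
{
  try {
    fn();
  } catch (const std::exception& e) {
    return std::string(e.what()).find(needle) != std::string::npos;
  }
  return false;  // no exception was thrown
}
```

In the real test, the lambda would wrap the ivf_flat search call with the bad UDF, and the helper's result would go into an EXPECT_TRUE.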
There was a problem hiding this comment.
Done! All great ideas, thanks a lot.
…ivf-flat-search-udf
KyleFromNVIDIA
left a comment
There was a problem hiding this comment.
Approved with a small comment
} else if constexpr (std::is_same_v<U, int64_t>) {
  return "int64_t";
} else {
  static_assert(type_name_always_false_v<U>, "Unsupported type to create UDF");
There was a problem hiding this comment.
Ugh, I wish C++ would allow you to static_assert(false) and only have it trigger when that constexpr branch is actually hit.
There was a problem hiding this comment.
Honestly I was surprised too, I went with static_assert(false) to start with.
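For context, the workaround under discussion is the usual "dependent false" idiom. A minimal sketch, assuming C++17 and using the illustrative names `always_false_v` and `type_name` (the PR's own helper is `type_name_always_false_v`):

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Always false, but dependent on T, so the compiler cannot evaluate it
// until the enclosing template is instantiated with a concrete type.
template <typename T>
inline constexpr bool always_false_v = false;

template <typename U>
constexpr std::string_view type_name()
{
  if constexpr (std::is_same_v<U, float>) {
    return "float";
  } else if constexpr (std::is_same_v<U, std::int64_t>) {
    return "int64_t";
  } else {
    // Fires only when this branch is instantiated for an unsupported U.
    // A plain static_assert(false, ...) is rejected by many compilers here
    // even when the branch is never taken.
    static_assert(always_false_v<U>, "Unsupported type to create UDF");
  }
}
```

As far as I know, newer standards relax this rule so that static_assert(false) in an uninstantiated branch is accepted, but the idiom above stays portable across older toolchains.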
inline std::string instantiate_udf(char const* data_type, char const* acc_type, int veclen)
{
  std::ostringstream oss;
  oss << "\nnamespace cuvs { namespace neighbors { namespace ivf_flat { namespace detail {\n"
There was a problem hiding this comment.
Aren't we using C++20? Couldn't this be flattened into namespace cuvs::neighbors::ivf_flat::detail?
There was a problem hiding this comment.
Oh yes, and I think C++17 already had the nicer namespace convention.
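A flattened version of the emitted wrapper might look like the sketch below. `wrap_in_detail_namespace` is a hypothetical helper written for illustration, not the PR's `instantiate_udf`, and it assumes the generated source is compiled as C++17 or newer (nested namespace definitions were added in C++17).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: wraps generated code in a single C++17 nested
// namespace definition instead of four separately opened namespaces.
inline std::string wrap_in_detail_namespace(const std::string& body)
{
  std::ostringstream oss;
  oss << "\nnamespace cuvs::neighbors::ivf_flat::detail {\n"
      << body << "\n"
      << "}  // namespace cuvs::neighbors::ivf_flat::detail\n";
  return oss.str();
}
```

A side benefit is that only one closing brace has to be emitted, which removes one way for the generated source to end up unbalanced.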
/merge
This PR introduces an architecture for supporting user-defined functions (UDFs) in cuVS, implemented with JIT LTO. The initial example passes a metric UDF to the IVF Flat search kernels.
In tests comparing the native L2 metric against an equivalent UDF L2 metric, the UDF path matches native performance.
