[REVIEW] cuVS bench: Fix cudaFuncSetAttribute not being called when CAGRA search switches kernel variants by irina-resh-nvda · Pull Request #1851 · rapidsai/cuvs

irina-resh-nvda · 2026-02-25T15:32:46Z

Fix a bug in safely_launch_kernel_with_smem_size where cudaFuncSetAttribute was skipped for kernels that needed it. The function tracked the max shared memory in a single static variable per KernelT type, but cudaFuncSetAttribute applies per function pointer value — and the single-CTA CAGRA search dispatches multiple kernel instantiations that share the same pointer type. When one kernel bumped the tracked max, a different kernel whose smem fell between its own previous max and the global max would skip cudaFuncSetAttribute, causing cudaErrorInvalidValue. The fix tracks the kernel pointer identity alongside a monotonically growing smem high-water mark: when the pointer changes, the new kernel is brought up to the high-water mark; when smem exceeds it, the mark is grown.
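The failure mode described above can be reproduced on the host without a GPU. This is a minimal sketch, not the code from the PR: `mock_set_max_smem`, `configured_limits`, `launch_tracking_type_only` and `launch_tracking_pointer` are hypothetical names, the mock stands in for `cudaFuncSetAttribute`, and an unconfigured kernel is treated as having a limit of 0 to keep the example small. It is single-threaded, so locking is left out.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the CUDA runtime: remembers the limit configured for
// each function pointer. The real call is cudaFuncSetAttribute.
using kernel_fn = void (*)(int*);

inline std::map<kernel_fn, uint32_t>& configured_limits()
{
  static std::map<kernel_fn, uint32_t> limits;
  return limits;
}

inline void mock_set_max_smem(kernel_fn kernel, uint32_t bytes)
{
  configured_limits()[kernel] = bytes;
}

// Instantiations differ only in a non-type template parameter, so they all have
// the pointer type kernel_fn but distinct pointer values.
template <int Variant>
void kernel(int* out)
{
  if (out != nullptr) { *out = Variant; }
}

// Pre-fix logic: one high-water mark per pointer type.
// Returns true if the simulated launch would succeed.
template <typename KernelT>
bool launch_tracking_type_only(KernelT k, uint32_t smem_size)
{
  static uint32_t current_smem_size{0};
  if (smem_size > current_smem_size) {
    current_smem_size = smem_size;
    mock_set_max_smem(k, smem_size);
  }
  return configured_limits()[k] >= smem_size;
}

// Post-fix logic as described above: also remember the last pointer configured.
template <typename KernelT>
bool launch_tracking_pointer(KernelT k, uint32_t smem_size)
{
  static uint32_t current_smem_size{0};
  static KernelT current_kernel{nullptr};
  if (k != current_kernel) {
    // Pointer changed: bring the new kernel up to the high-water mark.
    current_kernel = k;
    mock_set_max_smem(k, current_smem_size);
  }
  if (smem_size > current_smem_size) {
    // Request exceeds the mark: grow it.
    current_smem_size = smem_size;
    mock_set_max_smem(k, smem_size);
  }
  return configured_limits()[k] >= smem_size;
}
```

With the type-only tracker, launching `kernel<1>` with 65536 bytes and then `kernel<2>` with 50000 bytes leaves `kernel<2>` unconfigured, which corresponds to the `cudaErrorInvalidValue` in the log below. The pointer-aware tracker configures both.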

Error in question

$ CUVS_CAGRA_ANN_BENCH --search --data_prefix='<DATA_DIR>/' --benchmark_out_format=csv --benchmark_out=res_search_iter_cagra.csv --benchmark_counters_tabular=true --override_kv=dataset_memory_type:\"device\" <CONFIG_DIR>/laion_1M_cagra_iterative.json
[I] [12:28:52.095261] Using the query file '<DATA_DIR>/laion_1M/queries.fbin'
[I] [12:28:52.096141] Using the ground truth file '<DATA_DIR>/laion_1M/groundtruth.1M.neighbors.ibin'
2026-02-25T12:28:52+00:00
Running CUVS_CAGRA_ANN_BENCH
Run on (224 X 800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 307200 KiB (x2)
Load Average: 0.70, 0.44, 0.28
dataset: laion_1M
dim: 768
distance: euclidean
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/0/process_time/real_time        5.70 ms         5.70 ms          121   5.68808m   5.69994m    0.96424   0.689692       1.75441M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/1/process_time/real_time        5.70 ms         5.70 ms          121    5.6863m   5.69879m    0.96424   0.689553       1.75477M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/2/process_time/real_time        4.92 ms         4.92 ms          140   4.90351m   4.91567m    0.96046   0.688193       2.03432M/s        128         10             12        10k            1            1          1.4M dataset_memory_type="device"
cuvs_cagra_iterative/3/process_time/real_time        5.99 ms         5.99 ms          115   5.97476m   5.98617m    0.97519   0.688409       1.67052M/s        128         10             16        10k            1            1         1.15M dataset_memory_type="device"
cuvs_cagra_iterative/4/process_time/real_time        6.97 ms         6.97 ms           99   6.95873m    6.9703m    0.98129   0.690059       1.43466M/s        256         10             16        10k            1            1          990k dataset_memory_type="device"
cuvs_cagra_iterative/5/process_time/real_time        10.5 ms         10.5 ms           66   0.010479  0.0104908    0.98548   0.692391       953.222k/s        512         10             10        10k            1            2          660k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/6/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
Obtained 19 stack frames
#1 in CUVS_CAGRA_ANN_BENCH: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#2 in libcuvs.so: void cuvs::neighbors::cagra::detail::single_cta_search::select_and_run<float, unsigned int, float, unsigned int, cuvs::neighbors::filtering::none_sample_filter>(...)
#3 in libcuvs.so: cuvs::neighbors::cagra::detail::single_cta_search::search<float, unsigned int, float, cuvs::neighbors::filtering::none_sample_filter, unsigned int, long>::operator()(...)
#4 in libcuvs.so(+0x18fd0f1)
#5 in libcuvs.so: void cuvs::neighbors::cagra::search<float, unsigned int, long>(...)
#6-#19 in CUVS_CAGRA_ANN_BENCH / libc.so.6
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/7/process_time/real_time        10.5 ms         10.5 ms           66  0.0105088  0.0105202    0.98663   0.694332       950.555k/s         32         10             32        10k            1            1          660k dataset_memory_type="device"
cuvs_cagra_iterative/8/process_time/real_time        12.8 ms         12.8 ms           54   0.012796  0.0128079    0.98807   0.691628       780.768k/s         32         10             64        10k            1            1          540k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/9/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
cuvs_cagra_iterative/10/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/11/process_time/real_time       46.1 ms         46.2 ms           15  0.0461323  0.0461439    0.99131   0.692158       216.714k/s        256         10             10        10k            1           16          150k dataset_memory_type="device"
cuvs_cagra_iterative/12/process_time/real_time        142 ms          142 ms            5   0.141713   0.141725    0.99198   0.708627       70.5591k/s        512         10             32        10k            1           16           50k dataset_memory_type="device"

Config

{
  "dataset": {
    "name": "laion_1M",
    "base_file": "laion_1M/base.1M.fbin",
    "subset_size": 1000000,
    "query_file": "laion_1M/queries.fbin",
    "groundtruth_neighbors_file": "laion_1M/groundtruth.1M.neighbors.ibin",
    "distance": "euclidean"
  },
  "search_basic_param": {
    "batch_size": 10000,
    "k": 10
  },
  "index": [
  
    {
      "name": "cuvs_cagra_iterative",
      "algo": "cuvs_cagra",
      "build_param": { 
        "graph_degree": 64,
        "intermediate_graph_degree": 128,
        "search_width": 1
      },
      "file": "laion_1M/cagra/q_coarse_iterative.ibin",
      "search_params": [
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 256, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 512, "search_width": 2, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 256, "search_width": 2, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 32, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 64, "refine_ratio": 1},
        {"itopk": 192, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 16, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 512, "search_width": 16, "max_iterations": 32, "refine_ratio": 1}
      ]
    }
  ]
}

copy-pr-bot · 2026-02-25T15:32:50Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

achirkin

Oh, this was indeed an oversight in the original design. Thanks for working on this!

achirkin · 2026-02-25T16:56:24Z

-      }
+  // current_smem_size is a monotonically growing high-water mark across all kernel pointers.
+  // current_kernel tracks which kernel pointer was last used.
+  static uint32_t current_smem_size{0};


Would it be possible to retain the atomic-fast-path semantics (perhaps with a stronger memory order and two atomic variables)?

In this case, since we are only tracking the watermark, there is no danger in reading an inconsistent state with two atomics, but what would be the benefit of doing it this way versus using a mutex?

I withdraw my question, given that smem_utils is performance-critical functionality.

mythrocks · 2026-02-25T22:42:08Z

+    // When the kernel function pointer changes, bring the new kernel up to the global high-water
+    // mark. This is necessary because cudaFuncSetAttribute applies to a specific function pointer,
+    // not to the pointer type — different template instantiations may share the same KernelT.


👏 Great catch.

I'm feeling a little silly for not having thought of this, actually.

I thought we had exactly one pointer per type, but apparently we don't (because of non-type template parameters).
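The point about non-type template parameters can be checked in a few lines of plain C++. This is an illustrative sketch only: `search_kernel`, `per_type_counter` and `touch` are made-up names, not identifiers from cuVS.

```cpp
#include <type_traits>

// Two instantiations that differ only in a non-type template parameter.
template <int TopK>
void search_kernel(float* out)
{
  if (out != nullptr) { *out = static_cast<float>(TopK); }
}

using kernel_a_t = decltype(&search_kernel<64>);
using kernel_b_t = decltype(&search_kernel<128>);

// The pointer types are identical...
static_assert(std::is_same<kernel_a_t, kernel_b_t>::value,
              "both instantiations have the type void (*)(float*)");

// ...so a function-local static inside a template keyed on KernelT is shared
// between them.
template <typename KernelT>
int& per_type_counter()
{
  static int counter = 0;
  return counter;
}

template <typename KernelT>
int touch(KernelT)
{
  return ++per_type_counter<KernelT>();
}

// ...while the pointer values are different.
inline bool distinct_pointers() { return &search_kernel<64> != &search_kernel<128>; }
```

Calling `touch` with each pointer in turn increments the same counter, which is the same sharing that affected the static high-water mark.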

mythrocks · 2026-02-25T22:54:32Z

+    if (kernel != last_kernel) {
+      current_kernel = kernel;
+      auto launch_status =
+        cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, last_smem_size);
+      RAFT_EXPECTS(launch_status == cudaSuccess,
+                   "Failed to set max dynamic shared memory size to %u bytes",
+                   last_smem_size);


I've a silly question: Why aren't these two conditions combined into one block?

if (smem_size > last_smem_size || kernel != last_kernel) {
  // 1. Record high-watermark, current kernel.
  // 2. Call cudaFuncSetAttribute().
}

Come to think of it, we should probably have put this in a double-checked lock, no?

// For the first check, no mutex.
if (smem_size > current_smem_size || kernel != current_kernel) {
  // Something's changed. Grab the mutex, and examine.
  auto guard = std::lock_guard<std::mutex>{mutex};
  auto call_set_attribute = false;
  if (smem_size > current_smem_size) {
    current_smem_size  = smem_size;
    call_set_attribute = true;
  }
  if (kernel != current_kernel) {
    current_kernel     = kernel;
    call_set_attribute = true;
  }
  if (call_set_attribute) {
    auto launch_status =
      cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size);
    RAFT_EXPECTS(launch_status == cudaSuccess,
                 "Failed to set max dynamic shared memory size to %u bytes",
                 smem_size);
  }
}

Apologies if this is too naive, or I'm missing something.

No, you are right, this can be trimmed.
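A trimmed, mutex-only version of the combined check might look like the sketch below. It is an assumption-laden illustration rather than the PR's code: `configure_smem`, `mock_set_max_smem`, `set_attribute_calls` and `last_bytes_set` are hypothetical, and the mock replaces `cudaFuncSetAttribute`. One detail that seems worth keeping from the diff above is that the attribute is set to the high-water mark rather than to the requested size; otherwise a later, larger request for the same kernel that still fits under the mark would skip the call.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for cudaFuncSetAttribute: counts calls and records the
// last size passed in. Not thread-safe on its own; it is only a test double.
inline int& set_attribute_calls()
{
  static int calls = 0;
  return calls;
}

inline uint32_t& last_bytes_set()
{
  static uint32_t bytes = 0;
  return bytes;
}

template <typename KernelT>
void mock_set_max_smem(KernelT /*kernel*/, uint32_t bytes)
{
  ++set_attribute_calls();
  last_bytes_set() = bytes;
}

template <int Variant>
void kernel(int* out)
{
  if (out != nullptr) { *out = Variant; }
}

// Single block, always under the lock: reconfigure when either the requested
// size exceeds the high-water mark or the kernel pointer changed.
template <typename KernelT>
void configure_smem(KernelT k, uint32_t smem_size)
{
  static std::mutex mutex;
  static uint32_t current_smem_size{0};
  static KernelT current_kernel{nullptr};

  std::lock_guard<std::mutex> guard{mutex};
  if (smem_size > current_smem_size || k != current_kernel) {
    if (smem_size > current_smem_size) { current_smem_size = smem_size; }
    current_kernel = k;
    // Use the high-water mark, not the requested size.
    mock_set_max_smem(k, current_smem_size);
  }
}
```

Taking the lock on every launch keeps the logic simple at the cost of contention when many threads launch concurrently.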

mythrocks · 2026-02-25T23:04:53Z

+    auto last_kernel    = current_kernel;
+    auto last_smem_size = current_smem_size;


Sorry, why is it necessary to make copies of the current high-watermark and the current_kernel? Why not just use current_kernel directly? We're holding the lock_guard when these are modified, so it should be safe.

What am I missing?

You are right, it's an artefact from when these were two atomics =)

divyegala

Can you account for the case where KernelT is just a cudaKernel_t or cudaFunc_t?

Found fix.

mythrocks · 2026-02-26T23:24:19Z

Actually, I rather like @divyegala's approach of tracking mem-sizes per kernel, via a std::unordered_map. But @achirkin might know best about whether we want to persist the current smem_max across all kernels evenly, or track them separately. (In that case, we might consider a std::unordered_set instead.)

The map/set version will likely work for both function pointers and cudaKernel_t alike, so we might not even need a template specialization for the latter.
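For comparison, per-kernel tracking with a map could be sketched as follows. This is a guess at the shape of that approach, not code from either PR: `ensure_smem`, `as_handle` and the mock are hypothetical, and the key is an opaque `const void*` on the assumption that both function pointers and kernel handles can be reduced to one.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for cudaFuncSetAttribute; it only counts calls.
inline int& set_attribute_calls()
{
  static int calls = 0;
  return calls;
}

inline void mock_set_max_smem(const void* /*kernel_handle*/, uint32_t /*bytes*/)
{
  ++set_attribute_calls();
}

// Each kernel handle gets its own high-water mark, so no kernel is raised
// beyond what it was asked for.
inline void ensure_smem(const void* kernel_handle, uint32_t smem_size)
{
  static std::mutex mutex;
  static std::unordered_map<const void*, uint32_t> max_smem_per_kernel;

  std::lock_guard<std::mutex> guard{mutex};
  uint32_t& known = max_smem_per_kernel[kernel_handle];  // 0 on first use
  if (smem_size > known) {
    known = smem_size;
    mock_set_max_smem(kernel_handle, smem_size);
  }
}

template <int Variant>
void kernel(int* out)
{
  if (out != nullptr) { *out = Variant; }
}

// Converting a function pointer to const void* is conditionally supported in
// C++, but it works on the platforms CUDA targets.
inline const void* as_handle(void (*fn)(int*)) { return reinterpret_cast<const void*>(fn); }
```

Unlike the single high-water mark, switching back and forth between two already-configured kernels triggers no further calls here, at the price of a hash lookup under a lock on every launch.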

achirkin · 2026-03-02T08:00:07Z

I'm wondering whether it's still possible to maintain a compile-time dictionary of the kernels and smem sizes rather than a run-time one. What if we just propagate/add the template parameters from the outer scope to ensure there's always one template per kernel instantiation? These host functions are small, so we won't be blowing up the binary size while also avoiding the runtime costs for the locks and dictionaries.
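One way to get a separate static per kernel instantiation is to make the kernel pointer itself a template argument of the helper. The sketch below assumes that shape. It is not what the thread settled on, and `ensure_smem`, `mock_set_max_smem` and `set_attribute_calls` are hypothetical names with the mock standing in for `cudaFuncSetAttribute`. The slow path still takes a mutex here, but it is only reached when a kernel's own limit has to grow.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for cudaFuncSetAttribute; it only counts calls.
inline std::atomic<int>& set_attribute_calls()
{
  static std::atomic<int> calls{0};
  return calls;
}

template <typename KernelT>
void mock_set_max_smem(KernelT /*kernel*/, uint32_t /*bytes*/)
{
  ++set_attribute_calls();
}

template <int Variant>
void kernel(int* out)
{
  if (out != nullptr) { *out = Variant; }
}

using kernel_fn = void (*)(int*);

// The kernel pointer is a non-type template argument, so every kernel
// instantiation gets its own copy of the statics below.
template <typename KernelT, KernelT Kernel>
void ensure_smem(uint32_t smem_size)
{
  static std::atomic<uint32_t> configured{0};
  static std::mutex mutex;

  // Fast path: this kernel already has a large enough limit.
  if (smem_size <= configured.load(std::memory_order_acquire)) { return; }

  // Slow path: re-check under the lock, set the attribute, then publish.
  std::lock_guard<std::mutex> guard{mutex};
  if (smem_size <= configured.load(std::memory_order_relaxed)) { return; }
  mock_set_max_smem(Kernel, smem_size);
  configured.store(smem_size, std::memory_order_release);
}
```

A call site would look like `ensure_smem<kernel_fn, &kernel<1>>(bytes)`. Because the attribute is set before the new size is published, a thread that passes the fast path should see a limit at least as large as the value it read.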

irina-resh-nvda · 2026-03-02T09:00:25Z

> I'm wondering whether it's still possible to maintain a compile-time dictionary of the kernels and smem sizes rather than a run-time one. What if we just propagate/add the template parameters from the outer scope to ensure there's always one template per kernel instantiation? These host functions are small, so we won't be blowing up the binary size while also avoiding the runtime costs for the locks and dictionaries.

This is the benchmark launcher functionality, not a performance-critical algorithmic part. Do you think it's worth it to try and optimise out the run-time overhead?

irina-resh-nvda · 2026-03-02T10:23:33Z

> Can you account for the case where KernelT is just a cudaKernel_t or cudaFunc_t?

Is this still relevant for you?
It's unclear to me how KernelLauncherT will behave if given cudaKernel_t or cudaFunc_t.

achirkin · 2026-03-02T15:35:42Z

> This is the benchmark launcher functionality, not a performance-critical algorithmic part. Do you think it's worth it to try and optimise out the run-time overhead?

No, the smem helper in cpp/src/neighbors/detail/smem_utils.cuh is in a performance-critical path, it's invoked during search. It's critical for the case of launching many concurrent small-batch searches.

irina-resh-nvda · 2026-03-02T15:40:44Z

> > This is the benchmark launcher functionality, not a performance-critical algorithmic part. Do you think it's worth it to try and optimise out the run-time overhead?
>
> No, the smem helper in cpp/src/neighbors/detail/smem_utils.cuh is in a performance-critical path, it's invoked during search. It's critical for the case of launching many concurrent small-batch searches.

Oh, I completely missed that; I thought I was fixing a cuvs bench bug. Then it's worth it for sure.

irina-resh-nvda · 2026-03-02T18:17:48Z

I updated the implementation to use two atomics (with memory_order_relaxed, because smem_size grows monotonically).
However, this approach looks a little slower in some cases when running cuvs bench.
One-mutex approach:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_q_iterative/0/process_time/real_time        4.27 ms         4.27 ms          164   4.26053m   4.27194m    0.84972   0.700598       2.34086M/s         64         10              8        10k            1            2         1.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time        4.27 ms         4.27 ms          164   4.25854m   4.26998m    0.84972   0.700277       2.34194M/s         64         10              8        10k            1            2         1.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time        3.57 ms         3.57 ms          196   3.55504m   3.56633m    0.84494      0.699       2.80401M/s        128         10             12        10k            1            1         1.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time        4.47 ms         4.47 ms          156   4.45523m   4.46646m    0.85445   0.696768       2.23891M/s        128         10             16        10k            1            1         1.56M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time        4.92 ms         4.92 ms          140   4.90958m   4.92073m    0.85754   0.688902       2.03222M/s        256         10             16        10k            1            1          1.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time        7.47 ms         7.47 ms           93   7.45551m   7.46701m    0.85994   0.694432       1.33923M/s        512         10             10        10k            1            2          930k dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time        6.52 ms         6.52 ms          106   6.51184m   6.52313m    0.86124   0.691451       1.53301M/s        256         10             12        10k            1            2         1060k dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time        8.45 ms         8.45 ms           82     8.437m   8.44948m    0.85983   0.692857       1.18351M/s         32         10             32        10k            1            1          820k dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time        10.6 ms         10.6 ms           65  0.0106346  0.0106464    0.86101   0.692016        939.29k/s         32         10             64        10k            1            1          650k dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time        11.3 ms         11.3 ms           61  0.0112994  0.0113108    0.86274   0.689959       884.112k/s        192         10             12        10k            1            4          610k dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time       11.6 ms         11.6 ms           60  0.0115922  0.0116036    0.86268   0.696217       861.802k/s        256         10             12        10k            1            4          600k dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time       36.9 ms         36.9 ms           19  0.0368664  0.0368782    0.86319   0.700685       271.164k/s        256         10             10        10k            1           16          190k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time        117 ms          117 ms            6   0.116596   0.116613    0.86334   0.699677       85.7542k/s        512         10             32        10k            1           16           60k dataset_memory_type="device"

Two atomics + mutex approach:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_q_iterative/0/process_time/real_time        4.28 ms         4.27 ms          164   4.26642m   4.27792m    0.85078   0.701578        2.3376M/s         64         10              8        10k            1            2         1.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time        4.27 ms         4.27 ms          164   4.26101m    4.2725m    0.85078   0.700691       2.34056M/s         64         10              8        10k            1            2         1.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time        3.58 ms         3.57 ms          189   3.56572m   3.57723m    0.84674   0.676096       2.79547M/s        128         10             12        10k            1            1         1.89M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time        4.46 ms         4.47 ms          156   4.45332m   4.46465m     0.8556   0.696485       2.23982M/s        128         10             16        10k            1            1         1.56M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time        5.02 ms         4.93 ms          139   5.01347m   5.02475m    0.85859   0.698441       1.99015M/s        256         10             16        10k            1            1         1.39M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time        7.48 ms         7.47 ms           83   7.46776m   7.47937m    0.86108   0.620787       1.33702M/s        512         10             10        10k            1            2          830k dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time        6.52 ms         6.52 ms          106   6.50945m   6.52083m    0.86156   0.691208       1.53355M/s        256         10             12        10k            1            2         1060k dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time        8.43 ms         8.43 ms           82   8.41571m   8.42704m    0.86001   0.691018       1.18666M/s         32         10             32        10k            1            1          820k dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time        10.1 ms         10.0 ms           61  0.0100925  0.0101043    0.86112   0.616365       989.678k/s         32         10             64        10k            1            1          610k dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time        11.3 ms         11.3 ms           60   0.011294  0.0113053     0.8634    0.67832       884.541k/s        192         10             12        10k            1            4          600k dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time       11.6 ms         11.6 ms           60  0.0115935   0.011605    0.86336   0.696301       861.698k/s        256         10             12        10k            1            4          600k dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time       37.0 ms         36.9 ms           19  0.0369513   0.036963    0.86332   0.702297       270.541k/s        256         10             10        10k            1           16          190k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time        118 ms          117 ms            6   0.117519   0.117536    0.86364   0.705215       85.0806k/s        512         10             32        10k            1           16           60k dataset_memory_type="device"

divyegala · 2026-03-02T18:29:31Z

> Is this still relevant for you?

Yes. But I'll fix it on my own if your PR does not account for that case, although I do prefer the solution to be more generic.

achirkin

Thanks for exploring the less-locking approach!
Could you please expand your benchmarks to also test the throughput mode (--mode=throughput --threads=1:1024) and increase the benchmark case time for more stable results (--benchmark_min_time=3s)?

achirkin · 2026-03-03T05:03:54Z

+    auto launch_status =
+      cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, cur_smem_size);
+    RAFT_EXPECTS(launch_status == cudaSuccess,
+                 "Failed to set max dynamic shared memory size to %u bytes",
+                 cur_smem_size);


There are a couple of issues here:

By the time the mutex is locked, another thread may have already called cudaFuncSetAttribute, so the update is no longer needed and the work is done twice. To avoid this, you'd need to repeat the atomic check after acquiring the lock.

By the time smem_size > cur_smem_size is checked, another thread may have already increased last_smem_size and changed last_kernel, so update_needed may be incorrectly set to false. To fix this, you'd need to reorder the checks, introduce a loop for checking both atomics, or expand the locked section.
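Both points can be addressed in one arrangement, sketched below under several assumptions: it is not the code that ended up in the PR, all names are hypothetical, the mock replaces `cudaFuncSetAttribute`, and default sequentially consistent atomics are used rather than relaxed ones to keep the ordering argument simple. The fast path reads the size before the pointer, the slow path repeats the check under the lock, and the writer publishes the pointer before the size.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for cudaFuncSetAttribute: counts calls and records the
// last size passed in.
inline std::atomic<int>& set_attribute_calls()
{
  static std::atomic<int> calls{0};
  return calls;
}

inline std::atomic<uint32_t>& last_bytes_set()
{
  static std::atomic<uint32_t> bytes{0};
  return bytes;
}

template <typename KernelT>
void mock_set_max_smem(KernelT /*kernel*/, uint32_t bytes)
{
  ++set_attribute_calls();
  last_bytes_set() = bytes;
}

template <int Variant>
void kernel(int* out)
{
  if (out != nullptr) { *out = Variant; }
}

template <typename KernelT>
void ensure_smem(KernelT k, uint32_t smem_size)
{
  static std::atomic<uint32_t> current_smem_size{0};
  static std::atomic<KernelT> current_kernel{nullptr};
  static std::mutex mutex;

  // Fast path, no lock. The size is read BEFORE the pointer: if the pointer
  // still matches afterwards, the size read cannot exceed what this kernel was
  // configured with, because writers publish the pointer before the size.
  uint32_t seen_size = current_smem_size.load();
  if (current_kernel.load() == k && smem_size <= seen_size) { return; }

  std::lock_guard<std::mutex> guard{mutex};
  // Repeat the check: another thread may have done the update while we waited.
  seen_size = current_smem_size.load();
  if (current_kernel.load() == k && smem_size <= seen_size) { return; }

  const uint32_t new_size = std::max(seen_size, smem_size);
  mock_set_max_smem(k, new_size);     // configure first
  current_kernel.store(k);            // then publish the pointer
  current_smem_size.store(new_size);  // and the size last
}
```

This sketch has only been exercised single-threaded; whether weaker memory orders would be sufficient is left open.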

irina-resh-nvda · 2026-03-23T16:49:43Z

@achirkin
New benchmarks (using the newest commit) with the flags you requested

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                              Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_q_iterative/0/process_time/real_time/threads:1           4.26 ms         4.27 ms          983   4.25264m   4.26404m    0.84747    4.19155        2.3452M/s         64         10              8        10k            1            2         9.83M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:2           3.76 ms         7.52 ms         1114   7.51068m   7.52681m    0.84747     4.1925       2.65835M/s         64         10              8        10k            1            2        11.14M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:4           3.76 ms         15.0 ms         1116  0.0150265  0.0150596    0.84747    4.20166       2.65966M/s         64         10              8        10k            1            2        11.16M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:8           3.75 ms         29.8 ms         1120  0.0300058   0.030161    0.84747    4.22251       2.66495M/s         64         10              8        10k            1            2         11.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:16          3.74 ms         59.2 ms         1264  0.0304797  0.0604478    0.84747    4.77536       2.67167M/s         64         10              8        10k            1            2        12.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:32          3.69 ms          101 ms         1120  0.0325192   0.120937    0.84747    4.23278       2.71309M/s         64         10              8        10k            1            2         11.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:64          3.74 ms          100 ms         1152  0.0462351   0.250309    0.84747    4.50583       2.67661M/s         64         10              8        10k            1            2        11.52M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:128         3.70 ms         96.8 ms         1408   0.093142   0.511545    0.84747    5.62735       2.70053M/s         64         10              8        10k            1            2        14.08M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:256         3.11 ms         72.6 ms         1024   0.415988    1.07977    0.84747    4.31994       3.21259M/s         64         10              8        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:512         3.39 ms         59.5 ms         1024    1.42318    2.54438    0.84747     5.0871       2.94647M/s         64         10              8        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/0/process_time/real_time/threads:1024        4.26 ms         52.7 ms         1024    4.36568    6.30396    0.84747    6.30608       2.34491M/s         64         10              8        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:1           4.27 ms         4.27 ms          982   4.25389m    4.2655m    0.84747    4.18873       2.34439M/s         64         10              8        10k            1            2         9.82M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:2           3.76 ms         7.52 ms         1114   7.50801m   7.52447m    0.84747    4.19121       2.65918M/s         64         10              8        10k            1            2        11.14M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:4           3.76 ms         15.0 ms         1120   0.015034  0.0150745    0.84747    4.22091       2.65826M/s         64         10              8        10k            1            2         11.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:8           3.75 ms         29.9 ms         1120  0.0299985  0.0301534    0.84747    4.22145       2.66559M/s         64         10              8        10k            1            2         11.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:16          3.75 ms         59.2 ms         1264  0.0305015  0.0604605    0.84747    4.77631        2.6698M/s         64         10              8        10k            1            2        12.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:32          3.52 ms         94.2 ms         1248  0.0328032   0.122089    0.84747     4.7613       2.83944M/s         64         10              8        10k            1            2        12.48M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:64          3.60 ms         93.2 ms         1152  0.0528436    0.24902    0.84747    4.48214       2.77583M/s         64         10              8        10k            1            2        11.52M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:128         3.69 ms         92.6 ms          896   0.127664    0.53028    0.84747    3.71223       2.70972M/s         64         10              8        10k            1            2         8.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:256         3.30 ms         78.3 ms         1024   0.399409     1.0932    0.84747    4.37352       3.03257M/s         64         10              8        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:512         2.94 ms         57.9 ms         1024    1.23099    2.36729    0.84747    4.73089       3.39818M/s         64         10              8        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/1/process_time/real_time/threads:1024        4.17 ms         54.5 ms         1024    4.24469    6.28381    0.84747    6.28045       2.39902M/s         64         10              8        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:1           3.56 ms         3.56 ms         1177   3.54751m   3.55908m    0.84298    4.18904       2.80972M/s        128         10             12        10k            1            1        11.77M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:2           3.12 ms         6.25 ms         1340    6.2352m   6.25054m    0.84298    4.18787       3.20089M/s        128         10             12        10k            1            1         13.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:4           3.12 ms         12.5 ms         1344  0.0124838  0.0125113    0.84298    4.20376       3.20072M/s        128         10             12        10k            1            1        13.44M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:8           3.12 ms         24.8 ms         1352  0.0249172  0.0250403    0.84298    4.23178       3.20885M/s        128         10             12        10k            1            1        13.52M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:16          3.12 ms         49.2 ms         1504  0.0253534  0.0502439    0.84298    4.72287       3.20994M/s        128         10             12        10k            1            1        15.04M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:32          2.98 ms         81.4 ms         1408  0.0268179   0.100126    0.84298    4.40549        3.3534M/s        128         10             12        10k            1            1        14.08M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:64          3.11 ms         84.4 ms         1536  0.0350969   0.205721    0.84298    4.93739       3.21547M/s        128         10             12        10k            1            1        15.36M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:128         2.71 ms         69.3 ms         1152   0.077869   0.403269    0.84298    3.62965       3.68494M/s        128         10             12        10k            1            1        11.52M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:256         2.60 ms         64.6 ms         1280   0.305413   0.876562    0.84298    4.38333       3.84575M/s        128         10             12        10k            1            1         12.8M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:512         2.08 ms         51.0 ms         1536   0.755542    1.70121    0.84298    5.10463       4.81926M/s        128         10             12        10k            1            1        15.36M dataset_memory_type="device"
cuvs_cagra_q_iterative/2/process_time/real_time/threads:1024        3.88 ms         46.7 ms         1024    3.93342    5.72016    0.84298    5.72221       2.58026M/s        128         10             12        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:1           4.46 ms         4.46 ms          940   4.44947m   4.46149m    0.85236     4.1938        2.2414M/s        128         10             16        10k            1            1          9.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:2           3.91 ms         7.82 ms         1070    7.8121m   7.83973m    0.85236    4.19936       2.55537M/s        128         10             16        10k            1            1         10.7M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:4           3.91 ms         15.6 ms         1072   0.015634  0.0156694    0.85236     4.1993       2.55633M/s        128         10             16        10k            1            1        10.72M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:8           3.90 ms         31.1 ms         1120  0.0312072  0.0313691    0.85236    4.39151       2.56232M/s        128         10             16        10k            1            1         11.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:16          3.54 ms         52.5 ms         1104  0.0313415  0.0626231    0.85236    4.32099       2.82166M/s        128         10             16        10k            1            1        11.04M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:32          3.75 ms          102 ms         1088  0.0340829   0.126965    0.85236    4.31684       2.66808M/s        128         10             16        10k            1            1        10.88M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:64          3.65 ms         98.6 ms         1088  0.0452013   0.251998    0.85236    4.28387       2.74043M/s        128         10             16        10k            1            1        10.88M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:128         3.48 ms         94.1 ms          896   0.110273   0.515823    0.85236    3.61101       2.87137M/s        128         10             16        10k            1            1         8.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:256         3.44 ms         87.1 ms         1792   0.196212    1.03583    0.85236    7.25145       2.90419M/s        128         10             16        10k            1            1        17.92M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:512         3.16 ms         66.8 ms         1536    1.05771    2.34923    0.85236    7.04905       3.16891M/s        128         10             16        10k            1            1        15.36M dataset_memory_type="device"
cuvs_cagra_q_iterative/3/process_time/real_time/threads:1024        4.57 ms         55.0 ms         1024    4.68035    6.71024    0.85236    6.70689       2.18642M/s        128         10             16        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:1           4.94 ms         4.92 ms          849   4.92898m   4.94083m    0.85682    4.19477       2.02395M/s        256         10             16        10k            1            1         8.49M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:2           4.61 ms         9.22 ms          910   9.21324m   9.23171m    0.85682     4.2005       2.16766M/s        256         10             16        10k            1            1          9.1M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:4           4.61 ms         18.4 ms          912  0.0184098  0.0184539    0.85682     4.2075       2.17117M/s        256         10             16        10k            1            1         9.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:8           4.59 ms         36.6 ms          952  0.0367164  0.0369396    0.85682    4.39582       2.17804M/s        256         10             16        10k            1            1         9.52M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:16          4.55 ms         71.7 ms          992  0.0371738  0.0738172    0.85682    4.57662        2.1999M/s        256         10             16        10k            1            1         9.92M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:32          4.42 ms          120 ms          992  0.0399174   0.147912    0.85682    4.58531       2.26497M/s        256         10             16        10k            1            1         9.92M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:64          4.38 ms          120 ms          896  0.0552238    0.29989    0.85682    4.19823       2.28228M/s        256         10             16        10k            1            1         8.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:128         3.72 ms          101 ms          896   0.165797   0.601902    0.85682    4.21346       2.68946M/s        256         10             16        10k            1            1         8.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:256         3.59 ms         82.7 ms         1024   0.380164    1.27415    0.85682    5.09742       2.78569M/s        256         10             16        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:512         3.83 ms         75.1 ms         1024    1.44992    2.89466    0.85682    5.79061       2.61049M/s        256         10             16        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/4/process_time/real_time/threads:1024        5.01 ms         71.1 ms         1024    4.85979    7.23408    0.85682    7.23652       1.99689M/s        256         10             16        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:1           7.46 ms         7.46 ms          562   7.44789m   7.45966m    0.85861    4.19233       1.34055M/s        512         10             10        10k            1            2         5.62M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:2           7.23 ms         14.5 ms          580  0.0144516  0.0144779   0.858745    4.19865       1.38263M/s        512         10             10        10k            1            2          5.8M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:4           7.23 ms         28.8 ms          580  0.0288943  0.0290363    0.85886     4.2128       1.38368M/s        512         10             10        10k            1            2          5.8M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:8           7.19 ms         57.1 ms          600   0.057466  0.0580009    0.85878    4.34999       1.39178M/s        512         10             10        10k            1            2            6M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:16          7.17 ms          113 ms          640  0.0586955   0.116338   0.858862    4.65344       1.39531M/s        512         10             10        10k            1            2          6.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:32          6.82 ms          185 ms          608  0.0640125   0.232856   0.858876    4.42427       1.46523M/s        512         10             10        10k            1            2         6.08M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:64          6.76 ms          186 ms          704  0.0889401   0.468263   0.858906      5.151       1.47852M/s        512         10             10        10k            1            2         7.04M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:128         6.18 ms          170 ms          896   0.190995   0.936132   0.858862    6.55317         1.617M/s        512         10             10        10k            1            2         8.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:256         6.19 ms          148 ms          768   0.753871    2.07761   0.858888    6.23345       1.61559M/s        512         10             10        10k            1            2         7.68M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:512         6.03 ms         99.4 ms          512    3.08049    4.94092   0.858895    4.94255       1.65903M/s        512         10             10        10k            1            2         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/5/process_time/real_time/threads:1024        5.80 ms          102 ms         1024    5.81449    9.57967   0.858864    9.57797       1.72475M/s        512         10             10        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:1           6.52 ms         6.52 ms          643   6.50481m   6.51671m    0.86104    4.19024       1.53452M/s        256         10             12        10k            1            2         6.43M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:2           6.13 ms         12.2 ms          686  0.0122414  0.0122641    0.86098    4.20653       1.63198M/s        256         10             12        10k            1            2         6.86M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:4           6.12 ms         24.4 ms          688  0.0244572  0.0245248    0.86099    4.21827       1.63459M/s        256         10             12        10k            1            2         6.88M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:8           6.11 ms         48.7 ms          720  0.0488587  0.0491128   0.860976    4.41995        1.6369M/s        256         10             12        10k            1            2          7.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:16          6.08 ms         95.9 ms          704  0.0498208  0.0986288   0.860988    4.33962        1.6438M/s        256         10             12        10k            1            2         7.04M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:32          5.93 ms          160 ms          736   0.054321   0.199185    0.86098    4.58153       1.68618M/s        256         10             12        10k            1            2         7.36M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:64          5.77 ms          158 ms          768  0.0736453   0.398476   0.860983    4.78181       1.73448M/s        256         10             12        10k            1            2         7.68M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:128         5.88 ms          152 ms          768   0.185618   0.840867   0.860976    5.04553       1.70199M/s        256         10             12        10k            1            2         7.68M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:256         4.72 ms          129 ms         1024   0.446326    1.59564   0.860982    6.38193       2.11695M/s        256         10             12        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:512         5.34 ms          115 ms         1024    1.67939    3.71042   0.860981    7.42286       1.87246M/s        256         10             12        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/6/process_time/real_time/threads:1024        5.18 ms         85.5 ms         1024    5.24696    8.49105   0.860982    8.48756       1.93035M/s        256         10             12        10k            1            2        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:1           8.42 ms         8.42 ms          498    8.4087m   8.42075m    0.85952    4.19353       1.18754M/s         32         10             32        10k            1            1         4.98M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:2           7.48 ms         15.0 ms          560  0.0149531  0.0149797    0.85952    4.19426        1.3363M/s         32         10             32        10k            1            1          5.6M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:4           7.48 ms         29.8 ms          564  0.0298902  0.0299831    0.85952    4.22761       1.33761M/s         32         10             32        10k            1            1         5.64M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:8           7.45 ms         59.3 ms          576  0.0596019  0.0599793    0.85952     4.3185       1.34192M/s         32         10             32        10k            1            1         5.76M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:16          7.41 ms          117 ms          592  0.0607554   0.120359    0.85952    4.45325       1.35018M/s         32         10             32        10k            1            1         5.92M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:32          7.36 ms          201 ms          576   0.067084   0.242665    0.85952    4.36792       1.35932M/s         32         10             32        10k            1            1         5.76M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:64          7.28 ms          196 ms          640  0.0964646   0.494021    0.85952     4.9403        1.3743M/s         32         10             32        10k            1            1          6.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:128         6.85 ms          179 ms          640   0.210187   0.999995    0.85952    5.00005       1.45947M/s         32         10             32        10k            1            1          6.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:256         6.69 ms          168 ms         1024     0.5987    2.09405    0.85952    8.37688       1.49451M/s         32         10             32        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:512         6.19 ms          105 ms          512    3.15013    5.02772    0.85952    5.02654       1.61673M/s         32         10             32        10k            1            1         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/7/process_time/real_time/threads:1024        6.09 ms          109 ms         1024    6.07929     10.054    0.85952    10.0565        1.6432M/s         32         10             32        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:1           9.98 ms         9.98 ms          420    9.9673m   9.97944m    0.86074    4.19136       1002.06k/s         32         10             64        10k            1            1          4.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:2           8.59 ms         17.2 ms          490  0.0171592  0.0171897    0.86074    4.21143       1.16462M/s         32         10             64        10k            1            1          4.9M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:4           8.57 ms         34.2 ms          496   0.034272  0.0343877    0.86074    4.26413       1.16666M/s         32         10             64        10k            1            1         4.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:8           8.54 ms         68.0 ms          504  0.0683426  0.0688308    0.86074    4.33631       1.17032M/s         32         10             64        10k            1            1         5.04M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:16          8.08 ms          122 ms          512   0.069489   0.138153    0.86074    4.42081       1.23816M/s         32         10             64        10k            1            1         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:32          8.10 ms          220 ms          512  0.0766018   0.276059    0.86074    4.41675         1.234M/s         32         10             64        10k            1            1         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:64          8.02 ms          217 ms          640   0.103561   0.553866    0.86074    5.53882       1.24647M/s         32         10             64        10k            1            1          6.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:128         7.80 ms          205 ms          640   0.228515     1.1366    0.86074    5.68335       1.28231M/s         32         10             64        10k            1            1          6.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:256         6.31 ms          173 ms          768   0.700406    2.23698    0.86074     6.7114        1.5852M/s         32         10             64        10k            1            1         7.68M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:512         7.31 ms          173 ms         1024    1.82101    4.88175    0.86074    9.76496        1.3681M/s         32         10             64        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/8/process_time/real_time/threads:1024        7.06 ms          130 ms         1024    6.70106    11.1416    0.86074    11.1378       1.41653M/s         32         10             64        10k            1            1        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:1           11.3 ms         11.3 ms          371  0.0112863  0.0112984    0.86233    4.19169       885.085k/s        192         10             12        10k            1            4         3.71M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:2           10.6 ms         21.3 ms          394  0.0212824  0.0213243    0.86233     4.2009       939.104k/s        192         10             12        10k            1            4         3.94M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:4           10.6 ms         42.4 ms          404  0.0424963  0.0426695   0.862405    4.30958       940.934k/s        192         10             12        10k            1            4         4.04M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:8           10.6 ms         84.2 ms          416   0.084674  0.0854072   0.862399    4.44098       944.633k/s        192         10             12        10k            1            4         4.16M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:16          9.78 ms          149 ms          416  0.0844253   0.171457   0.862386    4.45782       1022.16k/s        192         10             12        10k            1            4         4.16M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:32          9.87 ms          262 ms          320   0.101333   0.348161   0.862402     3.4819       1013.54k/s        192         10             12        10k            1            4          3.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:64          9.85 ms          269 ms          512   0.134169   0.686461   0.862388     5.4914       1014.98k/s        192         10             12        10k            1            4         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:128         9.39 ms          251 ms          640   0.276207    1.38894   0.862392    6.94418       1064.68k/s        192         10             12        10k            1            4          6.4M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:256         8.51 ms          226 ms          768   0.664331    2.74991   0.862395    8.25052       1.17532M/s        192         10             12        10k            1            4         7.68M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:512         8.26 ms          163 ms          512    4.09056    6.62565   0.862392    6.62668        1.2111M/s        192         10             12        10k            1            4         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/9/process_time/real_time/threads:1024        8.25 ms          165 ms         1024    7.79676    13.2496   0.862393    13.2492        1.2124M/s        192         10             12        10k            1            4        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:1          11.6 ms         11.6 ms          362  0.0115814  0.0115936    0.86233    4.19689       862.544k/s        256         10             12        10k            1            4         3.62M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:2          10.9 ms         21.8 ms          384   0.021822  0.0218655    0.86231    4.19811       915.875k/s        256         10             12        10k            1            4         3.84M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:4          10.9 ms         43.5 ms          396   0.043599  0.0437801   0.862337    4.33426       917.138k/s        256         10             12        10k            1            4         3.96M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:8          10.8 ms         85.7 ms          400  0.0863382  0.0874482   0.862325    4.37236       926.423k/s        256         10             12        10k            1            4            4M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:16         10.6 ms          167 ms          400  0.0883175   0.174934   0.862319    4.37312       939.601k/s        256         10             12        10k            1            4            4M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:32         10.5 ms          283 ms          320   0.104093   0.356726   0.862339    3.56713       949.441k/s        256         10             12        10k            1            4          3.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:64         10.2 ms          280 ms          576   0.129415   0.703408   0.862337    6.33077        982.85k/s        256         10             12        10k            1            4         5.76M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:128        8.80 ms          233 ms          512   0.343449    1.40981   0.862322    5.63972       1.13646M/s        256         10             12        10k            1            4         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:256        7.12 ms          192 ms          512    1.15827    2.86344   0.862328    5.72749       1.40387M/s        256         10             12        10k            1            4         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:512        8.79 ms          172 ms          512    4.17641    6.97207   0.862327      6.973       1.13779M/s        256         10             12        10k            1            4         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/10/process_time/real_time/threads:1024       6.62 ms          152 ms         1024    6.57826    12.2257   0.862327    12.2229        1.5111M/s        256         10             12        10k            1            4        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:1          36.9 ms         36.9 ms          114  0.0368472  0.0368597    0.86287    4.20201       271.299k/s        256         10             10        10k            1           16         1.14M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:2          36.2 ms         72.1 ms          118  0.0722891  0.0726126    0.86286    4.28421       276.606k/s        256         10             10        10k            1           16         1.18M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:4          35.9 ms          143 ms          120   0.143505   0.145338    0.86302    4.36023       278.704k/s        256         10             10        10k            1           16          1.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:8          35.3 ms          278 ms          120   0.282388   0.290878   0.862964    4.36313        283.28k/s        256         10             10        10k            1           16          1.2M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:16         35.0 ms          550 ms          208   0.290946   0.580669   0.862944    7.54849       285.853k/s        256         10             10        10k            1           16         2.08M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:32         33.5 ms          904 ms          192   0.349906    1.17199   0.862968    7.03196       298.529k/s        256         10             10        10k            1           16         1.92M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:64         29.9 ms          814 ms          192   0.622259    2.33434   0.862967    7.00307       334.476k/s        256         10             10        10k            1           16         1.92M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:128        26.4 ms          724 ms          256    1.50934    4.68885    0.86296    9.37717       379.142k/s        256         10             10        10k            1           16         2.56M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:256        18.6 ms          484 ms          256    4.73442    9.36847   0.862968    9.36776        539.03k/s        256         10             10        10k            1           16         2.56M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:512        19.2 ms          511 ms          512    9.78025    19.0511   0.862962    19.0503       520.948k/s        256         10             10        10k            1           16         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/11/process_time/real_time/threads:1024       19.2 ms          516 ms         1024    18.9658    37.4673   0.862964     37.465        519.95k/s        256         10             10        10k            1           16        10.24M dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:1           117 ms          117 ms           36   0.116497    0.11651    0.86331    4.19438       85.8293k/s        512         10             32        10k            1           16          360k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:2           113 ms          225 ms           38   0.226888   0.229925   0.863095    4.36864       88.1424k/s        512         10             32        10k            1           16          380k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:4           111 ms          435 ms           40   0.442771   0.460008   0.863245    4.60008       90.3363k/s        512         10             32        10k            1           16          400k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:8           109 ms          849 ms           64   0.869759   0.920014   0.863194    7.36009       91.9771k/s        512         10             32        10k            1           16          640k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:16          102 ms         1543 ms           64   0.927517    1.84272   0.863165    7.37078       98.3341k/s        512         10             32        10k            1           16          640k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:32         87.5 ms         2213 ms           64    1.42289    3.70456    0.86317    7.40899       114.339k/s        512         10             32        10k            1           16          640k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:64         60.8 ms         1527 ms           64    3.88789     7.5046   0.863181    7.50444       164.586k/s        512         10             32        10k            1           16          640k dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:128        58.8 ms         1594 ms          128    7.52457    14.8276   0.863174    14.8279       170.075k/s        512         10             32        10k            1           16         1.28M dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:256        59.3 ms         1608 ms          256    15.1861    29.8326   0.863175    29.8238       168.527k/s        512         10             32        10k            1           16         2.56M dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:512        58.7 ms         1619 ms          512    29.9228    59.2306   0.863167     59.232       170.282k/s        512         10             32        10k            1           16         5.12M dataset_memory_type="device"
cuvs_cagra_q_iterative/12/process_time/real_time/threads:1024       65.8 ms         1772 ms         1024    66.6036    118.757   0.863165    118.754       151.918k/s        512         10             32        10k            1           16        10.24M dataset_memory_type="device"


robertmaynard

This work needs to be rebased on release/26.04, and the changes pulled in from main (26.06) need to be removed.


achirkin

Thanks for the update, and especially for the code comments! The atomic+mutex logic looks good to me.
Also, regarding the benchmarks: how does the atomic+mutex variant compare with the mutex-only variant now?

irina-resh-nvda · 2026-03-25T13:06:30Z

Thanks for the update and especially for the code comments! The atomic+mutex logic looks good to me. Also regarding the benchmarks - how the atomic+mutex variant looks against mutex-only variant now?

@achirkin
The mutex+atomics variant with acquire/release ordering is on average 0.07% faster, which is essentially no change. My benchmarks don't really measure this change, though, since few divergent kernel signatures get swapped.

achirkin · 2026-03-25T15:25:18Z

/merge

Rebased successfully

…AGRA search switches kernel variants (rapidsai#1851)

Fix a bug in `safely_launch_kernel_with_smem_size` where `cudaFuncSetAttribute` was skipped for kernels that needed it. The function tracked the max shared memory in a single static variable per KernelT type, but `cudaFuncSetAttribute` applies per function pointer value, and the single-CTA CAGRA [search](https://github.com/rapidsai/cuvs/blob/d7a28aa1cb7648fa61037ed0459df0ec0e9db841/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L1373C4-L1375C78) dispatches multiple kernel instantiations that share the same pointer type. When one kernel bumped the tracked max, a different kernel whose smem fell between its own previous max and the global max would skip `cudaFuncSetAttribute`, causing `cudaErrorInvalidValue`. The fix tracks the kernel pointer identity alongside a monotonically growing smem high-water mark: when the pointer changes, the new kernel is brought up to the high-water mark; when smem exceeds it, the mark is grown.

## Error in question

```c++
$ CUVS_CAGRA_ANN_BENCH --search --data_prefix='<DATA_DIR>/' --benchmark_out_format=csv --benchmark_out=res_search_iter_cagra.csv --benchmark_counters_tabular=true --override_kv=dataset_memory_type:\"device\" <CONFIG_DIR>/laion_1M_cagra_iterative.json
[I] [12:28:52.095261] Using the query file '<DATA_DIR>/laion_1M/queries.fbin'
[I] [12:28:52.096141] Using the ground truth file '<DATA_DIR>/laion_1M/groundtruth.1M.neighbors.ibin'
2026-02-25T12:28:52+00:00
Running CUVS_CAGRA_ANN_BENCH
Run on (224 X 800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 307200 KiB (x2)
Load Average: 0.70, 0.44, 0.28
dataset: laion_1M
dim: 768
distance: euclidean
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                          Time      CPU  Iterations         GPU     Latency   Recall  end_to_end  items_per_second  itopk   k  max_iterations  n_queries  refine_ratio  search_width  total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/0/process_time/real_time   5.70 ms  5.70 ms         121    5.68808m    5.69994m  0.96424    0.689692        1.75441M/s     64  10               8        10k             1             2          1.21M  dataset_memory_type="device"
cuvs_cagra_iterative/1/process_time/real_time   5.70 ms  5.70 ms         121     5.6863m    5.69879m  0.96424    0.689553        1.75477M/s     64  10               8        10k             1             2          1.21M  dataset_memory_type="device"
cuvs_cagra_iterative/2/process_time/real_time   4.92 ms  4.92 ms         140    4.90351m    4.91567m  0.96046    0.688193        2.03432M/s    128  10              12        10k             1             1           1.4M  dataset_memory_type="device"
cuvs_cagra_iterative/3/process_time/real_time   5.99 ms  5.99 ms         115    5.97476m    5.98617m  0.97519    0.688409        1.67052M/s    128  10              16        10k             1             1          1.15M  dataset_memory_type="device"
cuvs_cagra_iterative/4/process_time/real_time   6.97 ms  6.97 ms          99    6.95873m     6.9703m  0.98129    0.690059        1.43466M/s    256  10              16        10k             1             1           990k  dataset_memory_type="device"
cuvs_cagra_iterative/5/process_time/real_time   10.5 ms  10.5 ms          66    0.010479   0.0104908  0.98548    0.692391        953.222k/s    512  10              10        10k             1             2           660k  dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                          Time      CPU  Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/6/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
Obtained 19 stack frames
#1 in CUVS_CAGRA_ANN_BENCH: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#2 in libcuvs.so: void cuvs::neighbors::cagra::detail::single_cta_search::select_and_run<float, unsigned int, float, unsigned int, cuvs::neighbors::filtering::none_sample_filter>(...)
#3 in libcuvs.so: cuvs::neighbors::cagra::detail::single_cta_search::search<float, unsigned int, float, cuvs::neighbors::filtering::none_sample_filter, unsigned int, long>::operator()(...)
#4 in libcuvs.so(+0x18fd0f1)
#5 in libcuvs.so: void cuvs::neighbors::cagra::search<float, unsigned int, long>(...)
#6-#19 in CUVS_CAGRA_ANN_BENCH / libc.so.6
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                          Time      CPU  Iterations         GPU     Latency   Recall  end_to_end  items_per_second  itopk   k  max_iterations  n_queries  refine_ratio  search_width  total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/7/process_time/real_time   10.5 ms  10.5 ms          66   0.0105088   0.0105202  0.98663    0.694332        950.555k/s     32  10              32        10k             1             1           660k  dataset_memory_type="device"
cuvs_cagra_iterative/8/process_time/real_time   12.8 ms  12.8 ms          54    0.012796   0.0128079  0.98807    0.691628        780.768k/s     32  10              64        10k             1             1           540k  dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                          Time      CPU  Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/9/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
cuvs_cagra_iterative/10/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                          Time      CPU  Iterations         GPU     Latency   Recall  end_to_end  items_per_second  itopk   k  max_iterations  n_queries  refine_ratio  search_width  total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/11/process_time/real_time  46.1 ms  46.2 ms          15   0.0461323   0.0461439  0.99131    0.692158        216.714k/s    256  10              10        10k             1            16           150k  dataset_memory_type="device"
cuvs_cagra_iterative/12/process_time/real_time   142 ms   142 ms           5    0.141713    0.141725  0.99198    0.708627        70.5591k/s    512  10              32        10k             1            16            50k  dataset_memory_type="device"
```

## Config

```
{
  "dataset": {
    "name": "laion_1M",
    "base_file": "laion_1M/base.1M.fbin",
    "subset_size": 1000000,
    "query_file": "laion_1M/queries.fbin",
    "groundtruth_neighbors_file": "laion_1M/groundtruth.1M.neighbors.ibin",
    "distance": "euclidean"
  },
  "search_basic_param": {
    "batch_size": 10000,
    "k": 10
  },
  "index": [
    {
      "name": "cuvs_cagra_iterative",
      "algo": "cuvs_cagra",
      "build_param": {
        "graph_degree": 64,
        "intermediate_graph_degree": 128,
        "search_width": 1
      },
      "file": "laion_1M/cagra/q_coarse_iterative.ibin",
      "search_params": [
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 256, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 512, "search_width": 2, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 256, "search_width": 2, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 32, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 64, "refine_ratio": 1},
        {"itopk": 192, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 16, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 512, "search_width": 16, "max_iterations": 32, "refine_ratio": 1}
      ]
    }
  ]
}
```

Authors:
- https://github.com/irina-resh-nvda

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#1851
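The failure mode described in the commit message can be reproduced without a GPU. The following is a minimal host-only sketch, not the code from this PR: `kernel_a`, `kernel_b`, `mock_set_max_dynamic_smem`, `per_type_max_tracker`, `pointer_aware_tracker`, and `configured_limit` are all hypothetical names, and the mock stands in for `cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, bytes)`. It assumes only what the description states: the attribute applies per function pointer, and the old bookkeeping kept one maximum per pointer type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-ins for two kernel instantiations that share one pointer type.
using kernel_t = void (*)(int);
void kernel_a(int) {}
void kernel_b(int) {}

// Mock for the per-function attribute call: records (kernel, bytes) pairs.
std::vector<std::pair<kernel_t, std::size_t>> g_attr_calls;
void mock_set_max_dynamic_smem(kernel_t k, std::size_t bytes) {
  g_attr_calls.emplace_back(k, bytes);
}

// Largest limit ever configured for one specific kernel (0 if never set).
std::size_t configured_limit(kernel_t k) {
  std::size_t limit = 0;
  for (const auto& call : g_attr_calls) {
    if (call.first == k && call.second > limit) { limit = call.second; }
  }
  return limit;
}

// Old bookkeeping as described: a single running maximum for the whole
// pointer type, regardless of which kernel the attribute was applied to.
struct per_type_max_tracker {
  std::size_t max_smem = 0;
  void prepare(kernel_t k, std::size_t smem) {
    assert(k != nullptr);
    if (smem > max_smem) {
      mock_set_max_dynamic_smem(k, smem);
      max_smem = smem;
    }
  }
};

// Fixed bookkeeping as described: remember which kernel was configured last
// and keep a monotonically growing high-water mark.
struct pointer_aware_tracker {
  kernel_t last_kernel = nullptr;
  std::size_t high_water = 0;
  void prepare(kernel_t k, std::size_t smem) {
    assert(k != nullptr);
    const bool kernel_changed = (k != last_kernel);
    const bool needs_growth = (smem > high_water);
    if (!kernel_changed && !needs_growth) { return; }
    if (needs_growth) { high_water = smem; }
    // A newly selected kernel is brought up to the high-water mark.
    mock_set_max_dynamic_smem(k, high_water);
    last_kernel = k;
  }
};
```

With the old tracker, preparing `kernel_a` with 96 KiB and then `kernel_b` with 64 KiB never configures `kernel_b`, which matches the reported `cudaErrorInvalidValue` at launch. With the pointer-aware tracker, the switch to `kernel_b` triggers a call that raises it to the 96 KiB mark.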

github-project-automation bot added this to Unstructured Data Processing Feb 25, 2026

irina-resh-nvda self-assigned this Feb 25, 2026

irina-resh-nvda added the bug and non-breaking labels Feb 25, 2026

irina-resh-nvda marked this pull request as ready for review February 25, 2026 15:37

irina-resh-nvda requested a review from a team as a code owner February 25, 2026 15:37

irina-resh-nvda changed the title ~~[REVIEW] Fix cudaFuncSetAttribute not being called when CAGRA search switches kernel variants~~ [REVIEW] cuVS bench: Fix cudaFuncSetAttribute not being called when CAGRA search switches kernel variants Feb 25, 2026

divyegala mentioned this pull request Feb 25, 2026: JIT LTO Cagra Search #1807 (Open, 8 tasks)

achirkin reviewed Feb 25, 2026


mythrocks reviewed Feb 25, 2026


divyegala previously requested changes Feb 26, 2026


achirkin requested changes Mar 3, 2026


mohanprasand-nuvai mentioned this pull request Mar 23, 2026: feat: Rust API safety, serialization, filtering, and upstream bug fixes Nuvai/cuvs#1 (Merged, 7 tasks)

irina-resh-nvda requested a review from achirkin March 23, 2026 21:13

achirkin changed the base branch from main to release/26.04 March 24, 2026 08:17

achirkin requested a review from a team as a code owner March 24, 2026 08:17

achirkin requested review from a team as code owners March 24, 2026 08:17

achirkin requested a review from KyleFromNVIDIA March 24, 2026 08:17

robertmaynard previously requested changes Mar 24, 2026


Fix double-checked locking races in safely_launch_kernel_with_smem_size by moving state checks and updates inside the mutex with acquire/release ordering on the lock-free fast path.

962c3b4
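One way to combine a mutex with a lock-free fast path for this state is sketched below. It is a hypothetical illustration, not the code from commit 962c3b4: the class `smem_attr_guard`, its members, and the injected `set_attr` callback (a stand-in for the `cudaFuncSetAttribute` call) are assumptions. All checks that lead to an update, and the updates themselves, happen under the mutex. The fast path only reads: it loads the high-water mark first and the kernel pointer second, while writers publish in the opposite order, so a reader that sees a given mark can only match a pointer that was configured to at least that mark.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

// Stand-ins for two kernel instantiations that share one pointer type.
using kernel_t = void (*)(int);
void kernel_a(int) {}
void kernel_b(int) {}

class smem_attr_guard {
 public:
  using set_attr_fn = std::function<void(kernel_t, std::size_t)>;
  explicit smem_attr_guard(set_attr_fn set_attr) : set_attr_(std::move(set_attr)) {}

  // Ensure `k` may be launched with `smem` bytes of dynamic shared memory.
  void prepare(kernel_t k, std::size_t smem) {
    assert(k != nullptr);
    // Fast path, no lock: mark first, then pointer (both acquire loads).
    const std::size_t mark = high_water_.load(std::memory_order_acquire);
    if (smem <= mark && last_kernel_.load(std::memory_order_acquire) == k) { return; }

    // Slow path: re-check and update while holding the mutex.
    std::lock_guard<std::mutex> lock(mutex_);
    const std::size_t current = high_water_.load(std::memory_order_relaxed);
    if (smem <= current && last_kernel_.load(std::memory_order_relaxed) == k) { return; }
    const std::size_t target = smem > current ? smem : current;
    set_attr_(k, target);
    // Publish pointer before mark; both are release stores so the attribute
    // call above is visible to any thread that takes the fast path later.
    last_kernel_.store(k, std::memory_order_release);
    high_water_.store(target, std::memory_order_release);
  }

 private:
  set_attr_fn set_attr_;
  std::mutex mutex_;
  std::atomic<kernel_t> last_kernel_{nullptr};
  std::atomic<std::size_t> high_water_{0};
};
```

A mutex-only variant would drop the two fast-path loads and take the lock on every call. As noted in the discussion, the difference is hard to measure when kernel variants are rarely swapped.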

irina-resh-nvda force-pushed the cuvsbench_smem_size_bug branch from c8de24a to 962c3b4 Compare March 25, 2026 11:35

irina-resh-nvda requested a review from robertmaynard March 25, 2026 11:36

achirkin requested review from robertmaynard and removed request for a team, KyleFromNVIDIA and robertmaynard March 25, 2026 12:44

achirkin approved these changes Mar 25, 2026


rapids-bot bot merged commit dbd29a6 into rapidsai:release/26.04 Mar 25, 2026
80 checks passed

github-project-automation bot moved this to Done in Unstructured Data Processing Mar 25, 2026

		auto last_kernel = current_kernel;
		auto last_smem_size = current_smem_size;
