Update actions/labeler to v4 #1
Merged
raydouglass merged 1 commit into rapidsai:branch-24.02 on Dec 4, 2023
ajschmidt8
approved these changes
Dec 4, 2023
Stardust-SJF
pushed a commit
to Stardust-SJF/cuvs_rabitq
that referenced
this pull request
Mar 2, 2026
* Download Eigen automatically by rapids-cmake
* Disable FAISS and DISKANN benchmarks
* add config files and update readme
* Update Readme and openai_1M config
* Update python bench command line
* update README
* update README

---------

Co-authored-by: James Xia <jamxia@nvidia.com>
rapids-bot bot
pushed a commit
that referenced
this pull request
Mar 25, 2026
…AGRA search switches kernel variants (#1851)

Fix a bug in `safely_launch_kernel_with_smem_size` where `cudaFuncSetAttribute` was skipped for kernels that needed it. The function tracked the max shared memory in a single static variable per KernelT type, but `cudaFuncSetAttribute` applies per function pointer value, and the single-CTA CAGRA [search](https://github.com/rapidsai/cuvs/blob/d7a28aa1cb7648fa61037ed0459df0ec0e9db841/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L1373C4-L1375C78) dispatches multiple kernel instantiations that share the same pointer type. When one kernel bumped the tracked max, a different kernel whose smem fell between its own previous max and the global max would skip `cudaFuncSetAttribute`, causing `cudaErrorInvalidValue`.

The fix tracks the kernel pointer identity alongside a monotonically growing smem high-water mark: when the pointer changes, the new kernel is brought up to the high-water mark; when smem exceeds it, the mark is grown.

## Error in question

```c++
$ CUVS_CAGRA_ANN_BENCH --search --data_prefix='<DATA_DIR>/' --benchmark_out_format=csv --benchmark_out=res_search_iter_cagra.csv --benchmark_counters_tabular=true --override_kv=dataset_memory_type:\"device\" <CONFIG_DIR>/laion_1M_cagra_iterative.json
[I] [12:28:52.095261] Using the query file '<DATA_DIR>/laion_1M/queries.fbin'
[I] [12:28:52.096141] Using the ground truth file '<DATA_DIR>/laion_1M/groundtruth.1M.neighbors.ibin'
2026-02-25T12:28:52+00:00
Running CUVS_CAGRA_ANN_BENCH
Run on (224 X 800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 307200 KiB (x2)
Load Average: 0.70, 0.44, 0.28
dataset: laion_1M
dim: 768
distance: euclidean
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations GPU Latency Recall end_to_end items_per_second itopk k max_iterations n_queries refine_ratio search_width total_queries
--------------------------------------------------------------------------------
cuvs_cagra_iterative/0/process_time/real_time 5.70 ms 5.70 ms 121 5.68808m 5.69994m 0.96424 0.689692 1.75441M/s 64 10 8 10k 1 2 1.21M dataset_memory_type="device"
cuvs_cagra_iterative/1/process_time/real_time 5.70 ms 5.70 ms 121 5.6863m 5.69879m 0.96424 0.689553 1.75477M/s 64 10 8 10k 1 2 1.21M dataset_memory_type="device"
cuvs_cagra_iterative/2/process_time/real_time 4.92 ms 4.92 ms 140 4.90351m 4.91567m 0.96046 0.688193 2.03432M/s 128 10 12 10k 1 1 1.4M dataset_memory_type="device"
cuvs_cagra_iterative/3/process_time/real_time 5.99 ms 5.99 ms 115 5.97476m 5.98617m 0.97519 0.688409 1.67052M/s 128 10 16 10k 1 1 1.15M dataset_memory_type="device"
cuvs_cagra_iterative/4/process_time/real_time 6.97 ms 6.97 ms 99 6.95873m 6.9703m 0.98129 0.690059 1.43466M/s 256 10 16 10k 1 1 990k dataset_memory_type="device"
cuvs_cagra_iterative/5/process_time/real_time 10.5 ms 10.5 ms 66 0.010479 0.0104908 0.98548 0.692391 953.222k/s 512 10 10 10k 1 2 660k dataset_memory_type="device"
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
cuvs_cagra_iterative/6/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
Obtained 19 stack frames
#1 in CUVS_CAGRA_ANN_BENCH: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#2 in libcuvs.so: void cuvs::neighbors::cagra::detail::single_cta_search::select_and_run<float, unsigned int, float, unsigned int, cuvs::neighbors::filtering::none_sample_filter>(...)
#3 in libcuvs.so: cuvs::neighbors::cagra::detail::single_cta_search::search<float, unsigned int, float, cuvs::neighbors::filtering::none_sample_filter, unsigned int, long>::operator()(...)
#4 in libcuvs.so(+0x18fd0f1)
#5 in libcuvs.so: void cuvs::neighbors::cagra::search<float, unsigned int, long>(...)
#6-#19 in CUVS_CAGRA_ANN_BENCH / libc.so.6
'
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations GPU Latency Recall end_to_end items_per_second itopk k max_iterations n_queries refine_ratio search_width total_queries
--------------------------------------------------------------------------------
cuvs_cagra_iterative/7/process_time/real_time 10.5 ms 10.5 ms 66 0.0105088 0.0105202 0.98663 0.694332 950.555k/s 32 10 32 10k 1 1 660k dataset_memory_type="device"
cuvs_cagra_iterative/8/process_time/real_time 12.8 ms 12.8 ms 54 0.012796 0.0128079 0.98807 0.691628 780.768k/s 32 10 64 10k 1 1 540k dataset_memory_type="device"
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
cuvs_cagra_iterative/9/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
cuvs_cagra_iterative/10/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations GPU Latency Recall end_to_end items_per_second itopk k max_iterations n_queries refine_ratio search_width total_queries
--------------------------------------------------------------------------------
cuvs_cagra_iterative/11/process_time/real_time 46.1 ms 46.2 ms 15 0.0461323 0.0461439 0.99131 0.692158 216.714k/s 256 10 10 10k 1 16 150k dataset_memory_type="device"
cuvs_cagra_iterative/12/process_time/real_time 142 ms 142 ms 5 0.141713 0.141725 0.99198 0.708627 70.5591k/s 512 10 32 10k 1 16 50k dataset_memory_type="device"
```

## Config

```
{
  "dataset": {
    "name": "laion_1M",
    "base_file": "laion_1M/base.1M.fbin",
    "subset_size": 1000000,
    "query_file": "laion_1M/queries.fbin",
    "groundtruth_neighbors_file": "laion_1M/groundtruth.1M.neighbors.ibin",
    "distance": "euclidean"
  },
  "search_basic_param": {
    "batch_size": 10000,
    "k": 10
  },
  "index": [
    {
      "name": "cuvs_cagra_iterative",
      "algo": "cuvs_cagra",
      "build_param": {
        "graph_degree": 64,
        "intermediate_graph_degree": 128,
        "search_width": 1
      },
      "file": "laion_1M/cagra/q_coarse_iterative.ibin",
      "search_params": [
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 256, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 512, "search_width": 2, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 256, "search_width": 2, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 32, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 64, "refine_ratio": 1},
        {"itopk": 192, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 16, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 512, "search_width": 16, "max_iterations": 32, "refine_ratio": 1}
      ]
    }
  ]
}
```

Authors:
- https://github.com/irina-resh-nvda

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)

URL: #1851
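The tracking logic described in the commit message above can be illustrated with a small host-only sketch. This is not the cuVS implementation: `MockRuntime`, `BuggyLauncher`, `FixedLauncher`, `kernel_a`, and `kernel_b` are hypothetical names, and the mock only models the one property the message relies on, namely that the shared-memory opt-in set by `cudaFuncSetAttribute` is stored per function pointer and that launches above a kernel's own limit fail. The 48 KiB default reflects the usual CUDA opt-in threshold for dynamic shared memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for two kernel instantiations sharing one pointer type.
using KernelT = int (*)();
int kernel_a() { return 1; }
int kernel_b() { return 2; }

// Assumed default limit before any opt-in (48 KiB).
constexpr std::uint32_t kDefaultLimit = 48u * 1024u;

// Mock of the runtime: the limit is stored per function pointer value.
struct MockRuntime {
  std::map<KernelT, std::uint32_t> limit;

  void set_attribute(KernelT k, std::uint32_t smem) { limit[k] = smem; }

  // Returns false where the real runtime would report cudaErrorInvalidValue.
  bool launch(KernelT k, std::uint32_t smem) const {
    auto it = limit.find(k);
    std::uint32_t allowed = (it == limit.end()) ? kDefaultLimit : it->second;
    return smem <= allowed;
  }
};

// Old behaviour: one tracked maximum for every kernel of the same type.
struct BuggyLauncher {
  MockRuntime rt;
  std::uint32_t tracked_max = kDefaultLimit;

  bool launch(KernelT k, std::uint32_t smem) {
    if (smem > tracked_max) {
      rt.set_attribute(k, smem);
      tracked_max = smem;
    }
    return rt.launch(k, smem);
  }
};

// Fixed behaviour: remember the last pointer plus a growing high-water mark.
struct FixedLauncher {
  MockRuntime rt;
  KernelT last_kernel = nullptr;
  std::uint32_t high_water = kDefaultLimit;

  bool launch(KernelT k, std::uint32_t smem) {
    if (smem > high_water) {
      high_water = smem;                // grow the mark
      rt.set_attribute(k, high_water);
      last_kernel = k;
    } else if (k != last_kernel) {
      rt.set_attribute(k, high_water);  // bring the new kernel up to the mark
      last_kernel = k;
    }
    return rt.launch(k, smem);
  }
};

// Replays the failing pattern: a large request on one kernel, then a smaller
// (but still above-default) request on a different kernel.
void demo() {
  BuggyLauncher buggy;
  assert(buggy.launch(kernel_a, 60u * 1024u));
  assert(!buggy.launch(kernel_b, 50u * 1024u));  // attribute was never set for kernel_b

  FixedLauncher fixed;
  assert(fixed.launch(kernel_a, 60u * 1024u));
  assert(fixed.launch(kernel_b, 50u * 1024u));
}
```

In this model the extra cost of the fix is one attribute call each time the dispatched kernel pointer changes, which is the trade-off the commit message describes.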
jrbourbeau
pushed a commit
to jrbourbeau/cuvs
that referenced
this pull request
Mar 25, 2026
…AGRA search switches kernel variants (rapidsai#1851)
RAPIDS repos are using the `main` branch of https://github.com/actions/labeler, which recently introduced breaking changes. This PR pins to the latest v4 release of the labeler action until we can evaluate the changes required for v5.