
Update actions/labeler to v4 #1

Merged
raydouglass merged 1 commit into rapidsai:branch-24.02 from raydouglass:labeler-v4-fix on Dec 4, 2023

Conversation

@raydouglass
Contributor

RAPIDS repos are using the main branch of https://github.com/actions/labeler which recently introduced breaking changes.

This PR pins to the latest v4 release of the labeler action until we can evaluate the changes required for v5.

@raydouglass requested a review from a team as a code owner on December 4, 2023 18:54
@raydouglass added the non-breaking (Introduces a non-breaking change) and bug (Something isn't working) labels on Dec 4, 2023
@raydouglass merged commit 7c47b16 into rapidsai:branch-24.02 on Dec 4, 2023
@raydouglass deleted the labeler-v4-fix branch on December 4, 2023 19:09
copy-pr-bot bot pushed a commit that referenced this pull request Sep 4, 2024
Stardust-SJF pushed a commit to Stardust-SJF/cuvs_rabitq that referenced this pull request Mar 2, 2026
* Download Eigen automatically by rapids-cmake

* Disable FAISS and DISKANN benchmarks

* add config files and update readme

* Update Readme and openai_1M config

* Update python bench command line

* update README

* update README

---------

Co-authored-by: James Xia <jamxia@nvidia.com>
rapids-bot bot pushed a commit that referenced this pull request Mar 25, 2026
…AGRA search switches kernel variants (#1851)

Fix a bug in `safely_launch_kernel_with_smem_size` where `cudaFuncSetAttribute` was skipped for kernels that needed it. The function tracked the max shared memory in a single static variable per KernelT type, but `cudaFuncSetAttribute` applies per function pointer value — and the single-CTA CAGRA [search](https://github.com/rapidsai/cuvs/blob/d7a28aa1cb7648fa61037ed0459df0ec0e9db841/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L1373C4-L1375C78) dispatches multiple kernel instantiations that share the same pointer type. When one kernel bumped the tracked max, a different kernel whose smem fell between its own previous max and the global max would skip `cudaFuncSetAttribute`, causing `cudaErrorInvalidValue`. The fix tracks the kernel pointer identity alongside a monotonically growing smem high-water mark: when the pointer changes, the new kernel is brought up to the high-water mark; when smem exceeds it, the mark is grown.
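The failure mode and the fix described above can be replayed on the host without a GPU. The following is a minimal sketch under stated assumptions, not the cuVS implementation: `Kernel`, `MockRuntime`, `SingleMaxTracker`, `PointerAwareTracker`, and `second_kernel_launches` are hypothetical names, and `MockRuntime` only imitates the per-function limit that `cudaFuncSetAttribute` raises (48 KiB of dynamic shared memory is the default before any opt-in).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>

// Default dynamic shared memory limit per kernel before any opt-in.
constexpr std::size_t kDefaultLimit = 48 * 1024;

// Hypothetical stand-in for one kernel instantiation; only its address matters.
struct Kernel {
  const char* name;
};

// Mock of the per-function limit that cudaFuncSetAttribute controls.
// Assumption: a launch requesting more than the configured limit fails.
struct MockRuntime {
  std::map<const Kernel*, std::size_t> limit;

  void set_max_dynamic_smem(const Kernel* k, std::size_t bytes) { limit[k] = bytes; }

  bool launch(const Kernel* k, std::size_t smem) const {
    auto it = limit.find(k);
    const std::size_t allowed = (it == limit.end()) ? kDefaultLimit : it->second;
    return smem <= allowed;
  }
};

// Pre-fix logic as described: one tracked maximum shared by every kernel.
struct SingleMaxTracker {
  std::size_t tracked_max = kDefaultLimit;

  bool launch(MockRuntime& rt, const Kernel* k, std::size_t smem) {
    if (smem > tracked_max) {  // skipped when another kernel already raised the max
      rt.set_max_dynamic_smem(k, smem);
      tracked_max = smem;
    }
    return rt.launch(k, smem);
  }
};

// Post-fix logic as described: last kernel identity plus a growing high-water mark.
struct PointerAwareTracker {
  const Kernel* last_kernel = nullptr;
  std::size_t high_water    = kDefaultLimit;

  bool launch(MockRuntime& rt, const Kernel* k, std::size_t smem) {
    if (k != last_kernel || smem > high_water) {
      high_water = std::max(high_water, smem);  // the mark never shrinks
      rt.set_max_dynamic_smem(k, high_water);   // bring this kernel up to the mark
      last_kernel = k;
    }
    return rt.launch(k, smem);
  }
};

// Replays kernel A at 64 KiB followed by kernel B at 56 KiB and reports
// whether the second launch is accepted by the mock runtime.
template <typename Tracker>
bool second_kernel_launches() {
  Kernel a{"variant_a"};
  Kernel b{"variant_b"};
  MockRuntime rt;
  Tracker tracker;
  const bool first = tracker.launch(rt, &a, 64 * 1024);
  assert(first);
  (void)first;
  return tracker.launch(rt, &b, 56 * 1024);
}
```

Calling `second_kernel_launches<SingleMaxTracker>()` models the skipped attribute call: kernel B asks for 56 KiB, which is below the shared maximum of 64 KiB but above B's own unchanged limit. The same sequence with `PointerAwareTracker` raises B's limit because the kernel identity changed.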

## Error in question
```
$ CUVS_CAGRA_ANN_BENCH --search --data_prefix='<DATA_DIR>/' --benchmark_out_format=csv --benchmark_out=res_search_iter_cagra.csv --benchmark_counters_tabular=true --override_kv=dataset_memory_type:\"device\" <CONFIG_DIR>/laion_1M_cagra_iterative.json
[I] [12:28:52.095261] Using the query file '<DATA_DIR>/laion_1M/queries.fbin'
[I] [12:28:52.096141] Using the ground truth file '<DATA_DIR>/laion_1M/groundtruth.1M.neighbors.ibin'
2026-02-25T12:28:52+00:00
Running CUVS_CAGRA_ANN_BENCH
Run on (224 X 800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 307200 KiB (x2)
Load Average: 0.70, 0.44, 0.28
dataset: laion_1M
dim: 768
distance: euclidean
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/0/process_time/real_time        5.70 ms         5.70 ms          121   5.68808m   5.69994m    0.96424   0.689692       1.75441M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/1/process_time/real_time        5.70 ms         5.70 ms          121    5.6863m   5.69879m    0.96424   0.689553       1.75477M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/2/process_time/real_time        4.92 ms         4.92 ms          140   4.90351m   4.91567m    0.96046   0.688193       2.03432M/s        128         10             12        10k            1            1          1.4M dataset_memory_type="device"
cuvs_cagra_iterative/3/process_time/real_time        5.99 ms         5.99 ms          115   5.97476m   5.98617m    0.97519   0.688409       1.67052M/s        128         10             16        10k            1            1         1.15M dataset_memory_type="device"
cuvs_cagra_iterative/4/process_time/real_time        6.97 ms         6.97 ms           99   6.95873m    6.9703m    0.98129   0.690059       1.43466M/s        256         10             16        10k            1            1          990k dataset_memory_type="device"
cuvs_cagra_iterative/5/process_time/real_time        10.5 ms         10.5 ms           66   0.010479  0.0104908    0.98548   0.692391       953.222k/s        512         10             10        10k            1            2          660k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/6/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
Obtained 19 stack frames
#1 in CUVS_CAGRA_ANN_BENCH: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#2 in libcuvs.so: void cuvs::neighbors::cagra::detail::single_cta_search::select_and_run<float, unsigned int, float, unsigned int, cuvs::neighbors::filtering::none_sample_filter>(...)
#3 in libcuvs.so: cuvs::neighbors::cagra::detail::single_cta_search::search<float, unsigned int, float, cuvs::neighbors::filtering::none_sample_filter, unsigned int, long>::operator()(...)
#4 in libcuvs.so(+0x18fd0f1)
#5 in libcuvs.so: void cuvs::neighbors::cagra::search<float, unsigned int, long>(...)
#6-#19 in CUVS_CAGRA_ANN_BENCH / libc.so.6
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/7/process_time/real_time        10.5 ms         10.5 ms           66  0.0105088  0.0105202    0.98663   0.694332       950.555k/s         32         10             32        10k            1            1          660k dataset_memory_type="device"
cuvs_cagra_iterative/8/process_time/real_time        12.8 ms         12.8 ms           54   0.012796  0.0128079    0.98807   0.691628       780.768k/s         32         10             64        10k            1            1          540k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/9/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
cuvs_cagra_iterative/10/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/11/process_time/real_time       46.1 ms         46.2 ms           15  0.0461323  0.0461439    0.99131   0.692158       216.714k/s        256         10             10        10k            1           16          150k dataset_memory_type="device"
cuvs_cagra_iterative/12/process_time/real_time        142 ms          142 ms            5   0.141713   0.141725    0.99198   0.708627       70.5591k/s        512         10             32        10k            1           16           50k dataset_memory_type="device"
``` 

## Config
```json
{
  "dataset": {
    "name": "laion_1M",
    "base_file": "laion_1M/base.1M.fbin",
    "subset_size": 1000000,
    "query_file": "laion_1M/queries.fbin",
    "groundtruth_neighbors_file": "laion_1M/groundtruth.1M.neighbors.ibin",
    "distance": "euclidean"
  },
  "search_basic_param": {
    "batch_size": 10000,
    "k": 10
  },
  "index": [
  
    {
      "name": "cuvs_cagra_iterative",
      "algo": "cuvs_cagra",
      "build_param": { 
        "graph_degree": 64,
        "intermediate_graph_degree": 128,
        "search_width": 1
      },
      "file": "laion_1M/cagra/q_coarse_iterative.ibin",
      "search_params": [
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 256, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 512, "search_width": 2, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 256, "search_width": 2, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 32, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 64, "refine_ratio": 1},
        {"itopk": 192, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 16, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 512, "search_width": 16, "max_iterations": 32, "refine_ratio": 1}
      ]
    }
  ]
}

```

Authors:
  - https://github.com/irina-resh-nvda

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)

URL: #1851
jrbourbeau pushed a commit to jrbourbeau/cuvs that referenced this pull request Mar 25, 2026
…AGRA search switches kernel variants (rapidsai#1851)


Labels

bug (Something isn't working), non-breaking (Introduces a non-breaking change)
