Remove usages of rapids-env-update#3

Merged

ajschmidt8 merged 1 commit into rapidsai:branch-24.02 from KyleFromNVIDIA:remove-rapids-env-update on Jan 12, 2024

Conversation

@KyleFromNVIDIA
Member

Reference: https://github.com/rapidsai/ops/issues/2766

Replace rapids-env-update with rapids-configure-conda-channels,
rapids-configure-sccache, and rapids-date-string.

@KyleFromNVIDIA KyleFromNVIDIA requested a review from a team as a code owner January 12, 2024 15:16
@github-actions github-actions bot added the ci label Jan 12, 2024
@KyleFromNVIDIA KyleFromNVIDIA force-pushed the remove-rapids-env-update branch from 48df5c8 to 2eb48bc January 12, 2024 15:30
@ajschmidt8 ajschmidt8 added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 12, 2024
@ajschmidt8
Member

From these failures, it looks like the conda recipes for this project might have some issues.

@cjnolet, any thoughts here? I know this is a relatively new repo, so I'm not sure about its current status.

@cjnolet
Member

cjnolet commented Jan 12, 2024

@ajschmidt8 @KyleFromNVIDIA this is a new library, and we are in the process of preparing for release in the spring. CI and the packages aren't quite ready yet (we are essentially porting libraft and pylibraft over here from raft).

@ajschmidt8
Member

> @ajschmidt8 @KyleFromNVIDIA this is a new library and we are in the process of preparing for release in the spring. CI and the packages aren't quite ready yet (we are essentially porting libraft and pylibraft over here from raft)

OK, I will just admin-merge this PR then, since the changes are benign.

@ajschmidt8 ajschmidt8 merged commit d8fd59e into rapidsai:branch-24.02 Jan 12, 2024
copy-pr-bot bot pushed a commit that referenced this pull request Sep 4, 2024
rapids-bot bot pushed a commit that referenced this pull request Mar 25, 2026
…AGRA search switches kernel variants (#1851)

Fix a bug in `safely_launch_kernel_with_smem_size` where `cudaFuncSetAttribute` was skipped for kernels that needed it. The function tracked the maximum shared memory size in a single static variable per `KernelT` type, but `cudaFuncSetAttribute` applies per function pointer value, and the single-CTA CAGRA [search](https://github.com/rapidsai/cuvs/blob/d7a28aa1cb7648fa61037ed0459df0ec0e9db841/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L1373C4-L1375C78) dispatches multiple kernel instantiations that share the same pointer type. When one kernel raised the tracked maximum, a different kernel whose requested shared memory fell between its own previously configured limit and the global maximum would skip `cudaFuncSetAttribute`, so its launch failed with `cudaErrorInvalidValue`. The fix tracks the kernel pointer identity alongside a monotonically growing shared-memory high-water mark: when the pointer changes, the new kernel's limit is raised to the high-water mark; when the requested size exceeds the mark, the mark is grown.
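The failure mode described above can be modeled on the host without a GPU. The sketch below is a hypothetical illustration, not the cuVS code: `MockRuntime`, `BuggyTracker`, `FixedTracker`, `replay_switch_scenario`, and the `kernel_*_tag` handles are all made-up names, and the 48 KiB default limit models the CUDA rule that dynamic shared memory above 48 KiB requires an explicit `cudaFuncSetAttribute(..., cudaFuncAttributeMaxDynamicSharedMemorySize, ...)` opt-in per function. It contrasts a single per-type maximum with a pointer-aware high-water mark, and it omits any locking a multi-threaded caller would need.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical handle type standing in for a kernel function pointer.
using kernel_handle = const void*;

// Two distinct "kernel instantiations" sharing one handle type (illustrative only).
int kernel_a_tag = 0;
int kernel_b_tag = 0;

// Modeled default: dynamic shared memory above 48 KiB needs an explicit opt-in.
constexpr std::uint32_t kDefaultLimit = 48 * 1024;

// Hypothetical host-side stand-in for the CUDA runtime. Limits are stored per
// handle, mirroring how the attribute is configured per function.
struct MockRuntime {
  std::map<kernel_handle, std::uint32_t> limits;

  void set_max_dynamic_smem(kernel_handle k, std::uint32_t bytes) { limits[k] = bytes; }

  // Returns false where a real launch would fail with cudaErrorInvalidValue.
  bool launch(kernel_handle k, std::uint32_t smem) const {
    auto it = limits.find(k);
    std::uint32_t limit = (it == limits.end()) ? kDefaultLimit : it->second;
    return smem <= limit;
  }
};

// Old pattern (sketch): one maximum shared by every kernel of the same type.
struct BuggyTracker {
  std::uint32_t max_smem = 0;
  bool launch(MockRuntime& rt, kernel_handle k, std::uint32_t smem) {
    if (smem > max_smem) {  // skipped for a kernel that was never configured
      rt.set_max_dynamic_smem(k, smem);
      max_smem = smem;
    }
    return rt.launch(k, smem);
  }
};

// Fixed pattern (sketch): remember which kernel was last configured.
// Invariant: last_kernel's limit always equals high_water.
struct FixedTracker {
  kernel_handle last_kernel = nullptr;
  std::uint32_t high_water = 0;
  bool launch(MockRuntime& rt, kernel_handle k, std::uint32_t smem) {
    if (smem > high_water) {
      high_water = smem;  // grow the mark and apply it to this kernel
      rt.set_max_dynamic_smem(k, high_water);
      last_kernel = k;
    } else if (k != last_kernel) {
      rt.set_max_dynamic_smem(k, high_water);  // bring the new kernel up to the mark
      last_kernel = k;
    }
    return rt.launch(k, smem);
  }
};

// Replays a kernel switch: A needs 60 KiB, then B needs 50 KiB, then A again.
template <typename Tracker>
bool replay_switch_scenario() {
  MockRuntime rt;
  Tracker tracker;
  const bool ok_a = tracker.launch(rt, &kernel_a_tag, 60 * 1024);
  const bool ok_b = tracker.launch(rt, &kernel_b_tag, 50 * 1024);  // 48 KiB < 50 KiB < 60 KiB
  const bool ok_a2 = tracker.launch(rt, &kernel_a_tag, 40 * 1024);
  return ok_a && ok_b && ok_a2;
}
```

With the per-type maximum, kernel B's 50 KiB request is compared against the 60 KiB recorded for kernel A, so the attribute call is skipped and B keeps the default limit. The pointer-aware version re-applies the mark whenever the kernel changes, trading an extra attribute call per switch for correctness.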

## Error in question
```
$ CUVS_CAGRA_ANN_BENCH --search --data_prefix='<DATA_DIR>/' --benchmark_out_format=csv --benchmark_out=res_search_iter_cagra.csv --benchmark_counters_tabular=true --override_kv=dataset_memory_type:\"device\" <CONFIG_DIR>/laion_1M_cagra_iterative.json
[I] [12:28:52.095261] Using the query file '<DATA_DIR>/laion_1M/queries.fbin'
[I] [12:28:52.096141] Using the ground truth file '<DATA_DIR>/laion_1M/groundtruth.1M.neighbors.ibin'
2026-02-25T12:28:52+00:00
Running CUVS_CAGRA_ANN_BENCH
Run on (224 X 800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 307200 KiB (x2)
Load Average: 0.70, 0.44, 0.28
dataset: laion_1M
dim: 768
distance: euclidean
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/0/process_time/real_time        5.70 ms         5.70 ms          121   5.68808m   5.69994m    0.96424   0.689692       1.75441M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/1/process_time/real_time        5.70 ms         5.70 ms          121    5.6863m   5.69879m    0.96424   0.689553       1.75477M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/2/process_time/real_time        4.92 ms         4.92 ms          140   4.90351m   4.91567m    0.96046   0.688193       2.03432M/s        128         10             12        10k            1            1          1.4M dataset_memory_type="device"
cuvs_cagra_iterative/3/process_time/real_time        5.99 ms         5.99 ms          115   5.97476m   5.98617m    0.97519   0.688409       1.67052M/s        128         10             16        10k            1            1         1.15M dataset_memory_type="device"
cuvs_cagra_iterative/4/process_time/real_time        6.97 ms         6.97 ms           99   6.95873m    6.9703m    0.98129   0.690059       1.43466M/s        256         10             16        10k            1            1          990k dataset_memory_type="device"
cuvs_cagra_iterative/5/process_time/real_time        10.5 ms         10.5 ms           66   0.010479  0.0104908    0.98548   0.692391       953.222k/s        512         10             10        10k            1            2          660k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/6/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
Obtained 19 stack frames
#1 in CUVS_CAGRA_ANN_BENCH: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#2 in libcuvs.so: void cuvs::neighbors::cagra::detail::single_cta_search::select_and_run<float, unsigned int, float, unsigned int, cuvs::neighbors::filtering::none_sample_filter>(...)
#3 in libcuvs.so: cuvs::neighbors::cagra::detail::single_cta_search::search<float, unsigned int, float, cuvs::neighbors::filtering::none_sample_filter, unsigned int, long>::operator()(...)
#4 in libcuvs.so(+0x18fd0f1)
#5 in libcuvs.so: void cuvs::neighbors::cagra::search<float, unsigned int, long>(...)
#6-#19 in CUVS_CAGRA_ANN_BENCH / libc.so.6
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/7/process_time/real_time        10.5 ms         10.5 ms           66  0.0105088  0.0105202    0.98663   0.694332       950.555k/s         32         10             32        10k            1            1          660k dataset_memory_type="device"
cuvs_cagra_iterative/8/process_time/real_time        12.8 ms         12.8 ms           54   0.012796  0.0128079    0.98807   0.691628       780.768k/s         32         10             64        10k            1            1          540k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/9/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
cuvs_cagra_iterative/10/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/11/process_time/real_time       46.1 ms         46.2 ms           15  0.0461323  0.0461439    0.99131   0.692158       216.714k/s        256         10             10        10k            1           16          150k dataset_memory_type="device"
cuvs_cagra_iterative/12/process_time/real_time        142 ms          142 ms            5   0.141713   0.141725    0.99198   0.708627       70.5591k/s        512         10             32        10k            1           16           50k dataset_memory_type="device"
``` 

## Config
```json
{
  "dataset": {
    "name": "laion_1M",
    "base_file": "laion_1M/base.1M.fbin",
    "subset_size": 1000000,
    "query_file": "laion_1M/queries.fbin",
    "groundtruth_neighbors_file": "laion_1M/groundtruth.1M.neighbors.ibin",
    "distance": "euclidean"
  },
  "search_basic_param": {
    "batch_size": 10000,
    "k": 10
  },
  "index": [
  
    {
      "name": "cuvs_cagra_iterative",
      "algo": "cuvs_cagra",
      "build_param": { 
        "graph_degree": 64,
        "intermediate_graph_degree": 128,
        "search_width": 1
      },
      "file": "laion_1M/cagra/q_coarse_iterative.ibin",
      "search_params": [
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 256, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 512, "search_width": 2, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 256, "search_width": 2, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 32, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 64, "refine_ratio": 1},
        {"itopk": 192, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 16, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 512, "search_width": 16, "max_iterations": 32, "refine_ratio": 1}
      ]
    }
  ]
}

```

Authors:
  - https://github.com/irina-resh-nvda

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)

URL: #1851
jrbourbeau pushed a commit to jrbourbeau/cuvs that referenced this pull request Mar 25, 2026
…AGRA search switches kernel variants (rapidsai#1851)


Labels

ci, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)


3 participants