Fix fp16 overflow by hhy3 · Pull Request #1084 · rapidsai/cuvs

hhy3 · 2025-07-04T05:51:08Z

This PR fixes issue #914, where accumulating in fp16 causes overflow.
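To illustrate the failure mode, here is a minimal host-side sketch. It does not use cuVS or CUDA: `round_to_fp16`, `dot_fp16_acc`, and `dot_fp32_acc` are hypothetical helpers, and fp16 is only emulated by rounding a `float` to 11 significant bits and returning infinity above 65504 (the largest finite fp16 value). Subnormals are ignored for brevity, and the input values are made up for the demonstration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a cuVS or CUDA API): round a float to the nearest
// fp16-representable value. Normal range only; subnormals are not modelled.
inline float round_to_fp16(float x) {
  if (!std::isfinite(x) || x == 0.0f) return x;
  int exp = 0;
  const float m = std::frexp(x, &exp);                // x = m * 2^exp, 0.5 <= |m| < 1
  const float r = std::nearbyint(std::ldexp(m, 11));  // keep 11 significant bits
  const float y = std::ldexp(r, exp - 11);
  constexpr float kFp16Max = 65504.0f;                // largest finite fp16 value
  if (std::fabs(y) > kFp16Max) {
    return std::copysign(std::numeric_limits<float>::infinity(), x);
  }
  return y;
}

// Inner product whose running sum is rounded to fp16 after every step,
// mimicking an fp16 accumulator. Assumes a.size() == b.size().
inline float dot_fp16_acc(const std::vector<float>& a, const std::vector<float>& b) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    acc = round_to_fp16(acc + a[i] * b[i]);
  }
  return acc;
}

// Same inner product with a plain fp32 accumulator.
inline float dot_fp32_acc(const std::vector<float>& a, const std::vector<float>& b) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    acc += a[i] * b[i];
  }
  return acc;
}
```

With 768 dimensions and every element equal to 16.0 (hypothetical values), the exact inner product is 768 × 256 = 196608. That is above 65504, so the emulated fp16 accumulator ends at infinity, while the fp32 accumulator returns the exact value.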

achirkin

Thank you for the contribution. I see the PR changes the accumulation type from fp16 to fp32 to avoid fp16 overflow. However, I'm not sure whether this is desirable in general, or whether the slowdown is acceptable in cases where the overflow doesn't happen.
Maybe we'd better just advise the user to switch to the fp32 variant of the algorithm?
If you decide to proceed with this approach, please support the PR with benchmark results (using cuvs ann-bench) before and after the change, for both fp16 and fp32.

achirkin · 2025-07-04T06:35:07Z

 template <>
 struct config<half> {
-  using value_t                    = half;
+  using value_t                    = float;


Please check whether this is used outside IVF-Flat. Changing the accumulation type like this can have a drastic impact on performance.
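The diff above changes a per-type trait so that the accumulator type differs from the storage type. A minimal sketch of that pattern follows. `half_t`, `acc_config`, and `inner_product` are hypothetical stand-ins written for this illustration. They are not the cuVS `config` struct, whose other members and users are not shown in this thread.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a 16-bit storage type. The conversion to float is
// faked with an integer payload so the sketch builds without CUDA.
struct half_t {
  short raw;
  explicit operator float() const { return static_cast<float>(raw); }
};

// Primary template: accumulate in the element type itself.
template <typename T>
struct acc_config {
  using value_t = T;
};

// Specialization: 16-bit storage accumulates in float.
template <>
struct acc_config<half_t> {
  using value_t = float;
};

// Inner product that widens each operand to the configured accumulator type
// before multiplying, so neither the products nor the sum use the storage type.
template <typename T>
typename acc_config<T>::value_t inner_product(const T* a, const T* b, std::size_t n) {
  using acc_t = typename acc_config<T>::value_t;
  acc_t acc = acc_t(0);
  for (std::size_t i = 0; i < n; ++i) {
    acc += static_cast<acc_t>(a[i]) * static_cast<acc_t>(b[i]);
  }
  return acc;
}
```

Because callers only see `acc_config<T>::value_t`, a change like this one affects every user of the trait at once, which is why it matters whether the trait is shared outside IVF-Flat.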

hhy3 · 2025-07-04

I ran a simple benchmark and it showed no significant difference. I'll use cuvs ann-bench to get more detailed benchmark results later.

cjnolet · 2025-07-22

@hhy3 any updates here? We're about to begin burndown for the 25.08 release. Should we consider this for 25.08 or push to 25.10 (October)?

hhy3 · 2025-07-23

@cjnolet hi, push it to 25.10, thx

hhy3 · 2026-04-11

@cjnolet sorry for my late reply. I just benchmarked the change. It shows some performance regression, especially when nprobe is small, but it significantly improves recall:

IVF-Flat FP16 AccT=float vs AccT=half Benchmark Results

Dataset: cohere-768-angular-fp16 (1M vectors, 768 dims, inner_product)

GPU: NVIDIA A100-PCIE-40GB

nlist=4096, ratio=10, niter=20, k=100

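Latency (bs=64)

| nprobe | half latency (ms) | float latency (ms) | change | half recall | float recall |
| --- | --- | --- | --- | --- | --- |
| 32 | 4.02 | 5.04 | +25% | 0.815 | 0.849 |
| 64 | 6.22 | 6.89 | +11% | 0.867 | 0.913 |
| 128 | 9.87 | 10.2 | +3% | 0.897 | 0.955 |

Throughput (bs=1000, threads:1)

| nprobe | half QPS | float QPS | change | half recall | float recall |
| --- | --- | --- | --- | --- | --- |
| 32 | 24.7k | 22.8k | -8% | 0.816 | 0.850 |
| 64 | 13.8k | 13.2k | -4% | 0.868 | 0.913 |
| 128 | 7.87k | 7.84k | -0.4% | 0.898 | 0.956 |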

Cherry-picked from upstream PR rapidsai#1084. Uses float accumulator for FP16 distance computation to prevent overflow when distance values exceed FP16 range. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hhy3 requested a review from a team as a code owner July 4, 2025 05:51

github-actions bot added the cpp label Jul 4, 2025

achirkin requested changes Jul 4, 2025

cjnolet assigned hhy3 Jul 11, 2025

cjnolet added the bug and non-breaking labels Jul 11, 2025

cjnolet added this to Unstructured Data Processing Jul 11, 2025

github-project-automation bot moved this to Todo in Unstructured Data Processing Jul 11, 2025

cjnolet moved this from Todo to In Progress in Unstructured Data Processing Jul 11, 2025

mohanprasand-nuvai mentioned this pull request Mar 23, 2026

feat: Rust API safety, serialization, filtering, and upstream bug fixes Nuvai/cuvs#1

Merged

7 tasks

Fix fp16 overflow

4115eaf

hhy3 force-pushed the fix_fp16_overflow branch from babf7bd to 4115eaf Compare April 11, 2026 16:20

hhy3 requested review from a team as code owners April 11, 2026 16:20

hhy3 requested a review from msarahan April 11, 2026 16:20

hhy3 changed the base branch from branch-25.08 to main April 11, 2026 16:22

hhy3 wants to merge 1 commit into rapidsai:main from hhy3:fix_fp16_overflow
