
ggml : prevent builds with -ffinite-math-only#7726

Merged
mofosyne merged 3 commits intomasterfrom
gg/error-on-finite-math-only
Jun 4, 2024

Conversation

@ggerganov
Member

@ggerganov ggerganov commented Jun 4, 2024

This adds a check that -fno-finite-math-only is in effect, i.e. that the build is not compiled in finite-math mode. During the rewrite of SiLU and softmax for CPU in #7154, @JohannesGaessler found that results were nondeterministic when >1 slot was used.

@LostRuins narrowed the problem down to -ffinite-math-only, theorised to be because under that flag SiLU, instead of flushing small values to 0, returns NaN or other garbage. @jart proposed a fix, which @ggerganov implemented here.

ref #7154 (comment)

@github-actions github-actions Bot added the ggml changes relating to the ggml tensor library for machine learning label Jun 4, 2024
Collaborator

@mofosyne mofosyne left a comment


Confirmed that gg's changes match the intent of #7154

@mofosyne mofosyne added Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. labels Jun 4, 2024
@mofosyne mofosyne merged commit 6d16169 into master Jun 4, 2024
@github-actions github-actions Bot added the build Compilation issues label Jun 4, 2024
@github-actions
Contributor

github-actions Bot commented Jun 4, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 552 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8540.34ms p(95)=20501.86ms fails=, finish reason: stop=495 truncated=57
  • Prompt processing (pp): avg=99.34tk/s p(95)=453.53tk/s
  • Token generation (tg): avg=46.88tk/s p(95)=45.53tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/error-on-finite-math-only commit=771cc3a6b4425b02bcb88f07aba74d685cddb018

[time-series charts omitted: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, llamacpp:requests_processing over the 10m / 552-iteration run]

@JohannesGaessler
Contributor

JohannesGaessler commented Jun 4, 2024

The issue that I reported in #7154 (comment) has not been fixed by this PR. For the minimal reproduction I was not using LLAMA_FAST and therefore not -ffinite-math-only anyway.

@ggerganov
Member Author

Yes, I didn't expect it to be fixed. We need a non-unified KV cache implementation to have deterministic results for n_slots > 1

@jart
Contributor

jart commented Jun 4, 2024

Could it be a memory barrier issue? x86 guarantees about acquire / release semantics change when you switch to xmm/ymm ops.

@mofosyne mofosyne deleted the gg/error-on-finite-math-only branch June 5, 2024 01:38
@ggerganov
Member Author

The problem is not a data race or a race condition. Rather, the same set of tokens can produce slightly different floating-point results depending on the position they get assigned in the unified KV cache, due to the reduce operations in the attention.

@yunginnanet

btw, this breaks LLAMA_FAST=1

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
