
cuda : clear error after buffer allocation failure #7376

Merged
slaren merged 2 commits into master from sl/cudamalloc-clear-error
May 19, 2024



slaren (Member) commented May 18, 2024

Buffer allocation should be a recoverable error, but the CUDA error was not cleared, which may cause the next operation to fail.
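
To illustrate the failure mode described above, here is a minimal, self-contained C++ sketch. It does not call the CUDA runtime, and it is not the code from this PR: `ToyRuntime`, `try_alloc_buffer`, and `next_operation_ok` are hypothetical names that only model the pattern in which a failed allocation records a "last error" that a later check picks up. In real CUDA code the corresponding calls would be `cudaMalloc` and `cudaGetLastError`, the latter of which returns the recorded error and resets it to `cudaSuccess`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy stand-in for a GPU runtime that remembers the last error (hypothetical).
enum class Err { Success, OutOfMemory };

struct ToyRuntime {
    std::size_t free_bytes;
    Err last_error;

    explicit ToyRuntime(std::size_t bytes) : free_bytes(bytes), last_error(Err::Success) {}

    // Plays the role of cudaMalloc: on failure it returns the error
    // AND leaves it recorded in last_error.
    Err alloc(void ** ptr, std::size_t size) {
        *ptr = (size <= free_bytes) ? std::malloc(size) : nullptr;
        if (*ptr == nullptr) {
            last_error = Err::OutOfMemory;
            return last_error;
        }
        free_bytes -= size;
        return Err::Success;
    }

    // Plays the role of cudaGetLastError: returns the recorded error and resets it.
    Err get_last_error() {
        Err e = last_error;
        last_error = Err::Success;
        return e;
    }
};

// Returns nullptr on failure. When clear_error is true, the recorded error is
// consumed so that later checks do not see a stale failure.
void * try_alloc_buffer(ToyRuntime & rt, std::size_t size, bool clear_error) {
    void * ptr = nullptr;
    if (rt.alloc(&ptr, size) != Err::Success) {
        if (clear_error) {
            (void) rt.get_last_error(); // discard the allocation error
        }
        return nullptr;
    }
    return ptr;
}

// Models a check performed after some unrelated later operation.
bool next_operation_ok(ToyRuntime & rt) {
    return rt.get_last_error() == Err::Success;
}
```

In this model, a failed `try_alloc_buffer(rt, size, false)` makes the following `next_operation_ok(rt)` report a failure even though nothing else went wrong, whereas passing `true` leaves the later check clean.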

JohannesGaessler (Contributor) left a comment

My understanding is that this error occurs either when running out of memory (OOM) or when no CUDA device is available.

github-actions (bot, Contributor) commented May 19, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 537 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8696.17ms p(95)=21079.86ms fails=, finish reason: stop=484 truncated=53
  • Prompt processing (pp): avg=100.53tk/s p(95)=472.82tk/s
  • Token generation (tg): avg=32.15tk/s p(95)=45.02tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sl/cudamalloc-clear-error commit=f3803dcc9692623f3200a28ac03917710bc5f711

prompt_tokens_seconds

[Chart: llamacpp:prompt_tokens_seconds over time (x-axis 1716135551 --> 1716136181); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 537 iterations]

predicted_tokens_seconds

[Chart: llamacpp:predicted_tokens_seconds over time (x-axis 1716135551 --> 1716136181); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 537 iterations]


kv_cache_usage_ratio

[Chart: llamacpp:kv_cache_usage_ratio over time (x-axis 1716135551 --> 1716136181); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 537 iterations]

requests_processing

[Chart: llamacpp:requests_processing over time (x-axis 1716135551 --> 1716136181); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 537 iterations]

slaren (Member, Author) commented May 19, 2024

It should only happen on OOM, but the goal is to let applications load a different model, or the same model with fewer layers offloaded, without crashing the process or causing further errors.
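
As a rough sketch of the kind of application-side recovery this enables, the C++ snippet below retries a load with progressively fewer offloaded layers. `load_with_fallback` and the `try_load` callback are hypothetical: the callback stands in for whatever loading routine an application uses and is assumed to return `false` (rather than abort) when the GPU buffer cannot be allocated. The halving schedule is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <functional>

// Tries to load with max_gpu_layers offloaded, halving the count after each
// failure and ending with a CPU-only attempt (0 layers).
// Returns the number of layers that worked, or -1 if every attempt failed.
// try_load is a hypothetical callback: true means the load succeeded.
int load_with_fallback(int max_gpu_layers, const std::function<bool(int)> & try_load) {
    int n = max_gpu_layers;
    while (true) {
        if (try_load(n)) {
            return n;   // loaded with n layers offloaded
        }
        if (n == 0) {
            return -1;  // even the CPU-only attempt failed
        }
        n /= 2;         // retry with fewer layers on the GPU
    }
}
```

For example, with 33 layers requested and a loader that only succeeds at 10 or fewer, this sketch tries 33, 16, and then 8, and returns 8. A loop like this only works if each failed attempt leaves the backend in a usable state, which is the point of treating allocation failure as recoverable.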

slaren merged commit ab33f7a into master May 19, 2024
slaren deleted the sl/cudamalloc-clear-error branch May 19, 2024 12:19
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026