
cann: refactor ACL graph cache #17752

Merged
hipudding merged 1 commit into ggml-org:master from wangweixuan:cann_acl_graph_cache_refactor on Dec 24, 2025
Conversation

@wangweixuan
Contributor

For the CANN backend: this PR refactors the LRU cache that maps GGML graph properties to CANN graphs. We have reorganized the graph-property-related code from the lengthy ggml-cann.cpp file into dedicated structures: ggml_graph_node_properties, ggml_cann_graph, and ggml_cann_graph_lru_cache.

This change aims to improve code clarity by separating concerns. There are no functional changes. The code has been formatted with clang-format.
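For orientation, here is a minimal sketch of how the three structures might fit together. Only the struct names come from the PR description; every member, method name, and the eviction detail below are illustrative assumptions, not the actual implementation.

```cpp
// Illustrative sketch only -- not the PR's actual code. Struct names are
// from the PR description; members and methods are assumptions.
#include "ggml.h"   // ggml_cgraph, ggml_tensor, GGML_MAX_DIMS

#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Snapshot of one GGML node, captured when a CANN (ACL) graph is recorded.
struct ggml_graph_node_properties {
    void *  node_address;          // tensor data pointer
    ggml_op op;                    // operation type
    int64_t ne[GGML_MAX_DIMS];     // shape
    size_t  nb[GGML_MAX_DIMS];     // strides
};

// A captured ACL graph plus the GGML-node properties it was captured with.
struct ggml_cann_graph {
    std::vector<ggml_graph_node_properties> node_props;

    // The property check the commit moves into a method: a cached graph is
    // reusable only if every node of the incoming graph still looks the same.
    bool matches(const ggml_cgraph * cgraph) const {
        if ((size_t) cgraph->n_nodes != node_props.size()) {
            return false;
        }
        for (int i = 0; i < cgraph->n_nodes; i++) {
            const ggml_tensor * node = cgraph->nodes[i];
            const ggml_graph_node_properties & p = node_props[i];
            if (node->data != p.node_address || node->op != p.op) {
                return false;
            }
            for (int d = 0; d < GGML_MAX_DIMS; d++) {
                if (node->ne[d] != p.ne[d] || node->nb[d] != p.nb[d]) {
                    return false;
                }
            }
        }
        return true;
    }
};

// LRU cache: most recently used at the front, evict from the back.
struct ggml_cann_graph_lru_cache {
    size_t capacity = 12;   // arbitrary illustrative default
    std::list<ggml_cann_graph *> entries;

    void push(ggml_cann_graph * g) {
        if (entries.size() >= capacity) {
            delete entries.back();   // evict least recently used
            entries.pop_back();
        }
        entries.push_front(g);
    }

    ~ggml_cann_graph_lru_cache() {
        for (ggml_cann_graph * g : entries) delete g;
    }
};
```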

CC @noemotiovon

Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
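Building on the sketch above, a lookup at the call site might then read as follows; `find_cached` is a hypothetical helper for illustration, not a function from the PR.

```cpp
// Hypothetical call site: scan from most to least recently used; on a
// hit, move the entry to the front of the list so it ages last.
ggml_cann_graph * find_cached(ggml_cann_graph_lru_cache & cache,
                              const ggml_cgraph * cgraph) {
    for (auto it = cache.entries.begin(); it != cache.entries.end(); ++it) {
        if ((*it)->matches(cgraph)) {
            cache.entries.splice(cache.entries.begin(), cache.entries, it);
            return *it;   // replay the cached ACL graph
        }
    }
    return nullptr;       // miss: capture a new graph and push() it
}
```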
@github-actions bot added labels on Dec 4, 2025: ggml (changes relating to the ggml tensor library for machine learning), Ascend NPU (issues specific to Ascend NPUs)
@noemotiovon
Collaborator

@TianHao324, could you help test its actual performance on Qwen2.5-0.5B?
Also, please include the actual test results.

@TianHao324
Contributor

TianHao324 commented Dec 8, 2025

> @TianHao324, could you help test its actual performance on Qwen2.5-0.5B? Also, please include the actual test results.

Test result:

user
Building a website can be done in 10 simple steps:
assistant
Sure, here are 10 simple steps to build a website:

1. **Define Your Purpose**: Determine the goals and purpose of your website. What problem do you want to solve or what information do you want to share?

2. **Choose a Platform**: Decide whether you want to use a static website built with HTML and CSS, a CMS (Content Management System) like WordPress or Wix, or a framework like Bootstrap for rapid development.

3. **Choose a Theme or Template**: Select a template that matches your brand or the look and feel of your site.

4. **Design Your Site**: Use your chosen platform to design your website. It could be a simple website with just a few pages or a fully functional website with a sophisticated layout and features.

5. **Choose a Domain Name and Hosting**: Select a domain name that reflects your brand and a hosting provider that is reliable and easy to manage.

6. **Choose a CMS or Framework**: Use a CMS like WordPress or a framework like Laravel for more advanced features like user management, plugins, or custom functions.

7. **Select a Programming Language**: Choose a programming language that your site will run on. For websites, Python is a good choice due to its flexibility and community support.

8. **Develop Your Website**: Use your chosen platform to develop your website. This involves coding, testing, and deployment.

9. **Test Your Website**: Use tools like Google Tools or BrowserStack to test your website on different devices and browsers.

10. **Launch Your Website**: Once your website is ready, launch it and promote it through social media, email marketing, and other channels to attract visitors and drive traffic.

These steps should help you get started with building a website. Remember to keep your site accessible and user-friendly, and always ensure that your website complies with web standards and legal requirements.

llama_perf_sampler_print:    sampling time =      22.67 ms /   110 runs   (    0.21 ms per token,  4852.87 tokens per second)
llama_perf_context_print:        load time =    1698.58 ms
llama_perf_context_print: prompt eval time =      16.20 ms /    21 tokens (    0.77 ms per token,  1296.22 tokens per second)
llama_perf_context_print:        eval time =    1048.04 ms /    88 runs   (   11.91 ms per token,    83.97 tokens per second)
llama_perf_context_print:       total time =    2343.02 ms /   109 tokens
llama_perf_context_print:    graphs reused =         85

@hipudding
Collaborator

cc @noemotiovon

@noemotiovon
Collaborator

@TianHao324, based on your test results, it appears that ACL Graph was not enabled. Please enable the ACL Graph feature and run the test again.

@noemotiovon
Collaborator

CLI test:

common_perf_print:    sampling time =     401.10 ms
common_perf_print:    samplers time =     127.81 ms /   389 tokens
common_perf_print:        load time =    7406.25 ms
common_perf_print: prompt eval time =      15.55 ms /    20 tokens (    0.78 ms per token,  1286.01 tokens per second)
common_perf_print:        eval time =    1618.30 ms /   368 runs   (    4.40 ms per token,   227.40 tokens per second)
common_perf_print:       total time =    2685.78 ms /   388 tokens
common_perf_print: unaccounted time =     650.84 ms /  24.2 %      (total - sampling - prompt eval - eval) / (total)
common_perf_print:    graphs reused =        366
llama_memory_breakdown_print: | memory breakdown [MiB]  | total    free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CANN0 (Ascend910B4) | 30196 = 28450 + (1300 =   942 +      48 +     310) +         444 |
llama_memory_breakdown_print: |   - Host                |                   269 =   259 +       0 +       9                |

@noemotiovon
Collaborator

parallel test:

main: clearing the KV cache

run parameters as of 2025-12-19 07:27:54

main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
External prompt file: used built-in defaults
Model and path used:  /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd

Total prompt tokens:  17075, speed: 544.94 t/s
Total gen tokens:     13334, speed: 425.55 t/s
Total speed (AVG):           speed: 970.49 t/s
Cache misses:             0

llama_perf_context_print:        load time =    1695.63 ms
llama_perf_context_print: prompt eval time =   12679.61 ms / 30648 tokens (    0.41 ms per token,  2417.11 tokens per second)
llama_perf_context_print:        eval time =     158.12 ms /    34 runs   (    4.65 ms per token,   215.03 tokens per second)
llama_perf_context_print:       total time =   31337.91 ms / 30682 tokens
llama_perf_context_print:    graphs reused =         32

@noemotiovon
Collaborator

LGTM. @hipudding, if you have time, feel free to jump in and help out.

@hipudding hipudding merged commit ce7a6dc into ggml-org:master Dec 24, 2025
130 of 145 checks passed
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>