
cann: refactor ACL graph cache #17752

Merged
hipudding merged 1 commit into ggml-org:master from wangweixuan:cann_acl_graph_cache_refactor on Dec 24, 2025
Conversation

@wangweixuan
Contributor

For the CANN backend: this PR refactors the LRU cache that maps GGML graph properties to CANN graphs. We have reorganized the graph-property-related code from the lengthy ggml-cann.cpp file into dedicated structures: ggml_graph_node_properties, ggml_cann_graph, and ggml_cann_graph_lru_cache.

This change aims to improve code clarity by separating concerns. There are no functional changes. The code has been formatted with clang-format.
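For orientation, here is a minimal sketch of how the three structures might fit together. Only the struct names come from the PR description; every member, method name, and the eviction detail below are illustrative assumptions, not the actual implementation.

```cpp
// Illustrative sketch only -- not the PR's actual code. Struct names are
// from the PR description; members and methods are assumptions.
#include "ggml.h"   // ggml_cgraph, ggml_tensor, GGML_MAX_DIMS

#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Snapshot of one GGML node, captured when a CANN (ACL) graph is recorded.
struct ggml_graph_node_properties {
    void *  node_address;          // tensor data pointer
    ggml_op op;                    // operation type
    int64_t ne[GGML_MAX_DIMS];     // shape
    size_t  nb[GGML_MAX_DIMS];     // strides
};

// A captured ACL graph plus the GGML-node properties it was captured with.
struct ggml_cann_graph {
    std::vector<ggml_graph_node_properties> node_props;

    // The property check the commit moves into a method: a cached graph is
    // reusable only if every node of the incoming graph still looks the same.
    bool matches(const ggml_cgraph * cgraph) const {
        if ((size_t) cgraph->n_nodes != node_props.size()) {
            return false;
        }
        for (int i = 0; i < cgraph->n_nodes; i++) {
            const ggml_tensor * node = cgraph->nodes[i];
            const ggml_graph_node_properties & p = node_props[i];
            if (node->data != p.node_address || node->op != p.op) {
                return false;
            }
            for (int d = 0; d < GGML_MAX_DIMS; d++) {
                if (node->ne[d] != p.ne[d] || node->nb[d] != p.nb[d]) {
                    return false;
                }
            }
        }
        return true;
    }
};

// LRU cache: most recently used at the front, evict from the back.
struct ggml_cann_graph_lru_cache {
    size_t capacity = 12;   // arbitrary illustrative default
    std::list<ggml_cann_graph *> entries;

    void push(ggml_cann_graph * g) {
        if (entries.size() >= capacity) {
            delete entries.back();   // evict least recently used
            entries.pop_back();
        }
        entries.push_front(g);
    }

    ~ggml_cann_graph_lru_cache() {
        for (ggml_cann_graph * g : entries) delete g;
    }
};
```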

CC @noemotiovon

Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
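Building on the sketch above, a lookup at the call site might then read as follows; `find_cached` is a hypothetical helper for illustration, not a function from the PR.

```cpp
// Hypothetical call site: scan from most to least recently used; on a
// hit, move the entry to the front of the list so it ages last.
ggml_cann_graph * find_cached(ggml_cann_graph_lru_cache & cache,
                              const ggml_cgraph * cgraph) {
    for (auto it = cache.entries.begin(); it != cache.entries.end(); ++it) {
        if ((*it)->matches(cgraph)) {
            cache.entries.splice(cache.entries.begin(), cache.entries, it);
            return *it;   // replay the cached ACL graph
        }
    }
    return nullptr;       // miss: capture a new graph and push() it
}
```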
@github-actions bot added labels on Dec 4, 2025: ggml (changes relating to the ggml tensor library for machine learning), Ascend NPU (issues specific to Ascend NPUs)
@noemotiovon
Collaborator

@TianHao324, could you help test its actual performance on Qwen2.5-0.5B?
Also, please include the actual test results.

@TianHao324
Contributor

TianHao324 commented Dec 8, 2025

> @TianHao324, could you help test its actual performance on Qwen2.5-0.5B? Also, please include the actual test results.

Test result:

user
Building a website can be done in 10 simple steps:
assistant
Sure, here are 10 simple steps to build a website:

1. **Define Your Purpose**: Determine the goals and purpose of your website. What problem do you want to solve or what information do you want to share?

2. **Choose a Platform**: Decide whether you want to use a static website built with HTML and CSS, a CMS (Content Management System) like WordPress or Wix, or a framework like Bootstrap for rapid development.

3. **Choose a Theme or Template**: Select a template that matches your brand or the look and feel of your site.

4. **Design Your Site**: Use your chosen platform to design your website. It could be a simple website with just a few pages or a fully functional website with a sophisticated layout and features.

5. **Choose a Domain Name and Hosting**: Select a domain name that reflects your brand and a hosting provider that is reliable and easy to manage.

6. **Choose a CMS or Framework**: Use a CMS like WordPress or a framework like Laravel for more advanced features like user management, plugins, or custom functions.

7. **Select a Programming Language**: Choose a programming language that your site will run on. For websites, Python is a good choice due to its flexibility and community support.

8. **Develop Your Website**: Use your chosen platform to develop your website. This involves coding, testing, and deployment.

9. **Test Your Website**: Use tools like Google Tools or BrowserStack to test your website on different devices and browsers.

10. **Launch Your Website**: Once your website is ready, launch it and promote it through social media, email marketing, and other channels to attract visitors and drive traffic.

These steps should help you get started with building a website. Remember to keep your site accessible and user-friendly, and always ensure that your website complies with web standards and legal requirements.

llama_perf_sampler_print:    sampling time =      22.67 ms /   110 runs   (    0.21 ms per token,  4852.87 tokens per second)
llama_perf_context_print:        load time =    1698.58 ms
llama_perf_context_print: prompt eval time =      16.20 ms /    21 tokens (    0.77 ms per token,  1296.22 tokens per second)
llama_perf_context_print:        eval time =    1048.04 ms /    88 runs   (   11.91 ms per token,    83.97 tokens per second)
llama_perf_context_print:       total time =    2343.02 ms /   109 tokens
llama_perf_context_print:    graphs reused =         85

@hipudding
Collaborator

cc @noemotiovon

@noemotiovon
Collaborator

@TianHao324, based on your test results, it appears that ACL Graph was not enabled. Please enable the ACL Graph feature and run the test again.

@noemotiovon
Collaborator

CLI test:

common_perf_print:    sampling time =     401.10 ms
common_perf_print:    samplers time =     127.81 ms /   389 tokens
common_perf_print:        load time =    7406.25 ms
common_perf_print: prompt eval time =      15.55 ms /    20 tokens (    0.78 ms per token,  1286.01 tokens per second)
common_perf_print:        eval time =    1618.30 ms /   368 runs   (    4.40 ms per token,   227.40 tokens per second)
common_perf_print:       total time =    2685.78 ms /   388 tokens
common_perf_print: unaccounted time =     650.84 ms /  24.2 %      (total - sampling - prompt eval - eval) / (total)
common_perf_print:    graphs reused =        366
llama_memory_breakdown_print: | memory breakdown [MiB]  | total    free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CANN0 (Ascend910B4) | 30196 = 28450 + (1300 =   942 +      48 +     310) +         444 |
llama_memory_breakdown_print: |   - Host                |                   269 =   259 +       0 +       9                |

@noemotiovon
Collaborator

parallel test:

main: clearing the KV cache

run parameters as of 2025-12-19 07:27:54

main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
External prompt file: used built-in defaults
Model and path used:  /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd

Total prompt tokens:  17075, speed: 544.94 t/s
Total gen tokens:     13334, speed: 425.55 t/s
Total speed (AVG):           speed: 970.49 t/s
Cache misses:             0

llama_perf_context_print:        load time =    1695.63 ms
llama_perf_context_print: prompt eval time =   12679.61 ms / 30648 tokens (    0.41 ms per token,  2417.11 tokens per second)
llama_perf_context_print:        eval time =     158.12 ms /    34 runs   (    4.65 ms per token,   215.03 tokens per second)
llama_perf_context_print:       total time =   31337.91 ms / 30682 tokens
llama_perf_context_print:    graphs reused =         32

@noemotiovon
Collaborator

LGTM. @hipudding, if you have time, feel free to jump in and help out.

@hipudding hipudding merged commit ce7a6dc into ggml-org:master Dec 24, 2025
130 of 145 checks passed
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>