test-backend-ops: allow loading tests from file and parsing model operators into file#19896
|
@ggerganov @CISC Could one of you take a look at this? |
|
I think using JSON here is not really necessary - would prefer to avoid it. Simple ad-hoc data read/write should be OK. |
|
You mean a simple binary format? Sure, I can do that. |
|
Yes, either simple binary, or even text is fine. |
|
Alright, switched to a simple text file format. |
|
@ggerganov Is it okay like this? |
Yes, this is better.
I'm still wondering if we should mark the new llama_graph_reserve as experimental/unstable in some way? With the upcoming llama.cpp packages (#20042) we should be mindful how we change the public API and minimize the changes. This function seems quite
Sorry, some leftover partial comment from earlier - ignore.
…t-graph-ops to tests/
|
@ggerganov It's failing to build on Windows because llama-ext.h does not export the function with |
|
Yes, add |
|
Is it supposed to be installed as |
|
I suppose it could be called |
|
Thanks for clarifying. |
|
Ah yeah, I have now looked deeper: we (for ALT Linux) delete test binaries with Excuse me, the llama.cpp build process is too complicated to comprehend everything at first glance. |
|
Oh, I wasn't aware of that. @ggerganov Should I rename the binary? |
…rators into file (ggml-org#19896)
* tests: allow loading test-backend-ops tests from json
* add error threshold based on op
* add error when file cannot be read
* add graph operator json extraction tool
* add nb parameter for non-contiguous input tensors
* fix view check
* only use view if non-contiguous/permuted, use C++ random instead of rand()
* replace internal API calls with public llama_graph_reserve call
* reduce test description length
* fix nb[0] not getting set for view
* add name to tests
* fix inplace error
* use text file instead of json
* move llama_graph_reserve function to new llama-ext header, move export-graph-ops to tests/
* fix missing declaration
* use pragma once
* fix indent
* fix Windows build
When working on backends, I often have the problem that a specific operator inside of a model fails. I have to track that down, add a test and then figure out a fix. This change is supposed to make that easier by allowing you to extract operators from a model file in a way that test-backend-ops can run them directly.
I first wanted to allow test-backend-ops to parse them directly, but since it's GGML-specific, I didn't find a good way to do that. Instead, I added the tool llama-export-graph-ops to load a model, parse the pp/tg cgraphs and put the operators into a JSON file, which test-backend-ops can load. Let me know if there's a better way to do this that I missed, I had to add llama_graph_reserve to the public API to avoid using internal API headers.
A "generic operator" test also didn't fit as neatly into test-backend-ops as I had hoped, I tried to put all the special error threshold and initialization functions into it in the least intrusive way. Let me know if there's a better way to handle this and the graph extraction that I didn't see.
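To make the export/load round trip concrete, here is a minimal sketch of a line-based text format for operator records, in the spirit of the "simple text file format" mentioned in the conversation. The actual file layout used by this PR is not shown on this page, so everything below is an assumption for illustration: the `op_record` struct, the `serialize_ops` and `parse_ops` helpers, and the "op type ne0 ne1 ne2 ne3" line layout are all hypothetical names and choices, not the PR's API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one exported operator (not the PR's real struct).
struct op_record {
    std::string            op;   // operator name, e.g. "MUL_MAT"
    std::string            type; // output tensor type, e.g. "f32"
    std::array<int64_t, 4> ne;   // output shape, four dimensions
};

// Write one record per line: "<op> <type> <ne0> <ne1> <ne2> <ne3>".
std::string serialize_ops(const std::vector<op_record> & recs) {
    std::ostringstream out;
    for (const auto & r : recs) {
        out << r.op << ' ' << r.type;
        for (int64_t d : r.ne) {
            out << ' ' << d;
        }
        out << '\n';
    }
    return out.str();
}

// Parse the format above. Empty lines and lines starting with '#' are skipped.
// Returns false on the first malformed line so the caller can report an error.
bool parse_ops(const std::string & text, std::vector<op_record> & out) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') {
            continue;
        }
        std::istringstream ls(line);
        op_record r;
        if (!(ls >> r.op >> r.type)) {
            return false;
        }
        for (auto & d : r.ne) {
            if (!(ls >> d)) {
                return false; // fewer than four dimensions on this line
            }
        }
        out.push_back(r);
    }
    return true;
}
```

A plain whitespace-separated layout like this needs no external parser dependency, which is the main appeal of ad-hoc text over JSON for a test fixture. A real format would also have to carry source tensor shapes, strides for non-contiguous inputs, and op parameters, which this sketch leaves out.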
I plan to expand llama-export-graph-ops to allow other sources for tests, for example HF metadata (#19796) could be useful to avoid downloading a model if a backend issue has been reported with it.
Model-specific test-backend-ops may also be useful to identify operators that perform worse than expected inside of models or to compare operator performance across backends.
Example for Qwen3 4B Q8_0:
Claude Code was used to assist, but I wrote and tested the code.