-
Notifications
You must be signed in to change notification settings - Fork 3.8k
[microNPU] Integrate the cascader #10862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This patch supersedes #10377, so that one can be closed. |
manupak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ekalda!
It is nice to see this working and green in CI.
I have left few comments around testing and user facing errors.
|
|
||
| def _ethos_u55_cascader() -> Callable: | ||
| flash = MemoryRegion(name="FLASH", size=10 ** 7, read_bandwidth=4, write_bandwidth=4) | ||
| sram = MemoryRegion(name="SRAM", size=10 ** 6, read_bandwidth=16, write_bandwidth=16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Maybe for a subsequent PR] Will it be possible plumb these values using this :
tvm/src/relay/backend/build_module.cc
Lines 414 to 417 in fcdf463
| IRModule func_module = WithAttrs(IRModule::FromExpr(func), | |
| {{tvm::attr::kExecutor, executor_}, | |
| {tvm::attr::kRuntime, runtime_}, | |
| {tvm::attr::kWorkspaceMemoryPools, workspace_memory_pools_}}); |
An example test :
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, with the caveat that we'll assume that there is one workspace pool in the system - is it ok to assume that for now or should we handle the case where there are several workspace pools where some of them are not accessible for the NPU?
|
|
||
| return conv2d | ||
|
|
||
| infra.compare_tvm_with_tflite(tf_graph, [ifm_shape], accel_type, enable_cascader=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these tests, I think we need to ensure the memory usage is reduced.
We could use calculate_workspace_size TIR analysis utility for that. E.g. :
| assert tvm.tir.analysis.calculate_workspace_bytes(primfunc, alignment) == size |
OR we could use the final memory calculation in a non-USMP flow as follows :
tvm/tests/python/relay/aot/test_crt_aot.py
Lines 937 to 956 in fcdf463
| def test_workspace_calculation(workspace_byte_alignment, main_workspace_size): | |
| mod, params = tvm.relay.testing.synthetic.get_workload() | |
| target = "c" | |
| runtime = Runtime("crt") | |
| executor = Executor( | |
| "aot", | |
| { | |
| "workspace-byte-alignment": workspace_byte_alignment, | |
| }, | |
| ) | |
| with tvm.transform.PassContext( | |
| opt_level=3, | |
| config={ | |
| "tir.disable_vectorize": True, | |
| }, | |
| ): | |
| lib = tvm.relay.build(mod, target, executor=executor, runtime=runtime, params=params) | |
| mlf_memory_map = mlf._build_function_memory_map(lib.function_metadata) | |
| assert mlf_memory_map["main"][0]["workspace_size_bytes"] == main_workspace_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the tests to check for reduction in memory instead of bitwise accuracy with TFLite (I think it would be still good to have some FVP based tests when cascader is enabled, but it looks like that would need quite a bit of infra refactor, so I'll do it in a separate patch, if this is ok)
lhutton1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me modulo @manupa-arm's comments :)
1c20fcd to
2042445
Compare
| return compiler_attrs.accelerator_config | ||
|
|
||
|
|
||
| def enable_cascader(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better is_cascader_enabled, otherwise, it seems you are enabling it with this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I changed the function name to is_cascader_enabled, but kept the flag/variable as enable_cascader to align with the philosophy of enable_usmp
NicolaLancellotti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you @ekalda.
manupak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly looks great!
just a nit and a question
| mod, params, accel_type, pool_size, enable_cascader=True | ||
| ) | ||
|
|
||
| assert workspace_size_cascader_enabled < workspace_size_cascader_disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, should we not check for exact values ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point, I changed it to check for the exact values
| ) | ||
|
|
||
|
|
||
| def _extract_memory_info(memory_pool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better for this utility to be part of the cascader and not get exposed to the codegen here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
e764f8c to
094fbcb
Compare
Integrate the cascader into the codegen and optionally enable it with the enable_cascader flag. Includes placeholder MemoryRegions until integration with the PoolInfos provided by a user. Co-authored-by: Matthew Barrett <matthew.barrett@arm.com>
Plumb the workspace memory pools into into the cascader and make the tests to check for the memory reduction.
Change-Id: If2d92846f05a7e8b21be767163841084538805a9
094fbcb to
f212de6
Compare
manupak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a suggestion for a follow up.
| ), "Exactly one workspace pool needs to be provided for the U55 cascader" | ||
|
|
||
| sram = extract_memory_info(workspace_memory_pools.pools[0]) | ||
| tir_mod = LowerToTIR(_ethos_u55_cascader(sram))(mod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a followup : Please consider absorbing the call to extract_memory_info inside the cascader. (Sorry for not being clear before). Ideally, we'd want to remove the "MemoryRegion" construct and to get there in the current direction of travel, we should try to confine the usage of it inside the cascader. Therefore the interface of the cascader should be made to accept MemoryPool(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this would be the right way to go about it. Let's do it in a follow up!
|
Thanks @ekalda @mbaret @lhutton1 @NicolaLancellotti ! This is merged now! |
* [microNPU] Integrate the cascader Integrate the cascader into the codegen and optionally enable it with the enable_cascader flag. Includes placeholder MemoryRegions until integration with the PoolInfos provided by a user. Co-authored-by: Matthew Barrett <matthew.barrett@arm.com> * Fix linting and a docstring * Plumbing and testing improvements Plumb the workspace memory pools into into the cascader and make the tests to check for the memory reduction. * enable_cascader() -> is_cascader_enabled() * Check for the exact value of workspace size * Remove unused ACCEL_TYPES * Linting... Change-Id: If2d92846f05a7e8b21be767163841084538805a9 * Rebasing... Co-authored-by: Matthew Barrett <matthew.barrett@arm.com>
Integrate the cascader into the codegen and optionally enable it
with the enable_cascader flag. Includes placeholder MemoryRegions until
integration with the PoolInfos provided by a user.
Co-authored-by: Matthew Barrett matthew.barrett@arm.com