
Conversation

@fzi-peccia (Contributor) commented:

Added integration to generate C code that can execute neural networks on the Gemmini accelerator. More information can be found in this post.

@tvm-bot (Collaborator) commented Jan 12, 2023:

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@mehrdadh mehrdadh changed the title Gemmini code generation using microTVM [microTVM]Gemmini code generation using microTVM Jan 12, 2023
@mehrdadh (Member) commented Jan 17, 2023:

@fzi-peccia FYI, I plan to review this PR this week.

@mehrdadh (Member) left a review:

I did a first pass.

@@ -0,0 +1,3 @@
This directory contains code to generate code for the Gemmini accelerator using microTVM. These tests are then executed on the Spike RISC-V ISA simulator.

To use this correctly, the Spike simulator has to be installed, which can be done by following the steps in the Chipyard repository.
@mehrdadh (Member):

Link to the instructions is missing.

@@ -0,0 +1,68 @@
include $(abs_top_srcdir)/Makefrag
@mehrdadh (Member):

Could you consolidate all the Makefiles into a single Makefile.template and modify it based on the project type in the generate_project step?
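
For illustration, a consolidated template could be rendered during generate_project roughly as in this minimal sketch; all template keys and project-type names below are assumptions, not the PR's actual ones:

```python
from pathlib import Path
from string import Template

# Minimal sketch: render a single Makefile.template per project type during
# generate_project, instead of shipping one Makefile per project.
def generate_makefile(template_dir: Path, project_dir: Path, project_type: str):
    makefile_template = Template((template_dir / "Makefile.template").read_text())
    # Per-type values; the keys and type names are made up for illustration.
    per_type_values = {
        "host_driven": {"main_src": "main.c", "extra_cflags": ""},
        "mlperftiny": {"main_src": "submitter_implemented.cc", "extra_cflags": "-O2"},
    }
    values = per_type_values[project_type]
    (project_dir / "Makefile").write_text(makefile_template.substitute(values))
```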

with open(source_dir / "model.h", "w") as f:
    f.write(model_h_template.substitute(template_values))

# Arduino ONLY recognizes .ino, .cpp, .c, .h
@mehrdadh (Member):

remove?

"""Changes all #include statements in project_dir to be relevant to their
containing file's location.

Arduino only supports includes relative to a file's location, so this
@mehrdadh (Member):

fix the function description
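
For context, a hedged sketch of what an include-rewriting helper of this kind might do: rewrite every `#include "..."` so its path is relative to the including file's own directory. This is illustrative only, not the PR's implementation:

```python
import os
import re
from pathlib import Path

# Rewrite project-local #include paths to be relative to each source file.
def fix_includes(project_dir: Path) -> None:
    for source in project_dir.rglob("*"):
        if not source.is_file() or source.suffix not in (".c", ".cpp", ".h"):
            continue
        lines = source.read_text().splitlines()
        for i, line in enumerate(lines):
            match = re.match(r'#include\s+"(.+)"', line)
            if match:
                target = project_dir / match.group(1)
                relative = os.path.relpath(target, source.parent)
                lines[i] = f'#include "{Path(relative).as_posix()}"'
        source.write_text("\n".join(lines) + "\n")
```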

@@ -0,0 +1,117 @@
if(USE_MICRO)
@mehrdadh (Member):

I think this should be a separate flag which is disabled by default, maybe use USE_GEMMINI

@@ -0,0 +1,395 @@
{
@mehrdadh (Member):

Tutorial files should move to somewhere under gallery/how_to/. You also need to change the format to a .py file and write it in Sphinx format. We now support notebook generation and Google Colab, so you can even add cells to install all the dependencies and run the tutorial in Google Colab.
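
For reference, such tutorials follow the sphinx-gallery layout, roughly like the sketch below; the title and text are placeholders:

```python
"""
Running a model on the Gemmini accelerator
==========================================
**Author**: placeholder

Placeholder introduction; this docstring becomes the tutorial's first
text cell in the generated notebook.
"""

######################################################################
# Install the dependencies
# ------------------------
# Comment blocks like this one render as text cells, so a Colab
# install step can live here before the first code cell.

import tvm
```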

@fzi-peccia (Contributor):

Thanks for the feedback, I am on it

@mehrdadh (Member) left a review:

I did another pass. Thanks @fzi-peccia!

@@ -0,0 +1,74 @@
# Licensed to the Apache Software Foundation (ASF) under one
@mehrdadh (Member):

It is highly recommended to use CMake instead of Makefiles to make this cross-platform compatible.

ENV = Environment.instance()


def create_header_file(
@mehrdadh (Member):

why not reuse the existing create_header_file function in TVM?

@fzi-peccia (Contributor, author):

I changed to using the standard create_header_file from tvm.micro.testing.utils, but I changed one line in it to generate a #define instead of a const. I think this should be a #define, but if that is not the case, I will need to continue using my own create_header_file.
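
For illustration, the one-line difference under discussion is roughly the following hedged sketch (names are illustrative, not the utility's actual code):

```python
from pathlib import Path

# Emit the tensor size either as a preprocessor define or a const variable;
# a #define can be used e.g. for static array sizing at compile time.
def write_size(header_file: Path, name: str, size_bytes: int, as_define: bool):
    with header_file.open("a") as f:
        if as_define:
            f.write(f"#define {name.upper()}_LEN {size_bytes}\n")
        else:
            f.write(f"const size_t {name}_len = {size_bytes};\n")
```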

Krzysztof Parzyszek and others added 24 commits March 27, 2023 08:22
… `requantize` (apache#13578)

* wip

* hack to convert size-1 scale and zp tensors to scalar

* fix to binary op fast path

* check output zp

* add assert

* add comment

* lint

* clean up beta handling

* use regular binary op only for 32 bit add (bias addition)

* do float(beta) when we know that beta is not None

* restore original beta handling code to avoid mul by 1

* add comment on overflow
This fixes `ecr_pull` so that `docker-images.ini` can be updated with Docker images from a previous CI run for testing purposes

Example run: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-cortexm/detail/PR-13590/4/pipeline/#step-80-log-9
…e_write (apache#13510)

Add optional consumer blocks to cache_write.
* add baddbmm conversion

* fix

* suppress lint
* [OpenCL][CI] Enable OpenCL cpp tests in CI

* Add building gtest for OpenCL in GPU build

* Fix CI build

* Change OpenCL cpp tests build approach

* Fix lint

* Try to enable test in CI

* Update version of gpu docker image

* Change script mod
…che#12684)

[Relay] Bug fix in relay.squeeze function. Also added functionality for parameter axis of type int
The current implementation of `CombineParallelDense` is hardcoded to slice along the last axis after the combined dense. I hit an error using this pass on the Stable Diffusion UNet, since it has a combined group where the dense is followed by `expand_dims`, which changes the slicing axis (see https://github.com/masahi/torchscript-to-tvm/blob/master/stable-diffusion/compile.py for a repro)

```
  %76 = concatenate(%74) /* ty=Tensor[(20160, 1280), float32] */;
  %79 = concatenate(%77) /* ty=Tensor[(20160), float32] */;
  %78 = nn.dense(%75, %76, units=20160) /* ty=Tensor[(2, 20160), float32] */;
  %80 = nn.bias_add(%78, %79, axis=-1) /* ty=Tensor[(2, 20160), float32] */;
  %81 = expand_dims(%80, axis=2) /* ty=Tensor[(2, 20160, 1), float32] */;
  %82 = expand_dims(%81, axis=3) /* ty=Tensor[(2, 20160, 1, 1), float32] */;
```

The correct way to generate `strided_slice`:
```
  %84 = strided_slice(%82, begin=[0, 0, 0, 0], end=[-1, 320, -1, -1], strides=[1, 1, 1, 1], slice_mode="size", axes=None) /* ty=Tensor[(2, 320, 1, 1), float32] */;
``` 

As I documented in the code, this fix is probably not 100% fail-proof. I think this is a difficult problem, since it requires tracking how the original output-channel axis of the combined dense moves across shape-changing operations like `reshape` / `transpose` / `split`. But this is at least "more correct" than the current implementation, so I'm submitting this fix as is for now.

With this fix, `CombineParallelDense` works successfully on the Stable Diffusion UNet, and it reduces the number of `nn.dense` from 184 to 100.
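
A minimal repro of the pattern might look like the following sketch (shapes chosen for illustration; this assumes the to_batch=False path of CombineParallelDense):

```python
import tvm
from tvm import relay

# Two parallel dense branches followed by expand_dims, so the combined
# output-channel axis is no longer the last axis by the time the
# recovering strided_slice is generated.
x = relay.var("x", shape=(2, 1280))
w1 = relay.var("w1", shape=(320, 1280))
w2 = relay.var("w2", shape=(320, 1280))

def branch(w):
    d = relay.nn.dense(x, w)
    return relay.expand_dims(relay.expand_dims(d, axis=2), axis=3)

func = relay.Function([x, w1, w2], relay.Tuple([branch(w1), branch(w2)]))
mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)
mod = relay.transform.CombineParallelDense(min_num_branches=2, to_batch=False)(mod)
print(mod)
```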
… buffer (apache#13605)

* Fix PlanAndUpdateBufferAllocationLocation not visiting constant buffer

* add comment
…ache#13414)

Enable depthwise conv2d NHWC with HWIO kernel layout. The default kernel layout remains HWOI, matching the previous behavior.
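
A hedged sketch of the newly enabled layout combination (shapes are illustrative assumptions):

```python
from tvm import relay

# Depthwise conv2d with NHWC activations and an HWIO kernel layout;
# with groups == channels, the HWIO weight shape is (kh, kw, 1, channels).
data = relay.var("data", shape=(1, 56, 56, 32), dtype="float32")
weight = relay.var("weight", shape=(3, 3, 1, 32), dtype="float32")  # HWIO
out = relay.nn.conv2d(
    data,
    weight,
    channels=32,
    groups=32,
    kernel_size=(3, 3),
    data_layout="NHWC",
    kernel_layout="HWIO",
)
```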
…che#13602)

* Add support for SequenceAt and SplitToSequence to onnx importer

* Formatting

* Change keepdims comparison

* Only unify non-tuples in If
…#13606)

* introduce LowerToPrimFunc to lower Relay func to TIR prim func

* add doc

* expose to python

* adding test

* another minor doc update

* Verify that the input is a primitive function
…CopyConstants scheduler (apache#13588)

In Ethos-U, the CopyConstants scheduler currently copies weights for all operators. But in Vela, there are a number of scenarios where the weights are not buffered in SRAM, and the FullyConnected case is one of them.
Pass `std::nullopt` to initialization of `PassBuilder` for `PGOOptions`.
LLVM is moving away from its own `Optional` type to `std::optional`.
…13616)

default_rng was introduced in numpy 1.19, which is not present
even in Ubuntu 20.04 (it comes with 1.17.4).
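
For context, a hedged sketch of the compatibility concern involved (not the actual patch):

```python
import numpy as np

# np.random.default_rng exists only in numpy >= 1.19; older environments
# such as stock Ubuntu 20.04 (numpy 1.17.4) must use the legacy API.
if hasattr(np.random, "default_rng"):
    rng = np.random.default_rng(seed=0)
    sample = rng.standard_normal(4)
else:
    legacy = np.random.RandomState(seed=0)
    sample = legacy.standard_normal(4)
```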
…abase (apache#13611)

[Metaschedule] Align get_top_k logic in MemoryDatabase and JSONDatabase
…ase (apache#13618)

* fixed tensor core batch_matmul legalize for transpose_b = False case

* add test

* clean up
…che#13615)

In the Relay Matmul shape relation, we are a little overenthusiastic about unifying dynamic shapes. If one of the shapes is static, it does not need to be unified. This change rewrites only dynamic shapes to the required static constraints.

* Remove overwriting of matmul shapes when they are static

* Simplify nesting

* Add shape check to dense tests.
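
A small sketch of the behavior being described (hypothetical, shapes illustrative): with one static operand, type inference should keep the static dimension instead of widening it.

```python
import tvm
from tvm import relay

# Dense with a dynamic batch dimension but a fully static weight: the
# static output dimension should survive inference rather than being
# replaced by a unified dynamic dimension.
x = relay.var("x", shape=(relay.Any(), 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))
mod = relay.transform.InferType()(mod)
print(mod["main"].ret_type)  # expect Tensor[(?, 8), float32]
```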
[Frontend] [ONNX] Support sequence_lens of GRU.

Support converting the sequence_lens input of GRU.
* [ETHOSN] Add support for experimental compiler option

The support library currently supports enabling the experimental
cascading compiler option via an environment variable
`FORCE_EXPERIMENTAL_COMPILER`. This commit exposes the ability to
enable this option through TVMC.
…#13622)

* Fix print round-tripable multi thread env binding

* add unittest
@fzi-peccia (Contributor, author):

Thanks for the feedback @mehrdadh. I will work on these changes this week and let you know when everything is applied.

@fzi-peccia (Contributor, author):

Hi @mehrdadh, all tests have passed except these two:

  • cortexm/pr-head: some problem with Zephyr and mlperf-tiny; this has nothing to do with my changes.
  • gpu/pr-head: some problem building the tutorial documentation; do you know a workaround for this?

@tqchen closed this Feb 6, 2025