MatMul Benchmark

Run:

FC=gfortran cmake -DMATMUL_BLAS=OpenBLAS .
make
OMP_NUM_THREADS=1 ./matmul

Benchmark results on Apple M1

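The theoretical performance peak for matmul is set by the throughput of the fused multiply-add (fma) instruction. An n x n matmul performs n^3 multiply-adds, and each one costs at best 0.125 clock cycles in double precision (fmla.2d v0, v0, v0 takes 0.25 cycles and processes two doubles) and 0.0625 clock cycles in single precision.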
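
The arithmetic behind these peak numbers can be sketched in Python. The 0.25 cycles figure is taken from the text above. The 128-bit vector width, and the idea that the single precision instruction runs at the same rate as the double precision one, are assumptions made here to reproduce the quoted 0.0625. The function name is hypothetical.

```python
# Reciprocal throughput of one vector fma instruction, as quoted above
# (fmla.2d v0, v0, v0 takes 0.25 cycles).
CYCLES_PER_FMLA = 0.25

# Assumption: 128-bit vector registers, so 2 doubles or 4 floats per instruction.
VECTOR_BITS = 128


def peak_cycles_per_fma(element_bits: int) -> float:
    """Best-case clock cycles per scalar multiply-add for a given element size."""
    lanes = VECTOR_BITS // element_bits
    return CYCLES_PER_FMLA / lanes


print("f64 peak:", peak_cycles_per_fma(64))
print("f32 peak:", peak_cycles_per_fma(32))
```

Under these assumptions the two printed values match the 0.125 and 0.0625 figures quoted above.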

Single precision (f32) matmul

peak = 0.0625 clock cycles

n    OpenBLAS (clock cycles)
512  0.0768
1024 0.0672
2048 0.0640
4096 0.0632
8192 0.0631
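
One way to read this table is as a percentage of the 0.0625 cycle peak. The sketch below is only an illustration: the measured values are copied from the table, and the helper name percent_of_peak is hypothetical.

```python
# Single precision peak and measured OpenBLAS values, copied from the table above.
PEAK_F32 = 0.0625
measured = {
    512: 0.0768,
    1024: 0.0672,
    2048: 0.0640,
    4096: 0.0632,
    8192: 0.0631,
}


def percent_of_peak(cycles: float, peak: float = PEAK_F32) -> float:
    """Percentage of the theoretical peak reached by a measured cycle count."""
    return 100.0 * peak / cycles


for n, cycles in measured.items():
    print(f"{n:5d}  {cycles:.4f}  {percent_of_peak(cycles):5.1f}%")
```

The percentage rises with n, since the measured cycle counts approach the peak for larger matrices.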

To convert these clock cycles to seconds, multiply by n^3 and divide by the clock frequency of 3.2 GHz. For example, the n=8192 case gives 10.84 s:

>>> n = 8192; 0.0631*n**3 / 3.2e9
10.840497455104
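
The same conversion can be applied to every row of the results. This is a minimal sketch, assuming the 3.2 GHz clock used in the example above. The function name cycles_to_seconds is hypothetical and not part of the benchmark code.

```python
CLOCK_HZ = 3.2e9  # clock frequency used in the conversion above


def cycles_to_seconds(n: int, cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Wall time in seconds for an n x n matmul at the given cycle count."""
    return cycles * n**3 / clock_hz


# Measured OpenBLAS single precision values from the results table.
results = {512: 0.0768, 1024: 0.0672, 2048: 0.0640, 4096: 0.0632, 8192: 0.0631}

for n, cycles in results.items():
    print(f"n={n:5d}  {cycles_to_seconds(n, cycles):.4f} s")
```

For n=8192 this reproduces the 10.84 s computed above.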
