For now we hard-code the type as float32, because we are trying to avoid dynamic dispatch for performance. We still need to figure out a way to make this work for other types without giving that up.
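One common way out is to dispatch on the runtime type once, at the call boundary, and then run a generic kernel that the compiler specializes per concrete type. The hot loop then contains no vtable calls. Below is a minimal sketch of that idea in Rust, where generic functions are monomorphized. Everything in it is an assumption for illustration: the `Buffer` enum and the `dot` / `dot_dyn` functions are hypothetical and not part of the original code.

```rust
use std::ops::{Add, Mul};

// Hypothetical runtime type tag for a buffer; only two dtypes are shown.
enum Buffer {
    F32(Vec<f32>),
    F64(Vec<f64>),
}

// Generic kernel: the compiler emits one specialized copy per concrete T
// (monomorphization), so the inner loop makes no calls through a vtable.
// Note: zip stops at the shorter slice if the lengths differ.
fn dot<T>(a: &[T], b: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    let mut acc = T::default(); // 0.0 for f32 and f64
    for (&x, &y) in a.iter().zip(b.iter()) {
        acc = acc + x * y;
    }
    acc
}

// Hypothetical entry point: matches on the dtype once per call,
// outside the hot loop, then forwards to the specialized kernel.
fn dot_dyn(a: &Buffer, b: &Buffer) -> Option<f64> {
    match (a, b) {
        (Buffer::F32(x), Buffer::F32(y)) => Some(dot(x.as_slice(), y.as_slice()) as f64),
        (Buffer::F64(x), Buffer::F64(y)) => Some(dot(x.as_slice(), y.as_slice())),
        _ => None, // mismatched dtypes are rejected rather than converted
    }
}

fn main() {
    let a = Buffer::F32(vec![1.0, 2.0, 3.0]);
    let b = Buffer::F32(vec![4.0, 5.0, 6.0]);
    println!("{:?}", dot_dyn(&a, &b)); // prints "Some(32.0)"
}
```

The trade-off is code size: each supported type gets its own compiled copy of the kernel, and the `match` has to list every type combination that should be allowed.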