Hi,
Thank you so much for this amazing work! I was wondering if you have any ideas on how to reduce the model's inference time. Currently, I am running it on an NVIDIA RTX 4500 Ada Generation with Flash Attention and quantization, and the inference time for one image is around 1-2 s. Do you know if this can be lowered further? It would really help in making my robotics stack more reactive.
Additionally, I was considering triangulating the model's output from two different views to obtain a fairly accurate 3D depth estimate for my use case. Do you know whether inference can be batched over two images without losing time?
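For context, what I have in mind is something like the sketch below (NumPy stand-in; `depth_model` is just a placeholder for the real network's forward pass): stacking the two views into one batch so a single call returns both depth maps.

```python
import numpy as np

def depth_model(batch):
    # Placeholder for the network: returns one depth map per image.
    # The real model would be a single GPU forward pass over the batch.
    return batch.mean(axis=-1)

left = np.random.rand(480, 640, 3).astype(np.float32)
right = np.random.rand(480, 640, 3).astype(np.float32)

batch = np.stack([left, right])  # shape (2, 480, 640, 3)
depth = depth_model(batch)       # shape (2, 480, 640): one call, two maps
```

My hope is that one batched forward pass of size 2 costs close to one single-image pass, rather than twice as much.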
Thank you very much for your help!
Best,
Maxime