Hi,
Thank you so much for this amazing work! I was wondering if you have any ideas on how to reduce the model's inference time. Currently, I am running it on an NVIDIA RTX 4500 Ada Generation with Flash Attention and quantization, and the inference time for one image is around 1-2 s. Do you know if this can be lowered further? It would really help in making my robotics stack more reactive.
Additionally, I was considering triangulating the model's output from two different views to obtain a fairly accurate 3D depth estimate for my use case. Do you know whether inference can be batched over two images without losing time?
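For context, what I have in mind is something like the sketch below (NumPy stand-in; `depth_model` is just a placeholder for the real network's forward pass): stacking the two views into one batch so a single call returns both depth maps.

```python
import numpy as np

def depth_model(batch):
    # Placeholder for the network: returns one depth map per image.
    # The real model would be a single GPU forward pass over the batch.
    return batch.mean(axis=-1)

left = np.random.rand(480, 640, 3).astype(np.float32)
right = np.random.rand(480, 640, 3).astype(np.float32)

batch = np.stack([left, right])  # shape (2, 480, 640, 3)
depth = depth_model(batch)       # shape (2, 480, 640): one call, two maps
```

My hope is that one batched forward pass of size 2 costs close to one single-image pass, rather than twice as much.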
Thank you very much for your help!
Best,
Maxime