Time inference and batching #12

@MaximeSabbah

Description

Hi,

Thank you so much for this amazing work! I was wondering if you have any ideas on how to maximize the model's inference speed. I am currently running it on an NVIDIA RTX 4500 Ada Generation with Flash Attention and quantization, and the inference time for a single image is around 1-2 s. Do you know if this can be lowered further? It would really help make my robotics stack more reactive.
Additionally, I am considering triangulating the model's output from two different views to obtain a fairly accurate 3D depth estimate for my use case. Do you know whether inference can be batched over two images without losing time?
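To make the batching question concrete, this is roughly what I have in mind — stacking the two views along the batch dimension and running a single forward pass (again with a hypothetical placeholder module instead of the real model):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the depth model.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
).eval()

left = torch.rand(3, 224, 224)
right = torch.rand(3, 224, 224)

with torch.inference_mode():
    # One forward pass over a batch of two views instead of two passes.
    batch = torch.stack([left, right])  # shape (2, 3, 224, 224)
    out = model(batch)
    depth_left, depth_right = out[0], out[1]

    # Sanity check: the batched per-image result matches running the
    # same view on its own.
    single = model(left.unsqueeze(0))[0]

print(torch.allclose(depth_left, single, atol=1e-5))
```

Whether this is actually faster than two sequential calls depends on how much of the GPU a single image already saturates, which is part of what I'm asking.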

Thank you very much for your help !

Best,

Maxime
