
Training GPU Utilization and Support for Other Dataset Formats #18

@JaaackHongggg

Description


Thanks for your great work!

I am wondering what the average GPU utilization is when you train LoRAT on your hardware. It seems you used V100 and 4090 GPUs in the paper.

When I train LoRAT with your code, without any modification, GPU utilization is only about 60% on an H100. I have tried different values of num_train_workers and num_io_threads_per_worker, but utilization stays around 60% almost all the time. I use CPFS (Cloud Parallel File Storage by Aliyun) as the file system, which is designed for high-performance computing.
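One quick way to check whether the input pipeline is really the limiter (before blaming the file system) is to measure what fraction of wall time the training loop spends blocked on `next()` versus computing. A minimal sketch; `dummy_loader` below is a stand-in for the actual torch DataLoader, and `profile_loader` is a hypothetical helper, not part of the LoRAT codebase:

```python
import time

def profile_loader(loader, steps=100):
    """Measure the fraction of wall time spent waiting for data.

    If this fraction is high, the bottleneck is the input pipeline
    (disk I/O, JPEG decoding, worker count), not GPU compute.
    """
    wait = 0.0
    it = iter(loader)
    start = time.perf_counter()
    for _ in range(steps):
        t0 = time.perf_counter()
        batch = next(it)  # blocks here if workers cannot keep up
        wait += time.perf_counter() - t0
        # ... forward/backward/optimizer step would go here ...
        _ = batch
    total = time.perf_counter() - start
    return wait / total

# Stand-in loader; in practice pass the real DataLoader instance.
dummy_loader = (i for i in range(1000))
fraction = profile_loader(dummy_loader, steps=100)
print(f"waiting on data for {fraction:.0%} of wall time")
```

If the waiting fraction stays high even as num_train_workers grows, per-image random reads (open/seek/decode for every frame) are the likely cause rather than raw file-system throughput.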

I suspect the bottleneck is disk I/O, yet CPFS itself is fast enough, so the problem is probably the access pattern (many small random reads) rather than throughput. Perhaps we should move to another dataset format, such as LMDB, WebDataset, or Parquet, which are designed for high-throughput sequential reads, instead of reading individual image files from disk for every sample.
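The core idea behind WebDataset-style formats can be sketched with the standard library alone: pack many small image files into tar shards and stream each shard back sequentially, so thousands of random reads become one large sequential read. The shard name and the fake payloads below are purely illustrative:

```python
import io
import os
import tarfile
import tempfile

def write_shard(shard_path, samples):
    """Pack (name, bytes) samples into a single tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for name, data in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def read_shard(shard_path):
    """Stream samples back in order -- sequential I/O, no per-image seek."""
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            f = tar.extractfile(member)
            if f is not None:
                yield member.name, f.read()

# Illustrative fake "images"; real shards would hold encoded frames.
samples = [(f"{i:06d}.jpg", bytes([i % 256]) * 64) for i in range(10)]
shard_path = os.path.join(tempfile.gettempdir(), "shard-000000.tar")
write_shard(shard_path, samples)
restored = list(read_shard(shard_path))
print(len(restored), restored[0][0])
```

An LMDB or Parquet backend would follow the same principle (one memory-mapped or columnar store per dataset split instead of per-frame files), with shards sized so each worker reads a few shards sequentially per epoch.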

Do you have any plans to support other dataset formats? I would be happy to help if you decide to add such support.

If you have any questions, please feel free to contact me.

Best regards.
