
Training GPU Utilization and Support for Other Dataset Formats #18

@JaaackHongggg

Description


Thanks for your great work!

I am wondering what the average GPU utilization is when you train LoRAT on your hardware. It seems you used V100 and 4090 GPUs in the paper.

When I train LoRAT with your code, without any modification, GPU utilization is only about 60% on an H100. I have tried different values of num_train_workers and num_io_threads_per_worker, but utilization stays around 60% almost all the time. I use CPFS (Cloud Parallel File Storage by Aliyun) as the file system, which is designed for high-performance computing.
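One quick way to check whether the input pipeline is really the limiter (before blaming the file system) is to measure what fraction of wall time the training loop spends blocked on `next()` versus computing. A minimal sketch; `dummy_loader` below is a stand-in for the actual torch DataLoader, and `profile_loader` is a hypothetical helper, not part of the LoRAT codebase:

```python
import time

def profile_loader(loader, steps=100):
    """Measure the fraction of wall time spent waiting for data.

    If this fraction is high, the bottleneck is the input pipeline
    (disk I/O, JPEG decoding, worker count), not GPU compute.
    """
    wait = 0.0
    it = iter(loader)
    start = time.perf_counter()
    for _ in range(steps):
        t0 = time.perf_counter()
        batch = next(it)  # blocks here if workers cannot keep up
        wait += time.perf_counter() - t0
        # ... forward/backward/optimizer step would go here ...
        _ = batch
    total = time.perf_counter() - start
    return wait / total

# Stand-in loader; in practice pass the real DataLoader instance.
dummy_loader = (i for i in range(1000))
fraction = profile_loader(dummy_loader, steps=100)
print(f"waiting on data for {fraction:.0%} of wall time")
```

If the waiting fraction stays high even as num_train_workers grows, per-image random reads (open/seek/decode for every frame) are the likely cause rather than raw file-system throughput.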

I suspect the bottleneck is disk I/O, yet CPFS itself is fast enough, so the problem is probably the access pattern (many small random reads) rather than throughput. Perhaps we should move to another dataset format, such as LMDB, WebDataset, or Parquet, which are designed for high-throughput sequential reads, instead of reading individual image files from disk for every sample.
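The core idea behind WebDataset-style formats can be sketched with the standard library alone: pack many small image files into tar shards and stream each shard back sequentially, so thousands of random reads become one large sequential read. The shard name and the fake payloads below are purely illustrative:

```python
import io
import os
import tarfile
import tempfile

def write_shard(shard_path, samples):
    """Pack (name, bytes) samples into a single tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for name, data in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def read_shard(shard_path):
    """Stream samples back in order -- sequential I/O, no per-image seek."""
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            f = tar.extractfile(member)
            if f is not None:
                yield member.name, f.read()

# Illustrative fake "images"; real shards would hold encoded frames.
samples = [(f"{i:06d}.jpg", bytes([i % 256]) * 64) for i in range(10)]
shard_path = os.path.join(tempfile.gettempdir(), "shard-000000.tar")
write_shard(shard_path, samples)
restored = list(read_shard(shard_path))
print(len(restored), restored[0][0])
```

An LMDB or Parquet backend would follow the same principle (one memory-mapped or columnar store per dataset split instead of per-frame files), with shards sized so each worker reads a few shards sequentially per epoch.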

Do you have any plans to support other dataset formats? I would be happy to help if you decide to add such support.

If you have any questions, please feel free to contact me.

Best regards.
