
Non-deterministic data reading of image_data_layer in parallel training #4590

@AIROBOTAI

Description


Hi all, I have a question about the deterministic batch input of image_data_layer when doing parallel training. Suppose we have a dataset that contains only four batches, named A, B, C, and D, and we train with 4 solvers (S1, S2, S3, S4) on 4 GPUs. Suppose also that the dataset is not randomly shuffled during training. I have checked the implementation of BasePrefetchingDataLayer and found that it only guarantees that the solvers receive their input batches sequentially, not in a fixed order. So I wonder whether we may encounter the following problem: at the T-th iteration, the input batches for S1, S2, S3, S4 may be A, B, C, D, respectively, but at the next iteration it is quite possible that the batches for S1 through S4 become B, C, A, D or something else. Such non-deterministic behavior may be dangerous in some cases. Could anyone kindly tell me whether my doubt above is correct?
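To illustrate the concern, here is a minimal sketch (not Caffe's actual prefetching code; all names are invented for the example) of how a shared batch queue consumed by several solver threads leads to a batch-to-solver assignment that is not fixed:

```cpp
// Minimal sketch (not Caffe's actual prefetching code): four "solver" threads
// each take the next available batch from a shared queue. Which solver gets
// which batch depends on thread scheduling, so the assignment is not fixed
// and can change from run to run (or iteration to iteration).
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

std::queue<std::string> batch_queue;  // filled by the prefetcher with A, B, C, D
std::mutex mtx;

void solver(int id) {
  std::string batch;
  {
    std::lock_guard<std::mutex> lock(mtx);
    batch = batch_queue.front();  // the next batch, whichever solver arrives first
    batch_queue.pop();
  }
  std::cout << "Solver S" << id << " got batch " << batch << "\n";
}

int main() {
  for (const char* b : {"A", "B", "C", "D"}) batch_queue.push(b);
  std::vector<std::thread> solvers;
  for (int i = 1; i <= 4; ++i) solvers.emplace_back(solver, i);
  for (auto& t : solvers) t.join();
}
```

Running this twice can print different assignments (e.g. S1 gets B and S2 gets A on one run, the reverse on another), which is the kind of iteration-to-iteration variation described above.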

Besides, could anyone please explain why the declarations "using Params::size_; using Params::data_; using Params::diff_;" appear in the definitions of the GPUParams and P2PSync classes (defined in parallel.hpp)? To my understanding, using-declarations are generally used to solve the problem that base-class members are shadowed in the derived class, which does not seem to be the case for GPUParams and P2PSync. I therefore wonder whether these declarations are necessary.
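For reference, here is a minimal, simplified sketch of the pattern being asked about (GPUParamsSketch and bytes() are invented names; only the using-declarations mirror parallel.hpp). One thing that may be relevant, though I am not certain it is the actual motivation, is that Params<Dtype> is a dependent base of a class template, so unqualified uses of size_ / data_ / diff_ would not compile without either this-> or a using-declaration:

```cpp
// Minimal, simplified sketch of the pattern in question (GPUParamsSketch and
// bytes() are invented names; only the using-declarations mirror parallel.hpp).
// Params<Dtype> is a dependent base of the derived class template, so an
// unqualified "size_" is not found during template compilation unless it is
// brought into scope with a using-declaration (or written as this->size_).
#include <cstddef>

template <typename Dtype>
class Params {
 protected:
  std::size_t size_ = 0;    // size of the parameter buffers
  Dtype* data_ = nullptr;   // network parameters
  Dtype* diff_ = nullptr;   // gradients
};

template <typename Dtype>
class GPUParamsSketch : public Params<Dtype> {
 public:
  using Params<Dtype>::size_;
  using Params<Dtype>::data_;
  using Params<Dtype>::diff_;

  std::size_t bytes() const {
    // With the using-declaration, size_ can be referred to by its plain name;
    // otherwise this line would need this->size_ or Params<Dtype>::size_.
    return size_ * sizeof(Dtype);
  }
};

int main() {
  GPUParamsSketch<float> p;
  return static_cast<int>(p.bytes());  // 0 here; just exercises the sketch
}
```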

Thanks in advance!
