A parallel-environment procedure for reinforcement learning (RL), in which a single agent interacts with multiple environment instances at the same time, is employed to speed up the training process.
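The idea can be illustrated with a minimal sketch. It assumes a hypothetical toy environment (`CounterEnv`), a hypothetical wrapper (`ParallelEnvs`) that steps several copies of it in lockstep and resets any copy whose episode has ended, and a placeholder random policy standing in for the agent. None of these names come from the original work. The point shown is that one agent call produces an action for every environment, so each iteration yields `n_envs` transitions instead of one. The copies here are stepped sequentially in one process for clarity; a real speedup would come from running them in separate workers or batching the policy's forward pass (libraries such as Gymnasium offer `SyncVectorEnv` and `AsyncVectorEnv` for this).

```python
import random
from typing import Callable, List, Tuple


class CounterEnv:
    """Hypothetical toy environment: the state is an integer counter.
    Action 1 increments it, action 0 leaves it unchanged; episodes last max_steps."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.state = 0
        self.t = 0

    def reset(self) -> int:
        self.state = 0
        self.t = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state += action
        self.t += 1
        reward = float(action)          # reward 1.0 for incrementing, 0.0 otherwise
        done = self.t >= self.max_steps
        return self.state, reward, done


class ParallelEnvs:
    """Hypothetical wrapper: steps n independent environment copies in lockstep
    and automatically resets any copy whose episode has just finished."""

    def __init__(self, make_env: Callable[[], CounterEnv], n: int):
        self.envs = [make_env() for _ in range(n)]

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                o = env.reset()         # start a new episode so this slot keeps producing data
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


def policy(observations: List[int], rng: random.Random) -> List[int]:
    """Placeholder agent: one call returns one action per environment."""
    return [rng.randint(0, 1) for _ in observations]


def collect(n_envs: int, n_steps: int, seed: int = 0) -> int:
    """Run the single agent against n_envs environments; return transitions gathered."""
    rng = random.Random(seed)
    venv = ParallelEnvs(lambda: CounterEnv(max_steps=5), n_envs)
    obs = venv.reset()
    transitions = 0
    for _ in range(n_steps):
        actions = policy(obs, rng)              # one agent call per iteration
        obs, rewards, _ = venv.step(actions)    # all environments advance together
        transitions += len(rewards)
    return transitions
```

With the same number of agent iterations, `collect(8, 10)` gathers 80 transitions while `collect(1, 10)` gathers 10, which is the data-throughput gain the parallel procedure is meant to provide.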