In your blog post, you emphasize: "We record the state and the probabilities produced by the MCTS." Do you mean that we record the board state, the priors, and the values? I ask because Trainer.exceute_episode stores each example like this:

ret.append((hist_state, hist_action_probs, reward * ((-1) ** (hist_current_player != current_player))))
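
To make the question concrete, here is how I read the ret.append line quoted above. This is only a minimal sketch of my understanding, not the actual trainer code: the function name label_episode and the (state, action_probs, player) layout of history are my own assumptions. As far as I can tell, each stored tuple holds the board state, the MCTS action probabilities, and the final reward with its sign flipped for the opponent's moves, rather than the network's priors and values.

```python
def label_episode(history, reward, current_player):
    """Attach the final outcome to every recorded step of one self-play game.

    history: list of (state, action_probs, player) tuples (assumed layout).
    reward: game result from the point of view of current_player.
    """
    examples = []
    for hist_state, hist_action_probs, hist_current_player in history:
        # (-1) ** False == 1 and (-1) ** True == -1, so the reward keeps its
        # sign for current_player's own moves and flips for the opponent's.
        sign = (-1) ** (hist_current_player != current_player)
        examples.append((hist_state, hist_action_probs, reward * sign))
    return examples


# Toy game: players alternate 1, -1, 1; the game ends with player -1 to move.
history = [("s0", [0.5, 0.5], 1), ("s1", [0.9, 0.1], -1), ("s2", [0.2, 0.8], 1)]
labelled = label_episode(history, reward=1, current_player=-1)
```

If that reading is right, the third element is a training target for the value head taken from the game result, not a value produced during the search.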