Test the effect of the importance sampling correction flags (PR https://github.com/NVIDIA/reinforcer/pull/174) on convergence:

- `use_on_policy_kl_approximation=True`
- `use_importance_sampling_correction=True`
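One way to isolate each flag's effect on convergence is to run the full 2x2 grid of on/off settings, using the all-`False` run as the baseline. The sketch below only enumerates those combinations as plain override dicts. The helper name `ablation_grid` is hypothetical, and how these overrides are passed to a reinforcer training run (for example, under which config section) is an assumption this note does not specify.

```python
from itertools import product

# The two flags under test (names taken from the task description).
FLAGS = ("use_on_policy_kl_approximation", "use_importance_sampling_correction")


def ablation_grid():
    """Hypothetical helper: one override dict per on/off combination of FLAGS.

    The first entry (both flags False) serves as the baseline run.
    """
    return [dict(zip(FLAGS, values)) for values in product((False, True), repeat=len(FLAGS))]


if __name__ == "__main__":
    for overrides in ablation_grid():
        print(overrides)
```

Comparing the single-flag runs against the baseline separates the contribution of each correction. The both-`True` run shows whether they interact.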