[coati] add vf_coef argument for PPOTrainer

### Describe the feature

Add a `vf_coef` argument to `PPOTrainer` that scales the value-function loss, as in the original PPO paper:
```
value_loss = vf_coef * 0.5 * (value - pred_value) ** 2
```
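
To make the request concrete, here is a minimal sketch of how the coefficient could enter the loss. It is plain Python rather than coati code: the function name `value_loss`, the default `vf_coef=1.0`, and the mean reduction over the batch are all assumptions for illustration, not the existing `PPOTrainer` API.

```python
from typing import Sequence


def value_loss(
    values: Sequence[float],
    pred_values: Sequence[float],
    vf_coef: float = 1.0,  # assumed default; not taken from coati
) -> float:
    """Batch mean of vf_coef * 0.5 * (value - pred_value) ** 2."""
    if len(values) != len(pred_values):
        raise ValueError("values and pred_values must have the same length")
    if not values:
        raise ValueError("batch must not be empty")
    # Per-sample scaled squared error, matching the formula in this issue.
    per_sample = [
        vf_coef * 0.5 * (v - p) ** 2 for v, p in zip(values, pred_values)
    ]
    # Mean reduction is an assumption; a trainer might reduce differently.
    return sum(per_sample) / len(per_sample)


# Hypothetical usage: halving vf_coef halves the value loss.
loss_default = value_loss([1.0, 2.0], [0.0, 0.0])
loss_half = value_loss([1.0, 2.0], [0.0, 0.0], vf_coef=0.5)
```

With `vf_coef` exposed this way, the weight of the value loss relative to the policy loss could be tuned without editing the loss function itself.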