{
    "tot_epoch": 10000,
    "tot_step": 2300000,
    "single_training_step": 2000000,
    "train_lambda": 2048,
    "lr": {
        "base": 0.00005,
        "decay": [0.3, 0.1, 0.03, 0.01],
        "decay_interval": [1900000, 2250000, 2270000, 2290000]
    }
}
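For reference, a `decay`/`decay_interval` pair like the one above is usually interpreted as a stepwise schedule: the learning rate stays at `base` until the global step crosses a boundary, then drops to `base * decay[i]`. This is a minimal sketch of that reading; the function and constant names are my own assumptions, not the author's code:

```python
# Assumed interpretation of the config above: once the global step passes
# decay_interval[i], the learning rate becomes base * decay[i].
BASE_LR = 0.00005
DECAY = [0.3, 0.1, 0.03, 0.01]
DECAY_INTERVAL = [1900000, 2250000, 2270000, 2290000]

def lr_at_step(step):
    lr = BASE_LR
    for factor, boundary in zip(DECAY, DECAY_INTERVAL):
        if step >= boundary:
            lr = BASE_LR * factor  # keep applying later (smaller) factors
    return lr
```

Under this reading, the LR is 5e-05 for the first 1.9M steps, 1.5e-05 until step 2.25M, and so on down to 5e-07 near the end of training.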
When I trained a model with `train_lambda = 2048` using the training strategy provided by the author, the results were close to those of the pre-trained model. However, when I switched to `train_lambda = 512`, the results were significantly worse. Does anyone know why?