This guide explains the training process in detail.
The StandardTrainer executes the following loop:
Setup:
- Loads data loaders.
- Builds model, optimizer, scheduler.
- Initializes WandB (if configured).

Epoch Loop:
- Train Step:
  - Load batch.
  - Apply augmentations (if not applied in dataset).
  - Forward pass through Detector.
  - Compute Loss (weighted sum of all losses).
  - Backprop & Optimizer Step.
- Logging:
  - Every `batch_log_interval`, logs running loss to console/WandB.
- Evaluation:
  - Every `eval_every_epochs` (or steps), computes EER and minDCF on the validation set.
  - Saves a checkpoint if the monitored metric improves.
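The control flow above can be sketched in plain Python. This is an illustrative skeleton, not the actual `StandardTrainer` code: the function names (`run_training`, `evaluate_stub`) and the event-list bookkeeping are assumptions made for the example; in the real trainer, the commented lines would be the forward/backward pass, WandB logging, and checkpoint saving.

```python
# Hypothetical sketch of the StandardTrainer epoch loop (names assumed).
def run_training(num_epochs, steps_per_epoch, batch_log_interval, eval_every_epochs):
    """Return (log_events, eval_events) to show when logging/eval fire."""
    log_events, eval_events = [], []
    global_step = 0
    best_metric = float("inf")  # e.g. EER: lower is better
    for epoch in range(num_epochs):
        for _ in range(steps_per_epoch):
            global_step += 1
            # batch = next(loader); loss = detector(batch)
            # loss.backward(); optimizer.step(); optimizer.zero_grad()
            if global_step % batch_log_interval == 0:
                log_events.append(global_step)  # running loss -> console/WandB
        if (epoch + 1) % eval_every_epochs == 0:
            eval_events.append(epoch + 1)       # compute EER / minDCF here
            metric = evaluate_stub(epoch)
            if metric < best_metric:
                best_metric = metric            # would save best_model.pth here
    return log_events, eval_events

def evaluate_stub(epoch):
    # Placeholder for EER computation on the validation set.
    return 1.0 / (epoch + 1)
```

For example, 2 epochs of 5 steps with `batch_log_interval=2` and `eval_every_epochs=1` produce log events at global steps 2, 4, 6, 8, 10 and evaluations after epochs 1 and 2.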
Checkpoints are saved in `outputs/EXP_NAME/ckpts/`:

- `ckpt_epochXX_stepYY.pth`: Periodic checkpoints.
- `best_model.pth`: The best model according to `monitor_metric`.
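A small helper can make the naming convention concrete. The helper name and the zero-padding width below are assumptions for illustration; only the directory layout and filenames come from the docs above.

```python
import os

def checkpoint_paths(exp_name, epoch, step):
    """Build the periodic and best-model checkpoint paths (layout from the docs;
    helper name and two-digit padding are assumptions)."""
    ckpt_dir = os.path.join("outputs", exp_name, "ckpts")
    periodic = os.path.join(ckpt_dir, f"ckpt_epoch{epoch:02d}_step{step:02d}.pth")
    best = os.path.join(ckpt_dir, "best_model.pth")
    return periodic, best
```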
To resume, you currently need to modify the code or config to load a specific checkpoint, or implement a `resume_from` flag in `train.py` (see `tutorials/extending.md`).
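If you implement resuming yourself, the essential step is restoring the epoch, step, model, and optimizer state from a saved checkpoint dict. The sketch below uses `pickle` so it is self-contained; a real implementation would use `torch.save`/`torch.load` on the same dict structure. The key names (`"epoch"`, `"step"`, `"model"`, `"optimizer"`) are assumptions, not the repo's actual checkpoint schema.

```python
import pickle

def save_checkpoint(path, epoch, global_step, model_state, optimizer_state):
    """Write training state to disk (pickle stands in for torch.save here)."""
    state = {
        "epoch": epoch,
        "step": global_step,
        "model": model_state,          # model.state_dict() in practice
        "optimizer": optimizer_state,  # optimizer.state_dict() in practice
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    """Read training state back; the caller restores model/optimizer from it."""
    with open(path, "rb") as f:
        return pickle.load(f)
```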