Logging support added for HF Trainer stack#938
quic-abhamidi wants to merge 1 commit into quic:ft_experimental_v1
9fbba39 to 6b51eec
quic-akuruvil left a comment
Please rebase against the ft_v1 branch. This PR is not up to date with the latest tip of ft_v1.
logger = Logger(__name__)
# Setting the path for dumping the log file
output_dir = Path(ConfigManager().config.training["output_dir"])
log_file_name = os.path.join(output_dir, f"training_logs_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt")
Since a default log file is set here, please check the behavior when the user passes a custom log_file path: is the custom path used, or is it overridden by this default?
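One way to make the precedence in this question explicit is a small resolver in which a user-supplied path always wins and the timestamped name is only a fallback. This is a minimal sketch, not the PR's code: `resolve_log_file` and its parameters are hypothetical names, and only the default file name pattern is taken from the snippet above.

```python
import os
from datetime import datetime
from typing import Optional


def resolve_log_file(output_dir: str,
                     log_file_path: Optional[str] = None,
                     now: Optional[datetime] = None) -> str:
    """Return the user-supplied log path if given, else a timestamped default.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if log_file_path:
        # An explicit path from the user takes precedence over the default.
        return log_file_path
    # Fall back to the default pattern used in the snippet under review.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return os.path.join(output_dir, f"training_logs_{stamp}.txt")
```

Passing `now` explicitly keeps the default name deterministic, which makes the override case easy to cover in a unit test.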
logger = Logger(__name__)
# Setting the path for dumping the log file
output_dir = Path(ConfigManager().config.training["output_dir"])
At import time, ConfigManager().config may reflect the defaults rather than the final config, so later overrides would be missed and output_dir would be wrong.
No, this was checked: when the user provides output_dir in the .yaml file, that path is used.
Okay, please also test the case where output_dir is given as a CLI flag.
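The concern in this thread can be reproduced with a toy example: a value read at module import time is frozen, while a value read at call time sees overrides applied later. The sketch below uses a plain dict as a stand-in for the real config object; `get_output_dir` and `_config` are hypothetical names, and nothing here reflects how ConfigManager actually behaves.

```python
from pathlib import Path
from typing import Any, Callable, Mapping


def get_output_dir(load_config: Callable[[], Mapping[str, Any]]) -> Path:
    """Read output_dir at call time, so late overrides (YAML, CLI) are seen."""
    return Path(load_config()["output_dir"])


# Stand-in for the real config object (an assumption); starts with a default.
_config = {"output_dir": "default_out"}

# Import-time read: frozen at whatever the config held when this line ran.
EAGER_OUTPUT_DIR = Path(_config["output_dir"])

# An override applied later, for example from a CLI flag.
_config["output_dir"] = "cli_out"

# Only the deferred read reflects the override.
LAZY_OUTPUT_DIR = get_output_dir(lambda: _config)
```

If the CLI test shows a stale path, moving the lookup into a function that runs after argument parsing would be one possible fix.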
6b51eec to cbea003
…lidation statistics:
1. A train_logger callback function which captures the per-epoch time, per-epoch loss metric and per-epoch perplexity.
2. This function also captures the number of trainable parameters and the number of samples in the training and eval datasets.
3. All of these are written to a log file, whose path the user can provide by setting the flag --log_file_path in the input config .yaml file.

Signed-off-by: abhamidi <abhamidi@qti.qualcomm.com>
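The per-epoch bookkeeping described in this commit message can be illustrated without any training framework. The sketch below is not the PR's train_logger: `EpochStats` and `perplexity` are hypothetical names, and it assumes the loss is a mean cross-entropy in nats, so that perplexity is exp(loss). With the HF Trainer, the two methods could be driven from a TrainerCallback's `on_epoch_begin` and `on_epoch_end` hooks.

```python
import math
import time
from typing import Dict, List, Optional


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


class EpochStats:
    """Collects per-epoch wall time, loss and perplexity (hypothetical helper)."""

    def __init__(self) -> None:
        self.records: List[Dict[str, float]] = []
        self._start: Optional[float] = None

    def epoch_begin(self) -> None:
        # Start the wall-clock timer for the current epoch.
        self._start = time.perf_counter()

    def epoch_end(self, epoch: int, loss: float) -> Dict[str, float]:
        if self._start is None:
            raise RuntimeError("epoch_begin() must be called before epoch_end()")
        record = {
            "epoch": epoch,
            "time_s": time.perf_counter() - self._start,
            "loss": loss,
            "perplexity": perplexity(loss),
        }
        self.records.append(record)
        self._start = None  # require a fresh epoch_begin() for the next epoch
        return record
```

Each record could then be written to the log file as one line per epoch.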
b4feef9 to ecfdf2b
quic-akuruvil left a comment
I can see AOT files in this PR; they should not be included. Avoid rebasing with main. I will take care of that.
Closing this PR as it was rebased against main.
Added the following support for easy visualization of training and validation statistics:
Note: Sample output logger file for reference.
new_stack_dpp_4_device_run_report.txt