
Logging support added for HF Trainer stack #938

Closed
quic-abhamidi wants to merge 1 commit into quic:ft_experimental_v1 from quic-abhamidi:logging_feature

@quic-abhamidi

Added the following to make training and validation statistics easier to visualize:

  1. A train_logger callback function that captures per-epoch time, per-epoch loss, and per-epoch perplexity.
  2. The same function also captures the number of trainable parameters and the number of samples in the training and eval datasets.
  3. All of this is written to a log file whose path the user can set with the --log_file_path flag in the input config .yaml file.

Note: Sample output logger file for reference.
new_stack_dpp_4_device_run_report.txt
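
For readers without access to the diff, the per-epoch bookkeeping described above might look roughly like the sketch below. It is plain Python and not the PR's code: the class name `EpochStatsLogger`, its method signatures, and the log line format are all assumptions made for illustration. The method names only echo the hook names used by HF Trainer callbacks; a real callback would subclass `transformers.TrainerCallback` and read the loss from the trainer's logs. Perplexity is taken as the exponential of the mean cross-entropy loss.

```python
import math
import time


class EpochStatsLogger:
    """Hypothetical stand-in for the PR's train_logger callback."""

    def __init__(self, log_file_path, clock=time.perf_counter):
        self.log_file_path = log_file_path
        self._clock = clock  # injectable so timing can be tested
        self._epoch_start = None

    def on_epoch_begin(self, epoch):
        # Remember when the epoch started.
        self._epoch_start = self._clock()

    def on_epoch_end(self, epoch, mean_loss):
        elapsed = self._clock() - self._epoch_start
        # Perplexity is exp(mean cross-entropy loss, in nats).
        perplexity = math.exp(mean_loss)
        line = (
            f"epoch={epoch} time_s={elapsed:.2f} "
            f"loss={mean_loss:.4f} perplexity={perplexity:.4f}"
        )
        # Append one line per epoch to the log file.
        with open(self.log_file_path, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")
        return line
```

Injecting the clock keeps the timing logic testable without sleeping; the trainable-parameter and dataset-size counts mentioned in item 2 would be logged once at the start of training rather than per epoch.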

Contributor

@quic-akuruvil quic-akuruvil left a comment


Rebase against the ft_v1 branch. This PR is not up to date with the latest tip of ft_v1.

logger = Logger(__name__)
# Setting the path for dumping the log file
output_dir = Path(ConfigManager().config.training["output_dir"])
log_file_name = os.path.join(output_dir, f"training_logs_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt")
Contributor


Since a default log file is set here, check the behavior: if the user passes a custom log_file path, is it used, or is it overridden by this default?
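
One way to make the precedence explicit is sketched below. This is an assumption about the desired behavior, not the PR's implementation: the helper `resolve_log_file` and its `now` parameter are hypothetical, and only the default file name pattern is taken from the diff context above.

```python
import os
from datetime import datetime


def resolve_log_file(output_dir, log_file_path=None, now=None):
    """Return the user's log_file_path if given, else a timestamped default."""
    if log_file_path:
        # A user-supplied path always wins over the default.
        return log_file_path
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return os.path.join(output_dir, f"training_logs_{stamp}.txt")
```

With this shape, the default under output_dir is only computed when no custom path was passed, so the two settings cannot silently conflict.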


logger = Logger(__name__)
# Setting the path for dumping the log file
output_dir = Path(ConfigManager().config.training["output_dir"])
Contributor


At import time, ConfigManager().config may reflect the defaults rather than the final config, so any override will be missed and output_dir will be wrong.

Author


No, this was checked: when the user provides output_dir in the .yaml file, that path is used.

Contributor


Okay, also test the case where output_dir is given as a CLI flag.
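
The general concern in this thread, a value read at module import versus one read at call time, can be illustrated with the small sketch below. Everything in it is hypothetical: `CONFIG` is a plain dict standing in for the project's config object, and the rule that CLI values win over YAML values is an assumption. Whether the PR is actually affected depends on when ConfigManager loads the YAML and CLI values, which the thread does not show.

```python
# Hypothetical stand-in for the project's config; starts with defaults.
CONFIG = {"output_dir": "./default_out"}

# Evaluated once, at import time: later overrides are never seen.
IMPORT_TIME_LOG_DIR = CONFIG["output_dir"]


def current_log_dir():
    """Read output_dir when called, after overrides have been applied."""
    return CONFIG["output_dir"]


def apply_overrides(yaml_values=None, cli_values=None):
    """Merge overrides into CONFIG; CLI values are assumed to win over YAML."""
    CONFIG.update(yaml_values or {})
    CONFIG.update(cli_values or {})
```

After `apply_overrides(...)` runs, `current_log_dir()` reflects the override while `IMPORT_TIME_LOG_DIR` still holds the default, which is the failure mode the reviewer asks to rule out.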

Contributor

@quic-akuruvil quic-akuruvil left a comment


Make sure to run all the local FT pytests from the project repo and confirm that they pass.


Signed-off-by: abhamidi <abhamidi@qti.qualcomm.com>
Contributor

@quic-akuruvil quic-akuruvil left a comment


I can see AOT files in this PR; they should not be included. Avoid rebasing with main. I will do that.

@quic-abhamidi
Author

Closing this PR as it was rebased against main.
