
Add Bert HF checkpoint converter #8088

Merged
ericharper merged 12 commits into main from yuya/add_bert_hf_converter
Jan 31, 2024

Conversation

@yaoyu-33
Collaborator

What does this PR do?

Add a BERT HF checkpoint converter and HF BERT support in NeMo.

Collection: NLP

Changelog

  • Add a BERT HF checkpoint converter script.
  • Add skip_head and transformer_block_type options to the NeMo BERT model.
  • Update the default BERT yaml config (trainer.devices, data_prefix) so the converter can load it directly.

Usage

  • A usage example is sketched below.
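A minimal invocation sketch; the script name and flags are assumptions modeled on NeMo's other checkpoint converters, not the exact interface merged in this PR:

    # Hypothetical usage -- script path and arguments are assumptions:
    python convert_bert_hf_to_nemo.py \
        --input_name_or_path bert-base-uncased \
        --output_path bert_base_uncased.nemo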

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@github-actions bot added the NLP label Dec 27, 2023
yaoyu-33 and others added 2 commits January 22, 2024 13:38
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33 changed the title from "[DRAFT] Add Bert HF checkpoint converter" to "Add Bert HF checkpoint converter" Jan 22, 2024
@yaoyu-33
Collaborator Author

jenkins

@yaoyu-33
Collaborator Author

jenkins

@yaoyu-33
Collaborator Author

jenkins

restore_from_path: null # used when starting from a .nemo file

trainer:
  devices: 2
Collaborator

Why make changes to this file?

Collaborator Author

The default conversion script uses this yaml file, which uses 1 GPU by default, and other models like https://github.com/NVIDIA/NeMo/blob/36a31c03497985c8f2165d4f9be788b574443391/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml use 1 GPU as well. Is there a reason to use 2?
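For context, the baked-in default only matters until someone overrides it; a minimal sketch using standard OmegaConf, which NeMo configs are loaded through (the yaml filename here is an assumption):

    from omegaconf import OmegaConf

    # Load the config the converter reads by default (filename assumed).
    cfg = OmegaConf.load("megatron_bert_config.yaml")
    # Multi-GPU users can still override the single-GPU default at launch:
    cfg.merge_with_dotlist(["trainer.devices=2"])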

# - /raid/data/pile/my-gpt3_00_text_document
# - .5
# - /raid/data/pile/my-gpt3_01_text_document
data_prefix: ???
Collaborator

Here as well?

Collaborator Author

Similar to the above reason: it makes running the conversion easier, since the script loads this yaml directly. If we put ???, I think it will raise an error while loading.
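For reference, a minimal sketch of how OmegaConf (which NeMo configs go through) treats ???: parsing succeeds, and the error is raised when the missing value is accessed:

    from omegaconf import OmegaConf
    from omegaconf.errors import MissingMandatoryValue

    cfg = OmegaConf.create("data_prefix: ???")  # loading/parsing succeeds
    try:
        _ = cfg.data_prefix                     # accessing the value raises
    except MissingMandatoryValue:
        print("data_prefix must be set before use")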

ffn_hidden_size: 3072 # Transformer FFN hidden size. Usually 4 * hidden_size.
num_attention_heads: 12
skip_head: False
transformer_block_type: post_ln
Collaborator

Default is pre_ln?

Collaborator Author

The previous default was pre_ln. Do you want me to change it back? I am okay with it.

Collaborator

@ericharper What do you think?
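For reference, the difference between the two block types, as a generic PyTorch sketch (not NeMo's actual implementation; the original BERT uses post-LN):

    import torch
    from torch import nn

    sublayer = nn.Linear(768, 768)   # stand-in for attention or MLP
    norm = nn.LayerNorm(768)
    x = torch.randn(2, 4, 768)

    post_ln = norm(x + sublayer(x))  # post_ln: normalize after the residual add
    pre_ln = x + sublayer(norm(x))   # pre_ln: normalize the sublayer input only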

activations_checkpoint_layers_per_pipeline=None,
layernorm_epsilon=1e-5,
normalization='layernorm',
transformer_block_type='pre_ln',
Collaborator

Not necessary to change, but should we consider making these values enums, so that people don't make typos?

Collaborator Author

Yeah, that would affect all models; maybe add it to the NeMo refactor plan.
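A minimal sketch of what such an enum could look like (hypothetical; not something NeMo defines today):

    from enum import Enum

    class TransformerBlockType(str, Enum):
        """Hypothetical replacement for free-form strings like 'pre_ln'."""
        PRE_LN = "pre_ln"
        POST_LN = "post_ln"

    # A str-subclass enum compares equal to the raw string, so existing
    # configs and checkpoints would keep working:
    assert TransformerBlockType.PRE_LN == "pre_ln"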


if skip_head:
    self.post_process = False
if self.post_process:
Collaborator

if not skip_head and self.post_process:? Or maybe, if self.post_process is used in other areas, just leave it as such.

Collaborator Author

Yeah, it affects other places as well, but I can change them all to if not skip_head and self.post_process. Do you want me to change it? I remember the current way being cleaner in terms of code, but not necessarily more readable.

Collaborator

Well, if you think it's cleaner, then leave it as such.
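For the record, a sketch of the tradeoff (a hypothetical fragment echoing the diff, not NeMo's real class): folding skip_head into post_process once keeps every downstream check a single flag test, while the explicit form would have to be repeated at each site.

    class BertModelSketch:
        """Hypothetical fragment, not NeMo's actual class."""

        def __init__(self, skip_head: bool, post_process: bool):
            self.post_process = post_process
            if skip_head:
                self.post_process = False  # fold skip_head in once (as in the PR)
            if self.post_process:          # every later check stays this simple
                self.head = object()       # stand-in for real head construction

        # Suggested alternative, spelled out at each use site instead:
        #   if not skip_head and self.post_process:
        #       self.head = object()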

shanmugamr1992 previously approved these changes Jan 23, 2024
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Collaborator Author

jenkins

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Collaborator Author

jenkins
