Hf tok pipeline #2

Merged
TevenLeScao merged 4 commits into bigscience-workshop:main from sbmaruf:hf-tok-pipeline on Jul 17, 2021

sbmaruf (Collaborator) commented on Jul 15, 2021

This pull request integrates the Hugging Face tokenizer into the pre-processing script.
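For readers unfamiliar with this kind of integration, here is a minimal sketch of one way a Hugging Face-style tokenizer could be exposed to a pre-processing script. It is not the code from this PR: `HFTokenizerAdapter`, `encode_document`, and `ToyTokenizer` are hypothetical names, and the sketch only assumes the wrapped object provides `encode`, `decode`, `eos_token_id`, and `__len__`, as Hugging Face tokenizers do.

```python
from typing import List, Protocol


class HFStyleTokenizer(Protocol):
    """The subset of the Hugging Face tokenizer interface this sketch relies on."""

    eos_token_id: int

    def encode(self, text: str) -> List[int]: ...
    def decode(self, token_ids: List[int]) -> str: ...
    def __len__(self) -> int: ...


class HFTokenizerAdapter:
    """Hypothetical adapter (not the class added by this PR)."""

    def __init__(self, tokenizer: HFStyleTokenizer) -> None:
        self.tokenizer = tokenizer

    @property
    def vocab_size(self) -> int:
        # len() on a Hugging Face tokenizer also counts added tokens.
        return len(self.tokenizer)

    @property
    def eod(self) -> int:
        # Assumption: the end-of-sequence id doubles as the end-of-document id.
        return self.tokenizer.eos_token_id

    def tokenize(self, text: str) -> List[int]:
        return self.tokenizer.encode(text)

    def detokenize(self, token_ids: List[int]) -> str:
        return self.tokenizer.decode(token_ids)


class ToyTokenizer:
    """Whitespace stand-in so the sketch runs without downloading a model."""

    def __init__(self, words: List[str]) -> None:
        self.itos = ["<eos>"] + list(words)
        self.stoi = {word: idx for idx, word in enumerate(self.itos)}
        self.eos_token_id = 0

    def encode(self, text: str) -> List[int]:
        return [self.stoi[word] for word in text.split()]

    def decode(self, token_ids: List[int]) -> str:
        return " ".join(self.itos[idx] for idx in token_ids)

    def __len__(self) -> int:
        return len(self.itos)


def encode_document(adapter: HFTokenizerAdapter, text: str) -> List[int]:
    """Tokenize one document and append the end-of-document id."""
    return adapter.tokenize(text) + [adapter.eod]


adapter = HFTokenizerAdapter(ToyTokenizer(["hello", "world"]))
document_ids = encode_document(adapter, "hello world")
```

With the `transformers` package installed, an object returned by `AutoTokenizer.from_pretrained("gpt2")` could be passed to the adapter in place of `ToyTokenizer`; whether the PR takes this approach is not stated in the description.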

Comment thread megatron/tokenizer/tokenizer.py Outdated

2 participants