To reproduce and extend this work, I am trying out this repo. I was able to generate a SuperBPE-based tokenizer, but I believe the OLMo pre-training code is missing from this repo. Could you please provide instructions for installation and for training the necessary models?