Exploring the code capabilities of ModernBERT in comparison with the original BERT model.

Meeex2/ModernBert_Benchmark


ModernBERT Tokenizer and Model Evaluation

This project compares ModernBERT with the original BERT model by analyzing how each tokenizes text and by evaluating ModernBERT's outputs on a range of text inputs. It is intended for educational use and as a starting point for deeper exploration of NLP models.

Features

  • Tokenizer Analysis: Analyze the tokenization behavior of the BERT and ModernBERT models.
  • Model Evaluation: Evaluate ModernBERT's embedding generation and hidden states.

Installation

  1. Clone the repository:

    git clone https://github.com/Meeex2/ModernBert_Benchmark.git
    cd ModernBert_Benchmark
  2. Install dependencies:

    pip install -r requirements.txt

Usage

1. Analyze Tokenization

Open and run the tokenizer analysis notebook:

notebooks/tokenizer_analysis.ipynb
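
For readers who want a feel for the comparison before opening the notebook, here is a minimal sketch of a side-by-side tokenization. It is not taken from the notebook: the helper names `compare_tokenizations` and `load_hf_tokenizers` are hypothetical, and the checkpoint names `bert-base-uncased` and `answerdotai/ModernBERT-base` are assumptions about which models are being compared.

```python
from typing import Callable, Dict, List

Tokenize = Callable[[str], List[str]]


def compare_tokenizations(text: str, tokenizers: Dict[str, Tokenize]) -> Dict[str, List[str]]:
    """Return the token list that each named tokenizer produces for `text`."""
    return {name: tokenize(text) for name, tokenize in tokenizers.items()}


def load_hf_tokenizers() -> Dict[str, Tokenize]:
    """Load both tokenizers from the Hugging Face Hub (downloads on first use)."""
    # Imported lazily so compare_tokenizations works without transformers installed.
    from transformers import AutoTokenizer

    # Assumed checkpoint names; the notebook may use different ones.
    checkpoints = {
        "BERT": "bert-base-uncased",
        "ModernBERT": "answerdotai/ModernBERT-base",
    }
    return {
        name: AutoTokenizer.from_pretrained(checkpoint).tokenize
        for name, checkpoint in checkpoints.items()
    }


def print_comparison(text: str) -> None:
    """Print the token count and tokens produced by each tokenizer."""
    for name, tokens in compare_tokenizations(text, load_hf_tokenizers()).items():
        print(f"{name}: {len(tokens)} tokens -> {tokens}")
```

Calling `print_comparison("def add(a, b):\n    return a + b")` prints one line per model, which makes differences in how each tokenizer handles whitespace and code symbols easy to spot.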

2. Evaluate ModernBERT

Open and run the evaluation notebook to inspect the model's embeddings and hidden states:

notebooks/evaluation.ipynb
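
As a rough illustration of what embedding generation and hidden-state inspection can look like, the sketch below encodes a batch of texts and compares the resulting vectors. It makes several assumptions that may not match the notebook: the checkpoint name `answerdotai/ModernBERT-base`, mean pooling over the last hidden state as the sentence embedding, and the hypothetical helper names `embed` and `cosine_similarity`.

```python
import math
from typing import List, Sequence

# Assumed checkpoint name; the notebook may load a different one.
CHECKPOINT = "answerdotai/ModernBERT-base"


def embed(texts: List[str], checkpoint: str = CHECKPOINT) -> List[List[float]]:
    """Return one mean-pooled embedding per input text."""
    # Imported lazily so cosine_similarity works without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # hidden_states is a tuple of tensors recorded through the network.
    last = outputs.last_hidden_state
    print(f"{len(outputs.hidden_states)} hidden-state tensors, last shape {tuple(last.shape)}")

    # Mean pooling (an assumption): average token vectors, ignoring padding.
    mask = inputs["attention_mask"].unsqueeze(-1).to(last.dtype)
    summed = (last * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return (summed / counts).tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A typical use would be `vectors = embed(["a sorting function", "def sort(xs): ..."])` followed by `cosine_similarity(vectors[0], vectors[1])`. Mean pooling is only one option; using the first token's vector is another common choice.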

Requirements

  • Python 3.8 or higher
  • Libraries:
    • transformers
    • torch
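
The list above does not pin versions, so it can help to check what is installed before running the notebooks. The sketch below uses only the standard library. The minimum of `4.48.0` for `transformers` is an assumption (believed to be the first release with ModernBERT support) and should be confirmed against the Transformers release notes; the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumed minimum version with ModernBERT support; verify before relying on it.
MIN_TRANSFORMERS = "4.48.0"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part, e.g. '4.48.0.dev0' -> (4, 48, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.48' and '4.48.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> None:
    """Print the installed versions of the two required libraries."""
    for package in ("transformers", "torch"):
        print(f"{package}: {installed_version(package) or 'not installed'}")
    found = installed_version("transformers")
    if found is not None and not meets_minimum(found, MIN_TRANSFORMERS):
        print(f"transformers {found} may be too old for ModernBERT")
```

Running `check_environment()` after `pip install -r requirements.txt` gives a quick confirmation that both libraries are present.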

Contributing

Contributions are welcome! Feel free to fork this repository, create a branch, and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • Hugging Face for the Transformers library
  • The research community for advancing NLP and transformer models
