This project explores the capabilities of ModernBERT and the original BERT model by analyzing their tokenization processes and evaluating their performance on various text inputs. It is designed for educational purposes and serves as a starting point for deeper exploration of NLP models.
- Tokenizer Analysis: Analyze the tokenization behavior of the BERT and ModernBERT models.
- Model Evaluation: Evaluate ModernBERT's embedding generation and hidden states.
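The tokenizer comparison can be sketched with the `transformers` library. This is a minimal example, assuming the standard Hugging Face checkpoints `bert-base-uncased` and `answerdotai/ModernBERT-base`; adjust the model IDs to whatever the notebooks actually use:

```python
from transformers import AutoTokenizer

# Checkpoint names are assumptions; substitute the models used in the notebooks.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
modern_tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

text = "Tokenizers split rare words into subword units."

# tokenize() returns the subword strings, making the two schemes easy to compare.
bert_tokens = bert_tok.tokenize(text)
modern_tokens = modern_tok.tokenize(text)

print("BERT:      ", bert_tokens)
print("ModernBERT:", modern_tokens)
```

Running this side by side shows how the two models segment the same input differently, which is the behavior the tokenizer analysis notebook examines in depth.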
Clone the repository:

```bash
git clone https://github.com/Meeex2/ModernBert_Benchmark.git
cd ModernBert_Benchmark
```
Install dependencies:

```bash
pip install -r requirements.txt
```
Explore and run the tokenizer analysis notebook:

```
notebooks/tokenizer_analysis.ipynb
```

Explore and evaluate the model's performance:

```
notebooks/evaluation.ipynb
```

Requirements:

- Python 3.8 or higher
- Libraries: `transformers`, `torch`
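The embedding and hidden-state evaluation can be sketched as follows. This is a minimal example, again assuming the `answerdotai/ModernBERT-base` checkpoint rather than the exact setup in the evaluation notebook:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer("A short example sentence.", return_tensors="pt")
with torch.no_grad():
    # output_hidden_states=True also returns the per-layer hidden states.
    outputs = model(**inputs, output_hidden_states=True)

# last_hidden_state has shape (batch, seq_len, hidden_size);
# mean-pooling over the sequence gives a simple sentence embedding.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)
print(len(outputs.hidden_states))  # embedding layer plus one entry per transformer layer
```

Mean pooling is just one simple way to turn token-level states into a sentence embedding; the notebook may use CLS pooling or another strategy.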
Contributions are welcome! Feel free to fork this repository, create a branch, and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
- Hugging Face for the Transformers library
- The research community for advancing NLP and transformer models