Official implementation of "Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?".
- August 2025: Paper accepted to EMNLP Findings!
This repository contains the official implementation of the following paper:
Title: Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?
Link: https://arxiv.org/abs/2505.18215
The rapid adoption of LLMs has overshadowed the potential advantages of traditional BERT-like models in text classification. This study challenges the prevailing "LLM-centric" trend by systematically comparing three categories of methods, i.e., BERT-like model fine-tuning, LLM internal state utilization, and LLM zero-shot inference, across six challenging datasets. Our findings reveal that BERT-like models often outperform LLMs. We further categorize datasets into three types, perform PCA and probing experiments, and identify task-specific model strengths: BERT-like models excel in pattern-driven tasks, while LLMs dominate those requiring deep semantics or world knowledge. We then conduct experiments on a broader range of text classification tasks to demonstrate the generalizability of our findings, and further investigate how the relative performance of different models varies under different levels of data availability. Finally, based on these findings, we propose TaMAS, a fine-grained task selection strategy, advocating for a nuanced, task-driven approach over a one-size-fits-all reliance on LLMs.
conda create -n <env_name> python=3.10
conda activate <env_name>
pip install -r requirements.txt
Download the following datasets and place them in a directory named dataset. We provide the pre-split datasets ToxiCloakCNBase, ToxiCloakCNEmoji, and ToxiCloakCNHomo, which can be used directly. For the other datasets, download them using the links provided in the paper and organize them into a similar format.
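The resulting layout under dataset/ would then look roughly like this. Only the three ToxiCloakCN directories are provided by the repo; the placeholder entry and the per-directory file format are assumptions to be matched against the pre-split datasets:

```shell
# Create the dataset directory expected by the pipelines.
mkdir -p dataset
# Assumed layout (hypothetical except the three provided ToxiCloakCN splits):
# dataset/
# ├── ToxiCloakCNBase/
# ├── ToxiCloakCNEmoji/
# ├── ToxiCloakCNHomo/
# └── <other datasets, organized into the same split format>
```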
Run the following commands to execute the BERT-like model approaches:
python3 -m pipeline.bert_finetune
python3 -m pipeline.electra_finetune
python3 -m pipeline.ernie_finetune
Run the following commands to execute the LLM internal state utilization approaches:
python3 -m pipeline.run_all_saplma
python3 -m pipeline.run_all_mm
Run the following command to perform LLM zero-shot inference:
python3 -m pipeline.llm_ask
For English datasets, please remember to replace the models with their English counterparts as specified in the paper.
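To illustrate the internal state utilization idea in isolation, here is a minimal sketch, not the repo's pipeline: a lightweight probe is trained on representations extracted from a frozen LLM to predict class labels. Synthetic Gaussian features stand in for real hidden states, and a linear probe stands in for the small MLP used by SAPLMA-style methods:

```python
# Minimal sketch of training a probe on frozen LLM hidden states.
# The features below are synthetic stand-ins for real representations
# (in practice: e.g. a hidden-layer vector for each input text).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per_class, dim = 100, 64
pos = rng.normal(loc=0.5, size=(n_per_class, dim))   # class-1 "hidden states"
neg = rng.normal(loc=-0.5, size=(n_per_class, dim))  # class-0 "hidden states"
X = np.vstack([pos, neg])
y = np.array([1] * n_per_class + [0] * n_per_class)

# A simple linear probe; SAPLMA-style methods use a small MLP instead.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {probe.score(X, y):.2f}")
```

The key design point is that the LLM itself stays frozen: only the small probe is trained, which is far cheaper than fine-tuning.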
To reproduce the visualizations presented in the paper, collect internal representations during training and use the code in the visualization/ directory to generate the plots.
@article{zhang2025bert,
title={Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?},
author={Zhang, Junyan and Huang, Yiming and Liu, Shuliang and Gao, Yubo and Hu, Xuming},
journal={arXiv preprint arXiv:2505.18215},
year={2025}
}