Official implementation of "Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?".
- August 2025: Paper accepted to EMNLP Findings!
This repository contains the official implementation of the following paper:
Title: Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?
Link: https://arxiv.org/abs/2505.18215
The rapid adoption of LLMs has overshadowed the potential advantages of traditional BERT-like models in text classification. This study challenges the prevailing "LLM-centric" trend by systematically comparing three categories of methods, i.e., BERT-like model fine-tuning, LLM internal state utilization, and LLM zero-shot inference, across six challenging datasets. Our findings reveal that BERT-like models often outperform LLMs. We further categorize datasets into three types, perform PCA and probing experiments, and identify task-specific model strengths: BERT-like models excel in pattern-driven tasks, while LLMs dominate those requiring deep semantics or world knowledge. We then conduct experiments on a broader range of text classification tasks to demonstrate the generalizability of our findings, and further investigate how the relative performance of different models varies under different levels of data availability. Finally, based on these findings, we propose TaMAS, a fine-grained task selection strategy, advocating for a nuanced, task-driven approach over a one-size-fits-all reliance on LLMs.
conda create -n <env_name> python=3.10
conda activate <env_name>
pip install -r requirements.txt
Download the following datasets and place them in a directory named dataset. We provide the pre-split datasets ToxiCloakCNBase, ToxiCloakCNEmoji, and ToxiCloakCNHomo, which can be used directly. For the other datasets, download them using the links provided in the paper and organize them into a similar format.
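The resulting layout under dataset/ would then look roughly like this. Only the three ToxiCloakCN directories are provided by the repo; the placeholder entry and the per-directory file format are assumptions to be matched against the pre-split datasets:

```shell
# Create the dataset directory expected by the pipelines.
mkdir -p dataset
# Assumed layout (hypothetical except the three provided ToxiCloakCN splits):
# dataset/
# ├── ToxiCloakCNBase/
# ├── ToxiCloakCNEmoji/
# ├── ToxiCloakCNHomo/
# └── <other datasets, organized into the same split format>
```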
Run the following commands to execute the BERT-like model approaches:
python3 -m pipeline.bert_finetune
python3 -m pipeline.electra_finetune
python3 -m pipeline.ernie_finetune
Run the following commands to execute the LLM internal state utilization approaches:
python3 -m pipeline.run_all_saplma
python3 -m pipeline.run_all_mm
Run the following command to perform LLM zero-shot inference:
python3 -m pipeline.llm_ask
For English datasets, please remember to replace the models with their English counterparts as specified in the paper.
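To illustrate the internal state utilization idea in isolation, here is a minimal sketch, not the repo's pipeline: a lightweight probe is trained on representations extracted from a frozen LLM to predict class labels. Synthetic Gaussian features stand in for real hidden states, and a linear probe stands in for the small MLP used by SAPLMA-style methods:

```python
# Minimal sketch of training a probe on frozen LLM hidden states.
# The features below are synthetic stand-ins for real representations
# (in practice: e.g. a hidden-layer vector for each input text).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per_class, dim = 100, 64
pos = rng.normal(loc=0.5, size=(n_per_class, dim))   # class-1 "hidden states"
neg = rng.normal(loc=-0.5, size=(n_per_class, dim))  # class-0 "hidden states"
X = np.vstack([pos, neg])
y = np.array([1] * n_per_class + [0] * n_per_class)

# A simple linear probe; SAPLMA-style methods use a small MLP instead.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {probe.score(X, y):.2f}")
```

The key design point is that the LLM itself stays frozen: only the small probe is trained, which is far cheaper than fine-tuning.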
To reproduce the visualizations presented in the paper, collect internal representations during training and use the code in the visualization/ directory to generate the plots.
@article{zhang2025bert,
title={Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?},
author={Zhang, Junyan and Huang, Yiming and Liu, Shuliang and Gao, Yubo and Hu, Xuming},
journal={arXiv preprint arXiv:2505.18215},
year={2025}
}