HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
- August, 2025: Our HS-STaR has been accepted to EMNLP 2025!
- November, 2025: Our code and all data have been publicly released!
Abstract: Self-taught reasoners (STaRs) enhance the mathematical reasoning abilities of large language models (LLMs) by leveraging self-generated responses for self-training. Recent studies have incorporated reward models to guide response selection or decoding, aiming to obtain higher-quality data. However, they typically allocate a uniform sampling budget across all problems, overlooking the varying utility of problems at different difficulty levels. In this work, we conduct an empirical study and find that problems near the boundary of the LLM's reasoning capability offer significantly greater learning utility than both easy and overly difficult ones. To identify and exploit such problems, we propose HS-STaR, a Hierarchical Sampling framework for Self-Taught Reasoners. Given a fixed sampling budget, HS-STaR first performs lightweight pre-sampling with a reward-guided difficulty estimation strategy to efficiently identify boundary-level problems. Subsequently, it dynamically reallocates the remaining budget toward these high-utility problems during a re-sampling phase, maximizing the generation of valuable training data. Extensive experiments across multiple reasoning benchmarks and backbone LLMs demonstrate that HS-STaR significantly outperforms other baselines without requiring additional sampling budget.
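To give a concrete picture of the method, here is a minimal Python sketch of the two-phase procedure described above (lightweight pre-sampling, reward-guided difficulty estimation, and budget reallocation). All names in it, such as generate_responses, reward_model, and the boundary criterion, are hypothetical placeholders for illustration, not the repository's actual API.
def hs_star_sampling(problems, total_budget, generate_responses, reward_model):
    # Phase 1: lightweight pre-sampling with a small, uniform per-problem budget.
    pre_budget = max(1, total_budget // (2 * len(problems)))
    responses, boundary = {}, []
    for p in problems:
        cands = generate_responses(p, n=pre_budget)
        scores = [reward_model(p, c) for c in cands]
        responses[p] = cands
        # Reward-guided difficulty estimation: treat a problem as "boundary-level"
        # if it is neither uniformly solved nor uniformly failed under the
        # pre-samples (an assumed criterion used here only for illustration).
        solve_rate = sum(s > 0.5 for s in scores) / len(scores)
        if 0.0 < solve_rate < 1.0:
            boundary.append(p)
    # Phase 2: reallocate the remaining budget to the boundary-level problems.
    remaining = total_budget - pre_budget * len(problems)
    if boundary and remaining > 0:
        re_budget = remaining // len(boundary)
        for p in boundary:
            responses[p] += generate_responses(p, n=re_budget)
    return responses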
To create a new conda environment, run:
conda create -n HS-STaR python=3.10
To activate the environment and install packages:
conda activate HS-STaR
pip install -r requirements.txt
You also need to install latex2sympy locally:
cd dart_math/latex2sympy
pip install -e .
We have released all the training data, including the data for step-wise initialization and the data from each iteration. It is available on the Hugging Face Hub at HS-STaR.
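As a quick usage sketch, the data should be loadable with the datasets library, assuming it is hosted as a standard Hugging Face dataset; the repository id below is a placeholder and must be replaced with the actual path on the Hub.
from datasets import load_dataset
# Placeholder repository id; replace with the actual HS-STaR dataset path on the Hub.
# Assumes a "train" split exists.
train_data = load_dataset("your-org/HS-STaR", split="train")
print(train_data[0])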
In addition, we provide two model checkpoints of Qwen2.5-7B. The first is M0, the model resulting from our step-wise initialization. The second is M3, the final converged model after three iterations of self-training.
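The checkpoints should load with the standard transformers API; the model id below is a placeholder, not a confirmed repository path.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Placeholder model id; point this at the actual M0 or M3 checkpoint on the Hub.
model_id = "your-org/HS-STaR-Qwen2.5-7B-M3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")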
Alternatively, you can train the initial model (M0) from scratch by running:
bash scripts/train_step_init.sh
Then, to run the iterative self-training of HS-STaR:
bash run.sh 5e-7 hs-star Qwen2.5-7B math_filtered middle qwen25-step-cot 3 numina