
Source code for paper "Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models"

thaihungle/Memory-R


Memory-R

Code Reference:

Setup

# Create and activate a conda environment with Python 3.11
conda create -n memoryr python=3.11
conda activate memoryr
# Install Python dependencies
pip install -r requirements.txt

Experiments

Dataset: GSM8K

Training

  • Qwen2.5-0.5B-Instruct:
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=r1 --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Pure RL
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=cosine --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Cosine length reward
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=memoryr --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Memory-R (Only Exploit)
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=memoryr+ --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Memory-R+ (Exploit+Explore)
  • Llama3.2-1B-Instruct (note: num_shots=1 is required so that sampling yields at least one correct answer for the model to learn from):
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=r1 --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=cosine --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=memoryr --k=1 --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=memoryr+ --k=1 --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
  • Falcon3-1B-Instruct:
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=r1 --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=cosine --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=memoryr --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=memoryr+ --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
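The actual reward logic lives in run_gsm8k.py; as a rough illustration only (not the repo's implementation), a memory-augmented intrinsic reward with an exploit term (pull toward stored correct solutions, cf. --use_ir=memoryr) and an explore term (novelty bonus, cf. --use_ir=memoryr+) might be sketched like this. All names here (EpisodicMemory, jaccard) are hypothetical, and token-set Jaccard stands in for whatever similarity measure the paper uses:

```python
# Hypothetical sketch of a memory-augmented intrinsic reward.
# Exploit: reward similarity to stored correct solutions for the same question.
# Explore: reward novelty relative to those solutions instead.
from collections import defaultdict

def jaccard(a, b):
    """Token-set Jaccard similarity between two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class EpisodicMemory:
    def __init__(self, k=1):
        self.k = k                      # number of memories retrieved (cf. the --k flag)
        self.store = defaultdict(list)  # question -> list of correct solutions seen so far

    def add(self, question, solution, correct):
        # Only successful episodes are written to memory.
        if correct:
            self.store[question].append(solution)

    def intrinsic_reward(self, question, solution, explore=False):
        memories = self.store[question][-self.k:]
        if not memories:
            return 0.0
        sim = max(jaccard(solution, m) for m in memories)
        # Exploit pulls toward past successes; explore pushes toward novel reasoning.
        return (1.0 - sim) if explore else sim

mem = EpisodicMemory(k=1)
mem.add("q1", "add 2 and 3 to get 5", correct=True)
print(mem.intrinsic_reward("q1", "add 2 and 3 to get 5"))                # 1.0
print(mem.intrinsic_reward("q1", "subtract then multiply", explore=True))  # 1.0
```

In training, such an intrinsic reward would be added to the extrinsic correctness reward before the policy-gradient update.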

Evaluation

  • Using LightEval:
python run_eval.py --task=gsm8k --model_name=path/to/model/
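run_eval.py delegates scoring to LightEval. For a quick manual sanity check of model outputs, a minimal GSM8K-style scorer can compare the last number in a prediction against the gold answer, which in GSM8K follows a "#### <number>" marker. This is an illustrative helper, not what LightEval does internally:

```python
# Minimal GSM8K-style exact-match scoring (illustrative only).
import re

def extract_answer(text):
    """Return the last number in the text; GSM8K gold answers end with '#### <n>'."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def accuracy(predictions, golds):
    hits = sum(extract_answer(p) == extract_answer(g)
               for p, g in zip(predictions, golds))
    return hits / len(golds)

print(accuracy(["The answer is 42."], ["#### 42"]))  # 1.0
```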
