
Source code for paper "Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models"

thaihungle/Memory-R


Memory-R

Code Reference:

Setup

# Create and activate a conda environment with Python 3.11
conda create -n memoryr python=3.11
conda activate memoryr
# Install Python dependencies
pip install -r requirements.txt

Experiments

Dataset: GSM8K

Training

  • Qwen2.5-0.5B-Instruct:
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=r1 --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Pure RL
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=cosine --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Cosine length reward
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=memoryr --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Memory-R (Only Exploit)
python run_gsm8k.py --model_name=Qwen/Qwen2.5-0.5B-Instruct --use_ir=memoryr+ --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200 # Memory-R+ (Exploit+Explore)
  • Llama3.2-1B-Instruct (note: num_shots=1 is required so that sampling yields at least one correct answer for the model to learn from):
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=r1 --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=cosine --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=memoryr --k=1 --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
python run_gsm8k.py --model_name=meta-llama/Llama-3.2-1B-Instruct --use_ir=memoryr+ --k=1 --num_shots=1 --nepochs=1 --seed 0 --bs 2 --gc 8 --L 200
  • Falcon3-1B-Instruct:
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=r1 --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=cosine --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=memoryr --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
python run_gsm8k.py --model_name=tiiuae/Falcon3-1B-Instruct --use_ir=memoryr+ --k=1 --num_shots=0 --nepochs=1 --seed 0 --bs 1 --gc 16 --L 200
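The actual reward logic lives in run_gsm8k.py; as a rough illustration only (not the repo's implementation), a memory-augmented intrinsic reward with an exploit term (pull toward stored correct solutions, cf. --use_ir=memoryr) and an explore term (novelty bonus, cf. --use_ir=memoryr+) might be sketched like this. All names here (EpisodicMemory, jaccard) are hypothetical, and token-set Jaccard stands in for whatever similarity measure the paper uses:

```python
# Hypothetical sketch of a memory-augmented intrinsic reward.
# Exploit: reward similarity to stored correct solutions for the same question.
# Explore: reward novelty relative to those solutions instead.
from collections import defaultdict

def jaccard(a, b):
    """Token-set Jaccard similarity between two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class EpisodicMemory:
    def __init__(self, k=1):
        self.k = k                      # number of memories retrieved (cf. the --k flag)
        self.store = defaultdict(list)  # question -> list of correct solutions seen so far

    def add(self, question, solution, correct):
        # Only successful episodes are written to memory.
        if correct:
            self.store[question].append(solution)

    def intrinsic_reward(self, question, solution, explore=False):
        memories = self.store[question][-self.k:]
        if not memories:
            return 0.0
        sim = max(jaccard(solution, m) for m in memories)
        # Exploit pulls toward past successes; explore pushes toward novel reasoning.
        return (1.0 - sim) if explore else sim

mem = EpisodicMemory(k=1)
mem.add("q1", "add 2 and 3 to get 5", correct=True)
print(mem.intrinsic_reward("q1", "add 2 and 3 to get 5"))                # 1.0
print(mem.intrinsic_reward("q1", "subtract then multiply", explore=True))  # 1.0
```

In training, such an intrinsic reward would be added to the extrinsic correctness reward before the policy-gradient update.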

Evaluation

  • Using LightEval:
python run_eval.py --task=gsm8k --model_name=path/to/model/
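run_eval.py delegates scoring to LightEval. For a quick manual sanity check of model outputs, a minimal GSM8K-style scorer can compare the last number in a prediction against the gold answer, which in GSM8K follows a "#### <number>" marker. This is an illustrative helper, not what LightEval does internally:

```python
# Minimal GSM8K-style exact-match scoring (illustrative only).
import re

def extract_answer(text):
    """Return the last number in the text; GSM8K gold answers end with '#### <n>'."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def accuracy(predictions, golds):
    hits = sum(extract_answer(p) == extract_answer(g)
               for p, g in zip(predictions, golds))
    return hits / len(golds)

print(accuracy(["The answer is 42."], ["#### 42"]))  # 1.0
```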
