Fix some bugs by tastelikefeet · Pull Request #169 · modelscope/twinkle

tastelikefeet · 2026-04-18T15:58:10Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Describe the details of this PR.

Experiment results

Paste your experiment results here (if needed).

gemini-code-assist

Code Review

This pull request introduces several bug fixes and feature enhancements across the library, including improved numerical stability for ORPO loss, robust LaTeX extraction for math rewards, and support for additional masks in data processing. Key updates include refined gradient accumulation handling, optimized logprob retrieval in vLLM sampling, and expanded unit normalization for evaluation. Review feedback suggests using idiomatic list comprehensions for prompt expansion in GRPO scripts to improve readability.
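The diff itself is not shown on this page, so the following is only a minimal sketch of what "improved numerical stability for ORPO loss" could look like, not the change made in this PR. It assumes the standard ORPO odds-ratio term, `-log(sigmoid(log_odds(chosen) - log_odds(rejected)))`, computed from average log-probabilities. The function names `log_odds` and `orpo_odds_ratio_term` and the `eps` clamp are hypothetical.

```python
import math


def log_odds(logp: float, eps: float = 1e-12) -> float:
    """Return log(p / (1 - p)) given logp = log(p), with logp <= 0.

    Hypothetical helper: clamps logp just below 0 so that p == 1 does not
    produce log(0), and uses expm1 to avoid cancellation in 1 - exp(logp).
    """
    logp = min(logp, -eps)
    return logp - math.log(-math.expm1(logp))


def orpo_odds_ratio_term(chosen_logp: float, rejected_logp: float) -> float:
    """Return -log(sigmoid(z)) where z is the difference of log-odds.

    Uses the identity -log(sigmoid(z)) = softplus(-z), written in a form
    that never exponentiates a large positive number.
    """
    z = log_odds(chosen_logp) - log_odds(rejected_logp)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Equal log-probabilities give z = 0, so the term equals log(2).
    print(orpo_odds_ratio_term(-1.0, -1.0))
    # Extreme inputs stay finite instead of raising or returning inf/nan.
    print(orpo_odds_ratio_term(0.0, -1000.0))
```

The sketch uses plain floats to stay dependency-free; a tensor version would apply the same two identities elementwise.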

(cherry picked from commit cdf6bad)

tastelikefeet added 4 commits April 18, 2026 21:35

wip

d638074

fix

94b098b

fix

4c75a84

fix

7eed307

gemini-code-assist Bot reviewed Apr 18, 2026

Comment thread cookbook/rl/grpo.py

Comment thread cookbook/rl/grpo_mm.py
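The review comments on these two scripts are not reproduced here. As a rough illustration of the list-comprehension style the review summary recommends for prompt expansion, the sketch below repeats each prompt once per sampled completion. The function name `expand_prompts` and the parameter `num_generations` are assumptions, not identifiers taken from `cookbook/rl/grpo.py`.

```python
import copy


def expand_prompts(prompts, num_generations):
    """Repeat each prompt num_generations times, keeping group order.

    Hypothetical helper: each copy is deep-copied so that later in-place
    edits to one expanded prompt do not leak into its siblings.
    """
    return [
        copy.deepcopy(prompt)
        for prompt in prompts
        for _ in range(num_generations)
    ]


if __name__ == "__main__":
    print(expand_prompts(["a", "b"], 2))
```

Copies of the same prompt stay adjacent, which keeps each group contiguous when rewards are later normalized per prompt.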

fi

375d92c

Jintao-Huang approved these changes Apr 18, 2026

fix

0e7841f

tastelikefeet merged commit cdf6bad into modelscope:main Apr 18, 2026
1 of 4 checks passed

tastelikefeet added a commit that referenced this pull request Apr 18, 2026

Fix some bugs (#169)

894fccb

(cherry picked from commit cdf6bad)

tastelikefeet merged 6 commits into modelscope:main from tastelikefeet:fix/0418-4

2 participants