Feature/ HICRA implementation by w601sxs · Pull Request #4997 · huggingface/trl

w601sxs · 2026-02-06T21:04:59Z

What does this PR do?

This branch implements HICRA (Hierarchy-Aware Credit Assignment), a new trainer that helps language models develop emergent hierarchical reasoning capabilities through reinforcement learning. HICRA is based on research from TIGER-AI-Lab (https://huggingface.co/papers/2509.03646).

Key Changes

New Components:

HICRATrainer (trl/trainer/hicra_trainer.py:1) - Main trainer extending GRPO
HICRAConfig (trl/trainer/hicra_config.py:21) - Configuration class for HICRA-specific parameters
Strategic Grams module (trl/trainer/strategic_grams.py:1) - Identifies high-level planning tokens
Comprehensive documentation (docs/source/hicra_trainer.md:1)
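To illustrate what "identifying high-level planning tokens" can look like in practice, here is a minimal sketch that marks every response token covered by a known strategic gram. The helper names `strategic_gram_mask` and `planning_token_ratio` are hypothetical (they are not taken from `strategic_grams.py`), and the sketch assumes the grams have already been tokenized with the same tokenizer as the response.

```python
from typing import Iterable, Sequence


def strategic_gram_mask(
    token_ids: Sequence[int],
    grams: Iterable[Sequence[int]],
) -> list[bool]:
    """Return a per-token mask that is True wherever a strategic gram matches.

    Hypothetical helper: `grams` are assumed to be token-id sequences produced
    by the same tokenizer that produced `token_ids`.
    """
    mask = [False] * len(token_ids)
    # Group grams by length so each window size is scanned only once.
    by_len: dict[int, set[tuple[int, ...]]] = {}
    for gram in grams:
        if gram:
            by_len.setdefault(len(gram), set()).add(tuple(gram))
    for n, gram_set in by_len.items():
        # Slide a window of size n over the response; no iterations if n > len.
        for start in range(len(token_ids) - n + 1):
            if tuple(token_ids[start:start + n]) in gram_set:
                for pos in range(start, start + n):
                    mask[pos] = True
    return mask


def planning_token_ratio(mask: Sequence[bool]) -> float:
    """Fraction of response tokens flagged as planning tokens (0.0 if empty)."""
    return sum(mask) / len(mask) if mask else 0.0


mask = strategic_gram_mask([5, 7, 9, 7, 9, 2], [[7, 9]])
print(mask, planning_token_ratio(mask))
```

Matching on token ids rather than decoded text avoids re-decoding every completion during training; the trade-off is that a phrase can tokenize differently depending on surrounding whitespace, so a real implementation may need several tokenized variants per gram.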

Example Scripts:

Training example (examples/scripts/hicra_training.py:1)
Strategic gram extraction utility (examples/scripts/extract_strategic_grams.py:1)

Test Coverage:

920 lines of trainer tests (tests/test_hicra_trainer.py:1)
636 lines of validation tests (tests/test_hicra_validation.py:1)

How It Works

HICRA amplifies the learning signal for "Strategic Grams" (planning phrases like "let's try a different approach" or "the key insight is"), which enables models to:

Separate high-level strategic planning from low-level execution
Learn hierarchical reasoning patterns more efficiently than standard GRPO
Achieve better performance on mathematical reasoning benchmarks, as reported in the paper

The implementation follows the VeRL reference from TIGER-AI-Lab and extends the existing GRPO trainer with a hierarchy-aware advantage modification.
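The hierarchy-aware advantage modification can be sketched as follows. In GRPO, the group-normalized advantage of a completion is shared by all of its tokens, so a per-token planning mask is what lets HICRA treat strategic tokens differently from execution tokens. This sketch assumes the update form A' = A + alpha * |A| on planning tokens, which is our reading of the paper; the PR text does not spell out the formula, and `hicra_advantages` is a hypothetical name, not the trainer's API.

```python
from typing import Sequence


def hicra_advantages(
    advantages: Sequence[float],
    planning_mask: Sequence[bool],
    alpha: float,
) -> list[float]:
    """Amplify per-token advantages on planning tokens.

    Assumed form: A' = A + alpha * |A| on planning tokens, A' = A elsewhere.
    For 0 <= alpha < 1 this scales positive advantages up by (1 + alpha) and
    shrinks the magnitude of negative ones by (1 - alpha), so strategic tokens
    are reinforced more, and penalized less, than execution tokens.
    """
    if len(advantages) != len(planning_mask):
        raise ValueError("advantages and planning_mask must have the same length")
    return [
        a + alpha * abs(a) if is_plan else a
        for a, is_plan in zip(advantages, planning_mask)
    ]


# A successful (+1) and a failed (-1) completion, two tokens each.
print(hicra_advantages([1.0, 1.0, -1.0, -1.0], [True, False, True, False], alpha=0.5))
```

Using |A| rather than A keeps the sign of every token's advantage unchanged for alpha below 1, so the modification only reweights credit and never flips a reward into a penalty.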

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

- Create trl/trainer/hicra_config.py with HICRAConfig class
- Add HICRA-specific parameters: hicra_alpha, use_hicra, hicra_entropy_topk, use_planning_tokens
- Add Strategic Gram configuration: strategic_grams_path, strategic_grams, sg_n_range
- Add logging configuration: log_semantic_entropy, log_planning_token_ratio
- Implement parameter validation in __post_init__
- Add comprehensive docstrings following TRL conventions

Implements task 1 from hierarchical-reasoner-trl spec. Requirements: 3.1, 3.2, 3.3, 3.4, 3.5
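As a rough picture of the configuration and its `__post_init__` validation, the sketch below uses the field names listed in the commit message. The class name `HICRAConfigSketch`, every default value, and every validation rule are hypothetical; the real `HICRAConfig` extends `GRPOConfig`, which this standalone dataclass deliberately does not, so it runs without TRL installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HICRAConfigSketch:
    """Hypothetical stand-in for HICRAConfig (the real class extends GRPOConfig).

    Field names mirror the commit message; defaults and checks are illustrative.
    """

    use_hicra: bool = True
    hicra_alpha: float = 0.2
    hicra_entropy_topk: Optional[int] = None
    use_planning_tokens: bool = True
    strategic_grams_path: Optional[str] = None
    strategic_grams: Optional[list[str]] = None
    sg_n_range: tuple[int, int] = (3, 5)
    log_semantic_entropy: bool = True
    log_planning_token_ratio: bool = True

    def __post_init__(self) -> None:
        # Negative alpha would push planning-token advantages below baseline.
        if self.hicra_alpha < 0:
            raise ValueError(f"hicra_alpha must be >= 0, got {self.hicra_alpha}")
        if self.hicra_entropy_topk is not None and self.hicra_entropy_topk <= 0:
            raise ValueError("hicra_entropy_topk must be a positive integer or None")
        low, high = self.sg_n_range
        if not 1 <= low <= high:
            raise ValueError(f"sg_n_range must satisfy 1 <= min <= max, got {self.sg_n_range}")
        # Two sources of grams would be ambiguous, so require at most one.
        if self.strategic_grams_path and self.strategic_grams:
            raise ValueError("Set either strategic_grams_path or strategic_grams, not both")


print(HICRAConfigSketch(hicra_alpha=0.5))
```

Validating in `__post_init__` means a bad value fails when the config is constructed rather than partway through a training run.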

This PR implements HICRA, a novel RL algorithm that enables LLMs to develop hierarchical reasoning capabilities by amplifying learning signals for strategic planning tokens.

Key features:
- HICRATrainer extending GRPOTrainer with advantage modification
- Strategic Gram utilities for identifying planning tokens
- Comprehensive test suite with 37+ tests
- Documentation and example scripts
- Full compatibility with existing TRL features (PEFT, distributed training, etc.)

Implementation follows the VeRL reference (TIGER-AI-Lab/Hierarchical-Reasoner) and the paper 'Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning' (arXiv:2509.03646).

Changes:
- Add HICRAConfig extending GRPOConfig
- Add HICRATrainer with VeRL-based advantage modification
- Add Strategic Gram extraction and matching utilities
- Add comprehensive test suite (unit, integration, validation)
- Add documentation and example scripts
- Update README and exports
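For the Strategic Gram extraction step, one simple way to mine candidate grams is to keep word n-grams that recur across many correct responses. The sketch below is a plain document-frequency heuristic under that assumption; the paper's procedure may filter or cluster candidates differently, and the function name `extract_strategic_grams` and the `min_doc_frac` threshold are hypothetical, not taken from `extract_strategic_grams.py`.

```python
from collections import Counter
from typing import Iterable


def extract_strategic_grams(
    responses: Iterable[str],
    n_range: tuple[int, int] = (3, 5),
    min_doc_frac: float = 0.2,
) -> list[str]:
    """Return word n-grams that occur in at least `min_doc_frac` of the responses.

    Counts document frequency (each n-gram at most once per response), so a
    phrase repeated inside one long answer is not over-weighted. Tokenization
    is naive: lowercase and split on whitespace, punctuation is kept.
    """
    responses = list(responses)
    if not responses:
        return []
    doc_freq: Counter = Counter()
    low, high = n_range
    for text in responses:
        words = text.lower().split()
        seen = set()
        for n in range(low, high + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        doc_freq.update(seen)
    threshold = min_doc_frac * len(responses)
    return sorted(gram for gram, count in doc_freq.items() if count >= threshold)


correct = [
    "Let's try a different approach here",
    "ok let's try a different approach",
    "nothing in common",
]
print(extract_strategic_grams(correct, n_range=(3, 3), min_doc_frac=0.5))
```

Frequency alone also surfaces generic filler phrases, so in practice a candidate list like this would likely need manual review or an additional filter before being used as the planning-token vocabulary.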

w601sxs · 2026-02-18T15:05:58Z

@qgallouedec (or others), any comments on this? We just want to add the HICRA algorithm to TRL.

w601sxs added 2 commits December 22, 2025 17:05

w601sxs wants to merge 2 commits into huggingface:main from w601sxs:feature/hicra-implementation

w601sxs commented Feb 6, 2026 •

edited

Loading

Uh oh!

w601sxs commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

w601sxs commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Before submitting

Who can review?

Uh oh!

w601sxs commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

w601sxs commented Feb 6, 2026 •

edited

Loading