
Feature/HICRA implementation #4997

Open
w601sxs wants to merge 2 commits into huggingface:main from w601sxs:feature/hicra-implementation

Conversation


@w601sxs w601sxs commented Feb 6, 2026

What does this PR do?

This branch implements a new trainer that enables language models to develop emergent hierarchical reasoning capabilities through reinforcement
learning. HICRA is based on research from TIGER-AI-Lab (https://huggingface.co/papers/2509.03646).

Key Changes

New Components:

  • HICRATrainer (trl/trainer/hicra_trainer.py:1) - Main trainer extending GRPO
  • HICRAConfig (trl/trainer/hicra_config.py:21) - Configuration class for HICRA-specific parameters
  • Strategic Grams module (trl/trainer/strategic_grams.py:1) - Identifies high-level planning tokens
  • Comprehensive documentation (docs/source/hicra_trainer.md:1)

Example Scripts:

  • Training example (examples/scripts/hicra_training.py:1)
  • Strategic gram extraction utility (examples/scripts/extract_strategic_grams.py:1)
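The extraction utility's code is not shown in this PR description. As one plausible sketch of what mining Strategic Grams from reasoning traces could look like, the snippet below counts frequent word n-grams over a corpus of traces; the function name, signature, and defaults here are illustrative assumptions, not the actual `extract_strategic_grams.py` implementation:

```python
from collections import Counter

def extract_strategic_grams(traces, n_range=(2, 4), top_k=3):
    """Collect the most frequent word n-grams across reasoning traces.

    Hypothetical sketch; the PR's real utility may score or filter
    candidates differently (e.g., by entropy rather than raw frequency).
    """
    counts = Counter()
    for trace in traces:
        words = trace.lower().split()
        for n in range(n_range[0], n_range[1] + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    # Return the top_k most frequent n-grams as candidate Strategic Grams.
    return [gram for gram, _ in counts.most_common(top_k)]

traces = [
    "let's try a different approach to the problem",
    "let's try a different method here",
]
print(extract_strategic_grams(traces))
```

Phrases that recur across many traces (like "let's try a different ...") surface as candidates, matching the planning-token examples given later in this description.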

Test Coverage:

  • 920 lines of trainer tests (tests/test_hicra_trainer.py:1)
  • 636 lines of validation tests (tests/test_hicra_validation.py:1)

How It Works

HICRA amplifies the learning signal for "Strategic Grams" (planning tokens like "let's try a different approach" or "the key insight is"),
which enables models to:

  1. Separate high-level strategic planning from low-level execution
  2. Learn hierarchical reasoning patterns more efficiently than standard GRPO
  3. Achieve better performance on mathematical reasoning tasks

The implementation follows the VeRL reference from TIGER-AI-Lab and extends the existing GRPO trainer with hierarchy-aware advantage
modification.
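The amplification idea can be sketched in isolation. This is an illustrative stand-in, not the trainer's code: the function name, the `(1 + alpha)` scaling rule, and the `alpha` default are assumptions; the real `HICRATrainer` applies its hierarchy-aware modification to GRPO advantages inside the training loop:

```python
def hicra_advantages(advantages, planning_mask, alpha=0.2):
    """Amplify per-token advantages at Strategic Gram (planning) positions.

    Sketch of the idea only; the exact HICRA formula may differ.
    advantages: per-token advantage estimates (floats)
    planning_mask: 0/1 flags marking planning tokens
    alpha: amplification strength (hypothetical default)
    """
    return [
        adv * (1.0 + alpha) if is_plan else adv
        for adv, is_plan in zip(advantages, planning_mask)
    ]

adv = [0.5, -0.1, 0.8]
mask = [1, 0, 1]  # tokens 0 and 2 belong to a Strategic Gram
print(hicra_advantages(adv, mask, alpha=0.5))
```

Execution tokens keep their ordinary GRPO advantage, so the policy gradient is tilted toward learning the high-level planning steps first.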

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

- Create trl/trainer/hicra_config.py with HICRAConfig class
- Add HICRA-specific parameters: hicra_alpha, use_hicra, hicra_entropy_topk, use_planning_tokens
- Add Strategic Gram configuration: strategic_grams_path, strategic_grams, sg_n_range
- Add logging configuration: log_semantic_entropy, log_planning_token_ratio
- Implement parameter validation in __post_init__
- Add comprehensive docstrings following TRL conventions

Implements task 1 from hierarchical-reasoner-trl spec.
Requirements: 3.1, 3.2, 3.3, 3.4, 3.5
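As a rough illustration of the configuration surface this commit describes, here is a minimal dataclass stand-in. The field names are taken from the commit message above, but the defaults and the specific `__post_init__` checks are hypothetical; the real `HICRAConfig` extends `GRPOConfig`:

```python
from dataclasses import dataclass, field

@dataclass
class HICRAConfigSketch:
    """Illustrative stand-in for HICRAConfig (not the actual class)."""
    use_hicra: bool = True
    hicra_alpha: float = 0.2            # hypothetical default
    strategic_grams: list = field(default_factory=list)
    sg_n_range: tuple = (2, 4)

    def __post_init__(self):
        # The commit mentions validation in __post_init__; this is one
        # plausible shape for those checks.
        if self.hicra_alpha < 0:
            raise ValueError("hicra_alpha must be non-negative")
        lo, hi = self.sg_n_range
        if lo < 1 or lo > hi:
            raise ValueError("sg_n_range must be a valid (min, max) pair")

cfg = HICRAConfigSketch(hicra_alpha=0.3,
                        strategic_grams=["the key insight is"])
print(cfg.hicra_alpha)
```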

This PR implements HICRA, a novel RL algorithm that enables LLMs to develop
hierarchical reasoning capabilities by amplifying learning signals for strategic
planning tokens.

Key features:
- HICRATrainer extending GRPOTrainer with advantage modification
- Strategic Gram utilities for identifying planning tokens
- Comprehensive test suite with 37+ tests
- Documentation and example scripts
- Full compatibility with existing TRL features (PEFT, distributed training, etc.)

Implementation follows the VeRL reference (TIGER-AI-Lab/Hierarchical-Reasoner)
and the paper 'Emergent Hierarchical Reasoning in LLMs through Reinforcement
Learning' (arXiv:2509.03646).

Changes:
- Add HICRAConfig extending GRPOConfig
- Add HICRATrainer with VeRL-based advantage modification
- Add Strategic Gram extraction and matching utilities
- Add comprehensive test suite (unit, integration, validation)
- Add documentation and example scripts
- Update README and exports
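The "matching utilities" mentioned above pair naturally with the advantage modification: a mask must mark which token positions fall inside a Strategic Gram. A minimal sketch, assuming whitespace words stand in for tokens (the helper name and signature are made up for illustration):

```python
def planning_token_mask(words, strategic_grams):
    """Mark positions covered by any Strategic Gram (hypothetical helper).

    words: tokenized sequence (here, simple whitespace words)
    strategic_grams: list of multi-word planning phrases
    """
    mask = [0] * len(words)
    for gram in strategic_grams:
        g = gram.split()
        for i in range(len(words) - len(g) + 1):
            if words[i:i + len(g)] == g:
                # Flag every position inside the matched gram.
                for j in range(i, i + len(g)):
                    mask[j] = 1
    return mask

words = "so the key insight is symmetry".split()
print(planning_token_mask(words, ["the key insight is"]))
```

The resulting 0/1 mask is exactly the shape an advantage-amplification step would consume.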

w601sxs commented Feb 18, 2026

@qgallouedec or others, any comments on this? We just want to add the HICRA algorithm to trl...

