Hyperreasoning is a JetBrains/Codex hackathon project for DSL-guided code search with a Rainbow-style branch controller.
The core idea is to search over structured implementation plans instead of raw code edits:
- generate compact candidate plans in a DSL
- rank and traverse candidate branches with heuristic or Rainbow policies
- compile and verify promising plans against task tests
- surface the search process in a native JetBrains tool window
- compare Rainbow against heuristic, random, and one-shot LLM baselines
The repo contains an end-to-end demo stack:
- Python search-control environment and verifier
- plan DSL proposal and compiler interfaces
- local FastAPI backend for IDE-triggered task runs
- native JetBrains plugin with live search-graph visualization
- offline dataset collection and Rainbow training scripts
- held-out evaluation reports and benchmark artifacts
The current implementation is optimized for clarity and hackathon demo reliability.
Primary held-out eval split: data/splits/eval_10.txt
Current headline result:
| Method | Solve rate | Mean tests passed | Mean time | Mean tokens |
|---|---|---|---|---|
| Rainbow | 80.0% | 80.0% | 15,353 ms | 1,694 |
| Heuristic | 40.0% | 40.0% | 23,144 ms | 2,340 |
| Random | 30.0% | 37.5% | 23,009 ms | 2,342 |
| One-shot | 30.0% | 37.5% | 31,016 ms | 3,649 |
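For reference, the aggregate columns above are simple per-task averages. The record field names in this sketch are assumptions for illustration, not the repo's actual eval schema:

```python
# Illustrative aggregation of per-task eval records into the table's columns.
# Field names ("solved", "tests_passed_frac", ...) are assumed, not the real schema.
records = [
    {"solved": True,  "tests_passed_frac": 1.0, "time_ms": 12000, "tokens": 1500},
    {"solved": False, "tests_passed_frac": 0.5, "time_ms": 20000, "tokens": 2000},
]

n = len(records)
solve_rate  = sum(r["solved"] for r in records) / n            # fraction of tasks solved
mean_tests  = sum(r["tests_passed_frac"] for r in records) / n  # mean fraction of tests passed
mean_time   = sum(r["time_ms"] for r in records) / n
mean_tokens = sum(r["tokens"] for r in records) / n
```

Note that "Mean tests passed" can exceed the solve rate (as in the Random and One-shot rows) because partially passing tasks still contribute.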
See:
- Eval results
- Codex benchmark comparison
- Summary metrics CSV
- Solve-rate chart
- Rainbow vs heuristic chart
- Rainbow vs one-shot chart
Best observed Rainbow checkpoint:
artifacts/models/rainbow_offline_v1_1776237078/best.pt
The primary demo surface is the JetBrains plugin in jetbrains-plugin/.
Demo flow:
- Start the local backend.
- Open the plugin tool window in IntelliJ/PyCharm.
- Open a task file or task folder.
- Run Rainbow, Heuristic, 1-Shot LLM, or Compare.
- Show the live search graph, verifier status, and ranked results.
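When the plugin starts a run it talks to the local backend over HTTP. The endpoint path and payload fields below are assumptions for illustration only; the actual request/event contract is documented under docs/ and backend/:

```python
import json

# Hypothetical run-request payload; field names and the endpoint path are
# illustrative, not the backend's real contract.
payload = {
    "task_path": "tasks/example_task",
    "policy": "rainbow",   # e.g. rainbow, heuristic, oneshot, or compare
    "run_tests": True,
}
url = "http://127.0.0.1:8765/runs"  # hypothetical endpoint on the demo port

body = json.dumps(payload)
# With the backend up, this could be POSTed via urllib.request with a
# Content-Type: application/json header; the live search-graph events
# then stream back to the tool window.
```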
Repository layout:
- `backend/` - FastAPI bridge for task-time search runs and live graph events
- `jetbrains-plugin/` - native JetBrains plugin and search visualization UI
- `env/` - search-control runtime, verifier integration, rewards, state encoder
- `llm/` - DSL proposal, compiler, repair, and prompt utilities
- `rl/` - Rainbow/C51 implementation (https://arxiv.org/abs/1710.02298)
- `models/` - neural network components
- `data/` - task store, splits, transition schemas, replay dataset helpers
- `scripts/` - data collection, training, eval, serving, and debug entrypoints
- `tests/` - backend, env, data, llm, and rl tests
- `docs/` - reports, event contract, backend/plugin notes, charts, CSVs
- `artifacts/` - generated synthetic data, model checkpoints, and run outputs
- `agents/` - operator runbook and planning notes
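The `rl/` directory implements Rainbow with a C51 distributional head (per the linked paper). The core of C51 is projecting the shifted target distribution back onto a fixed support; a minimal generic sketch, not the repo's actual implementation, looks like this:

```python
import numpy as np

# Minimal sketch of the C51 categorical projection used in Rainbow
# (Bellemare et al. 2017). Generic illustration, not the repo's rl/ code.

def project_distribution(next_probs, rewards, dones, gamma,
                         v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the target distribution r + gamma * z onto the fixed support."""
    support = np.linspace(v_min, v_max, n_atoms)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    batch = next_probs.shape[0]
    # Shifted (and clipped) atom locations for each transition.
    tz = np.clip(rewards[:, None] + gamma * (1.0 - dones[:, None]) * support,
                 v_min, v_max)
    b = (tz - v_min) / delta_z            # fractional atom index
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    proj = np.zeros((batch, n_atoms))
    for i in range(batch):
        for j in range(n_atoms):
            if lower[i, j] == upper[i, j]:
                # Target lands exactly on an atom: assign all its mass there.
                proj[i, lower[i, j]] += next_probs[i, j]
            else:
                # Split mass between the two neighboring atoms.
                proj[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                proj[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return proj
```

The projection conserves probability mass, so each row of the output still sums to one; the training loss is then a cross-entropy between this target and the online network's predicted distribution.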
Run the local backend:

```bash
conda run -n hyperreasoning python scripts/serve/run_backend.py \
  --host 127.0.0.1 \
  --port 8765 \
  --llm-base-url http://127.0.0.1:8080
```

Run the test suite:

```bash
conda run -n hyperreasoning python -m pytest -q tests
```

Build the JetBrains plugin:

```bash
cd jetbrains-plugin
JAVA_HOME='/Applications/IntelliJ IDEA.app/Contents/jbr/Contents/Home' ./gradlew build
```

Run the plugin in a sandbox IDE:

```bash
cd jetbrains-plugin
JAVA_HOME='/Applications/IntelliJ IDEA.app/Contents/jbr/Contents/Home' ./gradlew runIde
```

Train Rainbow offline:

```bash
conda run -n hyperreasoning python scripts/train/train_rainbow.py \
  --run-dirs artifacts/synthetic/heuristic_bulk_v1 artifacts/synthetic/hybrid_enrichment_v1 artifacts/synthetic/hybrid_enrichment_v2 \
  --offline-updates 5000 \
  --batch-size 128 \
  --buffer-capacity 100000 \
  --experiment-name rainbow_offline_v1
```

Evaluate all baselines on the held-out split:

```bash
conda run -n hyperreasoning python scripts/eval/eval_baselines.py \
  --task-manifest data/splits/eval_10.txt \
  --num-tasks 10 \
  --episodes-per-task 1 \
  --policy all \
  --checkpoint artifacts/models/rainbow_offline_v1_1776237078/best.pt \
  --run-tests \
  --max-verified-plans-per-task 1
```

Notes:
- Synthetic data lives under `artifacts/synthetic/<run_id>/`.
- Checkpoints live under `artifacts/models/`.
- Split manifests live under `data/splits/`.
- Eval charts and CSVs live under `docs/`.
- The trainer consumes `dataset.jsonl` files from collected runs.
- Action masking is part of the transition contract.