From a49759094b6588a98dab1faca15ea5fd173ab407 Mon Sep 17 00:00:00 2001
From: Robert Schoefbeck
Date: Sun, 29 Mar 2026 12:21:58 +0200
Subject: [PATCH 01/17] claude-rs-devel

---
 ML/BIT/CLAUDE.md | 200 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 ML/BIT/CLAUDE.md

diff --git a/ML/BIT/CLAUDE.md b/ML/BIT/CLAUDE.md
new file mode 100644
index 0000000..20f0b3a
--- /dev/null
+++ b/ML/BIT/CLAUDE.md
@@ -0,0 +1,200 @@
+# BIT Optimization Project
+
+## Goal
+Optimize the Boosted Information Tree (BIT) algorithm for:
+1. **Training time** (primary)
+2. **Evaluation/inference time** (primary)
+3. **Memory consumption** (important, spikes are damaging)
+4. **I/O / data loading** (secondary, optimize if it shows up as bottleneck)
+
+## Files to Optimize
+All in `/users/robert.schoefbeck/claude/GOLLUM/ML/BIT/`:
+- `NumbaMultiNode.py` — core tree node logic, split finding, Numba kernels
+- `NumbaBIT.py` — boosted tree training loop
+- `pdf_bit_training.py` — training script / entry point
+- `../../data/RDataLoader.py` — data loader (optimize only if I/O is a bottleneck)
+
+## Working Directory
+Always run benchmarks from:
+```
+/users/robert.schoefbeck/claude/GOLLUM/ML/BIT
+```
+
+## Benchmark Command
+```bash
+memray run --output benchmark.bin pdf_bit_training.py \
+    ../../configs/benchmark/unbinned_delphes_6_RunII.yaml \
+    --every 1 \
+    --job bit_NG_PDF4LHC21_6_tt2l_delphes \
+    --overwrite \
+    --max_n_files 1 \
+    --profile \
+    --postfix
+```
+
+Followed immediately by:
+```bash
+memray stats benchmark.bin
+```
+
+## Mandatory Flags (always include these)
+| Flag | Reason |
+|------|--------|
+| `--every 1` | Required for meaningful results |
+| `--profile` | Enables CPU profiling output to stdout |
+| `--job bit_NG_PDF4LHC21_6_tt2l_delphes` | Selects the benchmark job |
+| `--overwrite` | Overwrites earlier results so comparisons are clean |
+
+## Variable Flags
+- `--max_n_files 1` — default; use a small number for quick iterations. Increase (e.g. to 3 or 5) if a heavier test is needed to confirm a result is not noise.
+- `--postfix