TFMC updates + unbinned v6 fits w/ backgrounds #122
Conversation
---
Main items still to be implemented:
---
Pull request overview
This PR updates the TFMC training workflow to support the same UID-based train/validation split used for PNNs (needed for upcoming unbinned v6 fits with backgrounds), adds explicit validation-loss computation/plot outputs, and wires a TFMC classifier job into the 2016 unbinned v6-rate config. It also extends the YAML defaults logic so TFMC “classifier” jobs inherit the global splitting defaults.
Changes:
- Added a TFMC validation-loss path (`compute_loss`) and updated TFMC training to produce train/val losses and separate train/val convergence plots.
- Implemented UID-based train/validation splitting in `tfmc_training.py`, plus updated IC weight accumulation to run on the training partition.
- Updated the unbinned v6-rate 2016 YAML to use a TFMC classifier with backgrounds and added a corresponding TFMC classifier job; updated YAML loader defaults to apply splitting to TFMC classifier jobs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| ML/TFMC/TFMC.py | Adds a no-update loss computation method for validation/test. |
| ML/TFMC/tfmc_training.py | Introduces UID train/val split handling, val loss logging/plotting, and adjusts IC accumulation strategy. |
| configs/unbinned_v6_rate/unbinned_2016_rate.yaml | Switches likelihood classifier to TFMC with backgrounds and adds a TFMC classifier job definition. |
| common/yaml_loader.py | Applies defaults.splitting to TFMC classifier jobs (type: classifier, framework: tfmc). |
Comments suppressed due to low confidence (1)
ML/TFMC/tfmc_training.py:100
With `--small`, the output path is modified twice: `_small` is appended to `cfg_base` when constructing `model_dir`/`plot_dir`, and then `_small` is appended again to the final directory name. This creates a different directory layout than other trainers (and than TFMC previously), and may break resume/lookup logic. Consider using a single consistent suffix strategy (either in `cfg_base` or at the end, but not both).
```python
cfg_base = os.path.join(cfg.get("version", "default"), J['region'])
model_dir = os.path.join(user.model_directory, cfg_base + ("_small" if args.small else ""), "TFMC", J["id"])
plot_dir  = os.path.join(user.plot_directory,  cfg_base + ("_small" if args.small else ""), "TFMC", J["id"])
from common.helpers import copyIndexPHP
copyIndexPHP(plot_dir)
if args.small:
    model_dir += "_small"  # "_small" is appended a second time here
    plot_dir  += "_small"
os.makedirs(model_dir, exist_ok=True)
```
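One way to resolve this (a sketch of the reviewer's suggestion, not a committed fix) is to apply the suffix exactly once, in `cfg_base`:

```python
cfg_base = os.path.join(cfg.get("version", "default"), J["region"])
if args.small:
    cfg_base += "_small"   # the only place the suffix is applied

model_dir = os.path.join(user.model_directory, cfg_base, "TFMC", J["id"])
plot_dir  = os.path.join(user.plot_directory,  cfg_base, "TFMC", J["id"])

from common.helpers import copyIndexPHP
copyIndexPHP(plot_dir)

os.makedirs(model_dir, exist_ok=True)
```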
From `common/yaml_loader.py`:

```diff
 if jtyp not in {"scaler", "pnn", "bit", "classifier"}:
     continue
 if jtyp == "classifier" and j.get("framework") != "tfmc":
     continue

 # splitting default (only pnn for now; keep bit/tfmc as comments)
 # if jtyp in {"pnn", "bit", "tfmc"} and default_splitting is not None:
-if jtyp in {"pnn", "bit", "dnn_c2st"} and default_splitting is not None:
+if jtyp in {"pnn", "bit", "dnn_c2st", "classifier"} and default_splitting is not None:
     if "splitting" not in j:
         j["splitting"] = default_splitting
```
---
Implemented a flag for the train-val splitting in `iterate_epoch`, so that the cross-section calculation (plus the yield check) is still done with all the events, as previously, while keeping the train-val split in the training.
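A minimal sketch of what such a flag could look like (the function signature, batch layout, and per-event split variable `u` are assumptions, not the actual TFMC code):

```python
import numpy as np

def iterate_epoch(batches, train_fraction=0.8, split=True, partition="train"):
    """Yield (features, weights) per batch.

    split=False reproduces the original behaviour, using every event,
    e.g. for the cross-section calculation and the yield check.
    split=True restricts each batch to the train or validation partition,
    selected here by an assumed per-event uniform value u in [0, 1).
    """
    for batch in batches:
        if not split:
            yield batch["x"], batch["w"]
            continue
        in_train = batch["u"] < train_fraction
        sel = in_train if partition == "train" else ~in_train
        yield batch["x"][sel], batch["w"][sel]
```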
---
Given the issues seen with convergence in the presence of a large class imbalance (both in terms of weighted and unweighted numbers of events), I am implementing a new feature: give the inclusive cross-section ratios as prior probabilities (via the softmax logits) and have the network learn around that. The first core implementation is in, with hardcoded event-number ratios; the next step is to connect it with the ratio calculation done to get the class weights.
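The usual way to do this (a generic sketch, not the TFMC implementation; the `priors` values are placeholders) is to initialize the bias of the final logits layer to the log of the class priors:

```python
import numpy as np
import tensorflow as tf

priors = np.array([0.90, 0.07, 0.03])   # placeholder inclusive class fractions

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(priors)),  # logits layer, softmax applied in the loss
])

# softmax(log(p) + c) = p for any constant c, so with the logit biases set to
# log(priors) the untrained network already predicts the inclusive ratios.
model.layers[-1].bias.assign(np.log(priors).astype("float32"))
```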
---
The implementation of setting the logit priors to the inclusive XS ratio is postponed, as we are now experimenting with simpler ways to get stable trainings. To avoid noise in the code, I removed that in commit 2060baa. We can return to it if we have exhausted all other options.
---
The main new feature implemented in the TFMC training is early stopping. @Dorhand can you have a look?
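For reference, roughly the standard patience-based scheme (a sketch, not the exact TFMC code; the callables are placeholders, and `compute_val_loss` stands in for the no-update `compute_loss` path added in this PR):

```python
def train_with_early_stopping(model, train_one_epoch, compute_val_loss,
                              save_checkpoint, max_epochs=500, patience=10):
    """Patience-based early stopping on the validation loss.

    Returns (best_epoch, last_epoch) so both can be stored for later resuming.
    """
    best_val, best_epoch, wait = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = compute_val_loss(model)        # loss without weight updates

        if val_loss < best_val:
            best_val, best_epoch, wait = val_loss, epoch, 0
            save_checkpoint(model, tag="best")    # keep the best epoch
        else:
            wait += 1
            if wait >= patience:
                break                             # no improvement for `patience` epochs
        save_checkpoint(model, tag="last")        # keep the last epoch for resuming
    return best_epoch, epoch
```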
---
Additionally, I'm also storing the last- and best-epoch information. This will allow us to restart trainings from the last epoch in cases where we e.g. ran into OOM or the max walltime. I also implemented a feature, in both the TFMC and PNN training, to avoid restarting trainings which have properly finished. PS: I also implemented access to the RDataLoader `n_split` option in TFMC training.
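One simple way to implement the "don't restart finished trainings" guard (a sketch; the `status.json` marker-file convention is an assumption, not necessarily what the PR does):

```python
import json
import os

def training_state(model_dir):
    """Return 'finished', 'resumable', or 'new' for a model directory."""
    status_file = os.path.join(model_dir, "status.json")
    if not os.path.exists(status_file):
        return "new"
    with open(status_file) as f:
        status = json.load(f)
    # 'finished' is written only after the last epoch (or early stop) completed
    # cleanly; otherwise we can resume from the stored last epoch after
    # e.g. OOM or hitting the max walltime.
    return "finished" if status.get("finished") else "resumable"
```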
In view of the upcoming v6 fits with backgrounds, I updated the TFMC code to include the train and validation split used for the PNNs. I mostly took the code from `pnn_training.py` and adapted it to the TFMC specifics.
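A deterministic UID-based split could look like this (a sketch of the idea; the actual PNN code may differ):

```python
import numpy as np

def uid_train_mask(uid, train_fraction=0.8):
    """Deterministic per-event train/val assignment from the event UID.

    Hashing the UID (rather than shuffling indices) makes the split
    reproducible across jobs and independent of the event ordering.
    """
    uid = np.asarray(uid, dtype=np.uint64)
    # cheap multiplicative integer hash -> uniform value in [0, 1)
    u = ((uid * np.uint64(2654435761)) % np.uint64(2**32)) / 2.0**32
    return u < train_fraction
```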
The original TFMC code builds the training sample by combining the events from each class into a single sample (keeping the class labels) and then shuffling them. To do something similar while taking the train-validation split into account, I take the training fraction of events from each class sample, combine them into a single training sample, and then shuffle it. I do the same for the validation sample. My goal was to keep the same relative fraction of events from each class in both samples.
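In code, the per-class split followed by combine-and-shuffle could look like this (a numpy sketch reusing the `uid_train_mask` helper above, with array indices as stand-in UIDs; the real code works on the RDataLoader batches):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_partitions(class_samples, train_fraction=0.8):
    """class_samples: list of (features, weights) pairs, one per class.

    The training fraction is taken from each class first, so train and
    validation keep the same relative class composition; each combined
    sample is then shuffled, as in the original single-sample approach.
    """
    train_parts, val_parts = [], []
    for label, (x, w) in enumerate(class_samples):
        mask = uid_train_mask(np.arange(len(x)), train_fraction)  # stand-in UIDs
        y = np.full(len(x), label)
        train_parts.append((x[mask], y[mask], w[mask]))
        val_parts.append((x[~mask], y[~mask], w[~mask]))

    def combine(parts):
        x, y, w = (np.concatenate(a) for a in zip(*parts))
        idx = rng.permutation(len(x))
        return x[idx], y[idx], w[idx]

    return combine(train_parts), combine(val_parts)
```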
To get the inclusive cross-sections, which we need to convert classifier output probabilities into differential cross-section ratios, the original TFMC code iterated through the entire sample and accumulated the weights for each class. Now that we are no longer using the entire sample, I modified the code to accumulate the weights over the training fraction of events. This will not give us the correct absolute values, but it should keep the relative cross-section differences correct, which in my mind should lead to the correct DCR. I will double-check this with pen and paper, and if I find something I'll change it. One way to get closer to the correct values would be to divide the obtained values by the training-partition fraction; this relies on the splitting and shuffling being random enough that a fraction x of the sample has a sum of weights ≈ x times the full cross-section.

Update: changed the code to accumulate the cross-sections from the entire sample, but to only use the training fraction for training. WIP, so keeping this as a draft for now.
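The rescaling argument in code form (a sketch; `f_train` is the training-partition fraction, and the function name is illustrative):

```python
import numpy as np

def inclusive_xs_sums(weights_per_class, f_train=0.8, from_train_only=True):
    """Per-class sums of weights, optionally corrected for the train/val split.

    If the sums were accumulated over the training partition only, dividing by
    f_train approximately recovers the full-sample values, assuming the split
    is random enough that a fraction x of the events carries ~x of the weight.
    """
    sums = np.array([np.sum(w) for w in weights_per_class])
    return sums / f_train if from_train_only else sums
```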