Skip to content

Port CVH refit + WMass NanoAOD customizations to CMSSW_15_0_19_patch2#46

Draft
davidwalter2 wants to merge 140 commits into
WMass:WmassNanoProd_15_0_19_patch2from
davidwalter2:WmassNanoProd_15_0_19_patch2_dev
Draft

Port CVH refit + WMass NanoAOD customizations to CMSSW_15_0_19_patch2#46
davidwalter2 wants to merge 140 commits into
WMass:WmassNanoProd_15_0_19_patch2from
davidwalter2:WmassNanoProd_15_0_19_patch2_dev

Conversation

@davidwalter2
Copy link
Copy Markdown

@davidwalter2 davidwalter2 commented May 13, 2026

Summary

First port of the CVH (Continuous Variable Helix) track-refit infrastructure and the WMass custom-NanoAOD producers from WmassNanoProd_10_6_26 to CMSSW_15_0_19_patch2 (el9_amd64_gcc12, Geant4 11.x). The full tree builds clean; this PR is open as Draft for early review of the porting decisions before validation work proceeds.

What this PR contains

  1. 105 cherry-picks of CVH-related commits from WmassNanoProd_10_6_26, in chronological order, with original authorship preserved (Josh Bendavid, Kenneth Long, et al.). These bring forward the CVH residual-maker plugins, two-track G4e fit, value-map producers, and Analysis/HitAnalyzer per-channel configs.

  2. Bruschini's CVH port (7322cb04daa + d51e8e11c92 from erc-asymow/WMassRefits_CMSSW_15_1_X) layered on top. This contributes the Geant4-11.x-clean *ForCVH propagator + custom physics classes (Wentzel VI MSC, Universal Fluctuation, Energy-Loss tables, Physics List, ErrorEnergyLoss, ErrorMessenger). Co-authored credit to Davide Bruschini, Marco Musich and Josh Bendavid.

  3. Glue commits authored here:

    • Remove orphan *Custom.{h,cc} files superseded by Bruschini's *ForCVH variants
    • Revert the PolyFit3D re-introduction from Bruschini's commit (see exclusions below)
    • Port NanoAOD FlatTable API (drop IntColumn/FloatColumn enum args; addColumn now infers type)
    • Include G4SystemOfUnits.hh in Geant4ePropagator so unqualified CLHEP units compile against G4 11.x
    • Port residual-makers to 15_0 API: ESHandle → ESGetToken (canonical initializer-list style), SiStripClusterInfo per-producer construction + initEvent/setCluster, Eigen::allEigen::placeholders::all

What this PR intentionally excludes

  • MagneticField/ParametrizedEngine/PolyFit3DParametrizedMagneticField — the WMass thread-unsafe 3D field-map wrapper. PolyFit3D was removed from cms-sw upstream long ago for thread-safety reasons. The plan is to replace it with the ScalarPot3D spherical-harmonic basis (see project memory note in calibration_studies) in a Phase 2 follow-up, which is thread-safe by construction and is the path forward for the upcoming 50-parameter scalar-potential fit.
  • PR cms-sw Fixed backward propagator cms-sw/cmssw#32833 backport (10_6-only ESGetToken backport) — already present natively in 15_0.
  • remove dd4hep part (fd80ce811bf) — contrary to 15_0 where dd4hep is the default geometry layer.
  • v720 MagneticFieldLabel everywhere (2f7cca45f37) — 10_6 ESHandle-based label routing. The same functionality will be re-introduced cleanly via 15_0's ESGetToken ESInputTag mechanism, alongside the ScalarPot3D ESProducer in Phase 2.
  • Non-CVH commits in PhysicsTools/NanoAOD (gen-weight handling, LHE info, Mu T&P customizations) — these are not strictly CVH-related and the 15_0 NanoAOD framework is structurally different (modular common_cff / custom_<area>_cff pattern). Deferred to a separate port if/when needed.

Build state

$ scram b -j8
>> All python modules compiled

No errors. All 9 cms-addpkg'd packages built (Analysis/HitAnalyzer, CalibTracker/SiPixelESProducers, MagneticField/ParametrizedEngine, PhysicsTools/NanoAOD, RecoLocalTracker/SiPixelRecHits, RecoLocalTracker/SiStripRecHitConverter, SimG4Core/{GeometryProducer,MagneticField}, TrackingTools/TransientTrack, TrackPropagation/Geant4e).

What is NOT yet done

  • Closure tests (Geant4e propagator state-vector + Jacobian agreement at 1e-6 vs the 10_6_26 reference)
  • CVH J/ψ refit validation on an ALCARECO smoke sample
  • Phase 2: layer the WmassCalib_Bfield_10_6_26 deltas (ScalarPot3D, Phase B Bx/By transport Jacobian, residual-maker scalar-potential switch)
  • MagneticFieldLabel routing through SiPixel/SiStrip CPE ESProducers + TransientTrackBuilder + GeometryProducer (deferred from v720; will land with ScalarPot3D ESProducer)
  • Re-introduction (if needed) of LHE info / gen-weight / Mu T&P NanoAOD producers as a 15_0-style customization module

Test plan

  • scram b -j8 clean (✓ already verified)
  • Geant4e propagator closure test on a 40 GeV muon — state + Jacobian agreement ≤ 1e-6 (λ, qop) and ≤ 1e-2 (φ) vs 10_6_26
  • Field validation plots from MagneticField/Engine/test/validateField.cfg
  • CVH J/ψ refit closure on a 48-event ALCARECO smoke sample
  • NanoAOD branch parity vs the 10_6 reference
  • Single-job CRAB test

🤖 Generated with Claude Code

bendavid and others added 30 commits May 13, 2026 17:04
… info for wedge modules, restore ability to apply linearized corrections to two track output, and additional disabled debugging
… hits), exclude loopers from two track fit, add muon matching and id info to two track fit
davidwalter2 and others added 23 commits May 13, 2026 19:43
G4 10.4 defaulted to cubic-spline interpolation of the dE/dx / range /
inverse-range tables used by the propagator; the WMass fork picked that
up implicitly via G4EmParameters::Instance()->Spline(), which was removed
in G4 11.x with a class-level default of false.

The W mass momentum scale calibration was tuned against spline-interpolated
tables, so flip the default back to true in G4TablesForExtrapolatorForCVH
to preserve that behaviour and avoid an unintended linear-interp shift in
the most-probable dE/dx at the per-mille level on muons.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the step-2 chain for the KS->pi+pi- and Lambda0->p pi- alignment
channels, mirroring the existing D*->D0 pi step-2 implementation.

Step-2 reads the per-channel ALCARECO output (ALCARECOTkAlKsToPiPi /
ALCARECOTkAlLambdaToProtonPi), re-pairs the deduplicated daughter tracks,
applies V0Producer-like displaced-vertex cuts, then runs the existing
ResidualGlobalCorrectionMakerTwoTrackKPiG4e CVH refit on the chosen pair.
The ALCARECO format is treated as just "tracks + hits + clusters", with
no assumption that pair structure or V0 candidate info is preserved -- the
candidate finder rebuilds candidates from scratch.

Files:

* plugins/V0CandidateProducer.cc -- analog of DstToD0PiCandidateProducer
  for 2-body V0 decays. Iterates opposite-sign pairs from the input
  TrackCollection, runs KinematicParticleVertexFitter with configurable
  per-daughter masses, applies vertex p-value, mass-window, transverse
  pointing, and Lxy/sigma cuts using a fresh BeamSpot. For asymmetric
  decays (Lambda) tries both (m1, m2) and (m2, m1) assignments and keeps
  the one closer to the nominal V0 mass. Outputs the daughter tracks of
  the highest-pT surviving candidate in pair order, suitable for the
  downstream ntuplizer with respectTrackOrder=True.

* plugins/BuildFile.xml -- adds DataFormats/BeamSpot dependency
  (V0 displaced-vertex cuts need the beamspot; D* candidate finder did
  not require it because the 3-body Delta-mass cut was sufficient).

* python/V0CandidateProducer_cfi.py -- defaults tuned for KS;
  override daughterMass1, expectedV0Mass, mass window, and
  tryBothAssignments for Lambda.

* python/ResidualGlobalCorrectionMakerTwoTrackPiPiG4e_cfi.py
* python/ResidualGlobalCorrectionMakerTwoTrackProtonPiG4e_cfi.py
  -- per-channel cfi clones of the existing CVH ntuplizer C++ class
  (no new C++ -- the class is already mass-parameterised). The KS
  variant sets both daughter masses to the pion mass; the Lambda
  variant sets daughter1 to the proton mass and daughter2 to the
  pion mass, with respectTrackOrder=True.

* test/runGlobalCorRecKsFromAlca.py
* test/runGlobalCorRecLambdaFromAlca.py
  -- runners that load the corresponding cfis, produce a fresh
  offlineBeamSpot from the standard service (BeamSpot is not stored
  in the ALCARECO output), and chain V0CandidateProducer -> CVH
  ntuplizer. Validated on the existing TkAlKsToPiPi_8h.root and
  TkAlLambdaToProtonPi_8h.root files: KS peak at PDG within 0.001 MeV,
  Lambda peak at PDG within 0.5 MeV, both with ~5 MeV resolution.

Note: a follow-up cleanup is planned to generalise
ResidualGlobalCorrectionMakerTwoTrackG4e (the symmetric J/psi version)
to take per-daughter masses via cfi, retire the K-pi-specific clone,
and have one canonical class for all two-body decays. That refactor
is orthogonal to this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two cleanup steps building on the previous KS/Lambda step-2 commit:

1) Unify ResidualGlobalCorrectionMakerTwoTrackG4e and
   ResidualGlobalCorrectionMakerTwoTrackKPiG4e into a single class.

   The two classes shared >2700 lines of identical code; the K-Pi
   variant was a clone-and-modify with three real additions on top of
   the symmetric J/psi version: per-daughter masses (kaonMass/pionMass),
   a respectTrackOrder flag, and an L1 trigger decision block. All three
   are useful generically (not K-Pi specific), so the right move is to
   keep the more general code and retire the original.

   Concretely, ResidualGlobalCorrectionMakerTwoTrackG4e.cc is replaced
   by the (renamed) K-Pi version with parameter names changed to be
   channel-agnostic:
     kaonMass    -> daughterMass1
     pionMass    -> daughterMass2
     kaonMassErr -> daughterMass1Err
     pionMassErr -> daughterMass2Err
   All new parameters (daughterMass1/2, daughterMass1/2Err, minPairMass,
   maxPairMass, respectTrackOrder, doL1Trigger, l1Results, l1Triggers)
   are made optional with backward-compatible defaults: muon mass for
   the daughters, no pair-mass cut, no track-order constraint, no L1
   filter. So the existing J/psi/Upsilon runner cfgs (which simply
   don't pass any of these) reduce exactly to the previous behaviour.

   ResidualGlobalCorrectionMakerTwoTrackKPiG4e.cc is deleted. The three
   channel-specific cfi clones (KPi for D*, PiPi for KS, ProtonPi for
   Lambda) are updated to instantiate the unified class with the new
   parameter names; their content is otherwise unchanged.

2) Split the generic V0CandidateProducer_cfi into channel-specific
   cfis matching the DstToD0PiCandidateProducer_cfi pattern.

   V0CandidateProducer_cfi.py is replaced by:
     KsToPiPiCandidateProducer_cfi.py        (KS-tuned: both pions, symmetric)
     LambdaToProtonPiCandidateProducer_cfi.py (Lambda-tuned: p+pi, asymmetric)

   The C++ class V0CandidateProducer is unchanged; only the cfi
   wrappers are now per-channel. Runners load the matching cfi.

Validation: all three V0 step-2 channels reproduce identical entry
counts and mass-peak positions before and after the refactor:
  KS     1985 entries, peak at 0.4976 GeV (PDG 0.4976)
  Lambda  807 entries, peak at 1.1162 GeV (PDG 1.1157)
  D*      152 entries, peak at 1.860  GeV (PDG 1.8648)
The legacy J/psi runner cfg loads cleanly under the unified class
(falling back to muon-mass defaults via existsAs<>); a future bit-level
J/psi tree comparison would complete the validation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New cfi parameters (default off, no behaviour change for existing
J/psi/Upsilon/D*/V0 runners):
  doPointingConstraint  - enable the soft chi^2 pointing term
  pointingSigma         - angular pointing resolution in radians (default 1e-3)

The constraint requires the V0 transverse momentum to align with the
flight vector from the beamspot to the secondary vertex. One scalar
chi^2 term per event with sigma_g = pointingSigma * Lxy * |p_xy|, applied
once at the start of the per-iteration constraint block (id == 0 guard);
+1 ndof. Touches vertex-state indices 0..5 (daughter qop/lambda/phi)
and 7..8 (vertex xy); index 6 (d0) and 9 (vertex z) are not coupled.

Validated on KS (1985 -> 1949 entries, mass RMS 4.59 -> 4.56 MeV) and
Lambda (807 -> 766 entries, RMS 17.6 -> 17.5 MeV). D0 chi^2 increases
by +15 with no RMS gain — the BS reference is too coarse for prompt
charm and a PV reference would be needed there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds three optional InputTags to the CVH two-track ntuplizer:
dedxHarmonic2, dedxPixelHarmonic2 and dedxAllHarmonic2. When set, the
producer reads the corresponding ValueMap<float>s projected onto the
ALCARECO selected-track collection (via DeDxValueMapProjector in the
ALCARECO step) and fills new branches Mu{plus,minus}_dedx{,Pixel,All}Harmonic2
in the per-event tree.

The ALCARECO ValueMaps are keyed on the V0DaughterTrackProducer output
collection, which differs from the V0CandidateProducer output the CVH
ntuplizer reads downstream. Lookup is done by matching each fit track's
preserved TrackExtraRef.key() against the source collection (a new
dedxSourceTracks InputTag identifies that source). The joint estimator
read is independently optional via mayConsume so legacy ALCARECO inputs
without that ValueMap fall back to NaN per-track values.

Channel cfis (PiPi, ProtonPi, KPi) point at the matching ALCARECO
ValueMap names by default. Runners are repointed to the multifile
test ALCARECO output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
D0/D* (DstToD0PiCandidateProducer): harmonize the D0, D*, and delta-m
windows with the Stage-1 ALCARECO selector (AlignmentThreeBodyDecayTrackSelector
in ALCARECOTkAlDstToD0Pi_cff.py): D0 mass to PDG +/-50 MeV [1.81483, 1.91483],
D* mass to [1.860, 2.160], delta-m to [0.14243, 0.14843] (+/-3 MeV around
0.14543). The previous looser windows admitted swapped K/pi mass-assignment
configurations that the ALCARECO already rejected.

KS (KsToPiPiCandidateProducer): V0 mass window tightened from +/-100 MeV
to [0.44, 0.56] (PDG +/-60 MeV) to suppress combinatorial background.

Lambda (LambdaToProtonPiCandidateProducer): V0 mass window tightened from
+/-65 MeV to [1.07, 1.16] (-46 MeV / +44 MeV around PDG) to suppress
combinatorial background.

KS and Lambda runner cfgs have their now-redundant override blocks
removed -- they pick up the new tighter cfi defaults.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reverts the iteration to its pre-V0 form `for (auto jtrack = itrack + 1; ...)`
and removes the parallel selection machinery added in 800021d:

  - useFixedPairOrder / firstTrack / secondTrack helpers
  - respectTrackOrder cfi parameter (and the ntuplizer member)
  - minPairMass / maxPairMass raw-mass cut and cfi parameter
  - opposite-sign charge filter (itrack->charge() + jtrack->charge() != 0)
  - now-obsolete `if (jtrack == itrack) continue;` (guaranteed by j > i)

Selection responsibility now lives entirely in the candidate producers
(V0CandidateProducer for KS/Lambda, DstToD0PiCandidateProducer for D0):
the charge requirement, the post-vertex-fit V0/D0 mass window, the
pointing and Lxy-significance cuts, and the choice of which track is
slot 0 vs slot 1 are all enforced upstream. The CVH ntuplizer is now
pure "refit + write tree", with no event-level selection of its own.

Behaviorally unchanged for the V0 channels we currently run on (the 2
tracks per event are already pre-selected and ordered upstream). The
`j > i` form also generalises naturally to >1 candidate per event (a
deferred multi-candidate fix we plan in V0CandidateProducer).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…S veto

V0CandidateProducer + DstToD0PiCandidateProducer: skip input tracks with
isLooper() == true at the start of the candidate search (the CVH refit
downstream cannot handle them and was discarding the events anyway).
For DstToD0PiCandidateProducer this is applied to the K and D0-pion
slots only -- the soft pion track is not propagated downstream by the
CVH so its looper status is left as-is.

V0CandidateProducer: replace the single legacy `daughterMassErr` with
two independent parameters `daughterMass1Err`, `daughterMass2Err`
(falling back to `daughterMassErr` and then 1e-6 for back-compat). The
mass-error pairing follows the mass pairing in fitV0(), so for the
Lambda channel's tryBothAssignments=True swap the proton-mass error
stays paired with the proton mass even when applied to the swapped
input track.

V0CandidateProducer: add an optional K_S mass veto (`applyKsVeto`,
`ksVetoMass`, `ksVetoWindow`, `ksVetoPionMass`). When enabled, a
candidate whose two daughter tracks evaluated under the (pi+, pi-)
hypothesis fall within +/- ksVetoWindow of the K_S PDG mass is
rejected before the kinematic fit. Default off so the K_S producer
itself is unaffected; turned on in the Lambda cfi (+/-10 MeV window)
to suppress real K_S contamination at low Lambda pT, where the
wrong-mass-hypothesis K_S mass falls inside the Lambda window.

LambdaToProtonPi cfi: also bump daughter masses to PDG precision
(0.93827208943 / 0.13957039 with their PDG uncertainties).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Geant4ePropagator: add an optional 8th argument `particleNameOverride`
to propagateGenericWithJacobianAltD. When non-empty it is used directly
in the G4ErrorFreeTrajState constructor instead of the propagator's
constructor-bound `theParticleName` (default "mu"). Empty string
preserves the existing muon behaviour for J/psi / Upsilon callers.
Also adds three per-failure-point counters (`plimit`, `ierr`, `maxlen`)
and detailed kinematic prints at each failure exit -- used to diagnose
the Geant4e propagation-failure rate on V0 daughters. The destructor
prints a one-line summary of total calls and per-mode failure counts.

G4ErrorPhysicsListCustom: register K+/-, anti-proton particle
definitions in addition to the existing gamma / e+/- / mu+/- / pi+/-
/ proton. Without these, G4ErrorTrajState::BuildCharge() aborts with
a fatal G4Exception when asked to propagate a kaon / anti-proton (D0
and Lambda-bar daughters).

ResidualGlobalCorrectionMakerTwoTrackG4e: read per-daughter G4 base
particle names from cfi (`daughterParticleName1/2`), construct the
full G4 particle name per track from (base, charge) -- "<base>+"/"-"
for mesons/leptons, "proton"/"anti_proton" for the baryon case -- and
pass it to propagateGenericWithJacobianAltD(). Empty cfi -> empty
override -> propagator falls back to muon (back-compatible with
existing J/psi/Upsilon runners).

V0 channel cfis (PiPi, ProtonPi, KPi): set daughterParticleName1/2
to "pi"/"pi", "proton"/"pi", "kaon"/"pi". Bump daughter masses and
errors to PDG values, and correct the previous massConstraintWidth
values from "lifetime in ms" to "natural width in GeV" (Gamma =
hbar/tau): K_S 7.351e-15, Lambda 2.502e-15, D0 1.605e-12.

V0 runner cfgs (KS, Lambda, D0): override the global Geant4ePropagator
PropagationPtotLimit from the default 0.5 GeV down to 0.05 GeV. The
0.5 GeV threshold was tuned for muon-spectrometer extrapolation but
rejects ~70% of V0-daughter pion / soft-proton tracks (lab |p| can
fall below 0.5 GeV at low Lambda pT or for backward-emitted daughters).
On a 500-event Lambda quickcheck this drops the per-call propagation
failure rate from ~190 to ~70 per 500 events (~64% reduction).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds Analysis/HitAnalyzer/{interface,plugins}/ParticleProperties.{h,cc}
with two free functions in namespace ana_hitanalyzer:

  ParticleProperties getParticleProperties(const std::string &base);
       returns {mass, massErr} (GeV) for the given particle base name.
       Recognised: "mu", "pi", "kaon", "proton", "e".

  std::string g4ParticleName(const std::string &base, int charge);
       formats the G4 particle name for the propagator override:
         empty base   -> empty (caller falls back to propagator default)
         "proton"     -> "proton" / "anti_proton"
         everything else -> base + "+" / "-".

All four producers / ntuplizers in HitAnalyzer that previously held
their own copies of these constants now look them up from the single
table:

  V0CandidateProducer        : daughter mass + err from
                               daughterParticleName1/2
  DstToD0PiCandidateProducer : kaon, pion, soft-pion masses hard-coded
                               by name (this channel is K + pi + pi)
  ResidualGlobalCorrectionMakerG4e (single-track):
                               trackMass from trackParticleName, plus
                               per-track G4-name override on every
                               propagator call (mu/pi/kaon/proton).
  ResidualGlobalCorrectionMakerTwoTrackG4e:
                               daughterMass1/2 + err from
                               daughterParticleName1/2.

The base class no longer carries g4ParticleName as a static helper --
it lives in the shared utility instead, so V0CandidateProducer (which
is not in the CVH inheritance hierarchy) can use the same
implementation.

Channel cfis simplified accordingly: removed the redundant numeric
daughterMass1/2 and daughterMass1/2Err (and kaonMass / pionMass /
softPionMass for D*) cms.double() entries -- they are now derived
from the particle name in one place. Updating a particle's PDG mass
becomes a one-line change in ParticleProperties.cc rather than a
scatter of cms.double()s across half a dozen cfis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ield correction

Replaces the parmtype-6 per-module dBz block (~15 000 free parameters,
one per tracker module) in the CVH ntuplizer with a single global
spherical-harmonic expansion of the magnetic scalar potential, evaluated
by the existing magfieldparam::HarmBasis3DCyl basis. The new layout is
~35 modes by default (lmax=5; n=L*(L+2)) and reaches the 50-mode
prescription validated in mfs by adding (L=6, M=1) via cfi.

  ScalarPotentialFieldCorrection
    Wraps the basis (promoted from MagneticField/ParametrizedEngine/src
    to interface so it can be included from HitAnalyzer). Exposes:
      - getCorrectionAt(pos, corparms)  -> Eigen::Vector3d (Bx, By, Bz)
      - getBzBasisAt(pos, dBzPerMode)   per-mode Bz basis values used in
                                        the per-hit Jacobian chain rule
      - appendParmsetEntries / resolveGlobalIndices for the parmset and
        detidparms bookkeeping. Modes are keyed as
        (parmtype = 14, DetId(modeIdx)).

  ResidualGlobalCorrectionMakerBase
    parmset.emplace(6, parmdetid) is removed (eloss/type 7 stays per
    module). The 6 alignment-/transport-Jacobian helpers and the
    corresponding Geant4ePropagator entry point lift their scalar
    dBz argument to a 3-vector dB; the Bv shift in the SymPy-generated
    5x11 alignment Jacobian becomes 3-component, and
    cmsField->SetOffset(dBx, dBy, dBz) sees the full 3D shift.

  G4e + TwoTrackG4e per-hit loops
    Per-hit dB is fetched from the field correction at the propagation
    start point. The Jfull field block becomes nFieldModes columns:
    each column is FdFm.col(5) * (dBz/dc_i)(prop start) -- chain-rule
    scaling of the existing transportJacobianBzD column. Bx/By transport
    effects (sub-mT residuals at 3.8 T baseline) are neglected at this
    iteration; only the field offset itself uses the full 3D shift.

  cfi
    scalarPotentialLmax (uint32, default 5) and scalarPotentialExtra
    (vstring of "L,M" entries, default empty) on all three V0 channels
    plus the J/psi data + ALCARECO drivers and muons_cff. Old per-module
    corFiles are no longer compatible.

  MagneticField::kTeslaToInvGeV
    Named static constexpr replacing the bare 2.99792458e-3 literal at
    the seven Bv-construction sites in Base.cc and the three sites + the
    res.col(5) scaling in Geant4ePropagator.cc.

KS quickcheck closure (200 events, corparms zero): mean kin-fit mass
0.4972 GeV vs PDG 0.4977, 183 candidates pass, ~0.04% propagation
failures -- matches baseline. nParms (per track) drops from many hundreds
to ~95 because the bfield block is now ~35 globally shared modes
instead of one parameter per hit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stage-1 ALCAReco now persists *Resonances VertexCompositeCandidateCollection
collections for KS, Lambda, J/psi, Upsilon, Z (via the new stage-1
VertexCompositeCandidateRemapper). Switch the CVH refit to consume them
directly: one tree row is emitted per persisted candidate, no
re-pairing of tracks at stage-2.

ResidualGlobalCorrectionMakerBase + ResidualGlobalCorrectionMakerTwoTrackG4e:

- Add an optional `srcCandidates` InputTag (default empty) and an
  inputCandidates_ token in the base class, registered only when the
  label is non-empty.
- At the top of produce(), build a vector of (Track*, Track*) pairs:
  if srcCandidates is set, iterate the persisted candidates and use
  cand.daughter(0/1)->track() refs; otherwise fall back to the legacy
  j>i outer-product over the input TrackCollection.
- Replace the existing nested loop with a single flat loop over the
  pair vector; per-pair body unchanged.
- Tighten the dE/dx parameter-existence check to require non-empty
  InputTag labels (existsAs<>() passes for empty InputTag('') but
  getByToken would then throw ProductNotFound at runtime).
- BuildFile: -Wno-error -Wno-unused-variable to keep pre-existing
  unused-variable warnings from failing the build on full recompile.

cfis:

- New ResidualGlobalCorrectionMakerTwoTrack{JpsiMuMu,UpsilonMuMu,ZMuMu}G4e_cfi
  configurations targeting the persisted *Resonances candidates.
- ResidualGlobalCorrectionMakerTwoTrack{PiPi,ProtonPi,KPi}G4e_cfi: default
  srcCandidates set to ALCARECOTkAl{KsToPiPi,LambdaToProtonPi,DstToD0Pi}Resonances.

Removed V0CandidateProducer.cc and its KsToPiPiCandidateProducer_cfi /
LambdaToProtonPiCandidateProducer_cfi -- all V0 ALCAReco now publishes
the *Resonances candidates, no legacy samples need stage-2 re-pairing.
runGlobalCorRec{Ks,Lambda}FromAlca.py simplified to drop the
candidate-producer step (CVH module reads the candidates from the cfi
default).

Adds Charmonium-flavored runner copies (runCvhJpsiCandidateDriven{,100files}.py,
runGlobalCorRec{Ks,Lambda}From{CharmoniumAlca,Charmonium100files}.py) for
running the new pipeline end-to-end on the Charmonium-derived ALCAReco
that the stage-1 work now produces.

Test plan:
- 1-file Charmonium: 706 J/psi candidates -> 702 entries refit (4 dropped
  by isLooper skip), 0/82329 Geant4e propagation failures, J/psi mass =
  3.093 +/- 0.068 GeV.
- 100-file Charmonium (merged via edmCopyPickMerge): 9-variant scan
  (KS / Lambda / J/psi x nv_np / v_np / v_p) completed with 0 failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…4e path limit

1. cmsRun crashed in TTree::AutoSave at ~3 k events with fillGrads=true,
   the trace ending in GenericWriteAction inside the tree-metadata
   streamer. SetAutoSave(0) disables the in-job metadata save (the tree
   is still written at Close), eliminating that crash path.

2. Slow scalar baskets never flushed during the run, so peak RSS reached
   3.4 GB after one hour at 100 kB/event. SetAutoFlush(-100 MB) flushes
   all open baskets every 100 MB accumulated, capping peak near 1.5 GB
   even on 800 k-event jobs.

3. Geant4ePropagator's max-path-length cutoff had been tightened from
   10 m to 200 cm for V0 / mixed-particle debugging; reverting so long
   J/psi muon trajectories are no longer truncated at 200 cm.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reads ALCARECOTkAlJpsiMuMu directly (no candidate producer required;
relies on the legacy in-module pair loop when srcCandidates is unset).
Knobs (VarParsing 'analysis', all optional with sensible defaults):
  input        local path or root:// URL of one ALCARECO file
  nEvents      -1 = all
  fillJac      per-track Jacobians (default true)
  fillGrads    per-event gradient + packed Hessian (default false)
  doMassConstraint   apply J/psi mass constraint (default false)
  useIdealGeometry   default true; pass 0 to use real alignment
  goldenJson   optional path; applied at the source as lumisToProcess

Also installs an HLTHighLevel pre-filter using the same J/psi paths the
producer records decisions for, so events that fired none are skipped
before geopro/CVH.

Driven by calibration_studies/slurm/submit.sh via --config and
--cmsrun-args.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lumns

Replaces transportJacobianBzD (5x7) with transportJacobianBxByBzD (5x9)
in the Geant4e propagator, exposing partial derivatives wrt dBx and dBy
alongside the existing dBz. Generation now lives in the source tree;
chain-rule sites are re-indexed to the new layout. Bx/By chain-rule
terms in the residual-global-corrections makers are deferred until
MfsHarmonicEval (Phase A.0 of the absolute-field plan) provides
getBxBasisAt / getByBasisAt; comments at both sites flag the dependency.

Generator (TrackPropagation/Geant4e/python/gen_transport_jacobian.py)
  Pinned-SymPy script (1.14) re-derived from Bendavid scipy-MuCal commit
  3567123a (branch obsopt). The original Bz-only generator already had
  Bx, By symbolic throughout the helix-transport ansatz; the only change
  vs the reference is appending Bx, By to inparms. The unmodified Bz-only
  generator reproduces the body of the previous transportJacobianBzD
  byte-for-byte; the Bz column of the new 5x9 is symbolically identical
  to the previous 5x7 (rel diff 0.0).

Validation (TrackPropagation/Geant4e/test/transport_jacobian_fd.py)
  Standalone closure tests at solenoid-like field (Bz~3.8 T, Bx, By a
  few mT). Check 1: Bz-column equivalence with the prior 5x7 to machine
  precision. Check 2: dBx/dBy partial-derivative columns close to fixed-s
  FD at <1e-7 relative. Output well-conditioned: no 1/Bx, 1/By, no
  negative-exponent pow(B*).

Geant4ePropagator
  transportJacobianBxByBzD: 5x9 columns
  (qop0, lam0, phi0, xt0, yt0, dBx, dBy, dBz, dxi);
  res.middleCols(5, 3) *= kTeslaToInvGeV (was res.col(5)) so all three
  field columns carry per-Tesla units. propagateGenericWithJacobianAltD's
  tuple element 3 widens to 5x9; per-step accumulator uses rightCols<4>
  (was <2>), sweeping in dBx/dBy alongside dBz/dxi.

ResidualGlobalCorrectionMakerG4e and ...TwoTrackG4e
  FdFm widened to Matrix<5, 9>. Chain rule re-indexed:
    FdFm.col(5) * dBzPerMode -> FdFm.col(7) * dBzPerMode
    FdFm.col(6)              -> FdFm.col(8)               (xi column)
  FdFmrefarr (declared, unused) widened for type consistency.

scram b -j 4 builds TrackPropagation/Geant4e and Analysis/HitAnalyzer
clean; warnings shown are pre-existing in unrelated functions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Spherical-harmonic scalar-potential basis for the CMS tracker B-field,
ported from mfs/harmonic_basis.py. Published as a labelled
IdealMagneticFieldRecord (default label "ScalarPot3DMf"), mirroring
the PolyFit3D plumbing so CVH-side consumers can be redirected via
MagneticFieldLabel.

A.0 — basis evaluator:
  ScalarPot3DEval reads a Phase-A.5 dump file and evaluates per-mode
  Bz/Br/Bphi (plus Bx/By for Phase B) at any GlobalPoint. Plm
  recurrence with Condon-Shortley phase to match scipy.special.lpmv;
  optional Schmidt semi-normaliser sqrt((l-m)!/(l+m)!) at the scalar
  potential level when the dump's cmssw_norm flag is set.
  testScalarPot3DEval prints aggregate field at random points + per-
  mode at a canonical point; the Python comparison driver verifies
  element-wise agreement with mfs.harmonic_basis (max rel 3e-14
  per-mode at canonical, max 3e-7 T absolute on aggregate).

A.1 — labelled-record ESProducer:
  ScalarPot3DMagneticField subclasses MagneticField, owns a
  ScalarPot3DEval, returns evaluateAbsoluteAt inside its validity
  sphere (default 320 cm; A.2 will replace this hard cutoff with a
  C^1 cosine blend). New version="ScalarPot3D" branch in the
  ParametrizedMagneticFieldFactory; new
  parametrizedMagneticField_ScalarPot3D_cfi.py with InitFile and
  ValidityRadius parameters. testScalarPot3DField_cfg.py runtime
  smoke-test confirms inTesla(0,0,0) = (0, 0, 3.811 T) for the
  PolyFit3D-fit dump (= the dominant l=1 m=0 mode, consistent with
  the 3.8 T solenoid).

Consumer-side wiring:
  PhysicsTools/NanoAOD/python/nano_cff.py — extracted the existing
  3D-field rewire block into a setup3DFieldForRefit(process,
  useScalarPot3D=False, scalarPot3DInitFile='') helper; default
  behaviour unchanged.
  Analysis/HitAnalyzer/test/runCvhJpsi.py — added VarParsing knobs
  (useScalarPot3D, scalarPot3DInitFile) and inline field setup,
  independent of the NanoAOD helper (this driver only loads geopro,
  Geant4ePropagator, and globalCor — not the seven NanoAOD-specific
  consumers).

Closure on real ALCARECO data (TkAlJpsiMuMu, 491 evt processed):
  J/psi mass per-event mean Δ 4.4e-9 GeV, std 4.2e-7 GeV;
  J/psi pT 1.9e-7 ± 5.2e-6 GeV; muon kinematics match to ~1e-7 GeV.
  Numerically equivalent CVH refit to FP arithmetic noise.

The Mfs-named files referenced in earlier comments are now
ScalarPot3D throughout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hot path (evaluateAbsoluteAt during Geant4 stepping) was
computing two std::pow calls per mode and allocating three N-mode
vectors per query. Two changes drop microbench cost from 27.0 us to
3.60 us per call (l_max=18, 360 modes):

1. Per-call GeomCache: fillGeomCache() builds the trig table
   cos/sin(m*phi), the Plm table, and a new Rn_pow[k] = (R/r_scale)^k
   power table once per call (1 division + 18 multiplies). The
   per-mode loop then reads array entries instead of calling
   std::pow per (l,m). evaluateBasisAt and the new fast path
   share the helper.

2. Aggregate fast path in evaluateAbsoluteAt: instead of calling
   evaluateBasisAt and summing three 360-element vectors, the
   per-mode loop accumulates directly into three scalars (sumBz,
   sumBr, sumBphi). Pre-multiplied corparms[i]*schmidt_[i] saves a
   per-mode multiply and skips zero-coefficient modes early.
   Reciprocals (1/r_scale, 1/sin_t, 1/r) hoisted out of the loop.

Closure unchanged — element-wise agreement with mfs.harmonic_basis
still at FP precision per-mode (max rel 3e-14 at canonical) and
~1e-7 T absolute on the aggregate field.

End-to-end ALCARECO timing (Charmonium 2016F, 500 events, otherwise
identical to previous benchmark):

  PolyFit3D:          437 s wall  (0.87 s/event)
  ScalarPot3D pre-opt: 1567 s wall  (3.13 s/event)
  ScalarPot3D opt:    334 s wall  (0.67 s/event)

So ScalarPot3D went from 2.6x slower than PolyFit3D to 1.3x faster.
J/psi mass agreement on the new run is 4.9e-10 ± 3.2e-7 GeV
(FP-arithmetic noise; better than the pre-opt 4.4e-9 ± 4.2e-7).

Also: new benchScalarPot3DEval microbenchmark binary in test/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A.4 -- residual-correction makers switch from the legacy
HarmBasis3DCyl-backed ScalarPotentialFieldCorrection to a
ScalarPot3DEval-backed implementation, and consume an absolute-
field initial coefficient block from a Phase-A.5 dump file.

ScalarPotentialFieldCorrection (refactor):
  - Constructor signature changes from (lmaxFull, extras) to
    (dumpPath). The basis structure (l_max, mode list, Schmidt
    convention) is determined entirely by the loaded dump file.
  - nModes / appendParmsetEntries / resolveGlobalIndices /
    basisGlobalIdx unchanged in semantics. Each mode still keys
    into the global parmset via (parmtype=14, DetId(mode_idx)).
  - getBzBasisAt / getCorrectionAt / getJacobianAt now delegate
    to ScalarPot3DEval.
  - Adds getBxBasisAt / getByBasisAt -- previously absent on the
    legacy class; these unblock the Phase B Bx/By chain rule.
  - New initCoeffs() accessor exposes the dump-file coefficients
    so the residual-correction maker can seed corparms_ at the
    parmtype-14 indices.

ResidualGlobalCorrectionMakerBase:
  - cfi: scalarPotentialLmax / scalarPotentialExtra removed;
    scalarPotentialInitFile (string) added, mandatory non-empty.
  - corparms_ initialisation: zero-init, then seed parmtype-14
    entries from the dump's initCoeffs (absolute starting point);
    on subsequent corFiles, parmtype-14 entries are skipped (the
    delta-on-old-basis semantics from the legacy module-dBz block
    don't translate to the absolute-field paramset; deferred
    to "Out of scope" in the plan).

ResidualGlobalCorrectionMaker{,TwoTrack}G4e -- Phase B closeout:
  - dBxPerMode / dByPerMode populated from getBxBasisAt /
    getByBasisAt at the propagation-start position alongside the
    existing dBzPerMode.
  - Chain rule extended:
      dStateDparams.col(imode) = FdFm.col(5) * dBxPerMode[imode]
                               + FdFm.col(6) * dByPerMode[imode]
                               + FdFm.col(7) * dBzPerMode[imode];
    The structural 5x9 transportJacobianBxByBzD has been in place
    since c25d900; this commit fills in the dBx/dBy chain-rule
    contributions that were previously commented-out.

cfi files (PiPi/KPi/ProtonPi G4e_cfi.py) and test drivers
(RunGlobalCorRecJpsi{Alca,Data}.py, runCvhJpsi.py): cfi knob
renamed to scalarPotentialInitFile, runCvhJpsi.py now requires
scalarPot3DInitFile=<path> on the command line.

Closure on real ALCARECO data (Charmonium 2016F, 48 events
processed, identical inputs):
  - Jpsi_mass per-event mean d -2.0e-8, std 1.6e-6 GeV
  - Muon kinematics: ~1e-7 to 1e-6 std
  - Mass-constrained Jpsicons_mass: std 1.0e-4 GeV (~0.1 MeV --
    propagator-trajectory difference between the two field models
    amplified through the kinematic mass constraint)
  - chi^2 / EDM differ at 3e-5 relative -- the new 360-mode x
    3-component chain rule amplifies the small per-trajectory
    difference but stays well within physics-acceptable precision.

Also: new MagneticField/ParametrizedEngine/test/benchPolyFit3D.cpp
microbenchmark to give a like-for-like per-call comparison
against benchScalarPot3DEval. Numbers (l_max=18, 360 modes):
  PolyFit3D::inTesla            : 2.14 us/call
  ScalarPot3DEval::evaluateAbsoluteAt : 3.57 us/call

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cfi-gated debug feature in both ResidualGlobalCorrectionMakerG4e and
ResidualGlobalCorrectionMakerTwoTrackG4e. When runFDClosure=True, on
the first chain-rule site that has nFieldModes > 0, perturb dB at the
propagation start by eps_i * (dBxPerMode[i], dByPerMode[i],
dBzPerMode[i]) for the 10 modes with the largest basis amplitude at
that point. eps_i is scaled per mode so that eps_i * ||dB_basis_i||
equals epsilonFDClosure (target dB perturbation magnitude in Tesla;
default 1e-4).

For each test mode, re-propagate from the same input state and finite-
difference the basis-invariant curvilinear components (qop, lambda,
phi). These can be derived directly from the global-cartesian 7-vector
output (px, py, pz, q) without knowing the surface's local frame; the
xt, yt curvilinear components require a surface-dependent transform
that is skipped here. Compares against dStateDparams.col(imode).head<3>(),
which is the analytic chain rule
  FdFm.col(5)*dBxPerMode + FdFm.col(6)*dByPerMode + FdFm.col(7)*dBzPerMode
restricted to the (qop, lambda, phi) rows.

Closure result on real ALCARECO data (TwoTrackG4e, dB_target=1e-3 T,
top-10 modes by basis amplitude):

  qop:     0.0e+00  for all modes -- qop is conserved under dB
                    perturbation in helix transport, so both FD and
                    analytic land at FP noise and the comparison is
                    suppressed by an absolute floor of 1e-10.
  lambda:  1e-6 to 1e-7 relative across all modes -- chain rule
                    validated tightly. The lambda response of the
                    propagator is unusually well-conditioned.
  phi:     1e-3 to 5e-2 relative -- forward-FD truncation O(eps)
                    at the chosen target dB; smaller eps would
                    tighten phi but risk qop FP noise.

Worst rel = 4.6e-2 (mode 25 phi); within forward-FD precision.
Validates the FdFm.col(5)*dBx + FdFm.col(6)*dBy + FdFm.col(7)*dBz
chain rule that landed in 5e8cf89 (Phase B closeout).

Implementation:
- runFDClosure_/epsilonFDClosure_/didFDClosure_ as members on
  ResidualGlobalCorrectionMakerBase. didFDClosure_ is mutable so the
  per-job one-shot can be set inside analyze() without losing const-
  ness.
- both makers save propInputState before propagateGenericWithJacobianAltD
  so the FD block can re-call from the same point.
- runCvhJpsi.py exposes runFDClosure / epsilonFDClosure VarParsing knobs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…kers

Build fixes (residual API drift surfaced by the Phase 2 cherry-picks):
- DstToD0PiCandidateProducer: edm::EDProducer (removed in 15_0) ->
  edm::stream::EDProducer<>; migrate the two TransientTrackBuilder
  ESHandle lookups to ESGetToken.
- ResidualGlobalCorrectionMakerBase.h: declare std::string fieldlabel_
  member that the dev-branch CVH commits already assign in the cc.
- ResidualGlobalCorrectionMakerTwoTrackG4e: migrate the L1GtTriggerMenu
  consume to ESGetToken.
- Analysis/HitAnalyzer/plugins/BuildFile.xml: add CondFormats/L1TObjects
  + CondFormats/DataRecord deps for the L1GtTriggerMenu link symbols.

PolyFit3D additions not in the cms-sw release tag, dropped:
- MagneticField/ParametrizedEngine/src/PolyFit3DParametrizedMagneticField.{cc,h}
  were re-added by Bruschini's CVH port; the earlier revert (b9d88cb)
  only restored the factory + cache-sentinel state.
- MagneticField/ParametrizedEngine/test/benchPolyFit3D.cpp + BuildFile.xml
  entry were added by the A.4 / Phase-B finalisation cherry-pick.

Preserved (in upstream cms-sw CMSSW_15_0_19_patch2):
- MagneticField/ParametrizedEngine/python/parametrizedMagneticField_PolyFit3D_cfi.py
- The two `PolyFit3D is not supported anymore` factory branches in
  ParametrizedMagneticFieldFactory.cc

Session-plan comment scrub:
- Replace "Phase-A.0/.1/.4/.5", "Phase B/B.5" mentions and
  "replicated-bouncing-cloud(.md)" references in code comments with
  substantive descriptions of what the code actually does. These were
  internal porting-plan labels, not part of the published WMass plan.

Test-driver follow-up: RunGlobalCorRecJpsiData.py and runCvhJpsi.py no
longer import the dropped wrapper class; they error out cleanly if a
3D-fieldmap mode is requested and direct the user to the ScalarPot3D
path instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes two runtime crashes surfaced when running runCvhJpsi.py
end-to-end on ALCARECO J/psi data:

- Add globalGeometryEventToken_ + trackerTopologyEventToken_ on
  ResidualGlobalCorrectionMakerBase. Derived classes' produce() now
  use Event-transition tokens; the BeginRun-scoped tokens remain
  for beginRun(). Fixes the ESGetTokenWrongTransition exception at
  the first event.

- Set process.Geant4ePropagator.ForCVH = True in runCvhJpsi.py so
  the propagator ctor instantiates fluct (G4UniversalFluctuationFor-
  Extrapolator) + msmodel (G4WentzelVIModelForCVH) and routes their
  table pointers via SetParticleAndCharge. Fixes a null-pointer
  segfault in computeErrorIoni.

A 2-event ALCARECO smoke run now completes cleanly (146 propagation
calls, 1 expected ierr=3 boundary failure, runtree filled).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@davidwalter2 davidwalter2 force-pushed the WmassNanoProd_15_0_19_patch2_dev branch from c2b4ed0 to 6b568d2 Compare May 14, 2026 15:31
davidwalter2 and others added 6 commits May 14, 2026 22:38
…r/worker

Ports the SimG4Core/Application master/worker pattern, slimmed to what the
CVH G4Error flow needs, so `numberOfThreads=N` runs of the CVH residual makers
complete cleanly instead of aborting with "Found that existing tracking
Navigator has NULL world" on the first event.

New scaffolding in TrackPropagation/Geant4e/:
- CvhMaster: G4MTRunManagerKernel + DDDWorld + master magnetic field +
  G4ErrorPhysicsListForCVH, brought through PreInit -> Init -> Idle so the
  master's per-particle ProcessManagers have the G4Error processes attached
  (workers copy them via G4WorkerThread::BuildGeometryAndPhysicsVector).
- CvhMasterThread: dedicated G4 master thread (clone of OscarMTMasterThread,
  state-loop + ESGetToken pattern), held by the residual-maker GlobalCache.
- CvhWorker: per-stream per-thread G4 setup. Lazy first-call (mutex-guarded
  thread_local flag) does BuildGeometryAndPhysicsVector +
  G4WorkerRunManagerKernel + WorkerDefineWorldVolume + per-thread
  CMSFieldManager via sim::FieldBuilder + share physics list via
  InitializeWorker -> SetPhysics -> InitializePhysics -> RunInitialization,
  then sets G4ErrorPropagatorData to G4ErrorState_Init so the propagator
  skips its own (now duplicate) physics registration.

G4ErrorPhysicsListForCVH made MT-idempotent: a thread_local guard at the
top of ConstructProcess prevents the worker-kernel InitializePhysics from
double-attaching G4Transportation / eLoss / step-length / B-field-limit
processes (the "G4ProcessVector inconsistent" abort that surfaced when
master + worker both ran the construction path).

ResidualGlobalCorrectionMakerBase migrated to
edm::stream::EDProducer<edm::GlobalCache<CvhMasterThread>, edm::RunCache<int>>:
- static initializeGlobalCache spawns the dedicated G4 master thread once
  per job, before any TBB worker exists.
- globalBeginRun / globalEndRun forward to the master thread's state loop
  (build the world / field on the master, share with workers).
- globalEndJob joins the master thread.
- Per-stream CvhWorker member owns the per-thread G4 setup; produce()
  calls worker_->ensureInitialized(globalCache()->cvhMaster()) before any
  propagation so the navigator/world/field are live on the TBB worker.
- ResidualGlobalCorrectionMaker{G4e,TwoTrackG4e} ctor signature now takes
  (const ParameterSet&, const CvhMasterThread*).

Geant4ePropagator: defer fluct allocation from the (deep-copy) ctor to the
first propagate() call on each thread, AFTER CvhWorker has installed the
world. Without the deferral, clone() in the residual-maker's per-stream
template-copy aborted in G4UniversalFluctuationForExtrapolator's
G4WentzelVIModel::Initialise on a worker thread with no navigator.

Memory optimisation (#1 from the post-port review):
G4UniversalFluctuationForExtrapolator::tables made process-wide static
(mutex-guarded lazy init), mirroring the existing pattern in
G4EnergyLossForExtrapolatorForCVH::tables. The ionOnly dE/dx + range +
inv-range material tables are read-only after construction and identical
for every instance, so per-stream propagator clones now pay only the
~200 B per-instance state (rndmarray + model params) instead of
re-allocating the full table set.

Test config (runCvhJpsi.py): drops geopro from the path (CvhMaster owns
the world / field now), wires the CvhMaster PSet into the residual maker
(via cvhMaster_cfi.CvhMasterPSet), adds numberOfThreads VarParsing knob.

Closure on 48-event ALCARECO J/psi smoke at N=1, 2, 4, 8, 16: all match
exactly (24701 propagation calls / 175 failures = 0.71% each).

Scaling sweep (500 events / 16 cores) when paired with a 50-mode
"custom50" scalar-potential dump (lphi5-base + l=6,m=1, see
mfs/CLAUDE.md) for the per-event Hessian workspace:
  N=1  : 1:11 / 1.28 GB
  N=2  : 0:54 / 1.30 GB
  N=4  : 0:39 / 1.53 GB
  N=8  : 0:37 / 1.74 GB
  N=16 : 0:32 / 2.27 GB

Down from 7:44 / 3.75 GB (N=1) and 1:34 / 28.4 GB (N=16) with the original
360-mode lmax=18 dump pre-port; the 50-mode dump just exposed the
remaining per-event MatrixXd hessfull (nparmsfull^2) as the dominant cost
and trades it for a basis that is in fact better for the tracker region.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ConstructParticle() now iterates a caller-supplied G4 particle-name list
(CvhMaster.Particles cms.vstring) instead of hardcoding the full set.

NOT a memory optimisation. A scaling sweep (500 evt, 50-mode dump,
N=1..16) of muons-only vs the full 11-particle list shows RSS deltas of
-23..+73 MB -- bidirectional, within run-to-run noise. The per-particle
cost is just a G4ProcessManager + lightweight process wrappers; the
memory-heavy dE/dx tables are already process-wide static (prior commit)
and are referenced unconditionally, so dropping pi/K/anti_proton frees
nothing. The real memory wins were already banked: shared fluct tables
(~300 MB/thread) + 50-mode dump (hessfull 1.3 GB -> 36 MB/event), which
took N=16 from 28.4 GB to 2.3 GB.

What this change is for:
- Footgun removal: an unknown particle name aborts at master-thread
  initG4 with a descriptive G4Exception ("UnknownParticle") listing the
  valid names, instead of a silent NULL G4ParticleDefinition from
  FindParticle and garbage propagation later.
- Explicit intent: each job declares the particles it propagates.
- Marginally faster physics-list init (fewer particle definitions).

API:
- New ctor G4ErrorPhysicsListForCVH(const std::vector<std::string>&).
  No-arg ctor delegates to it with the full canonical list (back-compat
  for any direct caller).
- Routed cvhMaster_cfi.CvhMasterPSet.Particles -> CvhMaster ctor reads
  "Particles" -> CvhMaster::initG4 builds the physics list with it.
- runCvhJpsi.py narrows to ("mu+","mu-").

Mandatory always-on set {gamma, e+, e-, mu+, mu-, proton} is registered
regardless of the list:
- gamma/e+/e-: G4PhysicsListHelper::CheckParticleList aborts otherwise
  in G4 11+ (Run0101 "Missing EM basic particle").
- mu+/mu-/proton: referenced by G4TablesForExtrapolatorForCVH dE/dx /
  range table generation; without pre-registering them on master the
  first muons-only sweep crashed at N=16 with Run0271 / PART10116
  ("ProcessManager set without proper TLS init"). Widening the mandatory
  set fixed it -- N=16 now completes; this is why the achievable
  particle reduction (and thus any memory effect) is small by
  construction.

Closure (48-evt ALCARECO J/psi smoke, muons-only): N=1/2/4 all
2401 calls / 18 failures = 0.75%. Scaling sweep N=1..16: 24701 calls /
175 failures identical at every thread count; N=16 no longer aborts.
Bad-name guard verified (Particles=("mu+","muon-") -> clean fatal with
recognised-names list).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
functions.cpp was entirely commented-out; functions.h held only an
include guard plus commented declarations and was included (no symbols
used) by ResidualGlobalCorrectionMakerBase.{cc,h}. script.txt was a
stray SymPy matrix dump. All carried in via the upstream cherry-pick
replay; none are referenced or built. Drop the files and the now-unused
includes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
runCvhSingleTrack.py drives ResidualGlobalCorrectionMakerG4e (no
kinematic-vertex fit / mass constraint) for single-track CVH
validation and a lighter cross-check of the propagator/residual chain.
runCvhJpsi.py gains an optional useOpera3D switch to use the full 3D
TOSCA volumetric grid (160812) as the baseline field instead of
ScalarPot3D, for B-field cross-checks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mory)

Ports ResidualGlobalCorrectionMakerTwoTrackG4e (J/psi two-track CVH
refit) from the dense hessfull = MatrixXd::Zero(nparmsfull, nparmsfull)
(~1.35 GB/event at 360 field modes) to the single-track maker's sparse
GBL design-matrix formulation: rfull/Ffull/Jfull/Vinvfull + Fsparse +
SimplicialLDLT<SparseMatrix>. Mathematically identical to the dense
path -- dxfull = -(2 Fs^T Vinv Fs)^-1 (2 Fs^T Vinv r) collapses to
dxfree = -(Fs^T Vinv Fs)^-1 Fs^T Vinv r (the factor of 2 cancels).

What changed:
- Sparse member decls; freestateidxs/nstatefree sizing replacing the
  dense freezeparm() 1e6-deweighting (fitFromGenParms_ -> vtx 0..9,
  doVtxConstraint_ -> idx 6, fitFromSimParms_ -> all excluded from
  the solve).
- ncons = 5*nhits + 2*nvalid + 6*bs + pointing + (icons==1 mass).
  The two-track maker uses a uniform 2-row Fhit/dy0 per valid hit
  (strips down-weighted via Vinv), so 2*nvalid, not the single-track
  nvalid+nvalidpixel.
- All 5 fill sites redirected from dense prophess/hithess/hesslocal
  accumulation to rfull/Ffull/Jfull/Vinvfull row writes: prop
  (ihit==0 vtx+state, ihit>0 state), measurement (alignment),
  beamspot, pointing, J/psi mass.
- Iterative solve -> sparse Fsparse/VinvF/Cinvd; dxfree scattered to
  dxfull(freestateidxs); deltachisq = rfull^T VinvF dxfree.
- covstate -> Cinvd.solve(I) scattered free->full (no factor 2;
  dense 2*(2Fs^TVinvFs)^-1 == (Fs^TVinvFs)^-1).
- Final reduction -> Jfinal column-gather + Rsparse; grad/hess via
  2 Jsparse^T Rr / 2 Jsparse^T R Jsparse; dxdparms via
  -Cinvd.solve(VinvF^T Jsparse).transpose(). Removes the O(npars^2)
  dense reduction (~164M iters/track at 360 modes). Muplus/Muminus
  _jacRef + Jpsi_jacMass emitters unchanged.
- #include <Eigen/Sparse>; removed dead freezeparm lambda; added a
  cheap permanent irow==ncons row-alignment assert.

Validation (48-evt 15_0-native ALCARECO; vs dense reference checkout
/work/submit/david_w/ZMass/CMSSW_15_0_19_patch2):
- Build + run clean; 0 propagation failures.
- Physics: Jpsikin_mass BIT-EXACT vs dense; kinematics at FP-noise;
  globalidxv ordering identical 47/47.
- jacRef closes to ~1e-6 of scale (jacRef is stored float, ~1e-7
  intrinsic) whenever the iteration converges to the same stopping
  point. Subset maxabs/scale: niter+edmval match (26 entries) ->
  Muplus 1.4e-6, Muminus 1.2e-6, Jpsi_jacMass 1.9e-6. The ~1e-3 on
  niter-divergent tracks is convergence-path, not a bug: the break
  is the hard edmvalref<1e-5 cut (edmvalref = 0.5 dxRef^T covref^-1
  dxRef over the 10 physical params); dense LDLT vs sparse
  SimplicialLDLT (AMD reordering) differ at ~1e-12/iter, so for
  tracks landing near the cut on the deciding step the integer
  niter flips by 1-2. edmval is bit-identical when niter matches.
  Intrinsic to any reformulation of an iterative solver; within the
  fit's own reproducibility envelope.
- Memory (360-mode dump, 48-evt N=1): peak RSS 1.12 GB vs the old
  dense 360-mode ~3.7 GB (~3.3x), below even the 50-mode dense
  workaround (1.28 GB). The per-event 1.35 GB MatrixXd hessfull and
  the O(npars^2) reduction are gone.
- MT correctness (360-mode, N=1/2/4/8): calls=12167 / 0 failures and
  tree=47 / Jpsikin_mass mean=3.06603227 rms=0.06699844 bit-identical
  across every thread count. RSS 1.12->1.55 GB N=1->8 (dense 360-mode
  would have been ~3.7 GB -> 15+ GB).

Net: full-resolution 360-mode field calibration is now feasible at
~1.1 GB; the 50-mode dump workaround is no longer required for memory
(still usable for speed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants