Updates to plotting scripts for CMS analysis#116
Conversation
…LUM into devel-sbi-rs-hepgpu2
…LUM into devel-sbi-rs-hepgpu2
There was a problem hiding this comment.
Pull request overview
This PR updates several CMS analysis plotting utilities by enabling additional systematic variations, extending configurable process/feature handling for prefit plots, and enhancing postfit/PDF plotting outputs (including new data-fit handling in the postfit binned plotting script).
Changes:
- Enable detailed JES component variations (including era-split regrouped sources) in the variations registry.
- Improve CLI/config behavior for prefit plotting (process list defaults, clearer
--featureshelp, Asimov handling fix). - Adjust postfit/PDF plotting scripts (layout/saving behavior, axis ranges, output formats, and adding an attempt to build data histograms from config for data fits).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
user/robert/variations.py |
Enables many JES component variations (but currently leaves grouping inconsistent). |
user/robert/prefit_sys_binned.py |
Extends default processes, fixes doAsimov typo usage, improves CLI help/override behavior. |
plot/postfitsys/compare_fits.py |
Updates plot limits and restructures script entrypoint; changes layout/save calls. |
plot/postfitsys/compare_correlations.py |
Changes layout/save calls for comparison/difference heatmaps. |
plot/postfitsys/binned_fromfit.py |
Adds data-fit handling via sample factory + data histogramming; adjusts ratio plot labeling/range logic. |
plot/pdf/pdf_eigenmode_plots.py |
Tweaks labels/styling and adds SVG outputs for eigenmode plots. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 'jes_EC2' : {'sys_up': 'CMS_scale_j_Regrouped_EC2_up', 'sys_down': 'CMS_scale_j_Regrouped_EC2_down'}, | ||
| 'jes_HF' : {'sys_up': 'CMS_scale_j_Regrouped_HF_up', 'sys_down': 'CMS_scale_j_Regrouped_HF_down'}, | ||
| 'jes_relbal' : {'sys_up': 'CMS_scale_j_Regrouped_RelativeBal_up', 'sys_down': 'CMS_scale_j_Regrouped_RelativeBal_down'}, | ||
|
|
There was a problem hiding this comment.
The JES total variations (jes_total_EtaBin0..5) were removed from variations, but syst_groups['JESTOTAL'] still lists them. Any code that iterates syst_groups and expects these variation keys to exist will now fail (e.g. KeyError / missing-variation behavior). Either re-add the jes_total variations or update syst_groups to reference the new JES components you enabled here.
| # Compatibility aliases kept so legacy syst_groups entries such as | |
| # syst_groups['JESTOTAL'] can still resolve these variation keys. | |
| 'jes_total_EtaBin0': {}, | |
| 'jes_total_EtaBin1': {}, | |
| 'jes_total_EtaBin2': {}, | |
| 'jes_total_EtaBin3': {}, | |
| 'jes_total_EtaBin4': {}, | |
| 'jes_total_EtaBin5': {}, |
| 'jes_total_EtaBin5' : {'sys_up': 'CMS_scale_j_Total_EtaBin5_up', 'sys_down': 'CMS_scale_j_Total_EtaBin5_down'}, | ||
| 'jes_b' : {'sys_up': 'CMS_scale_j_FlavorPureBottom_up', 'sys_down': 'CMS_scale_j_FlavorPureBottom_down'}, | ||
| 'jes_c' : {'sys_up': 'CMS_scale_j_FlavorPureCharm_up', 'sys_down': 'CMS_scale_j_FlavorPureCharm_down'}, | ||
| 'jeq_g' : {'sys_up': 'CMS_scale_j_FlavorPureGluon_up', 'sys_down': 'CMS_scale_j_FlavorPureGluon_down'}, |
There was a problem hiding this comment.
The variation key jeq_g looks inconsistent with the surrounding JES flavor keys (jes_b, jes_c, jes_q) and likely is a typo. If downstream tooling expects a jes_g (or similar) naming pattern, this will prevent selecting/grouping the gluon component correctly; consider renaming the key to match the established jes_* prefix.
| 'jeq_g' : {'sys_up': 'CMS_scale_j_FlavorPureGluon_up', 'sys_down': 'CMS_scale_j_FlavorPureGluon_down'}, | |
| 'jes_g' : {'sys_up': 'CMS_scale_j_FlavorPureGluon_up', 'sys_down': 'CMS_scale_j_FlavorPureGluon_down'}, |
| is_data_fit = ("data" in args.fit_result) | ||
|
|
||
| # grabbing sample factory to create data histograms downstream | ||
| # may need to move this further down to use pieces of code | ||
| # that get things that I need | ||
| if is_data_fit: | ||
| from common.yaml_loader import _resolve_features_list |
There was a problem hiding this comment.
is_data_fit = ("data" in args.fit_result) infers fit type from the fit-result filename/path. This is brittle (e.g. directories containing “data”, or a data fit file not containing that substring). Prefer an explicit CLI flag (e.g. --is-data-fit) or infer from the fit JSON/config (e.g. presence of region['data'] / a fit arg stored in the JSON).
| ) | ||
|
|
||
| for ib, value in enumerate(data_unrolled, start=1): | ||
| print(f"data histogram: {ib=}, {value=}; {h_total.GetBinContent(ib)=}") |
There was a problem hiding this comment.
This per-bin print inside the data histogram fill loop will produce very large logs for typical bin counts and regions, which is problematic for batch jobs. Please remove it or guard it behind a debug/verbose option.
| print(f"data histogram: {ib=}, {value=}; {h_total.GetBinContent(ib)=}") |
| nom = h_total.GetBinContent(ib) | ||
| y = h_data.GetBinContent(ib) | ||
| ey = h_data.GetBinError(ib) # stat-only = sqrt(y) | ||
| ey = h_data.GetBinError(ib) # stat-only = sqrt(y) \ |
There was a problem hiding this comment.
The inline comment here ends with a stray backslash and extra spaces. This looks accidental and can be confusing when scanning the code; please remove the backslash/trailing whitespace.
| ey = h_data.GetBinError(ib) # stat-only = sqrt(y) \ | |
| ey = h_data.GetBinError(ib) # stat-only = sqrt(y) |
| plt.tight_layout() | ||
|
|
||
| os.makedirs(output_dir, exist_ok=True) | ||
| fig.savefig(os.path.join(output_dir, "correlation_comparison.png"), dpi=150, bbox_inches="tight") | ||
| fig.savefig(os.path.join(output_dir, "correlation_comparison.pdf"), bbox_inches="tight") | ||
| plt.savefig(os.path.join(output_dir, "correlation_comparison.png"), dpi=150, bbox_inches="tight") | ||
| plt.savefig(os.path.join(output_dir, "correlation_comparison.pdf"), bbox_inches="tight") | ||
| plt.close(fig) |
There was a problem hiding this comment.
Same as above: using plt.tight_layout() / plt.savefig() here depends on the global current figure. Prefer fig.tight_layout() and fig.savefig(...) to avoid saving/layouting the wrong figure if this code is reused in a context with multiple active figures.
| print(f"{default_binning=}") | ||
| print(f"{var_name=}, {var_edges_default=}") |
There was a problem hiding this comment.
The debug print(f"{default_binning=}") / print(f"{var_name=}, ...") will add noisy output for every run. Consider removing these or gating them behind a verbosity/debug flag (or a logger) to keep script output stable for batch usage.
| h_ratio_down.SetBinContent(ib, 1.0) | ||
|
|
||
| if is_data_fit: | ||
| max_dev = max_dev_data |
There was a problem hiding this comment.
When is_data_fit is true, max_dev is overwritten with max_dev_data, discarding the uncertainty-band driven max_dev computed above. This can clip the MC uncertainty band (and the plot range) if the postfit uncertainty is larger than the data statistical error. Use max_dev = max(max_dev, max_dev_data) instead of replacing it.
| max_dev = max_dev_data | |
| max_dev = max(max_dev, max_dev_data) |
| plt.tight_layout() | ||
|
|
||
| os.makedirs(output_dir, exist_ok=True) | ||
| fig.savefig(os.path.join(output_dir, "correlation_difference.png"), dpi=150, bbox_inches="tight") | ||
| fig.savefig(os.path.join(output_dir, "correlation_difference.pdf"), bbox_inches="tight") | ||
| plt.savefig(os.path.join(output_dir, "correlation_difference.png"), dpi=150, bbox_inches="tight") | ||
| plt.savefig(os.path.join(output_dir, "correlation_difference.pdf"), bbox_inches="tight") | ||
| plt.close(fig) |
There was a problem hiding this comment.
Same global-state issue here: use fig.tight_layout() and fig.savefig(...) rather than the plt.* variants to make sure the intended figure is laid out/saved, especially if these utilities are imported and used programmatically.
| ax_top.semilogx(x_vals, pdf_variant, color=color, linestyle=linestyle, linewidth=1.5, label=f'EV({mode_idx})') | ||
|
|
||
| mask = pdf_central > 1e-15 | ||
| ratio = np.ones_like(pdf_central) | ||
| ratio[mask] = pdf_variant[mask] / pdf_central[mask] | ||
| ax_bot.semilogx(x_vals, ratio, color=color, linestyle=linestyle, linewidth=1.5, label=f'EV({mode_idx+1})') | ||
| ax_bot.semilogx(x_vals, ratio, color=color, linestyle=linestyle, linewidth=1.5, label=f'EV({mode_idx})') | ||
| ax_bot.set_ylim(-1.0,4.0) |
There was a problem hiding this comment.
The legend labels now use EV({mode_idx}), which will display EV(0) for the first mode (since mode_idx is 0-based). If the plots are meant for presentation/publication, consider labeling as EV({mode_idx+1}) (and keep CLI indices 0-based if desired) or otherwise clarify the indexing convention to avoid confusion.
…alysisSW/GOLLUM into dev-rbarrue_plot_updates
da291d1 to
72b7729
Compare
|
Closing, as it is superseded by #120, which is equal to this but with a fixed commit history. |
Fixing a bunch of issues with plotting scripts + implementing new functionality.
Commit messages are self-explanatory.