I see a couple of issues with the way MLflow logging is set up. They don't seem to occur with the provided examples, but they do with the example set up in the slac-fel branch (maybe due to the higher number of global steps and inner CL loops?).
In that branch I committed a fix that was required to see drift.detected on MLflow/WandB at all: logging the values as ints instead of bools.
It's also possible that drift.detected should use drift.step (the per-stage step) as its x-axis rather than the global step; that way the chart shows consistent progression regardless of how many eval/CL steps happen in between. I did not implement that because I am not sure it resolves the main issue described below.
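For context, the int-vs-bool fix amounts to something like the sketch below. The helper name and the indirection through `log_metric_fn` are mine, just to keep the snippet self-contained; the real call would go straight to `mlflow.log_metric` (or `wandb.log`):

```python
# Sketch of the fix from the slac-fel branch: cast the bool to int
# before handing it to the tracking backend, so the metric shows up
# as an explicit 0/1 series instead of being dropped or coerced.
# `log_metric_fn` stands in for mlflow.log_metric here.

def log_drift_detected(log_metric_fn, detected: bool, step: int) -> None:
    # bool is a subclass of int in Python, but the explicit int()
    # guarantees the logged value is a plain 0 or 1
    log_metric_fn("drift.detected", int(detected), step=step)

# With MLflow this would be called as (not executed here):
#   import mlflow
#   log_drift_detected(mlflow.log_metric, detected, step=step)
```

The same helper could take drift.step instead of the global step if we decide to switch the x-axis.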
Main issue
During a run, the dashboard does not show all the expected drift.detected events (there should be 4):
Moreover, after the run ends, MLflow seems to remove some rows from the data, and since there are very few 1s compared to 0s, we end up with a flat 0 line:
When exporting the CSV data for drift.detected in this example, we expected a 1 around step 4500, but it has been removed.
I haven't had the time to create a simple reproducer to put here. Perhaps I or @anagainaru can share the data with whoever is assigned to this (I am also going to upload it to Perlmutter tomorrow).
FWIW, the saved CSV looks fine (4 drifts detected); attached here.
slac-fel.csv
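For reference, this is roughly how I counted the drift events in the exported CSV. The column names here are an assumption, and the sample data is mocked up rather than taken from the attachment:

```python
import csv
import io

# Mocked-up sample standing in for the exported slac-fel.csv;
# the real file's columns are assumed to be "step" and "value".
csv_text = """step,value
1000,0
2000,0
3000,1
4500,1
6000,1
7500,1
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Sum the drift.detected values; the saved CSV should give 4.
n_drifts = sum(int(row["value"]) for row in rows)
print(n_drifts)  # 4 on this sample
```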