Problem
There are two types of metrics in ML projects: scalar metrics, which are single numbers (like AUC), and continuous metrics, which are sequences of numbers (like an ROC curve, stored as an array of numbers).
Today we support csv and json files as metrics without specifying what kind of metrics they contain. In fact, DVC commands support only scalars: dvc metrics diff.
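The distinction between the two kinds of metrics can be illustrated with a minimal sketch. The function name `metric_kind` and the sample file contents (including the `fpr`/`tpr` keys) are assumptions made for illustration; none of this is existing DVC code.

```python
import json


def metric_kind(value):
    """Classify a parsed metric value as 'scalar', 'continuous', or 'unknown'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "scalar"
    # A non-empty list is treated as a sequence of numbers or points.
    if isinstance(value, list) and value:
        return "continuous"
    return "unknown"


# Hypothetical file contents, not taken from a real project.
scalar_doc = json.loads('{"AUC": 0.92, "error_1": 0.13}')
roc_doc = json.loads(
    '{"roc": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 0.2, "tpr": 0.7}, {"fpr": 1.0, "tpr": 1.0}]}'
)

kinds = {
    name: metric_kind(value)
    for doc in (scalar_doc, roc_doc)
    for name, value in doc.items()
}
print(kinds)
```

Only the values classified as scalar are meaningful inputs for a numeric diff; the continuous ones need a plot.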
Solution
It is important to support both, understand the semantics of the metrics and provide more value.
For scalar metrics, DVC should provide (and already provides) the dvc metrics diff functionality. We just need to clearly describe the file format in the documentation to make sure only scalar metrics are supported. JSON and INI are both good formats for scalar metrics.
For continuous metrics, DVC should provide plot/graph visualization functionality, like dvc viz show roc.json and dvc viz diff roc.json HEAD HEAD^^. Both commands should generate graphs. The second command should generate two graphs (not a diff).
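One way to read "two graphs (not diff)" is that the series from both revisions are drawn as they are, rather than subtracted from each other. The sketch below only prepares the data for that: it tags every point with the revision it came from so that any plotting backend can draw one curve per revision. The helper name `tag_by_revision`, the `rev` key, and the sample points are hypothetical.

```python
def tag_by_revision(series_by_rev):
    """Flatten {revision: [point, ...]} into one list, adding a 'rev' key to each point."""
    rows = []
    for rev, points in series_by_rev.items():
        for point in points:
            # Copy the point so the input data is left untouched.
            rows.append({**point, "rev": rev})
    return rows


# Hypothetical ROC points for the two revisions named in the example command.
roc_by_rev = {
    "HEAD": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 0.2, "tpr": 0.8}, {"fpr": 1.0, "tpr": 1.0}],
    "HEAD^^": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 0.2, "tpr": 0.6}, {"fpr": 1.0, "tpr": 1.0}],
}

rows = tag_by_revision(roc_by_rev)
print(len(rows), sorted({row["rev"] for row in rows}))
```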
Types
Types might improve visualization a lot. There are many possible types of graphs for dvc viz. It would be great to support a few common ones, like a regular plot and a confusion matrix.
For scalar metrics, it is also important to specify a type. A scalar can be the result of minimizing or maximizing some function. If the user can specify this information, it can help to show the proper color: red if a metric that should be maximized has decreased, and green if it has increased.
We should not make the metrics file format complicated. So, it might be a better solution to specify the metrics format in a separate file (like a Dvcfile or a special metrics file).
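The coloring rule can be sketched as a small function. The name `diff_color` and the choice to return `None` for an unchanged value are assumptions; the `max`/`min` values follow the proposal.

```python
def diff_color(goal, old, new):
    """Return 'green' for an improvement, 'red' for a regression, None if unchanged.

    goal is 'max' (higher is better) or 'min' (lower is better).
    """
    if goal not in ("max", "min"):
        raise ValueError(f"unknown goal: {goal!r}")
    if new == old:
        return None
    improved = new > old if goal == "max" else new < old
    return "green" if improved else "red"


print(diff_color("max", 0.90, 0.93))  # AUC went up
print(diff_color("min", 0.10, 0.15))  # error went up
```

Without the type, the sign of the change alone cannot tell whether the new value is better or worse.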
- scalars_types:
    AUC: max
    error_1: min
- metrics:
    roc.json: plot
    error.csv: plot
    mx.json: confusion_matrix
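A minimal sketch of how such a file could be consumed, assuming it has already been parsed into Python objects (a list of one-key mappings, which is how the YAML above would load). The function `load_metric_types` and the sets of accepted values are hypothetical and cover only the types named in this proposal.

```python
KNOWN_SCALAR_GOALS = {"max", "min"}
KNOWN_PLOT_TYPES = {"plot", "confusion_matrix"}

# The proposed config as it would look after YAML parsing.
parsed = [
    {"scalars_types": {"AUC": "max", "error_1": "min"}},
    {"metrics": {"roc.json": "plot", "error.csv": "plot", "mx.json": "confusion_matrix"}},
]


def load_metric_types(sections):
    """Merge the config sections and validate every declared type."""
    merged = {}
    for section in sections:
        merged.update(section)
    scalars = merged.get("scalars_types", {})
    plots = merged.get("metrics", {})
    for name, goal in scalars.items():
        if goal not in KNOWN_SCALAR_GOALS:
            raise ValueError(f"{name}: unknown scalar type {goal!r}")
    for path, plot_type in plots.items():
        if plot_type not in KNOWN_PLOT_TYPES:
            raise ValueError(f"{path}: unknown plot type {plot_type!r}")
    return scalars, plots


scalars, plots = load_metric_types(parsed)
print(scalars)
print(plots)
```

Keeping the types in a separate file means the metrics files themselves stay plain JSON or CSV.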
Open questions
Should we include dvc viz in the core dvc package? The problem is that it will most likely bring in dependencies on graphics libraries, which can cause installation issues on OSes without graphics (like default EC2 instances). What options do we have?
How should we specify the format of the graphs? Do we need any customization? For example, mlflow supports only one type of graph and cannot show anything more advanced (like a confusion matrix). If we choose Vega-graph as the format, we can support graph customization.
The metrics can be related to experiment parameters (see Introduce hyper parameters and config #3393) because we are introducing new DVC file types and can use a single DVC file to preserve some information about metrics or metrics types.
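To make the Vega option from the questions above concrete, here is a minimal sketch that builds a Vega-Lite specification for an ROC curve as plain JSON. It needs only the standard library, so it adds no graphics dependency; rendering is left to whatever opens the spec. The function name `roc_vega_lite_spec`, the `fpr`/`tpr` field names, and the sample points are assumptions, not an agreed format.

```python
import json


def roc_vega_lite_spec(points, title="ROC curve"):
    """Build a Vega-Lite line-chart spec from a list of {'fpr': ..., 'tpr': ...} points."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": points},
        "mark": "line",
        "encoding": {
            "x": {"field": "fpr", "type": "quantitative", "title": "False positive rate"},
            "y": {"field": "tpr", "type": "quantitative", "title": "True positive rate"},
        },
    }


# Hypothetical points, as they might be stored in roc.json.
points = [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 0.2, "tpr": 0.7}, {"fpr": 1.0, "tpr": 1.0}]
spec = roc_vega_lite_spec(points)
print(json.dumps(spec, indent=2))
```

Because the spec is just data, users could customize a graph by editing or replacing the spec instead of asking DVC for new plot options.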
Actions
Modify the DVC documentation: only scalar metrics are supported in the current format and DVC commands.
Implement continuous metrics with the dvc viz command.