Story: Parallel coordinates plot #1117

Opening this issue to discuss how the Parallel Coordinates Plot is implemented in DVC (https://github.com/iterative/dvc/pull/6933).

---

The DVC implementation works by performing a series of operations on an internal class called `TabularData`.

Unfortunately, all these operations happen *after* the `--json` output is dumped, so VSCode can't reuse the logic.

I will describe the operations below so you can decide which ones make sense to implement on the VSCode side.

<details>
<summary>Sample table used to generate the snippets below</summary>

| Experiment   | Created      | loss    | accuracy   | train.batch_size   | train.hidden_units   | train.dropout   | train.num_epochs   | train.lr   | train.conv_activation   | missing_categorical   | missing_scalar   |
|--------------|--------------|---------|------------|--------------------|----------------------|-----------------|--------------------|------------|-------------------------|-----------------------|------------------|
| workspace    | -            | 0.26484 | 0.9038     | 128                | 64                   | 0.4             | 10                 | 0.001      | relu                    | bar                   | 1                |
| main         | Sep 14, 2021 | 0.26484 | 0.9038     | 128                | 64                   | 0.4             | 10                 | 0.001      | relu                    | -                     | -                |
| 5bcd44f      | Sep 01, 2021 | 0.25026 | 0.9095     | 128                | 64                   | 0.4             | 10                 | 0.001      | relu                    | -                     | -                |
| b06a6ba      | Aug 31, 2021 | 0.25026 | 0.9095     | 128                | 64                   | 0.4             | 10                 | 0.001      | relu                    | -                     | -                |
| d34fd8c      | Aug 30, 2021 | 0.30741 | 0.8929     | 128                | 64                   | 0.4             | 10                 | 0.01       | relu                    | -                     | -                |

</details>

<details>
<summary>Associated plot</summary>

![newplot(15)](https://user-images.githubusercontent.com/12677733/144487777-90c34950-0b13-4854-a598-fa17c25e03df.png)

</details>

## Plot Structure

### HTML template

The plot is rendered with [Plotly.js](https://plotly.com/javascript/).

<details>
<summary>This is what the HTML template looks like</summary>

```html
<!DOCTYPE html>
<html>
<head>
    <title>DVC Plot</title>
    <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
</head>
<body>
    <div id = "plot_experiments">
        <script type = "text/javascript">
            var plotly_data = {
              "data": {{DATA}}, 
              "layout": {{LAYOUT}} 
            };
            Plotly.newPlot("plot_experiments", plotly_data.data, plotly_data.layout);
        </script>
    </div>
</body>
</html>
```

</details>
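To make the placeholder substitution concrete, here is a minimal TypeScript sketch of filling the template. `fillTemplate` and `toScriptJson` are hypothetical names (not part of DVC or Plotly); the sketch assumes the template is available as a string and that both placeholders are replaced with JSON, with `{{LAYOUT}}` defaulting to `{}` since it is currently unused. Escaping `<` is a precaution added here so that a label containing `</script>` cannot close the script tag early.

```typescript
// Hypothetical helper: serializes a value for embedding inside <script>.
function toScriptJson(value: unknown): string {
  // JSON is valid JavaScript literal syntax. "<" can only appear inside
  // string values, where "\u003c" denotes the same character, so this
  // keeps a label like "</script>" from terminating the script tag.
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

// Hypothetical helper: substitutes {{DATA}} and {{LAYOUT}} in the template.
function fillTemplate(template: string, data: unknown[], layout: object = {}): string {
  const dataJson = toScriptJson(data);
  const layoutJson = toScriptJson(layout);
  // Single pass with a callback: placeholder-like text inside the data is
  // never re-substituted, and "$" sequences in the JSON are not interpreted.
  return template.replace(/\{\{(DATA|LAYOUT)\}\}/g, (_match: string, key: string) =>
    key === "DATA" ? dataJson : layoutJson,
  );
}
```

Doing the substitution in one regex pass (rather than two chained replacements) avoids a subtle bug where a label literally containing `{{LAYOUT}}` would be overwritten by the second replacement.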

### {{DATA}}

`{{DATA}}` is a list of Plotly `traces`.

In the case of this plot, it is a list with a single `trace` of type `parcoords`. The full reference is here: https://plotly.com/python/reference/parcoords/

We use the `exp show` table to fill the `{{DATA}}` placeholder (more on this below).

<details>
<summary>This is what the filled {{DATA}} looks like</summary>

```json
[
    {
        "type": "parcoords",
        "dimensions": [
            {
                "label": "Experiment",
                "values": [
                    4,
                    3,
                    0,
                    1,
                    2
                ],
                "tickvals": [
                    4,
                    3,
                    0,
                    1,
                    2
                ],
                "ticktext": [
                    "workspace",
                    "main",
                    "5bcd44f",
                    "b06a6ba",
                    "d34fd8c"
                ]
            },
            {
                "label": "loss",
                "values": [
                    0.26484,
                    0.26484,
                    0.25026,
                    0.25026,
                    0.30741
                ]
            },
            {
                "label": "accuracy",
                "values": [
                    0.9038,
                    0.9038,
                    0.9095,
                    0.9095,
                    0.8929
                ]
            },
            {
                "label": "train.lr",
                "values": [
                    0.001,
                    0.001,
                    0.001,
                    0.001,
                    0.01
                ]
            },
            {
                "label": "missing_categorical",
                "values": [
                    0,
                    1,
                    1,
                    1,
                    1
                ],
                "tickvals": [
                    0,
                    1,
                    1,
                    1,
                    1
                ],
                "ticktext": [
                    "bar",
                    "Missing",
                    "Missing",
                    "Missing",
                    "Missing"
                ]
            },
            {
                "label": "missing_scalar",
                "values": [
                    1.0,
                    null,
                    null,
                    null,
                    null
                ]
            }
        ],
        "line": {
            "color": [
                0.9038,
                0.9038,
                0.9095,
                0.9095,
                0.8929
            ],
            "showscale": true,
            "colorbar": {
                "title": "accuracy"
            }
        }
    }
]
```

</details>

### {{LAYOUT}}

We don't currently use `{{LAYOUT}}` at all, but we plan to let users customize some of its properties.

Full reference of options here: https://plotly.com/python/reference/layout/

## Filling {{DATA}}

### Overview

This is the high-level schema of the unfilled {{DATA}}:

```json
[
    {
        "type": "parcoords",
        "dimensions": [
            {{DIMENSIONS}}
        ],
        "line": {{LINE}}
    }
]
```
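The same structure can be written as TypeScript types. The field names follow Plotly's `parcoords` reference and the samples in this issue; the interface names and the `buildData` helper are hypothetical and only meant as a starting point.

```typescript
// Hypothetical types for one parcoords dimension (one table column).
interface ParcoordsDimension {
  label: string;
  values: (number | null)[];
  tickvals?: number[]; // only present for categorical columns
  ticktext?: string[]; // only present for categorical columns
}

// Hypothetical type for the `line` object that drives colors and colorbar.
interface ParcoordsLine {
  color: (number | null)[];
  showscale: boolean;
  colorbar: {
    title: string;
    tickmode?: "array"; // only set when coloring by a categorical column
    tickvals?: number[];
    ticktext?: string[];
  };
}

interface ParcoordsTrace {
  type: "parcoords";
  dimensions: ParcoordsDimension[];
  line: ParcoordsLine;
}

// {{DATA}} is a list holding a single parcoords trace.
function buildData(dimensions: ParcoordsDimension[], line: ParcoordsLine): ParcoordsTrace[] {
  return [{ type: "parcoords", dimensions, line }];
}
```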

### {{DIMENSIONS}}

This is a list of dictionaries. Full reference: https://plotly.com/python/reference/parcoords/#parcoords-dimensions

*Each column in the experiments table will be one item in this list*

To avoid cluttering the plot, we use the `drop_duplicates` operation, which removes any column with zero variance (i.e., the same value in every row, like `train.conv_activation` in the sample table).
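A minimal sketch of that filtering step, assuming each column is held as a list of the raw cell strings shown by `exp show` (with `-` for missing cells). `dropConstantColumns` is a hypothetical name and DVC's `drop_duplicates` may differ in details. Note that `Created` also varies across rows but is absent from the sample `{{DATA}}`, so DVC appears to drop it in some other step that this sketch does not model.

```typescript
// Hypothetical representation: column label -> raw cell strings.
type Table = Record<string, string[]>;

// Keeps only columns that have at least two distinct raw values.
function dropConstantColumns(table: Table): Table {
  const result: Table = {};
  for (const [label, values] of Object.entries(table)) {
    // Missing cells ("-") count as a value here, so a column such as
    // missing_scalar ("1" vs "-") is kept, matching the sample output.
    if (new Set(values).size > 1) {
      result[label] = values;
    }
  }
  return result;
}
```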

#### Scalar columns

For **scalar** columns, the structure of the item to append to the list is very simple:

<details>
<summary>Scalar column</summary>

```json
{
    "label": "loss",
    "values": [
        0.26484,
        0.26484,
        0.25026,
        0.25026,
        0.30741
    ]
}
```

</details>

When a value is missing, we just inject a `null`:

<details>
<summary>Scalar column with missing value(s)</summary>

```json
{
    "label": "missing_scalar",
    "values": [
        1.0,
        null,
        null,
        null,
        null
    ]
}
```

</details>
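A sketch of how a scalar dimension could be built, assuming raw cell strings and `-` as the missing marker (as in the sample table). The helper name is hypothetical. It doubles as the scalar/categorical check: if any non-missing cell fails to parse as a number it returns `undefined`, and the column would then be handled as categorical. JavaScript's `Number()` parsing is only an approximation of whatever parsing DVC uses.

```typescript
interface ScalarDimension {
  label: string;
  values: (number | null)[];
}

// Hypothetical helper: raw cells -> scalar dimension, or undefined if the
// column contains a non-numeric, non-missing cell (i.e. it is categorical).
function toScalarDimension(label: string, cells: string[], fillValue = "-"): ScalarDimension | undefined {
  const values: (number | null)[] = [];
  for (const cell of cells) {
    if (cell === fillValue) {
      values.push(null); // missing value -> null, as in the sample
      continue;
    }
    const trimmed = cell.trim();
    const parsed = Number(trimmed);
    // Number("") is 0, so empty strings are rejected explicitly.
    if (trimmed === "" || Number.isNaN(parsed)) {
      return undefined;
    }
    values.push(parsed);
  }
  return { label, values };
}
```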

#### Categorical columns

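For **categorical** columns, the item to append to the list is a little more elaborate. Plotly's `parcoords` dimensions only accept numeric `values`, so each string is mapped to an integer code, and `tickvals`/`ticktext` map those codes back to the original labels on the axis: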

<details>
<summary>Categorical column</summary>

```json
{
    "label": "Experiment",
    "values": [
        4,
        3,
        0,
        1,
        2
    ],
    "tickvals": [
        4,
        3,
        0,
        1,
        2
    ],
    "ticktext": [
        "workspace",
        "main",
        "5bcd44f",
        "b06a6ba",
        "d34fd8c"
    ]
}
```

</details>

When a value is missing, we inject the string `"Missing"`, which then gets its own code like any other category:

<details>
<summary>Categorical column with missing value(s)</summary>

```json
{
    "label": "missing_categorical",
    "values": [
        0,
        1,
        1,
        1,
        1
    ],
    "tickvals": [
        0,
        1,
        1,
        1,
        1
    ],
    "ticktext": [
        "bar",
        "Missing",
        "Missing",
        "Missing",
        "Missing"
    ]
}
```

</details>
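Both categorical samples are consistent with a simple encoding: replace missing cells with `"Missing"`, collect the distinct values, sort them case-insensitively, and use each value's position as its code (`5bcd44f` gets 0 up to `workspace` at 4; `bar` gets 0 and `Missing` gets 1). That ordering is inferred from the samples only and has not been checked against DVC's source, so treat this TypeScript sketch, with its hypothetical `toCategoricalDimension` helper, as an assumption to verify against the linked code.

```typescript
interface CategoricalDimension {
  label: string;
  values: number[];
  tickvals: number[];
  ticktext: string[];
}

// Deterministic case-insensitive comparison (avoids locale-dependent order).
function compareCaseInsensitive(a: string, b: string): number {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

// Hypothetical helper. ASSUMPTION: codes are each distinct value's index in
// a case-insensitive sort, inferred from the samples in this issue.
function toCategoricalDimension(
  label: string,
  cells: string[],
  fillValue = "-",
  missingText = "Missing",
): CategoricalDimension {
  // Missing cells become a visible placeholder category.
  const texts = cells.map((cell) => (cell === fillValue ? missingText : cell));
  const distinct = Array.from(new Set(texts)).sort(compareCaseInsensitive);
  const codeOf = new Map<string, number>();
  distinct.forEach((text, index) => codeOf.set(text, index));
  const values = texts.map((text) => codeOf.get(text) as number);
  return {
    label,
    values,
    tickvals: [...values], // one tick per row, as in the samples
    ticktext: texts,
  };
}
```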

There are a few gotchas here (don't hesitate to ask). It's probably better to just check the source code:
[Here is the logic for generating the content](https://github.com/iterative/dvc/blob/e1f47bd226b206661003b34d85b21837d0af78c0/dvc/render/plotly.py#L56-L73)

### {{LINE}}

The `line` property defines the colors of the lines and the color bar shown on the right. In DVC, we reuse the existing `--sort-by` flag to select which column defines the colors (if `--sort-by` is not provided, we color by the `Experiment` column).

The example plot above is generated by `dvc exp show --html --sort-by accuracy`.

#### Scalar lines

For **scalar** lines, we use the `values` of the associated `dimension` dict as `"color"` and its `label` as `"colorbar.title"`:

<details>
<summary>Scalar line</summary>

```json
"line": {
    "color": [
        0.9038,
        0.9038,
        0.9095,
        0.9095,
        0.8929
    ],
    "showscale": true,
    "colorbar": {
        "title": "accuracy"
    }
}
```

</details>
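As a sketch, building a scalar `line` amounts to copying the chosen dimension; `scalarLine` and its types are hypothetical names.

```typescript
interface ScalarDimension {
  label: string;
  values: (number | null)[];
}

interface ScalarLine {
  color: (number | null)[];
  showscale: boolean;
  colorbar: { title: string };
}

// Hypothetical helper: builds `line` from the scalar dimension chosen for coloring.
function scalarLine(dimension: ScalarDimension): ScalarLine {
  return {
    color: [...dimension.values], // copy so the two arrays stay independent
    showscale: true,
    colorbar: { title: dimension.label },
  };
}
```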

#### Categorical lines

For **categorical** lines, in addition to what we do for scalars, we copy the `tickvals` and `ticktext` of the associated `dimension` dict into `"colorbar"` and set its `tickmode` to `"array"`:

<details>
<summary>Categorical line</summary>

```json
"line": {
    "color": [
        4,
        3,
        0,
        1,
        2
    ],
    "showscale": true,
    "colorbar": {
        "title": "Experiment",
        "tickmode": "array",
        "tickvals": [
            4,
            3,
            0,
            1,
            2
        ],
        "ticktext": [
            "workspace",
            "main",
            "5bcd44f",
            "b06a6ba",
            "d34fd8c"
        ]
    }
}
```

</details>
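Putting both cases together, a hypothetical `buildLine` helper could select the dimension named by the `--sort-by` value (falling back to `Experiment`) and add the colorbar ticks only when that dimension is categorical. Returning `undefined` when no dimension matches (for example, a column removed by the duplicate filtering) is a choice made for this sketch; the issue doesn't say what DVC does in that case.

```typescript
interface Dimension {
  label: string;
  values: (number | null)[];
  tickvals?: number[];
  ticktext?: string[];
}

interface Line {
  color: (number | null)[];
  showscale: boolean;
  colorbar: {
    title: string;
    tickmode?: "array";
    tickvals?: number[];
    ticktext?: string[];
  };
}

// Hypothetical helper: picks the dimension named by `sortBy` and builds `line`.
function buildLine(dimensions: Dimension[], sortBy = "Experiment"): Line | undefined {
  const dimension = dimensions.find((d) => d.label === sortBy);
  if (dimension === undefined) {
    return undefined; // this sketch's choice when the column is absent
  }
  const line: Line = {
    color: [...dimension.values],
    showscale: true,
    colorbar: { title: dimension.label },
  };
  // Categorical dimensions carry tickvals/ticktext; reuse them on the colorbar.
  if (dimension.tickvals !== undefined && dimension.ticktext !== undefined) {
    line.colorbar.tickmode = "array";
    line.colorbar.tickvals = [...dimension.tickvals];
    line.colorbar.ticktext = [...dimension.ticktext];
  }
  return line;
}
```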

---

Hope this helps
