Conversation

@daavoo
Contributor

@daavoo daavoo commented Apr 21, 2023

Closes #8775
Closes #7167


import pandas as pd
from dvc.api import exp_show
df = pd.DataFrame(exp_show())
   Experiment        rev            typ       Created parent  ...                       data/train_data                  models/model.pkl                 src/data_split.py                   src/evaluate.py                      src/train.py
0              workspace       baseline          None         ...  c6b5cd86cb58b6f5a2ce4d4a95996bb4.dir  cbc3f65deeb3e17cbc50e572da003f8c  e25f97724df2d2dfdfd05e228b788a58  e2d5d64f6064e95b7661b239b8b20626  ef9d70ee318c6b345ba974af4d2af4e4
1                   main       baseline  Apr 03, 2023         ...  c6b5cd86cb58b6f5a2ce4d4a95996bb4.dir  cbc3f65deeb3e17cbc50e572da003f8c  e25f97724df2d2dfdfd05e228b788a58  e2d5d64f6064e95b7661b239b8b20626  ef9d70ee318c6b345ba974af4d2af4e4
2  heapy-vans    4c73fef  branch_commit  Apr 05, 2023         ...  c6b5cd86cb58b6f5a2ce4d4a95996bb4.dir  c5c073155e04165f3e016586c9d8eebc  e25f97724df2d2dfdfd05e228b788a58  e2d5d64f6064e95b7661b239b8b20626  ef9d70ee318c6b345ba974af4d2af4e4
3  coaly-raps    3a410a3  branch_commit  Apr 05, 2023         ...  c6b5cd86cb58b6f5a2ce4d4a95996bb4.dir  5496716df021bebec07f99035b00cd0d  e25f97724df2d2dfdfd05e228b788a58  e2d5d64f6064e95b7661b239b8b20626  ef9d70ee318c6b345ba974af4d2af4e4
4  awash-sons    d3b108d    branch_base  Apr 05, 2023         ...  c6b5cd86cb58b6f5a2ce4d4a95996bb4.dir  9a3748fc21bcc121252ecf4218ab90d6  e25f97724df2d2dfdfd05e228b788a58  e2d5d64f6064e95b7661b239b8b20626  ef9d70ee318c6b345ba974af4d2af4e4

[5 rows x 34 columns]
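
As a rough illustration of why this shape is convenient, here is a minimal sketch that assumes the result is a list of per-experiment dictionaries (which is what `pd.DataFrame(...)` consumes above). The rows, the `accuracy` column, and the `best_row` helper are invented for the example and are not part of `dvc.api`.

```python
# Hypothetical rows standing in for the output of exp_show(); the experiment
# names, revisions, and the "accuracy" metric are made up for illustration.
rows = [
    {"Experiment": None, "rev": "workspace", "typ": "baseline", "accuracy": 0.91},
    {"Experiment": None, "rev": "main", "typ": "baseline", "accuracy": 0.91},
    {"Experiment": "exp-a", "rev": "1a2b3c4", "typ": "branch_commit", "accuracy": 0.93},
    {"Experiment": "exp-b", "rev": "5d6e7f8", "typ": "branch_commit", "accuracy": None},
]


def best_row(rows, metric):
    """Return the row with the highest value for `metric`, or None.

    Rows where the metric is missing or None are skipped.
    """
    scored = [row for row in rows if row.get(metric) is not None]
    return max(scored, key=lambda row: row[metric]) if scored else None


best = best_row(rows, "accuracy")
```

Because every row is a flat dictionary, the same data can be handed to pandas or processed with plain Python, as here.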

Using scm helpers:

import pandas as pd
from dvc.api import all_tags, exp_show
df = pd.DataFrame(exp_show(revs=all_tags()))
   Experiment                           rev       typ  ...              src/featurization.py                    src/prepare.py                      src/train.py
0                                 workspace  baseline  ...  e0265fc22f056a4b86d85c3056bc2894  f09ea0c15980b43010257ccb9f0055e2  c3961d777cfbd7727f9fde4851896006
1                                0-git-init  baseline  ...                              None                              None                              None
2                                1-dvc-init  baseline  ...                              None                              None                              None
3                        bigrams-experiment  baseline  ...  61c592707fd1b33e27819c87cf93f80a  51549a1c87b182ebdd785704f56ffaf1  9ab95496b29b6ea3418bbf20b9fe3473
4              11-random-forest-experiments  baseline  ...  61c592707fd1b33e27819c87cf93f80a  51549a1c87b182ebdd785704f56ffaf1  9ab95496b29b6ea3418bbf20b9fe3473
5                              2-track-data  baseline  ...                              None                              None                              None
6                           3-config-remote  baseline  ...                              None                              None                              None
7                             4-import-data  baseline  ...                              None                              None                              None
8                             5-source-code  baseline  ...                              None                              None                              None
9                           6-prepare-stage  baseline  ...                              None  51549a1c87b182ebdd785704f56ffaf1                              None
10                            7-ml-pipeline  baseline  ...  61c592707fd1b33e27819c87cf93f80a  51549a1c87b182ebdd785704f56ffaf1  9ab95496b29b6ea3418bbf20b9fe3473
11                      baseline-experiment  baseline  ...  61c592707fd1b33e27819c87cf93f80a  51549a1c87b182ebdd785704f56ffaf1  9ab95496b29b6ea3418bbf20b9fe3473
12                          9-bigrams-model  baseline  ...  61c592707fd1b33e27819c87cf93f80a  51549a1c87b182ebdd785704f56ffaf1  9ab95496b29b6ea3418bbf20b9fe3473
13                        classifier@v0.0.1  baseline  ...  e0265fc22f056a4b86d85c3056bc2894  f09ea0c15980b43010257ccb9f0055e2  ba77b061fc7967455cff3bd92b654a52

[14 rows x 32 columns]

TODO:

  • Docstring
  • Tests
  • dvc.org docs

@daavoo daavoo linked an issue Apr 21, 2023 that may be closed by this pull request
@daavoo daavoo self-assigned this Apr 21, 2023
@daavoo daavoo added the labels A: api (Related to the dvc.api), A: experiments (Related to dvc exp), and feature (is a feature) Apr 21, 2023
@daavoo daavoo force-pushed the 8775-api-add-dvcapiexp_show branch 5 times, most recently from 7631a01 to f29ef35 Compare April 26, 2023 09:39
@pmrowla
Contributor

pmrowla commented Apr 26, 2023

Should api.exp_show return the table-formatted data, or should it just match the structure you get with exp show --json (but in native Python dicts/lists instead)?

To return the native python types it would just be

return [exp.dumpd() for exp in repo.experiments.show()]

I was also wondering whether supporting the all_commits/branches/etc. flags makes sense in the API, or whether you should just pass a list of exp/branch/tag names or Git SHAs and then get a dict mapping the specific requested revs to the data (without trying to organize it by baseline), which could just be something like

from repo.experiments.collect import collect_rev

return {
    rev: collect_rev(scm.resolve_rev(rev)).dumpd()
    for rev in revs
}
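
To make the trade-off between the two return shapes concrete, here is a small sketch of converting a dict keyed by rev into flat table rows. The nested layout (with `metrics` and `params` sections) and the `flatten_by_rev` helper are assumptions made for illustration; they are not the actual `exp show --json` schema or a DVC function.

```python
def flatten_by_rev(nested):
    """Turn {rev: {"metrics": {...}, "params": {...}}} into a list of flat rows.

    The nested layout is a simplified, hypothetical one; real experiment
    data may be organized differently.
    """
    rows = []
    for rev, data in nested.items():
        row = {"rev": rev}
        for section in ("metrics", "params"):
            # Copy each metric/param into the row as its own column.
            for name, value in data.get(section, {}).items():
                row[name] = value
        rows.append(row)
    return rows


# Hypothetical per-rev data, keyed by the requested revisions.
nested = {
    "main": {"metrics": {"acc": 0.9}, "params": {"lr": 0.1}},
    "v1": {"metrics": {"acc": 0.8}, "params": {}},
}
flat = flatten_by_rev(nested)
```

The dict form makes lookups by rev direct, while the flat form is what a table or DataFrame expects.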

@codecov

codecov bot commented Apr 26, 2023

Codecov Report

Patch coverage: 96.72% and no project coverage change.

Comparison is base (7e1036b) 92.60% compared to head (28e3a0b) 92.60%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9356   +/-   ##
=======================================
  Coverage   92.60%   92.60%           
=======================================
  Files         461      463    +2     
  Lines       37322    37378   +56     
  Branches     5380     5388    +8     
=======================================
+ Hits        34561    34615   +54     
- Misses       2207     2209    +2     
  Partials      554      554           
Impacted Files Coverage Δ
dvc/repo/experiments/show.py 92.94% <ø> (ø)
dvc/api/experiments.py 89.47% <88.88%> (-1.01%) ⬇️
dvc/api/__init__.py 100.00% <100.00%> (ø)
dvc/api/scm.py 100.00% <100.00%> (ø)
dvc/compare.py 97.82% <100.00%> (ø)
dvc/ui/table.py 100.00% <100.00%> (ø)
tests/func/api/test_experiments.py 100.00% <100.00%> (ø)
tests/func/api/test_scm.py 100.00% <100.00%> (ø)


@daavoo
Contributor Author

daavoo commented Apr 26, 2023

Should api.exp_show return the table-formatted data, or should it just match the structure you get with exp show --json (but in native Python dicts/lists instead)?

To return the native python types it would just be

return [exp.dumpd() for exp in repo.experiments.show()]

I feel that from the user's perspective the current result from tabulate makes more sense than the native exp show --json.
I might be biased towards people wanting to put the result into a pandas DataFrame because that's what I would want.

I was also wondering whether supporting the all_commits/branches/etc. flags makes sense in the API, or whether you should just pass a list of exp/branch/tag names or Git SHAs and then get a dict mapping the specific requested revs to the data (without trying to organize it by baseline), which could just be something like

On this, I am also unsure. Assuming the tabulate format, I think maybe the best would be to just support rev, like params_show, and just ask users to stack the results if they really want multiple revs:

all_exps = []
for rev in revs:
    all_exps.extend(exp_show(rev=rev))

It would be nice to complement this with some public helpers for getting those common sets of revisions.

@daavoo daavoo force-pushed the 8775-api-add-dvcapiexp_show branch 2 times, most recently from 79d71d0 to 36970c0 Compare April 26, 2023 11:34
@dberenbaum
Contributor

I feel that from the user's perspective the current result from tabulate makes more sense than the native exp show --json.
I might be biased towards people wanting to put the result into a pandas DataFrame because that's what I would want.

Agreed. I think we had similar conversations around exp show --csv and decided we should not do things like rounding floats but should try to reflect the UI table structure, and I think the same applies here.

Assuming the tabulate format, I think maybe the best would be to just support rev, like params_show, and just ask users to stack the results if they really want multiple revs:

Not a blocker, but IMO an arg that accepts multiple revs makes more sense since I think it's likely to be a much more common operation than for params_show.

@daavoo
Contributor Author

daavoo commented Apr 26, 2023

Not a blocker, but IMO an arg that accepts multiple revs makes more sense since I think it's likely to be a much more common operation than for params_show.

But what about the all_{X} flags, should we include them in the API or not?

@dberenbaum
Contributor

dberenbaum commented Apr 26, 2023

But what about the all_{X} flags, should we include them in the API or not?

I think it's fine to drop them if we have revs. WDYT?

[edit: tbh I still think they would be helpful unless we plan to provide a bunch of examples on how to iterate over revisions (like getting all branches or all commits on a branch), but not a blocker.]

@daavoo daavoo force-pushed the 8775-api-add-dvcapiexp_show branch from 36970c0 to 0ccf532 Compare April 27, 2023 08:24
@daavoo
Contributor Author

daavoo commented Apr 27, 2023

But what about the all_{X} flags, should we include them in the API or not?

I think it's fine to drop them if we have revs. WDYT?

[edit: tbh I still think they would be helpful unless we plan to provide a bunch of examples on how to iterate over revisions (like getting all branches or all commits on a branch), but not a blocker.]

Pushed an update accepting only revs and providing all_{x} helpers as part of the dvc.api
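
With a single `revs` argument, results from several helpers can be combined before the call. The sketch below shows one way to merge revision lists without duplicates; `merge_revs` and the sample branch and tag names are hypothetical, and the literal lists stand in for whatever the `all_{x}` helpers return.

```python
def merge_revs(*rev_lists):
    """Concatenate revision lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(rev for revs in rev_lists for rev in revs))


# Hypothetical stand-ins for the lists returned by the all_{x} helpers.
branches = ["main", "dev"]
tags = ["v1.0", "main"]

merged = merge_revs(branches, tags)
```

The merged list could then be passed as `revs` in one call instead of stacking the output of several calls.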

@daavoo daavoo force-pushed the 8775-api-add-dvcapiexp_show branch from 0ccf532 to 852ae09 Compare April 27, 2023 08:51
@daavoo daavoo marked this pull request as ready for review April 27, 2023 08:52
@daavoo daavoo force-pushed the 8775-api-add-dvcapiexp_show branch from 852ae09 to 0c9699e Compare April 27, 2023 08:52
@daavoo daavoo requested review from a team and dberenbaum April 27, 2023 08:52
Contributor

@dberenbaum dberenbaum left a comment

I haven't done QA, but the interface LGTM!

@daavoo daavoo enabled auto-merge (rebase) April 28, 2023 10:06
@daavoo daavoo force-pushed the 8775-api-add-dvcapiexp_show branch from 0c9699e to 28e3a0b Compare April 28, 2023 10:06
@daavoo daavoo merged commit 2a04263 into main Apr 28, 2023
@daavoo daavoo deleted the 8775-api-add-dvcapiexp_show branch April 28, 2023 10:33
@dberenbaum
Contributor

dberenbaum commented Apr 28, 2023

@daavoo Can we make an issue to add this to all the example notebooks where we currently use !dvc exp show?

edit: also a docs pr 🙏

Comment on lines +122 to +131
hide_queued (bool, optional): hide experiments that are queued for
execution.
Defaults to `False`.
hide_failed (bool, optional): hide experiments that have failed.
sha (bool, optional): show the Git commit SHAs of the experiments
instead of branch, tag, or experiment names.
Defaults to `False`.
param_deps (bool, optional): include only parameters that are stage
dependencies.
Defaults to `False`.
Contributor

Sorry @daavoo, I should have reviewed this more closely before merging. Nothing severe, but I have some minor questions when you have a chance:

  • Why did you choose to include these options to hide rows or columns but not others like only-changed?
  • What's the point of including sha since there's already a rev column?

Contributor Author

Why did you choose to include these options to hide rows or columns but not others like only-changed?

Because these are applied at the "low level" collection and the others are applied in the UI. I will open a PR dropping them, though.

What's the point of including sha since there's already a rev column?

There is no point.
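
Since filters such as only-changed are applied on the UI side, a caller of the API can reproduce a similar effect on the returned rows. The following is only an approximation of that behavior, assuming the rows are flat dictionaries; `only_changed` and its `keep` argument are hypothetical names, and the sample rows are invented.

```python
def only_changed(rows, keep=("Experiment", "rev")):
    """Drop columns whose value is identical in every row.

    Columns listed in `keep` are retained even if they never vary.
    Values are compared through repr() so unhashable values also work.
    """
    # Collect column names in first-seen order across all rows.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    changed = [
        col
        for col in columns
        if col in keep or len({repr(row.get(col)) for row in rows}) > 1
    ]
    return [{col: row.get(col) for col in changed} for row in rows]


# Hypothetical rows: "seed" never varies, so it is dropped.
rows = [
    {"rev": "a", "lr": 0.1, "seed": 1},
    {"rev": "b", "lr": 0.2, "seed": 1},
]
filtered = only_changed(rows)
```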

daavoo added a commit that referenced this pull request May 1, 2023
daavoo added a commit that referenced this pull request May 1, 2023
daavoo added a commit to treeverse/dvc.org that referenced this pull request May 2, 2023
daavoo added a commit to treeverse/dvc.org that referenced this pull request May 2, 2023
daavoo added a commit to treeverse/dvc.org that referenced this pull request May 2, 2023
daavoo added a commit to treeverse/dvc.org that referenced this pull request May 15, 2023
* api-reference: Add `exp_show` and `scm`.

Per treeverse/dvc#9356

* Update content/docs/api-reference/exp_show.md

Co-authored-by: Dave Berenbaum <dave@iterative.ai>

* Add exp_show to sidebar

* updates

* Add example to scm

* fix

* Update content/docs/api-reference/exp_show.md

---------

Co-authored-by: Dave Berenbaum <dave@iterative.ai>

Labels

  • A: api (Related to the dvc.api)
  • A: experiments (Related to dvc exp)
  • feature (is a feature)

Development

Successfully merging this pull request may close these issues.

  • api: add dvc.api.exp_show
  • TabularData: Use None internally

3 participants