Skip to content

add toml support for ParamsDependency#4258

Merged
efiop merged 2 commits into
treeverse:masterfrom
dtrifiro:toml-params-support
Jul 24, 2020
Merged

add toml support for ParamsDependency#4258
efiop merged 2 commits into
treeverse:masterfrom
dtrifiro:toml-params-support

Conversation

@dtrifiro
Copy link
Copy Markdown
Contributor

@dtrifiro dtrifiro commented Jul 22, 2020

We've been testing DVC with a few of our projects and for some of those we use a toml config to store parameters.
This PR adds support for parameters file in toml format. Example:

dvc run -n train                      \
-d data/input.csv                   \
-o model/model.h5                \
-M model/train_history.csv   \
-p params.toml:model.training \
python train.py

Where params.toml looks something like this:

[transformers.numerical_quantile_transformer]
n_quantiles = 300
output_distribution = "uniform"

[model.training]
epochs = 50
verbose = 1

Comment thread dvc/dependency/param.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use yaml to load json as well.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just try checking toml suffix, and then fallback to yaml.safe_load.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just force-pushed a check for suffix (use yaml.safe_load() for json and yaml)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still this misses the yml. So, it's better to just:

if suffix == "toml":
    # toml load
else:
   # yaml load

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually safe for us to guess the file format from the suffix here? It seems like we'll potential to run into issues where users have params filenames with unexpected capitalization on case-sensitive filesystems, or where they are using .yml instead of .yaml.

It seems to me we should maybe just try to load as yaml (and json) first, then try toml if that fails, then error out?

Copy link
Copy Markdown
Contributor Author

@dtrifiro dtrifiro Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a loader map which is based on the lowercase file suffix, which defaults to yaml.safe_load for unknown suffixes. I also updated _read_params

Copy link
Copy Markdown
Contributor

@efiop efiop Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with @pmrowla , it is better to just try to load and catch yaml formatting error than only trying to judge by the extension.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last force-push I ended up using another approach (default to yaml.safe_load, use toml if lowercase suffix matches) because I felt a nested try/catch a bit awkward, especially if there are other loaders different from yaml/json and toml (not that I think there's much more, these 3 alone probably cover most of the use cases).

What is your preference?

Also, I saw that the Travis build fails on Windows and Mac (probably due to misconfiguration?) is there anything I need to do about that? Thanks

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop, In case of unknown file format, sure, it could fallback to yaml; but, if the file is clearly a toml file, it should just try to load as toml and fail.

This is what @dtrifiro is doing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, let's keep it. 👍

@dtrifiro dtrifiro force-pushed the toml-params-support branch from 06fd02a to 2900649 Compare July 22, 2020 08:23
Comment thread tests/unit/dependency/test_params.py Outdated
@dtrifiro dtrifiro force-pushed the toml-params-support branch from 2900649 to edebcb4 Compare July 22, 2020 11:37
Comment thread dvc/repo/params/show.py
res[str(config)] = yaml.safe_load(fobj)
except yaml.YAMLError:
res[str(config)] = ParamsDependency.PARAMS_FILE_LOADERS[
config.suffix.lower()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@shcheklein
Copy link
Copy Markdown
Contributor

Please, don't forget to update the docs after this is merged.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 22, 2020

Codecov Report

Merging #4258 into master will decrease coverage by 0.47%.
The diff coverage is 95.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4258      +/-   ##
==========================================
- Coverage   91.77%   91.30%   -0.48%     
==========================================
  Files         173      173              
  Lines       11743    11772      +29     
==========================================
- Hits        10777    10748      -29     
- Misses        966     1024      +58     
Impacted Files Coverage Δ
dvc/external_repo.py 86.63% <94.44%> (+0.03%) ⬆️
dvc/scm/git.py 88.12% <100.00%> (+0.27%) ⬆️
dvc/system.py 62.39% <0.00%> (-26.50%) ⬇️
dvc/utils/fs.py 88.88% <0.00%> (-3.97%) ⬇️
dvc/analytics.py 92.59% <0.00%> (-3.71%) ⬇️
dvc/stage/run.py 94.02% <0.00%> (-2.99%) ⬇️
dvc/tree/ssh/connection.py 79.80% <0.00%> (-2.41%) ⬇️
dvc/daemon.py 54.38% <0.00%> (-1.76%) ⬇️
dvc/tree/local.py 93.86% <0.00%> (-0.95%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b77ce02...edebcb4. Read the comment docs.

@efiop efiop requested a review from skshetry July 24, 2020 02:21
@efiop
Copy link
Copy Markdown
Contributor

efiop commented Jul 24, 2020

@dtrifiro Could you please submit a docs PR for https://dvc.org/doc/command-reference/run ?

Copy link
Copy Markdown
Collaborator

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. It'd be great if you could move these logics to somewhere else, like dvc/repo/params/utils.py as load_params(path_info), but I'm fine with it for now though.

@dtrifiro
Copy link
Copy Markdown
Contributor Author

dtrifiro commented Jul 24, 2020

@dtrifiro Could you please submit a docs PR for https://dvc.org/doc/command-reference/run ?

@efiop
treeverse/dvc.org#1612

@efiop
Copy link
Copy Markdown
Contributor

efiop commented Jul 24, 2020

@dtrifiro Thank you so much! 🙏

@efiop efiop merged commit e2b1fb1 into treeverse:master Jul 24, 2020
@dtrifiro dtrifiro deleted the toml-params-support branch July 27, 2020 07:47
@skshetry skshetry mentioned this pull request May 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants