Skip to content

summon: first draft with python specific implementation#2971

Merged
efiop merged 32 commits into
masterfrom
unknown repository
Dec 27, 2019
Merged

summon: first draft with python specific implementation#2971
efiop merged 32 commits into
masterfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Dec 18, 2019

TODO:

  • Support overriding the default parameters through summon
  • Find a good name for the YAML and its contents (no magic references 🙁)
    • Staying with dvcsummon.yaml and objects for now
  • artifacts.yamldvcsummon.yaml
  • artifactobject
  • dvcsummon.yamlformulas.yaml
  • objectdata_object
  • Isolate summon parameters under the summon keyword
  • Handle unsafe situations when reading dvcsummon.yaml (Error handling)
    • Check schema with voluptuous
    • Define optional and required keys
    • Add single exception for Invalid schema
    • Add a key for type: python, show error otherwhise
  • Collect all the cache first and then pull (Not, that important, let's skip it)
  • Change paramsargs
  • Use import_string instead of importlib
  • Change sys.path instead of current directory with os.chdir
  • Add parameter for custom YAML file name in summon()
  • Write beautiful code 💅

Things omitted on this PR:

  • Support for different types other than python.
  • Checking requirements.

@shcheklein
Copy link
Copy Markdown
Contributor

@MrOutis enable the pre-commit to avoid restyled triggered?

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 18, 2019

@shcheklein , yes, already did :)

Copy link
Copy Markdown
Contributor

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
def summon(name, repo=None, rev=None):
# 1. Read grimoire.yaml
# 2. Pull dependencies (TODO)
# 3. Get the call and parameters
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we suppose to "deserialize" not-python objects when the yml file contains type: CSV for examlpe?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important from an external library developer's point of view - when I develop a library to deserialize CSV or Parquet for example.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmpetrov , decided not to implement type: csv in this first iteration because it was ambiguous.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. But please make sure it won't affect design decisions - see my comment about yml file format and file/method/params.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmpetrov dvc.api.summon() is not supposed to deserialize type: python, we agreed to move this functionality into a separate library.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor right. But what the library supposes to call from DVC? I expected to have some generalized approach of getting objects. So, we can create the library and custom types easily on top of this DVC call.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor , did you mean type: csv ?

Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread tests/func/test_api.py Outdated
Comment thread tests/func/test_api.py Outdated
Comment thread dvc/api.py Outdated
@ghost ghost changed the title [wip] summon: ugly hack [wip] summon: first draft with python specific implementation Dec 18, 2019
@ghost ghost requested review from Suor, efiop and pared December 18, 2019 23:27
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread tests/func/test_api.py Outdated
Comment thread tests/func/test_api.py Outdated
Copy link
Copy Markdown
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a question about default name,
Is actually name, what we want to use by default?
Shouldn't it be {something}.dvcsummon? In case of no fname provided we could be gathering all the dvcsummon's and try to find proper object among them by name

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 19, 2019

I've got a question about default name,
Is actually name, what we want to use by default?
Shouldn't it be {something}.dvcsummon? In case of no fname provided we could be gathering all the dvcsummon's and try to find proper object among them by name

@pared, that's interesting. Not sure what we are planning to support.
As far as I understood, you can only use one summon.yaml at the time, so there's no need to search through all summon files as we do with dvcfiles .

@dmpetrov , @shcheklein , are we supporting this case? ☝️

Comment thread dvc/api.py Outdated
Comment on lines +121 to +123
old_dir = os.path.abspath(os.curdir)
os.chdir(path)
sys.path.insert(0, path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do both probably. Also, use try/finally

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed, @Suor, or at least I couldn't rely solely on os.chdir.
Looks like importlib.import_module doesn't care of the current directory (well, sys.path is not being updated when running os.chdir)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do you mean try/finally instead of the context manager, or inside the context manager?

Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Copy link
Copy Markdown
Contributor

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, you can only use one summon.yaml at the time, so there's no need to search through all summon files as we do with dvcfiles .

We discussed this above - I suggested to change the method signature.

Yes, we'd like to avoid the summon file search. At the same time, we cannot use a predefined file in a repo root dir - it blocks the entire scenario for mono-repo teams. So, summon.yaml is default file that can be changed by the user when user calls the methiod.

Comment thread dvc/api.py Outdated
def summon(name, repo=None, rev=None):
# 1. Read grimoire.yaml
# 2. Pull dependencies (TODO)
# 3. Get the call and parameters
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. But please make sure it won't affect design decisions - see my comment about yml file format and file/method/params.

Comment thread tests/func/test_api.py Outdated
Comment thread dvc/api.py Outdated
def summon(name, repo=None, rev=None):
# 1. Read grimoire.yaml
# 2. Pull dependencies (TODO)
# 3. Get the call and parameters
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor right. But what the library supposes to call from DVC? I expected to have some generalized approach of getting objects. So, we can create the library and custom types easily on top of this DVC call.

Comment thread dvc/api.py Outdated
@dmpetrov
Copy link
Copy Markdown
Contributor

@MrOutis I want to clarify a piece of requirements that might be not clear from the description. We need to support custome types such as CSV. I want to emphasize, that we should have all the functions in dvc.api to implement a custom type. So, dvc.api.summon should be general enough (we can have a separate general dvc.api.summon_ex and python specific dvc.api.summon) OR we should have a set of methods.

@ghost ghost requested a review from Suor December 24, 2019 08:18
Copy link
Copy Markdown
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a SummonError exception class, use it for all summon errors with appropriate messages. Add a test or a couple, which will catch those.

Comment thread dvc/api.py Outdated
# TODO: Write a meaningful docstring about `summon`
with _make_repo(repo, rev=rev) as _repo:

def pull_dependencies(deps):
Copy link
Copy Markdown
Contributor

@Suor Suor Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Might be moved outside.
  2. This is actually repo.pull(deps) after granularity patch is in, see dvc: support granularity for fetch/pull/push/status/checkout #3002

Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
try:
objects = ruamel.yaml.safe_load(fobj.read())["objects"]
objects = schema(objects)
return next(x for x in objects if x["name"] == name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, need to check there is only one object named name.

Copy link
Copy Markdown
Author

@ghost ghost Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to raise an error? Isn't it just better to pick the first one?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we need to raise an error, otherwise it might be a nasty surprise for a user.

Comment thread dvc/api.py Outdated
Comment thread dvc/api.py Outdated
Comment thread dvc/api.py

try:
os.chdir(path)
sys.path.insert(0, path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, don't need sys.path.insert() if we already chdired into in.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor , left a commend right here: #2971 (comment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only it works as long as you have '.' or '' in sys.path, which is the default. Without that lots of python code won't work. Are you sure imports don't work for you without that? Can you test outside dvc on a simple script?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also might be the reason why random tests failed.

Comment thread tests/func/test_api.py Outdated
Comment thread dvc/api.py Outdated
yield repo


def summon(name, fname="dvcsummon.yaml", args=None, repo=None, rev=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def summon(name, fname="dvcsummon.yaml", args=None, repo=None, rev=None):
def summon(name, repo=None, rev=None, summon_file="dvcsummon.yaml", args=None):

Comment thread dvc/api.py
Comment on lines +147 to +150
# XXX: Some issues with this approach:
# * Not thread safe
# * Import will pollute sys.modules
# * Weird errors if there is a name clash within sys.modules
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no good non complicated way to handle this in Python 2. I will come up with an approach next sprint. Leave as is for now.

Comment thread dvc/api.py Outdated
]
},
required=True,
extra=ALLOW_EXTRA,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not allow extra everywhere. May not allow it anywhere for now for better future-compatibility. Say only allow objects at the top, and only allow name, meta and summon in the object, all the user data should go under meta, like description, paper, author, etc.

Comment thread dvc/api.py
pull_dependencies(info.get("deps"))

_args = copy.deepcopy(info.get("args", {}))
_args.update(args or {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either remove or {} here or a default value to .get() on prev line.

Comment thread dvc/api.py Outdated
Comment on lines +138 to +139
# XXX: Prints ugly absolute path.
raise SummonError("No such YAML file with path: '{}'".format(path))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should contain whatever user passed, e.g. file ... not found in repo ...

Comment thread dvc/api.py Outdated
# XXX: Prints ugly absolute path.
raise SummonError("No such YAML file with path: '{}'".format(path))
except ruamel.yaml.YAMLError as exc:
raise SummonError("Failed to parse YAML correctly", exc)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to say which file in which repo and a line/message of an error if available.

Comment thread dvc/api.py Outdated
raise SummonError("Failed to parse YAML correctly", exc)
except Invalid as exc:
# XXX: Doesn't point out the error.
raise SummonError("YAML file dosen't match with the schema", exc)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just yaml file, this is a "summon file" is broken.

Comment thread dvc/api.py Outdated
# XXX: Doesn't point out the error.
raise SummonError("YAML file dosen't match with the schema", exc)
except StopIteration:
raise SummonError("No such object with name '{}'".format(name))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, need to add summon_file and repo. We might actually add it outside where this info is available. This will fix all of the above and wouldn't need those two be passed here.

Comment thread dvc/api.py

try:
os.chdir(path)
sys.path.insert(0, path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only it works as long as you have '.' or '' in sys.path, which is the default. Without that lots of python code won't work. Are you sure imports don't work for you without that? Can you test outside dvc on a simple script?

@Suor
Copy link
Copy Markdown
Contributor

Suor commented Dec 25, 2019

It looks like something is wrong with Python 2.7. Besides that looks good)

@dmpetrov
Copy link
Copy Markdown
Contributor

dmpetrov commented Dec 25, 2019

@MrOutis it looks like you had a great time on Christmas eve (I'm not sure which emoji to put here)
🎄👨‍💻?

@Suor
Copy link
Copy Markdown
Contributor

Suor commented Dec 26, 2019

Are all the remaining fails unrelated?

@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 27, 2019

Are all the remaining fails unrelated?

No, @Suor , looks like I've introduced some bugs 🤔
Tests are running correctly on master but my branch fails to run the test suite without errors

Comment thread dvc/api.py Outdated
Comment thread dvc/api.py
@ghost
Copy link
Copy Markdown
Author

ghost commented Dec 27, 2019

No, @Suor , looks like I've introduced some bugs thinking
Tests are running correctly on master but my branch fails to run the test suite without errors

Looks like sys.path.pop wasn't on the correct index

@ghost ghost changed the title [wip] summon: first draft with python specific implementation summon: first draft with python specific implementation Dec 27, 2019
@efiop efiop merged commit 1038cfc into treeverse:master Dec 27, 2019
@ghost ghost deleted the summon branch December 27, 2019 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants