Generalize file up/download; use file-system prefix instead of FileSystemStorage by reelmatt · Pull Request #47 · PyWorkflowApp/visual-programming

reelmatt · 2020-04-10T16:33:58Z

This PR at least partially addresses some of the bugs @reddigari summarized in #43 dealing with file naming/access.

Workflow still relies on Django's FileSystemStorage API, but the storage object is now passed in to the constructor rather than being defined in multiple places: pyworkflow/node.py and workflow/node views.py. Most file operations are still handled in the Workflow class, but the create_node method can now do the following:

json_data["options"][field] = request.pyworkflow.fs.path(opt_value)

when a user uploads a file during node configuration, or when a download is triggered from WriteCsv. If a user enters test.csv for the output file, when the update endpoint is triggered, this is converted to /tmp/test.csv using the Workflow's file system.
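A minimal sketch of that conversion, for illustration only. `PrefixStorage` is a hypothetical stand-in for whatever storage object the Workflow holds (it only mimics a `path(name)` method), and `resolve_file_options` and its `file_fields` argument are assumed names, not code from this PR.

```python
import os


class PrefixStorage:
    """Hypothetical stand-in for the storage object held by the Workflow."""

    def __init__(self, location):
        self.location = location

    def path(self, name):
        # Join the bare filename onto the storage root.
        return os.path.join(self.location, name)


def resolve_file_options(json_data, fs, file_fields):
    """Rewrite each file-valued option to a full path under the storage root."""
    for field in file_fields:
        opt_value = json_data["options"].get(field)
        if opt_value:
            json_data["options"][field] = fs.path(opt_value)
    return json_data


fs = PrefixStorage("/tmp")
data = {"options": {"path_or_buf": "test.csv", "sep": ","}}
resolve_file_options(data, fs, file_fields=["path_or_buf"])
```

Only the fields named in `file_fields` are touched; other options such as `sep` pass through unchanged.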

This helps solve the "giant security problem" of downloads, if we switch the endpoint to POST with the Node's info. It's still hard-coded for the WriteCsvNode because it looks up the Node's path_or_buf option where the output filename is stored, but this does prevent arbitrary downloads.
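The idea of deriving the download path from the Node rather than from the request could look roughly like this. The dictionary layout and the `node_key` field are assumptions made for the sketch; only `WriteCsvNode` and `path_or_buf` come from the description above.

```python
def download_path_for_node(node_info):
    """Return the file a node is allowed to serve, or raise ValueError.

    The caller never supplies a path directly; it is always read from the
    node's own configuration.
    """
    # Hard-coded to WriteCsvNode, mirroring the limitation described above.
    if node_info.get("node_key") != "WriteCsvNode":
        raise ValueError("node does not produce a downloadable file")

    path = node_info.get("options", {}).get("path_or_buf")
    if not path:
        raise ValueError("node has no output file configured")
    return path
```

Because the path is looked up on the server side, a client can only request files that some configured node already points at.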

~~The store_node_data and retrieve_node_data static methods in Workflow haven't been updated but I think (?) these can be done easily if this approach seems good.~~

Uses the Workflow's assigned file_system (defaulting to the Django FileSystemStorage).

reddigari

Left a comment in #43 about my concerns with putting fs entirely inside the Workflow, but outside of that this looks great!

Any thoughts on how to avoid the explicit use of path_or_buf which is exclusive to pandas.DataFrame.to_<filetype>()? Since it seems like we're moving away from calling the pandas functions with **kwargs, perhaps we could have an option name dedicated to a to-be-downloaded file (e.g. options["output_file"])?

vp/workflow/views.py

reelmatt · 2020-04-10T18:01:25Z

One of the ideas I had behind the intermediate classes like IONode and ManipulationNode was that it could be useful to store information common to any node of that type (e.g. 'files' for IONodes, and ¯_(ツ)_/¯ for ManipulationNodes). ReadCsv would then treat that 'file' as input and WriteCsv would treat it as output, but both could be read/accessed similarly. And this could avoid having to specify 'input_file' or 'output_file' when configuring the Node; you could just have 'file' and then the concrete Node implementation would figure it out as long as some value is provided.

Likewise, this could probably help when creating nodes too. I think we could avoid checking the OPTION_TYPES completely à la

for field, info in json_data.get("option_types", dict()).items():
    if info["type"] == "file" or info["name"] == "Filename":
        ...

and instead just check DEFAULT_OPTIONS if we knew any file/filepath is stored in a generic "file" attribute.

json_data["options"]["file"]
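A rough sketch of how a shared "file" option might work. The class bodies here are hypothetical and heavily simplified (no pandas calls, no real I/O); only the names IONode, ReadCsv/WriteCsv, and DEFAULT_OPTIONS are taken from the discussion.

```python
class IONode:
    """Hypothetical base class: every I/O node stores its path under "file"."""

    DEFAULT_OPTIONS = {"file": None}
    role = None  # concrete nodes decide whether "file" is input or output

    def __init__(self, options=None):
        # Node-specific options override the shared defaults.
        self.options = {**self.DEFAULT_OPTIONS, **(options or {})}

    def file_path(self):
        path = self.options.get("file")
        if not path:
            raise ValueError("no file configured for this node")
        return path


class ReadCsvNode(IONode):
    role = "input"


class WriteCsvNode(IONode):
    role = "output"


def uses_file(node_cls):
    """Check DEFAULT_OPTIONS instead of walking the option types."""
    return "file" in getattr(node_cls, "DEFAULT_OPTIONS", {})
```

With this shape, the view code would only need `uses_file(...)` and `options["file"]`, and the concrete node class would decide how the file is used.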

Reflects further discussion from #43

reelmatt · 2020-04-10T21:21:27Z

This latest commit (51c4ee5) changes file handling a bit more than the initial PR, which moved FileSystemStorage solely to pyworkflow/workflow.py. The API is no longer used; instead, settings.MEDIA_ROOT is passed in to the Workflow constructor and regular Python os calls are used.

I think this better reflects the discussion from #43, but is obviously a bigger change with potentially big implications. If this deviates too far from what others are thinking, we can always revert the change entirely, or go back to c664947 from earlier in this PR.

reddigari

Consider a method Workflow.full_path(self, filename) to abstract all the os.path.join() calls.
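Such a helper might look like the following. The class is pared down to path handling only, and the `root_dir` attribute name is assumed from the constructor signature in this PR.

```python
import os


class Workflow:
    """Pared-down sketch; only the path handling is shown."""

    def __init__(self, root_dir):
        self.root_dir = root_dir

    def full_path(self, filename):
        # Single place for the join, so callers stop repeating os.path.join().
        return os.path.join(self.root_dir, filename)
```

One thing to keep in mind: `os.path.join` discards `root_dir` when `filename` is itself an absolute path, so a real implementation may want to validate or normalize the filename first.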

I'm super happy with this refactor, but as you said, if there are consequences I'm not seeing, reverting to the previous commit is totally reasonable.

reddigari · 2020-04-10T22:26:22Z

pyworkflow/pyworkflow/workflow.py

    """

-    def __init__(self, graph=nx.DiGraph(), file_path=None, name='a-name', flow_vars=nx.Graph(), file_system=fs):
+    def __init__(self, graph=nx.DiGraph(), file_path=None, name='a-name', flow_vars=nx.Graph(), root_dir=settings.MEDIA_ROOT):


I tried writing a pyworkflow unit test that imported django.conf.settings, and it complains about Django not being configured. So I would either set /tmp as the default, or make it None and then set it to os.getcwd() when checking its existence in __init__. That would allow someone to use pyworkflow as a regular-old-python-module and have all their output where they're working.
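Roughly what the second option could look like. This is a sketch with the other constructor arguments omitted, not the code that was eventually committed.

```python
import os


class Workflow:
    """Sketch of the constructor only; graph and flow_vars are left out."""

    def __init__(self, name="a-name", root_dir=None):
        self.name = name
        # Resolve the default when the object is created, so importing
        # pyworkflow never touches django.conf.settings.
        self.root_dir = os.getcwd() if root_dir is None else root_dir
```

The Django views could still pass `settings.MEDIA_ROOT` explicitly, while a plain script would get output in its working directory.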

diegostruk

Looks good! I ran the code and couldn't find any bugs in the functionality, but like you mentioned, worst-case scenario we can revert.

hcat-pge added 8 commits April 10, 2020 11:23

chore: Remove duplicated create_node method

b5b3f97

chore: Change global flow_var assignment

1a37a52

refactor: Move fs to Workflow constructor

9b3e073

feat: Add upload/download methods to Workflow

fa168d8

Uses the Workflow's assigned file_system (defaulting to the Django FileSystemStorage).

refactor: Change upload/download endpoints to new Workflow fs

a23b936

fix: Add exception-handling for upload

639a321

fix: Use Workflow fs in Node endpoints

877a687

fix: Save/restore Workflow fs when going to/from session

c664947

reelmatt requested review from cesaragv, cesarnda, diegostruk, matthew-t-smith and reddigari April 10, 2020 16:33

reddigari reviewed Apr 10, 2020

reddigari approved these changes Apr 10, 2020

reelmatt mentioned this pull request Apr 10, 2020

File naming/access in pyworkflow vs. Django vs. CLI #43

Closed

hcat-pge added 2 commits April 10, 2020 17:03

fix: use get() for uploaded files to avoid KeyError

82b4dbf

refactor: Change remaining FileSystemStorage refs in pyworkflow

51c4ee5

Reflects further discussion from #43

reelmatt requested a review from reddigari April 10, 2020 21:21

reelmatt changed the title ~~Move fs to Workflow object; generalize file upload/download~~ Generalize file up/download; use file-system prefix instead of FileSystemStorage Apr 10, 2020

reddigari approved these changes Apr 10, 2020

diegostruk approved these changes Apr 11, 2020

reddigari merged commit a5e7aca into master Apr 11, 2020

reelmatt linked an issue Apr 11, 2020 that may be closed by this pull request

File naming/access in pyworkflow vs. Django vs. CLI #43

Closed

reelmatt deleted the dev/mthomas branch April 11, 2020 17:47

reelmatt mentioned this pull request Apr 12, 2020

Clean up Workflow methods; update naming for Workflow/edit endpoint #49

Merged

reddigari merged 10 commits into master from dev/mthomas

4 participants
