Generalize file up/download; use file-system prefix instead of FileSystemStorage#47
Uses the Workflow's assigned file_system (defaulting to the Django FileSystemStorage).
Left a comment in #43 about my concerns with putting fs entirely inside the Workflow, but outside of that this looks great!
Any thoughts on how to avoid the explicit use of path_or_buf which is exclusive to pandas.DataFrame.to_<filetype>()? Since it seems like we're moving away from calling the pandas functions with **kwargs, perhaps we could have an option name dedicated to a to-be-downloaded file (e.g. options["output_file"])?
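A minimal sketch of the dedicated-option idea, assuming the node keeps its destination under a generic key and translates it only at the point of writing. The `output_file` key comes from the suggestion above; `writer_kwargs` and `writer_arg` are hypothetical names, not part of pyworkflow.

```python
# Hypothetical option name taken from the suggestion above; not pyworkflow API.
OUTPUT_FILE_OPTION = "output_file"


def writer_kwargs(options, writer_arg="path_or_buf"):
    """Translate node options into keyword arguments for a writer call.

    The node stores its destination under the generic "output_file" key;
    only this helper knows that pandas calls that argument "path_or_buf".
    """
    kwargs = dict(options)  # copy so the node's own options are not mutated
    if OUTPUT_FILE_OPTION in kwargs:
        kwargs[writer_arg] = kwargs.pop(OUTPUT_FILE_OPTION)
    return kwargs


options = {"output_file": "test.csv", "sep": ";", "index": False}
print(writer_kwargs(options))
```

With pandas, the result could be passed straight through, e.g. `df.to_csv(**writer_kwargs(options))`, so only the helper mentions `path_or_buf`.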
One of the ideas I had behind the intermediate classes like Likewise, this could probably help when creating nodes too. I think we could avoid checking the
    for field, info in json_data.get("option_types", dict()).items():
        if info["type"] == "file" or info["name"] == "Filename":
...and instead just check json_data["options"]["file"]
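For comparison, the two lookups might be sketched as follows. The shape of `json_data` here is an assumption made for illustration, and both function names are hypothetical.

```python
def file_option_by_scan(json_data):
    """Approach quoted above: scan option_types for a file-like entry."""
    for field, info in json_data.get("option_types", dict()).items():
        if info["type"] == "file" or info["name"] == "Filename":
            return json_data.get("options", {}).get(field)
    return None


def file_option_by_key(json_data):
    """Proposed approach: read a single, agreed-upon "file" option."""
    return json_data.get("options", {}).get("file")


# Assumed payload shape; the real node JSON may differ.
json_data = {
    "options": {"file": "data.csv"},
    "option_types": {"file": {"type": "file", "name": "File"}},
}
print(file_option_by_scan(json_data), file_option_by_key(json_data))
```

The second form only works if every node that handles a file agrees on the `file` key, which is the convention being proposed.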
Reflects further discussion from #43
This latest commit, I think, better reflects the discussion from #43, but it is obviously a bigger change with potentially big implications. If this deviates too far from what others are thinking, we can always revert to before entirely, or
fs to Workflow object; generalize file upload/download
reddigari
left a comment
Consider a method Workflow.full_path(self, filename) to abstract all the os.path.join() calls.
I'm super happy with this refactor, but as you said, if there are consequences I'm not seeing, reverting to the previous commit is totally reasonable.
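The suggested `full_path` helper could look roughly like this. It is a sketch on a stripped-down stand-in class, assuming the workflow stores its prefix in a `root_dir` attribute; the real `Workflow` has more state.

```python
import os


class Workflow:
    """Stand-in for pyworkflow's Workflow; only the path handling is shown."""

    def __init__(self, root_dir):
        self.root_dir = root_dir

    def full_path(self, filename):
        """Return `filename` located under this workflow's root directory."""
        return os.path.join(self.root_dir, filename)


wf = Workflow(root_dir="/tmp")
print(wf.full_path("test.csv"))
```

Note that `os.path.join` discards `root_dir` when `filename` is itself absolute, so a real implementation may want to reject absolute names before joining.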
- def __init__(self, graph=nx.DiGraph(), file_path=None, name='a-name', flow_vars=nx.Graph(), file_system=fs):
+ def __init__(self, graph=nx.DiGraph(), file_path=None, name='a-name', flow_vars=nx.Graph(), root_dir=settings.MEDIA_ROOT):
I tried writing a pyworkflow unit test that imported django.conf.settings, and it complains about Django not being configured. So I would either set /tmp as the default, or make it None and then set it to os.getcwd() when checking its existence in __init__. That would allow someone to use pyworkflow as a regular-old-python-module and have all their output where they're working.
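The `None` default described above might be sketched like this. The class is a simplified stand-in, and raising `ValueError` for a missing directory is an assumption; creating the directory instead would be equally plausible.

```python
import os


class Workflow:
    """Simplified stand-in showing only the root_dir default handling."""

    def __init__(self, name="a-name", root_dir=None):
        self.name = name
        # Fall back to the current working directory when no root is given,
        # so importing this module never touches django.conf.settings.
        self.root_dir = os.getcwd() if root_dir is None else root_dir
        # Assumed behavior: fail early if the directory does not exist.
        if not os.path.isdir(self.root_dir):
            raise ValueError(f"root_dir does not exist: {self.root_dir}")


wf = Workflow()
print(wf.root_dir)  # the directory the script was started from
```

The Django views could still pass `settings.MEDIA_ROOT` explicitly, keeping the Django dependency out of the pyworkflow package itself.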
diegostruk
left a comment
Looks good! I ran the code and couldn't find any bugs in the functionality, but like you mentioned, worst-case scenario we can revert.
This PR at least partially addresses some of the bugs @reddigari summarized in #43 dealing with file naming/access.
Workflow still relies on Django's FileSystemStorage API, but it is now passed in to the constructor rather than being defined in multiple places: pyworkflow/node.py and workflow/nodeviews.py. Most file operations are still handled in the Workflow class, but the create_node method can now do the following:
when a user uploads a file during node configuration, or when a download is triggered from WriteCsv. If a user enters test.csv for the output file, when the update endpoint is triggered, this is converted to /tmp/test.csv using the Workflow's file system.
This helps solve the "giant security problem" of downloads, if we switch the endpoint to POST with the Node's info. It's still hard-coded for the WriteCsvNode because it looks up the Node's path_or_buf option where the output filename is stored, but this does prevent arbitrary downloads.
The store_node_data and retrieve_node_data static methods in Workflow haven't been updated but I think (?) these can be done easily if this approach seems good.
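To illustrate the download lookup, here is a rough sketch in which the server derives the path from the node's stored `path_or_buf` option rather than from a client-supplied path. The `download_path` function and the extra containment check are assumptions added for illustration, not code from this PR.

```python
import os


def download_path(node_options, root_dir):
    """Build the server-side path for a node's output file.

    The client sends the node's info, not a path; the filename comes from the
    node's stored `path_or_buf` option and is resolved under root_dir.
    """
    filename = node_options["path_or_buf"]
    root = os.path.realpath(root_dir)
    candidate = os.path.realpath(os.path.join(root, filename))
    # Assumed extra guard: refuse anything that resolves outside the root,
    # such as "../secret.txt" or an absolute path.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("output file must stay inside the workflow root")
    return candidate


print(download_path({"path_or_buf": "test.csv"}, os.getcwd()))
```

A view handling the POST would call this with the node's saved options and serve only the returned path.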