Currently, `ocrd workspace clone` by default downloads/copies only the `mets.xml` and none of the files linked in it.
The reasoning behind this was to reduce bandwidth requirements by letting users download only the files they need.
However, with the strong focus on relative file paths and general FS-based operations, I propose to invert the logic: By default download/copy all the files and turn that behavior off with a flag.
The main problem I have with the current approach is that it breaks the resolution of relative file paths. While the process is running, the `src_baseurl` of the workspace is still known, but it is gone once the process ends. Hence you end up with a `mets.xml` that contains relative file paths and no record of where it came from or how to resolve them.
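
To illustrate the problem, here is a minimal sketch in plain Python. It is not the OCR-D implementation: `resolve_file_url`, the example URL and the example file path are hypothetical and only mimic how a relative reference depends on a base URL that is known during the clone but not afterwards.

```python
from urllib.parse import urljoin


def resolve_file_url(href, src_baseurl=None):
    """Resolve a file reference from a METS file (hypothetical helper).

    Absolute URLs are returned unchanged. Relative references can only be
    resolved if the base URL the METS file was cloned from is still known.
    """
    if "://" in href:
        # Already absolute, nothing to resolve.
        return href
    if src_baseurl is None:
        # The situation after the cloning process has ended: the relative
        # path is all that is left, so the file cannot be located.
        raise ValueError("cannot resolve %r without a base URL" % href)
    # urljoin treats the last path segment as a file name unless the base
    # ends with a slash, so make sure it does.
    base = src_baseurl if src_baseurl.endswith("/") else src_baseurl + "/"
    return urljoin(base, href)


# During the clone, the base URL (assumed example) is still available:
print(resolve_file_url("OCR-D-IMG/page_0001.tif", "https://example.org/data/kant"))

# Afterwards, only the relative path remains:
try:
    resolve_file_url("OCR-D-IMG/page_0001.tif")
except ValueError as err:
    print(err)
```

Downloading/copying all files by default sidesteps this, because the relative paths then resolve against the local workspace directory.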