Description
We do not store the size of the saved original in the database, so whenever we need it, we have to look the file up in storage. The cost may be negligible with files on disk, but with S3 and/or Swift it becomes quite real. (tl;dr: see the note at the end.)
Saving the size, just like we already save the original file format, would be both easy and helpful. The DataTable object is the obvious place for it. As in:
String origFormat = dataTable.getOriginalFileFormat();
Long origSize = dataTable.getOriginalFileSize();
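A minimal sketch of what the accessor pair could look like, assuming the size is kept as a nullable `Long` next to the existing format field. The class name `DataTableSketch` is hypothetical and stands in for the real DataTable entity, whose persistence mapping is not shown here.

```java
// Hypothetical stand-in for the DataTable entity; only the two fields
// discussed in this issue are modeled, with no persistence annotations.
class DataTableSketch {
    private String originalFileFormat;

    // Nullable on purpose: null means "size not recorded yet", which is
    // the state of tabular files ingested before this field existed.
    private Long originalFileSize;

    public String getOriginalFileFormat() {
        return originalFileFormat;
    }

    public void setOriginalFileFormat(String originalFileFormat) {
        this.originalFileFormat = originalFileFormat;
    }

    public Long getOriginalFileSize() {
        return originalFileSize;
    }

    public void setOriginalFileSize(Long originalFileSize) {
        this.originalFileSize = originalFileSize;
    }
}
```

Using `Long` rather than `long` lets callers tell "unknown" apart from a zero-byte original, which matters until the existing files have been backfilled.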
We also have users specifically requesting this size in the output of the datasets/files api. (See #5321) Once it is in the database, adding it to the api output will be trivial.
So it will be something like:
"contentType": "text/tab-separated-values",
"originalFileFormat": "application/x-stata",
"originalFileSize": 123456,
...
This is really something like 1 or 1.5 on our scale. It becomes a 2 with an API call for retroactively populating the sizes of the already existing tabular files.
tl;dr note: Yes, there are situations where we need to know the sizes of (multiple) originals without, and/or before, opening the files and reading their contents. For example, when a user requests the originals for a whole dataset, or otherwise a whole bunch of them, we have to decide how many and which of the files we can pack into a single zip without going over the size limit, before we start reading any bytes or writing the output. Having to do S3 open() calls, which are notoriously expensive by comparison, just to get the sizes can result in a long delay.
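The packing decision described in the note can be sketched as follows, assuming the sizes are already available from the database. `ZipBundlePlanner` and `selectForZip` are hypothetical names, and the "skip whatever does not fit, keep trying later files" policy is only one possible choice, not necessarily what the application does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides which originals go into one zip using only
// stored sizes, so no storage (S3/Swift) call is needed for the decision.
class ZipBundlePlanner {

    /**
     * Walks the files in the map's iteration order and returns the names
     * of those that fit within limitBytes in total.
     * Assumes limitBytes >= 0 and non-negative sizes.
     */
    static List<String> selectForZip(Map<String, Long> sizesByName, long limitBytes) {
        List<String> selected = new ArrayList<>();
        long total = 0L;
        for (Map.Entry<String, Long> entry : sizesByName.entrySet()) {
            Long size = entry.getValue();
            if (size == null) {
                // Size not populated yet; this sketch simply skips the file.
                // A real implementation might fall back to a storage lookup.
                continue;
            }
            // Compare against the remaining budget to avoid long overflow.
            if (size > limitBytes - total) {
                continue; // would exceed the limit; a later, smaller file may still fit
            }
            selected.add(entry.getKey());
            total += size;
        }
        return selected;
    }
}
```

Pass an insertion-ordered map such as `LinkedHashMap` if the order in which files are considered matters.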