Conversation
I updated the messaging based on discussion during the Design Meeting.
scolapasta left a comment
@sekmiller I added some comments; please review and let me know what you think.
// on *new* datafiles, that haven't been saved in the database yet;
// but it should never be the case in the context of this API)
// -- L.A. Mar. 2020
//SEK 5/2020 - we can't use checksum because it
Rather than leaving (and adding to) these comments, at this point we can probably remove them all (from the "not sure.." on). Feel free to check with @landreev in case he feels there's any reason these should stay.
Did you check if we can just remove them?
- Files with the same checksum can be included in a dataset, even if the files are in the same directory.
- Files with the same filename can be included in a dataset as long as the files are in different directories.
- If a user attempts to add a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded.
Is it worth being more explicit here about upload vs. edit? I.e., related to the next bullet about changing the directory, there is also the case of changing the name after upload. (So the suggestion is to add that there, and change "add" to "upload" here.)
I made a change to the doc here; please check whether it helps add some clarity.
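For illustration, the "-1"/"-2" renaming rule in the quoted docs could be sketched roughly like this. This is a hypothetical standalone demo; `DuplicateNameDemo` and `adjustFileName` are invented names, not the actual Dataverse implementation:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateNameDemo {
    // Returns fileName unchanged if it is unused in the directory;
    // otherwise appends -1, -2, ... before the extension until the
    // name is unique, mirroring the documented behavior above.
    public static String adjustFileName(String fileName, Set<String> existingNames) {
        if (!existingNames.contains(fileName)) {
            return fileName;
        }
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        String ext  = (dot > 0) ? fileName.substring(dot) : "";
        int counter = 1;
        String candidate;
        do {
            candidate = base + "-" + counter + ext;
            counter++;
        } while (existingNames.contains(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add("data.csv");
        names.add("data-1.csv");
        System.out.println(adjustFileName("data.csv", names));  // data-2.csv
        System.out.println(adjustFileName("other.csv", names)); // other.csv
    }
}
```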
}
public void deleteMarkedAsDuplicateFiles() {
Does this need to be a separate method? Or could it call delete files after setting the selected files from the files with the same checksum? Or, maybe better yet, have a private delete method that takes a list; have this method call it with the list of files with the same checksum, and have the other (regular) delete call the private method with the list of selected files. (You also have to be careful to differentiate between deleting newly uploaded vs. pre-existing files, but I think in the end it would be cleaner (and less duplicated) code.) Let me know what you think of this suggestion.
Makes sense; I'll take a look. I know we have different success messages that were introduced yesterday, but it might not be too bad to consolidate.
Was able to use only one method for delete. Ran into a weird situation, a real edge case: if you uploaded a dupe, allowed it to stay, uploaded it again, then decided once was enough and deleted the new one, it would also delete the first dupe that was "ok". So I fixed that, allowing the first dupe to stay.
Turns out the one-method approach (checking whether files are selected to determine which delete is wanted) is problematic, so we do still need two public methods. BUT they just need to "collect" the files and then call a private delete, which still centralizes the code. @sekmiller is aware of the change needed, and this will move to QA once that is all set.
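The structure settled on above, two public methods that only collect files plus one private delete that centralizes the removal, might look roughly like this. All class, field, and method names here are illustrative placeholders, not the real page-bean code:

```java
import java.util.ArrayList;
import java.util.List;

public class EditFilesPageSketch {
    private final List<String> allFiles = new ArrayList<>();
    private final List<String> selectedFiles = new ArrayList<>();
    private final List<String> duplicateFiles = new ArrayList<>();

    public void addFile(String name, boolean selected, boolean duplicate) {
        allFiles.add(name);
        if (selected) selectedFiles.add(name);
        if (duplicate) duplicateFiles.add(name);
    }

    public List<String> getAllFiles() {
        return allFiles;
    }

    // Regular delete: collects the user's current selection.
    public void deleteFiles() {
        deleteFiles(new ArrayList<>(selectedFiles));
    }

    // Duplicate delete: collects only files flagged as duplicates.
    public void deleteMarkedAsDuplicateFiles() {
        deleteFiles(new ArrayList<>(duplicateFiles));
    }

    // The actual removal logic lives in one private method,
    // so the two public entry points share one code path.
    private void deleteFiles(List<String> toDelete) {
        allFiles.removeAll(toDelete);
        selectedFiles.removeAll(toDelete);
        duplicateFiles.removeAll(toDelete);
    }
}
```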
@Transient
private boolean markedAsDuplicate;
public boolean isMarkedAsDuplicate() {
Is this for checksums? If so, do we need it, since we already have it on the DataFile object?
We actually do need it, because there's a separate list of file metadata that feeds the uploaded-files table, and if the user deletes the duplicates we have to remove them there too. I was also using it to write the success message, but since we're no longer using names there, I did some code simplification.
But can't you call fileMetaData.datafile.isMarkedAsDuplicate(), or something like that?
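The delegation being suggested, reading the flag through the wrapped DataFile rather than duplicating it on FileMetadata, could look like this simplified sketch. The class bodies are invented for illustration and are not the actual entity classes:

```java
public class DelegationSketch {
    public static class DataFile {
        private boolean markedAsDuplicate;

        public boolean isMarkedAsDuplicate() {
            return markedAsDuplicate;
        }

        public void setMarkedAsDuplicate(boolean markedAsDuplicate) {
            this.markedAsDuplicate = markedAsDuplicate;
        }
    }

    public static class FileMetadata {
        private final DataFile dataFile;

        public FileMetadata(DataFile dataFile) {
            this.dataFile = dataFile;
        }

        public DataFile getDataFile() {
            return dataFile;
        }

        // No duplicated transient flag here: the state is read
        // through the wrapped DataFile, so there is one source of truth.
        public boolean isMarkedAsDuplicate() {
            return dataFile.isMarkedAsDuplicate();
        }
    }
}
```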
# Conflicts:
#	src/main/webapp/editFilesFragment.xhtml
So, I tested all the rules and use cases as best I could. I think they all work, except on upload when paths are involved for duplicates that have not yet been saved.

What this PR does / why we need it:
This will allow users to upload multiple files with the same checksum value to a given dataset. On upload, either via the interface or the API, the user will be warned that a duplicate file already exists in the dataset (with the current path/label of the duplicate file). At that point they can directly delete the newly uploaded file, if the upload is via the UI.
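The checksum-based warning described here could be sketched roughly as follows. This is a hypothetical helper for illustration only, not the actual AddReplaceFileHelper logic; the names are invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ChecksumCheckDemo {
    // Returns the path/labels of files already in the dataset whose
    // checksum matches the newly uploaded file's, so the user can be
    // warned (and offered the chance to delete the new upload).
    public static List<String> findExistingDuplicates(
            Map<String, String> labelToChecksum, String newChecksum) {
        List<String> dupes = new ArrayList<>();
        for (Map.Entry<String, String> e : labelToChecksum.entrySet()) {
            if (e.getValue().equals(newChecksum)) {
                dupes.add(e.getKey());
            }
        }
        return dupes;
    }
}
```

Note that, unlike before this PR, a non-empty result only triggers a warning rather than rejecting the upload.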
Which issue(s) this PR closes:
#4813 - allow files with the same MD5/checksum to exist in the same dataset
Closes #4813
Closes #6468
Special notes for your reviewer:
Really wanted to take a stick of dynamite to AddReplaceFileHelper, but ended up working with it as it exists. Also fixed an issue with the editFileMetadata API where, if you weren't updating the file's label, you'd get a duplicate file name error. This was causing a failure in the testForceReplaceAndUpdate test.
Suggestions on how to test this:
Various scenarios uploading a duplicate file, including replace.
document outlining upload use cases and expected messaging
Does this PR introduce a user interface change?:
Introduces a popup on upload of a duplicate file, which warns the user and allows them to immediately delete the newly uploaded file.
Is there a release notes update needed for this change?:
We could note, as a new feature, that duplicate files within a dataset are now allowed.
Additional documentation: