
4813 allow duplicate files#6924

Merged
kcondon merged 83 commits into develop from 4813-allow-duplicate-files on Aug 3, 2020

Conversation

@sekmiller
Contributor

@sekmiller sekmiller commented May 20, 2020

What this PR does / why we need it:
This will allow users to upload multiple files with the same checksum value to a given dataset. On upload, whether via the interface or the API, the user is warned that a duplicate file exists in the dataset (along with the current path/label of the duplicate file). If the upload was via the UI, the user can then directly delete the newly uploaded file.

Which issue(s) this PR closes:
#4813 - allow files with the same MD5/checksum to exist in the same dataset

Closes #4813
Closes #6468

Special notes for your reviewer:
Really wanted to take a stick of dynamite to AddReplaceFileHelper, but ended up working with it as it exists. Also fixed an issue with the editFileMetadata API where, if you weren't updating the file's label, you'd get a duplicate file name error. This was causing a failure in the testForceReplaceAndUpdate test.

Suggestions on how to test this:
- Various scenarios uploading a duplicate file, including replace
- Document outlining upload use cases and expected messaging

Does this PR introduce a user interface change?:
Introduces a popup on upload of a duplicate file, which warns the user and allows them to immediately delete the newly uploaded file.

Is there a release notes update needed for this change?:
We could note that duplicate files within a dataset are now allowed as a new feature.

Additional documentation:

@coveralls

coveralls commented May 20, 2020

Coverage Status

Coverage decreased (-0.04%) to 19.562% when pulling 57ab613 on 4813-allow-duplicate-files into a6f580f on develop.

@sekmiller sekmiller removed their assignment May 20, 2020
@scolapasta scolapasta assigned scolapasta and sekmiller and unassigned scolapasta May 28, 2020
@sekmiller
Contributor Author

I updated the messaging based on discussion during the Design Meeting.

@TaniaSchlatter TaniaSchlatter removed their assignment Jul 29, 2020
Contributor

@scolapasta scolapasta left a comment


@sekmiller I added some comments; please review and let me know what you think.

// on *new* datafiles, that haven't been saved in the database yet;
// but it should never be the case in the context of this API)
// -- L.A. Mar. 2020
//SEK 5/2020 - we can't use checksum because it
Contributor

Rather than leaving (and adding to) these comments, at this point we can probably remove them all (from the "not sure.." on). Feel free to check with @landreev in case he feels there's any reason these should stay?

Contributor

did you check if we can just remove?


- Files with the same checksum can be included in a dataset, even if the files are in the same directory.
- Files with the same filename can be included in a dataset as long as the files are in different directories.
- If a user attempts to add a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded.
Contributor

Is it worth being more explicit here about upload vs. edit? I.e., related to the next bullet about changing the directory: you can also change the name after upload. (So the suggestion is to add that there, and change "add" to "upload" here.)

Contributor Author

I made a change to the doc here; please see whether it helps to add some clarity.
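The "-1"/"-2" path-and-name adjustment described in the doc text above could look roughly like this sketch. This is hypothetical illustration, not the actual Dataverse implementation: a numeric suffix is inserted before the extension and incremented until the name no longer collides.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "-1"/"-2" rule: when a directory/filename
// combination already exists, append "-1", "-2", ... before the extension
// until the candidate name is unique within that directory.
public class DuplicateNameResolver {

    public static String resolve(String fileName, Set<String> existingNames) {
        if (!existingNames.contains(fileName)) {
            return fileName;
        }
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        String ext = (dot > 0) ? fileName.substring(dot) : "";
        int counter = 1;
        String candidate = base + "-" + counter + ext;
        while (existingNames.contains(candidate)) {
            counter++;
            candidate = base + "-" + counter + ext;
        }
        return candidate;
    }
}
```

So uploading `file2.txt` into a directory that already holds `file2.txt` and `file2-1.txt` would yield `file2-2.txt`, matching the behavior the doc describes as visible in the list of files being uploaded.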


}

public void deleteMarkedAsDuplicateFiles() {
Contributor

Does this need to be a separate method? Or could it call the delete-files method after setting the selected files from the files with the same checksum? Or, maybe better yet: have a private delete method that takes a list, have this method call it with the list of files with the same checksum, and have the other (regular) delete call the private method with the list of selected files. (You also have to be careful to differentiate between deleting newly uploaded vs. pre-existing files, but I think in the end it would be cleaner, and less duplicated, code.) Let me know what you think of this suggestion.

Contributor Author

Makes sense. I'll take a look. I know we have different success messages that were introduced yesterday, but it might not be too bad to consolidate.

Contributor Author

Was able to use only one method for delete. Ran into a weird situation, a real edge case: if you uploaded a dupe, allowed it to stay, uploaded it again, then decided once is enough and deleted the new one, it would also delete the first dupe that was "ok". So I fixed that, allowing the first dupe to stay.

Contributor

Turns out the one-method approach (checking whether a file is selected to determine which delete is wanted) is problematic, so we do still need two public methods. BUT they just need to "collect" the files and then call a private delete, which still centralizes the code. @sekmiller's aware of the change needed, and will move to QA once that is all set.
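The design settled on above (two public methods that only collect files, feeding one private delete) can be sketched roughly as follows. All class and field names here are illustrative, not Dataverse's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the agreed refactoring: each public entry point
// only decides WHICH files to remove; the removal itself is centralized
// in a single private method.
public class UploadedFilesPage {

    static class FileMetadata {
        final String label;
        boolean markedAsDuplicate;
        boolean selected;
        FileMetadata(String label) { this.label = label; }
    }

    final List<FileMetadata> newFiles = new ArrayList<>();

    // Regular delete: removes whatever the user selected.
    public void deleteSelectedFiles() {
        deleteFiles(newFiles.stream()
                .filter(fm -> fm.selected)
                .collect(Collectors.toList()));
    }

    // Duplicate-popup delete: removes only the newly uploaded duplicates.
    public void deleteMarkedAsDuplicateFiles() {
        deleteFiles(newFiles.stream()
                .filter(fm -> fm.markedAsDuplicate)
                .collect(Collectors.toList()));
    }

    // One private method does the actual removal for both paths.
    private void deleteFiles(List<FileMetadata> toDelete) {
        newFiles.removeAll(toDelete);
    }
}
```

Keeping the two public methods but funneling them through one private delete avoids the edge case described above, since each caller builds its own explicit list instead of inferring intent from selection state.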

@Transient
private boolean markedAsDuplicate;

public boolean isMarkedAsDuplicate() {
Contributor

Is this for checksums? If so, do we need it, since we already have it on the DataFile object?

Contributor Author

We actually do need it because there's a separate list of file metadata that feeds the uploaded files table, and if the user deletes the duplicates we have to remove them there too. I was also using it to write the success message, but since we're no longer using names there, I did some code simplification.

Contributor

But can't you call fileMetaData.datafile.isMarkedAsDuplicate()? or something like that?
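The delegation being suggested here, as a minimal sketch with illustrative class names (the JPA @Transient annotation from the diff is omitted so the sketch compiles without a persistence dependency):

```java
// Hypothetical sketch of the suggestion: instead of adding a second
// markedAsDuplicate flag to FileMetadata, the metadata object delegates
// to the flag already carried by its DataFile.
public class DuplicateFlagSketch {

    static class DataFile {
        // In the actual diff this field is @Transient, i.e. not persisted.
        private boolean markedAsDuplicate;
        public boolean isMarkedAsDuplicate() { return markedAsDuplicate; }
        public void setMarkedAsDuplicate(boolean b) { markedAsDuplicate = b; }
    }

    static class FileMetadata {
        private final DataFile dataFile;
        FileMetadata(DataFile df) { this.dataFile = df; }
        // Delegation instead of a duplicated flag:
        public boolean isMarkedAsDuplicate() { return dataFile.isMarkedAsDuplicate(); }
    }
}
```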

@sekmiller sekmiller removed their assignment Jul 30, 2020
@sekmiller sekmiller removed their assignment Jul 30, 2020
@scolapasta scolapasta removed their assignment Jul 30, 2020
@kcondon kcondon self-assigned this Jul 31, 2020
@kcondon
Contributor

kcondon commented Jul 31, 2020

So, I tested all the rules and use cases as best I could. I think they all work, except on upload when paths are involved for duplicates that have not yet been saved.

  1. Upload file2.txt twice, edit paths to be c,d. Cannot save, says duplicate filenames.
    This works if we upload a zip with dupe filenames in different paths.

@kcondon kcondon assigned sekmiller and unassigned kcondon Jul 31, 2020
@sekmiller
Contributor Author

I was able to upload duplicate files, edit their paths, and successfully save them.
(Screenshot attached: Screen Shot 2020-08-03 at 9 45 58 AM)



Development

Successfully merging this pull request may close these issues.

- File Replace - "...same file" validation error msg not displayed correctly
- File Upload - allow files with same MD5 (or other checksum) in a dataset

7 participants