10116 incomplete matadata label setting by ErykKul · Pull Request #10172 · IQSS/dataverse

ErykKul · 2023-12-07T15:06:15Z

What this PR does / why we need it:
Fixed a bug where the incomplete metadata label was shown on published datasets and was visible to everybody.
The label is now shown only for draft datasets or, when the new dataverse.api.show-label-for-incomplete-when-published feature is enabled, for published datasets that the user can edit (e.g., when you are logged in and are a contributor for a given published dataset with incomplete metadata).
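
The visibility rule described above can be sketched as a single predicate. This is only an illustration: the class name, method name, and parameters are hypothetical and are not taken from the Dataverse code base.

```java
/** Hypothetical helper illustrating the rule; not the actual Dataverse implementation. */
final class IncompleteLabelPolicy {
    private IncompleteLabelPolicy() {}

    /**
     * @param isDraft           whether the displayed version is a draft
     * @param isValid           result of the metadata validity check
     * @param showWhenPublished value of the feature flag for published datasets
     * @param userCanEdit       whether the current user may edit the dataset
     */
    static boolean showIncompleteLabel(boolean isDraft, boolean isValid,
                                       boolean showWhenPublished, boolean userCanEdit) {
        if (isValid) {
            return false; // complete metadata never gets the label
        }
        if (isDraft) {
            return true; // incomplete drafts always show the label
        }
        // Published and incomplete: only for editors, and only when the flag is on.
        return showWhenPublished && userCanEdit;
    }
}
```

With this shape, an anonymous visitor never sees the label on a published dataset, regardless of the flag.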

Which issue(s) this PR closes:

Closes #10116

Is there a release notes update needed for this change?:
Yes

coveralls · 2023-12-07T15:13:51Z

coverage: 20.59% (-0.01%) from 20.603%
when pulling c40838c on ErykKul:10116_incomplete_matadata_label_setting
into a329f29 on IQSS:develop.

src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java

qqmyers

Looks good. I think the build is failing because something isn't up to date w.r.t. the war file name after the version change. Once that's fixed, the build should be rerun.

stevenferey · 2024-03-25T16:02:24Z

Hello and thank you for this fix.
We also observed this behavior in v5.14 because the required metadata has changed over time.

Looking at the changes in this PR, I see that an administrator cannot get an overview of the affected published datasets (although the documentation describes this). Is this expected?

Indeed, "my Data" will not display the dataset if the administrator is not the depositor.
In addition, it is not possible to do a global search (for example with datasetValid:false) because the published datasets are automatically indexed with datasetValid = true

dataverse/src/main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java

Lines 787 to 798 in 7a3ee97

    
    boolean valid;
    if (!indexableDataset.getDatasetVersion().isDraft()) {
        valid = true;
    } else {
        DatasetVersion version = indexableDataset.getDatasetVersion().cloneDatasetVersion();
        version.setDatasetFields(version.initDatasetFields());
        valid = version.isValid();
    }
    if (JvmSettings.API_ALLOW_INCOMPLETE_METADATA.lookupOptional(Boolean.class).orElse(false)) {
        solrInputDocument.addField(SearchFields.DATASET_VALID, valid);
    }

Perhaps we need to modify the condition on the draft versions to obtain a consistent search?

Thanks a lot
Steven.
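
To make the difference concrete, here is a minimal sketch contrasting the quoted behavior with the suggested one. The `Version` class and both method names are hypothetical stand-ins, not Dataverse classes; the sketch only models the branch on `isDraft()`.

```java
/** Hypothetical model of the indexing decision; not the actual IndexServiceBean code. */
final class IndexValiditySketch {
    private IndexValiditySketch() {}

    /** Stand-in for a dataset version (assumption for illustration only). */
    static final class Version {
        final boolean draft;
        final boolean requiredFieldsPresent;

        Version(boolean draft, boolean requiredFieldsPresent) {
            this.draft = draft;
            this.requiredFieldsPresent = requiredFieldsPresent;
        }
    }

    /** Behavior of the quoted snippet: published versions are indexed as valid without a check. */
    static boolean validAsQuoted(Version v) {
        if (!v.draft) {
            return true;
        }
        return v.requiredFieldsPresent;
    }

    /** Suggested behavior: every version is checked, draft or published. */
    static boolean validAlwaysChecked(Version v) {
        return v.requiredFieldsPresent;
    }
}
```

For a published version with missing required fields, the first method returns true and the second returns false, which is why a search on datasetValid:false cannot currently find such datasets.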

ErykKul · 2024-03-26T09:11:14Z

@stevenferey Good catch on the indexing part! Thanks! I think I over-fixed it while making sure that published datasets never show as incomplete to regular users. I will fix it and retest it to see if it works as intended.

As for the filters part and the datasets you see as administrator, it might be because of the roles that are assigned to the administrator account? You could have accounts with different roles, even one specific to detecting the incomplete datasets. I think it might be the Curator role that shows all datasets in the my data tab and lets you edit them? I am not sure. In our installation these are the roles assigned to the admin account (and I can see all datasets and edit them in the my data tab while logged in as admin):

The filter does work too, but indeed I see only draft datasets with incomplete metadata. I overlooked that it does not show any published datasets with incomplete metadata as we do not have any. I will create some on my test installation and fix the problem.

stevenferey · 2024-03-26T13:08:15Z

Thank you for the feedback and future adjustments,

Indeed, an administrator with "contributor" rights on a dataverse displays the draft datasets with incomplete metadata in the "my data" page.
But no published dataset with incomplete metadata because it is impossible to identify at the moment.

qqmyers · 2024-04-04T17:42:57Z

@ErykKul - I assigned you since it looks like you were going to make additional changes. If that's not true or that's a separate PR, let me know so we can move this to Ready For QA. Otherwise, just assign me when you've made the changes and I'll re-review.

ErykKul · 2024-04-06T11:17:20Z

Yes, I need to make some changes first. They are easy to do, but then I need to test them thoroughly. I will do it after my vacation, after the 15th. This PR is not urgent; once it is ready, I will let you know.

stevenferey · 2024-04-10T15:12:52Z

In addition, we identified bad behavior with the "my Data" page and the active "metadata validity" filters:

In this case, the user's dataverse is not displayed.
The display of dataverses should not be impacted by the activation or not of "metadata validity" filters?
Thanks

ErykKul · 2024-05-06T13:10:13Z

@qqmyers
I think it is working as it should now. I had to remove the permission wrapper from the mydata bean, but it seems OK, since it is your data, and the incomplete metadata labels on published datasets are turned on, then you can see them. Also, validation of published dataset after metadata was changed was tricky, but I think it works now as it should. I think that this PR is ready for QA.

qqmyers · 2024-05-06T18:39:16Z

src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

                    for (DatasetField dsf : retList) {
                        if (dsfType.equals(dsf.getDatasetFieldType())) {
                            add = false;
+                            if (removeEmptyValues) {


I'm not sure I understand this part. Can you explain more about why this change is needed? With removeEmptyValues = true, which is called only from the DatasetPage.isValid method (and not the FilePage.isValid method in line 325?), this removes empty values for any field that is in the version and is in the metadata blocks. Could it just be for any field in the version, e.g. called around line 1597? If so, since this whole method just adds empty values, do you just not need line 2301 in DatasetPage (and 325 in FilePage)?

I find this part strange and confusing too. Maybe I am missing something. The reasons why I did it this way are the following:

When I simply run an isValid() check on a version with incomplete metadata that lacks some mandatory fields, it reports the version as valid. Only when I first call "initDatasetFields" does it detect that mandatory fields are missing (it keeps the fields that are already there but adds empty values for the fields that are not there yet, so the validator can detect that these are empty). It still reports valid for valid metadata.

However, I managed to break that logic in my tests by manipulating metadata blocks, making fields mandatory, etc. As it turns out, it worked for new versions made with the new citation block, but old versions, already published, still came out as 'valid' from the check (even after making a new draft version from an old published version with now-incomplete metadata, it would still be marked as 'valid'). This was inconsistent, since some changes to the citation block do get detected in already published datasets... After some digging, it turned out that some fields were not empty and had the special "N/A" value in them. I am not sure whether empty strings were fine for the isValid check; I did not dig deep enough for that. I think that only "N/A" was causing the problems.

I tried using the "is empty" checks implemented in the dataset field class, but none of them worked for this special case: datasets were still "valid" even when incomplete (neither check detected empty values well enough to fix the problem).

Inspired by what the 'is empty' checks are doing, I implemented this new method that removes primitive values if they are either empty strings or contain the special "N/A" value. With this new method I had consistent results: published datasets with incomplete metadata showed as invalid, and those with complete metadata as valid. The same held for all draft versions. I only tested with the dataset page, so I missed the file page thing.

Since what I did looks strange and drastic to me, I did not make it a "new standard" and decided to use it only for the completeness check on metadata. I am still not sure how exactly the UI validation logic works on metadata and how it differs from the "isValid" check done in the dataset version; they seem to use the same bean validators, but the results are very different. It looks to me that it might be worth investing more time in this and doing some refactoring there.
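
The idea in the reply above (drop blank and "N/A" values first, then check required fields) can be sketched as follows. This is a simplified model under stated assumptions: fields are represented as a plain map from field name to values, and the class and method names are hypothetical. The real code works on DatasetField objects and bean validators.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the completeness check; not the Dataverse implementation. */
final class RequiredFieldCheckSketch {
    static final String NA = "N/A"; // placeholder value mentioned in the discussion

    private RequiredFieldCheckSketch() {}

    /** True for null, whitespace-only, or the "N/A" placeholder. */
    static boolean isBlankOrNa(String value) {
        return value == null || value.trim().isEmpty() || NA.equals(value.trim());
    }

    /** Returns a copy of the field map with blank and "N/A" values dropped. */
    static Map<String, List<String>> withoutEmptyValues(Map<String, List<String>> fields) {
        Map<String, List<String>> cleaned = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String value : entry.getValue()) {
                if (!isBlankOrNa(value)) {
                    kept.add(value);
                }
            }
            cleaned.put(entry.getKey(), kept);
        }
        return cleaned;
    }

    /** Complete means every required field still has at least one value after cleaning. */
    static boolean isComplete(Map<String, List<String>> fields, Set<String> required) {
        Map<String, List<String>> cleaned = withoutEmptyValues(fields);
        for (String name : required) {
            List<String> values = cleaned.get(name);
            if (values == null || values.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

Because the cleaning step works on a copy, the stored values are left untouched; only the completeness check sees the stripped version, which matches the intent of not making this a "new standard".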

qqmyers · 2024-05-06T18:58:25Z

src/main/webapp/file.xhtml

                                        <h:outputText value="#{bundle['dataset.versionUI.deaccessioned']}" styleClass="label label-danger" rendered="#{FilePage.fileMetadata.datasetVersion.deaccessioned}"/>
                                        <h:outputText value="#{FilePage.fileMetadata.datasetVersion.externalStatusLabel}" styleClass="label label-info" rendered="#{FilePage.fileMetadata.datasetVersion.externalStatusLabel!=null  and FilePage.canPublishDataset()}"/>
-                                        <h:outputText value="#{bundle['incomplete']}" styleClass="label label-danger" rendered="#{FilePage.fileMetadata.datasetVersion.draft and !FilePage.fileMetadata.datasetVersion.valid}"/>
+                                        <h:outputText value="#{bundle['incomplete']}" styleClass="label label-danger" rendered="#{FilePage.fileMetadata.datasetVersion.draft and !FilePage.valid}"/>


This just seems to bypass lines 324-325 in FilePage.isValid() (and skips the new caching there). Is this why you didn't need to call initDatasetFields(true) there?

Also - FilePage.isValid is called in the FilePage.displayPublishMessage() call - should that have the same logic as the call here?

I focused on the dataset page and missed that one. I have changed the logic to 'show incomplete label when not FilePage.valid', and added the 'true' in the call to initDatasetFields.
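
As a rough illustration of why routing the label through a page-level method matters: a `rendered` expression may be evaluated several times per request, so caching the result avoids repeating the validation. The class below is a hypothetical sketch, not the actual FilePage code, and it assumes a request- or view-scoped bean (it is not thread-safe).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a lazily cached validity flag; not the actual FilePage code. */
final class CachedValidity {
    private final BooleanSupplier check; // the expensive validation, supplied by the caller
    private Boolean cached;              // null until the check has run once

    CachedValidity(BooleanSupplier check) {
        this.check = check;
    }

    /** Runs the check on first use and returns the stored result afterwards. */
    boolean isValid() {
        if (cached == null) {
            cached = check.getAsBoolean();
        }
        return cached;
    }
}
```

Both the label and the publish message could then read the same cached value, so they cannot disagree within one page view.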

src/main/java/edu/harvard/iq/dataverse/mydata/DataRetrieverAPI.java

ErykKul · 2024-05-07T12:38:17Z

@qqmyers
Thanks for reviewing! I did some fixes and gave the explanation on the strange part. Can you re-review?

ErykKul · 2024-05-07T15:07:01Z

It turns out that I was the only one using the "isValid" method in DatasetVersion, so I changed it to keep the logic centralized. I added comments to make it more clear what is happening there.

ErykKul · 2024-05-07T15:38:35Z

I retested it: collections now show up in my data, incomplete and complete (draft and published) datasets have correct labels when "dataverse.ui.show-validity-label-when-published" is enabled, and published datasets do not have incomplete labels when the option is disabled.

ErykKul · 2024-05-07T15:40:19Z

You need to reindex for the changes to take effect.

qqmyers

Looks good. For QA, it sounds like there are two ways to set this up: allow incomplete datasets and upload one via the API, or publish a dataset and then make a field required in a metadata block.

sekmiller · 2024-05-14T13:46:23Z

@ErykKul This looks good. The only concern I have is that the show-validity-label defaults to false. I think if I have edit rights, I would want to know if I have a published dataset that needs attention - especially since it should be a rare occurrence. I would argue for a default of 'true' then if it gets to be too much you could shut it off with false.

ErykKul · 2024-05-14T14:24:56Z

@sekmiller Sounds good, I changed the default to true.
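
A minimal sketch of a setting that defaults to true when unset, following the `lookupOptional(...).orElse(...)` pattern quoted earlier in this thread. Reading from a plain Java system property is an assumption made to keep the example self-contained; the class and method names are hypothetical, and Dataverse itself resolves the value through JvmSettings.

```java
import java.util.Optional;

/** Hypothetical sketch of a default-true flag lookup; not the JvmSettings implementation. */
final class ShowValidityLabelSetting {
    static final String KEY = "dataverse.ui.show-validity-label-when-published";

    private ShowValidityLabelSetting() {}

    /** Unset means true; otherwise the value is parsed with Boolean.parseBoolean. */
    static boolean resolve(Optional<String> configured) {
        return configured.map(Boolean::parseBoolean).orElse(true);
    }

    /** Assumption: the option is supplied as a JVM system property. */
    static boolean fromSystemProperties() {
        return resolve(Optional.ofNullable(System.getProperty(KEY)));
    }
}
```

Note that `Boolean.parseBoolean` treats anything other than "true" (case-insensitive) as false, so a mistyped value turns the label off rather than falling back to the default.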

ErykKul and others added 5 commits December 1, 2023 17:24

incmoplete metadata label visibility setting

0ead2c2

merged develop

bda39cc

added documentation

0cd23fb

typo fix

02f2edc

Merge branch 'IQSS:develop' into 10116_incomplete_matadata_label_setting

99a4b25

qqmyers reviewed Dec 7, 2023

src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java (outdated)

ErykKul added 2 commits December 19, 2023 16:35

Merge branch '10116_incomplete_matadata_label_setting' of https://git…

7248fdd

…hub.com/ErykKul/dataverse into 10116_incomplete_matadata_label_setting

option renamed: show-label-for-incomplete-when-published -> show-vali…

f6e5db2

…dity-label-when-published

qqmyers approved these changes Dec 19, 2023

qqmyers added the "Size: 3" label (A percentage of a sprint. 2.1 hours.) Dec 19, 2023

Merge branch 'IQSS:develop' into 10116_incomplete_matadata_label_setting

7a3ee97

qqmyers assigned ErykKul Apr 4, 2024

qqmyers added the "Status: Needs Input" label (Applied to issues in need of input from someone currently unavailable) Apr 4, 2024

ErykKul and others added 8 commits April 17, 2024 12:34

Merge branch 'IQSS:develop' into 10116_incomplete_matadata_label_setting

3ff4183

merged develop

d97697b

dataset is always checked for validity while indexing, even when alre…

afcbdfe

…ady published

fix for collections not showing up when both validity facets (valid a…

eef2416

…nd incmoplete) are checked

removed unused method

75e87e4

reverted removing method that is used by the frontend

d4c7196

fixed incomplete metadata being indexed as complete in some cases

2ee5bff

fix for permission wrapper not available in mydata -> if it is your d…

232029e

…ata, you are allowed to see incomplete metadata label on published datasets when the flag is enabled

qqmyers reviewed May 6, 2024

src/main/java/edu/harvard/iq/dataverse/mydata/DataRetrieverAPI.java (outdated)

ErykKul added 2 commits May 7, 2024 13:07

unused variable cleanup

ff4742a

cleaned up file page logic for incomplete metadata

8525c9a

refactored isValid in DatasetVersion

724f238

qqmyers approved these changes May 7, 2024

qqmyers unassigned ErykKul May 7, 2024

sekmiller self-assigned this May 9, 2024

sekmiller removed the "Status: Needs Input" label (Applied to issues in need of input from someone currently unavailable) May 9, 2024

change the default to false for UI_SHOW_VALIDITY_LABEL_WHEN_PUBLISHED

c40838c

sekmiller merged commit da3dd95 into IQSS:develop May 14, 2024

pdurbin added this to the 6.3 milestone May 14, 2024

pdurbin mentioned this pull request Jul 8, 2024

"Incomplete metadata" label on published datasets #10116

Closed
