Skip to content

Comments

10116 incomplete matadata label setting#10172

Merged
sekmiller merged 20 commits intoIQSS:developfrom
ErykKul:10116_incomplete_matadata_label_setting
May 14, 2024
Merged

10116 incomplete matadata label setting#10172
sekmiller merged 20 commits intoIQSS:developfrom
ErykKul:10116_incomplete_matadata_label_setting

Conversation

@ErykKul
Copy link
Collaborator

@ErykKul ErykKul commented Dec 7, 2023

What this PR does / why we need it:
Fixed the bug where incomplete metadata label was shown on a published dataset and visible for everybody.
The label is now only shown for draft dataset or when the new dataverse.api.show-label-for-incomplete-when-published feature is enabled, but only for the published datasets that the users can edit (e.g., when you are logged in, and you are a contributor for a given published dataset with incomplete metadata).

Which issue(s) this PR closes:

Closes #10116

Is there a release notes update needed for this change?:
Yes

@coveralls
Copy link

coveralls commented Dec 7, 2023

Coverage Status

coverage: 20.59% (-0.01%) from 20.603%
when pulling c40838c on ErykKul:10116_incomplete_matadata_label_setting
into a329f29 on IQSS:develop.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think the build is failing because something isn't up to date w.r.t. the war file name after the version change. Once that's fixed, the build should be rerun.

@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label Dec 19, 2023
@stevenferey
Copy link
Contributor

Hello and thank you for this fix.
We also observed this behavior in v5.14 because the required metadata has changed over time.

Looking at the developments of this ticket, I see that it is not possible to have an overview of the published datasets concerned for an administrator (this is however specified in the doc), perhaps this is normal?

Indeed, "my Data" will not display the dataset if the administrator is not the depositor.
In addition, it is not possible to do a global search (for example with datasetValid:false) because the published datasets are automatically indexed with datasetValid = true

boolean valid;
if (!indexableDataset.getDatasetVersion().isDraft()) {
valid = true;
} else {
DatasetVersion version = indexableDataset.getDatasetVersion().cloneDatasetVersion();
version.setDatasetFields(version.initDatasetFields());
valid = version.isValid();
}
if (JvmSettings.API_ALLOW_INCOMPLETE_METADATA.lookupOptional(Boolean.class).orElse(false)) {
solrInputDocument.addField(SearchFields.DATASET_VALID, valid);
}

Perhaps we need to modify the condition on the draft versions to obtain a consistent search?

Thanks a lot
Steven.

@ErykKul
Copy link
Collaborator Author

ErykKul commented Mar 26, 2024

@stevenferey Good catch on the indexing part! Thanks! I think I over-fixed it while making sure that published datasets never show as incomplete to regular users. I will fix it and retest it to see if it works as intended.

As for the filters part and datasets you see as administrator, it might be because of the roles that are assigned to the administrator account? You could have accounts with different roles, even one specific to detecting the incomplete datasets. I think it might be the Curator role that shows all datasets in my data tab and lets you edit them? I am not sure. In our installation these are the roles assigned to the admin account (and I can see all datasets end edit them in my data tab while logged in as admin):

image

The filter does work too, but indeed I see only draft datasets with incomplete metadata. I overlooked that it does not show any published datasets with incomplete metadata as we do not have any. I will create some on my test installation and fix the problem.

@stevenferey
Copy link
Contributor

Thank you for the feedback and future adjustments,

Indeed, an administrator with "contributor" rights on a dataverse displays the draft datasets with incomplete metadata in the "my data" page.
But no published dataset with incomplete metadata because it is impossible to identify at the moment.

@qqmyers qqmyers added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Apr 4, 2024
@qqmyers
Copy link
Member

qqmyers commented Apr 4, 2024

@ErykKul - I assigned you since it looks like you were going to make additional changes. If that's not true or that's a separate PR, let me know so we can move this to Ready For QA. Otherwise, just assign me when you've made the changes and I'll re-review.

@ErykKul
Copy link
Collaborator Author

ErykKul commented Apr 6, 2024

Yes, I need to make some changes first. Easy to do, but then I need to test it thoughtfully. I rill do it after my vacation after 15th. This PR is not urgent, once ready, I will let you know.

@stevenferey
Copy link
Contributor

In addition, we identified bad behavior with the "my Data" page and the active "metadata validity" filters:

my_data

In this case, the user's dataverse is not displayed.
The display of dataverses should not be impacted by the activation or not of "metadata validity" filters?
Thanks

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 6, 2024

@qqmyers
I think it is working as it should now. I had to remove the permission wrapper from the mydata bean, but it seems OK, since it is your data, and the incomplete metadata labels on published datasets are turned on, then you can see them. Also, validation of published dataset after metadata was changed was tricky, but I think it works now as it should. I think that this PR is ready for QA.

for (DatasetField dsf : retList) {
if (dsfType.equals(dsf.getDatasetFieldType())) {
add = false;
if (removeEmptyValues) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this part. Can you explain more about why this change is needed? With removeEmptyValues = true, which is called only from the DatasetPage.isValid method (and not the FilePage.isValid method in line 325?), this removes empty values for any field that is in the version and is in the metadata blocks. Could it just be for any field in the version, e.g. called around line 1597? If so, since this whole method just adds empty values, do you just not need line 2301 in DatasetPage (and 325 in FilePage)?

Copy link
Collaborator Author

@ErykKul ErykKul May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this part strange and confusing too. Maybe I am missing something. The reasons why I did it this way are the following:

  • when I simply do a isValid() check on a version with incomplete metadata that does not have some mandatory fields, it says that it is valid. Only when I first call "initDatasetFields", it works and detects when mandatory fields are missing (it keeps the fields that are already there, but adds empty values for the fields that are not there yet, and now the validator can detect that these are empty). It still says valid for valid metadata.
  • however, I managed to break that logic in my tests by manipulating metadata blocks and making fields mandatory, etc. As it turns out, it worked for new versions made with the new citation block, but old version, already published, still came out as 'valid' from the check (even after making a new draft version from an old published version with now incomplete metadata, it still would be marked as 'valid'). This was inconsistent, since you can make some changes to the citation block that do get detected in an already published datasets... After some digging, it turned out that some fields were not empty and had the special "N/A" value in them. I am not sure if empty strings were fine or not for the isValid check, I did not dig deep enough for that, I think that only "N/A" was causing the problems.
  • I have tried using the "is empty" checks implemented in the dataset field class, none of them worked for this special case, datasets were still "valid", even when incomplete (both tests did not detect empty values well enough to fix the problem).
  • inspired by what the 'is empty' checks are doing, I implemented this new method that removes primitive values if they are either empty strings or contain the special "N/A" value. With this new method I had consistent results: published datasets with incomplete metadata showed as invalid, and with complete metadata as valid. The same for all draft versions. I only tested with the dataset page, so I missed the file page thing.
  • since what I did looks strange and drastic to me, I did not make it a "new standard" and decided to use it only for the test on completeness of metadata. I am still not sure how exactly the UI validation logic works on metadata, and how it is different from "isValid" check done in the dataset version, it seems that they use the same bean validators, but results are very different. It looks to me that it might be worth it to invest more time in this and do some refactoring there.

<h:outputText value="#{bundle['dataset.versionUI.deaccessioned']}" styleClass="label label-danger" rendered="#{FilePage.fileMetadata.datasetVersion.deaccessioned}"/>
<h:outputText value="#{FilePage.fileMetadata.datasetVersion.externalStatusLabel}" styleClass="label label-info" rendered="#{FilePage.fileMetadata.datasetVersion.externalStatusLabel!=null and FilePage.canPublishDataset()}"/>
<h:outputText value="#{bundle['incomplete']}" styleClass="label label-danger" rendered="#{FilePage.fileMetadata.datasetVersion.draft and !FilePage.fileMetadata.datasetVersion.valid}"/>
<h:outputText value="#{bundle['incomplete']}" styleClass="label label-danger" rendered="#{FilePage.fileMetadata.datasetVersion.draft and !FilePage.valid}"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just seems to bypass lines 324-325 in FilePage.isValid() (and skips the new caching there). Is this why you didn't need to call initDatasetFields(true) there?

Also - FilePage.isValid is called in the FilePage.displayPublishMessage() call - should that have the same logic as the call here?

Copy link
Collaborator Author

@ErykKul ErykKul May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I focused on the dataset page and missed that one. I have changed the logic to 'show incomplete label when not FilePage.valid', and added the 'true' in the call to initDatasetFields.

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 7, 2024

@qqmyers
Thanks for reviewing! I did some fixes and gave the explanation on the strange part. Can you re-review?

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 7, 2024

It turns out that I was the only one using the "isValid" method in DatasetVersion, so I changed it to keep the logic centralized. I added comments to make it more clear what is happening there.

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 7, 2024

I retested it: collection now do show up in my data, incomplete and complete (draft and published) datasets have correct labels when "dataverse.ui.show-validity-label-when-published" is enabled, and published datasets do not have incomplete labels when they are disabled for published datasets.

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 7, 2024

You need to reindex for the changes to take effect.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. For QA, it sounds like two ways to set up - allow incomplete datasets and upload one via the api, or publish a dataset and then make a field required in a metadata block.

@sekmiller sekmiller self-assigned this May 9, 2024
@sekmiller sekmiller removed the Status: Needs Input Applied to issues in need of input from someone currently unavailable label May 9, 2024
@sekmiller
Copy link
Contributor

@ErykKul This looks good. The only concern I have is that the show-validity-label defaults to false. I think if I have edit rights, I would want to know if I have a published dataset that needs attention - especially since it should be a rare occurrence. I would argue for a default of 'true' then if it gets to be too much you could shut it off with false.

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 14, 2024

@sekmiller Sounds good, I changed the default to true.

@sekmiller sekmiller merged commit da3dd95 into IQSS:develop May 14, 2024
@pdurbin pdurbin added this to the 6.3 milestone May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Size: 3 A percentage of a sprint. 2.1 hours.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"Incomplete metadata" label on published datasets

6 participants