Skip to content

Comments

Add ability to store alternative values in the meta-metadata#154

Merged
led02 merged 7 commits intodevelopfrom
feature/67-process-version
Apr 26, 2023
Merged

Add ability to store alternative values in the meta-metadata#154
led02 merged 7 commits intodevelopfrom
feature/67-process-version

Conversation

@led02
Copy link
Member

@led02 led02 commented Mar 31, 2023

How to use?

  • Run the harvest and process steps
  • Copy the .hermes/process/codemeta.json to your project dir (and slightly modify it ... to trigger new behaviour)
  • Enable codemeta as harvester (in hermes.toml)
  • Run harvest and process step again
  • Have a look at the additional metadata alternatives in hermes/process/tags.json

@led02 led02 linked an issue Mar 31, 2023 that may be closed by this pull request
@jkelling
Copy link
Contributor

jkelling commented Apr 3, 2023

Thanks you for PR.

There is one thing that looks like a bug to me:
When I change the version in codemeta.json, so it disagrees with CITATION.cff, then the codemeta value takes precedence towards .hermes/process/codemeta.json, but there is no alternative in the version tag.

case list() as item: item[:] = value
case _: target[key] = value
case _:
if key in target and target[key] != value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not key in target, the access in line 134 should already throw, so, this check is superfluous.

case list() as item: item[:] = value
case _: target[key] = value
case _:
if key in target:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not key in target, the access in line 100 should already throw, so, this check is superfluous.

@jkelling
Copy link
Contributor

jkelling commented Apr 3, 2023

https://github.com/hermes-hmc/workflow/blob/05e4e530af77bd38810dd802c5c4f1b8249a8ca5/src/hermes/model/merge.py#L166-L170

@led02 : I think the problem with the missing version alternative is, that only the ObjectMergeStrategy set these alternative and it appears to exclude version in its white list. The alternatives should be automatically tracked for all merged data, so the code creating them should apply to all

I think I do not understand the intention behind the MergeStrategy architecture. I there should not be whitelists for this generic code. All alternatives should be tracked equally.

@led02
Copy link
Member Author

led02 commented Apr 3, 2023

Thanks for the reviews and comments. In fact, the MergeStrategy stuff is cause and solution to all of this refactoring...

Indeed, the idea is to have common handling for different data types availble while still being flexible for configuration. Hence, there is always a fall-back strategy availble (that is bascially ObjectMergeStrategy()). The "whitelisting" is simply adding more specific merge strategies to the pool (e.g. one that understands when two persons are the same).

As I can recapitulate the general idea behind this code, I also have my problems (re-)understanding the implementation. What I understand that it needs serious refactoring. ;)

My discussions with different team members brought made me think that it might be a good idea to split up the MergeStrategy into one part that compares entities and one that merges entitites. On top, there needs to be a better mapping that selects the appropriate implementation based on the entity path, the entity LD type, and some configuration values...

@led02
Copy link
Member Author

led02 commented Apr 3, 2023

PS: I'm pretty confident the problem with the version alternatives arises from the missing implementation in https://github.com/hermes-hmc/workflow/blob/7a18bd80c7be25bb44b12564817f180dedda9fa8/src/hermes/model/merge.py#L105

@jkelling
Copy link
Contributor

jkelling commented Apr 3, 2023

PS: I'm pretty confident the problem with the version alternatives arises from the missing implementation in

https://github.com/hermes-hmc/workflow/blob/7a18bd80c7be25bb44b12564817f180dedda9fa8/src/hermes/model/merge.py#L105

Hmm, interesting. I did not expect the CollectionMergeStrategy to be responsible for merging this single value.

led02 added 2 commits April 4, 2023 13:43
`merge.py` and `path.py` are seriously broken and really, really need refactoring.
@led02
Copy link
Member Author

led02 commented Apr 4, 2023

Hmm, interesting. I did not expect the CollectionMergeStrategy to be responsible for merging this single value.

Yeah, just another indicator that the code has it's shortcomings... I thought it would be ObjectMergeStrategy but in the end, it's ... drumroll ... ContextPath that handles the version... 🙄 Anyways, your misconception here was that not the single values are merged but the object that contains the respective fields are merged.

Well, for now the version alternatives are collected. I think we agree there needs to be done some proper refactoring here. Again, I'm sorry... it was about 4:20 AM when I wrote this ... code with shortcomings 😊

@sdruskat sdruskat marked this pull request as ready for review April 24, 2023 13:24
Copy link
Contributor

@sdruskat sdruskat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @led02!

A bit of DRYing up work left to do, see inline comments.

Other than that, works fine for some values (e.g., changing the affiliation for an author), witness:

"alternatives": [
      [
        "German Aerospace Center (DLR)",
        {
          "local_path": "CITATION.cff",
          "timestamp": "2023-04-24T15:49:17",
          "harvester": "cff"
        }
      ]
    ],

CodeMeta overrides CFF in this case, which has issues with intuition (agree with @jkelling in this case). I don't think this is wrong, but needs to be clarified to users that the last added value will/seems to override any previous values.

One thing I noticed is that this makes obvious what seems to be an issue with git harvesting, where the same harvester adds a null alternative:

  • Run hermes from within the project dir without configuring codemeta harvester, and you'll get:
"hermes:gitBranch": {
    "alternatives": [
      [
        null,
        {
          "timestamp": "2023-04-24T15:52:46",
          "harvester": "git"
        }
      ]
    ]
  }

Providing this review as a comment to clarify if that's an issue with the git harvester, or with the implementation of alternative values.

@led02 led02 requested a review from sdruskat April 25, 2023 11:27
@jkelling jkelling self-requested a review April 26, 2023 09:48
@led02 led02 merged commit 9a32a0e into develop Apr 26, 2023
@jkelling jkelling added this to the Proof-of-concept milestone Apr 26, 2023
@led02 led02 deleted the feature/67-process-version branch May 3, 2023 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide processed version number

3 participants