Write ADR for processing / merging #96

@poikilotherm

We need to document our processing and merging strategy. Our approach is a chained one, in which we successively enrich the final data model with data from each source.

We define an order of mergers that collate the data from each source sequentially. The harvesters can run in parallel.
This lets each step benefit from what was learned from previously processed sources, and lets us rank the sources, starting with high-value ones (CFF, CodeMeta, ...) and moving on to lower-quality, coarser ones like Git, APIs, etc.
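To make the ordering idea concrete, here is a minimal sketch of parallel harvesting followed by a sequential, ranked merge. All names (`harvest_cff`, `RANKED_SOURCES`, `merge_chain`) and the flat-dict data model are assumptions for illustration only, not an existing API, and "earlier source wins" is just one possible conflict rule.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical harvesters: each returns a flat dict of metadata from one source.
def harvest_cff():
    return {"name": "example-tool", "license": "Apache-2.0"}

def harvest_codemeta():
    return {"name": "Example Tool", "version": "1.2.0"}

def harvest_git():
    return {"name": "example_tool", "contributors": ["alice", "bob"]}

# Ranked from high-value to coarse sources; this order drives the merge.
RANKED_SOURCES = [
    ("cff", harvest_cff),
    ("codemeta", harvest_codemeta),
    ("git", harvest_git),
]

def harvest_all(sources):
    """Run all harvesters in parallel; return results keyed by source name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in sources}
        return {name: fut.result() for name, fut in futures.items()}

def merge_chain(sources, harvested):
    """Merge sequentially in ranked order; earlier sources win on conflict."""
    model = {}
    for name, _ in sources:
        for key, value in harvested[name].items():
            # setdefault keeps the value from the higher-ranked source.
            model.setdefault(key, value)
    return model

harvested = harvest_all(RANKED_SOURCES)
model = merge_chain(RANKED_SOURCES, harvested)
print(model)
```

Harvesting order does not matter here, only the merge order does, which is why the harvesters can safely run concurrently.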

[Image]

The idea is to use a general merger with selectable, generalized merging strategies. This general merger is fed with data from the sources, which may first have been modified by a preprocessor (which itself builds on knowledge from sources merged earlier).
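A rough sketch of what such a general merger could look like, assuming a dict-based model and per-key strategies. The class `GeneralMerger`, the strategy functions, and the `drop_known_authors` preprocessor are all hypothetical names chosen for this example.

```python
# Strategies take (old_value, new_value) and return the merged value.
def keep_existing(old, new):
    return old

def overwrite(old, new):
    return new

def union_lists(old, new):
    return list(old) + [item for item in new if item not in old]

class GeneralMerger:
    """Merges incoming data into a model using a selectable strategy per key."""

    def __init__(self, strategies, default=keep_existing):
        self.strategies = strategies
        self.default = default

    def merge(self, model, incoming, preprocess=None):
        # The preprocessor sees the model built so far, i.e. prior knowledge.
        if preprocess is not None:
            incoming = preprocess(model, incoming)
        result = dict(model)
        for key, value in incoming.items():
            if key in result:
                strategy = self.strategies.get(key, self.default)
                result[key] = strategy(result[key], value)
            else:
                result[key] = value
        return result

def drop_known_authors(model, incoming):
    """Hypothetical preprocessor: drop authors whose e-mail is already known."""
    known = {author["email"] for author in model.get("authors", [])}
    cleaned = dict(incoming)
    if "authors" in incoming:
        cleaned["authors"] = [
            author for author in incoming["authors"]
            if author["email"] not in known
        ]
    return cleaned

merger = GeneralMerger({"authors": union_lists})

cff_data = {
    "name": "example-tool",
    "authors": [{"name": "Alice Example", "email": "alice@example.org"}],
}
git_data = {
    "name": "example_tool",
    "authors": [
        {"name": "alice", "email": "alice@example.org"},
        {"name": "bob", "email": "bob@example.org"},
    ],
}

model = merger.merge({}, cff_data)
model = merger.merge(model, git_data, preprocess=drop_known_authors)
print(model)
```

Keeping the strategies as plain functions means a source-specific merger only has to pick a strategy table and, optionally, a preprocessor.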

The other approach, on the right: push all data into one giant model and then clean up collisions, duplicates, etc.

[Image]

The two approaches could also be combined: the general merger might implement such a cleanup and run last in the order.
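For comparison, a minimal sketch of the "collect everything, then clean up" variant. It assumes every value is stored together with its source and that collisions are resolved by a source rank; `RANK`, `collect`, and `cleanup` are hypothetical names, and real duplicate detection would need more than a rank lookup.

```python
from collections import defaultdict

# Assumed ranking: lower number means higher-value source.
RANK = {"cff": 0, "codemeta": 1, "git": 2}

def collect(harvested):
    """Push all values into one pool, remembering where each came from."""
    pool = defaultdict(list)
    for source, data in harvested.items():
        for key, value in data.items():
            pool[key].append((source, value))
    return pool

def cleanup(pool):
    """Resolve collisions by keeping the value from the best-ranked source."""
    model = {}
    for key, candidates in pool.items():
        _, best_value = min(candidates, key=lambda item: RANK[item[0]])
        model[key] = best_value
    return model

harvested = {
    "git": {"name": "example_tool", "version": "1.2.0"},
    "cff": {"name": "example-tool"},
}
pool = collect(harvested)
model = cleanup(pool)
print(model)
```

Because provenance is kept in the pool, the cleanup step can run last in a chain without depending on the order in which data arrived.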

Labels: conference-discussion (issues that could be discussed at next conference)
