Add improver pipeline to flag ghost packages #644 #917 #1395#1533
Add improver pipeline to flag ghost packages #644 #917 #1395#1533keshav-space merged 16 commits intomainfrom
Conversation
pombredanne
left a comment
There was a problem hiding this comment.
Here are a few nits for your review.
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
- Package can have malicious, ghost, yanked, valid and unknown status Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
e093a58 to
b848747
Compare
pombredanne
left a comment
There was a problem hiding this comment.
Thanks... here a few nits for your review... I wonder if there is a real need to convert versions with univers for this use case.
Also:
- Thanks for adding logging!
- Please add a CHANGELOG entry for logging and pipeline introduction
| version_class = RANGE_CLASS_BY_SCHEMES[purl.type].version_class | ||
|
|
||
| try: | ||
| return {version_class(v.value) for v in versions(str(purl))} |
There was a problem hiding this comment.
To check for a ghost, why do we need to convert things to a version class?
Could we just check the if the version string exists?
Otherwise we are eventually ghosting all the packages with a version we cannot parse
There was a problem hiding this comment.
To check for a ghost, why do we need to convert things to a version class?
Could we just check the if the version string exists?
Simple string comparison may lead to incorrectly flagging legitimate versions as ghost packages:
"1.2.0" != "1.02.0"
"v1.2.3" != "1.2.3"
"1.0" != "1.0.0"Otherwise we are eventually ghosting all the packages with a version we cannot parse
If we cannot parse the version for a given base_purl, we skip those and don’t mark them as ghosts.
Signed-off-by: Keshav Priyadarshi <git@keshav.space> Co-authored-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
|
Would this also address situations where the upstream index is wrong, such as #915 ? |
@armijnhemel at present, this will flag the ghost packages from the following ecosystems: |
I am not sure you understood my question: what if upstream is wrong (as is the case in Alpine see #915 )? Are you always assuming that the index is correct? |
@armijnhemel Yes, this pipeline assumes that package index is right. We should have seperate pipeline for case where index is wrong. We will need some generalized way to conclude that package index is reporting wrong version. |
So far I have only seen it once (in Alpine, see #915) so perhaps it could just be added as an exception without building a separate pipeline but I guess it might happen more often. |
|
@armijnhemel The general approach to curation is going to be:
|
This PR introduces the aboutcode pipeline for defining and running improvements on data (see #1395). It also adds a new pipeline to flag ghost packages. Ghost packages are those packages that are not present in the upstream package manager index.