This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Make add handle duplicate articles #576

@jankrepl

🚀 Feature

Currently, bbs_database add errors out when an article is already in the database. When adding all articles in a folder, it would be more convenient to compute new_articles = set(parsed_uids) - set(existing_uids) and add only the new articles.
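A minimal sketch of the proposed filtering step, assuming UIDs are plain hashable values such as strings. The function name select_new_uids is hypothetical and not part of bbs_database; it computes the same set difference as above but keeps the order in which the articles were parsed and drops repeats within the parsed batch.

```python
def select_new_uids(parsed_uids, existing_uids):
    """Return the parsed UIDs that are not in the database yet.

    Hypothetical helper: equivalent to set(parsed_uids) - set(existing_uids),
    but preserves first-seen order and removes repeats inside the batch.
    """
    existing = set(existing_uids)
    seen = set()
    new_uids = []
    for uid in parsed_uids:
        if uid in existing or uid in seen:
            continue  # already stored, or already scheduled for insertion
        seen.add(uid)
        new_uids.append(uid)
    return new_uids


# Made-up UIDs for illustration only.
new_uids = select_new_uids(["a", "b", "c", "b"], existing_uids=["b"])
```

Only the articles whose UIDs are returned would then be passed on to the existing insertion logic.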

Motivation

In the overall pipeline, we want to bulk add all articles from a given source and a given month to the database. However, a single duplicate article currently causes the command to error out.

Alternatives

  • Compute the above set difference outside of add and create a new folder of symlinks only holding new articles
  • Add articles to the database one by one and just ignore failures
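For comparison, the second alternative could look roughly like the sketch below. It uses an in-memory SQLite database from the Python standard library as a stand-in; the table layout, the column names, and the add_one_by_one helper are all assumptions made for illustration and do not reflect the actual bbs_database schema or code.

```python
import sqlite3


def add_one_by_one(connection, articles):
    """Insert (uid, title) pairs one at a time and skip duplicates.

    Hypothetical helper: relies on the primary key to reject UIDs
    that are already stored.
    """
    added, skipped = [], []
    for uid, title in articles:
        try:
            with connection:  # commits on success, rolls back on error
                connection.execute(
                    "INSERT INTO articles (article_id, title) VALUES (?, ?)",
                    (uid, title),
                )
        except sqlite3.IntegrityError:
            skipped.append(uid)  # duplicate primary key, ignore it
        else:
            added.append(uid)
    return added, skipped


# Stand-in database with one article already present (made-up data).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE articles (article_id TEXT PRIMARY KEY, title TEXT)"
)
connection.execute("INSERT INTO articles VALUES ('uid-1', 'Already stored')")
connection.commit()

added, skipped = add_one_by_one(
    connection, [("uid-1", "Duplicate"), ("uid-2", "New article")]
)
```

Compared with computing the set difference up front, this approach needs one statement per article and treats every integrity error as a duplicate, so other constraint violations would be silently skipped as well.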
