This repository was archived by the owner on Jul 15, 2024. It is now read-only.

Consider a better implementation for source #66

@galamit86

Currently, each dataset is strictly required to have a source specified. The source is then used mainly (possibly only) for naming files and folders, e.g. *cbs*.v3.83583NED_ColDescriptions.json, in several places along the process:

  • Locally
  • In GCS
  • In BQ (as a prefix for dataset names)

While the source is informative and often relevant, there are times when it becomes an obstacle. For example, uploading CBS catalogs currently results in a BQ dataset titled None_catalogs.
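
One way to avoid names like None_catalogs would be to make the prefix optional. The following is only a minimal sketch: the helper `dataset_name` and its parameters are hypothetical and not taken from the project's code.

```python
from typing import Optional


def dataset_name(name: str, source: Optional[str] = None) -> str:
    """Build a dataset name, adding the source prefix only when a source is given."""
    # Hypothetical helper: skip the prefix instead of formatting None into the name.
    return f"{source}_{name}" if source else name


print(dataset_name("catalogs"))         # catalogs
print(dataset_name("catalogs", "cbs"))  # cbs_catalogs
```

With this approach the source stays available where it is useful, but it is no longer a hard requirement for building a name.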

Another problem occurs if we want to load datasets from multiple sources at once, such as when loading the entirety of the IV3 catalog from statline.

I can imagine changing the flow parameters to take a tuple (or dict) of (source, id).
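
A rough sketch of what such a parameter could look like, assuming the flow accepts a list of (source, id) pairs. `DatasetRef`, `file_prefixes`, and the `version` default are hypothetical names for illustration only; the prefix format simply mirrors the file name pattern quoted above.

```python
from typing import Iterable, List, NamedTuple


class DatasetRef(NamedTuple):
    """Hypothetical (source, id) pair identifying one dataset."""
    source: str
    id: str


def file_prefixes(refs: Iterable[DatasetRef], version: str = "v3") -> List[str]:
    """Build one file-name prefix per dataset, each using its own source."""
    return [f"{ref.source}.{version}.{ref.id}" for ref in refs]


# Plain tuples work too, since a NamedTuple can be built by unpacking them.
pairs = [("cbs", "83583NED")]
refs = [DatasetRef(*pair) for pair in pairs]
print(file_prefixes(refs))  # ['cbs.v3.83583NED']
```

Because each dataset carries its own source, a single flow run could mix datasets from several sources without a shared, flow-level source parameter.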

These might be two separate issues, and neither of them is strictly necessary to address. The concept of source itself might be reconsidered.

Labels: investigate, question, refactoring