Currently, every dataset is strictly required to have a source specified. The source is then used mainly (possibly only) for naming files and folders - e.g. *cbs*.v3.83583NED_ColDescriptions.json - in several places along the process:
- Locally
- In GCS
- In BQ (as a prefix for dataset names)
While informative and often relevant, there are times when this requirement becomes an obstacle. For example, uploading CBS catalogs currently results in a BQ dataset titled None_catalogs.
Another problem occurs if we want to load datasets from multiple sources at once - such as when loading the entirety of the IV3 catalog from statline.
I can imagine changing the flow parameters to take a tuple (or dict) of (source, id).
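As a rough sketch of what such a parameter change could look like - all names here are hypothetical, not the actual flow signature - the flow could accept (source, id) pairs and group them by source before processing:

```python
from typing import Dict, List, Tuple


def group_by_source(datasets: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source, id) pairs by source, so each source's datasets
    can still be named and stored together (locally, in GCS, in BQ)."""
    grouped: Dict[str, List[str]] = {}
    for source, dataset_id in datasets:
        grouped.setdefault(source, []).append(dataset_id)
    return grouped


# Loading datasets from multiple sources in a single call:
datasets = [
    ("cbs", "83583NED"),
    ("iv3", "45042NED"),  # illustrative id, not a real iv3 dataset
]
print(group_by_source(datasets))
```

This would also sidestep the None_catalogs problem, since the source would always travel together with each id instead of being a single flow-level parameter.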
These might be two different issues - and neither of them is strictly necessary. The concept of source might be reconsidered.