Releases: timeoutdigital/to-data-library
v2.0.0
What's Changed
- DSS-3346 Rewrite s3_to_gs and gs_to_bq to use new GCS file structure in #33
This release removes the s3_to_bq method and introduces breaking changes to the gs_to_bq and s3_to_gs methods. Both methods now enforce our GCS file structure (outlined here), so callers must supply the information needed to organise their files when calling them.
Full Changelog: v1.0.20...v2.0.0
DSS-3232 Update s3_to_bq method to handle multiple files
DSS-3232: Add multifile functionality to s3_to_bq (#29)
Co-authored-by: JenHolmes608 <jennifer.holmes@timeout.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Stop gs_to_bq transfer method from failing silently
The gs_to_bq transfer method now returns the result of the load_table_from_uris method, whether or not the underlying BQ load job succeeded. When it is used in a data app, we can therefore raise an error by adding something like:
load_job = transfer_client.gs_to_bq(...)
if not load_job[0]:
    raise Exception(load_job[1])
This means failed loads can now be surfaced as explicit errors instead of passing silently.
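The check can also be wrapped in a small reusable helper. This is only a sketch: it assumes, as the snippet in this note suggests, that gs_to_bq returns a pair whose first item is a success flag and whose second item carries the error detail. The names LoadJobError and raise_on_failed_load are hypothetical and are not part of to-data-library.

```python
class LoadJobError(Exception):
    """Hypothetical exception type for a failed BigQuery load."""


def raise_on_failed_load(load_job):
    """Raise LoadJobError if the (success, message) pair reports a failure.

    Assumes load_job[0] is truthy on success and load_job[1] holds the
    error detail, mirroring the snippet in the release note.
    """
    success, message = load_job[0], load_job[1]
    if not success:
        raise LoadJobError(message)
    # Hand the result back unchanged so callers can keep using it.
    return load_job
```

In a data app this would be called as raise_on_failed_load(transfer_client.gs_to_bq(...)), keeping the failure handling in one place.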
DSS-3231 Include schema_update_options in gs_to_bq config
This lets us pass schema_update_options so that columns are added automatically when new columns appear in the data. For example, if we update a schema file to include a new field, the column can now be added automatically (as NULLABLE) instead of being added manually in dev, staging and prod separately.
See the PR for more info on tests and checks.
DSS-3165 Update convert_json_array_to_ndjson
DSS-3163 Updated convert_json_array_to_ndjson method (#28)
Updated the convert_json_array_to_ndjson method to handle different JSON formats, e.g. input that is already NDJSON. Tests have been added too.
[Image: logs from a successful run using this to-data-library in the cms ingestion pipeline]
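As an illustration of what handling several JSON formats can look like, here is a minimal standard-library sketch. It is not the library's implementation; the function name to_ndjson and its exact behaviour are assumptions made for the example.

```python
import json


def to_ndjson(text):
    """Return text as NDJSON.

    Accepts a JSON array, a single JSON object, or input that is
    already NDJSON (one JSON value per line).
    """
    stripped = text.strip()
    if not stripped:
        return ""
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Not one JSON document: treat as NDJSON, one record per non-empty line.
        records = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    else:
        # A top-level array becomes one record per element; note that a
        # single-line input that is itself an array is treated this way too.
        records = parsed if isinstance(parsed, list) else [parsed]
    return "\n".join(json.dumps(record) for record in records)
```

Parsing the whole input first and falling back to line-by-line parsing keeps the function tolerant of both shapes without requiring the caller to declare the format.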
DSS-3163
Updated methods to handle GCS bucket names passed without the gs:// prefix
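One way to accept bucket paths with or without the prefix is to normalise them up front. The sketch below is an assumption about how such handling could work, not the library's code; split_gcs_path and to_gcs_uri are hypothetical helper names.

```python
GS_PREFIX = "gs://"


def split_gcs_path(path):
    """Return (bucket, blob_name) for 'gs://bucket/a/b' or 'bucket/a/b'."""
    if path.startswith(GS_PREFIX):
        path = path[len(GS_PREFIX):]
    # Everything before the first slash is the bucket; the rest is the blob name.
    bucket, _, blob = path.partition("/")
    if not bucket:
        raise ValueError("missing bucket name")
    return bucket, blob


def to_gcs_uri(path):
    """Return a gs:// URI whether or not the input already had the prefix."""
    bucket, blob = split_gcs_path(path)
    return f"{GS_PREFIX}{bucket}/{blob}" if blob else f"{GS_PREFIX}{bucket}"
```

For example, to_gcs_uri("my-bucket/data/file.json") and to_gcs_uri("gs://my-bucket/data/file.json") both yield the same URI, so downstream code only has to deal with one form.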
DSS-3165 Update s3_to_bq method
Added a step so that files are first moved from S3 to GCS and then loaded into BQ.
DSS-3165 Update s3_to_bq method
Updated the s3_to_bq method in transfer.py to support partitioned tables, and added some schema validation.
DSS-2515 Add additional functionality
Updated load_table_from_uris
Added additional functionality to gs
DSS-2512 Add parquet file ingestion
Added impersonation