
Releases: timeoutdigital/to-data-library

v2.0.0

05 Nov 12:17
b90adbc


What's Changed

  • DSS-3346 Rewrite s3_to_gs and gs_to_bq to use new GCS file structure in #33

This release removes the s3_to_bq method and introduces breaking changes to the gs_to_bq and s3_to_gs methods. Both methods now enforce our GCS file structure (outlined here), so callers must supply the information needed to organise their files when calling them.

Full Changelog: v1.0.20...v2.0.0

DSS-3232 Update s3_to_bq method to handle multiple files

01 Oct 14:41
5847cfe


DSS-3232: Add multifile functionality to s3_to_bq  (#29)

Co-authored-by: JenHolmes608 <jennifer.holmes@timeout.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Stop gs_to_bq transfer method from failing silently

10 Sep 14:22
6477527


The gs_to_bq transfer method now returns the result of the load_table_from_uris method, whether or not the underlying BQ load job succeeded. When it is used in a data app, we can now raise an error by adding something like:

load_job = transfer_client.gs_to_bq(...)

if not load_job[0]:
    raise Exception(load_job[1])

This means a failed load can now surface as an error instead of passing silently.
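The check above can be wrapped in a small helper so every data app raises the same way. This is only a sketch: it assumes the returned value is indexable, with a truthy or falsy success flag at position 0 and an error detail at position 1, as the snippet implies. The names LoadJobError and raise_on_failed_load are hypothetical and not part of to-data-library.

```python
from typing import Any, Sequence


class LoadJobError(Exception):
    """Raised when a load result reports failure (hypothetical name)."""


def raise_on_failed_load(result: Sequence[Any]) -> None:
    # Assumption: result[0] is the success flag and result[1] carries the
    # error detail, mirroring the load_job[0] / load_job[1] usage above.
    if not result[0]:
        raise LoadJobError(result[1])
```

A data app could then call raise_on_failed_load(transfer_client.gs_to_bq(...)) and let the exception propagate.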

DSS-3231 Include schema_update_options in gs_to_bq config

09 Sep 15:03
3a42448


This lets us use the option that adds columns automatically when new columns appear in the data. For example, if we update a schema file to include a new field, the column can now be added automatically (as NULLABLE) instead of being added manually in dev, staging and prod separately.

See the PR for more info on tests and checks.
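For reference, the column-addition behaviour described above corresponds to the schemaUpdateOptions field of a BigQuery load job, where ALLOW_FIELD_ADDITION permits new nullable columns on append. The sketch below builds a load configuration in the shape used by the BigQuery REST API. How gs_to_bq itself accepts the option is not shown in these notes, so the function name and its parameters are hypothetical.

```python
from typing import Any, Dict, List


def build_load_config(
    project: str,
    dataset: str,
    table: str,
    uris: List[str],
    allow_new_columns: bool = True,
) -> Dict[str, Any]:
    """Build a REST-style BigQuery load job configuration (illustrative only)."""
    load: Dict[str, Any] = {
        "sourceUris": uris,
        "destinationTable": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        },
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        # Schema update options apply when appending to an existing table.
        "writeDisposition": "WRITE_APPEND",
    }
    if allow_new_columns:
        # Lets the load add columns that are in the schema but not yet in the table.
        load["schemaUpdateOptions"] = ["ALLOW_FIELD_ADDITION"]
    return {"configuration": {"load": load}}
```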

DSS-3165 Update convert_json_array_to_ndjson

28 Jul 10:16
4a56113


DSS-3163 Updated convert_json_array_to_ndjson method (#28)

I have updated the convert_json_array_to_ndjson method to handle different
JSON formats, i.e. NDJSON.

Tests have been added too.

Here are the logs from a successful run using this to-data-library in
the cms ingestion pipeline:
<img width="1176" height="719" alt="image"
src="https://github.com/user-attachments/assets/d7890ec7-d54f-4f26-8e95-19ddb7b596e7"
/>
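
A rough illustration of what handling both a JSON array and already-newline-delimited input can look like. This is not the library's implementation; the function name to_ndjson and its exact behaviour are assumptions made for the sketch.

```python
import json


def to_ndjson(text: str) -> str:
    """Return NDJSON for input that is a JSON array, a single object, or NDJSON."""
    stripped = text.strip()
    if not stripped:
        return ""
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Not one JSON document, so treat it as NDJSON and parse line by line.
        # A malformed line still raises json.JSONDecodeError.
        records = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    else:
        # A JSON array yields one record per element; anything else is one record.
        records = parsed if isinstance(parsed, list) else [parsed]
    return "".join(json.dumps(record) + "\n" for record in records)
```

Parsing and re-serialising every record also normalises whitespace, at the cost of loading the whole payload into memory.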

DSS-3163

21 Jul 14:52
ad9147a


Updated methods to handle GCS bucket names without the gs:// prefix
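
One way to accept a bucket location with or without the prefix is to strip it before splitting off the bucket name. The helper below is a hypothetical sketch, not the code in the library.

```python
from typing import Tuple


def split_gcs_location(location: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path' or 'bucket/path' into (bucket, object_path)."""
    prefix = "gs://"
    if location.startswith(prefix):
        location = location[len(prefix):]
    # Everything before the first slash is the bucket; the rest is the object path.
    bucket, _, path = location.partition("/")
    if not bucket:
        raise ValueError("GCS location has no bucket name")
    return bucket, path
```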

DSS-3165 Update s3_to_bq method

18 Jul 08:21
1d04ffe


Added a step so that files are moved from S3 to GCS, then to BQ.

DSS-3165 Update s3_to_bq method

17 Jul 14:09
b0be878


Updated the s3_to_bq method in transfer.py to support partitioned tables and added some schema validation.

DSS-2515 Add additional functionality

26 Sep 12:47
c85afd4


Updated load_table_from_uris
Added additional functionality to gs

DSS-2512 Add parquet file ingestion

28 Aug 15:07
409b5f7
