Dumping bag to textfiles for cbs.v3.84750NED_TypedDataSet
Task 'tables_to_parquet[8]': Starting task run...
Finished dumping bag to textfiles for cbs.v3.84750NED_TypedDataSet
Starting to concatanate files for cbs.v3.84750NED_TypedDataSet
Concluded concatanating files for cbs.v3.84750NED_TypedDataSet
Unexpected error: FileNotFoundError(2, "Failed to open local file '/tmp/cbs/v3/84750NED/20210320/json/cbs.v3.84750NED_TypedDataSet/cbs.v3.84750NED_TypedDataSet.json'. Detail: [errno 2] No such file or directory")
Traceback (most recent call last):
File "/home/amitgalmail/nl-open-data/.venv/lib/python3.8/site-packages/prefect/engine/runner.py", line 48, in inner
new_state = method(self, state, *args, **kwargs)
File "/home/amitgalmail/nl-open-data/.venv/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 865, in get_task_run_state
value = prefect.utilities.executors.run_task_with_timeout(
File "/home/amitgalmail/nl-open-data/.venv/lib/python3.8/site-packages/prefect/utilities/executors.py", line 299, in run_task_with_timeout
return task.run(*args, **kwargs) # type: ignore
File "/home/amitgalmail/nl-open-data/.venv/lib/python3.8/site-packages/statline_bq/utils.py", line 1390, in tables_to_parquet
pq_path = convert_table_to_parquet(
File "/home/amitgalmail/nl-open-data/.venv/lib/python3.8/site-packages/statline_bq/utils.py", line 629, in convert_table_to_parquet
pa_table = pa_json.read_json(json_path)
File "pyarrow/_json.pyx", line 238, in pyarrow._json.read_json
File "pyarrow/_json.pyx", line 193, in pyarrow._json._get_reader
File "pyarrow/io.pxi", line 1493, in pyarrow.lib.get_input_stream
File "pyarrow/io.pxi", line 1464, in pyarrow.lib.get_native_file
File "pyarrow/io.pxi", line 827, in pyarrow.lib.OSFile.__cinit__
File "pyarrow/io.pxi", line 837, in pyarrow.lib.OSFile._open_readable
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
FileNotFoundError: [Errno 2] Failed to open local file '/tmp/cbs/v3/84750NED/20210320/json/cbs.v3.84750NED_TypedDataSet/cbs.v3.84750NED_TypedDataSet.json'. Detail: [errno 2] No such file or directory
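In some cases, tables_to_parquet() fails on a large dataset because the task is restarted. The pathology shows in the log above: the line "Task 'tables_to_parquet[8]': Starting task run..." appears right after the first line ("Dumping bag...") and before the next one, indicating that this task has been restarted. The JSON file is then missing by the time it is read for conversion to Parquet.

Full log from Prefect attached.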
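
To help spot this pattern in the full log, here is a minimal sketch of a log scanner. It is not part of statline_bq or Prefect: the function name `find_restarts_during_dump` and the regular expressions are hypothetical, written only against the message formats seen in the excerpt above. It flags every "Starting task run..." line that is logged while a dump for some table has begun but not yet finished. In a flow with parallel mapped tasks a hit may also be an unrelated task starting, so treat the result as a hint rather than proof of a restart.

```python
import re

# Patterns are assumptions based on the log lines quoted in this issue.
START = re.compile(r"^Task '(?P<task>[^']+)': Starting task run\.\.\.$")
DUMP_BEGIN = re.compile(r"^Dumping bag to textfiles for (?P<table>\S+)$")
DUMP_END = re.compile(r"^Finished dumping bag to textfiles for (?P<table>\S+)$")


def find_restarts_during_dump(lines):
    """Return (task, table) pairs where a task start is logged mid-dump."""
    open_dumps = []  # tables whose dump has begun but not finished
    hits = []
    for raw in lines:
        line = raw.strip()
        begin = DUMP_BEGIN.match(line)
        end = DUMP_END.match(line)
        start = START.match(line)
        if begin:
            open_dumps.append(begin["table"])
        elif end:
            if end["table"] in open_dumps:
                open_dumps.remove(end["table"])
        elif start:
            # A task start while a dump is still open is the suspicious case.
            hits.extend((start["task"], table) for table in open_dumps)
    return hits


# The first three log lines from the excerpt in this issue.
log = [
    "Dumping bag to textfiles for cbs.v3.84750NED_TypedDataSet",
    "Task 'tables_to_parquet[8]': Starting task run...",
    "Finished dumping bag to textfiles for cbs.v3.84750NED_TypedDataSet",
]
print(find_restarts_during_dump(log))
```

On the excerpt above this reports the pair ('tables_to_parquet[8]', 'cbs.v3.84750NED_TypedDataSet'). Running it over the attached full log should show whether other tables are affected in the same way.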