dataset extractors do not work on empty dataset #36

@tcnichol

Dataset extractors fail when they run on an empty dataset. This matters because we are building extractors that submit jobs to HPC resources, where the extractor itself triggers the upload of files from the HPC resource. Users are therefore likely to create empty datasets and fill them afterwards.

Here is the stack trace produced when an empty dataset is submitted:
    self._RealGetContents()
  File "/usr/local/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
2021-08-24 19:37:07,512 [Thread-16 ] INFO : pyclowder.connectors - [61254a46e4b0d8ae89cf36d6] : StatusMessage.retry: (#10) File is not a zip file
2021-08-24 19:37:08,244 [Thread-17 ] DEBUG : pyclowder.connectors - ['tcnichol@illinois.edu']
2021-08-24 19:37:08,371 [Thread-17 ] INFO : pyclowder.connectors - [61254a46e4b0d8ae89cf36d6] : StatusMessage.start: Started processing.
2021-08-24 19:37:08,372 [Thread-17 ] DEBUG : pyclowder.extractors - default check message : {'notifies': ['tcnichol@illinois.edu'], 'source': {'id': {'resourceType': "'dataset", 'id': '61254a46e4b0d8ae89cf36d6'}, 'extra': {}}, 'jobid': '61254a5ae4b0d8ae89cf36da', 'msgid': '61254a5ae4b0d8ae89cf36db', 'flags': '', 'intermediateId': '61254a46e4b0d8ae89cf36d6', 'host': 'https://pdg.clowderframework.org', 'datasetId': '61254a46e4b0d8ae89cf36d6', 'id': '61254a46e4b0d8ae89cf36d6', 'datasetname': 'EMPTY', 'fileSize': '0', 'target': '{}', 'secretKey': 'aea4c447-7a7e-4c7d-b717-bde0fb57eed0', 'activity': 'submitted', 'routing_key': 'extractors.ncsa.maple.bridges2.dataset', 'parameters': '{"directory":"/jet/home/ocean/MAPLE/data"}', 'action': 'manual-submission', 'retry_count': 10}
2021-08-24 19:37:08,457 [Thread-17 ] INFO : pyclowder.connectors - [61254a46e4b0d8ae89cf36d6] : StatusMessage.processing: Downloading dataset.
2021-08-24 19:37:08,509 [Thread-17 ] ERROR : pyclowder.connectors - [61254a46e4b0d8ae89cf36d6] File is not a zip file
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pyclowder/connectors.py", line 443, in _process_message
    (file_paths, tmp_files, tmp_dirs) = self._prepare_dataset(host, secret_key, resource)
  File "/usr/local/lib/python3.9/site-packages/pyclowder/connectors.py", line 357, in _prepare_dataset
    file_paths = pyclowder.utils.extract_zip_contents(inputzip)
  File "/usr/local/lib/python3.9/site-packages/pyclowder/utils.py", line 125, in extract_zip_contents
    zipobj = zipfile.ZipFile(zipfilepath)
  File "/usr/local/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/usr/local/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
2021-08-24 19:37:08,509 [Thread-17 ] INFO : pyclowder.connectors - [61254a46e4b0d8ae89cf36d6] : StatusMessage.error: File is not a zip file
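
The trace shows `zipfile.ZipFile` raising `BadZipFile` inside `pyclowder.utils.extract_zip_contents`, so the file downloaded for the empty dataset is evidently not a valid zip archive. One possible guard is sketched below. It is not pyclowder's actual code or an agreed fix: the helper name `extract_zip_contents_or_empty` and its `dest_dir` parameter are hypothetical, and the sketch assumes that treating a non-zip download as "no files" is acceptable for the extractor.

```python
import os
import tempfile
import zipfile


def extract_zip_contents_or_empty(zip_path, dest_dir=None):
    """Hypothetical helper: extract a dataset zip, or return [] if it is not a zip.

    Assumption: an empty dataset is downloaded as a zero-byte or otherwise
    non-zip file, which is what the BadZipFile error in the trace suggests.
    """
    # Check before opening, so zipfile.ZipFile never sees an invalid archive.
    if os.path.getsize(zip_path) == 0 or not zipfile.is_zipfile(zip_path):
        return []

    if dest_dir is None:
        dest_dir = tempfile.mkdtemp()

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Return paths of extracted files only, skipping directory entries.
        return [
            os.path.join(dest_dir, name)
            for name in archive.namelist()
            if not name.endswith("/")
        ]
```

A guard like this cannot tell an empty dataset apart from a corrupted download, so a real fix might also consult the `'fileSize': '0'` field visible in the message above before deciding to skip extraction.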

Metadata

Labels: bug
