
run_summarization likely race condition creates flaky test_tensorflow_examples #32667

@molbap

Description

As in the title. See, for instance, https://app.circleci.com/pipelines/github/huggingface/transformers/100640/workflows/f4e73659-1f14-45da-8740-7314e788fb72/jobs/1340353, which happened in #32651. I can't reproduce it locally.
Likely caused by this module-level block in examples/tensorflow/summarization/run_summarization.py:

try:
    nltk.data.find("tokenizers/punkt")
except (LookupError, OSError):
    if is_offline_mode():
        raise LookupError(
            "Offline mode: run this script without TRANSFORMERS_OFFLINE first to download nltk data files"
        )
    with FileLock(".lock") as lock:
        nltk.download("punkt", quiet=True)

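This causes a failure at test collection. The nltk.data.find call runs before the lock is taken, so one process can presumably read the punkt archive while another is still writing it, and the resulting BadZipFile is not caught by except (LookupError, OSError):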

_______ ERROR collecting examples/tensorflow/test_tensorflow_examples.py _______
examples/tensorflow/test_tensorflow_examples.py:67: in <module>
    import run_summarization
examples/tensorflow/summarization/run_summarization.py:63: in <module>
    nltk.data.find("tokenizers/punkt")
/usr/local/lib/python3.10/site-packages/nltk/data.py:554: in find
    return find(modified_name, paths)
/usr/local/lib/python3.10/site-packages/nltk/data.py:541: in find
    return ZipFilePathPointer(p, zipentry)
/usr/local/lib/python3.10/site-packages/nltk/compat.py:36: in _decorator
    return init_func(*args, **kwargs)
/usr/local/lib/python3.10/site-packages/nltk/data.py:394: in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
/usr/local/lib/python3.10/site-packages/nltk/compat.py:36: in _decorator
    return init_func(*args, **kwargs)
/usr/local/lib/python3.10/site-packages/nltk/data.py:943: in __init__
    zipfile.ZipFile.__init__(self, filename)
/usr/local/lib/python3.10/zipfile.py:1271: in __init__
    self._RealGetContents()
/usr/local/lib/python3.10/zipfile.py:1338: in _RealGetContents
    raise BadZipFile("File is not a zip file")
E   zipfile.BadZipFile: File is not a zip file
------------------------------- Captured stderr --------------------------------
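To illustrate why the except clause in run_summarization.py does not catch this error, here is a minimal sketch using only the standard library. It assumes the bad file is a partially written archive, which is not confirmed. The file name and payload are hypothetical stand-ins, not the real nltk data.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional


def open_partial_zip(payload: bytes = b"PK\x03\x04 truncated download") -> Optional[Exception]:
    """Write bytes that are not a complete zip archive and try to open them."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the punkt archive; not the real nltk path.
        path = Path(tmp) / "punkt.zip"
        path.write_bytes(payload)
        try:
            zipfile.ZipFile(path).close()
        except Exception as exc:
            return exc
    return None


error = open_partial_zip()
# BadZipFile derives from Exception directly, so this check is False.
would_be_caught = isinstance(error, (LookupError, OSError))
print(type(error).__name__, would_be_caught)
```

If the assumption holds, any process that calls nltk.data.find while the archive is incomplete hits the same uncaught exception at import time.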

Would be nice to have:

  • Isolate what creates the bad zip file.
  • Flag the flaky part of the failing test accordingly.
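One possible direction, sketched under assumptions rather than proposed as the fix: run both the lookup and the download under the same lock, and treat BadZipFile like a missing resource. The helper name `ensure_resource` and its callable parameters are hypothetical. The offline-mode check from the script is omitted for brevity, and whether a fresh download actually replaces a corrupt archive has not been verified here.

```python
import contextlib
import zipfile
from typing import Callable, ContextManager


def ensure_resource(
    find: Callable[[], object],
    download: Callable[[], object],
    lock: ContextManager,
) -> None:
    """Check for a resource and download it if needed, all under one lock."""
    with lock:
        try:
            find()
        except (LookupError, OSError, zipfile.BadZipFile):
            # A corrupt archive is handled the same way as a missing one.
            download()


# Stand-ins that simulate a corrupt archive, so the sketch runs without nltk.
calls = []


def fake_find():
    calls.append("find")
    raise zipfile.BadZipFile("File is not a zip file")


def fake_download():
    calls.append("download")


ensure_resource(fake_find, fake_download, contextlib.nullcontext())
print(calls)
```

In the script itself, the arguments would presumably be `lambda: nltk.data.find("tokenizers/punkt")`, `lambda: nltk.download("punkt", quiet=True)`, and `FileLock(".lock")`. Note that a relative lock path only serializes processes that share a working directory.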

Labels: Tests (Related to tests)