
UnicodeDecodeError in plots.collect() #5490

@Suor

I have a PDF file among the plots, and it breaks the repo.plots.collect() call:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 10: invalid start byte
  File "repos/parsing/dvcmeat.py", line 49, in parse_dvc
    _parse_plots(repo, commit)
  File "repos/parsing/dvcmeat.py", line 96, in _parse_plots
    commit.plots = repo.plots.collect().get("workspace", {})
  File "dvc/repo/plots/__init__.py", line 64, in collect
    data[rev][datafile]["data"] = fd.read()
  File "codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
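The traceback shows the plot file being read with fd.read() on a text-mode handle, so every byte must decode as UTF-8, which a binary PDF cannot do. The sketch below reproduces this with a synthetic PDF header. It assumes the header bytes are constructed so that 0xb5 lands at position 10, matching the report; real PDF headers vary. It then contrasts that with a hypothetical read_plot_file helper that reads raw bytes and decodes only when the content is valid UTF-8. Neither function is DVC code: read_as_text only mimics the failing line.

```python
import os
import tempfile

# Synthetic PDF header (assumption): "%PDF-1.5\n%" is 10 bytes, so the
# first non-ASCII marker byte (0xb5) sits at position 10, as in the report.
PDF_HEADER = b"%PDF-1.5\n%\xb5\xb5\xb5\xb5\n"


def read_as_text(path):
    """Mimic the failing call: open the plot file in text mode and read it."""
    with open(path, encoding="utf-8") as fd:
        return fd.read()


def read_plot_file(path):
    """Hypothetical defensive reader: str for UTF-8 text, raw bytes otherwise."""
    with open(path, "rb") as fd:
        raw = fd.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Binary plot (PDF, PNG, ...): return the bytes instead of crashing.
        return raw


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = os.path.join(tmp, "plot.pdf")
        with open(pdf_path, "wb") as f:
            f.write(PDF_HEADER)

        try:
            read_as_text(pdf_path)
        except UnicodeDecodeError as exc:
            print(exc)  # same message as in the report

        print(type(read_plot_file(pdf_path)).__name__)
```

Running it should print the same "'utf-8' codec can't decode byte 0xb5 in position 10: invalid start byte" message, followed by "bytes". Returning raw bytes is only one possible approach; a fix could just as well skip binary plot files or choose the read mode based on the file extension.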

More context: https://sentry.io/organizations/iterative/issues/2138285532/?project=5220519&referrer=slack

Output of dvc version:

$ dvc version
DVC version: 1.11.0+af26d5 
---------------------------------
Platform: Python 3.8.2 on Linux-5.4.0-58-generic-x86_64-with-glibc2.27
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss

Metadata

Assignees: No one assigned
Labels: bug (Did we break something?)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
