Misclassification of binary files as text files

When a binary file begins with a large text header, the current
checksum routine treats it as a text file and normalizes its
line endings before computing the checksum. Not only is it dangerous to
manipulate binary files this way, it also doubles the runtime of the
checksum routine, because every block of data must be read twice.
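To make the failure mode concrete, here is a minimal sketch. It assumes a hypothetical ratio heuristic (`naive_is_text`, a 512-byte sniff window and a 30% threshold, all chosen for illustration and not taken from DVC's code) and CRLF-to-LF normalization done with `bytes.replace`. A file whose text header fools the heuristic gets normalized, which rewrites CR LF byte pairs inside the binary payload and changes the checksum.

```python
import hashlib

# Hypothetical heuristic, for illustration only (not DVC's actual code):
# a block counts as text if at most `threshold` of its bytes are "non-text".
TEXT_CHARS = bytes(range(32, 127)) + b"\n\r\t\f\b"

def naive_is_text(block: bytes, threshold: float = 0.30) -> bool:
    if not block:
        return True
    nontext = sum(1 for b in block if b not in TEXT_CHARS)
    return nontext / len(block) <= threshold

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# A binary file that starts with a large text header (roughly 4 KB of ASCII).
header = b"# metadata line\r\n" * 256
payload = bytes([0x00, 0x0D, 0x0A, 0xFF]) * 64  # binary data containing CR LF pairs
data = header + payload

# The heuristic only sees the first 512 bytes, which are all header text.
print(naive_is_text(data[:512]))  # True

# Normalizing line endings also rewrites the CR LF pairs in the payload.
raw = md5_hex(data)
normalized = md5_hex(data.replace(b"\r\n", b"\n"))
print(raw == normalized)  # False
```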

As noted in #3264, DVC should probably, at a minimum, match
Git's text file detection routine, which inspects only the first 8 kilobytes
(and does not apply a heuristic based on the ratio of printable characters,
as DVC currently does).

Issue #3364