-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Closed
Milestone
Description
All formats use ASCII encoding to write down fields, which is unfortunate because a UTF8 name like "földër" will look garbled when read back.
MemoryStream ms = new MemoryStream();
TarWriter writer = new(ms, leaveOpen: true);
GnuTarEntry gnuEntry = new(TarEntryType.Directory, "földër");
writer.WriteEntry(gnuEntry);
writer.Dispose();
ms.Position = 0;
TarReader reader = new(ms);
TarEntry readEntry = reader.GetNextEntry();
Console.WriteLine(readEntry.Name); // Prints "f?ld?r".
reader.Dispose();This is visually mitigated on Pax because UTF8 encoding is used to write down extended attributes and fortunately, that's the default format. However, legacy fields on Pax entries do get garbled but when using .NET APIs, we overwrite the legacy fields with the contents of the extended attributes. So AFAIK, the issue in pax shows only if you look at the bytes of the tar archive:

exyi