lestring16/bestring16: UCS-2 string types #232
Closed
Labels: compatibility, enhancement, evaluator, parser, rust, testing, type:feature
Summary
Add support for the magic(5) `lestring16` (little-endian) and `bestring16` (big-endian) UCS-2 string types. These are 16-bit Unicode strings used by NTFS, FAT, and other Windows-derived formats.

Real-world need
`/usr/share/file/magic/filesystems` (the system magic file shipped on macOS and most Linux distros) uses `lestring16` in three rules covering NTFS bootstrap loader names and volume names. Without this type, the file fails to load mid-parse:

This was discovered while fixing the UTF-8 and arithmetic-indirect-offset gaps in the PR on branch `fix/loader-non-utf8-magic-files`. With those parser fixes in place, this is the next blocker for loading the system magic file end-to-end.

Spec
magic(5):
Each character occupies two bytes; the reader stops at U+0000 (encoded as the 2-byte sequence `0x00 0x00`) or at the end of the buffer. Comparison values in magic files are ASCII; the evaluator decodes the file bytes to a Rust `String` and compares it against the (ASCII) target.

Implementation outline
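Step 3 of the outline that follows (`read_string16`) carries most of the logic. Below is a minimal sketch, not the project's code. It assumes a standalone `Endianness` enum and a hypothetical signature `(buf, offset, endian) -> Option<(String, usize)>`. It also assumes that the consumed count includes the NUL pair, that a trailing odd byte is ignored, and that surrogate code units (which `char::from_u32` rejects) become U+FFFD.

```rust
/// Byte order of a 16-bit string. Standalone stand-in for the outline's
/// `Endianness` (assumption: the real type may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Hypothetical sketch of `read_string16`: decodes UCS-2 code units from
/// `buf[offset..]` until a 0x0000 unit or the end of the buffer.
/// Returns `None` if `offset` is past the end of `buf`; otherwise the decoded
/// text plus the number of bytes consumed (NUL pair included, if present),
/// which is the figure a `bytes_consumed` arm would need for anchor advance.
fn read_string16(buf: &[u8], offset: usize, endian: Endianness) -> Option<(String, usize)> {
    let data = buf.get(offset..)?;
    let mut out = String::new();
    let mut consumed = 0;
    // `chunks_exact(2)` yields whole byte pairs; a trailing odd byte is skipped.
    for pair in data.chunks_exact(2) {
        let unit = match endian {
            Endianness::Little => u16::from_le_bytes([pair[0], pair[1]]),
            Endianness::Big => u16::from_be_bytes([pair[0], pair[1]]),
        };
        consumed += 2;
        if unit == 0 {
            // NUL pair terminates the string.
            break;
        }
        // Lone surrogates (0xD800..=0xDFFF) are not valid `char`s, so
        // `char::from_u32` returns `None`; this sketch substitutes U+FFFD.
        out.push(char::from_u32(u32::from(unit)).unwrap_or(char::REPLACEMENT_CHARACTER));
    }
    Some((out, consumed))
}
```

Returning the byte count alongside the string is one way to feed the relative-offset anchor. Whether the terminating NUL pair should count toward the advance is worth confirming against libmagic's behavior.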
- A `TypeKind::String16 { endian: Endianness }` variant.
- `parse_type_keyword` + `type_keyword_to_kind` accept `lestring16`/`bestring16`. Reuse the existing string-value parsing for the comparison operand.
- `read_string16` in `evaluator/types/string.rs`. Reads pairs of bytes, decodes via `char::from_u32`, and stops on a NUL pair or at the buffer end. The type is variable-width, so add an explicit arm to `bytes_consumed` for relative-offset anchor advance.
- A `serialize_type_kind` arm.
- `calculate_default_strength`.

Acceptance criteria
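The first criterion in the list that follows, together with the out-of-scope note on bare `string16`, amounts to a keyword/kind round trip. The sketch below is minimal and makes several assumptions. `TypeKind` is trimmed down to the proposed variant only. The free functions are named after the outline's `type_keyword_to_kind` and `serialize_type_kind`, but their real signatures are not given in this issue, so these stand-ins are hypothetical.

```rust
/// Stand-in for the outline's `Endianness` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endianness {
    Little,
    Big,
}

/// Hypothetical, trimmed-down `TypeKind`: only the proposed variant is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TypeKind {
    String16 { endian: Endianness },
}

/// Sketch of the keyword -> kind mapping.
fn type_keyword_to_kind(keyword: &str) -> Option<TypeKind> {
    match keyword {
        "lestring16" => Some(TypeKind::String16 { endian: Endianness::Little }),
        "bestring16" => Some(TypeKind::String16 { endian: Endianness::Big }),
        // Bare "string16" is not a libmagic keyword, so it is rejected.
        _ => None,
    }
}

/// Sketch of the `serialize_type_kind` arm: kind -> keyword.
fn serialize_type_kind(kind: &TypeKind) -> &'static str {
    match kind {
        TypeKind::String16 { endian: Endianness::Little } => "lestring16",
        TypeKind::String16 { endian: Endianness::Big } => "bestring16",
    }
}
```

Keeping the two mappings as mirror-image `match` expressions makes a missing arm a compile error on the serialize side once more variants exist, since that `match` must be exhaustive.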
- `parse_type_keyword("lestring16")` and `parse_type_keyword("bestring16")` round-trip.
- `x` (any-value) operator with `%s` format substitution (the form actually used in the filesystems file).
- `/usr/share/file/magic/filesystems` parses end-to-end via `rmagic --magic-file`.

Out of scope
`string16` without an endian prefix (libmagic does not define this; only `lestring16` and `bestring16` are valid keywords).

Refs