Panic in parse_text_impl() on crafted content with unicode #31

This is a security issue. We first reported it privately to @udoprog, who asked us to document it publicly here.

For context, the `README.md` says "Any panic is considered a critical bug and should be reported".


## Summary
`parse_text_impl()` can panic on external input because it slices a string at a byte offset that may fall inside a multi-byte Unicode (UTF-8) character. This is reachable through `xmlparser::Tokenizer::from()`.

## Technical details
During a security review of software dependencies, we discovered this issue in the newest `xmlparser` master version d3831fa39cf3c0c2c82ca2a6656eeb9902fba6da by running the pre-existing cargo-fuzz test harness [fuzz_xml.rs](https://github.com/RazrFalcon/xmlparser/blob/d3831fa39cf3c0c2c82ca2a6656eeb9902fba6da/fuzz/fuzz_targets/fuzz_xml.rs).

The issue can be found after a short run of `cargo fuzz run fuzz_xml -s none`, without additional tuning such as dictionaries or a seed corpus.
Here is a minimized crash reproducer input: [crash-2490c582b55c6de632b3f759b74293c82fde6cc2.txt](https://github.com/user-attachments/files/19607133/crash-2490c582b55c6de632b3f759b74293c82fde6cc2.txt) (just 7 bytes).

The crash is reliable and quick to trigger, and it involves no resource exhaustion. We are not aware of any security implications beyond the availability impact of the panic. The panic does not depend on debug-build functionality, so release builds are affected as well.

The problematic line is https://github.com/RazrFalcon/xmlparser/blob/d3831fa39cf3c0c2c82ca2a6656eeb9902fba6da/src/lib.rs#L1062
It slices the input to check for a three-character sequence and panics when a slice boundary falls inside a multi-byte Unicode character in the input:
```text
thread '<unnamed>' panicked at xmlparser/src/lib.rs:1062:29:
byte index 1 is not a char boundary; it is inside '߾' (bytes 0..2) of `߾?>`
```
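
The failure mode is generic to Rust string slicing: indexing a `&str` by byte range panics unless both ends fall on UTF-8 character boundaries. The sketch below reproduces that behavior in isolation and shows two non-panicking alternatives (`str::get` and a byte-level comparison). The helper names are hypothetical; this is not the exact expression on line 1062, nor necessarily how it was fixed upstream.

```rust
// Illustrative sketch only: these helpers are hypothetical and are not
// xmlparser's actual code or its upstream fix.

/// Panic-prone lookahead: `&text[..n]` panics if byte offset `n` is not a
/// UTF-8 char boundary (for example, inside a 2-byte character like '߾').
fn has_prefix_by_slicing(text: &str, pattern: &str) -> bool {
    let n = pattern.len();
    text.len() >= n && &text[..n] == pattern
}

/// Non-panicking lookahead: `str::get` returns `None` for an out-of-range
/// or non-boundary index instead of panicking.
fn has_prefix_checked(text: &str, pattern: &str) -> bool {
    text.get(..pattern.len()) == Some(pattern)
}

/// Alternative: compare raw bytes, which never requires char boundaries.
fn has_prefix_bytes(text: &str, pattern: &str) -> bool {
    text.as_bytes().starts_with(pattern.as_bytes())
}
```

With the input `"\u{07FE}?>"` (the `߾?>` string from the panic message) and a one-byte pattern such as `"?"`, `has_prefix_by_slicing` panics with the same kind of "byte index 1 is not a char boundary" message quoted above, while the other two helpers simply return `false`.
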
## Affected versions
Based on our initial analysis, the vulnerability was introduced in commit https://github.com/RazrFalcon/xmlparser/commit/a617a9ba6a52bcd423c80e322f47a586e6b16b78, which landed after (!) the newest published crate version `v0.13.6`. As far as we can tell, this means projects that depend on `xmlparser` are not affected if they only use published crates. Projects that pull in a specific git revision directly could still be vulnerable.

Fuzz testing [v0.13.6](https://github.com/RazrFalcon/xmlparser/releases/tag/v0.13.6) with the existing harness (plus some optimizations) did not show similar behavior, which is consistent with the above.

## Scoring
In the worst-case scenario, where `xmlparser` automatically processes untrusted external XML input received over the network without authentication, we would score this as `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H` (**7.5 High**) for affected components.
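
For readers who want to check the 7.5 figure, it follows from the CVSS v3.1 base-score formulas: with C:N/I:N/A:H the Impact Sub-Score is 0.56, Impact = 6.42 × 0.56 ≈ 3.60, Exploitability = 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89, and Roundup(≈ 7.48) = 7.5. The sketch below implements only the Scope:Unchanged branch of the formula (Scope:Changed uses different PR weights and a different impact formula, so the sketch returns `None` for it). Weights follow the FIRST CVSS v3.1 specification; the function names are our own.

```rust
// Minimal CVSS v3.1 base-score sketch for Scope:Unchanged (S:U) vectors only.
use std::collections::HashMap;

/// CVSS v3.1 Roundup: smallest one-decimal value >= input, computed on
/// integers to avoid floating-point artefacts.
fn roundup(input: f64) -> f64 {
    let int_input = (input * 100_000.0).round() as i64;
    if int_input % 10_000 == 0 {
        int_input as f64 / 100_000.0
    } else {
        ((int_input / 10_000) + 1) as f64 / 10.0
    }
}

/// Weight for a Confidentiality / Integrity / Availability value.
fn cia_weight(value: &str) -> Option<f64> {
    match value {
        "H" => Some(0.56),
        "L" => Some(0.22),
        "N" => Some(0.0),
        _ => None,
    }
}

/// Base score for a "CVSS:3.1/..." vector; None if malformed or Scope:Changed.
fn base_score_scope_unchanged(vector: &str) -> Option<f64> {
    let mut parts = vector.split('/');
    if parts.next()? != "CVSS:3.1" {
        return None;
    }
    // Parse "KEY:VALUE" pairs; any part without ':' makes the vector invalid.
    let metrics: HashMap<&str, &str> = parts
        .map(|part| part.split_once(':'))
        .collect::<Option<HashMap<&str, &str>>>()?;

    if *metrics.get("S")? != "U" {
        return None; // Scope:Changed is out of scope for this sketch.
    }
    let av: f64 = match *metrics.get("AV")? {
        "N" => 0.85,
        "A" => 0.62,
        "L" => 0.55,
        "P" => 0.2,
        _ => return None,
    };
    let ac: f64 = match *metrics.get("AC")? {
        "L" => 0.77,
        "H" => 0.44,
        _ => return None,
    };
    // PR weights below are the Scope:Unchanged values.
    let pr: f64 = match *metrics.get("PR")? {
        "N" => 0.85,
        "L" => 0.62,
        "H" => 0.27,
        _ => return None,
    };
    let ui: f64 = match *metrics.get("UI")? {
        "N" => 0.85,
        "R" => 0.62,
        _ => return None,
    };
    let c = cia_weight(*metrics.get("C")?)?;
    let i = cia_weight(*metrics.get("I")?)?;
    let a = cia_weight(*metrics.get("A")?)?;

    // Impact Sub-Score, then the Scope:Unchanged impact and exploitability.
    let iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    let impact = 6.42 * iss;
    let exploitability = 8.22 * av * ac * pr * ui;
    if impact <= 0.0 {
        return Some(0.0);
    }
    Some(roundup((impact + exploitability).min(10.0)))
}
```

For the vector above, `base_score_scope_unchanged("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H")` returns `Some(7.5)`.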

## Credits
Discovered by Christian Reitter during work for [Turnkey](https://www.turnkey.com/).