fix: handle GUIDs with XML entities in them like feedparser-py by fazalmajid · Pull Request #60 · bug-ops/feedparser-rs

fazalmajid · 2026-02-19T19:16:36Z

Summary

Handle XML entities in element text such as GUIDs

Motivation

See #59 for details
Fixes #59

Changes

Handle Quick-XML Event::GeneralRef when handling element text.

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test addition or update

Test Plan

Ran cargo make test (all tests pass)
Ran cargo make lint (no warnings)
Added new tests for the changes
Tested manually with:

Note that cargo make ci-all is failing due to an unrelated issue in pyo3, https://rustsec.org/advisories/RUSTSEC-2026-0013, and there is another one in cargo-nextest. I didn't think you'd want to commingle those security fixes with this regression fix.

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code where necessary
[N/A] I have updated the documentation accordingly
My changes generate no new warnings
I have added tests that prove my fix/feature works
New and existing tests pass locally

Additional Notes

Hyrum's law strikes again. This would fix false positives in code that is switching from feedparser-py to feedparser-rs, but code that was using the incorrect GUIDs generated by older versions of feedparser-rs will experience false positives.

Copilot

Pull request overview

This pull request fixes a bug where XML entities in GUID elements (and other text fields) were not being properly decoded, causing character loss and breaking compatibility with Python's feedparser library. The issue occurred because quick-xml 0.39's Event::GeneralRef events (emitted for entity references like &amp;) were being silently ignored.

Changes:

Added entity resolution logic to handle numeric character references (&#38;, &#x26;), predefined XML entities (&amp;, &lt;, etc.), and unknown entities
Updated read_text function to process Event::GeneralRef events
Added comprehensive test coverage for entity decoding in both Rust and Python binding tests
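
The entity resolution described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the name resolve_entity_sketch and its signature (the text between & and ; in, a String out) are assumptions made for the example. Validity of numeric references here is whatever char::from_u32 accepts, which is looser than the Char production in XML 1.0.

```rust
/// Hypothetical helper, not the PR's `resolve_entity`.
/// `name` is the text between `&` and `;` in an entity reference.
fn resolve_entity_sketch(name: &str) -> String {
    // Fallback used whenever the reference cannot be resolved:
    // emit the original reference text unchanged.
    let preserved = || format!("&{name};");

    // Numeric character references: `#38` (decimal) or `#x26` (hex).
    if let Some(num) = name.strip_prefix('#') {
        let (digits, radix) = match num.strip_prefix('x') {
            Some(hex) => (hex, 16),
            None => (num, 10),
        };
        // Reject empty digit strings and stray characters such as `+`.
        let well_formed =
            !digits.is_empty() && digits.chars().all(|c| c.is_digit(radix));
        if !well_formed {
            return preserved();
        }
        // `char::from_u32` rejects surrogates and values above U+10FFFF.
        return u32::from_str_radix(digits, radix)
            .ok()
            .and_then(char::from_u32)
            .map_or_else(preserved, |c| c.to_string());
    }

    // The five entities predefined by XML 1.0.
    match name {
        "amp" => "&".to_string(),
        "lt" => "<".to_string(),
        "gt" => ">".to_string(),
        "quot" => "\"".to_string(),
        "apos" => "'".to_string(),
        // Unknown named entity: keep it as written.
        _ => preserved(),
    }
}
```

With this sketch, resolve_entity_sketch("amp"), resolve_entity_sketch("#38"), and resolve_entity_sketch("#x26") all yield &, while resolve_entity_sketch("unknown") yields &unknown; unchanged.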

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
crates/feedparser-rs-core/src/parser/common.rs	Added `Event::GeneralRef` handling in `read_text()` and new `resolve_entity()` helper function to decode character references and named entities
crates/feedparser-rs-py/tests/test_guid_entities.py	Added regression tests for entity decoding in GUIDs covering numeric references, hex references, predefined entities, and multiple entities

crates/feedparser-rs-core/src/parser/common.rs

Copilot · 2026-02-20T01:55:22Z

+            Ok(Event::GeneralRef(e)) => {
+                let resolved = resolve_entity(&e)?;
+                append_bytes(&mut text, resolved.as_bytes(), limits.max_text_length)?;
+            }


This implementation violates the bozo pattern that is fundamental to this project. According to the coding guidelines (CodingGuidelineID: 1000002), the parser must NEVER panic or return errors for malformed feeds. Instead, it should set the bozo flag and continue parsing.

When resolve_entity returns an error (e.g., for an invalid character reference), this will propagate up as a Result error, causing the entire parsing operation to fail or set bozo at a higher level. However, the bozo pattern requires continuing to parse and extract as much data as possible even when encountering entity resolution errors.

The fix should:

Change resolve_entity to return String instead of Result<String>

When an invalid character reference is encountered, preserve it as-is (that is, emit the original reference text unchanged) rather than returning an error

Optionally, the caller (read_text) could detect when entity resolution fails and set bozo on the feed object if that context is available

This is particularly important because real-world feeds may contain invalid character references, and Python feedparser would handle these gracefully. For API compatibility (the #1 priority), we must match that behavior.
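
One way to picture the approach suggested in this comment is the toy model below. It is a sketch under stated assumptions: the Ev enum is a stand-in invented for this example and is not quick-xml's Event type, and read_text_sketch and resolve are hypothetical names. The point it illustrates is that resolution never fails outright: an unresolvable reference is written back verbatim and a flag is returned so that a caller with access to the feed object could set bozo.

```rust
/// Toy stand-in for the parser events relevant here.
/// This is NOT quick-xml's `Event`; it only models what the sketch needs.
enum Ev<'a> {
    Text(&'a str),
    /// Name between `&` and `;`, for example `amp` or `#x26`.
    GeneralRef(&'a str),
    End,
}

/// Hypothetical resolver: `None` means the reference could not be resolved.
fn resolve(name: &str) -> Option<String> {
    match name {
        "amp" => Some("&".to_string()),
        "lt" => Some("<".to_string()),
        "gt" => Some(">".to_string()),
        "quot" => Some("\"".to_string()),
        "apos" => Some("'".to_string()),
        _ => {
            let num = name.strip_prefix('#')?;
            let (digits, radix) = match num.strip_prefix('x') {
                Some(hex) => (hex, 16),
                None => (num, 10),
            };
            if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
                return None;
            }
            let code = u32::from_str_radix(digits, radix).ok()?;
            char::from_u32(code).map(|c| c.to_string())
        }
    }
}

/// Accumulates element text. Returns the text plus a flag that is true
/// when at least one reference was left unresolved (a bozo candidate).
fn read_text_sketch(events: &[Ev<'_>]) -> (String, bool) {
    let mut text = String::new();
    let mut bozo = false;
    for ev in events {
        match ev {
            Ev::Text(t) => text.push_str(t),
            Ev::GeneralRef(name) => match resolve(name) {
                Some(s) => text.push_str(&s),
                None => {
                    // Keep going: preserve the reference and note the problem.
                    bozo = true;
                    text.push('&');
                    text.push_str(name);
                    text.push(';');
                }
            },
            Ev::End => break,
        }
    }
    (text, bozo)
}
```

Skipping the GeneralRef arm entirely in this model reproduces the character loss that the PR fixes: the text runs on either side of the reference are simply concatenated.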

crates/feedparser-rs-core/src/parser/common.rs

Copilot · 2026-02-20T01:55:23Z

+        assert_eq!(text, "https://example.com/?a=1&b=2&c=3");
+    }
+
    #[test]


Missing test coverage for edge cases in entity handling. According to the testing guidelines (CodingGuidelineID: 1000004), tests should cover both success and failure cases, including edge cases.

Recommended additional tests:

Invalid character reference (e.g., &#xFFFF; or &#999999;) - should set bozo but continue parsing

Unknown/custom entity references (e.g., &customEntity;) - should preserve as-is

Malformed entity syntax (e.g., &#x; or &) - should handle gracefully

Mix of valid and invalid entities in same text

These tests are particularly important for verifying the bozo pattern works correctly when entity resolution encounters errors.

Suggested change

-    #[test]
+    #[test]
+    fn test_read_text_invalid_numeric_char_ref() {
+        let xml = b"<guid>invalid &#999999; ref</guid>";
+        let mut reader = Reader::from_reader(&xml[..]);
+        reader.config_mut().trim_text(true);
+        let mut buf = Vec::new();
+        let limits = ParserLimits::default();
+        loop {
+            match reader.read_event_into(&mut buf) {
+                Ok(Event::Start(_)) => break,
+                Ok(Event::Eof) => panic!("Unexpected EOF"),
+                _ => {}
+            }
+            buf.clear();
+        }
+        buf.clear();
+        let result = read_text(&mut reader, &mut buf, &limits);
+        assert!(result.is_err(), "Invalid numeric character reference should result in an error");
+    }
+    #[test]
+    fn test_read_text_unknown_named_entity() {
+        let xml = b"<guid>value &customEntity; here</guid>";
+        let mut reader = Reader::from_reader(&xml[..]);
+        reader.config_mut().trim_text(true);
+        let mut buf = Vec::new();
+        let limits = ParserLimits::default();
+        loop {
+            match reader.read_event_into(&mut buf) {
+                Ok(Event::Start(_)) => break,
+                Ok(Event::Eof) => panic!("Unexpected EOF"),
+                _ => {}
+            }
+            buf.clear();
+        }
+        buf.clear();
+        let result = read_text(&mut reader, &mut buf, &limits);
+        assert!(result.is_err(), "Unknown/custom entity should be surfaced as an error at this level");
+    }
+    #[test]
+    fn test_read_text_malformed_entity_syntax() {
+        // Malformed entity: missing code after &#x and bare '&'
+        let xml = b"<guid>value &#x; and &</guid>";
+        let mut reader = Reader::from_reader(&xml[..]);
+        reader.config_mut().trim_text(true);
+        let mut buf = Vec::new();
+        let limits = ParserLimits::default();
+        loop {
+            match reader.read_event_into(&mut buf) {
+                Ok(Event::Start(_)) => break,
+                Ok(Event::Eof) => panic!("Unexpected EOF"),
+                _ => {}
+            }
+            buf.clear();
+        }
+        buf.clear();
+        let result = read_text(&mut reader, &mut buf, &limits);
+        assert!(result.is_err(), "Malformed entity syntax should not cause a panic and must be reported as an error");
+    }
+    #[test]
+    fn test_read_text_mixed_valid_and_invalid_entities() {
+        let xml = b"<guid>ok &amp; bad &#999999; mix</guid>";
+        let mut reader = Reader::from_reader(&xml[..]);
+        reader.config_mut().trim_text(true);
+        let mut buf = Vec::new();
+        let limits = ParserLimits::default();
+        loop {
+            match reader.read_event_into(&mut buf) {
+                Ok(Event::Start(_)) => break,
+                Ok(Event::Eof) => panic!("Unexpected EOF"),
+                _ => {}
+            }
+            buf.clear();
+        }
+        buf.clear();
+        let result = read_text(&mut reader, &mut buf, &limits);
+        assert!(result.is_err(), "Mixed valid and invalid entities should be handled without panicking");
+    }
+    #[test]

bug-ops · 2026-02-20T02:22:02Z

Hey, @fazalmajid! Thanks for the fix! Good catch.

I've merged #62 which updates dependencies, so the Security Audit check should pass now.
Could you rebase your branch on top of main to pick up those changes?

fazalmajid · 2026-02-20T12:33:25Z

@bug-ops I rebased and implemented the Copilot recommendations and associated tests. Three things I found in doing so:

The code is now bozo-tolerant, but the read_text function has no way to set the bozo flag. Bubbling this up would require invasive refactoring, so I have not implemented it, but you should be aware of this.
When handling text that has a mix of valid and invalid entities, quick-xml treats each entity independently, so the good ones are converted and the bad ones are not. This differs from the feedparser-py behavior, where they are either all converted or all left as-is, atomically. If you look at the test test_guid_entities.py:test_guid_with_mixed_valid_and_unknown_entities, feedparser-rs-py returns AT&T&unknown; but feedparser-py's behavior is to return AT&amp;T&unknown;:

zanzibar ~>p
Python 3.14.3 (main, Feb 11 2026, 10:07:14) [GCC 15.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.parse(b"""<?xml version="1.0"?>
...     <rss version="2.0">
...         <channel>
...             <item>
...                 <guid>AT&amp;T&unknown;</guid>
...             </item>
...         </channel>
...     </rss>""")
{'bozo': 1, 'entries': [{'id': 'AT&amp;T&unknown;', 'guidislink': True, 'link': 'AT&amp;T&unknown;'}], 'feed': {}, 'headers': {}, 'encoding': 'utf-8', 'version': 'rss20', 'bozo_exception': SAXParseException("Entity 'unknown' not defined\n"), 'namespaces': {}}

(and note it also sets the bozo flag with SAXParseException("Entity 'unknown' not defined\n"))

Running cargo make test mutates crates/feedparser-rs-node/index.js, changing bindingPackageVersion checks from 0.3.0 to 0.4.3 along with other things I do not understand, so I left well enough alone and did not include those changes in the commit.
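
The per-entity versus all-or-nothing difference described in the second finding above can be illustrated with a small sketch. It only mirrors the observed outputs quoted in this comment and makes no claim about how feedparser-py or quick-xml implement resolution internally. The function names are hypothetical, and only the five predefined XML entities are handled to keep the example short.

```rust
/// Resolves the five predefined XML entities (numeric references omitted).
fn predefined(name: &str) -> Option<char> {
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        "apos" => Some('\''),
        _ => None,
    }
}

/// Per-entity resolution: each reference is decoded or preserved on its own.
/// Returns the text and whether any reference failed to resolve.
fn resolve_per_entity(raw: &str) -> (String, bool) {
    let mut out = String::new();
    let mut failed = false;
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        match after.find(';') {
            Some(semi) => {
                let name = &after[..semi];
                match predefined(name) {
                    Some(c) => out.push(c),
                    None => {
                        // Unknown reference: keep it verbatim.
                        failed = true;
                        out.push('&');
                        out.push_str(name);
                        out.push(';');
                    }
                }
                rest = &after[semi + 1..];
            }
            None => {
                // Bare '&' with no terminating ';': keep the remainder as-is.
                failed = true;
                out.push_str(&rest[amp..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    (out, failed)
}

/// All-or-nothing resolution: if any reference fails, return the raw text.
fn resolve_atomic(raw: &str) -> (String, bool) {
    let (decoded, failed) = resolve_per_entity(raw);
    if failed {
        (raw.to_string(), true)
    } else {
        (decoded, false)
    }
}
```

For the input AT&amp;T&unknown;, resolve_per_entity returns AT&T&unknown; and resolve_atomic returns the input unchanged, matching the two behaviors reported here. Both return true as the failure flag, which corresponds to the bozo condition seen in the feedparser-py output.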

Thanks for the work on feedparser-rs! I was a contributor to feedparser-py, and looking forward to incorporating feedparser-rs in my WIP Rust rewrite of https://github.com/fazalmajid/temboz after a false start with brittle feed-rs.

fazalmajid · 2026-02-20T12:35:41Z

Also, just for the sake of clarity in the absence of a CLA: I agree to the terms in CONTRIBUTING.md and agree this code will be licensed under the terms of the project (dual MIT or Apache 2.0 license).

bug-ops · 2026-02-20T13:01:48Z

Thanks for the detailed analysis and the rebase!
All three findings are valid. The index.js mutation is expected from the napi-rs build pipeline.
I've created #64 to track the follow-up work — this keeps your PR focused on the core fix.
Thanks for the contribution!

bug-ops requested a review from Copilot February 20, 2026 01:50

Copilot started reviewing on behalf of bug-ops February 20, 2026 01:51

Copilot AI reviewed Feb 20, 2026

This was referenced Feb 20, 2026

Update pyo3 to fix RUSTSEC-2026-0013 (memory corruption) #61

Closed

fix(deps): update Cargo.lock to resolve RUSTSEC-2026-0013 #62

Merged

bug-ops mentioned this pull request Feb 20, 2026

release: v0.4.4 #63

Merged

4 tasks

fix: handle GUIDs with XML entities in them like feedparser-py

d87df00

fazalmajid force-pushed the fix/guid-entities branch from 3c9ec33 to d87df00 February 20, 2026 11:18

implemented Copilot recommendations

c5fcb54

github-actions bot added size: L Large PR (<500 lines changed) and removed size: M Medium PR (<200 lines changed) labels Feb 20, 2026

bug-ops mentioned this pull request Feb 20, 2026

Improve entity resolution: bozo pattern compliance and edge-case coverage #64

Closed

bug-ops approved these changes Feb 20, 2026

bug-ops merged commit 9c71d41 into bug-ops:main Feb 20, 2026
27 checks passed

bug-ops added a commit that referenced this pull request Feb 20, 2026

release: update changelog with PR #60 reference

2c2441e

bug-ops added a commit that referenced this pull request Feb 20, 2026

release: v0.4.4 (#63)

4f071f4

* release: prepare v0.4.4 * release: update changelog with PR #60 reference

bug-ops merged 2 commits into bug-ops:main from fazalmajid:fix/guid-entities

2 participants
