Summary
Follow-up from #60: the entity resolution implementation needs a few adjustments to comply fully with project conventions, in particular the bozo pattern for malformed input.
Tasks
1. Make resolve_entity bozo-tolerant
resolve_entity currently returns Result<String> and propagates errors for invalid character references. Per the bozo pattern, it should return String and preserve malformed entities as-is instead of failing.
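A minimal sketch of a bozo-tolerant resolve_entity. It assumes the function receives the text between & and ; (for example "amp", "#38" or "#x26") and handles only the five predefined XML entities plus numeric references. That input convention and scope are assumptions, not the current feedparser-rs code. Unknown or malformed references come back verbatim instead of as an error:

```rust
/// Bozo-tolerant entity resolution (sketch). `name` is the text between `&`
/// and `;`, e.g. "amp", "#38" or "#x26"; this input convention is an
/// assumption, not the current feedparser-rs signature. Instead of returning
/// an error, an unknown or malformed reference is returned verbatim.
fn resolve_entity(name: &str) -> String {
    // The five entities predefined by XML 1.0.
    let predefined = match name {
        "lt" => Some('<'),
        "gt" => Some('>'),
        "amp" => Some('&'),
        "apos" => Some('\''),
        "quot" => Some('"'),
        _ => None,
    };
    // Numeric references: "#" + decimal digits, or "#x" + hex digits.
    let numeric = name.strip_prefix('#').and_then(|num| {
        let code = match num.strip_prefix('x') {
            Some(hex) if !hex.is_empty() && hex.bytes().all(|b| b.is_ascii_hexdigit()) => {
                u32::from_str_radix(hex, 16).ok() // None on u32 overflow
            }
            None if !num.is_empty() && num.bytes().all(|b| b.is_ascii_digit()) => {
                num.parse::<u32>().ok() // None on u32 overflow
            }
            _ => None,
        };
        // char::from_u32 rejects surrogates and values above U+10FFFF;
        // NUL is rejected as well because XML does not allow it.
        code.and_then(char::from_u32).filter(|&c| c != '\0')
    });
    match predefined.or(numeric) {
        Some(c) => c.to_string(),
        // Bozo pattern: keep the malformed or unknown reference as-is.
        None => format!("&{name};"),
    }
}
```

Because the fallback is silent, a caller of this function alone cannot tell that a reference was malformed, which is the gap task 2 describes.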
2. Propagate bozo flag from read_text
read_text has no way to set the bozo flag when it encounters invalid entities. Refactor the function signature chain so that bozo conditions (e.g., unknown entities, invalid character references) can be surfaced to the caller.
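One possible shape for the refactor, sketched under assumptions. Resolution is split into a hypothetical inner try_resolve_entity that returns Option<char>. read_text returns the decoded text together with a bozo flag. Here read_text works on a plain &str, whereas the real function consumes quick-xml events, and the DecodedText type and its fields are illustrative names:

```rust
/// Decoded text plus a bozo flag (hypothetical type; field names are assumptions).
#[derive(Debug, PartialEq)]
struct DecodedText {
    text: String,
    bozo: bool,
}

/// Hypothetical inner helper: `Some` for a valid reference, `None` when the
/// reference is unknown or malformed. `name` is the text between `&` and `;`.
fn try_resolve_entity(name: &str) -> Option<char> {
    match name {
        "lt" => Some('<'),
        "gt" => Some('>'),
        "amp" => Some('&'),
        "apos" => Some('\''),
        "quot" => Some('"'),
        _ => {
            let num = name.strip_prefix('#')?;
            let code = match num.strip_prefix('x') {
                Some(hex) if !hex.is_empty() && hex.bytes().all(|b| b.is_ascii_hexdigit()) => {
                    u32::from_str_radix(hex, 16).ok()?
                }
                None if !num.is_empty() && num.bytes().all(|b| b.is_ascii_digit()) => {
                    num.parse::<u32>().ok()?
                }
                _ => return None,
            };
            // Rejects surrogates, values above U+10FFFF, and NUL.
            char::from_u32(code).filter(|&c| c != '\0')
        }
    }
}

/// Sketch of a bozo-aware decoder over a plain string (standalone signature
/// is an assumption; the real read_text consumes quick-xml events).
fn read_text(raw: &str) -> DecodedText {
    let mut text = String::with_capacity(raw.len());
    let mut bozo = false;
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        text.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        // A reference name runs up to ';' and may not contain '&' or whitespace.
        let end = after.find(|c: char| c == ';' || c == '&' || c.is_whitespace());
        match end {
            Some(end) if end > 0 && after[end..].starts_with(';') => {
                let name = &after[..end];
                match try_resolve_entity(name) {
                    Some(c) => text.push(c),
                    None => {
                        // Keep the reference verbatim and surface the problem.
                        text.push('&');
                        text.push_str(name);
                        text.push(';');
                        bozo = true;
                    }
                }
                rest = &after[end + 1..];
            }
            _ => {
                // Bare '&' that does not start a reference: keep it, flag bozo.
                text.push('&');
                bozo = true;
                rest = after;
            }
        }
    }
    text.push_str(rest);
    DecodedText { text, bozo }
}
```

With this split, a public resolve_entity returning String can be a thin wrapper, try_resolve_entity(name).map(String::from).unwrap_or_else(|| format!("&{name};")). Meanwhile read_text, which calls the inner helper directly, still knows when the fallback was taken.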
3. Document mixed entities behavior deviation
feedparser-rs resolves valid entities independently (AT&amp;T&unknown; becomes AT&T&unknown;), while feedparser-py treats them atomically and leaves AT&amp;T&unknown; unresolved. This is a quick-xml design trade-off. Document it as a known deviation in the test comments.
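A sketch of how the deviation might be recorded next to the test that exercises it. decode_independently is a hypothetical, simplified stand-in that handles only the five predefined XML entities and mimics the independent resolution described above. The test name and comment wording are illustrative:

```rust
/// Simplified stand-in for the feedparser-rs decoder: each `&name;` is
/// resolved on its own, and anything it cannot resolve is kept verbatim.
/// Only the five predefined XML entities are handled (a simplification).
fn decode_independently(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        // Try to resolve the text up to the next ';' as a predefined entity.
        let hit = after.find(';').and_then(|end| {
            let text = match &after[..end] {
                "lt" => "<",
                "gt" => ">",
                "amp" => "&",
                "apos" => "'",
                "quot" => "\"",
                _ => return None,
            };
            Some((text, end))
        });
        match hit {
            Some((text, end)) => {
                out.push_str(text);
                rest = &after[end + 1..];
            }
            None => {
                // Unresolvable: keep the '&' and continue scanning after it.
                out.push('&');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    /// KNOWN DEVIATION from feedparser-py.
    ///
    /// feedparser-rs resolves valid entities independently, so `&amp;` is
    /// decoded even though `&unknown;` in the same text is not. feedparser-py
    /// treats such a run atomically, so its output differs for this input.
    /// This is a consequence of the quick-xml design trade-off.
    #[test]
    fn mixed_entities_resolve_independently() {
        assert_eq!(decode_independently("AT&amp;T&unknown;"), "AT&T&unknown;");
    }
}
```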
4. Add edge-case tests
- Invalid numeric character references
- Malformed entity syntax (&#x;, bare &)
- Unknown/custom named entities (&customEntity;)
- Mixed valid and invalid entities in the same text
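The edge cases above could be organized as a table-driven test. The decode and resolve helpers here are hypothetical minimal stand-ins, not the feedparser-rs API. The specific numeric inputs are illustrative choices. The expected outputs assume the behavior described in tasks 1 and 2: preserve the reference verbatim and set the bozo flag.

```rust
/// Minimal decoder used to exercise the edge cases (sketch; the name and the
/// (text, bozo) return shape are assumptions). Valid references are resolved,
/// anything else is kept verbatim and sets the bozo flag.
fn decode(raw: &str) -> (String, bool) {
    let mut out = String::with_capacity(raw.len());
    let mut bozo = false;
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let hit = after.find(';').and_then(|end| resolve(&after[..end]).map(|c| (c, end)));
        match hit {
            Some((c, end)) => {
                out.push(c);
                rest = &after[end + 1..];
            }
            None => {
                out.push('&'); // keep the '&' and rescan what follows
                bozo = true;
                rest = after;
            }
        }
    }
    out.push_str(rest);
    (out, bozo)
}

/// Resolves a reference name (text between '&' and ';') to a char, if valid.
fn resolve(name: &str) -> Option<char> {
    match name {
        "lt" => Some('<'),
        "gt" => Some('>'),
        "amp" => Some('&'),
        "apos" => Some('\''),
        "quot" => Some('"'),
        _ => {
            let num = name.strip_prefix('#')?;
            let code = match num.strip_prefix('x') {
                Some(h) if !h.is_empty() && h.bytes().all(|b| b.is_ascii_hexdigit()) => {
                    u32::from_str_radix(h, 16).ok()?
                }
                None if !num.is_empty() && num.bytes().all(|b| b.is_ascii_digit()) => {
                    num.parse::<u32>().ok()?
                }
                _ => return None,
            };
            char::from_u32(code).filter(|&c| c != '\0')
        }
    }
}

/// (input, expected text, expected bozo), grouped by the categories in the task list.
const EDGE_CASES: &[(&str, &str, bool)] = &[
    // Invalid numeric character references
    ("&#x110000;", "&#x110000;", true),         // above U+10FFFF
    ("&#xD800;", "&#xD800;", true),             // lone surrogate
    ("&#99999999999;", "&#99999999999;", true), // overflows u32
    // Malformed entity syntax
    ("&#x;", "&#x;", true),
    ("fish & chips", "fish & chips", true),
    // Unknown/custom named entities
    ("&customEntity;", "&customEntity;", true),
    // Mixed valid and invalid entities in the same text
    ("AT&amp;T&unknown;", "AT&T&unknown;", true),
    // Control case: only valid references, so no bozo flag
    ("&lt;ok&gt;", "<ok>", false),
];

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn edge_cases() {
        for &(input, text, bozo) in EDGE_CASES {
            assert_eq!(decode(input), (text.to_string(), bozo), "input: {input}");
        }
    }
}
```

Keeping each input next to its expected (text, bozo) pair means a new case is a one-line addition to the table.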