Improve entity resolution: bozo pattern compliance and edge-case coverage #64

@bug-ops

Description

Summary

Follow-up from #60: the entity resolution implementation needs adjustments to fully comply with project conventions.

Tasks

1. Make resolve_entity bozo-tolerant

resolve_entity currently returns Result<String> and propagates errors for invalid character references. Per the bozo pattern, it should return String and preserve malformed entities as-is instead of failing.
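A minimal sketch of what an infallible resolver could look like. The signature (taking the text between & and ;), the short list of named entities, and the is_xml_char helper are assumptions made for illustration and are not taken from the feedparser-rs source.

```rust
/// XML 1.0 `Char` production: code points permitted in a document.
/// Hypothetical helper; the project may apply a different rule.
fn is_xml_char(c: char) -> bool {
    matches!(
        c,
        '\u{9}' | '\u{A}' | '\u{D}'
            | '\u{20}'..='\u{D7FF}'
            | '\u{E000}'..='\u{FFFD}'
            | '\u{10000}'..='\u{10FFFF}'
    )
}

/// Assumed shape: `name` is the text between `&` and `;`.
/// Always returns a String; anything unresolvable is re-emitted unchanged.
fn resolve_entity(name: &str) -> String {
    let resolved = match name.strip_prefix('#') {
        Some(num) => {
            // Hex if prefixed with x/X, decimal otherwise.
            let (digits, radix) = match num.strip_prefix('x').or_else(|| num.strip_prefix('X')) {
                Some(hex) => (hex, 16),
                None => (num, 10),
            };
            // Reject empty input and sign characters, which from_str_radix would accept.
            if !digits.is_empty() && digits.chars().all(|c| c.is_digit(radix)) {
                u32::from_str_radix(digits, radix)
                    .ok()
                    .and_then(char::from_u32) // rejects surrogates and values above 0x10FFFF
                    .filter(|c| is_xml_char(*c)) // rejects 0xFFFE, 0xFFFF, most control chars
            } else {
                None
            }
        }
        // Only the five predefined XML entities are handled in this sketch.
        None => match name {
            "amp" => Some('&'),
            "lt" => Some('<'),
            "gt" => Some('>'),
            "quot" => Some('"'),
            "apos" => Some('\''),
            _ => None,
        },
    };
    match resolved {
        Some(c) => c.to_string(),
        // Bozo-tolerant fallback: preserve the malformed reference as-is.
        None => format!("&{name};"),
    }
}
```

One point worth noting: char::from_u32 alone accepts 0xFFFF, because it is a valid Unicode scalar value. Treating &#xFFFF; as invalid therefore needs an extra check such as the XML Char rule assumed above.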

2. Propagate bozo flag from read_text

read_text has no way to set the bozo flag when encountering invalid entities. Refactor the function signature chain to allow surfacing bozo conditions (e.g., unknown entities, invalid char refs) to the caller.
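One possible way to thread the flag through is to return it alongside the decoded text, sketched below. Decoded, lookup, and decode_text are hypothetical names. The real read_text presumably consumes quick-xml events rather than a plain string, so this only illustrates how the flag is carried back, not the actual signature chain.

```rust
/// Hypothetical return type: decoded text plus whether anything was malformed.
#[derive(Debug, PartialEq)]
struct Decoded {
    text: String,
    bozo: bool,
}

/// Stand-in for entity lookup; only a few names, for illustration.
fn lookup(name: &str) -> Option<char> {
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        _ => None,
    }
}

/// Stand-in for the decoding step inside read_text.
fn decode_text(raw: &str) -> Decoded {
    let mut text = String::with_capacity(raw.len());
    let mut bozo = false;
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        text.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        // A reference ends at the first `;` with no whitespace or `&` before it.
        match after.find(|c: char| c == ';' || c == '&' || c.is_whitespace()) {
            Some(end) if after.as_bytes()[end] == b';' => {
                let name = &after[..end];
                match lookup(name) {
                    Some(c) => text.push(c),
                    None => {
                        // Unknown entity: keep it verbatim and record the condition.
                        bozo = true;
                        text.push('&');
                        text.push_str(name);
                        text.push(';');
                    }
                }
                rest = &after[end + 1..];
            }
            _ => {
                // Bare `&` with no terminating `;`: keep it and record the condition.
                bozo = true;
                text.push('&');
                rest = after;
            }
        }
    }
    text.push_str(rest);
    Decoded { text, bozo }
}
```

A caller would then OR the returned bozo value into the feed-level flag. Passing a &mut bool down the call chain is an alternative with the same effect; returning a struct keeps the function free of out-parameters.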

3. Document mixed entities behavior deviation

feedparser-rs resolves each valid entity independently, so the input AT&amp;T&unknown; becomes AT&T&unknown;, while feedparser-py treats the text atomically and leaves it as AT&amp;T&unknown;. This is a quick-xml design trade-off. Document it as a known deviation in test comments.

4. Add edge-case tests

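  • Invalid numeric character references (&#999999;, &#xFFFF;). Note that 999999 is 0xF423F, which is below the Unicode maximum of 0x10FFFF, so a larger value such as &#9999999; is needed to cover the out-of-range case.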
  • Malformed entity syntax (&#x;, bare &)
  • Unknown/custom named entities (&customEntity;)
  • Mixed valid and invalid entities in the same text
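The cases above could be collected in a table-driven form, sketched here. Case, CASES, and failing_cases are hypothetical names, and the expected outputs assume the preserve-as-is behavior described in task 1. The decoder under test is passed in as a closure returning (text, bozo), since the final signature is not settled in this issue.

```rust
/// One edge case: raw input, expected decoded text, expected bozo flag.
struct Case {
    input: &'static str,
    text: &'static str,
    bozo: bool,
}

const CASES: &[Case] = &[
    // Out of range: 9999999 exceeds 0x10FFFF (999999 itself is 0xF423F, in range).
    Case { input: "&#9999999;", text: "&#9999999;", bozo: true },
    // Noncharacter; assumed to be rejected under the XML 1.0 Char rule.
    Case { input: "&#xFFFF;", text: "&#xFFFF;", bozo: true },
    // Malformed syntax.
    Case { input: "&#x;", text: "&#x;", bozo: true },
    Case { input: "a & b", text: "a & b", bozo: true },
    // Unknown named entity.
    Case { input: "&customEntity;", text: "&customEntity;", bozo: true },
    // Known deviation: feedparser-py would leave this as AT&amp;T&unknown;,
    // while feedparser-rs resolves the valid entity on its own.
    Case { input: "AT&amp;T&unknown;", text: "AT&T&unknown;", bozo: true },
    // Control case: fully valid input must not set the flag.
    Case { input: "AT&amp;T", text: "AT&T", bozo: false },
];

/// Runs every case through `decode` and returns the inputs that did not match.
fn failing_cases<F>(decode: F) -> Vec<&'static str>
where
    F: Fn(&str) -> (String, bool),
{
    CASES
        .iter()
        .filter(|c| {
            let (text, bozo) = decode(c.input);
            text != c.text || bozo != c.bozo
        })
        .map(|c| c.input)
        .collect()
}
```

In a real test this would be asserted as an empty list, with the closure wrapping whatever function read_text ends up delegating to.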
