Skip to content

feat(c14n): implement XML canonicalization (inclusive + exclusive)#6

Merged
polaz merged 11 commits intomainfrom
feat/#5-c14n-implementation
Mar 16, 2026
Merged

feat(c14n): implement XML canonicalization (inclusive + exclusive)#6
polaz merged 11 commits intomainfrom
feat/#5-c14n-implementation

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Mar 15, 2026

Summary

  • Inclusive C14N 1.0 (with/without comments) — all in-scope namespaces rendered
  • Exclusive C14N 1.0 (with/without comments) — only visibly-utilized namespaces + InclusiveNamespaces PrefixList
  • Document-order tree walker on roxmltree 0.21
  • Text/attribute escaping, attribute sorting, empty element expansion, CDATA flattening
  • Public API: canonicalize(), canonicalize_xml(), C14nAlgorithm::from_uri()
  • Dependencies updated: roxmltree 0.20→0.21 (with positions feature), x509-parser 0.16→0.18

Design Decisions

  • Lexical prefix extraction via byte positions: roxmltree discards lexical prefixes during DOM construction (stores only namespace URI + local name). We extract prefixes from the source XML using Node::range() and Attribute::range_qname() (roxmltree positions feature). This correctly handles namespace prefix aliasing (multiple prefixes bound to the same URI), which lookup_prefix() cannot.

  • HashMap::clone() per element: Namespace rendering state is cloned at each element. Acceptable for typical XML depths (<20 levels). Optimization to &mut with backtrack is deferred to performance phase.

Known Limitations

  • C14N 1.1 is parsed from URIs but returns UnsupportedAlgorithm at execution — xml:id/xml:base fixup not yet implemented
  • xml: attribute inheritance* in document subsets (XPath-selected) not implemented — not needed for SAML enveloped signatures
  • XPath subset test vectors (Merlin c14n-three) require XPath evaluator — planned for future release

Test Plan

  • cargo test — 50 tests pass (33 unit + 15 integration + 2 doc)
  • cargo clippy --all-targets --all-features -- -D warnings — 0 warnings
  • Integration tests verified against xmllint --c14n / xmllint --exc-c14n reference outputs
  • Prefix aliasing regression tests (same URI, different prefixes)
  • Idempotency: C14N(C14N(x)) == C14N(x) for both modes
  • CI workflows pass (build, clippy, fmt, test)

Closes #5

Summary by CodeRabbit

  • New Features

    • Introduced XML canonicalization functionality with support for Inclusive 1.0 and Exclusive 1.0 algorithms
    • URI-based algorithm configuration
    • Optional comment preservation and custom prefix handling
  • Tests

    • Added comprehensive integration tests for canonicalization functionality
  • Chores

    • Updated roxmltree and x509-parser dependencies

- Inclusive C14N 1.0: all in-scope namespaces, sorted by prefix
- Exclusive C14N 1.0: visibly-utilized namespaces + PrefixList
- Document-order tree walker on roxmltree
- Text/attribute escaping per W3C spec
- Attribute sorting (ns-uri + local-name)
- Empty element expansion, CDATA flattening, PI normalization
- Document-level comment/PI separator handling
- 25 unit + 13 integration tests (xmllint reference outputs)
- Update deps: roxmltree 0.21, x509-parser 0.18

Closes #5
Copilot AI review requested due to automatic review settings March 15, 2026 19:06
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 15, 2026

Warning

Rate limit exceeded

@polaz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 53 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c37f55be-3844-42fb-8554-82db77284dad

📥 Commits

Reviewing files that changed from the base of the PR and between 026a410 and 9965969.

📒 Files selected for processing (4)
  • src/c14n/mod.rs
  • src/c14n/ns_exclusive.rs
  • src/c14n/prefix.rs
  • src/lib.rs
📝 Walkthrough

Walkthrough

Adds a complete C14N implementation: canonicalization algorithms (inclusive/exclusive), URI-configurable algorithm struct, namespace renderers, document-order serializer, escaping utilities, and extensive unit/integration tests; exposes canonicalize and canonicalize_xml.

Changes

Cohort / File(s) Summary
Config, deps & docs
\.gitignore, Cargo.toml, src/lib.rs
Added arch/ and ROADMAP.md to .gitignore; bumped roxmltree and x509-parser versions; updated quick-start doc example to show canonicalization usage.
C14N public API & entrypoints
src/c14n/mod.rs
Reworked API: added C14nMode and C14nAlgorithm (with URI parsing, constructor, prefix list handling, and uri()), new C14nError, and public canonicalize / canonicalize_xml entrypoints.
Namespace renderers
src/c14n/ns_inclusive.rs, src/c14n/ns_exclusive.rs
Added InclusiveNsRenderer and ExclusiveNsRenderer (crate-private) implementing namespace emission policies for inclusive and exclusive C14N (including visibly-utilized prefix detection and prefix-list forcing); includes unit tests.
Serialization & traversal
src/c14n/serialize.rs
New document-order serializer serialize_canonical and NsRenderer trait: element/attribute emission, sorted attributes, node-set filtering, comment/PI handling, document-level newline logic, and many unit tests.
Escaping utilities
src/c14n/escape.rs
New escaping helpers: escape_text, escape_attr, escape_cr with unit tests implementing C14N-compliant escaping.
Lexical prefix utilities
src/c14n/prefix.rs
New helpers to recover lexical element/attribute prefixes and check in-scope default namespace (element_prefix, attribute_prefix, has_in_scope_default_namespace) with tests.
Integration tests
tests/c14n_integration.rs
Comprehensive integration test suite validating inclusive/exclusive outputs, attribute sorting, comments/PI, CDATA flattening, idempotency, prefix aliasing, default-namespace undeclaration, and subset behaviors.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Algo as C14nAlgorithm
    participant Parser as roxmltree.Document
    participant Serializer as serialize_canonical
    participant NsRenderer as NsRenderer (Inclusive/Exclusive)
    participant Output as Vec<u8>

    Client->>Algo: new(...) / from_uri(...)
    Client->>Parser: parse XML bytes
    Parser-->>Client: Document
    Client->>Serializer: serialize_canonical(doc, node_set, with_comments, ns_renderer)
    Serializer->>Serializer: traverse nodes (document order)
    loop per-element
        Serializer->>NsRenderer: render_namespaces(node, parent_rendered)
        NsRenderer-->>Serializer: sorted namespace declarations
        Serializer->>Serializer: emit start tag + ns decls
        Serializer->>Serializer: sort & emit attributes (escape_attr)
        Serializer->>Serializer: emit children / text (escape_text)
        Serializer->>Serializer: emit end tag
    end
    Serializer->>Output: write canonical bytes
    Output-->>Client: Vec<u8>
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hop through tags and tidy space,

I sort the prefixes, mark each place,
Escape the ampersand, guard each quote,
Inclusive, exclusive — I keep the note,
Canonical crumbs, neat and in trace.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat(c14n): implement XML canonicalization (inclusive + exclusive)' clearly and specifically summarizes the main change: implementation of C14N with both inclusive and exclusive modes.
Linked Issues check ✅ Passed All coding objectives from issue #5 are met: canonicalize_xml/canonicalize APIs implemented, algorithm URIs parsed/round-tripped, integration tests match xmllint outputs, namespace rendering correct, attribute sorting implemented, Clippy clean, doc-tests pass.
Out of Scope Changes check ✅ Passed All changes are directly related to C14N implementation per issue #5; no out-of-scope changes detected. Dependency updates and documentation are supporting changes for the feature.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/#5-c14n-implementation
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a Rust implementation of XML Canonicalization (C14N) with inclusive and exclusive namespace rendering, updates crate docs to demonstrate canonicalization, and introduces integration tests that validate outputs against xmllint reference canonicalization.

Changes:

  • Introduce a C14N serializer plus inclusive/exclusive namespace rendering strategies and a public canonicalize_xml API.
  • Add integration tests comparing canonical output to xmllint across several representative XML inputs.
  • Bump XML/X.509 parsing dependencies and tweak .gitignore.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/c14n/mod.rs Defines public C14N API/types (C14nAlgorithm, C14nMode, canonicalize(_xml)) and URI parsing/round-tripping.
src/c14n/serialize.rs Implements document-order canonical serialization (elements, attrs, comments, PI, doc-level separators).
src/c14n/ns_inclusive.rs Inclusive namespace rendering strategy and unit tests.
src/c14n/ns_exclusive.rs Exclusive namespace rendering strategy (visibly-utilized + PrefixList) and unit tests.
src/c14n/escape.rs Escaping helpers for text and attribute values plus unit tests.
tests/c14n_integration.rs Integration tests asserting canonical output matches xmllint for multiple vectors and edge cases.
src/lib.rs Updates crate-level Quick Start doc example to use the new C14N API.
Cargo.toml Dependency bumps (roxmltree, x509-parser).
.gitignore Adds ignore rules for arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/ns_inclusive.rs Outdated
Comment thread src/c14n/mod.rs
Comment thread src/c14n/serialize.rs
Comment thread src/c14n/escape.rs
Comment thread src/c14n/serialize.rs
Comment thread src/c14n/ns_exclusive.rs
Comment thread src/c14n/ns_inclusive.rs
Comment thread src/c14n/mod.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/c14n/mod.rs (1)

51-60: Make the invariant-bearing C14nAlgorithm fields private.

inclusive_prefixes is normalized by with_prefix_list(), but external callers can currently bypass that by mutating the public field directly and inserting literal #default or other meaningless combinations. Keeping these fields private and exposing accessors/builders would preserve the module's own invariants across all construction paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/c14n/mod.rs` around lines 51 - 60, The C14nAlgorithm struct exposes
invariant-bearing fields (mode, with_comments, inclusive_prefixes) publicly,
allowing callers to bypass normalization done by with_prefix_list(); make these
fields private (e.g., rename to mode, with_comments, inclusive_prefixes as
private) and provide controlled construction and access via a builder or public
constructors/ accessors (e.g., new(), with_prefix_list(), mode(),
with_comments(), inclusive_prefixes() or an iterator/clone-returning getter) so
that inclusive_prefixes is always normalized (convert "#default" to "" and
enforce other invariants) and external code cannot directly mutate the HashSet.
src/c14n/serialize.rs (1)

173-181: Remove dead code block or refactor to clarify intent.

This block doesn't perform any logic—it only suppresses the unused variable warning with let _ = pred;. If attribute filtering is intentionally not implemented because roxmltree doesn't expose attribute nodes separately, remove the block and document this limitation elsewhere (e.g., in the function's doc comment or as a known gap).

♻️ Suggested refactor
     let mut attrs: Vec<_> = node.attributes().collect();
 
-    // Filter attributes by node-set if applicable.
-    if let Some(pred) = node_set {
-        // For document subsets, only include attributes that are "in the set".
-        // roxmltree doesn't give attribute nodes separate Node identity,
-        // so when we have a node_set, all attributes of an included element
-        // are included (matching xmlsec1 behavior for typical use cases).
-        let _ = pred;
-        // All attributes included if the element is in the set.
-    }
+    // Note: roxmltree doesn't expose attribute nodes with separate Node identity,
+    // so node_set filtering cannot distinguish individual attributes. When an
+    // element is in the set, all its attributes are included (matches xmlsec1 behavior).
 
     attrs.sort_by(|a, b| {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/c14n/serialize.rs` around lines 173 - 181, Remove the dead block that
only does "let _ = pred;" inside the attribute filtering section (the code
referencing node_set and pred) and instead document the limitation: update the
serialize function's doc comment to state that attribute-level filtering is not
implemented because roxmltree does not expose attribute nodes separately, so
attributes of an included element are always included; if you prefer to keep an
explicit no-op, replace the block with a brief explanatory comment referencing
node_set/pred rather than using let _ = pred to silence warnings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/c14n/mod.rs`:
- Around line 73-90: from_uri advertises support for C14n 1.1 by returning
C14nMode::Inclusive1_1 but canonicalize still processes Inclusive1_1 using the
1.0 codepath; fix this by either implementing the 1.1-specific canonicalization
rules in the canonicalize function/renderer for the C14nMode::Inclusive1_1
branch (and ensure uri() and from_uri remain consistent) or reject 1.1 URIs in
from_uri by returning None for those URIs (remove or change the
"http://www.w3.org/2006/12/xml-c14n11" and "#WithComments" arms) so callers
cannot request 1.1 when only 1.0 behavior is implemented; update tests and any
use of inclusive_prefixes handling as needed to reflect the chosen approach.

In `@src/c14n/ns_exclusive.rs`:
- Around line 81-107: visibly_utilized_prefixes() (and likewise
write_qualified_name() in serialize.rs) should use the lexical prefix parsed
from each QName instead of reverse-mapping via lookup_prefix(namespace_uri);
update visibly_utilized_prefixes to read the element's lexical prefix from
node.tag_name() (and each attribute's lexical prefix from attr.name()/QName) and
insert that exact prefix (including empty string for default) into the utilized
set, removing any lookup_prefix calls, and update write_qualified_name() to emit
the stored lexical prefix rather than resolving by namespace URI; add a
regression test that creates two different prefixes bound to the same URI (e.g.,
xmlns:a="u" xmlns:b="u" and an attribute using b:) to ensure the serializer
preserves the actual lexical prefix.

---

Nitpick comments:
In `@src/c14n/mod.rs`:
- Around line 51-60: The C14nAlgorithm struct exposes invariant-bearing fields
(mode, with_comments, inclusive_prefixes) publicly, allowing callers to bypass
normalization done by with_prefix_list(); make these fields private (e.g.,
rename to mode, with_comments, inclusive_prefixes as private) and provide
controlled construction and access via a builder or public constructors/
accessors (e.g., new(), with_prefix_list(), mode(), with_comments(),
inclusive_prefixes() or an iterator/clone-returning getter) so that
inclusive_prefixes is always normalized (convert "#default" to "" and enforce
other invariants) and external code cannot directly mutate the HashSet.

In `@src/c14n/serialize.rs`:
- Around line 173-181: Remove the dead block that only does "let _ = pred;"
inside the attribute filtering section (the code referencing node_set and pred)
and instead document the limitation: update the serialize function's doc comment
to state that attribute-level filtering is not implemented because roxmltree
does not expose attribute nodes separately, so attributes of an included element
are always included; if you prefer to keep an explicit no-op, replace the block
with a brief explanatory comment referencing node_set/pred rather than using let
_ = pred to silence warnings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a7034e7-321c-417a-99c9-9af56e34988f

📥 Commits

Reviewing files that changed from the base of the PR and between 1fa2d67 and c78ffad.

📒 Files selected for processing (9)
  • .gitignore
  • Cargo.toml
  • src/c14n/escape.rs
  • src/c14n/mod.rs
  • src/c14n/ns_exclusive.rs
  • src/c14n/ns_inclusive.rs
  • src/c14n/serialize.rs
  • src/lib.rs
  • tests/c14n_integration.rs

Comment thread src/c14n/mod.rs
Comment thread src/c14n/ns_exclusive.rs
- Escape \r to &#xD; in PI and XML node content (C14N spec section 2.3)
- C14N 1.1 returns UnsupportedAlgorithm instead of silent 1.0 fallback
- Allow clippy::unwrap_used in test modules
- Document lookup_prefix() limitation for aliased namespace prefixes
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/c14n/serialize.rs (1)

175-183: Remove or rewrite the no-op attribute-filter block.

This block currently suggests attribute node-set filtering but performs no filtering. Consider removing it and keeping a single explicit limitation note in docs/comments to avoid false expectations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/c14n/serialize.rs` around lines 175 - 183, The no-op attribute-filter
block (the if let Some(pred) { let _ = pred; ... } section) should be removed
and replaced with a single clear comment stating the limitation that attribute
nodes are not separately represented by roxmltree and therefore all attributes
of an included element are always serialized when the element is in the
node_set; update nearby comment in serialize.rs to reflect this explicit
limitation (referencing node_set and roxmltree) so callers won't expect
attribute-level filtering.
src/c14n/mod.rs (1)

94-106: Consider guarding with_prefix_list to Exclusive mode.

with_prefix_list silently accepts non-exclusive modes even though it is only meaningful for Exclusive C14N. A guard (or explicit doc note on no-op behavior outside exclusive mode) would make misuse less likely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/c14n/mod.rs` around lines 94 - 106, with_prefix_list currently sets
inclusive_prefixes unconditionally though it only makes sense for Exclusive
C14N; add a guard in with_prefix_list that checks the canonicalization mode
(e.g., a field like self.mode or self.canonicalization_mode) and either no-op
and return self when the mode is not Exclusive or return a clear error/perform a
debug/assert; update the method to only mutate self.inclusive_prefixes when mode
== Exclusive (or add a short doc comment on no-op behavior if you prefer the
no-op approach). Ensure you reference the with_prefix_list method and the
inclusive_prefixes field when implementing the guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/c14n/mod.rs`:
- Around line 4-6: Update the module-level documentation in src/c14n/mod.rs to
not claim Canonical XML 1.1 support: remove or change the bullet for "Canonical
XML 1.1" so it no longer advertises implementation, and add a short note that
attempting to use C14N 1.1 currently returns UnsupportedAlgorithm (referencing
the UnsupportedAlgorithm error) so callers know it isn't supported yet; keep the
other bullets (Canonical XML 1.0 and Exclusive XML Canonicalization 1.0)
unchanged.

---

Nitpick comments:
In `@src/c14n/mod.rs`:
- Around line 94-106: with_prefix_list currently sets inclusive_prefixes
unconditionally though it only makes sense for Exclusive C14N; add a guard in
with_prefix_list that checks the canonicalization mode (e.g., a field like
self.mode or self.canonicalization_mode) and either no-op and return self when
the mode is not Exclusive or return a clear error/perform a debug/assert; update
the method to only mutate self.inclusive_prefixes when mode == Exclusive (or add
a short doc comment on no-op behavior if you prefer the no-op approach). Ensure
you reference the with_prefix_list method and the inclusive_prefixes field when
implementing the guard.

In `@src/c14n/serialize.rs`:
- Around line 175-183: The no-op attribute-filter block (the if let Some(pred) {
let _ = pred; ... } section) should be removed and replaced with a single clear
comment stating the limitation that attribute nodes are not separately
represented by roxmltree and therefore all attributes of an included element are
always serialized when the element is in the node_set; update nearby comment in
serialize.rs to reflect this explicit limitation (referencing node_set and
roxmltree) so callers won't expect attribute-level filtering.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9987c64e-be8c-41dd-9d4a-735827a5b450

📥 Commits

Reviewing files that changed from the base of the PR and between c78ffad and d4326d7.

📒 Files selected for processing (5)
  • src/c14n/escape.rs
  • src/c14n/mod.rs
  • src/c14n/ns_exclusive.rs
  • src/c14n/ns_inclusive.rs
  • src/c14n/serialize.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/c14n/ns_exclusive.rs
  • src/c14n/ns_inclusive.rs

Comment thread src/c14n/mod.rs
- Make C14nAlgorithm fields private, add accessor methods
- Remove dead attribute filter block with let _ = pred suppression
- Replace with doc explaining roxmltree attribute node limitation
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new XML Canonicalization (C14N) implementation to xml-sec, including inclusive/exclusive namespace handling and serialization, plus integration tests validated against xmllint reference output.

Changes:

  • Introduces a new xml_sec::c14n API (C14nMode, C14nAlgorithm, canonicalize[_xml]) with inclusive/exclusive renderers and canonical serializer.
  • Adds extensive integration coverage comparing outputs to xmllint for namespaces, attribute sorting, comments/PIs, CDATA, and idempotency.
  • Updates public crate docs to showcase canonicalization usage; bumps roxmltree and x509-parser versions; updates .gitignore.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/c14n_integration.rs Adds xmllint-verified integration tests for inclusive/exclusive C14N behavior.
src/lib.rs Updates crate-level Quick Start example to use the new C14N API.
src/c14n/mod.rs Defines the public C14N API, algorithm parsing/URI mapping, and top-level canonicalization entrypoints.
src/c14n/serialize.rs Implements document-order canonical serialization, escaping, attribute sorting, doc-level PI/comment handling, and node-set filtering hooks.
src/c14n/ns_inclusive.rs Implements inclusive namespace rendering strategy.
src/c14n/ns_exclusive.rs Implements exclusive namespace rendering strategy with PrefixList support.
src/c14n/escape.rs Adds canonical escaping helpers for text, attributes, and CR handling.
Cargo.toml Bumps roxmltree and x509-parser dependency versions.
.gitignore Ignores arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/mod.rs Outdated
Comment thread src/c14n/serialize.rs
Comment thread src/c14n/ns_inclusive.rs Outdated
Comment thread src/c14n/serialize.rs Outdated
… edge case

- UTF-8 decode error now says "invalid UTF-8:" instead of "XML parse error"
- Inclusive ns doc: clarify redundant redeclarations are suppressed
- Document doc-level newline limitation with subset filtering
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a full XML Canonicalization (C14N) implementation (inclusive 1.0 + exclusive 1.0, with comments option) using roxmltree, along with integration tests that validate output against xmllint reference strings.

Changes:

  • Introduce C14N core API (C14nMode, C14nAlgorithm, canonicalize*) and implement canonical serialization + namespace rendering (inclusive/exclusive).
  • Add extensive integration tests covering namespace handling, attribute sorting, comments/PI behavior, idempotency, CDATA flattening, and namespace undeclaration.
  • Update crate docs quick-start to demonstrate canonicalization; bump roxmltree and x509-parser; extend .gitignore.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/c14n_integration.rs Adds xmllint-based integration vectors for inclusive/exclusive canonicalization behavior.
src/lib.rs Updates crate-level Quick Start docs to showcase C14N usage.
src/c14n/serialize.rs Implements document-order canonical serialization including attribute sorting and comment/PI rules.
src/c14n/ns_inclusive.rs Adds inclusive namespace rendering strategy (in-scope namespaces, suppress redundant redecls).
src/c14n/ns_exclusive.rs Adds exclusive namespace rendering strategy (visibly utilized + forced prefix list).
src/c14n/mod.rs Defines the new public C14N API types and canonicalization entrypoints.
src/c14n/escape.rs Adds canonical escaping helpers for text, attributes, and CR normalization.
Cargo.toml Bumps roxmltree to 0.21 and x509-parser to 0.18.
.gitignore Ignores arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/serialize.rs
Comment thread tests/c14n_integration.rs Outdated
Comment thread src/c14n/mod.rs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a full C14N (canonical XML) implementation backed by roxmltree, along with integration tests validated against xmllint, and updates public documentation to demonstrate the new API.

Changes:

  • Added canonical XML serialization with inclusive and exclusive namespace rendering strategies.
  • Introduced extensive integration tests comparing outputs to xmllint reference canonicalization.
  • Updated crate docs and bumped roxmltree/x509-parser dependency versions.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/c14n_integration.rs Adds xmllint-validated integration tests for inclusive/exclusive C14N behaviors and edge cases.
src/lib.rs Updates crate-level “Quick Start” docs to show the new C14N API.
src/c14n/mod.rs Implements the new public C14N API (C14nMode, C14nAlgorithm, canonicalization entrypoints).
src/c14n/serialize.rs Adds document-order canonical serialization logic (elements, attrs, comments, PI, text).
src/c14n/ns_inclusive.rs Implements inclusive namespace rendering strategy for C14N 1.0.
src/c14n/ns_exclusive.rs Implements exclusive namespace rendering strategy + PrefixList support.
src/c14n/escape.rs Adds escaping utilities for canonical XML text/attr/comment/PI content.
Cargo.toml Bumps roxmltree to 0.21 and x509-parser to 0.18.
.gitignore Ignores arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/serialize.rs Outdated
Comment thread src/c14n/ns_inclusive.rs
Comment thread src/c14n/ns_exclusive.rs
Comment thread src/c14n/ns_exclusive.rs Outdated
@polaz polaz requested review from Copilot and removed request for Copilot March 15, 2026 21:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new XML Canonicalization (C14N) implementation (inclusive/exclusive) with serialization, namespace rendering strategies, escaping rules, and comprehensive integration/unit tests, plus updates to crate docs and a couple dependency bumps.

Changes:

  • Adds a full C14N module implementation (serialize, ns_inclusive, ns_exclusive, escape) with a new public API (C14nMode, C14nAlgorithm, canonicalize[_xml]).
  • Adds integration tests comparing outputs against xmllint reference canonicalization.
  • Updates crate-level docs to demonstrate C14N usage; bumps roxmltree and x509-parser.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/c14n_integration.rs New integration test suite validating outputs against xmllint reference C14N.
src/lib.rs Updates crate “Quick Start” example to use the new C14N API.
src/c14n/mod.rs New public C14N API surface + algorithm parsing/selection and error types.
src/c14n/serialize.rs Canonical document-order serializer with namespace rendering hooks.
src/c14n/ns_inclusive.rs Inclusive C14N namespace rendering strategy.
src/c14n/ns_exclusive.rs Exclusive C14N namespace rendering strategy + PrefixList support.
src/c14n/escape.rs Canonical text/attribute/comment/PI escaping helpers + tests.
Cargo.toml Bumps roxmltree and x509-parser versions.
.gitignore Ignores arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/serialize.rs Outdated
…fix()

Replace ambiguous lookup_prefix(namespace_uri) reverse-mapping with
direct extraction of lexical prefixes from source XML byte positions
using roxmltree's `positions` feature.

- Enable roxmltree `positions` feature for byte-range access
- Add prefix.rs: element_prefix() and attribute_prefix() extract the
  actual prefix used in the source QName via Node::range() and
  Attribute::range_qname()
- Replace lookup_prefix() in serialize.rs write_qualified_name and
  write_attribute_name
- Replace lookup_prefix() in ns_exclusive.rs visibly_utilized_prefixes
- Add regression tests for aliased prefixes (same URI, different prefix)
@polaz polaz requested a review from Copilot March 15, 2026 22:48
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Mar 15, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 15, 2026

@polaz Sure! I'll re-review the latest changes on this PR now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/xml-sec PR: 6
File: src/c14n/ns_exclusive.rs:81-107
Timestamp: 2026-03-15T19:26:01.988Z
Learning: In the xml-sec project (Rust), roxmltree 0.21 does not expose lexical prefixes from parsed QNames — `ExpandedName` only provides `namespace()` and `name()`. As a result, `visibly_utilized_prefixes()` in `src/c14n/ns_exclusive.rs` and `write_qualified_name()` in `src/c14n/serialize.rs` must use `lookup_prefix(namespace_uri)` for prefix resolution. This is a known limitation: when multiple prefixes bind the same namespace URI, the reverse-mapping is ambiguous. The ambiguity is documented in code comments and tracked as a ROADMAP gap; the edge case is considered rare in SAML/XMLDSig use cases.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a full XML Canonicalization (C14N) implementation to the crate (inclusive + exclusive 1.0), aligning outputs against xmllint reference behavior and exposing a public canonicalization API under xml_sec::c14n.

Changes:

  • Implement C14N 1.0 serialization, escaping rules, and namespace rendering strategies (inclusive/exclusive).
  • Introduce a new public C14N API (C14nMode, C14nAlgorithm, canonicalize, canonicalize_xml) and update crate docs to demonstrate it.
  • Add comprehensive integration tests using precomputed xmllint --c14n/--exc-c14n reference outputs; bump parsing deps (roxmltree w/ positions).

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/c14n_integration.rs New integration tests validating canonical output vs xmllint reference cases.
src/lib.rs Updates crate “Quick Start” docs to show C14N usage.
src/c14n/mod.rs New public C14N API surface + dispatch to inclusive/exclusive renderers.
src/c14n/serialize.rs Core document-order serializer for canonical XML output.
src/c14n/escape.rs Canonical escaping for text/attributes/comments/PIs.
src/c14n/prefix.rs Lexical prefix recovery using roxmltree positions to preserve source prefixes.
src/c14n/ns_inclusive.rs Inclusive namespace rendering logic.
src/c14n/ns_exclusive.rs Exclusive namespace rendering logic + PrefixList support.
Cargo.toml Updates roxmltree and x509-parser versions (enables positions).
.gitignore Ignores additional local paths/files.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/ns_inclusive.rs Outdated
Comment thread src/c14n/ns_exclusive.rs Outdated
When canonicalizing a document subset where ancestor elements are
excluded, xmlns="" must still be emitted if the source tree has a
non-empty default namespace that needs undeclaring. Previously,
suppression was based only on parent_rendered (output ancestors),
which missed inherited default namespaces from non-output ancestors.

- Add has_in_scope_default_namespace() to check ancestor chain
- Use it in both inclusive and exclusive ns renderers
- Add subset integration tests for xmlns="" edge cases
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/c14n/ns_exclusive.rs (1)

40-78: Consider extracting shared namespace-declaration pipeline logic.

The filter/redundancy-check/sort/rendered-map update flow here is very close to src/c14n/ns_inclusive.rs (Line 36-75). A shared helper would reduce drift risk between inclusive and exclusive paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/c14n/ns_exclusive.rs` around lines 40 - 78, The namespace-declaration
selection, filtering, sorting and rendered-map update logic in this block (using
variables/functions like node.namespaces(), utilized, self.inclusive_prefixes,
parent_rendered, has_in_scope_default_namespace, decls, and rendered.insert)
largely duplicates the pipeline in the inclusive implementation; extract a
shared helper (e.g., collect_namespace_declarations or build_ns_decls) that
accepts the node, utilized set, inclusive_prefixes flag/set, parent_rendered
view and a flag/closure to run the default-namespace suppression check, returns
a sorted Vec<(String,String)> of declarations, and then reuse that helper from
both ns_exclusive and ns_inclusive to push decls and update rendered (preserve
the xml-prefix skip and the "suppress xmlns=\"\"" semantics by parameterizing
the helper), ensuring the helper performs the same prefix-redundancy and
ancestor-comparison checks before sorting and returning results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/c14n/ns_exclusive.rs`:
- Around line 40-78: The namespace-declaration selection, filtering, sorting and
rendered-map update logic in this block (using variables/functions like
node.namespaces(), utilized, self.inclusive_prefixes, parent_rendered,
has_in_scope_default_namespace, decls, and rendered.insert) largely duplicates
the pipeline in the inclusive implementation; extract a shared helper (e.g.,
collect_namespace_declarations or build_ns_decls) that accepts the node,
utilized set, inclusive_prefixes flag/set, parent_rendered view and a
flag/closure to run the default-namespace suppression check, returns a sorted
Vec<(String,String)> of declarations, and then reuse that helper from both
ns_exclusive and ns_inclusive to push decls and update rendered (preserve the
xml-prefix skip and the "suppress xmlns=\"\"" semantics by parameterizing the
helper), ensuring the helper performs the same prefix-redundancy and
ancestor-comparison checks before sorting and returning results.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dfcec5af-b83d-4a32-bb45-77e3d69140b8

📥 Commits

Reviewing files that changed from the base of the PR and between d4326d7 and 026a410.

📒 Files selected for processing (7)
  • Cargo.toml
  • src/c14n/mod.rs
  • src/c14n/ns_exclusive.rs
  • src/c14n/ns_inclusive.rs
  • src/c14n/prefix.rs
  • src/c14n/serialize.rs
  • tests/c14n_integration.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/c14n_integration.rs
  • Cargo.toml

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new Canonical XML (C14N) implementation in xml_sec, including inclusive/exclusive namespace rendering, lexical prefix preservation (via roxmltree positions), and comprehensive integration tests against xmllint-derived reference outputs.

Changes:

  • Implement canonical XML serialization with inclusive and exclusive namespace strategies (plus prefix-list support for exclusive C14N).
  • Add lexical prefix recovery to preserve the original prefixes when multiple prefixes map to the same namespace URI.
  • Add integration tests covering namespace/attribute ordering, comments/PIs, CDATA flattening, idempotency, and document-subset edge cases.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/c14n_integration.rs Adds integration coverage using xmllint reference outputs for inclusive/exclusive C14N behaviors.
src/lib.rs Updates crate-level “Quick Start” documentation to demonstrate C14N usage.
src/c14n/mod.rs Reworks/expands the public C14N API (mode + algorithm struct), adds parsing from URI, and exposes canonicalization entrypoints.
src/c14n/serialize.rs Implements document-order canonical serialization (elements, attrs, text, comments, PI, doc-level separators).
src/c14n/ns_inclusive.rs Implements inclusive namespace rendering with subset-aware xmlns="" handling.
src/c14n/ns_exclusive.rs Implements exclusive namespace rendering based on visibly-utilized prefixes + inclusive prefix list.
src/c14n/prefix.rs Implements lexical prefix extraction from source ranges and default-namespace-in-scope detection for subset correctness.
src/c14n/escape.rs Adds canonical escaping routines for text/attrs and CR escaping for PI/comments.
Cargo.toml Bumps roxmltree (enables positions) and x509-parser versions.
.gitignore Adds ignores for arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/lib.rs
Comment thread src/c14n/mod.rs
Comment thread src/c14n/prefix.rs Outdated
… doc

- Replace .unwrap() with ? in lib.rs and c14n/mod.rs doctests to
  comply with deny(clippy::unwrap_used) on doctest targets
- Fix has_in_scope_default_namespace doc: says "nearest" not "any"
  ancestor, matching the actual implementation behavior
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Mar 15, 2026

Valid suggestion from CodeRabbit review — the namespace-declaration pipeline in ns_inclusive.rs and ns_exclusive.rs has duplicated filter/sort/render logic that could be extracted into a shared helper.

This is a refactoring task, out of scope for this PR.

@coderabbitai please create an issue for: Extract shared namespace-declaration pipeline from ns_inclusive.rs and ns_exclusive.rs into a common helper function that accepts a prefix filter predicate (all-in-scope for inclusive, visibly-utilized for exclusive) to reduce code duplication and drift risk.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 15, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo structured-world/xml-sec \
  --title "refactor(c14n): extract shared namespace-declaration pipeline into a common helper" \
  --assignee polaz \
  --body "## Summary

The namespace-declaration pipeline in \`src/c14n/ns_inclusive.rs\` and \`src/c14n/ns_exclusive.rs\` contains duplicated filter/sort/render logic. This duplication introduces drift risk and makes future maintenance harder.

## Proposed Refactoring

Extract the shared pipeline into a common helper function that accepts a **prefix filter predicate**:

- **Inclusive C14N**: predicate accepts all in-scope namespace bindings.
- **Exclusive C14N**: predicate accepts only visibly-utilized namespaces (plus any forced prefixes from the \`InclusiveNamespaces PrefixList\`).

The helper would own the shared steps:
1. Collect candidate namespace bindings for the current element.
2. Apply the caller-supplied predicate to filter them.
3. Suppress redundant declarations already present in the nearest output ancestor (\`parent_rendered\`).
4. Handle \`xmlns=\"\"\` undeclaration semantics.
5. Sort declarations by prefix (empty prefix first, then lexicographic).
6. Return the declarations list and the updated \`rendered\` map.

Both \`InclusiveNsRenderer\` and \`ExclusiveNsRenderer\` would delegate to this helper, keeping only their predicate logic local.

## Motivation

- Removes code duplication between the two renderers.
- Reduces risk of the two implementations drifting apart on shared semantics (e.g., sorting order, \`xmlns=\"\"\` suppression rules).
- Makes it easier to add C14N 1.1 or other future modes.

## References

- Identified during PR review: https://github.com/structured-world/xml-sec/pull/6
- Requested by `@polaz`

## Out of Scope for PR `#6`

This is a follow-up refactoring task and was explicitly deferred from PR `#6`."

Length of output: 52


@polaz: Done! I've opened structured-world/xml-sec#7"refactor(c14n): extract shared namespace-declaration pipeline into a common helper" — assigned to you, with the full context including the proposed pipeline steps, motivation, and a backlink to this PR.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/xml-sec PR: 6
File: src/c14n/ns_exclusive.rs:81-107
Timestamp: 2026-03-15T19:26:01.988Z
Learning: In the xml-sec project (Rust), roxmltree 0.21 does not expose lexical prefixes from parsed QNames — `ExpandedName` only provides `namespace()` and `name()`. As a result, `visibly_utilized_prefixes()` in `src/c14n/ns_exclusive.rs` and `write_qualified_name()` in `src/c14n/serialize.rs` must use `lookup_prefix(namespace_uri)` for prefix resolution. This is a known limitation: when multiple prefixes bind the same namespace URI, the reverse-mapping is ambiguous. The ambiguity is documented in code comments and tracked as a ROADMAP gap; the edge case is considered rare in SAML/XMLDSig use cases.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a full XML Canonicalization (C14N) implementation (inclusive 1.0 + exclusive 1.0) built on roxmltree, along with extensive reference-based integration tests and updated crate documentation to reflect the new canonicalization API.

Changes:

  • Added C14N core implementation: canonical serialization, namespace rendering strategies (inclusive/exclusive), escaping, and lexical prefix recovery (via roxmltree positions).
  • Added integration tests comparing outputs to xmllint reference behavior across namespaces, attribute sorting, comments/PI handling, CDATA flattening, and subset edge-cases.
  • Updated public docs/examples and bumped dependencies (roxmltree + positions feature, x509-parser).

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/c14n_integration.rs Adds xmllint-reference integration tests for inclusive/exclusive C14N and subset edge cases.
src/lib.rs Updates crate-level Quick Start example to demonstrate canonicalization API.
src/c14n/mod.rs Defines the public C14N API (C14nAlgorithm, C14nMode, canonicalize, canonicalize_xml) and errors.
src/c14n/serialize.rs Implements document-order canonical serialization and doc-level comment/PI separation rules.
src/c14n/ns_inclusive.rs Implements inclusive namespace rendering and suppression rules.
src/c14n/ns_exclusive.rs Implements exclusive namespace rendering based on visibly-utilized prefixes + PrefixList.
src/c14n/prefix.rs Recovers lexical prefixes using roxmltree byte ranges; adds default-namespace-in-scope detection helper.
src/c14n/escape.rs Adds canonical escaping routines for text, attributes, and CR handling in comments/PIs.
Cargo.toml Bumps roxmltree and enables positions; bumps x509-parser.
.gitignore Adds ignores for arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/c14n/ns_exclusive.rs Outdated
In exclusive C14N, an unprefixed element relies on the current
default-namespace binding (including xmlns="" undeclaration). Without
marking the default namespace as visibly utilized, xmlns="" would not
be emitted, placing the element in the wrong namespace context.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a full Canonical XML (C14N) implementation (inclusive 1.0 + exclusive 1.0; 1.1 URI parsing but explicitly unsupported) and validates behavior against xmllint reference outputs, making the crate’s C14N functionality concrete and test-backed.

Changes:

  • Implement canonical XML serialization, namespace rendering (inclusive/exclusive), escaping rules, and lexical prefix recovery via roxmltree positions.
  • Add extensive integration tests comparing outputs to xmllint (including namespace/attr ordering, comments/PI rules, idempotency, and subset edge cases).
  • Update crate docs/examples to demonstrate C14N usage; bump roxmltree and x509-parser versions.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/c14n/mod.rs Public C14N API (algorithm/mode), error type, and canonicalization entry points.
src/c14n/serialize.rs Core document-order serialization and attribute/PI/comment handling.
src/c14n/ns_inclusive.rs Inclusive namespace rendering strategy.
src/c14n/ns_exclusive.rs Exclusive namespace rendering strategy (+ PrefixList support).
src/c14n/prefix.rs Lexical prefix extraction using roxmltree byte ranges (positions).
src/c14n/escape.rs Canonical escaping for text, attributes, and CR handling.
tests/c14n_integration.rs xmllint-based golden integration tests covering key C14N behaviors.
src/lib.rs Updates crate-level Quick Start example to canonicalization.
Cargo.toml Enables roxmltree positions feature; bumps x509-parser.
.gitignore Ignores arch/ and ROADMAP.md.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: C14N implementation (inclusive + exclusive)

2 participants